Neurales and Neuroevolution
Neuroevolution is the application of Genetic Algorithms (GAs) to the creation of neural networks. For supervised learning, suppose we are given a dataset. We start with a population of random models, each with random hyperparameters and training initializations. Using a GA with an “elitist” selection approach, we train each model, or “chromosome”, in every generation, and the best-performing ones are deemed the most “fit”. These fit models carry over to the next generation, and the least fit models are replaced with new random models. This process repeats until we achieve the desired fitness or reach the maximum number of generations.
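To make the elitist loop concrete, here is a minimal sketch in plain Python. It is not the Neurales implementation: the genome fields, the `toy_fitness` function (a stand-in for "train the model and measure its score"), and all parameter names are assumptions made for illustration.

```python
import random

def random_chromosome(rng):
    # Hypothetical genome: a handful of hyperparameters drawn at random.
    return {
        "layers": rng.randint(1, 4),
        "kernel": rng.choice([3, 5, 7]),
        "lr": rng.choice([0.1, 0.01, 0.001]),
    }

def toy_fitness(chrom):
    # Stand-in for training and evaluating a model. The peak at
    # layers=2, kernel=5, lr=0.001 is arbitrary and purely illustrative.
    score = 1.0
    score -= 0.1 * abs(chrom["layers"] - 2)
    score -= 0.05 * abs(chrom["kernel"] - 5)
    score -= 0.0 if chrom["lr"] == 0.001 else 0.1
    return score

def evolve_elitist(fitness, pop_size=10, n_elite=3,
                   max_generations=20, target=0.99, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    best_history = []
    for _generation in range(max_generations):
        # Rank the population from most fit to least fit.
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        if best_history[-1] >= target:
            break  # desired fitness reached
        # Elites carry over; everyone else is replaced by random newcomers.
        elites = ranked[:n_elite]
        newcomers = [random_chromosome(rng) for _ in range(pop_size - n_elite)]
        population = elites + newcomers
    return max(population, key=fitness), best_history

best, history = evolve_elitist(toy_fitness)
print(best, history[-1])
```

Because the elites always survive, the best fitness per generation can never decrease, which is the main appeal of the elitist strategy.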
Neuroevolution, depending on how it is applied, can partially or completely remove the need for an engineer to design and train models. When engineers select a model and its hyperparameters, they are taking one configuration out of millions of possible configurations and testing it on a problem. If this model is highly performant and lies in the search space of the GA, the GA could have “found” it given enough time. What makes GAs so helpful is that they remove the human bias towards a certain region of the search space. To make this less abstract, say we wanted to design a model for MNIST. Most engineers would look to make a CNN, likely with batch normalization layers and ReLU activations, culminating in a softmax or log-softmax layer at the end. We can achieve superior performance using “non-standard” architectures (with fewer layers) that an ML engineer may not have thought to try. Below is a highly performant chromosome on MNIST.
First, notice that we only needed two layers for 98.7% test accuracy and a 0.987 F1 score (meaning accuracy is high across all classes). Second, notice that we use an instance normalization layer instead of a batch normalization layer after the first convolution. This type of normalization layer is not as popular in image classification with CNNs. Third, we also use a 5 x 5 kernel in the second layer rather than the more common 3 x 3 kernel size (which is used in the first layer). Switching focus to our optimizer, the chromosome used Adam with a learning rate of 0.001, but notice that the momentum “gene” is not expressed here because Adam does not take a separate momentum parameter (it relies on its own moment estimates instead). None of this required any effort on the user’s part. All we need to do is specify which metric we want to optimize (precision, F1, recall, or test accuracy) on MNIST and let neuroevolution handle the rest. In the beta, users can also choose between an elitist selection approach and a crossover-mutation approach, with more sophisticated evolutionary strategies to come in the full release.
Neuroevolution has a lot more potential than just discovering highly performant non-standard architectures for image classification on a toy dataset. Suppose we had a dataset with four classes and only 50 samples. Following the standard approach of training a CNN by minimizing cross-entropy loss, we won’t get high accuracy, because deep learning models need at least an order of magnitude more examples (hundreds) to have any hope of learning well via this approach. Despite this, we are able to get 9/16 correct on the test set (or 56.25% accuracy), which is likely the highest test accuracy we could hope for under these constraints.
Let’s look at a small dataset that comes in the neurales package with ants, bees, butterflies and moths.
Notice that we were at least able to find a model that overfit to the training data, and had reasonably high generalization to the test data for such a tiny dataset.
Neuroevolution is not limited to evolving CNNs for image classification. We can generalize neuroevolution to any deep learning use case. Sticking to computer vision for now, we could replace the classic CNN training with few-shot learners. The architectures and loss functions for few-shot learning are specifically set up to handle image classification on smaller datasets. With neuroevolution, we can optimize the model architecture, the number of epochs, the batch size, and any other hyperparameter an engineer would normally have to fine-tune manually.
Beyond image classification, we can extend neuroevolution to many other areas of computer vision. We can evolve object detection models, segmentation models, autoencoders, and even GANs. GANs might be the most exciting subdomain to explore because training GANs manually is a black art. Neuroevolution can “simply” evolve performant GAN models.
In the Neurales beta, the user has the option to evolve CNNs for image classification using the standard cross-entropy loss on pre-loaded datasets (MNIST, FashionMNIST, CIFAR10, CIFAR100, and ImageNet) or on their own custom datasets. They can also evolve GANs on the same group of pre-loaded datasets or on their own custom ones. What makes Neurales stand out is how we set up our GAs. Unlike many GA packages, Neurales doesn’t evolve just the models. In the beta release, neuroevolution evolves the models plus the optimizers, because two optimizers running on the exact same model architecture can differ significantly in performance. The Neurales beta treats the model hyperparameters and optimizer parameters as one individual.
In GA vocabulary, the “genome” of each chromosome includes the activation function per layer and the number of channels/filters (the model hyperparameters), along with the optimizer choice (SGD, Adam, or RMSprop in the beta release) and the learning rate and momentum associated with it.
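One way to picture such a genome is as a small data structure that holds the model genes and the optimizer genes side by side. The sketch below is an assumption about how this could be encoded, not the actual Neurales data model: the class names, the activation list, and the value ranges are all hypothetical. It does follow the text in one respect: the momentum gene is always stored but only expressed for optimizers that take a separate momentum parameter.

```python
import random
from dataclasses import dataclass

ACTIVATIONS = ["relu", "tanh", "sigmoid"]  # illustrative choices only
OPTIMIZERS = ["SGD", "Adam", "RMSprop"]    # the three named for the beta

@dataclass
class LayerGene:
    activation: str
    channels: int

@dataclass
class Chromosome:
    layers: list          # list of LayerGene (model hyperparameters)
    optimizer: str        # optimizer choice
    learning_rate: float
    momentum: float       # always stored, not always expressed

    def expressed_optimizer_params(self):
        # Assumption: the momentum gene is silent when Adam is selected.
        params = {"lr": self.learning_rate}
        if self.optimizer != "Adam":
            params["momentum"] = self.momentum
        return params

def random_chromosome(rng, max_layers=4):
    # Hypothetical sampler: every gene is drawn independently at random.
    layers = [
        LayerGene(rng.choice(ACTIVATIONS), rng.choice([8, 16, 32, 64]))
        for _ in range(rng.randint(1, max_layers))
    ]
    return Chromosome(
        layers=layers,
        optimizer=rng.choice(OPTIMIZERS),
        learning_rate=rng.choice([0.1, 0.01, 0.001]),
        momentum=rng.uniform(0.0, 0.99),
    )

adam = Chromosome([LayerGene("relu", 16)], "Adam", 0.001, 0.9)
print(adam.expressed_optimizer_params())  # {'lr': 0.001}
```

Keeping the unexpressed gene in the genome means a later change of the optimizer gene can switch momentum back on without having to invent a value for it.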
Neurales also extends neuroevolution to traditional ML models. In the beta, users will be able to evolve SVMs and Gradient Boosters on tabular .csv data. Users are also able to see a sneak peek of the neuroevolution tools that will be present in the full release. With these traditional models, the GA evolves pipelines, which include feature extraction and scaling in addition to the model hyperparameters. Data preprocessing is a frequently overlooked step in model optimization. In the full release, there will be computer vision pipelines, NLP pipelines, and even RL pipelines that evolve several components of the training process rather than just one.
Despite the clear benefits of neuroevolution, the evolution process is notoriously time-intensive, as there are four nested for loops to iterate over. The innermost loop iterates over each batch of training data, the next over each epoch of training, the next over the chromosomes in each generation, and the outermost over all the generations. Since each loop is required (unless we eliminate batching by training a model on all the data at once, as we do with the traditional models), efforts to reduce run time focus on making each loop as short as possible.
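The cost of those four nested loops is easy to see by counting training steps. The sketch below is only an illustration: the function name and the example sizes are made up, and the real work (a forward and backward pass per batch) is replaced by a counter.

```python
def count_training_steps(generations, chromosomes, epochs, batches):
    steps = 0
    for _generation in range(generations):          # outermost: generations
        for _chromosome in range(chromosomes):      # each model in the population
            for _epoch in range(epochs):            # each training epoch
                for _batch in range(batches):       # innermost: each batch
                    steps += 1  # one forward/backward pass would happen here
    return steps

# Hypothetical sizes: 10 generations, 20 chromosomes, 5 epochs, 100 batches.
steps = count_training_steps(generations=10, chromosomes=20, epochs=5, batches=100)
print(steps)  # 100000
```

The total is simply the product of the four loop lengths, so halving any one of them halves the run time.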
There are a multitude of strategies we could employ. A simple one is hardware acceleration. Users in the higher tiers have access to GPU training of each model, which can easily net a 10x or greater training speedup. Other strategies are more subtle and are specialized for the ML task we are interested in. In the case of GANs, we have to choose a discriminator and a generator architecture, an optimizer for each (with its parameters), the noise distribution the generator input is sampled from (again with parameters), the batch size, the number of epochs, and even the loss function itself. To reduce the search space, we could split the problem into two different GAs running in parallel, with one designed to efficiently optimize one aspect of training (e.g. model hyperparameters) and the other to optimize a second aspect (e.g. data augmentation options). In this case, the fitness function could be a weighted sum of the objectives we want to optimize.
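A weighted-sum fitness can be written in a few lines. This is a generic sketch rather than the fitness function Neurales uses: the metric names, the weights, and the choice to give training time a negative weight are all assumptions for illustration.

```python
def weighted_fitness(metrics, weights):
    # Combine several objectives into one scalar score.
    # A negative weight turns a metric into a cost to be minimized.
    return sum(weights[name] * metrics[name] for name in weights)

# Hypothetical objectives: reward accuracy and F1, penalize training time.
weights = {"accuracy": 0.6, "f1": 0.4, "train_minutes": -0.01}

fast_model = {"accuracy": 0.95, "f1": 0.94, "train_minutes": 2.0}
slow_model = {"accuracy": 0.96, "f1": 0.95, "train_minutes": 30.0}

ranked = sorted(
    [("fast", fast_model), ("slow", slow_model)],
    key=lambda item: weighted_fitness(item[1], weights),
    reverse=True,
)
print([name for name, _ in ranked])
```

With these weights the slightly less accurate but much faster model ranks first, which shows how the weights encode the trade-off the user cares about.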
Using GAs in parallel allows for constrained optimization, where we may only want to train models for a maximum of 5 epochs and use a maximum number of layers. Even when we restrict our search space, we still want to find highly performant models quickly.
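One simple way to enforce such constraints is to sample only inside the allowed bounds and to "repair" any chromosome that drifts outside them. The sketch below assumes a two-gene chromosome; the 5-epoch cap comes from the text, while the layer cap and all function names are hypothetical.

```python
import random

MAX_EPOCHS = 5  # cap mentioned in the text
MAX_LAYERS = 3  # hypothetical cap on depth

def sample_constrained(rng):
    # Draw a chromosome that satisfies the constraints by construction.
    return {
        "epochs": rng.randint(1, MAX_EPOCHS),
        "layers": rng.randint(1, MAX_LAYERS),
    }

def is_feasible(chrom):
    return (1 <= chrom["epochs"] <= MAX_EPOCHS
            and 1 <= chrom["layers"] <= MAX_LAYERS)

def repair(chrom):
    # Clamp each gene back into its allowed range, e.g. after a mutation.
    return {
        "epochs": min(max(chrom["epochs"], 1), MAX_EPOCHS),
        "layers": min(max(chrom["layers"], 1), MAX_LAYERS),
    }

rng = random.Random(0)
population = [sample_constrained(rng) for _ in range(5)]
print(all(is_feasible(c) for c in population))  # True
print(repair({"epochs": 12, "layers": 0}))      # {'epochs': 5, 'layers': 1}
```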
As an example, the full release of Neurales has identified a search space which can evolve GANs to produce high-quality images (as measured by metrics like FID, PSNR, and others). This is achieved by modifying the fitness function to incorporate stable training dynamics for GANs, so that GANs which exhibit partial or total mode collapse, or which produce a strong discriminator at the expense of a poor generator, are removed from the population. This is exciting because both academic researchers and practitioners will now have a set of tools they can use to achieve stable GAN training that leads to high-quality images. Neurales will offer proprietary solutions for accelerated training of GAs for all the ML tasks we offer in the beta and more.
Sign up at www.neurales.ai
Follow us at