Generative Adversarial Networks – when AI gets creative
Since Frank Rosenblatt introduced the Perceptron in 1958, neural networks have evolved significantly and taken the world by storm. Their ability to model complex, non-linear relationships in data has led to novel neural network architectures that can outperform humans in various challenging tasks, such as face recognition, disease prognosis and playing video games.
However, even though computers are currently very good at executing deterministic tasks, they have been lacking one key element that defines human intelligence: creativity. A major step towards overcoming this was made in 2014, when Ian Goodfellow came up with the idea of Generative Adversarial Networks (GANs) while drinking with friends in a pub in downtown Montreal. He coded the first working version of the algorithm that same night, giving us what Yann LeCun has described as “the most interesting idea in the last 10 years in ML”, as well as further evidence for the Ballmer Peak theory.
A short intro to GANs
The main concept behind GANs is an adversarial game between two agents which, in the vanilla version of the algorithm, are two fully connected feed-forward neural networks trained to perform competing tasks. The first network, called the Generator, is trained to generate fake but realistic-looking data that resemble the samples of an original dataset. The other network, called the Discriminator, learns to distinguish between fake data produced by the Generator and real data coming from the original dataset.
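In the notation of the original GAN paper (Goodfellow et al., 2014), z is a random noise vector drawn from a simple prior p_z (for example a standard Gaussian), G(z) is the Generator's output and D(x) is the Discriminator's estimated probability that x is real. A minimal sketch of what each network minimises is shown below. The Generator loss is the "non-saturating" variant that the paper recommends in practice because it gives stronger gradients early in training; a particular implementation may use a different formulation.

```latex
\begin{aligned}
\mathcal{L}_D &= -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  - \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big] \\
\mathcal{L}_G &= -\,\mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big]
\end{aligned}
```

The Discriminator's loss is ordinary binary cross-entropy with real samples labelled 1 and generated samples labelled 0, while the Generator's loss is small exactly when the Discriminator assigns the Generator's samples a high probability of being real.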
The two networks are trained simultaneously, usually in alternating steps. In each iteration of the algorithm, the Discriminator is asked to classify a mix of real samples and samples produced by the Generator. If it is fooled into misclassifying the generated samples as real, its loss is high, and this negative feedback forces it to improve by adapting its weights accordingly. In the next iteration, the improved Discriminator is more likely to correctly classify the Generator's samples as fake. This in turn sends negative feedback to the Generator, which adapts its weights to produce more realistic data samples.
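The alternating procedure can be sketched in plain Python on a deliberately tiny problem. Everything here is an illustrative assumption rather than the author's setup: the "real" data are one-dimensional samples from N(3, 1), the Generator is G(z) = z + b with a single learnable offset b, and the Discriminator is a logistic regression D(x) = sigmoid(w*x + c). The gradients of the binary cross-entropy losses are written out by hand, and the Generator minimises the commonly used non-saturating loss -log D(G(z)).

```python
import math
import random


def sigmoid(u: float) -> float:
    """Logistic function, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, batch: int = 64, lr: float = 0.05,
                  real_mean: float = 3.0, seed: int = 0) -> dict:
    """Alternating GAN updates on a 1-D toy problem (illustrative only).

    Real data:      x ~ N(real_mean, 1)
    Generator:      G(z) = z + b, z ~ N(0, 1); only the offset b is learned
    Discriminator:  D(x) = sigmoid(w * x + c)
    """
    rng = random.Random(seed)
    b = 0.0          # Generator parameter
    w, c = 0.0, 0.0  # Discriminator parameters
    history = []     # value of b after every iteration

    for _ in range(steps):
        # Discriminator step: minimise -mean log D(real) - mean log(1 - D(fake)).
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        gw = gc = 0.0
        for x in real:
            g = sigmoid(w * x + c) - 1.0   # d/du of -log D(x), with u = w*x + c
            gw += g * x
            gc += g
        for x in fake:
            g = sigmoid(w * x + c)         # d/du of -log(1 - D(x))
            gw += g * x
            gc += g
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: minimise the non-saturating loss -mean log D(G(z)),
        # using the freshly updated Discriminator.
        gb = 0.0
        for _ in range(batch):
            x = rng.gauss(0.0, 1.0) + b
            gb += (sigmoid(w * x + c) - 1.0) * w   # d/db of -log D(z + b)
        b -= lr * gb / batch
        history.append(b)

    return {"b": b, "w": w, "c": c, "history": history}
```

In the early steps, b is pushed upwards towards the real mean as the Discriminator learns to separate the two distributions. With players this simple and no stabilising tricks, the two sides can keep chasing each other and oscillate around the equilibrium instead of settling, a miniature version of the training instability mentioned in the epilogue.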
Those who have studied game theory have probably realised by now that this training framework corresponds to a two-player zero-sum minimax game. In the idealised setting analysed in the original paper, it can be shown that this game has a Nash equilibrium: a state in which neither network can improve its performance by changing only its own parameters. In our case, this is reached when the samples produced by the Generator are so indistinguishable from the real ones that the Discriminator, even though properly trained, can do no better than guess, assigning every sample a probability of 1/2 of being real. This is the ultimate goal of adversarial training.
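The argument can be sketched with the value function from Goodfellow et al. (2014), where p_data is the distribution of the real data, p_z is the noise prior and p_g is the distribution of the Generator's samples G(z). For a fixed Generator, maximising a log y + b log(1 - y) over y in (0, 1) gives y = a/(a + b), so the optimal Discriminator compares p_data with p_g at each point, and once p_g = p_data it outputs 1/2 everywhere. This holds in the paper's idealised setting (unlimited capacity, optimisation over functions), not necessarily for finite networks trained by gradient descent.

```latex
\begin{aligned}
V(D, G) &= \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],
  \qquad \text{game: } \min_G \max_D V(D, G) \\
D^{*}_{G}(x) &= \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)} \\
p_g = p_{\mathrm{data}} \;\Rightarrow\; D^{*}_{G}(x) &= \tfrac{1}{2},
  \qquad V(D^{*}_{G}, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4
\end{aligned}
```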
Using GANs to create synthetic food images
What makes GANs even more versatile is that they do not require labelled data for training. One can simply pick an unlabelled dataset, and the Generator can eventually learn to model the distribution of its data and produce new, realistic samples. Interesting applications of GANs include anime character generation and music composition. So I decided to take advantage of this freedom and experiment with teaching a GAN something interesting: how to create images of food.
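The only "labels" a GAN needs are produced by the training procedure itself: a sample is a positive example for the Discriminator because it came from the dataset, and a negative one because the Generator produced it. The hypothetical helper below (its name, signature and the file names in the usage lines are illustrative, not taken from the author's code) builds one Discriminator batch this way from an unlabelled pool of samples.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def discriminator_batch(real_pool: Sequence[T], generate: Callable[[], T],
                        batch_size: int, rng: random.Random) -> List[Tuple[T, int]]:
    """Return (sample, target) pairs; targets come only from each sample's origin."""
    n_real = batch_size // 2
    # Any item from the unlabelled dataset is a "real" example (target 1).
    real = [(x, 1) for x in rng.sample(list(real_pool), n_real)]
    # Anything the Generator produces is a "fake" example (target 0).
    fake = [(generate(), 0) for _ in range(batch_size - n_real)]
    batch = real + fake
    rng.shuffle(batch)  # interleave real and fake examples
    return batch


# Hypothetical usage: file names stand in for unlabelled food photos.
photos = [f"food_{i:04d}.jpg" for i in range(1000)]
batch = discriminator_batch(photos, lambda: "generated.png", 8, random.Random(0))
```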
Initially, I put together a dataset of 1,000 colour 64x64 images of food to use as training data. The dataset is relatively small, but it proved good enough to produce some interesting results. Finding the right parameters and design for my GAN was very challenging, as even minor architectural changes can let one of the two networks overpower the other and cause training to fail. The architecture I ended up using is a Deep Convolutional GAN (DCGAN), which, thanks to the translation invariance and parameter sharing of its convolutional layers, yielded meaningful images without requiring too much training (~1.5 hours on an NVIDIA Titan X). You can find the code with the full network architecture in this GitHub repository.
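DCGAN (Radford et al., 2015) builds both networks mainly from convolutions: strided convolutions downsample the image in the Discriminator, and transposed convolutions upsample a noise vector into an image in the Generator. The author's exact layers are in the linked repository; the sketch below assumes the widely used 64x64 layout from the PyTorch DCGAN tutorial (4x4 kernels with stride 2 and padding 1, plus one 4x4 stride-1 layer at the 1x1 end) and only traces spatial sizes, using the standard output-size formulas for dilation 1 and no output padding.

```python
from typing import Callable, List, Tuple


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a regular convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed (kernel, stride, padding) per layer, in the common DCGAN style.
GENERATOR = [(4, 1, 0)] + [(4, 2, 1)] * 4      # 1x1 noise -> 64x64 image
DISCRIMINATOR = [(4, 2, 1)] * 4 + [(4, 1, 0)]  # 64x64 image -> 1x1 score


def trace(size: int, layers: List[Tuple[int, int, int]],
          step: Callable[[int, int, int, int], int]) -> List[int]:
    """Return the spatial size before the first layer and after each layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        sizes.append(step(sizes[-1], kernel, stride, padding))
    return sizes


print(trace(1, GENERATOR, conv_transpose_out))  # [1, 4, 8, 16, 32, 64]
print(trace(64, DISCRIMINATOR, conv_out))       # [64, 32, 16, 8, 4, 1]
```

Each stride-2 layer doubles (Generator) or halves (Discriminator) the resolution, so four of them bridge 4x4 and 64x64, and the two networks mirror each other.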
So, without further delay, let’s move on to the main course (pun intended). The images presented below are some examples of the Generator’s output. Keep in mind that these are not the results of combining or modifying existing food images. The Generator has learned motifs that occur frequently in real food images and is now using them to generate new images that never existed before.
Evidently, the network has learned the concept of a plate and the fact that plates can come in different colours. It has also picked up that pictures of food can be taken from various angles and distances. As for the actual food, even though fine details are not particularly clear, one can observe the different textures and colours we usually see in everyday food. With a little imagination, you could probably spot some pasta, a piece of roast ham and a green leaf salad. Maybe even a beef steak tartare.
A sample of the real images used for training is presented below for reference.
Epilogue
If the synthetic food presented above doesn't look particularly appealing or convincing to you, I suggest having a look at some other examples of awesome GAN applications, as well as some seminal papers (you can find a nice collection here).
GANs have become a state-of-the-art generative approach in a plethora of scenarios and, although their notoriously unstable training can make them more of an art than a science, they are here to stay. Given the amount of attention GANs are currently getting within the machine learning research community, I believe it won't be long until they reach maturity and become widely embedded in real-life applications.