Generative Adversarial Networks – An Overview

What is a GAN?

A Generative Adversarial Network (GAN) is a neural-network-based learning mechanism that does not need heavily labeled training data. A GAN learns by means of two competing networks. Though it started with Supervised Learning, GAN is mostly used, and is very effective, in Semi-supervised and Unsupervised modes.

The two competing networks of a GAN are called the Generator and the Discriminator. Together they play a two-player, zero-sum minimax game.

Generative – The purpose of the algorithm is to generate new data that resembles the training dataset it is given.

Adversarial – The algorithm relies on a game-like competition between two networks, the Generator and the Discriminator. The Generator produces fake data meant to resemble the real training data, and the Discriminator tries to distinguish the generated fake data from the real data.

Networks – GAN models are based on Neural Networks.
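
The two-player zero-sum game described above is commonly written as a single value function. The form below is the standard one from the original GAN formulation. The symbols are notation introduced here for illustration and are not used elsewhere in this overview: D(x) is the Discriminator's estimate of the probability that x is real, G(z) is the Generator's output for a noise input z, p_data is the distribution of the real data and p_z is the noise distribution.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The Discriminator tries to make V as large as possible, while the Generator tries to make it as small as possible, which is what makes the game zero-sum.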

Concept of High Dimensionality of Image Data

GAN has gained huge mindshare among academic and industry researchers because of its implicit modeling of high-dimensional data. Though we generally talk about 2D or 3D images, in practice, from an AI/ML standpoint, images are high-dimensional because an image is composed of many pixels. Every pixel of a color image has Red, Green and Blue components.

If we consider n images, each a pixels by b pixels, with each pixel carrying a three-component RGB color value, then every image is a point in a space of a × b × 3 dimensions. Mathematically, the image data lives in a high-dimensional space.
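
To make the arithmetic concrete, here is a minimal sketch that counts the dimensions of a flattened image. The function names and the image sizes are hypothetical and chosen only for illustration.

```python
def image_dimensions(a, b, channels=3):
    """Number of values needed to describe one image of a x b pixels."""
    return a * b * channels


def dataset_shape(n, a, b, channels=3):
    """Shape of a dataset of n images once each image is flattened to a vector."""
    return (n, image_dimensions(a, b, channels))


# Hypothetical image sizes, for illustration only.
for side in (28, 64, 256):
    print(f"{side} x {side} RGB image -> {image_dimensions(side, side)} dimensions")

print(dataset_shape(10, 64, 64))
```

Even a small 64 by 64 color image is a point in a space with more than twelve thousand dimensions.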

How GAN works

 

(Reference of Image: Comparative Study on Generative Adversarial Networks by Saifuddin Hitawala https://arxiv.org/pdf/1801.04271.pdf)

A GAN behaves like a two-player minimax game. The Generator starts from an initial data input (also known as the noise vector) and turns it into a fake image. The Generator tries to produce fake images that the Discriminator fails to identify, whereas the Discriminator’s goal is to correctly identify which image is fake and which one is real.
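
The opposing goals described above can be sketched as two loss functions. This is a simplified illustration and not a full training loop: the Discriminator scores are hypothetical numbers instead of outputs of a real network, the function names are invented for this example, and the Generator loss is the commonly used non-saturating variant, which is an assumption beyond what this overview describes.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy with real images labeled 1 and fake images labeled 0.

    d_real and d_fake are the Discriminator's probabilities that real and
    generated images, respectively, are real.
    """
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating Generator loss: low when fakes are scored as real."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


# Hypothetical Discriminator scores, for illustration only.
real_scores = [0.9, 0.95, 0.85]
caught_fakes = [0.1, 0.05, 0.2]      # Discriminator spots the fakes
convincing_fakes = [0.8, 0.9, 0.7]   # Discriminator is fooled

print("D loss when fakes are caught:", discriminator_loss(real_scores, caught_fakes))
print("D loss when D is fooled:     ", discriminator_loss(real_scores, convincing_fakes))
print("G loss when fakes are caught:", generator_loss(caught_fakes))
print("G loss when D is fooled:     ", generator_loss(convincing_fakes))
```

When the Discriminator is fooled its loss goes up and the Generator's loss goes down, and the reverse holds when the fakes are caught. In training, each network is updated in turn to reduce its own loss.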

Concept of Latent Space

In the GAN process flow, fake images are generated to resemble the real image samples in the dataset. Because an image lives in a high-dimensional space, changing any of its characteristics directly means modifying it in that high-dimensional space, which is more complex than changing it after dimensionality reduction. In latent space, the image data is represented in a compressed form. This makes the nature and features of the data easier to understand and analyze, and makes tweaking an image relatively easy. GAN leverages this property of latent space: the Generator maps points in latent space to images.
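
One common way to tweak an image in latent space is to move gradually from one latent vector to another. The sketch below shows only the latent-space part. The latent size of 100, the function names and the use of standard normal noise are assumptions made for illustration, and the trained Generator that would turn each vector into an image is not shown.

```python
import random


def sample_latent(dim, rng):
    """Draw one latent vector from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def interpolate(z_start, z_end, steps):
    """Linear interpolation between two latent vectors, including both ends."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


rng = random.Random(0)          # fixed seed so the example is repeatable
z_a = sample_latent(100, rng)   # hypothetical latent size of 100
z_b = sample_latent(100, rng)
path = interpolate(z_a, z_b, steps=5)

# Each vector in `path` would be passed to a trained Generator (not shown)
# to obtain a sequence of images that changes gradually from one to the other.
print(len(path), "latent vectors of size", len(path[0]))
```

Each latent vector here holds 100 numbers, far fewer than the thousands of pixel values in the image it would produce, which is why edits are easier to make on this side of the Generator.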

Taxonomy of Generative Modeling

(Reference of Image: NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow, https://arxiv.org/pdf/1701.00160.pdf)

Generative modeling is an important area of study for representing and analyzing data in high-dimensional space. Together with Reinforcement Learning, it is very important in several critical application areas. Reinforcement Learning is divided into two areas: Model-based and Model-free. Model-based Reinforcement Learning algorithms make use of Generative modeling. In a Semi-supervised learning environment, Generative modeling can generate missing data.

Generative models such as GAN are examples of multi-modal algorithms, which can save considerable computational cost and improve accuracy. By definition, a multi-modal optimization algorithm can find multiple global and local optima in a single execution.

(For details on Generative Modeling, please refer to: NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow, https://arxiv.org/pdf/1701.00160.pdf)

Overview of Primary Variants of GAN with Learning Type

  • Vanilla GAN – Supervised
  • CGAN – Supervised
  • LAPGAN – Unsupervised
  • DCGAN – Unsupervised
  • AAE – Supervised, Semi-supervised and Unsupervised
  • GRAN – Supervised
  • InfoGAN – Unsupervised
  • BiGAN – Supervised and Unsupervised

Why is GAN so hyped?

GAN has significant advantages over earlier generative algorithms such as Boltzmann Machines and Autoencoders because GAN does not use Markov chains, which require a heavy computational load.

“In high-dimensional spaces, Markov chains become less efficient. Markov chain approximation techniques have not scaled to problems like ImageNet generation. Moreover, even if Markov chain methods scaled well enough to be used for training, the use of a Markov chain to generate samples from a trained model is undesirable compared to single-step generation methods because the multi-step Markov chain approach has higher computational cost. GANs were designed to avoid using Markov chains for these reasons.”  (Reference: NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow, https://arxiv.org/pdf/1701.00160.pdf)

GANs can generate samples in parallel. They do not need an explicit probability distribution of the data to operate. GANs are also widely regarded as producing better samples than other methods.

Applications of GAN

Applications in Areas Dealing with Images 

  • Generation of High-Quality Images – GAN algorithms like LAPGAN, SAPGAN, BigGAN, GAWWN, PPGN etc. cater to this functional area.
  • Image Inpainting – Image inpainting is the process of reconstructing the missing portions of an image for restoration. SN-PatchGAN is an example.
  • Super-resolution – Generation of a high-resolution image or video from a low-resolution one. SRGAN, ESRGAN etc. fall in this application area.
  • Person Re-Identification – Identification of the same person across different camera views. Important for security, surveillance and forensic applications. PTGAN, PN-GAN, IPGAN, SinGAN etc. are a few examples of GANs that provide this functionality.
  • Object Detection – Finding instances of real-world objects such as faces and buildings. Important for image retrieval, security, surveillance, driver assistance etc. Perceptual GAN falls in this category.
  • Video Prediction and Generation – MoCoGAN is used for video generation.
  • Facial Attribute Manipulation – Modifying detailed characteristics of a human face. CFGAN, SG-GAN, ModularGAN etc. contribute to this functional area.
  • Anime Character Generation – For game development and animation, CartoonGAN etc. are often used.
  • Image-to-Image Translation – CycleGAN, SingleGAN, GAWWN, pix2pix, PAN (Perceptual Adversarial Network), ID-CGAN etc. implement this functionality.
  • Face Aging – Predicting how a face will look as it ages. Age-cGAN is used here.
  • Human Pose Estimation – CR-GAN, FD-GAN etc. are used for this functionality.
  • De-Occlusion – Occlusion is the effect of an object being blocked from view by another 3D object. De-occlusion is the reconstruction of the full image with the blocking object removed. DCGAN is used for this purpose.
  • Image Blending – Mixing of images. GP-GAN is used here.
  • DeNoising – IcGAN is an example of a GAN that implements denoising.
  • Text to Image – StackGAN, GAWWN etc. are examples of GANs that generate images from text.

Applications with Sequential Data

  • Speech – Variational Autoencoding Wasserstein GAN (VAW-GAN) serves this purpose.
  • Music – SeqGAN and ORGAN combine GAN with Reinforcement Learning to generate music.

Applications in Retail and Fashion

  • PixelDTGAN extracts clothing items from celebrity images so that other customers can opt for them. DiscoGAN suggests new products that match well with other products for a particular customer.

Applications in Media and Entertainment

  • SRGAN generates high resolution images. This is an important functionality in Media and Entertainment space.
  • DeOldify GAN generates color movies from old black and white movies.

Applications in Autonomous Vehicles

  • DeepRoad is a GAN-based framework for testing autonomous driving systems.

Applications in Education 

  • For children’s education, generating images from children’s sketches using pix2pix can inspire them to draw.

Applications in Telecom

  • GANSynth generates synthetic audio that can improve customer service IVR applications. WaveNet, an autoregressive model rather than a GAN, is used for the same purpose.

Applications in Medicine

  • Health Data Generation – medGAN, Fila-SGAN
  • Medical Image Segmentation – SegAN
  • Anomaly Detection – AnoGAN
  • Mammogram analysis with Supervised GAN
  • Alzheimer’s Disease Diagnosis
  • Prediction of mutants for Covid vaccine research
  • Molecule development in Oncology
  • Drug Discovery – ChemGAN

Challenges of Generative Adversarial Networks

  • During training of a GAN’s competing networks, convergence to the optimal equilibrium (Nash Equilibrium) is difficult to reach, and training can become unstable. The development of Wasserstein GAN (WGAN), along with parameter tuning, helps convergence.
  • In applications that need the multi-modality of the GAN algorithm, the mode collapse problem, in which the Generator produces only a few kinds of output, can be difficult to resolve and causes a lack of diversity in the results.
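
Mode collapse can be illustrated on toy data where the modes are known in advance. The sketch below is a simple diagnostic and not a standard metric: the function name, the mode positions, the radius and the simulated "generators" are all hypothetical and chosen for illustration. For real image data the modes are not known, so a check like this applies only to synthetic benchmarks.

```python
import random


def mode_coverage(samples, mode_centers, radius):
    """Fraction of known modes that have at least one sample within `radius`."""
    covered = set()
    for s in samples:
        for i, center in enumerate(mode_centers):
            if abs(s - center) <= radius:
                covered.add(i)
    return len(covered) / len(mode_centers)


rng = random.Random(0)      # fixed seed so the example is repeatable
modes = [-4.0, 0.0, 4.0]    # hypothetical modes of a 1-D toy dataset

# Simulated output of a healthy generator: samples spread over every mode.
diverse = [rng.gauss(rng.choice(modes), 0.1) for _ in range(300)]

# Simulated output of a collapsed generator: every sample near a single mode.
collapsed = [rng.gauss(0.0, 0.1) for _ in range(300)]

print("coverage, diverse generator:  ", mode_coverage(diverse, modes, radius=0.5))
print("coverage, collapsed generator:", mode_coverage(collapsed, modes, radius=0.5))
```

A coverage value well below 1 on such a toy problem signals that the Generator has dropped some of the modes, which is the lack of diversity described above.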

Conclusion

Generative Adversarial Networks have found wide application across industries thanks to their ability to learn highly non-linear relationships from latent space to data space and vice versa, and their ability to extract value from huge stores of unstructured, unlabeled data. On the other hand, GAN has fueled the prevalence of DeepFakes, a menace of the modern-day Internet. As the world looks to get the fullest benefit out of GAN’s potential, it is very important to prevent the misuse of disruptive algorithms like GAN.