Many of us wonder how the Covid-19 vaccines were developed so quickly, and some of us are skeptical about their effectiveness. To put our understanding in order, let’s discuss how Information Technology collaborated with science to make it happen. It did not happen in a day. It is the outcome of long years of sacrifice by scientists and industry professionals, along with government and private investment.
Vaccine Development Process
Generally, the vaccine development process is of three types:
- Traditional: Traditional vaccines (the development process is shown in the diagram above) work by exposing the body to an epitope derived from the pathogen. They include the vaccines for diseases like measles, mumps, rubella, seasonal influenza, tetanus, polio, Hepatitis B, cervical cancer, diphtheria, pertussis, etc.
- mRNA-based: mRNA vaccines contain strands of mRNA that act as instructions to the cell. The cell uses them to produce a harmless piece of a pathogen protein, and the immune system responds by generating antibodies, which builds immunity.
- DNA-based: DNA vaccines induce an adaptive immune response using a DNA plasmid that encodes a protein originally from the pathogen.
Pros and Cons of mRNA Vaccines:
Pros:
- The mRNA strand never enters the cell’s nucleus, so it does not alter the person’s genetic material.
- Quicker to develop: every vaccine can use the same lipid nanoparticle carrier, and the only thing that changes for different viruses is the nucleotide sequence.
Cons:
- mRNA vaccines are less stable than DNA vaccines and need to be stored below minus 20 degrees Celsius.
Contributions of Computer Scientists and Professionals
Let’s look more specifically at the contributions of each area of Information Technology.
Vaccine Development – How Cloud Computing helps
Genomics research generates huge volumes of data, and that data needs to be shared among research centers across the globe. Cloud Computing meets the storage requirement for this data. Moreover, because Cloud Infrastructure is accessible from anywhere in the world over the Internet, the sharing problem is resolved as well.
Another development that expedited the whole process is that scientists can now run their research and computations in the Cloud Infrastructure itself. This eliminates the need to download the data, do the computing and research locally, and upload the results to the Cloud for sharing. Researchers around the world can collaborate while staying close to the Cloud-hosted data and doing the computation and research in the Cloud. Only the laboratory work still has to be done in physical labs.
Nowadays, computer scientists and professionals are able to provide a Vaccine Research platform. Such a platform can offer scientists a plug-and-play environment in which the type of virus is the input and the desired antibody is the output, while the rest of the infrastructure remains more or less the same. As mentioned earlier, every mRNA vaccine can use the same lipid nanoparticle carrier, and the only thing that changes for different viruses is the nucleotide sequence.
Seven Bridges, DNAstack, and DNAnexus are among the prominent platform providers.
Vaccine Development – How Big Data Technologies help
We discussed the huge volume of data coming out of Genomics research and its storage in Cloud Infrastructure. Big Data Technologies come into play for ingesting that data, cleaning the low-quality records, and standardizing the data so that it can be analyzed with the help of emerging computer technologies.
Let’s walk through the standard vaccine development process to see why Big Data Technologies are so important.
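As a rough illustration of the cleaning and standardizing step, the sketch below filters a handful of sequence reads. The function name `clean_reads`, the `max_n_fraction` threshold, and the sample reads are all hypothetical, chosen only for this example; real pipelines run at far larger scale on dedicated tooling.

```python
VALID_BASES = set("ACGT")


def clean_reads(reads, max_n_fraction=0.1):
    """Standardize reads to uppercase and drop those that look unreliable."""
    cleaned = []
    for read in reads:
        seq = read.strip().upper()  # standardize whitespace and case
        if not seq:
            continue  # skip empty records
        # Reject reads containing characters other than A, C, G, T, or N.
        if set(seq) - (VALID_BASES | {"N"}):
            continue
        # Reject reads where too many bases are uncalled (N).
        if seq.count("N") / len(seq) > max_n_fraction:
            continue
        cleaned.append(seq)
    return cleaned


raw = ["acgtacgt", "ACGTNNNN", "ACGXACGT", "  ", "ACGTACGN\n"]
print(clean_reads(raw, max_n_fraction=0.2))
```

Here the second read is dropped for having too many uncalled bases, the third for containing an invalid character, and the fourth for being empty.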
The Central Dogma of Molecular Biology provides the foundational framework for the flow of genetic information from DNA to RNA to protein. mRNA vaccines deliver lipid-coated mRNA to human cells. The mRNA encodes harmless fragments of a viral protein, the cells produce those fragments, and the immune system develops a response to them. The body is then ready to fight the virus.
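The flow of information described here can be sketched in a few lines of Python. This is a toy illustration only: it assumes the input is the coding strand of the DNA, it uses the standard genetic code, and the function names `transcribe` and `translate` as well as the sample sequence are made up for this example.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA -> mRNA, assuming the coding strand is given (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTTTTAA")
print(mrna)             # AUGGCCUUUUAA
print(translate(mrna))  # MAF
```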
This whole life cycle of vaccine development starts with DNA Sequencing as explained below.
The latest DNA Sequencing processes, such as High-throughput/Next Generation DNA Sequencing, generate reads as strings of bases. Once the reads are aligned to a reference, their positional alignment is encoded through mechanisms like the Concise Idiosyncratic Gapped Alignment Report (CIGAR). The purpose is to indicate which bases align (match/mismatch) with the reference, which bases are deleted from the reference, and which bases are insertions that are not in the reference.
This process generates huge amounts of data. Representing that data as sequences, however, makes it possible to apply specific Artificial Intelligence techniques to predict the target molecule.
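To make the CIGAR idea concrete, here is a minimal sketch of parsing a CIGAR string and computing how many reference and read bases it covers. The operation letters and which of them consume the reference or the read follow the SAM format specification; the function names and the sample string are hypothetical.

```python
import re

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference and along the read, respectively.
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_READ = set("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '8M2I4M1D3M' into (length, op) pairs."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_TOKEN.findall(cigar)]


def aligned_lengths(cigar):
    """Return (reference_bases, read_bases) covered by the alignment."""
    ref_len = read_len = 0
    for length, op in parse_cigar(cigar):
        if op in CONSUMES_REFERENCE:
            ref_len += length
        if op in CONSUMES_READ:
            read_len += length
    return ref_len, read_len


print(parse_cigar("8M2I4M1D3M"))
print(aligned_lengths("8M2I4M1D3M"))  # (16, 17)
```

In the sample string, M marks aligned bases (match or mismatch), I marks two bases inserted in the read, and D marks one base deleted from the reference, which is why the read covers 17 bases while the reference span is 16.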
Vaccine Development – How Machine Learning and Deep Learning help
The value of Machine Learning and Deep Learning is that they can analyze large data sets, find hidden patterns in structure or sequence, and predict the expected structure, pattern, or traits of the target molecule with high accuracy. The fundamental premise of using these Artificial Intelligence techniques is that human life and behavior are governed by the constituent molecules, and that there is a relationship between the nature and structure of those molecules and the behavior of human beings. Classical machine learning methods are increasingly being replaced with Deep Learning methods, which can capture the nonlinear dependencies in the sequence and the interaction effects operating at the genomic scale.
Convolutional Neural Networks (CNN), a popular mechanism in Computer Vision, are being used for multi-dimensional image-based genomic data. Recurrent Neural Networks (RNN), a popular mechanism in Natural Language Processing, are able to analyze the dependencies in the sequential data generated by DNA Sequencing. The two techniques can also be combined, with the CNN encoding the image and the RNN generating the image description. These AI techniques are able to predict the target molecules of vaccines with high accuracy.
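Before a sequence can be fed to a CNN or an RNN, it has to be turned into numbers. One common choice is one-hot encoding, sketched below in plain Python. The function name and the decision to encode unknown bases such as N as all zeros are assumptions made for this example, not something prescribed by the networks themselves.

```python
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = BASE_INDEX.get(base)
        if index is not None:  # unknown bases (e.g. N) stay all-zero
            row[index] = 1
        rows.append(row)
    return rows


for base, row in zip("ACGTN", one_hot("ACGTN")):
    print(base, row)
```

The resulting matrix has one row per position and one column per base, so a convolutional layer can slide over it much like over a narrow image, while a recurrent layer can read it one row at a time.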
Generative Adversarial Network-based technologies can completely change the way proteins for vaccines are generated, through the mechanism called Generative Modeling. They also have the capability to predict the course of mutation of a bacterium or virus. Protein engineers have started using these techniques to design the molecular structure of a mutated virus or bacterium, instead of waiting for it to change its pathogenicity through random mutation or natural selection. Deep autoregressive models, which are supervised, feed-forward sequence models (i.e. not recurrent), have huge potential for predicting the target molecule for a vaccine, including vaccines for a mutated virus or bacterium.
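The autoregressive idea, predicting each element of a sequence from the elements that come before it, can be shown with a much simpler stand-in than a deep network. The sketch below is a plain count-based model over a fixed-length context, not a deep autoregressive model; the function names, the `order` parameter, and the toy training sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_next_base_counts(sequences, order=2):
    """Count which base follows each context of `order` preceding bases."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequently seen next base, or None if unseen."""
    options = counts.get(context)
    if not options:
        return None
    return options.most_common(1)[0][0]


counts = fit_next_base_counts(["ACGACGACG", "ACGACT"], order=2)
print(predict_next(counts, "AC"))  # G
print(predict_next(counts, "TT"))  # None
```

A deep autoregressive model replaces the lookup table with a neural network, which lets it use much longer contexts and generalize to contexts it has never seen.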