Bioinformatics is the use of current techniques in information technology and computer science to process and analyze biological data. It’s part of the larger field of computational biology and is best known for its applications in molecular biology.
A good example is the Human Genome Project, which set out to assemble the complete sequence of the human genome and is regarded as the best-known application of bioinformatics. But the field isn’t limited to genome assembly. The integration and use of biological databases, the prediction of DNA elements, evolutionary biology and phylogeny building, and protein structure prediction are all examples of bioinformatics in action.
Paulien Hogeweg and Ben Hesper originally coined the term “bioinformatics” to refer to the study of informatic processes in biotic systems. Early on, bioinformatics was primarily concerned with the accumulation, storage and retrieval of sequence data using computer networks. This work eventually produced many of the most widely used tools for sequence alignment and analysis, including the NCBI’s BLAST tools. The field later expanded beyond sequence data as researchers found that other kinds of biological analysis could also be approached computationally.
The first major area of research in bioinformatics concerns the information that can be extracted from an organism’s DNA sequence. Comparing and aligning two DNA sequences can shed light on the evolutionary relationship between the organisms they came from. Algorithms can be designed to predict the location of genes, promoters, CpG islands and other genetic elements based solely on the DNA sequence. Sequence alignments across many organisms can lead to homology predictions, in which similar coding regions are used to predict the structure and function of the resulting proteins. These same alignments can also be used to build phylogenetic trees for assessing evolutionary trends.
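One classic way to compare and align two sequences is Needleman-Wunsch global alignment, which fills a dynamic-programming table of best partial-alignment scores and then traces back through it. The sketch below is a minimal illustration under assumed scoring (+1 for a match, -1 for a mismatch, -1 per gap position). It uses O(nm) time and memory, which is one reason database-search tools such as BLAST rely on heuristics instead of filling the full table for every comparison.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with "-" marking gaps.
    The scoring values are illustrative defaults, not a standard.
    """
    def sub(x, y):
        # Simple match/mismatch scoring; real tools often use substitution matrices.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    # (ties are broken by preferring a diagonal step, then a gap in b).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 2
print(top)     # ACGT
print(bottom)  # A-GT
```

Here the missing `C` is placed opposite a gap, giving three matches and one gap penalty for a total score of 2.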
These same techniques extend beyond DNA and can be used to study RNA and protein sequences. Homology searches are generally more sensitive at the protein level than at the DNA level, because a protein’s function depends directly on its amino acid sequence, while the redundancy of the genetic code means that quite different DNA sequences can encode the same protein.
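The effect of the genetic code’s redundancy, where several codons encode the same amino acid, can be shown with a small translation sketch. It builds the standard genetic code (NCBI translation table 1) from the usual TCAG codon ordering and compares two made-up coding sequences position by position. The `identity` helper is a naive ungapped comparison chosen for illustration; real homology searches align the sequences and score them with substitution matrices.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA coding sequence into a protein string.

    A trailing partial codon is ignored, and stop codons are emitted as "*"
    rather than ending translation.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(x, y):
    """Fraction of identical positions between two equal-length, ungapped sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


dna1 = "ATGCTGTCTCGT"  # hypothetical coding sequence
dna2 = "ATGTTAAGCAGA"  # different codons for the same four amino acids
print(translate(dna1), translate(dna2))  # MLSR MLSR
print(identity(dna1, dna2))              # about 0.42 (5 of 12 bases match)
print(identity(translate(dna1), translate(dna2)))  # 1.0
```

The two DNA sequences share fewer than half their bases, yet they encode an identical protein, so a protein-level comparison recognizes the relationship that the DNA-level comparison obscures.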
Another major area of bioinformatics is the analysis of microarray data. Microarrays are wet-lab tools used to measure gene expression: they produce a signal when labeled mRNA (or cDNA made from it) from a sample binds to a matching probe on the array. The resulting datasets are grids of intensity values that algorithms can normalize and analyze quickly, for example to find genes whose expression differs between two conditions. Similarly, bioinformatics techniques are used to analyze other high-throughput data, such as the output of DNA sequencers and mass spectrometers.
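As a rough illustration of turning a grid of intensities into a list of candidate genes, the sketch below assumes a two-color array, where each spot has a "red" intensity for one sample and a "green" intensity for a reference. It computes per-spot log2 ratios, median-centers them as a very simple normalization against a global dye or scanner bias, and flags spots with at least a 2-fold change. The gene names and intensity values are hypothetical, and real pipelines use more careful normalization and statistical testing; single-channel arrays are handled differently.

```python
import math
from statistics import median


def normalized_log_ratios(red, green):
    """Per-spot log2(red/green), median-centred so the typical spot has ratio 1."""
    raw = [math.log2(r / g) for r, g in zip(red, green)]
    centre = median(raw)
    return [x - centre for x in raw]


def changed_genes(genes, log_ratios, threshold=1.0):
    """Genes whose |log2 ratio| >= threshold (1.0 means at least a 2-fold change)."""
    return [g for g, x in zip(genes, log_ratios) if abs(x) >= threshold]


# Hypothetical background-subtracted intensities for five spots.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
red = [800, 2000, 500, 200, 1000]   # e.g. treated sample (Cy5 channel)
green = [400, 500, 250, 400, 500]   # e.g. reference sample (Cy3 channel)

ratios = normalized_log_ratios(red, green)
print(ratios)                        # [0.0, 1.0, 0.0, -2.0, 0.0]
print(changed_genes(genes, ratios))  # ['geneB', 'geneD']
```

In this made-up data the red channel is about twice as bright overall, so median-centering removes that shift before deciding which genes actually changed.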
Bioinformatics can also be used to build meaningful systems models. Given a set of kinetic rates and biological reactions, one can create a model that reflects the data and simulates a given biological system. These models can be as simple or as complex as needed, given adequate computational resources and enough relevant biological information. Predictions can then be made about how outside influences, such as a drug or a gene knockout, will affect the system as a whole.
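A minimal example of such a model is a two-species gene-expression system: mRNA is transcribed at a constant rate and decays, and protein is translated from the mRNA and decays. The sketch below integrates these rate equations with the forward Euler method, chosen here for simplicity rather than accuracy. The model itself and all rate constants are hypothetical, and a halved transcription rate stands in for an outside influence such as an inhibitor.

```python
def simulate_expression(k_tx, d_m, k_tl, d_p, t_end=200.0, dt=0.01):
    """Integrate a hypothetical two-species gene-expression model with forward Euler.

    dm/dt = k_tx - d_m * m      (mRNA: constant transcription, first-order decay)
    dp/dt = k_tl * m - d_p * p  (protein: translation from mRNA, first-order decay)
    Starts from m = p = 0 and returns (m, p) at time t_end.
    """
    m, p = 0.0, 0.0
    for _ in range(round(t_end / dt)):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p  # uses m from the current step, before it is updated
        m += dm * dt
        p += dp * dt
    return m, p


# Illustrative rate constants (arbitrary units).
baseline = simulate_expression(k_tx=2.0, d_m=0.5, k_tl=1.0, d_p=0.1)
# Perturbation: halve the transcription rate, e.g. to mimic a hypothetical inhibitor.
inhibited = simulate_expression(k_tx=1.0, d_m=0.5, k_tl=1.0, d_p=0.1)
print(baseline, inhibited)
```

Setting both derivatives to zero gives the steady state m* = k_tx / d_m and p* = k_tl · k_tx / (d_m · d_p), which is 4 and 40 for the baseline. The simulation settles close to those values, and halving the transcription rate roughly halves both levels, which is the kind of prediction about a perturbation that such models are built to make.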
Another fast-growing area of bioinformatics is genome annotation. As we come to understand the genetic code and discover new things about specific genes, cataloguing the findings becomes critical to helping future researchers. Large databases and retrieval systems now let users gather much of the available information about a gene in one place, saving time and sparing other researchers from repeating experiments.
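Genome annotations are commonly exchanged as GFF3 files, in which each feature, such as a gene, mRNA or exon, is one line of nine tab-separated columns ending in `key=value` attributes. The sketch below parses feature lines and indexes genes by their `Name` attribute so they can be looked up, a toy version of the retrieval such databases provide. It deliberately ignores parts of the GFF3 specification (percent-encoded characters, multi-valued attributes, parent/child assembly), and the records, identifiers and coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    seqid: str
    ftype: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str
    attributes: dict


def parse_gff3(text):
    """Parse GFF3 feature lines into Feature records (simplified, see lead-in)."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line, ## directive or # comment
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a 9-column feature line (e.g. embedded FASTA)
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if "=" in pair)
        features.append(Feature(seqid, ftype, int(start), int(end), strand, attributes))
    return features


def index_by_name(features, ftype="gene"):
    """Map each feature's Name attribute to its record, for quick lookup."""
    return {
        f.attributes["Name"]: f
        for f in features
        if f.ftype == ftype and "Name" in f.attributes
    }


# Hypothetical annotation records; identifiers and coordinates are made up.
SAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\texample\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=geneX",
    "chr1\texample\tmRNA\t1000\t2500\t.\t+\t.\tID=mrna0001;Parent=gene0001",
    "chr2\texample\tgene\t500\t900\t.\t-\t.\tID=gene0002;Name=geneY",
])

genes = index_by_name(parse_gff3(SAMPLE))
y = genes["geneY"]
print(y.seqid, y.start, y.end, y.strand)  # chr2 500 900 -
```

A production annotation database would also link each mRNA to its parent gene through the `Parent` attribute and store far richer metadata, but the lookup-by-name idea is the same.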