Student Name: Muhammad Ramzan
VU ID mc160402117
and use of tools in it.
Scientific knowledge has multiplied very rapidly and is now shared more widely
than ever before. Because of its vastness, this knowledge was divided into many
disciplines, but new discoveries have expanded it so much that these
disciplines are drawing closer together and new fields are emerging. This vast
body of scientific knowledge has to be retrieved and analyzed again and again
to obtain data.
Bioinformatics is one such discipline. It provides the means to retrieve and
analyze biological data so that further investigations can yield more
biological information. It helps biologists obtain important information, at
no cost, from biological data already stored on different web sites or in
computer programs, which are also called bioinformatics tools. The present
review provides a detailed summary of some of the bioinformatics tools
available to biologists for the retrieval and analysis of biological data. In
particular, it focuses on the areas of life science research in which these
tools are most informative, such as the analysis of DNA and protein structure
and sequence to identify different characteristics, and the prediction of the
3D structure of molecules to find molecular interactions. It also discusses how
important information about life phenomena can be obtained from data already
preserved in various biological databases.
Keywords: Life sciences; Sequence analysis; Phylogeny; Structure prediction;
Molecular interaction; Biological data; Biomolecules; Sequence profiling
Abbreviations: ADMET (absorption, distribution, metabolism, excretion and
toxicity), BLAST (Basic Local Alignment Search Tool), I-TASSER (Iterative
Threading ASSEmbly Refinement), DNA (deoxyribonucleic acid), cDNA
(complementary DNA), ORF (open reading frame), PDB (Protein Data Bank), ExPASy
(Expert Protein Analysis System), SIB (Swiss Institute of Bioinformatics), HMM
(hidden Markov model), CADD (computer-aided drug design)
Bioinformatics is a very important and useful interdisciplinary science of the
present age. It combines many other branches of science, such as biology,
computer science, mathematics and statistics. It was developed for the storage,
retrieval and analysis of biological data [1].
The term “bioinformatics” was first used in 1970 by Paulien Hogeweg, a Dutch
theoretical biologist, together with Ben Hesper. She used it for the
application of information technology to the study of life [2,3].
The introduction of automated modeling, and of SWISS-MODEL in particular in the
mid-1990s [4], led to great progress in bioinformatics. Since then the field
has become an integral part of biological research, and biological information
is being generated at an ever faster pace.
These and other computing tools are used to determine the structure and
properties of proteins and genes and to carry out genetic analysis. They also
help in the study of biomolecules and their interactions in the cell. The
information generated by these tools is not as reliable as that obtained by
experimentation, but experimentation is costly and time-consuming. In silico
analysis can therefore make it easier to reach an informed decision before
performing an expensive experiment. For example, a druggable compound must have
suitable ADMET (absorption, distribution, metabolism, excretion and toxicity)
characteristics to pass clinical tests. If a compound lacks the necessary ADMET
characteristics, it will most probably be rejected. To address this problem,
several bioinformatics tools are being developed to predict ADMET
characteristics. They allow scientists and researchers to evaluate a large
number of compounds and find the most druggable ones before clinical tests
begin [5].
Many reviews on particular aspects of bioinformatics have been written [6-8].
However, none of them is well suited to a researcher who does not work in
computational biology. In this review we take the opportunity to introduce some
bioinformatics tools. Only those tools that extract a large amount of
information from biological data are selected. They cover the analysis of DNA
and protein sequences and structures, including the 3D structure of proteins,
phylogenetic studies, and the interaction of different molecules in the cell.
Gene Identification and Sequence Analysis:
Sequence analysis means understanding the various characteristics of a
biomolecule such as a protein or a nucleic acid. The first step is to retrieve
the relevant sequences from a public database. After refinement, if necessary,
the sequences are submitted to different tools that predict their features.
Tools such as BLAST (Basic Local Alignment Search Tool) [10] help to find gene
and protein sequences and to trace their evolutionary history. These tools use
up-to-date mathematical and statistical methods to analyze the sequences. Some
tools are especially helpful in finding promoter regions (the regions of genes
where transcription starts), terminators (which mark the end of a gene),
introns and exons. Detection of open reading frames (ORFs) is used for this
purpose. Most predictions rely on complementary DNA (cDNA) and expressed
sequence tags (ESTs). However, cDNA/EST information is often limited and
incomplete, which makes finding new genes enormously difficult. Computational
scientists have therefore developed another technique, known as ab initio gene
identification. Its potential was established in a study that predicted 88% of
already confirmed exons and 90% of the coding nucleotides of Drosophila
melanogaster with a very low rate of false positives [12]. Given the accuracy
(~90%) delivered by this approach, it could be a trustworthy tool for
annotating long genomic sequences and predicting new genes.
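The idea of an open reading frame mentioned above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular tool. It assumes that an ORF runs from an ATG start codon to the first in-frame stop codon, it scans only the forward strand, and the names `find_orfs` and `min_codons` are hypothetical.

```python
# Minimal ORF scan (illustrative assumption: ATG start, first in-frame stop,
# forward strand only; real gene finders are far more elaborate).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return a sorted list of (start, end, frame) tuples.

    start/end are 0-based, end-exclusive positions; the stop codon is included.
    min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk through the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGGG"                   # made-up example sequence
    for start, end, frame in find_orfs(seq):
        print(frame, seq[start:end])           # prints: 2 ATGAAATTTTAG
```

A fuller version would also scan the reverse complement, giving six reading frames in total.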
Listed below are some tools used in bioinformatics for gene identification and sequence analysis of proteins.
BLAST is used for the analysis of DNA, amino acid and protein sequences. It identifies a query sequence by comparing it with a library or database of sequences.
This tool is used to find homologous protein or nucleotide sequences and to perform sequence alignments.
Clustal Omega is the most recent version of the Clustal alignment program. It is available online and as a command-line program. Its distinctive trait is scalability: thousands of medium to large sequences can be aligned at the same time. It also makes use of multiple processors where present. In addition, the quality of its alignments is superior to that of previous versions. The algorithm uses seeded guide trees and HMM profile-profile progressive alignments.
This tool is used for sequence profiling.
This tool finds open reading frames in a gene sequence submitted to it. Tools of this kind are especially helpful in finding promoter regions (the regions of genes where transcription starts), terminators (which mark the end of a gene), introns and exons.
This resource is a section of the UniProt database containing manually annotated protein sequences.
This is a very popular site for pairwise and multiple sequence alignment. It runs on Windows, Linux/Unix and Mac operating systems.
GENSCAN is freely accessible software used for “recognition of whole gene structures in genomic DNA”. It can be used to predict the locations and exon-intron structures of genes in genomic sequences from a diversity of organisms.
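BLAST, described above, compares a query with database sequences by searching for high-scoring local alignments. BLAST itself is a fast heuristic, so the sketch below is not BLAST. It shows the exact dynamic-programming method for local alignment (Smith-Waterman) that such heuristics approximate. The scoring values (match +2, mismatch -1, gap -2) and the function name `local_alignment_score` are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b.

    Uses a simple linear gap penalty; the scoring values are illustrative only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap       # gap in b
            left = H[i][j - 1] + gap     # gap in a
            # Scores never drop below 0, which is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The shared substring "ACG" gives 3 matches x 2 = 6.
    print(local_alignment_score("TTACGTT", "GGACGGG"))  # prints 6
```

Ranking database sequences by such a score is the basic idea behind a similarity search. Real tools add substitution matrices, affine gap penalties and statistical significance estimates.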
Predicting Protein Structure and Function
Initially, protein molecules are shapeless strings of amino acids, which
finally fold into a three-dimensional (3D) structure to become biologically
active. Correct folding is a precondition for any protein to perform its
biological functions. Therefore, knowledge of the 3D structure of a protein is
essential to gain insight into its function. 3D structures are usually
determined by X-ray crystallography or related techniques. However, these
techniques are costly, difficult and labor-intensive, and are often hampered by
poor heterologous expression and failed attempts to obtain good crystals [14].
As a result, only a few structures (~250) determined by XRD and NMR (nuclear
magnetic resonance) spectroscopy are submitted, compared with nearly a million
monthly submissions to NCBI. Tertiary structure information at genome scale is
consequently missing for many proteins. Alternatively, a protein’s 3D structure
can be predicted using different bioinformatics tools, and structure prediction
has therefore become important in bioinformatics [14].
I-TASSER is a tool for predicting the 3D structure of proteins. It can also
predict protein function from sequence. The server builds the 3D structure of a
selected protein through multiple threading alignments using templates from the
PDB [16]. One of the most important resources for protein structure analysis is
ExPASy (the Expert Protein Analysis System), powered by the Swiss Institute of
Bioinformatics (SIB). ExPASy also provides many supplementary tools for
similarity searching, pattern recognition and the study of post-translational
modifications [15].
Listed below are some important tools used in bioinformatics for finding protein structure.
I-TASSER (Iterative Threading ASSEmbly Refinement) is a bioinformatics tool for predicting three-dimensional structure models of protein molecules from amino acid sequences.
This tool predicts the 3D structure of a protein based on comparative modeling.
ExPASy is the Expert Protein Analysis System, powered by the Swiss Institute of Bioinformatics (SIB). It also provides many supplementary tools for similarity searching, pattern recognition and the study of post-translational modifications.
The Protein Data Bank (PDB) is another major protein resource, containing information on experimentally determined structures of nucleic acids, proteins and other complex assemblies.
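Structures held in the Protein Data Bank are commonly distributed as text files in which each atom occupies one fixed-width ATOM record. As a hedged illustration of how such coordinates can be used, the Python sketch below reads the alpha-carbon (CA) atoms from PDB-format text and measures the distance between two residues. The demo coordinates are invented, and `atom_line`, `parse_ca_atoms` and `distance` are hypothetical helper names, not part of any PDB software.

```python
import math


def atom_line(serial, name, res, chain, seq, x, y, z):
    """Hypothetical helper: build one fixed-width PDB ATOM record for the demo."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res, chain, seq, x, y, z)


def parse_ca_atoms(pdb_text):
    """Return {(chain, residue_number): (x, y, z)} for alpha-carbon atoms.

    Column positions follow the PDB ATOM record layout: atom name in columns
    13-16, chain in 22, residue number in 23-26, x/y/z in 31-38/39-46/47-54.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            atoms[key] = (float(line[30:38]),
                          float(line[38:46]),
                          float(line[46:54]))
    return atoms


def distance(p, q):
    """Euclidean distance between two 3D points (in angstroms for PDB data)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Invented coordinates, used only to exercise the parser.
    demo = "\n".join([
        atom_line(1, " N", "ALA", "A", 1, 1.0, 1.0, 1.0),
        atom_line(2, " CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
        atom_line(3, " CA", "GLY", "A", 2, 3.0, 4.0, 0.0),
    ])
    ca = parse_ca_atoms(demo)
    print(distance(ca[("A", 1)], ca[("A", 2)]))  # prints 5.0
```

Distances between residues computed in this way are a simple starting point for studying contacts within a structure.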
Drug discovery is a process by which new drug molecules are discovered or
designed to treat different diseases. Before the arrival of bioinformatics
tools, scientists relied on chemistry, pharmacology and the clinical and
medical sciences to find new compounds. However, this conventional procedure is
slow and costly. Bioinformatics has greatly helped in this difficult process
and plays a crucial role in advancing drug discovery and design. In fact, an
entirely new and dedicated field known as computer-aided drug design (CADD) has
emerged to discover new drug molecules [19]. The whole process of discovering
and designing new drug molecules is complex and difficult. It can be divided
into four steps, which include identification of the drug target and validation
of the target [20]. In this section, we briefly discuss how bioinformatics is
useful in discovering new drugs.
A number of databases have been developed to ease the search for new drug targets. Listed below are some bioinformatics tools used in drug design.
This tool provides the information available in protein databases about drugs and, in its latest version, their targets.
This is a collection of drug-like molecules along with their 2D structures and calculated and abstracted properties such as logP, molecular mass, binding constants and pharmacokinetics.
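Properties such as logP and molecular mass, mentioned above, are often used for a first computational screen of candidate compounds. The Python sketch below applies Lipinski's rule of five, a widely used rule of thumb for oral drug-likeness. It is given as one example of such a screen, not as the method of any tool in this review. The compound names and property values are invented, and `Compound`, `lipinski_violations` and `is_drug_like` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_mass: float     # molecular mass in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(c):
    """Count how many of the four rule-of-five limits the compound exceeds."""
    return sum([
        c.mol_mass > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def is_drug_like(c, max_violations=1):
    """The rule tolerates at most one violation."""
    return lipinski_violations(c) <= max_violations


if __name__ == "__main__":
    # Invented compounds and values, for illustration only.
    library = [
        Compound("compound_A", 350.4, 2.1, 2, 5),
        Compound("compound_B", 720.9, 6.3, 6, 12),
    ]
    for c in library:
        print(c.name, lipinski_violations(c), is_drug_like(c))
```

A filter of this kind only narrows a compound library. Candidates that pass still need the detailed ADMET evaluation and experimental testing discussed earlier.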
Bioinformatics is a relatively new discipline that has progressed very rapidly
in the last few years. It has made it possible to test hypotheses virtually and
therefore to reach a better-informed conclusion before starting expensive
experiments. Although more and more tools for analyzing genomes and proteomes,
predicting structures, rational drug design and molecular simulation are being
developed, none of them is ‘perfect’. The search for better methods of solving
these problems will therefore continue. It is clear that future research will
be guided largely by the availability of databases, whether generic or
specific. It can also be expected that developments in bioinformatics tools and
software packages will give more accurate results and thus more trustworthy
interpretations. Bioinformatics is also expected to contribute to a functional
understanding of the human genome, leading to better discovery of drug targets
and individualized treatment. Thus, bioinformatics and other scientific
disciplines have to be improved for the benefit of humanity.
1. Mount DW (2004) Bioinformatics: Sequence and genome analysis. New York: Cold Spring Harbor Laboratory Press.
2. Hesper B, Hogeweg P (1970) Bioinformatica: een werkconcept. Kameleon 1: 28-29.
3. Hogeweg P (2011) The roots of bioinformatics in theoretical biology. PLoS Comput Biol 7: e1002021.
4. Peitsch MC (1996) ProMod and Swiss-Model: Internet-based tools for automated comparative protein modelling. Biochem Soc Trans 24: 274-279.
5. Dibyajyoti S, Bin ET, Swati P (2013) Bioinformatics: The effects on the cost of drug discovery. Galle Med J 18: 44-50.
6. Ouzounis CA, Valencia A (2003) Early bioinformatics: the birth of a discipline–a personal view. Bioinformatics 19: 2176-2190.
7. Molatudi M, Molotja N, Pouris A (2009) A bibliometric study of bioinformatics research in South Africa. Scientometrics 81: 47-59.
8. Ouzounis CA (2012) Rise and demise of bioinformatics? Promise and progress. PLoS Comput Biol 8: e1002487.
9. Geer RC, Sayers EW (2003) Entrez: making use of its power. Brief Bioinform 4: 179-184.
10. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
11. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
12. Salamov AA, Solovyev VV (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10: 516-522.
13. Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31: 365-370.
14. Huang T, He ZS, Cui WR, Cai YD, Shi XH, et al. (2013) A sequence-based approach for predicting protein disordered regions. Protein Pept Lett 20: 243-248.
15. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, et al. (2003) ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31: 3784-3788.
16. Roy A, Kucukural A, Zhang Y (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 5: 725-738.
17. Eswar N, Webb B, Marti-Renom MA, Shen MY, Pieper U, et al. (2006) Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics.
18. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The Protein Data Bank. Nucleic Acids Res 28: 235-242.
19. Cordeiro MN, Speck-Planche A (2012) Computer-aided drug design, synthesis and evaluation of new anti-cancer drugs. Curr Top Med Chem 12: 2703-2704.
20. Katara P (2013) Role of bioinformatics and pharmacogenomics in drug discovery and development process. Network Modeling Analysis in Health Informatics and Bioinformatics 2: 225-230.
21. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410.
22. Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39: W29-37.
23. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7: 539.
24. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40: D1100-1107.
25. Ganesan N, Bennett NF, Velauthapillai M, Pattabiraman N, Squier R, et al. (2005) Web-based interface facilitating sequence-to-structure analysis of BLAST alignment reports. Biotechniques 39: 186, 188.