Friday, August 27, 2010

Use of Bioperl in Bioinformatics

Bioperl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. As such, it does not include ready-to-use programs in the sense that many commercial packages and free web-based interfaces do (e.g. Entrez, SRS). On the other hand, Bioperl does provide reusable Perl modules that facilitate writing Perl scripts for sequence manipulation, accessing databases using a range of data formats, and executing and parsing the results of various molecular biology programs including BLAST, ClustalW, T-Coffee, Genscan, ESTScan and HMMER. Consequently, Bioperl enables the development of scripts that can analyze large quantities of sequence data in ways that are typically difficult or impossible with web-based systems.

Friday, August 20, 2010

Molecular Dynamics on Stapled Peptide - Computational Drug Design

What is Drug Design?

Drug discovery is mostly portrayed as a linear, consecutive process that starts with target and lead discovery, followed by lead optimization and pre-clinical in vitro and in vivo studies to determine whether such compounds satisfy a number of pre-set criteria for initiating clinical development. For the pharmaceutical industry, bringing a drug from discovery to market takes approximately 12-14 years and costs up to $1.2-$1.4 billion. Traditionally, drugs were discovered by synthesizing compounds in time-consuming multi-step processes, testing them against a battery of in vivo biological screens, and further investigating the promising candidates for their pharmacokinetic properties, metabolism and potential toxicity. Such a development process has resulted in high attrition rates, with failures attributed to poor pharmacokinetics (39%), lack of efficacy (30%), animal toxicity (11%), adverse effects in humans (10%) and various commercial and miscellaneous factors. Today, the process of drug discovery has been revolutionized by the advent of genomics, proteomics, bioinformatics and efficient technologies such as combinatorial chemistry, high-throughput screening (HTS), virtual screening, de novo design, in vitro and in silico ADMET screening, and structure-based drug design.

What is in-silico Drug Design?

In silico methods can help identify drug targets via bioinformatics tools.
They can also be used to analyze target structures for possible binding or active sites, generate candidate molecules, check their drug-likeness, dock these molecules with the target, rank them according to their binding affinities, and further optimize the molecules to improve binding characteristics.
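The "drug-likeness" check mentioned above is often approximated with Lipinski's Rule of Five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The text does not say which filter is meant, so the sketch below is only one common choice. The Molecule container, its field names and the one-violation tolerance are assumptions made for illustration, and the descriptor values are expected to come from some other tool.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Precomputed descriptors for one candidate (hypothetical container)."""
    name: str
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(mol: Molecule) -> int:
    """Count how many of Lipinski's four rules the molecule breaks."""
    broken = [
        mol.mol_weight > 500,
        mol.logp > 5,
        mol.h_donors > 5,
        mol.h_acceptors > 10,
    ]
    return sum(broken)


def is_drug_like(mol: Molecule, max_violations: int = 1) -> bool:
    """Assumed policy: accept molecules that break at most one rule."""
    return lipinski_violations(mol) <= max_violations
```

A filter like this is cheap, so it is usually applied before the more expensive docking and ranking steps.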

The use of computers and computational methods permeates all aspects of drug discovery today and forms the core of structure-based drug design. High-performance computing, data management software and the internet are facilitating access to the huge amount of data generated and transforming massive, complex biological data into workable knowledge in the modern drug discovery process. The use of complementary experimental and informatics techniques increases the chance of success in many stages of the discovery process, from the identification of novel targets and elucidation of their functions to the discovery and development of lead compounds with desired properties. Computational tools offer the advantage of delivering new drug candidates more quickly and at a lower cost. The major roles of computation in drug discovery are: (1) virtual screening and de novo design, (2) in silico ADME/T prediction and (3) advanced methods for determining protein-ligand binding.
 
Why is in-silico Drug Design significant?
As structures of more and more protein targets become available through crystallography, NMR and bioinformatics methods, there is an increasing demand for computational tools that can identify and analyze active sites and suggest potential drug molecules that can bind to these sites specifically. A global push is also essential to combat life-threatening diseases such as AIDS, tuberculosis and malaria. Millions for Viagra and pennies for the diseases of the poor: that is the current situation of investment in pharma R&D. The time and cost required to design a new drug are immense and at an unacceptable level. According to some estimates, it costs about $880 million and 14 years of research to develop a new drug before it is introduced to the market. Intervention of computers at some plausible steps is imperative to bring down the cost and time required in the drug discovery process.

Drug Lead Optimization.

When a promising lead candidate has been found in a drug discovery program, the next step (a very long and expensive step!) is to optimize the structure and properties of the potential drug. This usually involves a series of modifications to the primary structure (scaffold) and secondary structure (moieties) of the compound. This process can be enhanced using software tools that explore compounds related to the lead candidate (bioisosteres). OpenEye’s WABE is one such tool. Lead optimization tools such as WABE offer a rational approach to drug design that can reduce the time and expense of searching for related compounds.

Similarity Searches.

A common activity in biopharmaceutical companies is the search for drug analogues. Starting with a promising drug molecule, one can search for chemical compounds whose structure or properties are similar to those of the known compound. A variety of methods are used in these searches, including sequence similarity, 2D and 3D shape similarity, substructure similarity, electrostatic similarity and others. A variety of bioinformatic tools and search engines are available for this work.
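One widely used measure of 2D structural similarity is the Tanimoto coefficient, computed on molecular fingerprints: the number of features two molecules share divided by the number of features present in either. The text does not name a specific metric, so this is an illustration rather than the method of any particular tool. Fingerprints are modelled here as plain sets of integer bit positions, and the function names are made up for the sketch.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: set, library: dict) -> list:
    """Return (name, score) pairs for the library, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In practice the fingerprints would be generated by a cheminformatics toolkit, and a cutoff on the score decides which compounds count as analogues.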

Homology Modeling

Another common challenge in CADD research is determining the 3-D structure of proteins. Most drug targets are proteins, so it’s important to know their 3-D structure in detail. It’s estimated that the human body has 500,000 to 1 million proteins. However, the 3-D structure is known for only a small fraction of these. Homology modeling is one method used to predict 3-D structure. In homology modeling, the amino acid sequence of a specific protein (the target) is known, and the 3-D structures of proteins related to the target (the templates) are known. Bioinformatics software tools are then used to predict the 3-D structure of the target based on the known 3-D structures of the templates. MODELLER is a well-known tool in homology modeling, and the SWISS-MODEL Repository is a database of protein structures created with homology modeling.

Sequence Analysis

In CADD research, one often knows the genetic sequence of multiple organisms or the amino acid sequence of proteins from several species. It is very useful to determine how similar or dissimilar the organisms are based on gene or protein sequences. With this information one can infer the evolutionary relationships of the organisms, search for similar sequences in bioinformatic databases and find species related to those under investigation. There are many bioinformatic sequence analysis tools that can be used to determine the level of sequence similarity.

Virtual High-Throughput Screening (vHTS)

Pharmaceutical companies are always searching for new leads to develop into drug compounds. One search method is virtual high-throughput screening. In vHTS, protein targets are screened against databases of small-molecule compounds to see which molecules bind strongly to the target. If there is a “hit” with a particular compound, it can be extracted from the database for further testing. With today’s computational resources, several million compounds can be screened in a few days on sufficiently large clustered computers. Pursuing a handful of promising leads for further development can save researchers considerable time and expense. ZINC is a good example of a vHTS compound library.
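Once every compound has been scored against the target, picking the "hits" is a ranking problem. The sketch below assumes each compound already has a docking score in which lower (more negative) means stronger predicted binding, which is a common but not universal convention. The function name, compound names and scores are invented for illustration.

```python
import heapq


def top_hits(scores: dict, n: int = 3) -> list:
    """Return the n (compound, score) pairs with the lowest scores."""
    # nsmallest avoids sorting the whole library when only a few hits are kept.
    return heapq.nsmallest(n, scores.items(), key=lambda item: item[1])


# Invented example scores, for illustration only.
example_scores = {
    "cmpd_a": -7.2,
    "cmpd_b": -9.1,
    "cmpd_c": -5.0,
    "cmpd_d": -8.4,
}
best_two = top_hits(example_scores, n=2)
```

Keeping only the best few candidates matters when the library holds millions of compounds and each follow-up test is expensive.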

Computer Aided Drug Design

On the support side of the hub, Information Technology, Information Management, software applications, databases and computational resources all provide the infrastructure for bioinformatics. On the scientific side of the hub, bioinformatic methods are used extensively in molecular biology, genomics, proteomics, other emerging areas (i.e. metabolomics, transcriptomics) and in CADD research.
There are several key areas where bioinformatics supports CADD research:
Virtual High-Throughput Screening (vHTS).
Sequence Analysis.
Homology Modeling.
Similarity Searches.
Drug Lead Optimization.
Physicochemical Modeling.
Drug Bioavailability and Bioactivity.

Bioinformatics in Computer-Aided Drug Design

Computer-Aided Drug Design (CADD) is a specialized discipline that uses computational methods to simulate drug-receptor interactions. CADD methods are heavily dependent on bioinformatics tools, applications and databases. As such, there is considerable overlap between CADD research and bioinformatics. Bioinformatics can be thought of as a central hub that unites several disciplines and methodologies.

Thursday, August 19, 2010

Computer Aided Drug Design

Computer aided drug design and bioinformatics:

Drug design is an integrated, developing discipline. It involves the study of the effects of biologically active compounds on the basis of molecular interactions, in terms of molecular structure or the physicochemical properties involved.

The development of new methods in molecular biology and computer science has improved the tools for drug design significantly. More and more new drugs are developed with the help of computational techniques.

The field of bioinformatics has become a major part of drug design and plays a key role in validating drug targets. Bioinformatics can help in understanding complex biological processes and so help improve drug discovery.

Drug design is an iterative process that begins when a chemist identifies a compound that displays an interesting biological profile and ends when both the activity profile and the chemical synthesis of the new chemical entity are optimized. In general, clinically used drugs are not discovered directly; what is more likely to be discovered is a lead compound. The lead is a prototype compound that has a desired biological or pharmacological activity but may have many undesirable characteristics, e.g. high toxicity or insolubility. For designing a drug there are two types of approach, viz. drug discovery without a lead and lead discovery.


Traditional approaches to drug discovery rely on a step-wise synthesis and screening program for large numbers of compounds to optimize activity profiles. A drug cannot be designed rationally without knowing more about the disease or infectious process than was known in the past. For "rational" design, the first necessary step is the identification of a molecular target critical to a disease process or an infectious pathogen. The important prerequisite of "drug design" is then the determination of the molecular structure of the target, which is what gives the word "rational" its meaning.

Bioinformatics Tutorials Using ClustalW to do a multiple sequence alignment

Bioinformatics Web Practical

This is an online practical for predicting the structure and function of an unknown protein using primary and secondary biological databases.
1. Go to http://umber.embnet.org/dbbrowser/bioactivity/, press "ready to go" and then "go". You are now at the page http://umber.embnet.org/dbbrowser/bioactivity/nucleicfrm.html
2. At the top of the page, under Sequence translation & identification, select "materials", which contains the unknown nucleotide sequences. You can also use your own sequence of interest.
3. Click materials and select any fragment. Each fragment is a DNA sequence; click the fragment to get its sequence.
4. Copy this sequence and paste it into the translator to get the translated sequence and to find the ORF (open reading frame) of the sequence.
5. Copy the ORF and paste it into OWL, which contains information about the organism in which your query sequence is found. Find the exact match for your sequence.
6. After finding the exact match, copy the name of the organism and paste it into the query box at http://umber.embnet.org/dbbrowser/bioactivity/proteinfrm.html to get the full protein sequence of which the query is a part.
7. Copy the full sequence and paste it into PSI-BLAST to find similarity to other related proteins.
You can now find the structure and function of the protein using this primary database; the secondary database procedure will be covered later. The first hit of the BLAST result will be the sequence of the query protein. You can find its structure and function and compare them with those of the other hits.
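The translate-and-find-ORF step of the practical can also be done in a short script. The sketch below is not the web translator's own algorithm. It assumes the standard genetic code, looks only at the three forward reading frames (the reverse strand is ignored), and defines an ORF as a stretch that starts at ATG and ends at a stop codon. The function names are chosen for illustration.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA to protein; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def longest_orf(dna: str) -> str:
    """Longest ATG-to-stop stretch in the forward frames, as protein."""
    best = ""
    for frame in range(3):
        protein = translate(dna[frame:])
        # Every piece except the last is followed by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best
```

The protein string returned by longest_orf is what would be pasted into the database search in the next step.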

Overview of the UniProt Bioinformatics Website

How to predict the function of unknown or known protein using Swissprot?

1. Go to http://expasy.org/sprot/ and search UniProtKB for the protein name (opsin 1).
2. The results page, http://www.uniprot.org/uniprot/?query=opsin%201, lists this protein in many different organisms, so narrow down your search: click Fields >>, choose an option other than All, such as Organism [OS], select Human [9606] in the next bar, and click Add & Search.
3. The result lists the members of the protein family present in humans. Click any one of the IDs to get information about that protein, such as its domains and function, for example http://www.uniprot.org/uniprot/P08100
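The search URL shown above can also be built in a script. The sketch below only reproduces the query string that appears in that link, with the space in "opsin 1" encoded as %20. The function name is made up for illustration, and the organism filter is left out because its URL syntax is not given here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.uniprot.org/uniprot/"  # address used in the text above


def uniprot_search_url(protein_name: str) -> str:
    """Build the search URL for a protein name (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"query": protein_name}, quote_via=quote)
    return BASE_URL + "?" + query
```

Calling uniprot_search_url("opsin 1") gives the same address as the one quoted in the steps above.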
I hope you will like it. Regards, "Quratt ul ain Siddique"

Wednesday, August 18, 2010

Bioinformatics Tutorials (Lesson 2):Using BLAST to search for similarities

Bioinformatics Tutorials (Lesson 1):Using SwissProt database to search for a specific protein

Adeno-Associated Virus 9 SWISS-MODEL Structure

How are the opsin genes related to each other?

Answering this question requires making a multiple sequence alignment and then using it to make a phylogenetic tree. For these tasks, we move to another database where it's a little easier to gather a bunch of sequences into a single FASTA file.
Point your browser to http://us.expasy.org. ExPASy is mirrored at several locations, including the following:
http://www.expasy.org/  http://ca.expasy.org/
If one does not work or responds slowly, try a different one.
You see the home page of ExPASy, the Expert Protein Analysis System. As I said earlier, ExPASy is a complete protein tool box. With ExPASy, you can do almost any imaginable analysis or comparison of protein sequences and structures.
Click Swiss-Prot and TrEMBL under Databases.
Read the introduction to these databases. They are high quality protein sequence databases with abundant annotation, minimal redundancy, and many connections to other databases.
Click Advanced search in Swiss-Prot and TrEMBL.
With advanced searching, you can limit your search to specific genes and organisms, and you can search on descriptive information in the entries.
Set up a search for human opsins, as follows:
  • Search Swiss-Prot only.
  • Enter Description: opsin
  • Organism: Choose "Human" from the pull-down menu
  • Check "Append and prefix * to query terms". The * is a "wild card". You are searching for all entries that contain "opsin" as a whole or partial word.
Click Submit.
The page Swiss-Prot description is your search result page.
Look over the results. On 9/8/2003, this search gave 14 hits. Among them are the rod pigment rhodopsin (OPSD) and the three cone pigments (OPSB, OPSG, OPSR). There is also a "visual pigment-like receptor peropsin", OPSX. Sounds mysterious. Let's find out more about it, and in the process, see a typical Swiss-Prot entry.
Click on the gene name, OPSX.
You see the NiceProt View of Swiss-Prot: O14718. Peruse this entry and try to find out just what this rhodopsin-like protein is thought to do. Under Comments, you'll learn that it's found in the retina (the RPE or retinal pigment epithelium), and that it may detect light, or perhaps monitor levels of retinoids, the general class of compounds that are the actual light absorbers in opsins. Also under Comments - Similarity, you see, as mentioned earlier, that this protein is a member of the large family of G protein-coupled receptors. If you click "G protein-coupled receptors" under the Keywords, you find a list of all purported 7-transmembrane receptor proteins in SwissProt. The human genome alone contains 350 of them! See if you can verify this statement, without counting. Now back up to the NiceProt view.
Under References, click the journal citation, "Proc. Natl. Acad. Sci. U.S.A. 94:9893-9898(1997)". From the resulting page, you can read a full article in the Proceedings of the National Academy of Sciences (PNAS) about this protein. Like many journals, PNAS puts full articles online just 6 to 12 months after publication.
Looking further down the page, you find cross-references to the protein or its gene in other databases, predicted structural features of the protein, and last, the sequence. Note also, at the bottom of the page, links to a number of ExPASy tools listed for further analysis of this sequence. Try some of them. For example, I just learned in about ten seconds from Compute pI/MW that the isoelectric pH (or pI) of this protein is 8.78. And I learned in no time at all from ScanProSite that the sequence contains signatures indicating that the protein is probably a G protein-coupled receptor (no surprise, but comforting) and that it has a retinal binding site. ProSite is a tool for finding signatures of function in new sequences. When you finish playing with these powerful tools, return to your SwissProt search results by use of the back button of your browser. If you're lost, go back to ExPASy and do the search again.
Now let's compare the sequences with each other. We'll use the program ClustalW to make a multiple sequence alignment.
Scroll down the result page and check the boxes at the left of these entries
  • OPSB (blue-sensitive opsin)
  • OPSD (rhodopsin)
  • OPSG (green-sensitive opsin)
  • OPSR (red-sensitive opsin)
  • OPSX (visual pigment-like receptor peropsin)
At the top of the page, at Send selected sequences to, select Clustal W (multiple alignment) from the menu, and click Submit.
ClustalW has been implemented at many web sites. This one, at EMBnet.org, automatically receives the FASTA files from the selected entries, allows you to make some settings of the alignment criteria, and then does the alignment. We will just accept the default alignment settings. First, scroll in the Input Sequences box and verify that it contains five FASTA files, one right after the other. To make them easier to identify in subsequent outputs, edit the name of each FASTA comment line (begins with ">") as follows:
  • Change "sp|P03999|OPSB_HUMAN Blue-sensitive opsin (Blue cone photoreceptor pigment) - Homo sapiens (Human)." to "Blue".
  • Change "sp|P08100|OPSD_HUMAN Rhodopsin (Opsin 2) - Homo sapiens (Human)." to "Rhodopsin".
  • Change "sp|P04001|OPSG_HUMAN Green-sensitive opsin (Green cone photoreceptor pigment) - Homo sapiens (Human)." to "Green".
  • Change "sp|P04000|OPSR_HUMAN Red-sensitive opsin (Red cone photoreceptor pigment) - Homo sapiens (Human)." to "Red".
  • Change "sp|O14718|OPSX_HUMAN Visual pigment-like receptor peropsin - Homo sapiens (Human)." to "Peropsin".
In all cases, be sure to leave the ">" in the first line of each FASTA entry. To save some work in case something goes wrong, select the edited contents of the Input Sequences box, copy it, and paste it onto an empty word-processor page, and save the file in text format. Name it Opsins.txt.
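The header edits above can also be applied with a short script instead of by hand. The sketch below assumes headers of the form ">sp|ACCESSION|ENTRY_NAME description", as in the five entries listed, and replaces the whole header with the short name. The function name is made up for illustration, and headers that do not match are left unchanged.

```python
NEW_NAMES = {
    "OPSB_HUMAN": "Blue",
    "OPSD_HUMAN": "Rhodopsin",
    "OPSG_HUMAN": "Green",
    "OPSR_HUMAN": "Red",
    "OPSX_HUMAN": "Peropsin",
}


def rename_headers(fasta_text: str, names: dict = NEW_NAMES) -> str:
    """Replace known FASTA comment lines with short names."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header looks like ">sp|P03999|OPSB_HUMAN Blue-sensitive opsin ..."
            fields = line[1:].split("|")
            words = fields[2].split() if len(fields) >= 3 else []
            entry = words[0] if words else ""
            # Unknown entries keep their original header text.
            line = ">" + names.get(entry, line[1:])
        out.append(line)
    return "\n".join(out) + "\n"
```

The ">" at the start of each comment line is preserved, as the instructions above require.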
Click Run ClustalW.
The resulting page is called ClustalW query receipt, and it contains links to several output files.
Click clustalw (aln).
You see the typical ClustalW alignment file, showing our five protein sequences aligned to maximize identical and similar residues. Below each line of five sequences are symbols to show the extent of similarity among the sequences. An asterisk (*) means that the same residue is always (that is, for all of these sequences) found at that location; for example, the first asterisk marks a location where only N (asparagine) is found. Colon (:) means that all residues at this location are very similar; for example, the first colon is where only F (phenylalanine), I (isoleucine), and L (leucine) -- residues with large, nonpolar sidechains -- occur. Period (.) means somewhat similar residues; for example, at the first period, serine, threonine, and glutamine occur -- all polar, but varied in size. If there is no mark then the residues at that location display no predominant common properties.
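The asterisk row is easy to reproduce from the aligned sequences. The sketch below marks only fully conserved columns with "*". The ":" and "." marks depend on ClustalW's own residue similarity groups, which are not reproduced here. The function name and the rule that a gap column is never marked are assumptions made for illustration.

```python
def conserved_columns(aligned: list) -> str:
    """Return a marker line with '*' under fully conserved columns."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # Conserved: one distinct residue in the column, and it is not a gap.
        same = len(residues) == 1 and "-" not in residues
        marks.append("*" if same else " ")
    return "".join(marks)
```

Counting the asterisks in the returned line gives a quick measure of how much of the alignment is identical across all five opsins.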
Once more, as a safety measure, copy this alignment to your clipboard, and paste it onto an empty word-processor page. Then save the file in text format. Name it OpsMSA.txt. Remember that it is still on your clipboard, for pasting at our next stop. This multiple sequence alignment is one type of input you can use to make a phylogenetic tree.

How to predict the 3D Structure of a known or unknown Protein?

If you have the nucleotide or amino acid sequence of a known or unknown protein and you want to predict its structure, follow these steps:

1. Run BLAST on your sequence to get the name of the protein you are using as a query.

For example, if your protein name is opsin 1, go to http://www.ncbi.nlm.nih.gov/, select Structure from the search drop-down menu and enter the protein name, opsin 1. The structure of this protein and a brief description of its function will be available at this page: http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=77531
Click the option to download Cn3D and install this software on your own system; it is a desktop application.

It is used to visualize the 3D structure of your protein accurately, and you can also rotate your protein with the mouse.
The structure will be as

Tuesday, August 17, 2010

Bacteriorhodopsin

Rhodopsin interaction with transducin

Tutorial: Examining the 3D Structure of Rhodopsin at NCBI

What proteins in humans are similar to the red opsin?

Now return to the NCBI Map Viewer http://www.ncbi.nlm.nih.gov/mapview/. We're going to search the human genome for sequences similar to that of the red opsin.

Click the B next to Homo sapiens (human).

This is the NCBI's BLAST search tool. BLAST is a widely used program for finding sequences similar to a "query" sequence that you're interested in. Pick these options from the various menus:
Database: Protein (Search the database of protein sequences.)
Program: blastp (Use the version of BLAST that compares protein sequences, unlike blastn, which compares nucleotide sequences.)
Other Parameters, Expect: 10 (The higher the number, the less stringent the matching, and the more hits you'll get.)

Next, copy the FASTA data from your file protred.txt to your clipboard, and paste it into the BLAST search box, above which it says, "Enter an accession..." Check to be sure that the first character in the box is the ">" at the beginning of the FASTA data. Then click Begin Search.

The next page is for formatting your search results. Just click that enthusiastic Format! button. When your results are ready, the results of BLAST page appears. Look down the page to the graphical display, a box containing lots of colored lines. Each line represents a hit from your blast search. If you pass your mouse cursor over a red line, the narrow box just above the box gives a brief description of the hit. You'll find that the first hit is your red opsin. That's encouraging, because the best match should be to the query sequence itself, and you got this sequence from that gene entry. The second hit is the green opsin -- remember that the PubMed entry reported that the red and green pigments are the most similar. The third and fourth hits are the blue opsin and the rod-cell pigment rhodopsin. Other hits have lower numbers of matching residues, and are color coded according to a score of matches. If you click on any of the colored lines, you'll skip down to more information about that hit, and you can see how much similarity each one has to the red opsin, your original query sequence. As you go down the list, each succeeding sequence has less in common with red opsin. Each sequence is shown in comparison with red opsin in what is called a pairwise sequence alignment. Later, you'll make multiple sequence alignments from which you can discern relationships among genes.

See what you can figure out about what the scores mean. Identities are residues that are identical in the hit and the query (red opsin), when the two are optimally aligned. Positives are residues that are very similar to each other (see residue number 1 in the blue opsin -- it's threonine in red opsin, and the very similar serine in the blue). Gaps are sometimes introduced into a hit to improve its alignment with the query. The more identities and positives, and the fewer gaps, the higher the score. Note that blue opsin and rhodopsin are only about 45% identical to the red opsin. Other proteins, which are apparently not visual pigments, have even lower scores. Now let's take a look at where all these hits are in the human genome.
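The identity and gap counts can be recomputed from any pairwise alignment. The sketch below assumes the two sequences are already aligned (same length, "-" for gaps) and reports identity as a percentage of the alignment length. Positives are left out because they depend on a substitution matrix that is not reproduced here. The function name and the short example sequences in the usage are invented.

```python
def alignment_stats(query: str, hit: str) -> dict:
    """Count identities and gaps in two already-aligned sequences."""
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have the same length")
    identities = 0
    gaps = 0
    for q, h in zip(query, hit):
        if q == "-" or h == "-":
            gaps += 1          # a gap in either sequence
        elif q == h:
            identities += 1    # same residue in both sequences
    length = len(query)
    percent = 100.0 * identities / length if length else 0.0
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": percent,
    }


# Invented six-column alignment, for illustration only.
stats = alignment_stats("MTQ-AW", "MSQRAW")
```

Here four of the six columns are identical and one holds a gap, so the identity is about 67%.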

The amino-acid sequence of this OPN1LW

Things look a lot like before, but this is a protein entry, containing the amino-acid sequence in one-letter abbreviations. Just as with the mRNA entry, turn this into a FASTA display, and copy it into a new word-processor document. Save it in text format as protred.txt. Return to LocusLink.
You can also translate the FASTA format of the nucleotide sequence of the gene; alternatively, its amino acid sequence is present on the GenBank page, where you can access it.
The FASTA of the amino acid sequence is at this page: http://www.ncbi.nlm.nih.gov/protein/9910526

What is the nucleotide sequence of this gene?

Remember that we are looking at the gene for the red-sensitive opsin in human vision, and it's located near the bottom tip of the X chromosome. Scroll down to NCBI Reference Sequences (RefSeq). You see that mRNA (messenger RNA) and protein sequences are available, along with a GenBank sequence.

Click the entry number beside mRNA.

This is a typical GenBank nucleotide file, and a lot of it is hard to read, but a few things are clear. First note, under references, a citation to the publication of this sequence in the scientific literature. To see an abstract of the article in which this gene was described, click the PubMed link below the reference. As you see, you've been here before. There are many ways to move from one database to another, which is both a blessing and a curse. You have to keep your eyes open for useful links, and when you find a path that you think you might use again, make a note of it and bookmark the web pages. It is frustrating to know there's an easier way to do something, and not remember how you did it.

NB to GR: point back to this abstract when you get the phylogenetic tree.
To find the sequence, go to this page: http://www.ncbi.nlm.nih.gov/nuccore/9910525?report=genbank
Scroll to the bottom of this long page. The last thing is the sequence of this messenger RNA. You are seeing the actual list of As, Ts, Gs, and Cs that make up the message for synthesis of this opsin. But wait! You know that RNA contains no T. In most nucleotide databases, U from RNA is represented as T, to make for easy comparison of DNA and RNA sequences. This sequence information is not in the form that is most useful for searching in databases, say, searching for related genes. Let's display this entry in a form more useful for searching.
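The U-to-T convention described above is a plain character substitution, so converting between the two spellings takes one line each way. A minimal sketch, with function names made up for illustration:

```python
def rna_to_database_form(rna: str) -> str:
    """Write an RNA sequence the way nucleotide databases store it (U as T)."""
    return rna.upper().replace("U", "T")


def database_form_to_rna(seq: str) -> str:
    """Recover the RNA spelling from a database entry for an mRNA (T as U)."""
    return seq.upper().replace("T", "U")
```

The second function only makes sense for entries that really are RNA, such as this messenger RNA record.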

At the top of the page, beside the Display button, pull down the menu that says default (we are looking at the default entry display), and select FASTA (note that several other display options are available). Then click the Display button. You see one descriptive or "comment" line that begins with ">", followed by the nucleotide sequence. This little file is just what you need to search nucleotide databases for similar sequences. Let's keep it for future use.
This is the FASTA format of the gene OPN1LW http://www.ncbi.nlm.nih.gov/nuccore/164419729?report=fasta
Click and drag on the web page to select everything from the ">" through the last nucleotide. Be careful not to select anything else. From your browser's Edit menu, select Copy to make a copy of this information on your clipboard, for pasting elsewhere. Now start your favorite word processor, make a new document, and paste. The FASTA comment and sequence should appear. Select all of the text and change the font to Courier or Monaco -- these "typewriter" fonts make it easy to align letters into columns, because all letters are the same width. Save this file, choosing text or plain text as the file type. Call it mrnared.txt. Save it to a convenient location for the files you'll be making later. Click your browser's Back button until you return to LocusLink.
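A saved file such as mrnared.txt can later be read back by a short script. The sketch below assumes the text holds one or more FASTA records in the layout described above: a comment line starting with ">", followed by sequence lines. parse_fasta is a name chosen here, not part of any NCBI tool.

```python
def parse_fasta(text: str) -> list:
    """Split FASTA text into (comment, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

To use it on the saved file, read the file's contents into a string and pass that string to parse_fasta. Wrapped sequence lines are joined into one string per record.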

Find and Characterize the gene using Bioinformatics and its tools and databases

Using bioinformatics, we want to find where in the genome a specific gene is located.
Here we go:
Our gene of interest is Opsin:
Where are the opsin genes in the human genome? 
First go to this site http://www.ncbi.nlm.nih.gov/mapview/. 
Read the instructions. Note that you can look at a genome by clicking on the NAME of the species, not the B beside it. The species name takes you to a viewer for the genome of that organism. The B takes you to a BLAST search tool (later).

Click Homo sapiens (human).

You see a diagram of the human chromosomes, and a search box at the top. Enter "opsin" in the box next to Search for.

Click Find.
 You see the diagram again, with red marks at your "hits", the locations of genes whose entries contain "opsin" as a whole or partial word. Below the diagram is a list of the indicated genes. Among them are the rhodopsin gene (RHO), and three cone pigments, short-, medium-, and long-wavelength sensitive opsins (for blue, green, and red light detection). Four hits look like visual pigments, which probably does not surprise you. To the left of each entry is the chromosome number, allowing you to tell which red mark corresponds to each entry. Note that two opsins are on the X chromosome, one of the sex-determining chromosomes. You can pursue multiple hits on the same chromosome with the all matches link for that chromosome.

Click all matches next to X.

You see a very complicated display (don't sweat -- we're going to use only a part of this now). On the left is a diagram of the X chromosome, with red marks at the positions of the gene(s) you've followed to this page -- in our case, the two opsins, medium- and long-wave, which are located near the bottom tip of the X chromosome. To the right are various representations of the X chromosome, with listings of annotated areas. The two opsin genes are highlighted in pink. If you pass your cursor over this page without clicking, you will find that some symbols provide brief information, most about regions that are not yet characterized well enough to have a full entry.

As you can see, there is a tremendous amount of information on this page, with links to much more. If you want full information about the meanings of abbreviations and symbols on this page, as well as the kinds of information linked to the page, you can use Map Viewer Help at the top of the page. You will find abundant information about the Map Viewer, explanations of all symbols and links, and even tutorials about how to ask and answer all kinds of questions about the genome.

For now, note the information provided for the first of the two highlighted opsin genes, OPN1LW (this is called the gene symbol). You see that this is the long-wavelength-sensitive (red) opsin, and that it's a gene involved in color blindness (a sex-linked trait -- no surprise).

The Tools used in Bioinformatics

NCBI Map Viewer
For finding genes and gene products (RNAs and proteins) that interest you
BLAST
For finding genes or proteins with sequences similar to yours
ClustalW
For comparing your sequence with others, and lots of sequences with each other
Phylip
For making phylogenetic trees, which show how sequences are related to each other.
Treeprint
For printing phylogenetic trees
PSIPRED
For predicting the location of helices, pleated sheets, and transmembrane elements of proteins of unknown structure
Swiss-Model
For automated building of theoretical structural models of your sequence based on known structures (homology modeling)
Deep View (also known as Swiss-PdbViewer)
For seeing and exploring macromolecular models in three dimensions, and for manual and semiautomated homology modeling
PubMed
For searching ALL the literature of the life sciences
ExPASy (Expert Protein Analysis System)
Not so much a tool as a tool box -- a very complete set of protein analysis tools

Databases used in Bioinformatics

The Databases (and their acronyms!):
 
Genbank, operated by NCBI (National Center for Biotechnology Information)
Contains all publicly available sequences of DNA, with annotations
Same DNA sequence content as EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan)
Swiss-Prot and TrEMBL, operated by SIB (Swiss Institute of Bioinformatics) and EBI (European Bioinformatics Institute)
Contains most of the publicly available sequences of proteins, with annotations
Protein Data Bank
Contains all publicly available experimentally determined structural models of proteins and nucleic acids (determined by X-ray crystallography and NMR)
Swiss-Model Repository
Contains many theoretical structural models of proteins (determined by automated homology modeling)
Online Mendelian Inheritance in Man
A catalog of human genes and genetic disorders, linked to gene entries in GenBank

Sunday, August 15, 2010

How to use Genbank Database

The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced at NCBI as part of the International Nucleotide Sequence Database Collaboration (INSDC).

To use GenBank, follow this tutorial:
Making sense of the GenBank entry of a prokaryotic gene.
Go to:
www.ncbi.nlm.nih.gov/entrez/.
Select Nucleotide from the search drop-down menu and enter the GenBank accession number of your query, e.g. X01714. The results page shows the entry, which in this case is the E. coli dut gene for dUTPase.
Then click the Text button on the toolbar to generate a true flat file format of the entry. You can save the entry by choosing File, then Save, from your browser's main menu.
Click on the hyperlink for further details of the entry.

The GenBank file format has three parts:
1. Comment or definition line
2. Citation
3. Sequence
You can also view the gene in FASTA format, which includes the definition line and the nucleotide sequence of your query.
From the display settings, select FASTA.
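To make the FASTA layout concrete, here is a minimal Python sketch that splits FASTA text into its definition line and its sequence. The function name parse_fasta and the sample record are hypothetical, and the sequence is a made-up placeholder, not the real X01714 entry.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (definition_line, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one continuous string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record for illustration only (NOT real GenBank data).
example = """>EXAMPLE1 hypothetical demo sequence
ATGGCTAAAG
GTTTAACC
"""

for definition, sequence in parse_fasta(example):
    print(definition, len(sequence))
```

The definition line is everything after the ">" character, and the sequence may be wrapped over several lines, which is why the sketch joins them before returning.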
You can also retrieve GenBank entries without using accession numbers.
Select Nucleotide from the search drop-down menu and type your query as keywords, for example:
Human[organism] AND dUTPase[protein name].
An accession number search returns exactly the entry for that gene, whereas a keyword search returns many entries, such as exon 1, exon 2, exon 3, mRNA 1, mRNA 2, etc. The entries with different accession numbers cover the full amino acid sequences of the two forms (nuclear and mitochondrial) of the dUTPase protein, as well as alternative exon usage patterns.
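The two kinds of search above (by accession number and by keywords) can also be written as URLs. The sketch below assumes NCBI's E-utilities web interface, which this tutorial does not otherwise use. The helper names efetch_url and esearch_url are hypothetical, and the parameter values should be checked against the current E-utilities documentation. The code only builds the URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI E-utilities (an assumption of this sketch).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, rettype="gb"):
    """Build a URL that fetches one nucleotide entry by accession number.

    rettype="gb" asks for the GenBank flat file, rettype="fasta" for FASTA.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def esearch_url(term, retmax=20):
    """Build a URL that runs a keyword search in the nucleotide database."""
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


print(efetch_url("X01714"))
print(efetch_url("X01714", rettype="fasta"))
print(esearch_url("Human[organism] AND dUTPase[protein name]"))
```

The first two URLs correspond to viewing one entry in flat file or FASTA format, and the third corresponds to the keyword search, which can match many entries.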
This tutorial is now over. I hope you have benefited from it.
Regards "Quratt ul ain Siddique"

What is Genbank Database

GenBank is the NIH genetic sequence database, an annotated collection
of all publicly available DNA sequences. GenBank (at NCBI), together
with the DNA DataBank of Japan (DDBJ) and the European Molecular
Biology Laboratory (EMBL) comprise the International Nucleotide
Sequence Database Collaboration. These three organizations exchange
data on a daily basis.

GenBank grows at an exponential rate, with the number of nucleotide
bases doubling approximately every 14 months. Currently, GenBank
contains more than 13 billion bases from over 100,000 species.
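
As a rough worked example of that doubling rate, the sketch below projects the size of the database forward in time. It assumes that the 14-month doubling time stays constant, which is an extrapolation and not a claim from NCBI. The function name projected_bases is hypothetical.

```python
def projected_bases(current_bases, months_ahead, doubling_months=14.0):
    """Project database size assuming steady exponential growth.

    size(t) = size(0) * 2 ** (t / doubling_time)
    """
    return current_bases * 2 ** (months_ahead / doubling_months)


start = 13e9  # "more than 13 billion bases", taken from the text above
for months in (0, 14, 28, 42):
    print(months, f"{projected_bases(start, months):.3g}")
```

Each step of 14 months doubles the previous value, so three doubling periods multiply the starting size by eight.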

Biological Databases

What Is a Biological Database?

A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and often, literature citations associated with the sequence.
For researchers to benefit from the data stored in a database, two additional requirements must be met:
easy access to the information

a method for extracting only that information needed to answer a specific biological question

The data in GenBank are made available in a variety of ways, each tailored to a particular use, such as data submission or sequence searching.
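
The simple database described above, in which every record holds the same set of information, can be sketched in Python as follows. The class name SequenceRecord, the function find_by_organism, and all sample values are hypothetical placeholders chosen for illustration. They are not real database entries or a real database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceRecord:
    """One record of a simple nucleotide sequence database."""
    contact_name: str
    sequence: str
    molecule_type: str
    organism: str
    citations: List[str] = field(default_factory=list)


def find_by_organism(records, organism):
    """Return only the records whose source organism matches (case-insensitive)."""
    wanted = organism.lower()
    return [r for r in records if r.organism.lower() == wanted]


# Placeholder records for illustration only.
database = [
    SequenceRecord("A. Researcher", "ATGC", "DNA", "Escherichia coli"),
    SequenceRecord("B. Researcher", "AUGC", "RNA", "Homo sapiens",
                   ["placeholder citation"]),
]

hits = find_by_organism(database, "homo sapiens")
print([r.contact_name for r in hits])
```

Here the list stands in for the single file of records, and the query function illustrates the second requirement: extracting only the information needed to answer a specific question.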



At NCBI, many of our databases are linked through a unique search and retrieval system, called Entrez. Entrez (pronounced ahn' tray) allows a user to not only access and retrieve specific information from a single database but to access integrated information from many NCBI databases. For example, the Entrez Protein database is cross-linked to the Entrez Taxonomy database. This allows a researcher to find taxonomic information (taxonomy is a division of the natural sciences that deals with the classification of animals and plants) for the species from which a protein sequence was derived.

Importance of Bioinformatics

Why Is Bioinformatics So Important?
Although a human disease may not be found in exactly the same form in animals, there may be sufficient data for an animal model that allow researchers to make inferences about the process in humans.


The rationale for applying computational approaches to facilitate the understanding of various biological processes includes:
a more global perspective in experimental design


the ability to capitalize on the emerging technology of database-mining - the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms

Introduction to Bioinformatics

What Is Bioinformatics?

Biology in the 21st century is being transformed from a purely lab-based science to an information science as well.


Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. At the beginning of the "genomic revolution", a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences. Development of this type of database involved not only design issues but the development of complex interfaces whereby researchers could both access existing data as well as submit new or revised data.
Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:

the development and implementation of tools that enable efficient access to, and use and management of, various types of information

the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences
