Wednesday, August 18, 2010

How are the opsin genes related to each other?

Answering this question requires making a multiple sequence alignment and then using it to make a phylogenetic tree. For these tasks, we move to another database where it's a little easier to gather a bunch of sequences into a single FASTA file.
Point your browser to http://us.expasy.org. ExPASy is mirrored at several locations, including the following:
http://www.expasy.org/  http://ca.expasy.org/
If one does not work or responds slowly, try a different one.
You see the home page of ExPASy, the Expert Protein Analysis System. As I said earlier, ExPASy is a complete protein tool box. With ExPASy, you can do almost any imaginable analysis or comparison of protein sequences and structures.
Click Swiss-Prot and TrEMBL under Databases.
Read the introduction to these databases. They are high quality protein sequence databases with abundant annotation, minimal redundancy, and many connections to other databases.
Click Advanced search in Swiss-Prot and TrEMBL.
With advanced searching, you can limit your search to specific genes and organisms, and you can search on descriptive information in the entries.
Set up a search for human opsins, as follows:

  • Search Swiss-Prot only.
  • Enter Description: opsin
  • Organism: Choose "Human" from the pull-down menu
  • Check "Append and prefix * to query terms". The * is a "wild card": you are searching for all entries that contain "opsin" as a whole or partial word.
Click Submit.
The page Swiss-Prot description is your search result page.
Look over the results. On 9/8/2003, this search gave 14 hits. They include the rod pigment rhodopsin (OPSD) and the three cone pigments (OPSB, OPSG, OPSR). There is also a "visual pigment-like receptor peropsin", OPSX. Sounds mysterious. Let's find out more about it, and in the process, see a typical Swiss-Prot entry.
Click on the gene name, OPSX.
You see the NiceProt View of Swiss-Prot: O14718. Peruse this entry and try to find out just what this rhodopsin-like protein is thought to do. Under Comments, you'll learn that it's found in the retina (in the RPE, or retinal pigment epithelium), and that it may detect light, or perhaps monitor levels of retinoids, the general class of compounds that are the actual light absorbers in opsins. Also under Comments, at Similarity, you see, as mentioned earlier, that this protein is a member of the large family of G protein-coupled receptors. If you click "G protein-coupled receptors" under Keywords, you find a list of all purported 7-transmembrane receptor proteins in Swiss-Prot. The human genome alone contains 350 of them! See if you can verify this statement, without counting. Now back up to the NiceProt view.
Under References, click the journal citation, "Proc. Natl. Acad. Sci. U.S.A. 94:9893-9898(1997)". From the resulting page, you can read a full article in the Proceedings of the National Academy of Sciences (PNAS) about this protein. Like many journals, PNAS puts full articles online just 6 to 12 months after publication.
Looking further down the page, you find cross-references to the protein or its gene in other databases, predicted structural features of the protein, and last, the sequence. Note also, at the bottom of the page, links to a number of ExPASy tools for further analysis of this sequence. Try some of them. For example, I just learned in about ten seconds from Compute pI/MW that the isoelectric pH (or pI) of this protein is 8.78. And I learned in no time at all from ScanProSite that the sequence contains signatures indicating that the protein is probably a G protein-coupled receptor (no surprise, but comforting) and that it has a retinal binding site. ProSite is a tool for finding signatures of function in new sequences. When you finish playing with these powerful tools, return to your Swiss-Prot search results by using the back button of your browser. If you're lost, go back to ExPASy and do the search again.
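To give a feel for what the MW half of a tool like Compute pI/MW does, here is a minimal sketch in Python. It sums average residue masses and adds one water molecule for the free chain ends. The function name and the mass table are my own illustration, not ExPASy's code, and the pI calculation (which needs a set of pKa values) is deliberately left out.

```python
# Average masses (in daltons) of amino acid residues, i.e. each amino acid
# minus one water. Table assembled for illustration; not taken from ExPASy code.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-termini


def molecular_weight(sequence):
    """Return the approximate average molecular weight of a protein sequence."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError("unrecognized residue codes: " + ", ".join(sorted(unknown)))
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS
```

Paste the one-letter sequence from the bottom of a Swiss-Prot entry into molecular_weight() and compare the result with what the web tool reports; small differences are to be expected if the tool applies corrections this sketch ignores.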
Now let's compare the sequences with each other. We'll use the program ClustalW to make a multiple sequence alignment.
Scroll down the result page and check the boxes at the left of these entries:
  • OPSB (blue-sensitive opsin)
  • OPSD (rhodopsin)
  • OPSG (green-sensitive opsin)
  • OPSR (red-sensitive opsin)
  • OPSX (visual pigment-like receptor peropsin)
At the top of the page, at Send selected sequences to, select Clustal W (multiple alignment) from the menu, and click Submit.
ClustalW has been implemented at many web sites. This one, at EMBnet.org, automatically receives the FASTA files from the selected entries, allows you to make some settings of the alignment criteria, and then does the alignment. We will just accept the default alignment settings. First, scroll in the Input Sequences box and verify that it contains five FASTA files, one right after the other. To make them easier to identify in subsequent outputs, edit the name of each FASTA comment line (begins with ">") as follows:
  • Change "sp|P03999|OPSB_HUMAN Blue-sensitive opsin (Blue cone photoreceptor pigment) - Homo sapiens (Human)." to "Blue".
  • Change "sp|P08100|OPSD_HUMAN Rhodopsin (Opsin 2) - Homo sapiens (Human)." to "Rhodopsin".
  • Change "sp|P04001|OPSG_HUMAN Green-sensitive opsin (Green cone photoreceptor pigment) - Homo sapiens (Human)." to "Green".
  • Change "sp|P04000|OPSR_HUMAN Red-sensitive opsin (Red cone photoreceptor pigment) - Homo sapiens (Human)." to "Red".
  • Change "sp|O14718|OPSX_HUMAN Visual pigment-like receptor peropsin - Homo sapiens (Human)." to "Peropsin".
In all cases, be sure to leave the ">" in the first line of each FASTA entry. To save some work in case something goes wrong, select the edited contents of the Input Sequences box, copy it, and paste it onto an empty word-processor page, and save the file in text format. Name it Opsins.txt.
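If you would rather not edit the comment lines by hand, the renaming can be sketched in a few lines of Python. This is only an illustration: the function name and the SHORT_NAMES table are my own, and the matching assumes each comment line contains the Swiss-Prot entry name (such as OPSB_HUMAN) as shown in the list above.

```python
# Hypothetical lookup table: Swiss-Prot entry name -> short label.
SHORT_NAMES = {
    "OPSB_HUMAN": "Blue",
    "OPSD_HUMAN": "Rhodopsin",
    "OPSG_HUMAN": "Green",
    "OPSR_HUMAN": "Red",
    "OPSX_HUMAN": "Peropsin",
}


def rename_fasta_headers(fasta_text, short_names=SHORT_NAMES):
    """Replace each FASTA comment line with '>' plus its short label.

    Comment lines that match no entry name are left unchanged, and
    sequence lines are never touched.
    """
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            for entry_name, label in short_names.items():
                if entry_name in line:
                    line = ">" + label  # keep the leading ">"
                    break
        renamed.append(line)
    return "\n".join(renamed) + "\n"
```

Save the contents of the Input Sequences box to a text file, run its contents through rename_fasta_headers(), and paste the result back into the box.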
Click Run ClustalW.
The resulting page is called ClustalW query receipt, and it contains links to several output files.
Click clustalw (aln).
You see the typical ClustalW alignment file, showing our five protein sequences aligned to maximize identical and similar residues. Below each line of five sequences are symbols to show the extent of similarity among the sequences. An asterisk (*) means that the same residue is always (that is, for all of these sequences) found at that location; for example, the first asterisk marks a location where only N (asparagine) is found. Colon (:) means that all residues at this location are very similar; for example, the first colon is where only F (phenylalanine), I (isoleucine), and L (leucine) -- residues with large, nonpolar sidechains -- occur. Period (.) means somewhat similar residues; for example, at the first period, serine, threonine, and glutamine occur -- all polar, but varied in size. If there is no mark, then the residues at that location display no predominant common properties.
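The logic behind the three symbols can be sketched in Python. A column gets an asterisk if it holds a single residue, a colon if all its residues fall inside one "strong" similarity group, and a period if they fall inside one "weak" group. The group lists below are the ones I recall from the ClustalW documentation; treat them, and the function names, as assumptions and check them against your version of the program.

```python
# Residue groups as recalled from the ClustalW documentation (an assumption).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(residues):
    """Return the conservation symbol for one alignment column."""
    column = set(residues)
    if "-" in column:          # a gap anywhere: leave the column unmarked
        return " "
    if len(column) == 1:       # identical residue in every sequence
        return "*"
    if any(column <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(column <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def consensus_line(aligned_sequences):
    """Build the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol(column) for column in zip(*aligned_sequences))
```

For instance, a column holding only F, I, and L fits inside the group MILF, so it is marked with a colon, just as in the alignment described above.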
Once more, as a safety measure, copy this alignment to your clipboard, and paste it onto an empty word-processor page. Then save the file in text format. Name it OpsMSA.txt. Remember that it is still on your clipboard, for pasting at our next stop. This multiple sequence alignment is one type of input you can use to make a phylogenetic tree.
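To see why an alignment is useful input for a tree, consider the simplest thing a tree-building program can compute from it: the fraction of aligned positions at which two sequences differ. The sketch below is only an illustration under my own assumptions (hypothetical function names, gapped positions skipped, no correction for multiple substitutions); it is not the method used by any particular tree program.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues, ignoring positions with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


def distance_matrix(alignment):
    """Map each pair of sequence names to its p-distance.

    `alignment` is a dict of name -> aligned sequence, all of equal length.
    """
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }
```

Sequences separated by small distances end up as close neighbors on the tree; those separated by large distances end up on distant branches.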
