Select the "BLASTP" tab at the top of the page as the alignment program.
Compute the alignment.
Which features in the alignment indicate that it is a local and not a global alignment?
Click the "Edit and Resubmit" button and expand "Algorithm parameters". Record the E values obtained with the different scoring matrices available under "Matrix" in the "Scoring Parameters" section:
Global Pairwise Alignment with Needleman-Wunsch Method
Open the Needleman-Wunsch Alignment tool from EMBOSS.
Copy and paste the P450 sequences from the previous section into separate sequence fields.
Compute the alignment.
Which features in the alignment indicate that it is a global and not a local alignment?
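To make the global alignment idea concrete, here is a minimal sketch of the Needleman-Wunsch dynamic programming scheme. It is an illustration only: the scoring values (match, mismatch, one linear gap penalty) and the toy sequences are assumptions chosen for readability, whereas the EMBOSS tool scores proteins with a substitution matrix and separate gap-open and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment.

    The scoring scheme is a simplified assumption, not the EMBOSS default.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Global alignment: gaps at the sequence ends are penalized as well.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to the top-left corner,
    # so every residue of both sequences ends up in the alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy nucleotide-like strings, not the P450 sequences from the exercise.
total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

Because the traceback always runs from the last cell to the first, both input sequences appear in the result from their first to their last residue, with gaps filling in where needed.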
Multiple Alignment
Open the web page for the multiple alignment program MultAlin.
Click the "Browse" button and upload the P450 sequences that were saved to a file in the first step.
Compute the multiple alignment by clicking the Start button.
Identify the invariant cysteine residue at the end of the multiple alignment.
Locate this invariant cysteine in the Needleman-Wunsch and BLAST2 alignments.
Explain why multiple alignments are more appropriate for identifying consensus sequences than pairwise alignments.
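The search for invariant residues can also be sketched in code. The following minimal example scans the columns of an alignment and reports those that are identical and gap-free in all sequences. The three short sequences are made up for illustration and are not real P450 data; the function name `invariant_columns` is likewise hypothetical.

```python
def invariant_columns(aligned):
    """Return [(column_index, residue)] for gap-free columns shared by all sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        # A column is invariant if it holds exactly one symbol that is not a gap.
        if len(column) == 1 and "-" not in column:
            hits.append((i, aligned[0][i]))
    return hits


# Made-up toy alignment (three sequences, ten columns).
toy = ["FG-GPRNCIG", "FGAGRRSCLG", "FSSGPRACPG"]

print(invariant_columns(toy))      # columns conserved in all three sequences
print(invariant_columns(toy[:2]))  # columns conserved in the first pair only
```

In this toy case the pairwise comparison reports one more identical column than the three-way comparison, because a position that happens to match between two sequences is not necessarily conserved across the family.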
Day 2: Phylogenetics, Gene Annotations & Introduction to R
Download Cytochrome P450 Sequences from NCBI
Go to http://www.ncbi.nlm.nih.gov/protein and run the following query in the Protein database:
P450 & hydroxylase & human [organism]
Under "Limits", select SwissProt as the database.
Save the 24 sequences in FASTA format to your desktop via the "Send to" drop down menu.
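A quick way to check the downloaded file is to count its records. The sketch below reads FASTA-formatted text into a list of (header, sequence) pairs. The toy records and the file name in the comment are placeholders, not the actual NCBI download.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; keep the header without the leading ">".
            records.append([line[1:], []])
        elif records:
            # Sequence lines are appended to the most recent record.
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


# Toy input with made-up identifiers and sequences.
toy_text = """>seq1 toy record
MALLLAV
FLGLLS
>seq2 toy record
MDSLVV
"""

records = read_fasta(toy_text.splitlines())
print(len(records))
for header, sequence in records:
    print(header, len(sequence))

# For a saved file (the name here is hypothetical):
# with open("p450.fasta") as handle:
#     print(len(read_fasta(handle)))
```

Applied to the file saved in this step, the record count should match the number of sequences selected in the browser.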
Phylogenetic Analysis Pipeline
Open the Phylogeny.fr web page. The Documentation of this phylogenetic analysis page provides a very colorful overview of its rich utilities.
To start the analysis, select "A La Carte" in the Phylogeny Analysis section. Under the workflow settings, select ClustalW as the multiple alignment program and "ProtDist/FastDist + Neighbor" as the tree building method. The latter option selects the Neighbor-Joining algorithm. Use the default settings for the remaining options. Start the analysis by clicking the "Create workflow" button.
Upload your sequences on the next page by selecting the "Browse" function. Clicking the "Submit" button will run the entire phylogenetic analysis pipeline from the start to the end. This includes the following steps:
An optional alignment curation step that allows the user to edit the multiple alignment (e.g. removal of long unaligned sections).
Computation of a distance matrix.
Calculation of a neighbor-joining tree based on the rate corrected distance matrix.
Upload of the tree to an interactive tree viewing page.
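The neighbor-joining step in the list above can be illustrated with its core calculation: from a distance matrix, compute the Q matrix and pick the pair of taxa with the lowest Q value as the first pair to join. This is a sketch of a single iteration on a made-up five-taxon matrix, not the implementation used by the pipeline; the function names are hypothetical.

```python
def nj_q_matrix(d):
    """Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(d)
    totals = [sum(row) for row in d]
    return [
        [0 if i == j else (n - 2) * d[i][j] - totals[i] - totals[j] for j in range(n)]
        for i in range(n)
    ]


def closest_pair(d):
    """Return the index pair (i, j), i < j, with the smallest Q value."""
    q = nj_q_matrix(d)
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: q[p[0]][p[1]])


# Made-up symmetric distances between five taxa a, b, c, d, e.
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(nj_q_matrix(dist)[0][1])
print(closest_pair(dist))
```

Note that the pair chosen is not simply the pair with the smallest raw distance: the Q value also subtracts each taxon's total distance to all others, which is what lets neighbor joining cope with unequal rates along different branches.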
It is also possible to run the workflow in a step-by-step mode by selecting "Run Workflow step by step" on its setup page.
After the pipeline run has finished, the following exercises evaluate, one at a time, the intermediate data sets generated by each analysis step.
Multiple Alignment
After the pipeline has finished, click the "3. Alignment" tab at the top of the final result page. This page lets you view the alignment with background shading in a separate window. In addition, the alignment can be downloaded in different formats for import into other programs.
Distance Matrix
Next, click the "5. Phylogeny" tab at the top of the page. The window in the middle of the resulting page shows the tree in plain text format. Below the tree there are links to download the distance matrix and the corresponding raw tree in Newick format.
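To get a feel for what a distance matrix contains, the sketch below computes uncorrected p-distances (the fraction of differing residues) from a small alignment. This is a deliberately simpler stand-in: the matrix downloaded from the pipeline is based on a substitution model with rate correction, so its values will differ. The toy sequences and function names are assumptions for illustration.

```python
def p_distance(s1, s2):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances for equal-length aligned sequences."""
    n = len(aligned)
    return [[p_distance(aligned[i], aligned[j]) for j in range(n)] for i in range(n)]


# Made-up toy alignment.
toy = ["ACDE", "ACDF", "A-DF"]
for row in distance_matrix(toy):
    print(["%.3f" % value for value in row])
```

The matrix is symmetric with zeros on the diagonal, which is also the layout to expect when inspecting the downloaded file.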
Phylogenetic Tree: Rooting and Editing
Click the "6. Tree Rendering" tab at the top of the page. This page allows the user to root the tree, edit the tree labels, color its branches and save the tree in different formats.
Root the tree with the midpoint method by clicking the corresponding button on the page. Describe in one sentence how the rooting step changes the tree.
Change the text of some of the leaf labels of the tree by clicking on the "Change leaf name" button and then clicking on the labels. For instance, remove the description part from some leaf labels so that only their ID numbers remain.
Download the tree in Newick format to the desktop and inspect it in a text editor. Newick is one of the most common tree formats that can be imported into many other programs including the TreeDyn program that is used for the Phylogeny.fr pipeline.
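When inspecting the Newick file, it helps to know how the text maps to the tree: parentheses group sister branches, commas separate them, and a number after a colon is a branch length. As a rough sketch, the snippet below pulls the leaf labels out of a Newick string with a regular expression. It assumes unquoted labels without special characters, and the example tree is made up; a full parser would be needed for quoted labels or comments.

```python
import re


def leaf_names(newick):
    """Return leaf labels from a Newick string (assumes simple, unquoted labels)."""
    # A leaf label directly follows "(" or ","; labels after ")" belong to
    # internal nodes (e.g. support values) and are therefore not matched.
    return [name.strip() for name in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


# Made-up example tree with branch lengths and one internal support value.
tree = "((A:0.1,B:0.2)0.95:0.05,C:0.3);"
print(leaf_names(tree))
```

Running this on the downloaded tree should list the same labels that are shown at the tips of the rendered tree, including any that were edited in the previous step.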
Colorize selected tree branches with the "Colorize" function. Finally, save the tree image in PNG format (a raster image format, like JPEG). The SVG format, which is also available, allows the tree to be imported into vector graphics programs like Inkscape for further post-processing.
Pipeline Report
A very nice feature of this page is that it provides an overview report for each analysis pipeline. To open the report for this analysis, click the "1. Overview" tab at the top of the page.
Introduction to R and Bioconductor
[ Slides ] [ R Code ]
- Load the R code into RStudio, then follow the instructions.
- To install Bioconductor libraries, please follow these instructions.
Handling Genome Data in R and Bioconductor
[ Slides ] [ R Code ]
- Load the R code into RStudio, then follow the instructions.
- To install the libraries required for this exercise, please run the following two commands from within R: