Phylogenetics Laboratory

Molecular Phylogenetics Exercise


Week 1
Week 2
Using MacClade

Definition of molecular phylogenetics:

    The study of evolutionary relationships among organisms or genes by a combination of molecular biology and statistical techniques (Li, 1997. Molecular Evolution. Sinauer Associates, Sunderland, Massachusetts)

1. History:

    1904 Nuttall used serological cross-reactions to infer relationships among organisms, showing that humans are closely related to apes.
    1950s Molecular techniques such as protein sequencing and starch gel electrophoresis were introduced into evolutionary studies.
    1960s-1970s Molecular data were used to reconstruct phylogenies at higher taxonomic levels, such as orders and classes.
    1985 The development of PCR (polymerase chain reaction) led to unprecedented levels of activity in phylogenetic research.

2.  Why use DNA sequence data instead of morphological data?

3.  Rules of molecular evolution

4.  DNA sequences: Alignment is crucial


5. Phylogeny reconstruction techniques


Phylogeny reconstruction using molecular data

Week 1

This lab will teach you how to:

Today you will acquire the sequences from GenBank and create a PAUP file in MS Word. Next week you will import your MS Word file into PAUP, and then bring it from PAUP into MacClade.


Epidemiological studies have increasingly used DNA sequence information. To illustrate the utility of phylogenetic methods and sequence data, you will explore the following real case. A French patient is hospitalized for an operation. Prior to hospitalization, she is HIV negative and has no risk factors associated with HIV. Shortly after hospitalization, however, she is found to be HIV positive. Two members of the hospital staff, both nurses, are HIV positive, so one of them may have infected the patient.


GOAL: Determine which nurse, if either, is responsible for infecting the patient.  You will provide printouts of your computer results and an explanation of your findings.

GenBank to PAUP:  Getting sequences

                    Nurse 1:  AF125605
                    Nurse 2:  AF125606

These are the three main "players" in this study. In addition, however, you need a sample of HIV sequences from the French population at large in order to test whether the patient acquired the infection outside the hospital. Here are some samples and accompanying labels for your file.

                    Sample 4:  AF125607
                    Sample 5:  AF125608
                    Sample 6:  AF125609
                    Sample 7:  AF125610
                    Sample 8:  AF125611

After completing these steps, you should have eight "blocks" of data, each with a distinctive label.  Next, you need to add the computer code that will provide PAUP with information about your data file.  Follow the syntax below very carefully.

At the start of your file, add the following:  #NEXUS

This simply identifies the type of input file that you have (i.e. a Nexus file).

Now you have to provide code that will tell PAUP about your data matrix.  The program also needs details on the dimensions of this data matrix in terms of the number of taxa (NTAX), which in this case are different HIV samples rather than species.  It also needs information on the number of characters for each taxon (NCHAR).  A line or two below your Nexus statement, type:  


A line or two below that block of code, type the following to identify your data matrix:  MATRIX

Now, go to the end of your file and add a semicolon (indicating the end of the data matrix), then type:


to indicate the end of the file.
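In class you type this file by hand in MS Word, but the same structure can be generated programmatically, which helps catch mismatches between NTAX, NCHAR, and the matrix itself. The following is a minimal Python sketch, not part of the original protocol: it assumes the standard NEXUS DATA-block keywords (BEGIN DATA, DIMENSIONS, FORMAT, MATRIX, END) rather than reproducing this handout's exact lines, and the function name `build_nexus` and the toy sequences are hypothetical placeholders, not the GenBank HIV records.

```python
from __future__ import annotations


def build_nexus(sequences: dict[str, str], datatype: str = "DNA") -> str:
    """Return the text of a NEXUS file holding one DATA block.

    `sequences` maps taxon labels to aligned sequences. Labels should not
    contain spaces (use underscores, e.g. "Nurse_1"), because NEXUS uses
    whitespace to separate tokens.
    """
    if not sequences:
        raise ValueError("no sequences given")
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        # NCHAR must be the same for every taxon, so unaligned input is rejected
        raise ValueError("all sequences must have the same length (NCHAR)")
    nchar = lengths.pop()
    width = max(len(name) for name in sequences) + 2  # pad labels into a column

    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(sequences)} NCHAR={nchar};",
        f"    FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "MATRIX",
    ]
    for name, seq in sequences.items():
        lines.append(f"{name.ljust(width)}{seq}")
    lines.append(";")     # the semicolon closes the matrix
    lines.append("END;")  # END; closes the DATA block
    return "\n".join(lines) + "\n"


# Toy, made-up sequences for illustration only (not the GenBank records)
toy = {"Patient": "ACGTTGCA", "Nurse_1": "ACGTTGCA", "Nurse_2": "ACGATGCA"}
print(build_nexus(toy))
```

Counting NTAX and NCHAR from the data, instead of typing them, removes one common reason a file fails to execute.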

Print out your file and save it to disk.  You will need it next week for running PAUP.

Next week you will also be generating a tree from your homework. Create a PAUP file in MS Word using the above protocol. Remember that in the HIV matrix each position in a sequence is a separate character, with T, A, C, or G as its possible states; for the homework you need to create a similar matrix, but with 1's and 0's as the character states instead of nucleotide bases. What coding do you need to change or delete for this different type of data? What other PAUP codes need to be changed for your homework problem?


Phylogeny reconstruction 

Week 2

Last week you acquired nucleotide sequences from GenBank. This week you will use those sequences, along with your homework matrices, to generate three phylogenetic trees, one for each of: the "French patient" epidemiological study, the "bible passage" homework problem, and the "fasteners" homework problem. For your research project, you will follow this general process of acquiring sequences and generating trees, so working carefully through this exercise in class will help you with your independent project.

The first step is to import your file from MS Word into PAUP.  Start with the epidemiological study.  First, make sure that the file is saved as “Text” (use the pull-down menu in the “Save As” dialog box).

Next, open PAUP. When the dialog box comes up, choose your file and click the EXECUTE button. (If your file contains errors, it will not execute; instead, PAUP will open it for you to edit. Correct any mistakes, using last week’s handout as an example. Once your file is corrected, choose Execute “[your file name]” from the File menu.)

Open your file by going to the Window menu and choosing your file. Take a look at your data matrix. Use spaces and tabs to align your sequences in PAUP, and scroll to the right to see each of the bases. While this window is active, save your file within PAUP under a different name.

(Phylogenetic analysis can more definitively test the hypothesis that the patient acquired the infection in the hospital.)

Now you are ready to generate a tree.  

Go to the Search menu and choose Exhaustive. Take a look at the dialog box. Change the default settings so that PAUP retains all trees with fewer than 685 steps (a change from one base to another counts as one step; the sum of these steps equals “tree length”). Once you have made this change, click the Search button.

The program will evaluate every possible tree, keeping only those with fewer than 685 steps.  It will also print out a histogram of the tree lengths.  
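To make “tree length” concrete: for one fixed topology, the minimum number of steps can be counted site by site with Fitch's algorithm for unordered characters, and the sum over all sites is the length that an exhaustive search compares across topologies. The sketch below is illustrative only and is not PAUP's implementation; it assumes a rooted binary tree written as nested Python tuples and equal weights, and it treats every symbol (including a gap, `-`) as an ordinary state, which may differ from how PAUP is set to handle gaps. The names `fitch_site` and `tree_length` are hypothetical.

```python
from __future__ import annotations

from typing import Union

# A tree is either a taxon label (leaf) or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def fitch_site(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch pass for one site: return (candidate state set, steps in subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0  # a leaf has its observed state and no steps
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # the two children can agree, so no extra change is needed here
        return shared, left_steps + right_steps
    # the children disagree: take the union and count one step
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree: Tree, sequences: dict[str, str]) -> int:
    """Sum the Fitch steps over every aligned site (equal weights)."""
    nchar = len(next(iter(sequences.values())))
    total = 0
    for i in range(nchar):
        site = {name: seq[i] for name, seq in sequences.items()}
        total += fitch_site(tree, site)[1]
    return total


toy = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
print(tree_length((("A", "B"), ("C", "D")), toy))  # 2
print(tree_length((("A", "C"), ("B", "D")), toy))  # 4
```

On these made-up sequences, grouping A with B and C with D needs 2 steps, while ((A,C),(B,D)) needs 4, so parsimony prefers the first topology.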

Use the output to answer the following questions:

At this point, you have not viewed any of your trees. Examine them by going to the Trees menu, selecting “Show Trees”, and choosing one of your trees.

The tree you viewed may or may not be the most parsimonious tree. Get information on your stored trees by going to the Trees menu and selecting “Length and fit measures”. Identify the tree with the fewest steps; this is the most parsimonious tree. Display that tree and use it to answer the main question:

Now, systematically go through the other trees.  

Save your most parsimonious tree and one other, of your choosing, to bring into MacClade for the next section of this lab.  Under the Trees menu, go to “Save Trees to File”.  At the bottom of the dialog box, change “All trees” to "Tree ___ to ___".  Type in the tree numbers that you want to save, if they are consecutive; otherwise, you may have to save to two files (e.g. to save tree 1, type 1 in both boxes; to then save tree 21, type 21 in both boxes on a subsequent save).   

Before moving on to MacClade, there is one more thing to cover in PAUP. The nurse example is relatively simple because there are very few “taxa.” In your independent project, however, you may find that it takes too long to search every possible tree (you may also have to align your sequences). For larger data matrices, you can sample the possible trees by doing a “heuristic” search…
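The reason an exhaustive search becomes impractical is that the number of possible topologies grows explosively with the number of taxa. Assuming unrooted, fully bifurcating trees (the usual way parsimony searches count topologies), n taxa can be arranged in (2n - 5)!! = 3 × 5 × ... × (2n - 5) distinct trees. A short Python sketch of that count follows; the function name is hypothetical.

```python
def count_unrooted_trees(ntax: int) -> int:
    """Number of unrooted, fully bifurcating trees for `ntax` labeled taxa.

    Computes the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5).
    """
    if ntax < 3:
        raise ValueError("at least 3 taxa are needed")
    count = 1
    for k in range(3, 2 * ntax - 4, 2):  # odd numbers 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (4, 8, 12):
    print(n, count_unrooted_trees(n))
```

For the eight sequences in the nurse data set this gives 10,395 trees, but twelve taxa already give 654,729,075, because adding the nth taxon multiplies the count by 2n - 5.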

First, go to “Clear Trees” under the “Trees” menu. Go to the Search menu and choose “heuristic”. Keep all trees with a length of 685 steps or fewer. Click Search. Examine these trees and their characteristics.

As the final step for using PAUP, run your data matrices from the homework problems.  Are the trees equivalent to the trees that you calculated by hand?


Using MacClade

MacClade cannot search for optimal trees the way PAUP does, but it is useful for understanding patterns of evolution. The first step is to import your file from PAUP. This is easy: just open it in MacClade. This will bring up your data file, which looks something like an Excel spreadsheet. Your sequences can be seen in this window. Each column represents a different character (indicated by a number), and the cells below show each taxon's character state (T, A, C, or G) at that position.

To see the tree, go to the Display menu and choose “Go to Tree Window”.  In the dialog box, choose “Open tree file”, and choose the file that represents your most parsimonious tree.

There is your tree!  Now, trace a character on that tree.  Under the Trace menu, choose "trace character."  It will start with character 1.  The character states of the different "taxa" are shown by boxes at the "tips" of the phylogeny (top of the window).  Different bases are indicated by the different colors.  The key to these colors is shown in the small box at the bottom right of the window.

Character 1 is not very interesting in this context. All sequences are the same at this position: they all have character state A. In this case, then, character 1 is not "informative."

Scroll through the characters to identify positions where samples have different character states. Do this by going to the box with the color key (bottom right) and clicking on the scroll bar at the bottom of that box.

Click through the characters until you find an informative character: one that shows at least two colors on the tree, with each color shared by at least two taxa.
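Whether a character is informative can be stated precisely. Under parsimony, a site is informative only if at least two different states each occur in at least two taxa; a site where only single taxa differ from the rest (a "singleton" site) adds the same number of steps to every tree, so it cannot favor one topology over another. The Python sketch below applies that standard criterion to an alignment; the function names and toy data are hypothetical, and gaps are treated as ordinary states.

```python
from __future__ import annotations

from collections import Counter


def classify_site(column: str) -> str:
    """Classify one aligned column (one state per taxon)."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"  # every taxon has the same state
    # states that occur in at least two taxa
    repeated = [state for state, n in counts.items() if n >= 2]
    # informative needs at least two such states; otherwise only singletons vary
    return "informative" if len(repeated) >= 2 else "singleton"


def classify_sites(sequences: dict[str, str]) -> list[str]:
    """Classify every position of an alignment given as {taxon: sequence}."""
    names = list(sequences)
    nchar = len(sequences[names[0]])
    return [
        classify_site("".join(sequences[name][i] for name in names))
        for i in range(nchar)
    ]


toy = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACA"}
print(classify_sites(toy))  # ['constant', 'informative', 'singleton']
```

The third toy position varies (G, G, T, A) but is still uninformative, because only G occurs in more than one taxon.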

It is also useful to examine the total number of changes that occur along each branch.  

View the total changes by going to the Trace menu and choosing "Trace all changes". The number of changes will be illustrated by colors on the different branches. You can also display the number of changes directly by changing the "Trace all changes" options, also under the Trace menu: click "graphics options" and change the setting to "label by amount of change."

This lab has only covered a small subset of the options available in MacClade.  Explore the windows and their options.  One particularly neat trick is to drag the branches around the screen to create new tree topologies.  For example, if you force the patient and nurse 1 to be sister taxa…

MacClade may be particularly useful for your independent project. For more practice, try out the different tools with your other "homework" trees, imported as described above.