Wagner Trees: Phylogeny Reconstruction

(from an exercise by Dr. John Lundberg)


Farris, J. S. 1970. Methods for computing Wagner trees. Syst. Zool. 19: 83-92.

Introduction for students

Wagner trees (and networks) are estimates of evolutionary trees. The construction of a Wagner Tree begins with a table that lists the states of a series of characters for each species. (This is called a character by taxon matrix.) In the table the states of characters are coded numerically. The Wagner Tree is an evolutionary tree that requires the minimum number of evolutionary steps to explain the observed pattern. It is therefore the "most parsimonious" tree.

Exercise Instructions

  • Choose characters to study in the taxa provided. This involves estimating homologous relations of features in different organisms using the criteria noted in class. Verbalize or, if possible, enumerate the states of each character.
  • Arrange the character states in the taxa under study on the basis of their overall similarity. The resulting order should reflect a postulated evolutionary sequence. Thus, if a 5 toed ancestor gave rise to a 1 toed descendant, we would suppose a series of intermediate ancestors with 4, 3, and 2 toes. Examples of "morphocline state identifiers":
    • 5 toes - 4 toes - 3 toes - 2 toes - 1 toe
    • 5 mm - 20 mm - 21 mm
    • blue - green - yellow
    • eyes present - eyes absent
  • Estimate an ancestral state for each character using the criteria for primitiveness discussed in class. The selection of an ancestral state yields a hypothesis on the direction of evolution for a character. For example:
    • blue ---------> green ---------> yellow          OR
    • blue <--------- green ---------> yellow          OR
    • blue <--------- green <--------- yellow          
  • Using integers in an additive fashion, and a similar scale for each character (by convention a unit difference between adjacent character states), code the states numerically so as to preserve the estimated form and direction of character state evolution. The coding schemes below correspond, in order, to the four example sequences of character states listed above, starting with the toes.
    • 0 - 1 - 2 - 3 - 4      OR        4 - 3 - 2 - 1 - 0
    • 0 - 1 - 2                 OR        2 - 1 - 0
    • 0 - 1 - 2                 OR        2 - 1 - 0
    • 0 - 1                       OR        1 - 0
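The additive coding described above can be sketched in a few lines of Python. The helper name `code_morphocline` and its `reverse` flag are illustrative assumptions, not part of the original exercise; the function simply numbers the states of an ordered series in unit steps from one end or the other.

```python
def code_morphocline(states, reverse=False):
    """Map an ordered series of character states to integers in unit steps.

    `states` lists the states in morphocline order; `reverse=True` numbers
    the series from the other end (4-3-2-1-0 instead of 0-1-2-3-4).
    """
    n = len(states)
    return {s: (n - 1 - i if reverse else i) for i, s in enumerate(states)}


toes = ["5 toes", "4 toes", "3 toes", "2 toes", "1 toe"]
print(code_morphocline(toes))                # "5 toes" -> 0 ... "1 toe" -> 4
print(code_morphocline(toes, reverse=True))  # "5 toes" -> 4 ... "1 toe" -> 0
```

If the ancestral state is estimated to lie in the middle of a series (blue <--- green ---> yellow), either numbering leaves green coded as 1, and the ancestor simply receives that value.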

The Ancestor
  • Now make an entry for the character states of the hypothetical ancestor.

  • The common ancestor for the group of taxa under study is taken to be the collection of all estimated ancestral states.
  • Add the ancestor to the "character by species" data table
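As a minimal sketch of this step, the ancestor can be stored as one more row of the coded table. The taxon names, the three characters, and every state value below are made up for illustration.

```python
# Hypothetical coded matrix: taxon -> list of character states (made-up values).
matrix = {
    "X": [0, 1, 2],
    "Y": [1, 1, 0],
    "Z": [0, 0, 1],
}

# Estimated ancestral state of each character, in the same character order.
# These values are also made up; in the exercise they come from the
# criteria for primitiveness discussed in class.
ancestral_states = [0, 1, 0]

# The hypothetical ancestor is the collection of all estimated ancestral
# states, added as an extra row of the "character by species" table.
matrix["ANC"] = list(ancestral_states)

for taxon, states in matrix.items():
    print(taxon, states)
```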

The steps in tree construction and a hypothetical example.

The steps in the construction of a Wagner Tree are illustrated here with a hypothetical data set of 4 taxa and 12 characters (Link to Character by Species Data Table). The characters are already coded and an ancestor for the four taxa has been estimated and added to the initial data matrix.

Step 1. Calculate the table showing the differences between the species according to the formula:

The difference between any two species is the sum, over all characters, of the absolute differences between their character states.

In explicit notation:

$$d(J,K) = \sum_{i=1}^{n} \left| X(i,J) - X(i,K) \right|$$

  • where d (J,K) is the entry in each cell of the table and represents the difference between the two taxa J and K
  • X(i,J) and X(i,K) are the character state values of character i in species J and K
  • and n is the number of characters

Link to Difference Matrix
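The calculation in Step 1 can be sketched as follows. The function name `difference`, the taxon names, and the state values are assumptions made for illustration; they are not the data of the linked tables.

```python
from itertools import combinations


def difference(x_j, x_k):
    """d(J,K): sum of absolute differences between two taxa's character states."""
    return sum(abs(a - b) for a, b in zip(x_j, x_k))


# Hypothetical coded data (made-up values, 4 characters).
matrix = {
    "ANC": [0, 0, 0, 0],
    "X": [1, 0, 2, 0],
    "Y": [1, 1, 0, 1],
}

# Difference matrix stored as a dict keyed by unordered pairs of taxa.
diff = {
    frozenset((j, k)): difference(matrix[j], matrix[k])
    for j, k in combinations(matrix, 2)
}

for pair, d in diff.items():
    print(sorted(pair), d)
```

Keying the table by unordered pairs reflects the fact that d (J,K) and d (K,J) are the same entry.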

Step 2. Form the first interval of the tree by connecting the ancestor to the taxon to which it is most similar.

[Figure: sample tree]

The first interval has a length of 4, i.e. 4 evolutionary steps separate C from the ancestor. Check the data matrix to see that these steps involve characters 3, 6, 10 and 11.
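Choosing the first interval amounts to finding the smallest entry in the ancestor's row of the difference matrix. The sketch below uses the differences from the ancestor that are quoted in this worked example (4 for C here, and 7, 9 and 6 for A, B and D in the Step 3 calculations); the variable names are illustrative.

```python
# Differences between the ancestor and each taxon, as quoted in the worked example.
d_to_anc = {"A": 7, "B": 9, "C": 4, "D": 6}

# The first interval joins ANC to the taxon with the smallest difference.
first = min(d_to_anc, key=d_to_anc.get)
print(first, d_to_anc[first])  # C 4
```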

Step 3. Select the next taxon to be placed on the tree.

The next taxon to be placed is the one that is most similar to any interval of the tree.

Calculate the difference between each unplaced taxon and every interval according to the formula below. (If, in another example, there is a tie, it is not unreasonable to toss a coin to decide placement.)

d (L, INT(J,K)) = d (L, J) + d (L,K) - d (J,K)

where d (L, INT(J,K)) is the difference between an unplaced taxon L and an interval composed of the taxa J and K.
d (L, J)  and d (L,K) and  d (J,K) are the differences between these taxa taken directly from the difference matrix.

Thus, in the example, there are three remaining unplaced taxa, A, B, and D.

d (A, INT(ANC,C))
      = d (A, ANC) + d (A,C) - d (ANC,C)
      = 7 + 11 - 4
      = 14
d (B, INT(ANC,C))
      = d (B, ANC) + d (B,C) - d (ANC,C)
      = 9 + 11 - 4
      = 16
d (D, INT(ANC,C))
      = d (D, ANC) + d (D,C) - d (ANC,C)
      = 6 + 8 - 4
      = 10

Since D is the closest to the interval, it is placed next.
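The Step 3 arithmetic can be checked with a short script. It applies the formula exactly as written in this exercise to the differences quoted above; the function and variable names are illustrative assumptions.

```python
def interval_difference(d_lj, d_lk, d_jk):
    """d(L, INT(J,K)) as written in this exercise: d(L,J) + d(L,K) - d(J,K)."""
    return d_lj + d_lk - d_jk


d_anc_c = 4  # length of the first interval, ANC - C

# (d(L, ANC), d(L, C)) for each unplaced taxon, from the worked example.
unplaced = {"A": (7, 11), "B": (9, 11), "D": (6, 8)}

scores = {
    taxon: interval_difference(d_anc, d_c, d_anc_c)
    for taxon, (d_anc, d_c) in unplaced.items()
}
print(scores)                       # {'A': 14, 'B': 16, 'D': 10}
print(min(scores, key=scores.get))  # D
```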

Step 4. Construct a hypothetical intermediate ancestor.

One of the remarkable things about this method is that it is possible to reconstruct what the common ancestor of D and C should have looked like!
Obviously, we do not actually know that it looked like this; it is therefore a hypothetical intermediate ancestor, sometimes called a "Hypothetical Taxonomic Unit" or HTU.

This HTU will be placed between the unplaced taxon selected in Step 3 and the two members of the interval to which that unplaced taxon is closest. It is possible (in most cases probable) that the unplaced taxon will diverge from the lineage leading from ANCESTOR to C at an intermediate level instead of directly from ANCESTOR or C.

Thus, from where exactly can we hypothesize that D diverged? This is where parsimony plays a major role.

[Figure: Step 4]

The states of an HTU are always based on the states of three taxonomic units (real or hypothetical). HTUs are constructed in such a way as to reduce the number of evolutionary steps implied in the final tree. The parsimony rules for HTU construction are as follows:

(a) If all three taxa have the same state for a character, the HTU will take that state.
Example - Character 1

[Figure: step 4a]

If D does connect to the HTU and if the HTU had any state except 1, we would be suggesting an evolutionary event for which there is no evidence.

(b) If two of the three taxa have the same state, the HTU will take that state.

Example - Character 4

[Figure: step 4b]

If D does connect to the HTU and if the HTU had state 1, we would be suggesting that 0 --> 1 and then 1 --> 0 in going from ANC to C. As coded, we have no a priori evidence of reversal.

(c) If all three taxa have different states, the hypothetical taxonomic unit will take the intermediate state.

[Figure: step 4c]

If D does connect to the HTU, what would we imply if the HTU had state 0 or state 2?

The HTU for D, C and ANC, called HTU 1, is different from any other taxonomic unit so far in the study and it is added to both the data and difference matrices. If the HTU turns out to be identical to C or ANC we could ignore it.
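Rules (a), (b) and (c) all amount to taking, for each character, the median of the three states. A minimal sketch follows; the function name `build_htu` is an assumption. The first two characters mirror the Character 1 and Character 4 examples above (all three states equal to 1, and states 1, 0, 0), while the third character uses made-up states to illustrate rule (c).

```python
def build_htu(l_states, j_states, k_states):
    """Return HTU states: the median of the three states for each character.

    The median covers all three rules: (a) three equal states, (b) two equal
    states (the shared one is the median), and (c) three different states
    (the intermediate one is the median).
    """
    return [sorted(trio)[1] for trio in zip(l_states, j_states, k_states)]


# One character per rule; the third column is made up for illustration.
d_states = [1, 1, 1]
c_states = [1, 0, 2]
anc_states = [1, 0, 0]
print(build_htu(d_states, c_states, anc_states))  # [1, 0, 1]
```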

Step 5. Connect the unplaced taxon to the real or hypothetical taxon from which it differs least.

In our hypothetical case:

[Figure: Step 5]

How long is each interval? Identify the characters that change state in each interval.

Step 6. If any real taxon remains unplaced, return to Step 3. Otherwise, stop. (Note that in our hypothetical case two taxa now remain unplaced. In the next pass through Step 3, these will have to be compared to each of the three intervals that currently exist in the tree. Our old first interval, ANC - C, was destroyed by the intercalation of HTU 1.)
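Steps 2 to 6 can be put together in one loop. The following is a sketch under stated assumptions, not a definitive implementation: the function names, the alphabetical tie-breaking (in place of a coin toss), and the four-character data set are all made up for illustration, and the formula for the difference to an interval is the one written in Step 3.

```python
def difference(a, b):
    """Sum of absolute differences between two lists of character states."""
    return sum(abs(x - y) for x, y in zip(a, b))


def wagner_tree(states, ancestor="ANC"):
    """Return (edges, states) for a tree built by Steps 2 to 6.

    `states` maps each taxon name (including the ancestor) to its coded states.
    Ties are broken by alphabetical order rather than by a coin toss.
    """
    states = dict(states)  # copy, because HTUs are added as new rows
    unplaced = sorted(t for t in states if t != ancestor)

    # Step 2: join the ancestor to the most similar taxon.
    first = min(unplaced, key=lambda t: difference(states[t], states[ancestor]))
    edges = [(ancestor, first)]
    unplaced.remove(first)
    n_htu = 0

    while unplaced:  # Step 6: repeat until every real taxon is placed
        # Step 3: find the unplaced taxon closest to any interval.
        def score(pair):
            taxon, (j, k) = pair
            return (difference(states[taxon], states[j])
                    + difference(states[taxon], states[k])
                    - difference(states[j], states[k]))

        taxon, (j, k) = min(((t, e) for t in unplaced for e in edges), key=score)

        # Step 4: the HTU takes the median state of each character.
        htu = [sorted(trio)[1]
               for trio in zip(states[taxon], states[j], states[k])]

        # Step 5: attach the taxon; ignore the HTU if it duplicates J or K.
        if htu == states[j]:
            edges.append((j, taxon))
        elif htu == states[k]:
            edges.append((k, taxon))
        else:
            n_htu += 1
            name = f"HTU{n_htu}"
            states[name] = htu
            edges.remove((j, k))  # the old interval is destroyed
            edges += [(j, name), (name, k), (name, taxon)]
        unplaced.remove(taxon)

    return edges, states


# Made-up data: four characters, three taxa plus the ancestor.
data = {
    "ANC": [0, 0, 0, 0],
    "A": [1, 1, 0, 0],
    "B": [1, 0, 2, 0],
    "C": [0, 0, 0, 1],
}
edges, states = wagner_tree(data)
for a, b in edges:
    print(a, b, difference(states[a], states[b]))
```

On this made-up data the first HTU computed is identical to ANC and is therefore ignored, while the second becomes a new node between ANC and A, as described in Step 4.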

Complete the construction of the tree based on the data at hand. The answer is:

[Figure: step 6]

Plot the position of all character state changes. Which characters undergo homoplasious evolution? List the derived states that are shared by groups of taxa and the derived states that are unique to single taxa.

  • What are the consequences, in terms of homoplasy, of shifting the cladistic positions of taxa?
  • Or can you make a shorter tree in terms of total number of evolutionary steps by shifting the positions of taxa or groups of taxa?
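One way to approach these questions is to count the steps each character takes on a given tree. The sketch below assumes that, for an additively coded character, the fewest steps any tree can require equals the range of its observed states, so any steps beyond that range are treated as homoplasy. The function names, the tree, and the data are made up for illustration.

```python
def steps_per_character(edges, states):
    """Count the steps each character takes, summed over all intervals."""
    n_chars = len(next(iter(states.values())))
    steps = [0] * n_chars
    for a, b in edges:
        for i, (x, y) in enumerate(zip(states[a], states[b])):
            steps[i] += abs(x - y)
    return steps


def homoplasious_characters(edges, states):
    """Return 1-based indices of characters with more steps than their range."""
    steps = steps_per_character(edges, states)
    columns = list(zip(*states.values()))
    return [i + 1 for i, col in enumerate(columns)
            if steps[i] > max(col) - min(col)]


# Made-up tree with intervals ANC - X, ANC - Y and Y - Z.
states = {"ANC": [0, 0], "X": [1, 0], "Y": [0, 1], "Z": [1, 1]}
edges = [("ANC", "X"), ("ANC", "Y"), ("Y", "Z")]
print(steps_per_character(edges, states))      # [2, 1]
print(homoplasious_characters(edges, states))  # [1]
```

The total length of a tree is the sum of the per-character steps, so trees with shifted taxa can be compared by rebuilding the list of intervals and recomputing that sum.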

Wagner Networks are an extension of Wagner Trees. The difference is that no ancestral states are estimated a priori, and therefore there is no hypothetical common ancestor to serve as a starting point for construction. The first interval is made by connecting (by convention) the two most different real taxa. After this, one proceeds as in tree construction. Wagner networks yield hypotheses on cladistic pattern, but not sequence. Sometimes, however, the pattern may help in inferring sequence.
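The starting interval of a network can be found by scanning the difference matrix for its largest entry. A minimal sketch, with made-up taxa and state values and an illustrative helper name:

```python
from itertools import combinations


def difference(a, b):
    """Sum of absolute differences between two lists of character states."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Made-up coded data for three real taxa (a network has no ancestor row).
states = {"X": [0, 1, 2], "Y": [1, 1, 0], "Z": [2, 0, 0]}

# The first interval joins the two most different real taxa.
first_interval = max(
    combinations(sorted(states), 2),
    key=lambda pair: difference(states[pair[0]], states[pair[1]]),
)
print(first_interval)  # ('X', 'Z')
```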

Homework or in class

ASSIGNMENT: Construct a Wagner Tree using the imaginary animals of the genus Lundbergia (see below).

  • Decide which characters differ between species. Assign numerical values for each character state.
  • Construct a character x taxon matrix, and then construct a difference matrix.
  • Finally, construct a tree based on your determinations.

Creatures (Lundbergia sp.)  for Wagner Trees (for Students)

Hints for Instructors

Here is a link to the matrices for the assignment.
If you would like to make this a more challenging experiment, you can add other Lundbergia species to the current three species.

Data Matrices for Lundbergia (for Instructors)