Inferring Phylogenetic Trees from Transposon Data

This tutorial is meant to supplement the material in the text, The Evidence for Evolution. You should read that first.

Here is a made-up set of transposon data:

Species     abcde
 mouse      *****
 rat        *****
 dog        OO***

Each column represents a transposon; "*" means present and "O" means absent. Thus, the first two transposons (a and b) are present in mouse and rat but not in dog. The next three (c-e) are present in all three species.

What can we infer from such data? There is only one plausible way that transposons a and b could exist both in mouse and rat: those transposons must have inserted into the common ancestor of the two species. (Otherwise, we'd have to assume that the same transposon inserted in precisely the same spot in the DNA of two separate species, and that is so improbable that it would verge on the miraculous.) Consequently, transposons a and b imply that mouse and rat must have had a common ancestor. Furthermore, these transposons are not present in dog. This tells us that the common ancestor of mouse and rat was not an ancestor of dog.
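The first step of this reasoning, reading off which species carry each transposon, can be sketched in a few lines of Python. This is only an illustration: the names `data`, `names`, and `carriers` are made up for this example, and the table is typed in by hand. Each group it prints is a set of species that, by the argument above, must share a common ancestor.

```python
# Presence/absence table from above: '*' = present, 'O' = absent.
# (Variable and function names are assumptions made for this sketch.)
data = {
    "mouse": "*****",
    "rat":   "*****",
    "dog":   "OO***",
}
names = "abcde"  # one letter per column (transposon)


def carriers(data, names):
    """Map each transposon to the set of species that carry it."""
    return {
        name: frozenset(sp for sp, row in data.items() if row[i] == "*")
        for i, name in enumerate(names)
    }


groups = carriers(data, names)
for name in names:
    print(name, sorted(groups[name]))
```

Transposons a and b come out with the group mouse and rat, while c, d, and e come out with all three species.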

Here is the tree that we can draw so far, based only on the first two transposons:

mouse -----|
  rat -----|

  dog -----------

This tree shows that mouse and rat had a common ancestor that was not also the ancestor of dog. The final three transposons (c-e) fill in the rest of the tree. They are present in all three species, so they tell us that all three species had a common ancestor. This allows us to complete the tree:

mouse -----|
           |-----|
  rat -----|     |
                 |
  dog -----------|

This is the phylogenetic tree implied by our made-up data set.
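The same inference can be sketched as a small Python program. It is a rough illustration, not a general phylogenetics method: it assumes the data are clean, meaning that any two groups of species defined by transposons are either nested or completely separate, and it always joins all the species at a single root (which transposons c-e justify here). The function names `clades_from`, `is_consistent`, and `nest` are invented for this example. The tree is written with nested parentheses, so `((mouse,rat),dog)` means that mouse and rat branch together and dog joins them at the root.

```python
from itertools import combinations

# Same made-up table as above: '*' = present, 'O' = absent.
data = {"mouse": "*****", "rat": "*****", "dog": "OO***"}


def clades_from(data):
    """Return the distinct groups of species that share at least one transposon."""
    n_columns = len(next(iter(data.values())))
    groups = {
        frozenset(sp for sp, row in data.items() if row[i] == "*")
        for i in range(n_columns)
    }
    groups.discard(frozenset())  # a transposon found in no species tells us nothing
    return groups


def is_consistent(clades):
    """True if every pair of groups is either nested or disjoint."""
    return all(a <= b or b <= a or not (a & b)
               for a, b in combinations(clades, 2))


def nest(species, clades):
    """Write the groups contained in `species` as nested parentheses."""
    if len(species) == 1:
        return next(iter(species))
    inside = [c for c in clades if c < species]                   # strictly smaller groups
    top = [c for c in inside if not any(c < d for d in inside)]   # the largest of those
    parts = [nest(c, inside) for c in sorted(top, key=sorted)]
    parts += sorted(species - frozenset().union(*top))            # species in no subgroup
    return "(" + ",".join(parts) + ")"


clades = clades_from(data)
assert is_consistent(clades), "conflicting transposons; this sketch cannot handle them"
result = nest(frozenset(data), clades)
print(result)  # ((mouse,rat),dog)
```

The consistency check is there because real data can contain conflicting patterns, which this sketch deliberately refuses to interpret.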

Let's try again with a different set of made-up data:

Species    abcdefg
   1       OO*****
   2       OO*****
   3       *****OO
   4       *****OO

Now we have two pairs of closely related species. Transposons f and g imply that species 1 and 2 share a common ancestor not shared with the other species. Thus, they occupy their own branch of the tree. Similarly, transposons a and b tell us that species 3 and 4 occupy their own branch. Finally, transposons c-e tell us that all four species have a common ancestor. In other words, the tree looks like this:

 1 -----|
        |-----|
 2 -----|     |
              |
 3 -----|     |
        |-----|
 4 -----|
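One way to double-check a tree like this is to confirm that every transposon can be explained by a single insertion on one branch. The Python sketch below does that for the second data set. The nested-tuple representation of the tree and the names `branches`, `found`, and `explained` are assumptions made for this illustration only.

```python
# Second made-up table: '*' = present, 'O' = absent.
data = {"1": "OO*****", "2": "OO*****", "3": "*****OO", "4": "*****OO"}
names = "abcdefg"

# The tree drawn above, written as nested tuples
# (a representation chosen just for this sketch).
tree = (("1", "2"), ("3", "4"))


def branches(tree, found):
    """Return the species under `tree`, recording one group per branch in `found`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(branches(t, found) for t in tree))
    found.add(leaves)
    return leaves


found = set()
branches(tree, found)

# A transposon needs only one insertion if the species carrying it
# are exactly the species descended from some branch of the tree.
explained = {}
for i, name in enumerate(names):
    group = frozenset(sp for sp, row in data.items() if row[i] == "*")
    explained[name] = group in found
    print(name, sorted(group), "one insertion" if explained[name] else "conflict")
```

For this data set every transposon is reported as "one insertion": a and b on the branch leading to species 3 and 4, f and g on the branch leading to species 1 and 2, and c-e in the common ancestor of all four.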