History of Hominin Brain Size

A few years ago, paleoanthropologists agreed that there was a burst of evolution in brain size about 2 million years ago (mya), followed by a long interval with little change, and then another burst within the past few hundred thousand years. More recently, some researchers have been advocating other positions, involving a long history of gradual change (see John Hawks's blog or the article by Sang-Hee Lee and Milford Wolpoff).

What's remarkable about this is that no new data are involved. The differing positions just involve different approaches to graphical data analysis. In this lab, you'll have a look for yourself. You'll work with a data set in which each observation describes an individual fossil skull. There are two variables: age and brvol. The first, age, gives the age of the skull in millions of years. (Where this value is known only approximately, I've chosen the middle of the range of plausible values.) The second, brvol, is an estimate of cranial capacity in cubic centimeters (cc). To get these data into R, download the brhist.txt file from the data page on the class website and import it using read.table.
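As a minimal sketch of the import step, the call might look like the following. It assumes that brhist.txt has been saved in the working directory and that its first line holds the column names; neither is stated above, so check the file and adjust header accordingly.

```r
# Assumption: brhist.txt is in the working directory and its first line
# contains the column names (age, brvol). If the file has no header row,
# drop header = TRUE and set the names by hand.
brhist <- read.table("brhist.txt", header = TRUE)

str(brhist)      # expect two numeric columns: age and brvol
summary(brhist)  # sanity check: age in millions of years, brvol in cc
```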

Readings

Cleveland, WS. 1993. Visualizing Data, from the beginning of chapter 3 through section 3.3.

Exercise

Make a scatter plot of brvol against age. To make time go from left to right, define myxlim like this:

myxlim <- c(max(brhist$age)+0.1, min(brhist$age)-0.1)

where brhist is the name of your data frame, and then add + xlim(myxlim) to ggplot.
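Put together, the scatter plot might be sketched as follows. The header = TRUE argument and the axis labels are assumptions made for illustration; the rest follows the instructions above.

```r
library(ggplot2)

# Assumption: the file has a header row naming the columns age and brvol.
brhist <- read.table("brhist.txt", header = TRUE)

# Giving the larger limit first reverses the axis, so the oldest skulls
# appear on the left and time runs from left to right.
myxlim <- c(max(brhist$age) + 0.1, min(brhist$age) - 0.1)

ggplot(brhist, aes(x = age, y = brvol)) +
  geom_point() +
  xlim(myxlim) +
  labs(x = "Age (millions of years)", y = "Cranial capacity (cc)")
```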

Fit a straight line to the data, and plot the data along with the line. Make a residual dependence plot (plotting residuals against age), including a horizontal line at y=0 and a loess smooth. (Hint: use the "resdep" function that you will find on the examples page.) Write a sentence or two about this residual plot. Is the straight-line fit satisfactory?
Repeat the previous step (fit, plot, and residual dependence plot) for the quadratic model: brvol ~ age + I(age^2)
Repeat for a loess model with span=2/3 and degree 1.
Use a spread-location (sl) plot to check for non-uniform spread in residuals. You should find that spread increases strongly with the magnitude of the fitted value.
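To correct this problem, repeat the three model fits above (straight line, quadratic, and loess), with their residual dependence plots, using the log of brvol as the response. Use the residual dependence plots to decide which model works best. Then check for non-uniform spread using an sl plot.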
Once you have arrived at a satisfactory model, use an r-f spread (rfs) plot to see how well it explains the data.
If all went well, you should discover that these data are well described by loess regression on the log scale. The final step is to tune loess so that it does an appropriate amount of smoothing. Begin with the default value (span=0.75), and then experiment with different values of span. Smaller values of span imply less smoothing. You want a value that is just small enough to flatten out the wiggles in the smoothed residuals. (This refers to the span of the fitted model; I never worry about the span of the smoother drawn in the residual dependence plot.)
Write a paragraph summarizing the conclusions from this analysis.
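The model-fitting steps above can be sketched roughly as follows, shown here on the log scale. This is only an outline under stated assumptions: resdep_sketch and sl_sketch are hypothetical stand-ins written for this illustration, not the resdep function from the examples page, and lattice's rfs is used for the r-f spread plot, which may differ from the version used in class. The sketch also assumes that brhist.txt has a header row and no missing values, and the span values tried at the end are arbitrary starting points.

```r
library(ggplot2)
library(lattice)

# Assumptions: header row present, no missing values in age or brvol.
brhist <- read.table("brhist.txt", header = TRUE)
brhist$lbrvol <- log(brhist$brvol)
myxlim <- c(max(brhist$age) + 0.1, min(brhist$age) - 0.1)

# Candidate models on the log scale.
fits <- list(
  linear    = lm(lbrvol ~ age, data = brhist),
  quadratic = lm(lbrvol ~ age + I(age^2), data = brhist),
  loess     = loess(lbrvol ~ age, data = brhist, span = 2/3, degree = 1)
)

# Hypothetical stand-in for resdep(): residuals against age, with a
# horizontal line at zero and a loess smooth of the residuals.
resdep_sketch <- function(fit, data, xlim, title = "") {
  d <- data.frame(age = data$age, res = residuals(fit))
  ggplot(d, aes(x = age, y = res)) +
    geom_hline(yintercept = 0) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    xlim(xlim) +
    labs(title = title, x = "Age (millions of years)", y = "Residual")
}

# Hypothetical stand-in for an s-l plot: square root of the absolute
# residuals against the fitted values, with a loess smooth.
sl_sketch <- function(fit, title = "") {
  d <- data.frame(fitted = fitted(fit),
                  sqrt_abs_res = sqrt(abs(residuals(fit))))
  ggplot(d, aes(x = fitted, y = sqrt_abs_res)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    labs(title = title, x = "Fitted value",
         y = "Square root of |residual|")
}

# Residual dependence plot for each candidate model.
for (nm in names(fits)) {
  print(resdep_sketch(fits[[nm]], brhist, myxlim, title = nm))
}

# Spread-location and r-f spread plots for the loess fit.
print(sl_sketch(fits$loess, title = "loess, log scale"))
print(rfs(fits$loess))

# Tuning: refit with several spans and compare the smoothed residuals.
for (s in c(0.4, 0.6, 0.75, 1)) {
  f <- loess(lbrvol ~ age, data = brhist, span = s, degree = 1)
  print(resdep_sketch(f, brhist, myxlim, title = paste("span =", s)))
}
```

Replacing lbrvol with brvol in the three formulas gives the corresponding fits on the original scale, which is where the s-l plot should show spread increasing with the fitted value.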