Quantile-Quantile Plots

This exercise is about quantiles and quantile-quantile plots. You'll probably find these mysterious until you gain some experience with them. But persevere--they are one of the most important techniques you'll learn in this course.

Readings

Exercise

This exercise involves the ToothGrowth data set, which is part of the default R distribution. You don't have to download it: it is available whenever you launch R. The data set describes tooth length in guinea pigs whose diets were supplemented with vitamin C, delivered either as orange juice (OJ) or as ascorbic acid (VC). There are three dose levels: 0.5, 1, and 2 milligrams per day.

To get a sense of what's in the dataset, type

head(ToothGrowth)

at the R prompt.
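For a slightly fuller picture than head gives, a sketch such as the following shows the variable types and the number of animals in each combination of supplement and dose. The choice of str and table here is illustrative, not something the exercise requires.

```r
# Structure of the data: len (numeric), supp (factor), dose (numeric)
str(ToothGrowth)

# Cross-tabulate supplement against dose to see the experimental design
counts <- with(ToothGrowth, table(supp, dose))
counts
```

Every cell of the table should hold the same number of animals, which is what makes the comparisons later in the exercise straightforward.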

As you work on this project, look for inspiration at the examples page of the course website. When you find a relevant example, click on the image. This will bring up the R code that produced the image.

In the ToothGrowth dataset, "dose" is numeric. I advise turning it into a factor, since it has only 3 values. To do this, you'll need to make your own copy of the dataset:

df <- ToothGrowth
df$dose <- factor(df$dose)
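
A quick check, offered as a sketch, confirms the conversion. Note that the levels of the new factor are the character labels "0.5", "1", and "2", which is why a comparison such as dose=="0.5" works on the copy.

```r
df <- ToothGrowth
df$dose <- factor(df$dose)

is.factor(df$dose)   # TRUE after the conversion
levels(df$dose)      # the three doses, now stored as labels
table(df$dose)       # number of observations at each dose
```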

  1. For each type of supplement (OJ or VC), use strip plots to see how dosage affects tooth growth. To this end, load ggplot2 and use the "aes" function to map "len" to the x axis, "supp" to the y axis, and "dose" to color. Then use geom_jitter. You'll have to play around with the "height" and "width" arguments to get a clear picture. It will help to specify "shape=1" so that the points plot as open circles.

  2. Then use superimposed density plots.

  3. Then use qq plots to compare each adjacent pair of dosages (1 vs 0.5, and 2 vs 1). I couldn't find an elegant solution to this problem, so here are some inelegant hints.

First extract vectors for a single supplement and a pair of dosages:

vc05 <- with(df, len[supp=="VC" & dose=="0.5"])
vc1 <- with(df, len[supp=="VC" & dose=="1"])

Now we can use the trick discussed in the qqplots lecture, and also in tenorbassqq.r, on the examples page:

df2 <- as.data.frame(qqplot(vc05, vc1, plot.it=FALSE))

This constructs a data frame in which variable x holds the quantiles of the first vector (vc05) and variable y holds those of the second (vc1).
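
To see what that data frame holds, here is a small sketch. It relies on the fact that both samples here have 10 observations: with equal sample sizes, qqplot simply pairs the sorted values of the two vectors. (With unequal sizes it interpolates the larger sample down to the size of the smaller one.)

```r
df <- ToothGrowth
df$dose <- factor(df$dose)
vc05 <- with(df, len[supp == "VC" & dose == "0.5"])
vc1  <- with(df, len[supp == "VC" & dose == "1"])

df2 <- as.data.frame(qqplot(vc05, vc1, plot.it = FALSE))

# With equal sample sizes, x and y are just the two sorted samples
all.equal(df2$x, sort(vc05))
all.equal(df2$y, sort(vc1))
```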

In the plot command, we want to make the X and Y ranges the same, so we need the range of both vectors taken together:

r <- range(c(vc05, vc1))

The plot command is now

ggplot(df2, aes(x,y)) +
    geom_abline(slope=1,intercept=0,color="blue") +
    coord_fixed(ratio=1, xlim=r, ylim=r) +
    geom_point()

You'll need to repeat this process for each comparison: once (as above) to compare dose 1 to 0.5 for the VC supplement, then again to compare 2 to 1, and then two more times for the OJ supplement.
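
One way to cut down on the repetition, sketched here on the assumption that ggplot2 is installed, is to wrap the steps in a small function and draw all four comparisons as facets. The helper name qq_pair and the added columns supp and comparison are made up for this illustration; this is one possible arrangement, not the method used in the lecture.

```r
library(ggplot2)

df <- ToothGrowth
df$dose <- factor(df$dose)

# Hypothetical helper: quantile pairs for one supplement and one
# pair of doses, returned as a labelled data frame.
qq_pair <- function(data, supplement, lo, hi) {
    a <- with(data, len[supp == supplement & dose == lo])
    b <- with(data, len[supp == supplement & dose == hi])
    out <- as.data.frame(qqplot(a, b, plot.it = FALSE))
    out$supp <- supplement
    out$comparison <- paste(hi, "vs", lo)
    out
}

# The four comparisons: two supplements times two adjacent dose pairs
cases <- expand.grid(supp = c("VC", "OJ"), lo = c("0.5", "1"),
                     stringsAsFactors = FALSE)
cases$hi <- ifelse(cases$lo == "0.5", "1", "2")

qq_all <- do.call(rbind, lapply(seq_len(nrow(cases)), function(i) {
    qq_pair(df, cases$supp[i], cases$lo[i], cases$hi[i])
}))

# One common range so that every panel is square and comparable
r <- range(df$len)

p <- ggplot(qq_all, aes(x, y)) +
    geom_abline(slope = 1, intercept = 0, color = "blue") +
    geom_point() +
    coord_fixed(ratio = 1, xlim = r, ylim = r) +
    facet_grid(supp ~ comparison)
print(p)
```

In each panel, x is the lower dose and y is the higher dose, so points above the blue line indicate longer teeth at the higher dose.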

This is vastly easier with the lattice package. Here is the whole of step 3:

library(lattice)
qq(dose~len|supp, data=subset(ToothGrowth, dose==0.5 | dose==1.0), aspect=1)
qq(dose~len|supp, data=subset(ToothGrowth, dose==1 | dose==2), aspect=1)

  4. Write a paragraph or two describing how the dosage of vitamin C affects tooth growth. Does growth increase with dosage? Do the distributions change by additive shifts, or in some other pattern? Does it matter whether the vitamin C is provided as orange juice or as ascorbic acid?
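
As a numerical companion to the plots, the following sketch looks at the question of additive shifts for one comparison (VC, dose 1 versus 0.5). Under a purely additive shift, the points of a qq plot lie along a line parallel to y = x, so the differences between matched quantiles are roughly constant. If the differences grow or shrink steadily across the quantiles, some other pattern is at work. The variable name shift is an assumption of this example.

```r
df <- ToothGrowth
df$dose <- factor(df$dose)
vc05 <- with(df, len[supp == "VC" & dose == "0.5"])
vc1  <- with(df, len[supp == "VC" & dose == "1"])

# Both samples have 10 observations, so the sorted values are
# matched quantiles and can be subtracted directly.
shift <- sort(vc1) - sort(vc05)

shift            # one difference per matched quantile
summary(shift)   # how much the differences vary
```

With only 10 observations per group the differences will be noisy, so treat this as a supplement to the plots rather than a substitute for them.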