The next project set involves data that Elizabeth Cashdan collected several years ago in a study involving hormones and behavior. We'll study just the data on hormones.
The data are in a file called hormall.dat, which you will find on
the data page of the class web site. You'll need a username and
password, which I'll provide in class. Each record in these data
refers to an individual woman. The data include levels of several
hormones, which are detailed in the header of the data file itself.
There are a number of outliers in the data, so we will often need
robust techniques to make sense of them. These are described by
Cleveland and illustrated on the course examples page.
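To see why robust techniques matter here, the following minimal sketch compares a lowess smooth with and without robustness iterations. The data are synthetic (made up for this illustration, not the hormone data), and the variable names are arbitrary.

```r
# Synthetic stand-in data with a few gross outliers (NOT the hormone data)
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 0.5 * x + rnorm(100, sd = 0.3)
y[1:5] <- y[1:5] + 15

# Same smoother, with and without robustness iterations
smooth_plain  <- lowess(x, y, iter = 0)  # no down-weighting of outliers
smooth_robust <- lowess(x, y, iter = 3)  # 3 robustness iterations (the default)

plot(x, y)
lines(smooth_plain, lty = 2)   # dashed: pulled toward the outliers
lines(smooth_robust, lty = 1)  # solid: robust smooth
```

The dashed curve should bend toward the outlying points, while the solid curve stays closer to the bulk of the data.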
At the outset, you may want to exclude such variables as "id" and "year". I used this command to reduce the number of variables:
h <- subset(h, select = c("t", "a", "dheas", "freet", "e", "cort"))
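For context, here is a sketch of how the data might be read before running that command. The file layout is an assumption: I am supposing comment lines that begin with "#" followed by a row of column names, and the values below are invented. Check the header of the real file and adjust the call to read.table as needed.

```r
# Hypothetical stand-in for hormall.dat (layout and values are made up)
tmp <- tempfile(fileext = ".dat")
writeLines(c(
  "# id: subject number (invented description)",
  "id year t a dheas freet e cort",
  "1 1990 1.2 3.4 5.6 0.7 8.9 1.0",
  "2 1991 2.1 4.3 6.5 0.8 9.8 1.1"
), tmp)

# comment.char = "#" is the default, so the comment line is skipped
h <- read.table(tmp, header = TRUE)

# Keep only the hormone variables
h <- subset(h, select = c("t", "a", "dheas", "freet", "e", "cort"))
str(h)
```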
Search for relationships among the various hormones, two at a
time. Use robust methods and transforms wherever these are
appropriate. You may find it helpful to manipulate xlim and/or ylim
in order to exclude outliers. In your lab report, do not
include graphs for the hormone pairs that show no relationship. Just
say that you found no relationship.
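One way to approach the pairwise search is sketched below. The data frame h here is synthetic, built only so that the code runs on its own, and the choice of the central 95% of each variable as axis limits is an arbitrary assumption, not a recommendation.

```r
# Synthetic stand-in for h (NOT the real data); freet is built to depend on t
set.seed(2)
n <- 80
h <- data.frame(t = rlnorm(n), a = rlnorm(n), dheas = rlnorm(n),
                e = rlnorm(n), cort = rlnorm(n))
h$freet <- h$t * rlnorm(n, sdlog = 0.3)

# One pair: limit the axes to the central 95% of each variable
xl <- unname(quantile(h$t, c(0.025, 0.975)))
yl <- unname(quantile(h$freet, c(0.025, 0.975)))
plot(h$t, h$freet, xlim = xl, ylim = yl, xlab = "t", ylab = "freet")
lines(lowess(h$t, h$freet))  # robust smooth, fit to all the points

# All pairs at once, on a log scale, each panel with a lowess smooth
pairs(log(h), panel = panel.smooth)
```

Note that xlim and ylim only hide the outliers from view. The smooth is still computed from all the data, which is why a robust smoother is used.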
Use slicing to see how these relationships change when you condition on the level of testosterone. You'll find an example of this in the "Conditional Scatterplots" graph on the examples page.
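As a rough illustration of slicing, the sketch below uses coplot from base R. This is one possible approach and may differ from the method used on the examples page. The data are synthetic, and the number of slices and their overlap are arbitrary choices.

```r
# Synthetic stand-in data (NOT the real data)
set.seed(3)
n <- 120
h <- data.frame(t = rlnorm(n), a = rlnorm(n))
h$freet <- h$a * h$t * rlnorm(n, sdlog = 0.3)

hl <- log(h)  # work on the log scale

# Four overlapping slices of (log) testosterone, with roughly equal counts
slices <- co.intervals(hl$t, number = 4, overlap = 0.25)

# freet against a, one panel per slice of t, each with a lowess smooth
coplot(freet ~ a | t, data = hl, given.values = slices,
       panel = panel.smooth)
```

Each panel shows the same pair of hormones for women within one range of testosterone, so you can compare the smooths across panels.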
Don't view these exercises as arbitrary computing tasks. Your goal is to understand these data and then to tell me what you have learned. Don't include anything in your lab report that doesn't serve these goals.
As before, the lab report should consist of R code with text incorporated as comments. The text is at least as important as the code. Include the graphs you need to support the points you make in the text, and include only as much R code as you need to produce those graphs.