Introduction to R

This project is your introduction to R. We'll go over it in class, but you should try to have most of the work done before you get there. This will involve a reading assignment, and then the exercises below. If you plan to work on your own computer, you'll also need to download R from the official R website.

Readings

Before undertaking this project, please read chapters 1-3 and sections 8.1-8.3 of

Owen provides a gentle introduction for newbies. However, Owen's idea of a newbie is an undergrad major in math-stat. For this reason, he uses calculus in a few places. Don't let this worry you. There will be no calculus in this course. Just skim these passages if calculus is not your thing.

Although Owen's book is well-written, it can often be useful to consult an alternative. Here's another gentle introduction, also freely available:

If you already know another computer language, you might prefer:

This one is my favorite, but it's too terse for beginners. It is freely available on the R website in several formats. You can also buy a hard-copy version from any bookstore.

Another good introduction to R can be found in the first two chapters of Introductory Statistics with R, by Peter Dalgaard.

Finally, here's one that I don't know well, but which looks promising: R in Action.

Using R

To get into R, type "R" at the shell prompt (Linux or Mac), or click on the icon for R or for a front end such as RStudio. To get out of R, type q() at the R command prompt, or just click the button to close the R GUI. Either way, you'll be asked whether you want to save your workspace. I always answer "no", because I quickly lose track of what's there when I allow R to save the workspace from session to session. I prefer to keep data and programs in plain text files, which I then read into R using the source command. That way, I start each session with a clean slate, and I always know what I'm doing. If I do want to save a copy of the commands I entered at the R prompt, I use the savehistory command. (You'll learn about these two commands in the readings, or by typing ?source and ?savehistory at the R prompt.)
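As a minimal sketch of that workflow, the code below writes a tiny script to a temporary file, reads it in with source, and then saves the command history. The script contents, the variable names, and the history file name are made up for illustration. The savehistory call is guarded because it only works in an interactive session, and on some systems it is unavailable even then.

```r
# Hypothetical script contents. In practice you would write this file
# in a text editor rather than from within R.
script <- tempfile(fileext = ".R")
writeLines(c(
  "heights <- c(150, 160, 170)   # data kept in plain text",
  "mean_height <- mean(heights)"
), script)

# source() reads the file and runs its commands in the current session.
source(script)
print(mean_height)

# savehistory() needs an interactive session, and some R front ends do not
# support it, so guard the call and tolerate failure.
if (interactive()) {
  try(savehistory(file = "session_commands.Rhistory"))  # hypothetical file name
}
```

After source has run, the objects defined in the script (here heights and mean_height) are available at the prompt, just as if you had typed the commands yourself.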

Text Editors

If you are going to write more than a few lines of code, it is important to save that work in a plain text file. For this purpose, you can use either a text editor or an integrated development environment (IDE).

I prefer to use a text editor, because this allows me to use the same tool for everything I write--not only code in various computer languages, but also letters and scientific papers. But many people prefer to use an IDE, which provides a convenient graphical interface. If this is your preference, consider RStudio. It is free, open source, and well regarded. I've never used it, however, so I can't help if you get stuck.

The other approach is to use a text editor, which is not the same as a word processor. Microsoft Word simply won't do. You need an editor that will make plain text files. You might, for example, consider using Notepad (on Windows) or TextEdit (on a Mac). These will work, but there are drawbacks. These editors are too dumb to help you much with programming. They will not highlight your code in any useful way, and they won't help you find errors. Feel free to use them, but consider using something more powerful.

There are lots of free editors designed for programming. Furthermore, R makes it easy to access one. In the R GUI, look for the Editor icon. Or at the R command line, type file.edit("myscript.R"), where myscript.R stands for whatever file you want to edit. Either approach will bring up some default editor. The nature of this editor depends on the system you are using. For some systems--at least the Mac GUI--the default editor is pretty self-explanatory. On other systems, you'll need to learn how to use it.
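If you are curious which editor R will launch on your system, here is a short sketch using base R's editor option and the file.edit function. The file name myscript.R is a placeholder, and what gets printed will differ from one machine to the next.

```r
# Show which editor R is configured to launch. The value depends on your
# system and may be a program name or, in some front ends, a function.
print(getOption("editor"))

# Open a (hypothetical) script file in that editor. This needs an
# interactive session, so the call is skipped when run non-interactively.
if (interactive()) {
  file.edit("myscript.R")
}
```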

My favorite editors are emacs and vim, both of which are widely used by professional programmers. These take some time to learn but are worth the effort. If the default editor on your system presents you with a column of tilde characters along the left margin, then you're probably looking at vim. If you want to shop for a text editor, check out the list on Wikipedia.

In my own work, I use emacs with a package called "ESS" (for "Emacs Speaks Statistics"). Like an IDE, this allows you to edit code and run it from within a single interface. You'll see this package in operation during my lectures.

Exercise

  1. Write a paragraph explaining the distinction between the workspace and the search path. How does one add items to each? How does one remove items from each? Include a transcript of an R session that illustrates these concepts. Use comments to explain what is happening.

  2. Use R's help facility to explore the sample function. Explain what it does in your own words. Illustrate its use with several examples. Make sure to explain and illustrate the function of the replace argument.

  3. Explain each line of the following code:

    
    x <- rep(100, 10)
    x <- lapply(x, rnorm)
    x <- lapply(x, mean)
    x <- unlist(x)
    var(x)
    

    (Hint: first experiment with it at the R command prompt. Examine x after each step. Then look up the commands in Owen, Paradis, and/or Venables. Then use the online help system.)

  4. Write a function that estimates the variance of the sample mean of standard normal random variables. In other words, it should generate K data sets, each consisting of N numbers drawn at random from a normal distribution with mean zero and variance 1. It should calculate the mean of each of these data sets and then return the variance of the resulting means. Your function should have two arguments, N and K. (Hint: make sure you understand step 3.)

  5. Write another function that estimates the variance of the sample standard deviation of standard normal random variables. (The standard deviation is the square root of the variance. You calculate it with the "sd" function.) This function should be very similar to the other one, with the same arguments.

  6. Which is more variable, the sample mean or the sample standard deviation? Support your conclusion with output from the two functions you wrote in steps 4 and 5. Use a large number--say K=10000--of replicates so that you get an accurate answer. Use a small number--say N=10--of observations within each random sample, so that the variances are large enough to see.

What to turn in

Your lab report should be in the form of an essay, with a numbered section heading for each of steps 1-3 and 6. Please hand this in as hard copy in the class following the one in which we work on this assignment.

For steps 4-6, send email with an attachment containing the R code for the two functions and for the experiment you did in step 6. This attachment should be a plain text file, whose name ends with ".r" or ".R". I want to be able to run it.
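One way to check that your file will run for me is to execute it in a fresh R process before you send it. The sketch below does this with Rscript, which ships with R. The file name lab1.R and its contents are placeholders, not part of the assignment, so substitute the path to your own file.

```r
# Stand-in for your submission. The file name and contents are hypothetical.
submission <- file.path(tempdir(), "lab1.R")
writeLines(c(
  "square <- function(v) v^2   # placeholder function, not part of the lab",
  "print(square(1:3))"
), submission)

# Run the file in a separate R process, so it cannot rely on objects that
# happen to be in your current workspace. --vanilla skips saved workspaces
# and startup files.
rscript <- file.path(R.home("bin"), "Rscript")
status <- system2(rscript, args = c("--vanilla", shQuote(submission)))

# An exit status of 0 means the script ran to completion without an error.
if (status == 0) {
  message("The file ran without errors.")
} else {
  message("The file failed with exit status ", status)
}
```

If the file fails here but works at your R prompt, it probably depends on something you defined by hand during your session and forgot to put in the file.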