Likelihood

Introduction

This project will introduce you to the method of maximum likelihood for testing statistical hypotheses. Before attempting it, you should read Just Enough Likelihood. If you are rusty on probability, you may also want to reread Just Enough Probability. I'll assume from here on that you know how likelihood is used both for estimating parameters and for testing hypotheses. But you still don't know how to implement these methods in R. There are three steps in the process.

First, you have to write an R function that represents the log likelihood. The goal is to find the parameter value(s) that maximize this function. You'll eventually hand this function to one of R's facilities for numerical maximization.

Unfortunately, none of these work well unless you can tell them where to start looking. In other words, you need to make an initial guess about the answer. The better this initial guess, the happier you are likely to be with the answer. Finding this initial guess is the second step in the process. Fortunately, this is easy when there is only one parameter: simply graph the function value against the parameter value and look for the maximum. For an example of this process, see the optimization example on the examples page of the class website.

Finally, hand the function and the initial guess to one of R's numerical optimizers. We'll use "optimize", because our problem has only a single parameter. Optimize returns a list with two components. One of these (maximum) gives the parameter value that maximizes our function. The other (objective) gives the corresponding function value. In our context, these will be the maximum-likelihood estimate and the corresponding log likelihood.

Before proceeding, work your way through the optimization example, and make sure you understand all the steps.

Exercise

We'll be working with a simulated dataset that you can upload here. Each observation refers to a newly-hatched chick, and there are two variables: "x" is the amount of food available to the chick, and "y" is a 0/1 variable indicating whether the chick lived or died. We'll assume that there is an S-shaped curve relating x to y. Ordinarily, we'd worry about how to model this relationship, but with simulated data this issue is not in doubt. Here is the function relating food to survival. The effect of food on survival depends on a parameter (a), which we wish to estimate.

# survival probability function.  On input, x is vector of food
# resource per chick, and "a" is a single number measuring the effect
# of food on survival.  Function returns vector of survival
# probabilities.
sp <- function(x,a) {
   (1 - exp(-a*x))^4
}

Write an R function that returns the log likelihood (the natural log) under this model. As you ponder this, it is best to begin with the likelihood of an individual chick. If that chick lived, its likelihood is sp(x,a), where x is the resource available to that chick, and "a" is the parameter value. If it died, its likelihood is 1-sp(x,a). The overall log likelihood is the sum of the log likelihoods of the individual chicks.
Explain why the overall log likelihood is the sum of the log likelihoods of the chicks. What is being assumed here? (Hint: the log of a product is a sum of logs. Thus, summing the log likelihoods is equivalent to forming a product of the likelihoods of the individual chicks.)
Plot log likelihood against "a" in order to make a rough guess about the value of "a" that maximized likelihood. You may have to do this several times, because your initial plot is likely to use an inappropriate range of x axis values.
Use "optimize" to get an accurate estimate of the optimal value of "a" and also the maximal log likelihood under this model.
Consider another model, which assumes that a=1.5. There is no optimization to be done under this model, because the value of "a" is specified by the model itself. All you have to do is calculate the log likelihood using your function.
Perform a likelihood ratio test of the hypothesis that a=1.5 against the original unconstrained hypothesis.