This tutorial will try and show one way of doing dynamical systems analysis. To show how it can be done we'll walk through finding set points for a few different variables, and how some of these variables interact with each other. The code below is all in r, but most of what will be shown is doable in any statistical program (like SPSS, Stata or SAS).
The first thing that needs to be done is get the data in the correct format, which means creating a lagged version of the variable we are interested in. The data below is contains information on testosterone (T) and estrogen (E) levels for individuals (ID) over a 13 day period (SPITDAY). Other information available includes sex drive (drive).
First read in the data and look at the first few lines:
# Read in csv text data
d01 <- read.csv("tutorial.csv")
# View first few records
head(d01, 3)
## SPITDAY T E drive ID
## 1 0 31 1.6 2.0 1
## 2 1 15 1.1 2.5 1
## 3 2 17 2.0 1.5 1
The point of dynamical systems models is to look at change over time so the data needs to be manipulated a bit. First we need to create lagged variables. Lagged means data from the previous time, so the sex hormone and sex drive at time 1 (drive t0 ) and the sex hormone and drive from the previous time (drive t-1 ) which will help us understand the change in sex hormones and drive from day to day. The general idea is that sex drive at time 1 (drive t0 ) should predict, or have a relationship to, the sex drive at time 2 (drive t1 ). By creating lagged variables we will be able to see what that relationship.
# Use a library of code with more fuctions: in this case 'slide' to create
# the lagged variable
library(DataCombine)
# Order the data by person ID and Day
attach(d01)
d01 <- d01[order(ID, SPITDAY), ]
# Create variable for each individual from the time directly preceding it
# (t-1)
d02 <- slide(d01, Var = "T", GroupVar = "ID", NewVar = "LeadT") # For Testosterone
d02 <- slide(d02, Var = "E", GroupVar = "ID", NewVar = "LeadE") # For Estrogen
d02 <- slide(d02, Var = "drive", GroupVar = "ID", NewVar = "LeadD") # For Drive
# Get rid of records at tn since they have null lead values
d02 <- na.omit(d02)
Now we have a small data set (d02
) with a lagged
variables. Now we can plot the sex hormone by the lagged data. If a
scatter plot is created with the testosterone plotted on the x-axis and
lagged testosterone on the y-axis we get a graph that looks like this.
# Add functionality for modifiable and pretty plots (graphs)
library(ggplot2)
# Create graph
# Locate the data, and determine the variables to put on a scatter plot
sp01 <- ggplot(d02, aes(T, LeadT))
# Add the type of graph (scatter plot == geom_point) and title
p1 <- sp01 + geom_point() + labs(title = "Testosterone Levels by Testosterone 1 day later")
# Display scatterplot
p1
If there is no change day by day the data points should line up on a perfect 45 degree line. We can add that 45 degree line and add a smoother line (a curved line that follows the data points) to see where the set points are.
# Take exisiting scatter plot and add a 45 degrees line (geom_abline) and a
# curved line representing a locally weighted average (stat_smooth)
p1 + geom_abline(intercept = 0, slope = 1) + stat_smooth()
Another way to look at this is to create a difference score, which will rotate the graph so no change is a horizontal line.
# Create a new variable by subtracting t-1 from t
d02$DiffScoreT <- (d02$LeadT - d02$T)
# Set up data for graph
sp02 <- ggplot(d02, aes(T, DiffScoreT))
# Graph data as scatter plot and add labels and lines
sp02 + geom_point() + labs(title = "Testosterone by Daily Difference", x = "Testosterone",
y = "Change in Testosterone") + geom_abline(intercept = 0, slope = 0) +
stat_smooth()
If the smoothed slope (the blue line) crosses the horizontal line it indicates that this is were there is no change from one day to the next, a set point. Set points come in two flavors when looking at a single variable like this, they are either “attractors” or “repellors”.
Attractors are stable points, and it is expected that values will converge to these set points over time. In other words the attractors are attractive and pull values close to them. Repellors are the opposite, they push things away, repelling values away over time.
Telling which set points are attractors and which are repellors is pretty easy, attractors have a negative slope, and repellors have a positive slope. Looking at the difference plot above there seems to be an attractive set point (attractor) at the 24. Near the 45 mark it looks like the line almost crosses the 0 line. If it did cross it would be a repellor.
# Create new variables by subtracting t-1 from t for sex drive (D) and
# estrogen (E)
d02$DiffScoreE <- (d02$LeadE - d02$E)
d02$DiffScoreD <- (d02$LeadD - d02$drive)
# Set up data for estrogen graph
sp03 <- ggplot(d02, aes(E, DiffScoreE))
# Graph data as scatter plot and add labels and lines
sp03 + geom_point() + labs(title = "Estrogen by Daily Difference", x = "Estrogen",
y = "Change in Estrogen") + geom_abline(intercept = 0, slope = 0) + stat_smooth()
# Set up data for sex drive graph
sp04 <- ggplot(d02, aes(drive, DiffScoreD))
# Graph data as scatter plot and add labels and lines
sp04 + geom_point() + labs(title = "Sex Drive by Daily Difference", x = "Sex Drive",
y = "Change in Drive") + geom_abline(intercept = 0, slope = 0) + stat_smooth()
In the two graphs above it is easy to find a attractors in both the estrogen and sex drive graphs.
The slope of the line not only tells us if a set point is an attractor or repellor, but it also can tell us how strong the attraction/repulsion is. The next step is to figure out exactly where these set points are and how strong they are.
We can use basic polynomial regression to figure out exactly where the set points are and how strong they are. In a regression analysis we could say that the day to day difference in sex hormones is equal to an intercept (b0) plus the sex hormone raised to a power of three (X3) plus a error term (e), or \[ \Delta x = \beta_0 + \beta_1x^3 + e \]. The reason we raise the variable to a power of 3 is because a cubic polynomial allows for a function that curves both up and down
In regression if there is a polynomial (X3) all lower order terms must also be included (X2 , X). So our equation turns into \[ \Delta x = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3 + e \]. In order to more accurately estimate the change in Estrogen we are going to create a new variable that approximates the rate of change, or velocity of change through time.
The code to do this in R
is below, setting the velocity of estrogen as the dependent variable which is being predicted by estrogen itself.
# Created Lead Variables (the opposite of a lag) and use the lead and lag
# variables to generate velocity
d02 <- slide(d02, Var = "E", GroupVar = "ID", NewVar = "LagE", slideBy = 1) # Create Lag for Estrogen
d02$velocityE <- (d02$LagE - d02$LeadE)/2
# Linear regression model with a cubic polynomial for estrogen (E) with
# velocity as the dependent variable
lm01 <- lm(velocityE ~ poly(E, 3), data = d02)
# Display results of the model
summary(lm01)
##
## Call:
## lm(formula = velocityE ~ poly(E, 3), data = d02)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.001 -0.475 0.176 0.733 7.803
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.091 0.297 0.31 0.76
## poly(E, 3)1 -0.305 2.283 -0.13 0.89
## poly(E, 3)2 -2.246 2.313 -0.97 0.34
## poly(E, 3)3 -1.195 2.296 -0.52 0.60
##
## Residual standard error: 2.3 on 55 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.0222, Adjusted R-squared: -0.0311
## F-statistic: 0.417 on 3 and 55 DF, p-value: 0.741
# Find Set points
m <- polyroot(lm01$coefficients)
# show the set points as complex numbers
m
## [1] 0.14-0i -0.32+0i -1.70-0i
The output above gives us the beta coefficients for estrogen, estrogen2 and estrogen3.
The intercept gives the average change in estrogen. If we take the
coefficients and set the change score to 0 (0=0.09 + -0.31 * x + -2.25
* x2 -1.2 * x3) we can solve for x (estrogen) and get the set points (m
) [1].
The resulting values (set points) for x are 0.14, -0.32, -1.7. These
are the exact locations where there is no change over time. If you look
at these set points in the code above you'll notice there are actually
two numbers added together, the latter of which is multiplied by i
. The i
is an imaginary number and if there is a number multiplied by i
that means you have something going on that is more complicated than a nice stable attractor or repellor.
Now that we have the set points we need to check how strong they are. The strength of an attractor or repellor is called the Lyapunov exponent (or characteristic root). To find the Lyapunov exponent is to get the derivative just off of the set points. The code below will do this for us
# Find the mean of Estrogen
Ehat <- mean(d02$E)
# Calculate Derivative function and get slopes at setpoints / derivative
# function is li+2quad*x+3cub*x^2
lyap1 <- Ehat + 2 * Ehat * Re(m[1]) + 3 * Ehat * Re(m[1])^2
lyap2 <- Ehat + 2 * Ehat * Re(m[2]) + 3 * Ehat * Re(m[2])^2
lyap3 <- Ehat + 2 * Ehat * Re(m[3]) + 3 * Ehat * Re(m[3])^2
So the Lyapunov exponents are 8, 3.98 , and 37.55. The third set point (-1.7) has the strongest effect.
One of the fun graph that can be used to show all of this fun information is a vector flow field graph. For estrogen the vector flow is shown below
# Load library for graphing the vector field
library(pracma)
# Create a function using the regression model
f <- function(x, y) lm01$coefficients[1] + (lm01$coefficients[2] * x) + (lm01$coefficients[3] *
x^2) + (lm01$coefficients[4] * x^3)
# Find dimensions of the graph by finding the largest and smallest set
# point, and add 1
xx <- c(min(Re(m)) - 1, max(Re(m)) + 1)
yy <- c(min(Re(m)) - 1, max(Re(m)) + 1)
# Create the vector field plot
vectorfield(f, xx, yy, scale = 0.1, col = "steelblue")
# Make a title
title("Vector Flow Field for Estrogen")
If we do the same thing for testosterone and sex drive we get:
d02 <- slide(d02, Var = "T", GroupVar = "ID", NewVar = "LagT", slideBy = 1) # Create Lag for Testosterone
d02 <- slide(d02, Var = "drive", GroupVar = "ID", NewVar = "LagD", slideBy = 1) #Create Lag for Drive
d02$velocityD <- (d02$LagD - d02$LeadD)/2
d02$velocityT <- (d02$LagT - d02$LeadT)/2
# Linear regression model with a cubic polynomial
lm02 <- lm(DiffScoreT ~ poly(T, 3), data = na.omit(d02))
lm03 <- lm(velocityD ~ poly(drive, 3), data = d02)
# Display results of the model
summary(lm02)
##
## Call:
## lm(formula = DiffScoreT ~ poly(T, 3), data = na.omit(d02))
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.59 -7.67 -3.68 4.63 37.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.74 1.75 1.00 0.32262
## poly(T, 3)1 -55.12 13.41 -4.11 0.00013 ***
## poly(T, 3)2 -28.68 13.41 -2.14 0.03691 *
## poly(T, 3)3 -33.69 13.41 -2.51 0.01496 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13 on 55 degrees of freedom
## Multiple R-squared: 0.336, Adjusted R-squared: 0.299
## F-statistic: 9.26 on 3 and 55 DF, p-value: 4.7e-05
summary(lm03)
##
## Call:
## lm(formula = velocityD ~ poly(drive, 3), data = d02)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8488 -0.2712 -0.0062 0.2438 0.9419
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0174 0.0532 0.33 0.75
## poly(drive, 3)1 -0.0190 0.4365 -0.04 0.97
## poly(drive, 3)2 -0.0967 0.4477 -0.22 0.83
## poly(drive, 3)3 0.3284 0.4278 0.77 0.45
##
## Residual standard error: 0.41 on 55 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.0125, Adjusted R-squared: -0.0414
## F-statistic: 0.232 on 3 and 55 DF, p-value: 0.874
# Get set points
m2 <- polyroot(lm02$coefficients)
m3 <- polyroot(lm03$coefficients)
m2
## [1] 0.031+0.0i -0.441+1.2i -0.441-1.2i
m3
## [1] 0.32+0.24i -0.34-0.00i 0.32-0.24i
f2 <- function(x, y) lm02$coefficients[1] + (lm02$coefficients[2] * x) + (lm02$coefficients[3] *
x^2) + (lm02$coefficients[4] * x^3)
xx <- c(0, 0.5)
yy <- c(min(Re(m2)) - 1, max(Re(m2)) + 1)
vectorfield(f2, xx, yy, scale = 0.005, col = "steelblue")
title("Vector Flow Field for Testosterone")
f3 <- function(x, y) lm03$coefficients[1] + (lm03$coefficients[2] * x) + (lm03$coefficients[3] *
x^2) + (lm03$coefficients[4] * x^3)
xx <- c(min(Re(m3)) - 1, max(Re(m3)) + 1)
yy <- c(min(Re(m3)) - 1, max(Re(m3)) + 1)
vectorfield(f3, xx, yy, scale = 0.1, col = "steelblue")
title("Vector Flow Field for Sex Drive")
The set points of testosterone are and sex drive get complex [2][3]. There is one real answer in each, 0.03 for testosterone and -0.34 for sex drive. The two complex solutions are -0.4412+1.212i, -0.4412-1.212i and 0.3165+0.2371i, 0.3165-0.2371i repectively.
The complex set points indicates that there may be a spiral or periodic attractor/repellor. Spirals are another type of topological feature (along with attractors/repellors) but they only exist by increasing the dimensions of the analysis.
Hopefully we'll post a two dimesional tutorial soon.
[1]: R provides a nice function to find set points. If you are using another program (without this happy function) you can take the equation and plug it into wolfram alpha http://www.wolframalpha.com
[2]: literally, ha ha.
[3]: Boy I'm glad no one reads footnotes.