Wednesday, April 11, 2007

How to use "identify" in R.

I like R better than S-PLUS. That is not because R is free (I can get a free student version of S-PLUS for one year), but because of the number of packages and functions that come from the open source community and R user groups. However, the one function that I miss the most from S-PLUS is brush. The closest function in R is identify. It's not brush, but it gets the job done. Here is how to use the identify function in R. Suppose you want to find the indices of the outliers in a linear regression.

# creating example data
> x <- c(1:10)
> y <- 2 * x + 5
# two outliers at 2 and 5
> y[2] <- 0
> y[5] <- 20
Now make the plot:
# saving linear model variables to ll
> ll <- lm(y~x)
> plot(x, y, main="example: y ~ x")
> abline(ll) # adding regression line to plot
This plots the data points and the linear regression line. To find the indices of the outliers in the plot, use the identify function. Unlike most functions in R, it needs an open plot window and input from the mouse. Once you call the function, left-click on each point that you want to identify and right-click in the window when you're done. It will ask you whether to continue or stop. The indices of the points you clicked are returned, so here they end up in idx.
> idx <- identify(x,y)
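As a rough sketch of what you can do with the result: since identify returns the indices of the clicked points, idx can be used to look up the flagged observations or to refit the model without them. Because identify needs a mouse, the snippet below falls back to c(2, 5), the two outliers planted in the example data, when R is not running interactively. That fallback and the names flagged, keep and ll2 are my own additions for illustration.

```r
# example data from above: y = 2x + 5 with outliers at positions 2 and 5
x <- 1:10
y <- 2 * x + 5
y[2] <- 0
y[5] <- 20
ll <- lm(y ~ x)

if (interactive()) {
  plot(x, y, main = "example: y ~ x")
  abline(ll)
  idx <- identify(x, y)   # click the outliers, then right-click to stop
} else {
  idx <- c(2, 5)          # assumed stand-in for the mouse clicks
}

# look at the flagged observations
flagged <- data.frame(index = idx, x = x[idx], y = y[idx])
print(flagged)

# refit the model without the flagged points
keep <- setdiff(seq_along(x), idx)
ll2 <- lm(y[keep] ~ x[keep])
print(coef(ll2))
```

With the two planted outliers dropped, the remaining points lie exactly on y = 2x + 5, so the refitted coefficients should come out very close to 5 and 2.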
If you call identify with the parameter n, for example n=3, the function stops automatically after you click on three points. This is not brush and spin from S-PLUS, but it is the closest function that I can find in R. Note: Previously posted on my old WordPress blog.
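A small sketch of the n parameter, assuming the same example data: the call below uses n = 2 because there are two outliers, so it should return by itself after the second click. The interactive() guard and the empty starting value for idx are my own additions, so that the snippet does nothing harmful when no mouse is available.

```r
# same example data as above
x <- 1:10
y <- 2 * x + 5
y[2] <- 0
y[5] <- 20

idx <- integer(0)  # stays empty if no points are picked
if (interactive()) {
  plot(x, y, main = "example: y ~ x")
  abline(lm(y ~ x))
  # stops after two clicks, no right-click needed
  idx <- identify(x, y, n = 2)
}
print(idx)
```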
