The four R lessons can be browsed from the STA6329 home page, under Get started with R.
Lesson 1: Basic R: I/O, Array and matrix, conditioning and do loop
R is free statistical software that is very similar to S-PLUS. It is available for many operating systems, including UNIX and Microsoft Windows. It can be downloaded from the website http://cran.r-project.org/.
On the departmental UNIX system, simply type R; you will get some messages and then the prompt >. On the LINUX system, go to Application => Other => R; you will get an R console with the prompt >. Everything starts from there.
Documentation can be found by typing help.start() (help() by itself only describes the help function). You will get a documentation window. First try An Introduction to R.
Let us begin with the simplest input.
> a<-3.14159
> a
The first line assigns (with <- or =) the value 3.14159 to a, and the second line, a, asks R to print the value of a on the screen. The screen will show
[1] 3.14159
Or you can add a description by using cat():
> cat("The value of a is",a,".")
The value of a is 3.14159 .
Instructions can be separated by ; or by a new line. To quit, type q() after the prompt. Note that R is case sensitive.
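As a small sketch of how cat() joins its arguments: by default it puts a space between them, which is why a space appears before the final period in the output above. The sep argument, the "\n" newline and paste0() are standard R; the variable name msg is only chosen for illustration.

```r
a <- 3.14159

# Default: arguments are separated by a single space
cat("The value of a is", a, ".\n")

# sep = "" removes the automatic spaces, so spacing is controlled by hand
cat("The value of a is ", a, ".\n", sep = "")

# paste0() builds the same text as a character string instead of printing it
msg <- paste0("The value of a is ", a, ".")
print(msg)
```

cat() does not add a newline by itself, so "\n" is included to end the line.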
You may define a vector by
> x <- c(1,2,3,4,5,6,7,8)
> x[2]=3.5
> x[6]<- -9
> x
You will see
[1]  1.0  3.5  3.0  4.0  5.0 -9.0  7.0  8.0
A matrix is entered with array() or matrix(). Let x be the vector just defined. Then
> A <- array(x,dim=c(4,2))   # or matrix(x,4,2)
> A
     [,1] [,2]
[1,]  1.0    5
[2,]  3.5   -9
[3,]  3.0    7
[4,]  4.0    8
You can change the matrix values by assigning to elements of A, e.g., A[3,2]=-7. Then the 7 at position [3,2] will be changed to -7. Another example:
> B = array(1:8,dim=c(2,4))   # or matrix(1:8,2,4)
> B[1,1]=0
> B
     [,1] [,2] [,3] [,4]
[1,]    0    3    5    7
[2,]    2    4    6    8
where 1:8 means the integers from 1 to 8, i.e., 1,2,...,8. Note that the values fill the matrix column by column. If 0:0 or 0 is used, then the whole matrix contains 0.
The dimension can be higher than 2. For example, w<-array(0:0, dim=c(3,4,2)) defines a 3x4x2 array of zeros. Here is how it is displayed.
> w<-array(0:0, dim=c(3,4,2))
> w[1,2,1]=1;w[3,2,2]=2
> w
, , 1

     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0

, , 2

     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    2    0    0

Remove Values
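Note that once B is defined as a 2x4 matrix, assigning to individual elements cannot change its dimensions (e.g., B[3,1]=0 gives a "subscript out of bounds" error). To use the name for a different matrix, assign a whole new object to it or remove it first. Here is the remove operation.
> rm(A,B,x)
Now A, B and x are removed from memory.
Read data from files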
Data can be read as a vector (1-D) or a table (2-D). The following time series data are saved as econ.dat.
Economical data with seasonal and linear trend 1981-1992
JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
254 118 132 129 121 135 148 148 136 119 104 278
262 126 141 135 125 149 170 170 158 133 114 280
297 150 178 163 172 178 199 199 184 162 146 305
316 180 193 181 183 218 230 242 209 191 172 332
348 196 236 235 229 243 264 272 237 211 180 358
350 188 235 227 234 264 302 293 259 229 203 371
397 233 267 269 270 315 364 347 312 274 237 405
423 277 317 313 318 274 413 405 355 306 271 448
458 301 356 348 355 422 465 467 404 347 305 471
501 318 362 348 363 435 491 505 404 359 310 478
It can be read as a vector as follows.
> x<-scan(file="econ.dat",skip=2) # Read data from econ.dat as x
> x[14]
[1] 126
> x[120]
[1] 478
Here "#" is the comment symbol for R, and skip=2 skips the two header lines. The data can also be arranged into a 2-D table, using matrix() and the matrix transpose function, t(). Since matrix() fills column by column, each column of y holds one row of the file; the transpose yt has the same layout as the file.
> y<- matrix(x,12,10)
> y[1,3]
[1] 297
> yt <- t(y)
> yt[3,4]
[1] 163
Sometimes a table contains more than just numbers. Then it has to be read with a column specification: "" for characters and 0 for numbers. Suppose we have the following table (saved as time.dat).
T X1 X2 X3 Data for time series
1 5.5 2.4 T
2 4.5 2.4 C
3 5.1 2.4 C
4 5.5 2.2 T
5 5.7 2.1 T
6 5.1 1.5 C
Then it can be read by
> ex1<-scan("time.dat",list(0,0,0,""),skip=1)
> T<-ex1[[1]] # T is the first column of ex1.
> X1<-ex1[[2]]; X2<-ex1[[3]]; X3<-ex1[[4]]
> M <- cbind(T,X1,X2) # column binding function
> M
     T  X1  X2
[1,] 1 5.5 2.4
[2,] 2 4.5 2.4
[3,] 3 5.1 2.4
[4,] 4 5.5 2.2
[5,] 5 5.7 2.1
[6,] 6 5.1 1.5
> M[4,3]
[1] 2.2
> MTM=t(M)%*%M
> MTM
       T     X1    X2
T   91.0 110.90 42.70
X1 110.9 165.26 67.96
X2  42.7  67.96 28.78
where %*% is for matrix multiplication. To get rid of the label and make it a "pure" matrix, use:
> A<-matrix(M,6,3)
> A
     [,1] [,2] [,3]
[1,]    1  5.5  2.4
[2,]    2  4.5  2.4
[3,]    3  5.1  2.4
[4,]    4  5.5  2.2
[5,]    5  5.7  2.1
[6,]    6  5.1  1.5
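The following minimal sketch illustrates two points about matrix(): the fill order (column by column unless byrow=TRUE is given) and an alternative way to drop labels by setting dimnames to NULL. The small matrices and the names A.byrow and P are made up for illustration and are not part of the lesson's data.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Default: values fill the matrix column by column
A <- matrix(x, 4, 2)

# byrow = TRUE fills row by row instead
A.byrow <- matrix(x, 4, 2, byrow = TRUE)

# cbind() keeps the argument names as column labels
M <- cbind(T = 1:3, X1 = c(5.5, 4.5, 5.1))
colnames(M)

# Setting dimnames to NULL strips the labels without changing the values
P <- M
dimnames(P) <- NULL
P
```

Either approach, matrix(M,6,3) or dimnames(P) <- NULL, leaves the numbers unchanged; only the labels disappear.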
So far, all output has gone to the screen. To write results to a file, use sink(). For example,
> sink("result.out")
> MTM
There is no MTM output on the screen; it is in the file result.out. To return to screen output, type sink() with no argument.
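A minimal sketch of the full sink() cycle, written so that it can be run as a script. The temporary file name out.file is an assumption made for the example (the lesson uses result.out), and print() is called explicitly because automatic printing only happens for expressions typed at the prompt.

```r
out.file <- tempfile(fileext = ".out")  # hypothetical output file

sink(out.file)        # start diverting printed output to the file
print(1:3)            # goes to the file, not the screen
sink()                # restore screen output

# Read the file back to confirm what was written
saved <- readLines(out.file)
saved
```

Forgetting the closing sink() is a common reason for output that seems to have vanished.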
For more complex data files, see R Data Import/Export on the documentation page.
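As one possible alternative to scan() for tables with mixed columns, read.table() returns a data frame with one column per variable. The sketch below writes a shortened, hypothetical copy of the table to a temporary file (the header line here holds only the column names, which is an assumption about the file layout) and reads it back.

```r
# Write a small stand-in for time.dat to a temporary file
tmp <- tempfile(fileext = ".dat")
writeLines(c("T X1 X2 X3",
             "1 5.5 2.4 T",
             "2 4.5 2.4 C",
             "3 5.1 2.4 C"), tmp)

# header = TRUE takes the column names from the first line;
# colClasses states the type of each column explicitly
ex2 <- read.table(tmp, header = TRUE,
                  colClasses = c("numeric", "numeric", "numeric", "character"))
ex2

# The numeric columns can be turned into a matrix for matrix algebra
M2 <- as.matrix(ex2[, 1:3])
```

Columns are then available by name, for example ex2$X1.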
The basic arithmetic operators and functions are:
+, -, *, /, ^ (exponent), %/% (integer division); sqrt(), abs(), sin(), cos(), tan(), asin(), acos(), atan(), exp(), log() (natural logarithm, base e), log10()
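A few of these operators in action, as a quick sketch. The remainder operator %% is not in the list above but is the usual companion of %/%; the vector name res is only for illustration.

```r
res <- c(int.div   = 7 %/% 2,       # integer division
         remainder = 7 %% 2,        # remainder after integer division
         power     = 2^10,          # exponent
         nat.log   = log(exp(1)),   # natural logarithm
         log.10    = log10(1000))   # base-10 logarithm
res
```

These functions also work element by element on vectors, e.g., sqrt(c(1,4,9)).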
For matrix operations:
Operations : multiplication: %*% ex: x %*% y
Functions :
rbind(,,), cbind(,,) (row, column concatenations)
svd(x) : singular value decomposition (p. 28)
t(x) : transpose of matrix x
diag(x) : diagonal of a matrix x
eigen(A) : eigenvalues and eigenvectors of a symmetric matrix A
The eigenvalues are given in eigen(A)$values and the eigenvectors in eigen(A)$vectors (p. 27)
solve(x) : inverse of matrix x, or x^{-1} (p. 27)
solve(a, b) : solution x of the equations a %*% x = b
mean(x), median(x), range(x), var(x), quantile(x)
var(x,y), cov(x,y) = (covariance between x and y)
cor(x,y) = (correlation between x and y)
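A short sketch of solve() and of the covariance and correlation functions. The matrix a and the vectors b, u and v are made-up values chosen only to keep the example small.

```r
a <- matrix(c(2, 1, 1, 3), 2, 2)   # symmetric 2x2 matrix
b <- c(1, 2)

a.inv <- solve(a)                  # inverse of a
sol   <- solve(a, b)               # solves a %*% sol = b directly

u <- c(1, 2, 3, 4, 5)
v <- c(2, 4, 5, 4, 5)
cov.uv <- cov(u, v)                # covariance; var(u, v) gives the same number
cor.uv <- cor(u, v)                # correlation = cov(u, v) / (sd(u) * sd(v))
```

For a linear system, solve(a, b) is generally preferable to computing solve(a) %*% b.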
Let us continue with the matrix M and M'M. First find the singular value decomposition of M.
> svd.out <- svd(M)
> svd.out$d
[1] 16.477988 3.642783 0.496028
> svd.out$u
           [,1]        [,2]        [,3]
[1,] -0.3374511  0.67201955  0.63064903
[2,] -0.3238290  0.32929693 -0.62988892
[3,] -0.3853863  0.17479933 -0.38821643
[4,] -0.4337413 -0.02406623  0.01664167
[5,] -0.4746014 -0.23604000  0.05824250
[6,] -0.4683598 -0.59422587  0.22614290
> svd.out$v
           [,1]       [,2]       [,3]
[1,] -0.5497879 -0.8199249 -0.1595506
[2,] -0.7742524  0.4285394  0.4657115
[3,] -0.3134747  0.3795750 -0.8704346
The output means u'Mv = diag(d), or M = u diag(d) v'. Note that the result of svd(M) is stored
in svd.out, which contains three components: d, u and v. The way to
retrieve them is to use svd.out$u, etc. Similarly,
> eigen.MTM<-eigen(MTM)
> eigen.MTM
$values
[1] 271.5240859  13.2698704   0.2460437

$vectors
          [,1]       [,2]       [,3]
[1,] 0.5497879  0.8199249  0.1595506
[2,] 0.7742524 -0.4285394 -0.4657115
[3,] 0.3134747 -0.3795750  0.8704346

Note that these eigenvectors are the same as the columns of svd.out$v, up to sign, and the eigenvalues
are the squares of svd.out$d. Can you prove this as a general theorem?
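The relation can be checked numerically. The sketch below rebuilds M from the time.dat columns and compares the two decompositions with all.equal(); it is only a numerical check, not a proof, and the name M.rebuilt is introduced here for illustration.

```r
T  <- 1:6
X1 <- c(5.5, 4.5, 5.1, 5.5, 5.7, 5.1)
X2 <- c(2.4, 2.4, 2.4, 2.2, 2.1, 1.5)
M  <- cbind(T, X1, X2)

svd.out   <- svd(M)
eigen.MTM <- eigen(t(M) %*% M)

# Eigenvalues of M'M equal the squared singular values of M
all.equal(eigen.MTM$values, svd.out$d^2)

# Eigenvectors match the columns of v up to sign, so compare absolute values
all.equal(abs(eigen.MTM$vectors), abs(svd.out$v))

# Rebuild M from its factors: M = u diag(d) v'
M.rebuilt <- svd.out$u %*% diag(svd.out$d) %*% t(svd.out$v)
all.equal(M.rebuilt, M, check.attributes = FALSE)
```

all.equal() compares with a small numerical tolerance, which is appropriate here because the two decompositions are computed by different routines.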
Missing data in R must be denoted by NA. For example,
> MISSN=c(1,2,3,NA,5,6,7,8,9,10,11,12)
> MISS<-matrix(MISSN,3,4)
> MISS
     [,1] [,2] [,3] [,4]
[1,]    1   NA    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> NEW<-M%*%MISS
> NEW
     [,1] [,2] [,3]  [,4]
[1,] 19.2   NA 72.6  99.3
[2,] 18.2   NA 71.6  98.3
[3,] 20.4   NA 83.4 114.9
[4,] 21.6   NA 91.8 126.9
[5,] 22.7   NA 99.5 137.9
[6,] 20.7   NA 96.3 134.1
The whole second column of NEW is NA because every entry in it depends on the missing value MISS[1,2].
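A brief sketch of the standard tools for working with NA: is.na() to locate missing values and the na.rm argument to leave them out of a summary. The names total and avg are only for illustration.

```r
MISSN <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10, 11, 12)

is.na(MISSN)                         # TRUE only at the missing position
which(is.na(MISSN))                  # index of the missing value

sum(MISSN)                           # NA, because one term is missing
total <- sum(MISSN, na.rm = TRUE)    # drop NA before summing
avg   <- mean(MISSN, na.rm = TRUE)   # mean of the 11 observed values

# Comparison with == does not detect NA; use is.na() instead
NA == NA                             # the result is NA, not TRUE
```

Matrix multiplication with %*% has no na.rm option, so missing entries have to be handled before multiplying.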
Conditional statements and loops are written as follows (p. 46). (Some commonly used symbols: ==, !=, >, >=, <, <=, && (and), || (or).)
> xx<- c(1:2)
> if (M[2,3]>0 && M[1,2]<0) xx[1]=-1 else xx[1]=0   # && is for "and"
> if (M[2,3]>0 || M[1,2]<0) xx[2]=-1 else xx[2]=0   # || is for "or"
> xx
[1]  0 -1
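The operators && and || are meant for single TRUE/FALSE values, as in an if statement. For whole vectors, R also has the element-wise operators & and | and the function ifelse(). A minimal sketch, with the vector v and the names in.range, sgn and status made up for illustration:

```r
v <- c(-2, 0, 3)

# & and | work element by element and return a logical vector
in.range <- v > -1 & v < 5

# ifelse() chooses a value for each element
sgn <- ifelse(v < 0, -1, ifelse(v > 0, 1, 0))

# if (...) needs a single TRUE/FALSE, so && and || belong here
if (length(v) == 3 && all(v < 10)) status <- "ok" else status <- "check"

in.range
sgn
status
```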
Loops (for, while):
> w<-matrix(0,3,3)
> for(i in 1:3){for(j in 1:2){w[i,j]=1}}
> w
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    1    1    0
[3,]    1    1    0
> k=0
> while(k<=1) {k=k+1; w[1,k]=5;w[k,3]=-5}
> w
     [,1] [,2] [,3]
[1,]    5    5   -5
[2,]    1    1   -5
[3,]    1    1    0
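Many simple loops can be replaced by assigning to a whole block of indices at once. The sketch below reproduces the same final matrix without explicit loops, then shows seq_len() as a loop range; the variable total is introduced only for illustration.

```r
w <- matrix(0, 3, 3)

# Same effect as the double for loop: fill columns 1 and 2 with 1
w[, 1:2] <- 1

# Same effect as the while loop, which runs for k = 1 and k = 2
w[1, 1:2] <- 5
w[1:2, 3] <- -5
w

# seq_len(n) gives an empty range when n is 0, whereas 1:n would give c(1, 0)
total <- 0
for (i in seq_len(nrow(w))) total <- total + w[i, 1]
total
```

Block assignment is usually shorter and faster than element-by-element loops in R.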
Remark: R (or S-PLUS) is not good for debugging.
The error messages are usually very vague, such as
ERROR: syntax error.
One common error is the case of letters. If you write IF(...), it is a syntax error. It should be if(...).
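One way to look at an error message without stopping a script is tryCatch(), sketched below. In current versions of R, a call to IF(...) by itself is treated as a call to an unknown function named IF; the exact wording of the message depends on the R version and locale, so it is captured here rather than quoted. The names msg and ok are only for illustration.

```r
# Calling IF instead of if: R looks for a function named IF and fails
msg <- tryCatch(
  IF(TRUE),
  error = function(e) conditionMessage(e)   # capture the error text
)
msg

# The lowercase keyword works as expected
ok <- if (TRUE) "fine" else "not reached"
ok
```

After an error at the prompt, traceback() shows the sequence of calls that led to it, which often helps more than the message itself.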