4 February 2002

 

Principal Components Analysis (PCA)

 

Recall some relationships of random variables and coefficients (or weights):

 

Let $y_1 = X a_1$, where $a_1$ (m x 1) is a vector of coefficients.  The new vector $y_1$ (n x 1) is a linear combination of the columns of the original data matrix X (n x m).  The variance of the new variable $y_1$ can be computed by:

$\mathrm{Var}(y_1) = a_1^T \Sigma_X a_1$,

where $\Sigma_X$ is the covariance matrix of X.  Define a new variable: $y_2 = X a_2$.  Then the covariance of $y_1$ and $y_2$ is:

$\mathrm{Cov}(y_1, y_2) = a_1^T \Sigma_X a_2$.
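The variance formula follows in one step from the definition of the sample covariance matrix. The derivation below is a sketch under assumed notation that is not in the original notes: $X_c$ denotes X with its column means subtracted, $\Sigma_X$ denotes the covariance matrix of X, and the usual n - 1 divisor is used.

```latex
\begin{aligned}
\mathrm{Var}(y_1)
  &= \frac{1}{n-1}\,(X_c a_1)^T (X_c a_1) \\
  &= a_1^T \left[ \frac{1}{n-1}\, X_c^T X_c \right] a_1 \\
  &= a_1^T \Sigma_X a_1 .
\end{aligned}
```

The first line uses the fact that centering $y_1 = X a_1$ gives $X_c a_1$. Replacing the second factor by $X_c a_2$ gives the covariance of two such linear combinations in the same way.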

 

Let W be a matrix of coefficients in which each column is a vector of coefficients, i.e., $W = (a_1, a_2, \ldots, a_m)$, and let Y be the matrix of new variables, i.e., $Y = (y_1, y_2, \ldots, y_m) = XW$.

 

The covariance matrix of Y ($\Sigma_Y$) can be computed from the covariance matrix of X ($\Sigma_X$) by:

$\Sigma_Y = W^T \Sigma_X W$.
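A quick numerical check of this relationship may help. The sketch below uses NumPy with made-up data; the random matrix `X`, the coefficient matrix `W`, and all variable names are illustrative assumptions, not part of the original notes. `W` has only two columns here for brevity, since the identity holds for any number of columns.

```python
import numpy as np

# Hypothetical data: n = 6 observations of m = 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

# Arbitrary coefficient matrix; each column is one coefficient vector.
W = np.array([[1.0, 0.5],
              [2.0, -1.0],
              [0.0, 3.0]])

Y = X @ W                                # new variables, one per column

cov_X = np.cov(X, rowvar=False)          # m x m covariance matrix of X
cov_Y_direct = np.cov(Y, rowvar=False)   # covariance computed from Y itself
cov_Y_formula = W.T @ cov_X @ W          # covariance predicted by W^T Sigma_X W

print(np.allclose(cov_Y_direct, cov_Y_formula))
```

The diagonal of `cov_Y_formula` holds the variances of the individual new variables, and the off-diagonal entries hold their covariances.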

 

Now we turn our attention to the eigenvalues and eigenvectors of a matrix.  Recall from the previous lectures that a (diagonalizable) square matrix M can be decomposed into an eigenvector matrix (U) and a diagonal eigenvalue matrix (D):

 

$M = U D U^{-1}$
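As an illustration, the decomposition can be reproduced numerically. This is a minimal sketch using NumPy's `np.linalg.eig`; the small non-symmetric matrix `M` is an arbitrary example chosen here, not one from the lecture.

```python
import numpy as np

# Arbitrary diagonalizable (non-symmetric) matrix with eigenvalues 2 and 3.
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])

eigvals, U = np.linalg.eig(M)   # columns of U are eigenvectors of M
D = np.diag(eigvals)            # diagonal matrix of eigenvalues

# Rebuild M from its eigenvectors and eigenvalues.
M_rebuilt = U @ D @ np.linalg.inv(U)
print(np.allclose(M, M_rebuilt))

# For a non-symmetric M the eigenvectors are generally not orthogonal,
# so the inverse of U cannot be replaced by its transpose.
print(np.allclose(U.T @ U, np.eye(2)))
```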

 

Applying this relationship to the covariance matrix of X:

 

$\Sigma_X = U D U^{-1}$,

where the columns of U contain the eigenvectors of $\Sigma_X$ and the diagonal elements of D contain the eigenvalues of $\Sigma_X$.  Recall that $\Sigma_X$ is a symmetric matrix.

 

On the side:

Some nice properties of symmetric matrices (proofs are omitted).  More details can be found in Schott (1997) and Rencher (1995).

  1. If A is an m x m real symmetric matrix, the eigenvalues of A are real, and the corresponding eigenvectors can be chosen to be real.
  2. If the m x m matrix A is real and symmetric, it is possible to construct a set of m eigenvectors of A such that the set is orthonormal.
  3. If we let the m x m matrix X = (x1, x2, …, xm), where x1, x2, …, xm are the orthonormal eigenvectors described in the previous result (in this side note X denotes the eigenvector matrix, not the data matrix), and D = diag(λ1, λ2, …, λm) is the diagonal matrix of eigenvalues, then the eigenvalue-eigenvector equations $A x_i = \lambda_i x_i$, for i = 1, …, m, can be expressed collectively as the matrix equation $AX = XD$.  Because the columns of X are orthonormal vectors, X is an orthogonal matrix.  Premultiplication of this matrix equation by $X^T$ results in the relationship $X^T A X = D$, or equivalently $A = X D X^T$, which is known as the spectral decomposition of A.
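The three properties can be checked numerically. The sketch below assumes NumPy and uses `np.linalg.eigh`, which is intended for symmetric matrices; the matrix `A` is an arbitrary symmetric example, and `X` follows the side note's notation for the eigenvector matrix.

```python
import numpy as np

# Arbitrary real symmetric matrix (illustrative only).
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])

# eigh returns eigenvalues in ascending order and eigenvectors as columns.
eigvals, X = np.linalg.eigh(A)
D = np.diag(eigvals)

is_real = np.isrealobj(eigvals) and np.isrealobj(X)    # property 1
is_orthonormal = np.allclose(X.T @ X, np.eye(3))       # property 2
diagonalizes = np.allclose(X.T @ A @ X, D)             # X^T A X = D
reconstructs = np.allclose(X @ D @ X.T, A)             # A = X D X^T

print(is_real, is_orthonormal, diagonalizes, reconstructs)
```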

 

Applying these results to the covariance matrix of X:

 

$\Sigma_X = U D U^T$.

 

Premultiplying it by $U^T$ and postmultiplying it by U (and using $U^T U = I$):

$U^T \Sigma_X U = U^T U D U^T U = D$.

 

Choosing the coefficient matrix W = U, so that Y = XU, gives $\Sigma_Y = U^T \Sigma_X U = D$.  That is, the covariance matrix of Y (the new variables formed as linear combinations of the original X) is a diagonal matrix whose diagonal elements are the eigenvalues of the covariance matrix of the original X.  Because $\Sigma_Y$ is a diagonal matrix, the new y vectors are uncorrelated.  Geometrically, the projections of the original data onto the new axes (called scores) are uncorrelated.
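The whole argument can be sketched in a few lines of NumPy. Everything below is illustrative: the simulated data, the variable names, and the choice to center X before projecting are assumptions made here (centering shifts the scores but does not change their covariance), and the eigenvalues are reordered from largest to smallest, as is conventional in PCA.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated data: 200 observations of 3 variables.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

cov_X = np.cov(X, rowvar=False)

# Eigen decomposition of the symmetric covariance matrix.
eigvals, U = np.linalg.eigh(cov_X)

# eigh returns ascending eigenvalues; reorder to descending.
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

# Scores: projections of the centered data onto the eigenvector axes.
X_centered = X - X.mean(axis=0)
scores = X_centered @ U

# The covariance matrix of the scores should be diag(eigenvalues).
cov_scores = np.cov(scores, rowvar=False)
print(np.allclose(cov_scores, np.diag(eigvals)))
```

The off-diagonal entries of `cov_scores` vanish up to floating-point error, which is the numerical counterpart of the statement that the scores are uncorrelated.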

 

Some details:

 

The sample covariance matrix of X can be found either by using a built-in function or by computing the following:

 

$\Sigma_X = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X})$,

where $\bar{X}$ (n x m) is the matrix of column means of X, i.e., every entry in the i-th column of $\bar{X}$ equals the i-th column mean of X, and n is the sample size (the number of rows in X).
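Both routes to the sample covariance matrix can be compared directly. The sketch below assumes NumPy as the environment and the usual n - 1 divisor, which is the default of `np.cov`; the small data matrix is made up for illustration.

```python
import numpy as np

# Hypothetical data: n = 4 observations of m = 2 variables.
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0],
              [8.0, 4.0]])
n = X.shape[0]

# Matrix of column means: every row repeats the vector of column means,
# so the i-th column is filled with the i-th column mean of X.
X_bar = np.tile(X.mean(axis=0), (n, 1))

# Covariance from the formula, with the n - 1 divisor.
S_manual = (X - X_bar).T @ (X - X_bar) / (n - 1)

# Covariance from the built-in function (variables are in columns).
S_builtin = np.cov(X, rowvar=False)

print(np.allclose(S_manual, S_builtin))
```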

 

There is an excellent explanation of PCA in Rencher (1995; Chapter 12).