4 February 2002
Principal Components Analysis (PCA)
Recall some relationships of random variables and coefficients (or weights):
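Let $y_1 = X a_1$, where $a_1$ is an $m \times 1$ vector of coefficients. The new vector $y_1$ ($n \times 1$) is a linear combination of the columns of the original data matrix $X$ ($n \times m$). The variance of the new variable $y_1$ can be computed by:
$$\operatorname{var}(y_1) = a_1^T \Sigma a_1,$$
where $\Sigma$ is the covariance matrix of $X$. Define a second new variable, $y_2 = X a_2$. Then the covariance of $y_1$ and $y_2$ is:
$$\operatorname{cov}(y_1, y_2) = a_1^T \Sigma a_2.$$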
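As a quick numerical check of the two results $\operatorname{var}(X a_1) = a_1^T \Sigma a_1$ and $\operatorname{cov}(X a_1, X a_2) = a_1^T \Sigma a_2$, the following is a minimal sketch in Python with NumPy. The random data and the coefficient vectors are made up for illustration and are not part of the lecture.

```python
import numpy as np

# Hypothetical data: n = 200 observations (rows), m = 3 variables (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Arbitrary coefficient vectors, chosen only for illustration.
a1 = np.array([0.5, -1.0, 2.0])
a2 = np.array([1.0, 1.0, 0.0])

# m x m sample covariance matrix of X (columns are the variables).
Sigma = np.cov(X, rowvar=False)

# New variables formed as linear combinations of the columns of X.
y1 = X @ a1
y2 = X @ a2

# Variance of y1: computed directly and through the quadratic form.
var_direct = np.var(y1, ddof=1)
var_formula = a1 @ Sigma @ a1

# Covariance of y1 and y2: computed directly and through the bilinear form.
cov_direct = np.cov(y1, y2)[0, 1]
cov_formula = a1 @ Sigma @ a2

print(var_direct, var_formula)
print(cov_direct, cov_formula)
```

Both pairs of printed numbers should agree up to floating-point rounding, because the identities hold exactly for the sample covariance matrix.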
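Let $W$ be a matrix of coefficients, where each column of $W$ contains a vector of coefficients, i.e., $W = [a_1, a_2, \ldots]$, and let $Y$ be the matrix of new variables, i.e.,
$$Y = [y_1, y_2, \ldots] = XW.$$
The covariance matrix of $Y$ ($\Sigma_Y$) can be computed by:
$$\Sigma_Y = W^T \Sigma W.$$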
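The matrix version, $\Sigma_Y = W^T \Sigma W$ with $Y = XW$, can be checked the same way. This is only a sketch: the data and the two-column coefficient matrix `W` are arbitrary values assumed for illustration.

```python
import numpy as np

# Hypothetical data: n = 100 observations, m = 4 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))

# Arbitrary coefficient matrix: each of its 2 columns is one coefficient vector.
W = rng.normal(size=(4, 2))

# New variables, one per column of W.
Y = X @ W

Sigma = np.cov(X, rowvar=False)            # covariance matrix of X (4 x 4)
Sigma_Y_direct = np.cov(Y, rowvar=False)   # covariance matrix of Y (2 x 2)
Sigma_Y_formula = W.T @ Sigma @ W          # the same matrix via W^T Sigma W

print(np.allclose(Sigma_Y_direct, Sigma_Y_formula))
```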
Now we turn our attention to the eigenvalues and eigenvectors of a matrix. Recall from the previous lectures that a diagonalizable matrix $M$ can be decomposed into an eigenvector matrix $U$ and a diagonal eigenvalue matrix $D$:
$$M = U D U^{-1}$$
Applying this relationship to the covariance matrix of $X$:
$$\Sigma = U D U^{-1},$$
where the columns of $U$ contain the eigenvectors of $\Sigma$ and the diagonal elements of $D$ contain the eigenvalues of $\Sigma$. Recall that $\Sigma$ is a symmetric matrix.
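A minimal sketch of this decomposition in NumPy follows. It assumes `numpy.linalg.eigh`, a routine intended for symmetric matrices, as the eigen-solver; the lecture itself does not name a function, and the data are made up.

```python
import numpy as np

# Hypothetical data: n = 150 observations, m = 3 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))

Sigma = np.cov(X, rowvar=False)

# eigh returns the eigenvalues in ascending order and the matching
# eigenvectors as the columns of U.
eigenvalues, U = np.linalg.eigh(Sigma)
D = np.diag(eigenvalues)

# Rebuild Sigma from its eigenvectors and eigenvalues: Sigma = U D U^{-1}.
Sigma_rebuilt = U @ D @ np.linalg.inv(U)

print(np.allclose(Sigma, Sigma.T))          # Sigma is symmetric
print(np.allclose(Sigma_rebuilt, Sigma))    # the decomposition reproduces Sigma
print(np.allclose(Sigma @ U, U @ D))        # each column u_i satisfies Sigma u_i = lambda_i u_i
```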
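On the side:
Some nice properties of symmetric matrices (proofs are omitted): the eigenvalues of a real symmetric matrix are real, and its eigenvectors can be chosen to be orthonormal, so that $U^T U = I$ and $U^{-1} = U^T$. More details can be found in Schott (1997) and Rencher (1995).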
Applying these results to the covariance matrix of $X$:
$$\Sigma = U D U^T.$$
Premultiplying it by $U^T$ and postmultiplying it by $U$, and using $U^T U = I$:
$$U^T \Sigma U = U^T U D U^T U = D.$$
Choosing $W = U$, so that $Y = XU$, the covariance matrix of $Y$ (the new variables formed as linear combinations of the original $X$) is $\Sigma_Y = U^T \Sigma U = D$, a diagonal matrix whose diagonal elements are the eigenvalues of the covariance matrix of the original $X$. Because $\Sigma_Y$ is a diagonal matrix, the new $y$ vectors are uncorrelated. In other words (interpreting the projection geometrically), the projections of the original data onto the new axes (called scores) are uncorrelated.
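The claim that the scores are uncorrelated can be illustrated numerically. In the sketch below, the mixing matrix `A` is a hypothetical device used only to give the columns of `X` some correlation, and `numpy.linalg.eigh` is an assumed choice of eigen-solver.

```python
import numpy as np

# Hypothetical correlated data: independent noise mixed by an arbitrary matrix A.
rng = np.random.default_rng(3)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, -0.5, 0.3]])
X = rng.normal(size=(300, 3)) @ A

Sigma = np.cov(X, rowvar=False)
eigenvalues, U = np.linalg.eigh(Sigma)   # columns of U are eigenvectors of Sigma

# Scores: projection of the data onto the eigenvector axes (Y = X U).
scores = X @ U

# Covariance matrix of the scores; it should equal D = diag(eigenvalues).
Sigma_Y = np.cov(scores, rowvar=False)
off_diagonal = Sigma_Y - np.diag(np.diag(Sigma_Y))

print(np.round(Sigma, 3))      # original variables are correlated
print(np.round(Sigma_Y, 3))    # scores: diagonal up to rounding
```

Subtracting the column means from `X` before projecting would shift the scores but leave their covariance matrix unchanged.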
Some details:
The sample covariance matrix of $X$ can be found either by using a built-in function or by computing the following:
$$\Sigma = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X}),$$
where $\bar{X}$ is the $n \times m$ matrix of column means of $X$, i.e., every entry in the $i$-th column of $\bar{X}$ equals the $i$-th column mean of $X$, and $n$ is the sample size (the number of rows in $X$).
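Both routes can be compared in a few lines. The sketch below assumes NumPy's `np.cov` as the built-in function and the $n - 1$ divisor for the manual computation (the divisor `np.cov` uses by default); the data are made up for illustration.

```python
import numpy as np

# Hypothetical data: n = 50 observations, m = 3 variables.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
n = X.shape[0]

# n x m matrix of column means: every row repeats the vector of column means,
# so column i holds the mean of column i of X in every entry.
X_bar = np.tile(X.mean(axis=0), (n, 1))

# Manual computation from the centered data.
X_centered = X - X_bar
Sigma_manual = X_centered.T @ X_centered / (n - 1)

# Built-in computation (rowvar=False: columns are the variables).
Sigma_builtin = np.cov(X, rowvar=False)

print(np.allclose(Sigma_manual, Sigma_builtin))
```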
There is an excellent explanation of PCA in Rencher (1995; Chapter 12).