Multivariate Analyses
1 March 2002
Canonical Correlation Analysis
In Principal Component Analysis and Discriminant Analysis, we had one group of data with multiple variables. To examine relationships between two groups of data, each containing multiple variables, we use canonical correlation analysis (CCA).
Canonical correlation analysis is concerned with the amount of linear relationship between two sets of variables (Rencher 1995). It provides a measure of the overall correlation between the two sets.
Nuts and bolts of CCA (see Rencher 1995 for more details):
(All of the following math can also be done with covariance matrices in place of correlation matrices.)
We have two sets of variables, containing m1 and m2 variables respectively, measured on the same n observations. Let m1 ≥ m2. Then the dimension of the data matrix is n x (m1 + m2). We now partition the data matrix into the following:
\[ X = [\, X_1 \mid X_2 \,], \]
where X1: n x m1 and X2: n x m2. We then compute the sample correlation matrix of this data matrix:
\[ C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}. \]
Notice that this is simply the full (m1 + m2) x (m1 + m2) correlation matrix, partitioned into blocks. You need to know where to draw the lines: after the first m1 rows and after the first m1 columns. In this example, C11 is m1 x m1, C12 is m1 x m2, C21 is m2 x m1, and C22 is m2 x m2.
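The partitioning can be sketched in a few lines of Python with NumPy. This is only an illustration under assumptions: the function name partition_correlation and the simulated data are hypothetical and not part of the original notes.

```python
import numpy as np

def partition_correlation(X1, X2):
    """Return the blocks C11, C12, C21, C22 of the correlation matrix of [X1 | X2].

    Hypothetical helper for illustration; X1 is n x m1 and X2 is n x m2.
    """
    m1 = X1.shape[1]
    X = np.hstack([X1, X2])              # n x (m1 + m2) data matrix
    C = np.corrcoef(X, rowvar=False)     # (m1 + m2) x (m1 + m2) correlation matrix
    # Draw the lines after the first m1 rows and the first m1 columns.
    C11 = C[:m1, :m1]
    C12 = C[:m1, m1:]
    C21 = C[m1:, :m1]
    C22 = C[m1:, m1:]
    return C11, C12, C21, C22

# Simulated data (an assumption): n = 50, m1 = 3, m2 = 2.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 3))
X2 = rng.normal(size=(50, 2))
C11, C12, C21, C22 = partition_correlation(X1, X2)
print(C11.shape, C12.shape, C21.shape, C22.shape)
```

Because the full correlation matrix is symmetric, C21 is the transpose of C12, so only three of the four blocks carry distinct information.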
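We then calculate the measure of association between the two sets:
\[ M_1 = C_{11}^{-1} C_{12} C_{22}^{-1} C_{21} \quad \text{and} \quad M_2 = C_{22}^{-1} C_{21} C_{11}^{-1} C_{12}. \]
The positive square roots of the eigenvalues (λ) of M1 and M2 are called canonical correlations. Note: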
The nonzero eigenvalues of M1 and M2 are the same, but the eigenvectors are not. (When m1 > m2, the larger of the two matrices has m1 - m2 additional zero eigenvalues.) The largest eigenvalue, which is the square of the first canonical correlation, is the best overall measure of association between a linear combination of the first set and a linear combination of the second set. The eigenvectors of the two matrices provide the coefficients (or weights) for these linear combinations, one matrix for each set of variables. The other eigenvalues provide measures of supplemental dimensions of linear relationship between the two sets.
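As a rough numerical illustration of the steps above, the following Python sketch computes canonical correlations from the partitioned correlation matrix. It assumes the standard forms M1 = C11^{-1} C12 C22^{-1} C21 and M2 = C22^{-1} C21 C11^{-1} C12, and m1 ≥ m2; the function name canonical_correlations and the simulated data are hypothetical.

```python
import numpy as np

def canonical_correlations(X1, X2):
    """Canonical correlations and coefficient vectors (assumes m1 >= m2)."""
    m1, m2 = X1.shape[1], X2.shape[1]
    C = np.corrcoef(np.hstack([X1, X2]), rowvar=False)
    C11, C12 = C[:m1, :m1], C[:m1, m1:]
    C21, C22 = C[m1:, :m1], C[m1:, m1:]
    # solve(A, B) computes A^{-1} B without forming the inverse explicitly.
    M1 = np.linalg.solve(C11, C12) @ np.linalg.solve(C22, C21)   # m1 x m1
    M2 = np.linalg.solve(C22, C21) @ np.linalg.solve(C11, C12)   # m2 x m2
    lam1, A = np.linalg.eig(M1)
    lam2, B = np.linalg.eig(M2)
    # Sort eigenvalues in decreasing order; keep the m2 leading ones of M1.
    order1 = np.argsort(lam1.real)[::-1][:m2]
    order2 = np.argsort(lam2.real)[::-1]
    # Canonical correlations are the positive square roots of the eigenvalues.
    r = np.sqrt(np.clip(lam2.real[order2], 0.0, 1.0))
    return r, A[:, order1].real, B[:, order2].real

# Simulated data (an assumption): X2 is built to be related to X1.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(200, 3))
X2 = X1[:, :2] + 0.5 * rng.normal(size=(200, 2))
r, A, B = canonical_correlations(X1, X2)

# The coefficients apply to standardized variables because C is a correlation matrix.
Z1 = (X1 - X1.mean(axis=0)) / X1.std(axis=0)
Z2 = (X2 - X2.mean(axis=0)) / X2.std(axis=0)
u, v = Z1 @ A[:, 0], Z2 @ B[:, 0]
print("canonical correlations:", r)
print("|corr(u, v)| for the first pair:", abs(np.corrcoef(u, v)[0, 1]))
```

The absolute value is taken in the last line because eigenvectors are only determined up to sign, so the correlation between the first pair of linear combinations may come out as plus or minus the first canonical correlation.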
Two Properties of Canonical Correlations (Rencher 1996)
To test hypotheses about the canonical correlations, we may use a non-parametric bootstrap.
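The notes do not say which bootstrap scheme is intended, so the following Python sketch shows only one possibility: resample observations (rows) with replacement, recompute the first canonical correlation each time, and take a percentile interval. The function names, the number of resamples, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def first_canonical_correlation(X1, X2):
    """Largest canonical correlation between the columns of X1 and X2."""
    m1 = X1.shape[1]
    C = np.corrcoef(np.hstack([X1, X2]), rowvar=False)
    C11, C12 = C[:m1, :m1], C[:m1, m1:]
    C21, C22 = C[m1:, :m1], C[m1:, m1:]
    M2 = np.linalg.solve(C22, C21) @ np.linalg.solve(C11, C12)
    lam = np.linalg.eigvals(M2).real.max()
    return float(np.sqrt(np.clip(lam, 0.0, 1.0)))

def bootstrap_ci(X1, X2, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the first canonical correlation."""
    rng = np.random.default_rng(seed)
    n = X1.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample rows with replacement; the same rows are used for X1 and X2
        # so that each observation's two sets of variables stay paired.
        idx = rng.integers(0, n, size=n)
        stats[b] = first_canonical_correlation(X1[idx], X2[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Simulated data (an assumption): X2 is built to be related to X1.
rng = np.random.default_rng(2)
X1 = rng.normal(size=(100, 3))
X2 = X1[:, :2] + 0.5 * rng.normal(size=(100, 2))
lower, upper = bootstrap_ci(X1, X2, n_boot=500)
print("first canonical correlation:", first_canonical_correlation(X1, X2))
print("95% percentile interval:", (lower, upper))
```

Resampling whole rows keeps the dependence between the two sets intact, which is what the interval is meant to describe. A test of no association between the sets would need a different resampling scheme, for example one that breaks the pairing between X1 and X2.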