# Questions on Interpreting Factor Analysis Results and Scores

I am trying to learn factor analysis and I thought it would be a good idea to try and very poorly "mimic" the computation for IQ scores with a dataset of dummy values as a way to "learn by example".

To start off, this is what I intend to do, and I don't know if this methodology is correct or not: I have the loadings for that factor determined. Now that I have the loadings, I want to generate a score for each of the samples. That will leave me with a population of scores that I can then standardize around a mean of 100. From there I would plot a normal distribution. Whenever I get a new sample, I can then generate a score for it and see where it falls on the distribution.

To get my results, I am using Python's Sklearn library, specifically the `FactorAnalysis` class. I noticed that the `FactorAnalysis` class has a `score_samples()` method. The output score for each sample is the log-likelihood of the sample.

Here are some of the questions I have:

• Is my approach in generating a distribution based on the samples' factor scores flawed? How do they do it in practice?

• Is the log-likelihood of a sample even an appropriate score to use? (If not, what alternative ways are there to score a sample?)

• I have gone ahead and generated the scores using the `score_samples()` method for all the samples, but they range between -4 and -49. Is there a reason they would be negative?

• If you are only looking for 1 latent factor, is it good practice to set the number of factors to 1 or should you leave it unspecified anyways?

Here are the loadings if I set the number of factors to 1:

| | Factor 1 |
| --- | --- |
| variable 1 | 0.082558 |
| variable 2 | 0.107940 |
| variable 3 | 0.199645 |
| variable 4 | 0.612495 |
| variable 5 | 0.623707 |

Here are the loadings if I do not specify the number of factors:

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
| --- | --- | --- | --- | --- | --- |
| variable 1 | 0.263914 | 0.426346 | -0.012893 | -0.0 | 0.0 |
| variable 2 | 0.297078 | 0.415269 | -0.002193 | 0.0 | -0.0 |
| variable 3 | 0.243590 | -0.005131 | 0.085178 | -0.0 | -0.0 |
| variable 4 | 0.487537 | -0.224135 | -0.019501 | -0.0 | -0.0 |
| variable 5 | 0.484462 | -0.248173 | -0.008902 | 0.0 | 0.0 |

Is my approach in generating a distribution based on the samples' factor scores flawed? How do they do it in practice?

I found this somewhat difficult to follow. But in general, you should be able to approximate a set of test scores using a multivariate normal distribution where the covariance matrix implies positive correlations between all tests. Some might be larger and some smaller, but the idea is that all ability tests are correlated. And general mental ability can be estimated as the first unrotated factor that results from such tests.

Is the log-likelihood of a sample even an appropriate score to use? (If not, what alternative ways are there to score a sample?)

This sounds more like how you evaluate a model, e.g., how you evaluate a factor-analytic solution. In general, saved factor scores will be a weighted composite of the scores on the component tests.

In R, you can use `factanal`:

``factanal(x, factors, data = NULL, covmat = NULL, n.obs = NA, subset, na.action, start = NULL, scores = c("none", "regression", "Bartlett"), rotation = "varimax", control = NULL, ...)``

See the `scores` argument. There are a few different methods.
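For the Python side of the question, a rough scikit-learn counterpart might look like the sketch below. It assumes synthetic data generated from a single latent factor (the loadings and noise level are made up for illustration). `transform()` returns the estimated factor scores, whereas `score_samples()` returns each sample's log-likelihood under the fitted model.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data: 500 samples, 5 observed variables driven by one latent factor.
# The loadings and noise scale are illustrative assumptions, not real test data.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
true_loadings = np.array([[0.9, 0.8, 0.7, 0.6, 0.5]])
X = latent @ true_loadings + rng.normal(scale=0.5, size=(500, 5))

fa = FactorAnalysis(n_components=1, random_state=0).fit(X)

# Factor scores: the estimated position of each sample on the latent factor.
factor_scores = fa.transform(X)[:, 0]

# Log-likelihood of each sample under the fitted model: a measure of fit,
# not a position on the factor.
log_likelihoods = fa.score_samples(X)

print(fa.components_.round(2))      # estimated loadings, shape (1, 5)
print(factor_scores[:5].round(2))
print(log_likelihoods[:5].round(2))
```

Log-likelihoods are logs of probability densities, which are often below 1, so negative values are expected and say nothing about a sample's standing on the factor.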

I have gone ahead and generated the scores using the score_samples() method for all the samples, but they range between -4 and -49. Is there a reason they would be negative?

I don't know Python. But in general, saved factor scores are typically scaled so that they are z-scores (i.e., mean = 0, sd = 1).
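Once scores are on a z-score scale, moving them to an IQ-style scale is a linear rescaling. The sketch below uses made-up raw factor scores and assumes a standard deviation of 15, which is a common convention but is not stated in the question. A new sample is scored against the mean and SD of the original norming sample.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical raw factor scores for a norming sample (illustrative values).
raw_scores = [-1.2, -0.4, 0.0, 0.3, 0.5, 0.8, -0.9, 1.1, -0.2, 0.0]

mu = mean(raw_scores)
sigma = pstdev(raw_scores)

def to_scaled(raw, center=100.0, spread=15.0):
    """Convert a raw factor score to a scale with the given mean and SD."""
    z = (raw - mu) / sigma
    return center + spread * z

scaled = [to_scaled(r) for r in raw_scores]

# A new sample is rescaled with the norming sample's mean and SD,
# then located on the normal curve as a percentile.
new_scaled = to_scaled(0.6)
percentile = NormalDist(mu=100.0, sigma=15.0).cdf(new_scaled) * 100

print(round(mean(scaled), 6))
print(round(new_scaled, 1), round(percentile, 1))
```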

If you are only looking for 1 latent factor, is it good practice to set the number of factors to 1 or should you leave it unspecified anyways?

You need to either extract only one factor or ensure that you apply no rotation to the extracted factors. Without a rotation, the first factor will be equivalent to the factor you would get by extracting just one. If you rotate, variation will be partitioned across the extracted factors.

## Communalities

The next item in the output is a table of communalities, which shows how much of the variance in each variable has been accounted for by the extracted factors. A communality should be more than 0.5 for the variable to be considered for further analysis; otherwise the variable is removed from the further steps of the factor analysis. For instance, over 90% of the variance in “Quality of product” is accounted for, while 73.5% of the variance in “Availability of product” is accounted for (Table 4).
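As a rough illustration of how a communality is obtained, the sketch below sums the squared loadings of each variable across factors. It reuses the loadings from the unspecified-factor solution in the question above (the two all-zero columns are dropped) and assumes uncorrelated factors and standardized variables, which may not hold for that particular fit.

```python
# Loadings on the first three factors from the question's second solution.
loadings = {
    "variable 1": [0.263914, 0.426346, -0.012893],
    "variable 2": [0.297078, 0.415269, -0.002193],
    "variable 3": [0.243590, -0.005131, 0.085178],
    "variable 4": [0.487537, -0.224135, -0.019501],
    "variable 5": [0.484462, -0.248173, -0.008902],
}

def communality(row):
    """Sum of squared loadings across (uncorrelated) factors."""
    return sum(value * value for value in row)

communalities = {name: communality(row) for name, row in loadings.items()}

for name, h2 in communalities.items():
    note = "" if h2 >= 0.5 else "  (below the 0.5 guideline)"
    print(f"{name}: {h2:.3f}{note}")
```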

## How To Calculate an Index Score from a Factor Analysis

One common reason for running Principal Component Analysis (PCA) or Factor Analysis (FA) is variable reduction.

In other words, you may start with a 10-item scale meant to measure something like Anxiety, which is difficult to accurately measure with a single question.

You could use all 10 items as individual variables in an analysis–perhaps as predictors in a regression model.

But you’d end up with a mess.

Not only would you have trouble interpreting all those coefficients, but you’re likely to have multicollinearity problems.

And most importantly, you’re not interested in the effect of each of those individual 10 items on your outcome. You’re interested in the effect of Anxiety as a whole.

So we turn to a variable reduction technique like FA or PCA to turn 10 related variables into one that represents the construct of Anxiety.

FA and PCA have different theoretical underpinnings and assumptions and are used in different situations, but the processes are very similar. We’ll use FA here for this example.

So let’s say you have successfully come up with a good factor analytic solution, and have found that indeed, these 10 items all represent a single factor that can be interpreted as Anxiety. There are two similar, but theoretically distinct ways to combine these 10 items into a single index.

### Factor Scores

Part of the Factor Analysis output is a table of factor loadings. Each item’s loading represents how strongly that item is associated with the underlying factor.

Some loadings will be so low that we would consider that item unassociated with the factor and we wouldn’t want to include it in the index.

But even among items with reasonably high loadings, the loadings can vary quite a bit. If those loadings are very different from each other, you’d want the index to reflect that each item has an unequal association with the factor.

One approach to combining items is to calculate an index variable via an optimally-weighted linear combination of the items, called the Factor Scores. Each item’s weight is derived from its factor loading. So each item’s contribution to the factor score depends on how strongly it relates to the factor.

Factor scores are essentially a weighted sum of the items. Because those weights are all between -1 and 1, the scale of the factor scores will be very different from a pure sum. I find it helpful to think of factor scores as standardized weighted averages.

### Factor-Based Scores

The second, simpler approach is to calculate the linear combination ignoring weights. Either a sum or an average works, though averages have the advantage of being on the same scale as the items.

In this approach, you’re running the Factor Analysis simply to determine which items load on each factor, then combining the items for each factor.

The technical name for this new variable is a factor-based score.

Factor-based scores only make sense in situations where the loadings are all similar. In that case, the weights wouldn’t have done much anyway.
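The contrast between the two approaches can be sketched with a few lines of Python. The responses and loadings below are hypothetical. Using the raw loadings as weights is also a simplification: statistical software derives factor-score weights from the loadings together with the item correlations.

```python
from statistics import mean, pstdev

# Hypothetical responses: rows are respondents, columns are 4 items (1-5 scale).
responses = [
    [4, 5, 4, 3],
    [2, 1, 2, 3],
    [5, 4, 5, 4],
    [3, 3, 2, 2],
    [1, 2, 1, 3],
]
# Hypothetical loadings of the 4 items on a single factor.
loadings = [0.80, 0.75, 0.70, 0.30]

def standardize(column):
    """Return the column as z-scores."""
    m, s = mean(column), pstdev(column)
    return [(value - m) / s for value in column]

# Factor-based score: plain average of the raw items.
factor_based = [mean(row) for row in responses]

# Weighted composite: standardized items weighted by their loadings.
z_columns = [standardize(col) for col in zip(*responses)]
z_rows = list(zip(*z_columns))
weighted = [
    sum(w * z for w, z in zip(loadings, row)) / sum(loadings)
    for row in z_rows
]

for fb, ws in zip(factor_based, weighted):
    print(f"factor-based: {fb:.2f}   weighted: {ws:+.2f}")
```

The fourth item, with its low loading, contributes equally to the plain average but only weakly to the weighted composite.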

### Which Scores to Use?

It’s never wrong to use Factor Scores. If the factor loadings are very different, they’re a better representation of the factor. And all software will save and add them to your data set quickly and easily.

There are two advantages of Factor-Based Scores. First, they’re generally more intuitive. A non-research audience can easily understand an average of items better than a standardized optimally-weighted linear combination.

Second, you don’t have to worry about weights differing across samples. Factor loadings should be similar in different samples, but they won’t be identical. This will affect the actual factor scores, but won’t affect factor-based scores.

But before you use factor-based scores, make sure that the loadings really are similar. Otherwise you can be misrepresenting your factor.

## Total variance explained

An eigenvalue reflects the amount of variance accounted for by a factor; the eigenvalues of all factors should sum to the number of items subjected to factor analysis. The next item shows all the factors extractable from the analysis along with their eigenvalues.

The Eigenvalue table has been divided into three sub-sections, i.e. Initial Eigenvalues, Extraction Sums of Squared Loadings and Rotation of Sums of Squared Loadings. For analysis and interpretation purposes we are only concerned with Extraction Sums of Squared Loadings. Notice that the first factor accounts for 46.367% of the variance, the second 18.471% and the third 17.013%. All the remaining factors are not significant (Table 5).

1. Component: As can be seen in the Communalities table 3 above, there are 8 components shown in column 1 under table 3.
2. Initial Eigenvalues Total: Total variance.
3. Initial Eigenvalues % of variance: The percent of variance attributable to each factor.
4. Initial Eigenvalues Cumulative %: Cumulative variance of the factor when added to the previous factors.
6. Extraction Sums of Squared Loadings % of variance: The percent of variance attributable to each factor after extraction. This value is of significance to us, and therefore we determine in this step that there are three factors which contribute towards why someone would buy a particular product.
7. Extraction Sums of Squared Loadings Cumulative %: Cumulative variance of the factor when added to the previous factors after extraction.
8. Rotation of Sums of Squared Loadings Total: Total variance after rotation.
9. Rotation of Sums of Squared Loadings % of variance: The percent of variance attributable to each factor after rotation.
10. Rotation of Sums of Squared Loadings Cumulative %: Cumulative variance of the factor when added to the previous factors.
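The link between eigenvalues, percent of variance and cumulative percent in the list above can be sketched as follows. This assumes the analysis was run on standardized items, so that total variance equals the number of items; the eigenvalues are back-calculated from the percentages quoted in the text rather than taken from the original output.

```python
# Percent of variance quoted above for the three retained factors,
# and the number of items (components) in that analysis.
pct_variance = [46.367, 18.471, 17.013]
n_items = 8

# With standardized items each item contributes 1 unit of variance, so
# percent of variance = eigenvalue / n_items * 100. Invert that relation.
eigenvalues = [pct / 100 * n_items for pct in pct_variance]

# Cumulative % is a running total of the % of variance column.
cumulative = []
running = 0.0
for pct in pct_variance:
    running += pct
    cumulative.append(running)

for i, (ev, pct, cum) in enumerate(zip(eigenvalues, pct_variance, cumulative), 1):
    print(f"Factor {i}: eigenvalue {ev:.3f}, {pct:.3f}% of variance, {cum:.3f}% cumulative")
```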

## Factor Analysis: A Short Introduction, Part 2–Rotations

An important feature of factor analysis is that the axes of the factors can be rotated within the multidimensional variable space. What does that mean?

Here is, in simple terms, what a factor analysis program does while determining the best fit between the variables and the latent factors: Imagine you have 10 variables that go into a factor analysis.

The program looks first for the strongest correlations between variables and the latent factor, and makes that Factor 1. Visually, one can think of it as an axis (Axis 1).

The factor analysis program then looks for the second set of correlations and calls it Factor 2, and so on.

Sometimes, the initial solution results in strong correlations of a variable with several factors or in a variable that has no strong correlations with any of the factors.

In order to make the location of the axes fit the actual data points better, the program can rotate the axes. Ideally, the rotation will make the factors more easily interpretable.

Here is a visual of what happens during a rotation when you only have two dimensions (x- and y-axis):

The original x- and y-axes are in black. During the rotation, the axes move to a position that encompasses the actual data points better overall.

Programs offer many different types of rotations. An important difference between them is that they can create factors that are correlated or uncorrelated with each other.

Rotations that allow for correlation are called oblique rotations; rotations that assume the factors are not correlated are called orthogonal rotations. Our graph shows an orthogonal rotation.

Once again, let’s explore indicators of wealth.

Let’s imagine the orthogonal rotation did not work out as well as previously shown. Instead, we get this result:

| Variables | Factor 1 | Factor 2 |
| --- | --- | --- |
| Income | 0.63 | 0.14 |
| Education | 0.47 | 0.24 |
| Occupation | 0.45 | 0.22 |
| House value | 0.39 | 0.25 |
| Number of public parks in neighborhood | 0.12 | 0.20 |
| Number of violent crimes per year | 0.21 | 0.18 |

Since our first attempt was an orthogonal rotation, we specified that Factor 1 and 2 are not correlated.

But it makes sense to assume that a person with a high “Individual socioeconomic status” (Factor 1) also lives in an area that has a high “Neighborhood socioeconomic status” (Factor 2). That means the factors should be correlated.

Consequently, the two axes of the two factors are probably closer together than an orthogonal rotation can make them. Here is a display of the oblique rotation of the axes for our new example, in which the factors are correlated with each other:

Clearly, the angle between the two factors is now smaller than 90 degrees, meaning the factors are now correlated. In this example, an oblique rotation accommodates the data better than an orthogonal rotation.
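To make the idea of rotating the axes concrete, here is a minimal sketch of an orthogonal rotation applied to the two-factor loadings from the table above. The 20-degree angle is an arbitrary choice for illustration; real programs pick the angle by optimizing a criterion such as varimax. An orthogonal rotation redistributes each variable's loadings across the factors without changing the sum of its squared loadings.

```python
import math

# Loadings (Factor 1, Factor 2) from the example table.
loadings = {
    "Income": (0.63, 0.14),
    "Education": (0.47, 0.24),
    "Occupation": (0.45, 0.22),
    "House value": (0.39, 0.25),
    "Number of public parks in neighborhood": (0.12, 0.20),
    "Number of violent crimes per year": (0.21, 0.18),
}

def rotate(pair, degrees):
    """Rotate one variable's pair of loadings by the given angle."""
    theta = math.radians(degrees)
    f1, f2 = pair
    return (
        f1 * math.cos(theta) + f2 * math.sin(theta),
        -f1 * math.sin(theta) + f2 * math.cos(theta),
    )

angle = 20  # arbitrary angle, chosen only for illustration
rotated = {name: rotate(pair, angle) for name, pair in loadings.items()}

for name, (f1, f2) in rotated.items():
    print(f"{name}: {f1:.2f}, {f2:.2f}")
```

An oblique rotation would instead move the two axes by different angles, which is what allows the factors to become correlated.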

## Factor Analysis

Factor analysis is a multivariate technique designed to analyze correlations among many observed variables and to explore latent factors. This chapter provides an overview of the evolution of factor analysis since the early 20th century and a review of applied research in various fields. Today, factor analysis is widely used not only in the field of psychology but also in fields such as politics, literature, biology, and medical science. For example, in anthropology, morphological knowledge has been obtained through the factor analysis of correlations among the measured traits of human bones and the factor analysis of measured traits of animals and plants. The chapter introduces the factor analysis model and deals with statistical inference in factor analysis. Formulae for the standard errors of parameter estimates in factor analysis are complicated or may not be expressible in closed form. One of the advantages of the bootstrap methods is that they can be used without analytical derivations. However, caution is needed when using the bootstrap methods in factor analysis. The chapter also covers the various methods of factor rotation and estimation of factor scores.

## Integrating Personality/Character Neuroscience with Network Analysis

### 3.1.1 Factor Analysis

Factor analysis conceptualizes the structure of associations in terms of latent variables or “factors” that give rise to observed, manifested, or measured variables. Factor analysis (and the closely-related principal components analysis) accomplishes this by identifying sets of observed variables that have more in common with each other than with other observed variables in the analysis. Factor analysis begins with a correlation matrix of bivariate associations among observed variables. Conceptually, factor analysis scans the matrix to identify which observed variables go together. It searches for clusters of observed variables that are strongly correlated with each other and that are weakly correlated with observed variables in other clusters. More technically, it extracts factors that account for as much variation in the observed variables as possible.

Exploratory factor analysis can be seen as steps that are often conducted in an iterative, back-and-forth manner: extraction, selection of a number of factors, rotation, and examination of factor loadings and (potentially) factor correlations. The first step involves applying an “extraction method” that identifies combinations of observed variables, and these combinations are called factors. There are several types of extraction methods, but principal axis factor analysis and principal components analysis are the most frequently used. Extraction produces one eigenvalue for each potential factor, with as many potential factors as there are observed variables. A factor’s eigenvalue can be seen as the amount of variance in the observed variables explained by the factor.

In the second step, researchers decide on the number of factors that adequately summarize the relationships between the original variables. The “appropriate” number of factors can be ambiguous, but there are rules-of-thumb to aid in the process. The rules-of-thumb generally depend on the relative magnitudes of the eigenvalues, but information from subsequent steps can be used to inform the decision (e.g., clarity of the factor loadings, see step 4).

In the third step, researchers usually use a “rotation” to clarify the psychological meaning of the factors. Rotation is intended to produce simple structure, a pattern of associations in which each observed variable associates strongly with (i.e., “loads on”) one factor and only one factor. There are two general types of rotation: orthogonal rotation generates factors that are uncorrelated, and oblique rotation generates factors that can be correlated with each other.

Fourth, researchers draw psychological conclusions based on key statistical outcomes, primarily factor loadings and (if relevant) interfactor correlations. Factor loadings are values representing associations between each observed variable and each factor. By noting which observed variables are most strongly associated with each factor, researchers can interpret the psychological meaning of the factors. There are several types of factor loadings that might be produced, but they are all roughly or literally on a correlational metric of −1 to +1, with values closer to −1 or +1 representing strong associations, and values close to 0 indicating no connection between an observed variable and a factor. Interfactor correlations are obtained when researchers extract more than one factor and implement an oblique rotation, and they reveal the degree to which the dimensions underlying the observed variables are themselves associated with each other.

## I. Exploratory Factor Analysis (EFA)

• Introduction
    1. Motivating example: The SAQ
    2. Pearson correlation formula
    3. Partitioning the variance in factor analysis
• Extracting factors
    1. principal components analysis
    2. common factor analysis
        • principal axis factoring
        • maximum likelihood
• Rotation methods
    1. Simple Structure
    2. Orthogonal rotation (Varimax)
    3. Oblique (Direct Oblimin)

## Factor Analysis

Factor analysis includes both component analysis and common factor analysis. More than other statistical techniques, factor analysis has suffered from confusion concerning its very purpose. This affects my presentation in two ways. First, I devote a long section to describing what factor analysis does before examining in later sections how it does it. Second, I have decided to reverse the usual order of presentation. Component analysis is simpler, and most discussions present it first. However, I believe common factor analysis comes closer to solving the problems most researchers actually want to solve. Thus learning component analysis first may actually interfere with understanding what those problems are. Therefore component analysis is introduced only quite late in this chapter.

## What Factor Analysis Can and Can't Do

### Some Examples of Factor-Analysis Problems

It was an interesting idea, but it turned out to be wrong. Today the College Board testing service operates a system based on the idea that there are at least three important factors of mental ability--verbal, mathematical, and logical abilities--and most psychologists agree that many other factors could be identified as well.

2. Consider various measures of the activity of the autonomic nervous system--heart rate, blood pressure, etc. Psychologists have wanted to know whether, except for random fluctuation, all those measures move up and down together--the "activation" hypothesis. Or do groups of autonomic measures move up and down together, but separate from other groups? Or are all the measures largely independent? An unpublished analysis of mine found that in one data set, at any rate, the data fitted the activation hypothesis quite well.

3. Suppose many species of animal (rats, mice, birds, frogs, etc.) are trained that food will appear at a certain spot whenever a noise--any kind of noise--comes from that spot. You could then tell whether they could detect a particular sound by seeing whether they turn in that direction when the sound appears. Then if you studied many sounds and many species, you might want to know on how many different dimensions of hearing acuity the species vary. One hypothesis would be that they vary on just three dimensions--the ability to detect high-frequency sounds, ability to detect low-frequency sounds, and ability to detect intermediate sounds. On the other hand, species might differ in their auditory capabilities on more than just these three dimensions. For instance, some species might be better at detecting sharp click-like sounds while others are better at detecting continuous hiss-like sounds.

4. Suppose each of 500 people, who are all familiar with different kinds of automobiles, rates each of 20 automobile models on the question, "How much would you like to own that kind of automobile?" We could usefully ask about the number of dimensions on which the ratings differ. A one-factor theory would posit that people simply give the highest ratings to the most expensive models. A two-factor theory would posit that some people are most attracted to sporty models while others are most attracted to luxurious models. Three-factor and four-factor theories might add safety and reliability. Or instead of automobiles you might choose to study attitudes concerning foods, political policies, political candidates, or many other kinds of objects.

5. Rubenstein (1986) studied the nature of curiosity by analyzing the agreements of junior-high-school students with a large battery of statements such as "I like to figure out how machinery works" or "I like to try new kinds of food." A factor analysis identified seven factors: three measuring enjoyment of problem-solving, learning, and reading; three measuring interests in natural sciences, art and music, and new experiences in general; and one indicating a relatively low interest in money.

### The Goal: Understanding of Causes

1. How many different factors are needed to explain the pattern of relationships among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?

### Absolute Versus Heuristic Uses of Factor Analysis

The previous examples can be used to illustrate a useful distinction--between absolute and heuristic uses of factor analysis. Spearman's g theory of intelligence, and the activation theory of autonomic functioning, can be thought of as absolute theories which are or were hypothesized to give complete descriptions of the pattern of relationships among variables. On the other hand, Rubenstein never claimed that her list of the seven major factors of curiosity offered a complete description of curiosity. Rather those factors merely appear to be the most important seven factors--the best way of summarizing a body of data. Factor analysis can suggest either absolute or heuristic models; the distinction is in how you interpret the output.

### Is Factor Analysis Objective?

A similar balancing problem arises in regression and analysis of variance, but it generally doesn't prevent different workers from reaching nearly or exactly the same conclusions. After all, if two workers apply an analysis of variance to the same data, and both workers drop out the terms not significant at the .05 level, then both will report exactly the same effects. However, the situation in factor analysis is very different. For reasons explained later, there is no significance test in component analysis that will test a hypothesis about the number of factors, as that hypothesis is ordinarily understood. In common factor analysis there is such a test, but its usefulness is limited by the fact that it frequently yields more factors than can be satisfactorily interpreted. Thus a worker who wants to report only interpretable factors is still left without an objective test.

A similar issue arises in identifying the nature of the factors. Two workers may each identify 6 factors, but the two sets of factors may differ--perhaps substantially. The travel-writer analogy is useful here too; two writers might each divide the US into 6 regions, but define the regions very differently.

Another geographical analogy may be more parallel to factor analysis, since it involves computer programs designed to maximize some quantifiable objective. Computer programs are sometimes used to divide a state into congressional districts which are geographically contiguous, nearly equal in population, and perhaps homogeneous on dimensions of ethnicity or other factors. Two different district-creating programs might come up with very different answers, though both answers are reasonable. This analogy is in a sense too good; we believe that factor analysis programs usually don't yield answers as different from each other as district-creating programs do.

### Factor Analysis Versus Clustering and Multidimensional Scaling

Another advantage of factor analysis over these other methods is that factor analysis can recognize certain properties of correlations. For instance, if variables A and B each correlate .7 with variable C, and correlate .49 with each other, factor analysis can recognize that A and B correlate zero when C is held constant because $.7^2 = .49$. Multidimensional scaling and cluster analysis have no ability to recognize such relationships, since the correlations are treated merely as generic "similarity measures" rather than as correlations.
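The zero correlation between A and B with C held constant can be checked numerically with the standard first-order partial correlation formula. The function name below is only illustrative.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b with c held constant."""
    numerator = r_ab - r_ac * r_bc
    denominator = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    return numerator / denominator

# A and B each correlate .7 with C and .49 with each other.
explained = partial_corr(0.49, 0.7, 0.7)

# If A and B correlated .60 instead, some association would remain.
leftover = partial_corr(0.60, 0.7, 0.7)

print(round(explained, 10), round(leftover, 3))
```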

We are not saying these other methods should never be applied to correlation matrices; sometimes they yield insights not available through factor analysis. But they have definitely not made factor analysis obsolete. The next section touches on this point.

### Factors "Differentiating" Variables Versus Factors "Underlying" Variables

One possible meaning of the phrase about "differentiating" is that a set of variables all correlate highly with each other but differ in their means. A rather similar meaning can arise in a different case. Consider several tests A, B, C, D which test the same broadly-conceived mental ability, but which increase in difficulty in the order listed. Then the highest correlations among the tests may be between adjacent items in this list ($r_{AB}$, $r_{BC}$ and $r_{CD}$) while the lowest correlation is between items at the opposite ends of the list ($r_{AD}$). Someone who observed this pattern in the correlations among the items might well say the tests "can be put in a simple order" or "differ in just one factor", but that conclusion has nothing to do with factor analysis. This set of tests would not contain just one common factor.

A third case of this sort may arise if variable A affects B, which affects C, which affects D, and those are the only effects linking these variables. Once again, the highest correlations would be $r_{AB}$, $r_{BC}$ and $r_{CD}$ while the lowest correlation would be $r_{AD}$. Someone might use the same phrases just quoted to describe this pattern of correlations; again it has nothing to do with factor analysis.

• Are you above 5 feet 2 inches in height?
• Are you above 5 feet 4 inches in height?
• Are you above 5 feet 6 inches in height?
• Etc.
• Should our nation lower tariff barriers with nation B?
• Should our two central banks issue a single currency?
• Should our armies become one?
• Should we fuse with nation B, becoming one nation?

Applying multidimensional scaling to a correlation matrix could discover all these simple patterns of differences among variables. Thus multidimensional scaling seeks factors which differentiate variables while factor analysis looks for the factors which underlie the variables. Scaling may sometimes find simplicity where factor analysis finds none, and factor analysis may find simplicity where scaling finds none.

## Basic Concepts and Principles

### A Simple Example

Imagine that these are correlations among 5 variables measuring mental ability. Matrix R55 is exactly consistent with the hypothesis of a single common factor g whose correlations with the 5 observed variables are respectively .9, .8, .7, .6, and .5. To see why, consider the formula for the partial correlation between two variables a and b partialing out a third variable g:

$$r_{ab \cdot g} = \frac{r_{ab} - r_{ag}\, r_{bg}}{\sqrt{\left(1 - r_{ag}^2\right)\left(1 - r_{bg}^2\right)}}$$

This formula shows that $r_{ab \cdot g} = 0$ if and only if $r_{ab} = r_{ag}\, r_{bg}$. The requisite property for a variable to function as a general factor g is that any partial correlation between any two observed variables, partialing out g, is zero. Therefore if a correlation matrix can be explained by a general factor g, it will be true that there is some set of correlations of the observed variables with g, such that the product of any two of those correlations equals the correlation between the two observed variables. But matrix R55 has exactly that property. That is, any off-diagonal entry $r_{jk}$ is the product of the jth and kth entries in the row .9 .8 .7 .6 .5. For instance, the entry in row 1 and column 3 is .9 × .7, or .63. Thus matrix R55 exactly fits the hypothesis of a single common factor.
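The product property just described is enough to rebuild a matrix of this kind. The sketch below constructs one from the five correlations with g given in the text, assuming a unit diagonal as in any correlation matrix; it is a reconstruction from the stated property, not a copy of the original table.

```python
# Correlations of the five observed variables with g, as given in the text.
g_correlations = [0.9, 0.8, 0.7, 0.6, 0.5]
n = len(g_correlations)

# Under a single common factor, each off-diagonal correlation is the product
# of the two variables' correlations with g; diagonal entries are 1.
R = [
    [1.0 if j == k else g_correlations[j] * g_correlations[k] for k in range(n)]
    for j in range(n)
]

for row in R:
    print("  ".join(f"{value:.2f}" for value in row))
```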

If we found that pattern in a real correlation matrix, what exactly would we have shown? First, the existence of the factor is inferred rather than observed. We certainly wouldn't have proven that scores on these 5 variables are affected by just one common factor. However, that is the simplest or most parsimonious hypothesis that fits the pattern of observed correlations.

Second, we would have an estimate of the factor's correlation with each of the observed variables, so we can say something about the factor's nature, at least in the sense of what it correlates highly with or doesn't correlate with. In this example the values .9 .8 .7 .6 .5 are these estimated correlations.

Third, we couldn't measure the factor in the sense of deriving each person's exact score on the factor. But we can if we wish use methods of multiple regression to estimate each person's score on the factor from their scores on the observed variables.

Matrix R55 is virtually the simplest possible example of common factor analysis, because the observed correlations are perfectly consistent with the simplest possible factor-analytic hypothesis--the hypothesis of a single common factor. Some other correlation matrix might not fit the hypothesis of a single common factor, but might fit the hypothesis of two or three or four common factors. The fewer factors the simpler the hypothesis. Since simple hypotheses generally have logical scientific priority over more complex hypotheses, hypotheses involving fewer factors are considered to be preferable to those involving more factors. That is, you accept at least tentatively the simplest hypothesis (i.e., involving the fewest factors) that is not clearly contradicted by the set of observed correlations. Like many writers, I'll let m denote the hypothesized number of common factors.

Without getting deeply into the mathematics, we can say that factor analysis attempts to express each variable as the sum of common and unique portions. The common portions of all the variables are by definition fully explained by the common factors, and the unique portions are ideally perfectly uncorrelated with each other. The degree to which a given data set fits this condition can be judged from an analysis of what is usually called the "residual correlation matrix".

The name of this matrix is somewhat misleading because the entries in the matrix are typically not correlations. If there is any doubt in your mind about some particular printout, look for the diagonal entries in the matrix, such as the "correlation" of the first variable with itself, the second with itself, etc. If these diagonal entries are not all exactly 1, then the matrix printed is not a correlation matrix. However, it can typically be transformed into a correlation matrix by dividing each off-diagonal entry by the square roots of the two corresponding diagonal entries. For instance, if the first two diagonal entries are .36 and .64, and the off-diagonal entry in position [1,2] is .3, then the residual correlation is .3/(.6*.8) = 5/8 = .625.
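The rescaling described above is short enough to sketch directly. The function name is illustrative, and the 2×2 matrix holds only the three numbers from the worked example.

```python
import math

def residual_to_correlation(residual):
    """Divide each entry by the square roots of its two diagonal entries."""
    n = len(residual)
    return [
        [residual[j][k] / math.sqrt(residual[j][j] * residual[k][k])
         for k in range(n)]
        for j in range(n)
    ]

# Diagonal entries .36 and .64, off-diagonal entry .3, as in the text.
residual = [
    [0.36, 0.30],
    [0.30, 0.64],
]
corr = residual_to_correlation(residual)

print(round(corr[0][1], 3))
```

The diagonal of the result is 1, and the off-diagonal entry reproduces the .625 worked out in the text.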

Correlations found in this way are the correlations that would have to be allowed among the "unique" portions of the variables in order to make the common portions of the variables fit the hypothesis of m common factors. If these calculated correlations are so high that they are inconsistent with the hypothesis that they are 0 in the population, then the hypothesis of m common factors is rejected. Increasing m always lowers these correlations, thus producing a hypothesis more consistent with the data.

We want to find the simplest hypothesis (that is, the lowest m) consistent with the data. In this respect, a factor analysis can be compared to episodes in scientific history that took decades or centuries to develop. Copernicus realized that the earth and other planets moved around the sun, but he first hypothesized that their orbits were circles. Kepler later realized that the orbits were better described as ellipses. A circle is a simpler figure than an ellipse, so this episode of scientific history illustrates the general point that we start with a simple theory and gradually make it more complex to better fit the observed data.

The same principle can be observed in the history of experimental psychology. In the 1940s, experimental psychologists widely believed that all the basic principles of learning, that might even revolutionize educational practice, could be discovered by studying rats in mazes. Today that view is considered ridiculously oversimplified, but it does illustrate the general scientific point that it is reasonable to start with a simple theory and gradually move to more complex theories only when it becomes clear that the simple theory fails to fit the data.

This general scientific principle can be applied within a single factor analysis. Start with the simplest possible theory (usually m = 1), test the fit between that theory and the data, and then increase m as needed. Each increase in m produces a theory that is more complex but will fit the data better. Stop when you find a theory that fits the data adequately.

Each observed variable's communality is its estimated squared correlation with its own common portion--that is, the proportion of variance in that variable that is explained by the common factors. If you perform factor analyses with several different values of m, as suggested above, you will find that the communalities generally increase with m. But the communalities are not used to choose the final value of m. Low communalities are not interpreted as evidence that the data fail to fit the hypothesis, but merely as evidence that the variables analyzed have little in common with one another. Most factor analysis programs first estimate each variable's communality as the squared multiple correlation between that variable and the other variables in the analysis, then use an iterative procedure to gradually find a better estimate.

Factor analysis may use either correlations or covariances. The covariance $\mathrm{cov}_{jk}$ between two variables numbered j and k is their correlation times their two standard deviations: $\mathrm{cov}_{jk} = r_{jk} s_j s_k$, where $r_{jk}$ is their correlation and $s_j$ and $s_k$ are their standard deviations. A covariance has no very important substantive meaning, but it does have some very useful mathematical properties described in the next section. Since any variable correlates 1 with itself, any variable's covariance with itself is its variance--the square of its standard deviation. A correlation matrix can be thought of as a matrix of variances and covariances (more concisely, a covariance matrix) of a set of variables that have already been adjusted to standard deviations of 1. Therefore I shall often talk about a covariance matrix when I really mean either a correlation or covariance matrix. I will use R to denote either a correlation or covariance matrix of observed variables. This is admittedly awkward, but the matrix analyzed is nearly always a correlation matrix, and as explained later we need the letter C for the common-factor portion of R.
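As a small illustration of the relation between correlations and covariances, the sketch below converts a correlation matrix into a covariance matrix. The function name `corr_to_cov`, the correlation of .5, and the standard deviations 2 and 3 are all made-up values chosen for easy checking.

```python
import numpy as np

def corr_to_cov(corr, sd):
    """Covariance matrix from correlations and standard deviations."""
    corr = np.asarray(corr, dtype=float)
    sd = np.asarray(sd, dtype=float)
    return corr * np.outer(sd, sd)      # cov[j, k] = r[j, k] * s[j] * s[k]

# Hypothetical example: two variables correlating .5, with SDs 2 and 3.
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
sd = np.array([2.0, 3.0])
cov = corr_to_cov(corr, sd)
print(cov)   # diagonal holds the variances 4 and 9; off-diagonal is .5 * 2 * 3
```

Because each variable correlates 1 with itself, the diagonal of the result is simply the squared standard deviations, as the paragraph notes.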

### Matrix Decomposition and Rank

The central theorem of factor analysis is that you can do something similar for an entire covariance matrix. A covariance matrix R can be partitioned into a common portion C which is explained by a set of factors, and a unique portion U unexplained by those factors. In matrix terminology, R = C + U, which means that each entry in matrix R is the sum of the corresponding entries in matrices C and U.

As in analysis of variance with equal cell frequencies, the explained component C can be broken down further. C can be decomposed into component matrices c1, c2, etc., explained by individual factors. Each of these one-factor components cj equals the "outer product" of a column of "factor loadings". The outer product of a column of numbers is the square matrix formed by letting entry jk in the matrix equal the product of entries j and k in the column. Thus if a column has entries .9, .8, .7, .6, .5, as in the earlier example, its outer product is

$$
c_1 = \begin{bmatrix}
.81 & .72 & .63 & .54 & .45 \\
.72 & .64 & .56 & .48 & .40 \\
.63 & .56 & .49 & .42 & .35 \\
.54 & .48 & .42 & .36 & .30 \\
.45 & .40 & .35 & .30 & .25
\end{bmatrix}
$$

Earlier I mentioned the off-diagonal entries in this matrix but not the diagonal entries. Each diagonal entry in a cj matrix is actually the amount of variance in the corresponding variable explained by that factor. In our example, g correlates .9 with the first observed variable, so the amount of explained variance in that variable is $.9^2$ or .81, the first diagonal entry in this matrix.

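In the example there is only one common factor, so matrix C for this example (denoted C55) is C55 = c1. Therefore the residual matrix U for this example (denoted U55) is U55 = R55 - c1. This gives the following matrix for U55:

$$
U_{55} = \begin{bmatrix}
.19 & 0 & 0 & 0 & 0 \\
0 & .36 & 0 & 0 & 0 \\
0 & 0 & .51 & 0 & 0 \\
0 & 0 & 0 & .64 & 0 \\
0 & 0 & 0 & 0 & .75
\end{bmatrix}
$$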

This is the covariance matrix of the portions of the variables unexplained by the factor. As mentioned earlier, all off-diagonal entries in U55 are 0, and the diagonal entries are the amounts of unexplained or unique variance in each variable.
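The decomposition R = C + U for the one-factor example can be reproduced numerically. In this sketch the loadings .9, .8, .7, .6, .5 come from the text; the construction of R55 (off-diagonal entries equal to the products of loadings, diagonal entries equal to 1) is an assumption about the earlier example, made on the grounds that R55 is a correlation matrix fitting a single factor perfectly.

```python
import numpy as np

loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

# One-factor component: entry jk is loading j times loading k.
c1 = np.outer(loadings, loadings)

# Assumed form of R55: same off-diagonal entries as c1, unit diagonal.
R55 = c1.copy()
np.fill_diagonal(R55, 1.0)

U55 = R55 - c1                        # unique portion
print(np.round(c1, 2))
print(np.round(np.diag(U55), 2))      # unique variances, 1 minus squared loading
print(np.linalg.matrix_rank(c1))      # rank of the common portion
```

The diagonal of `c1` holds the explained variances (squared loadings), and `U55` has nothing off the diagonal, which is what a perfect one-factor fit means.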

Often C is the sum of several matrices cj, not just one as in this example. The number of c-matrices which sum to C is the rank of matrix C; in this example the rank of C is 1. The rank of C is the number of common factors in that model. If you specify a certain number m of factors, a factor analysis program then derives two matrices C and U which sum to the original correlation or covariance matrix R, making the rank of C equal m. The larger you set m, the closer C will approximate R. If you set m = p, where p is the number of variables in the matrix, then every entry in C will exactly equal the corresponding entry in R, leaving U as a matrix of zeros. The idea is to see how low you can set m and still have C provide a reasonable approximation to R.

### How Many Cases and Variables?

The rules about number of variables are very different for factor analysis than for regression. In factor analysis it is perfectly okay to have many more variables than cases. In fact, generally speaking the more variables the better, so long as the variables remain relevant to the underlying factors.

### How Many Factors?

Of the two rules that are discussed in this section, the first uses a formal significance test to identify the number of common factors. Let N denote the sample size, p the number of variables, and m the number of factors. Also, $R_U$ denotes the residual matrix U transformed into a correlation matrix, $|R_U|$ is its determinant, and $\ln(1/|R_U|)$ is the natural logarithm of the reciprocal of that determinant.

To apply this rule, first compute $G = N - 1 - (2p+5)/6 - (2/3)m$. Then compute

$$\chi^2 = G \ln(1/|R_U|), \qquad df = \tfrac{1}{2}\left[(p-m)^2 - p - m\right].$$

If it is difficult to compute $\ln(1/|R_U|)$, that expression is often well approximated by $\sum r_U^2$, where the summation denotes the sum of all squared correlations above the diagonal in matrix $R_U$.

To use this formula to choose the number of factors, start with m = 1 (or even with m = 0) and compute this test for successively increasing values of m, stopping when you find nonsignificance; that value of m is the smallest value of m that is not significantly contradicted by the data. The major difficulty with this rule is that, in my experience, with moderately large samples it leads to more factors than can successfully be interpreted.
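A rough sketch of the computation follows. It takes the test statistic to be chi-square = G ln(1/|R_U|) with df = [(p-m)^2 - p - m]/2, the usual Bartlett-corrected form; treat that form as an assumption of this sketch. The function name `factor_count_test` and the residual matrix (five variables, every residual correlation equal to .05) are hypothetical.

```python
import numpy as np

def factor_count_test(resid_corr, n_cases, m):
    """Return (G, chi_square, df, approx_chi_square) for m hypothesized factors."""
    R_U = np.asarray(resid_corr, dtype=float)
    p = R_U.shape[0]
    G = n_cases - 1 - (2 * p + 5) / 6 - (2 / 3) * m
    # slogdet returns (sign, ln|R_U|); ln(1/|R_U|) is the negative log-determinant.
    _, log_det = np.linalg.slogdet(R_U)
    chi_square = G * (-log_det)
    # Approximation from the text: sum of squared correlations above the diagonal.
    approx_chi_square = G * np.sum(np.triu(R_U, k=1) ** 2)
    df = ((p - m) ** 2 - p - m) / 2     # assumed degrees of freedom
    return G, chi_square, df, approx_chi_square

# Hypothetical residual correlation matrix: 5 variables, all residual r = .05.
R_U = np.full((5, 5), 0.05) + 0.95 * np.eye(5)
G, chi_square, df, approx_chi_square = factor_count_test(R_U, n_cases=100, m=1)
print(G, chi_square, df, approx_chi_square)
```

To turn the statistic into a p-value you would compare it with a chi-square distribution on `df` degrees of freedom, for instance with a statistics library of your choice; that step is left out here to keep the sketch dependent on NumPy alone.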

I recommend an alternative approach. This approach was once impractical, but today is well within reach. Perform factor analyses with various values of m, complete with rotation, and choose the one that gives the most appealing structure.

## Rotation

### Linear Functions of Predictors

Now suppose a co-worker suggests summing each student's verbal and math scores to obtain a composite "academic skill" score I'll call AS, and taking the difference between each student's verbal and math scores to obtain a second variable I'll call VMD (verbal-math difference). The co-worker suggests running the same set of regressions to predict grades in individual courses, except using AS and VMD as predictors in each regression, instead of the original verbal and math scores. In this example, you would get exactly the same predictions of course grades from these two families of regressions: one predicting grades in individual courses from verbal and math scores, the other predicting the same grades from AS and VMD scores. In fact, you would get the same predictions if you formed composites of 3 math + 5 verbal and 5 math + 3 verbal, and ran a series of two-variable multiple regressions predicting grades from these two composites. These examples are all linear functions of the original verbal and math scores.

The central point is that if you have m predictor variables, and you replace the m original predictors by m linear functions of those predictors, you generally neither gain nor lose any information--you could if you wish use the scores on the linear functions to reconstruct the scores on the original variables. But multiple regression uses whatever information you have in the optimum way (as measured by the sum of squared errors in the current sample) to predict a new variable (e.g. grades in a particular course). Since the linear functions contain the same information as the original variables, you get the same predictions as before.
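This invariance is easy to check on simulated data. The sketch below is illustrative only: the scores and grades are randomly generated, the helper `fitted_values` is a hypothetical name, and the weighted composites are taken to be 3 math + 5 verbal and 5 math + 3 verbal so that the two composites are distinct linear functions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
verbal = rng.normal(size=n)            # simulated standardized verbal scores
math_score = rng.normal(size=n)        # simulated standardized math scores
grade = 0.6 * verbal + 0.3 * math_score + rng.normal(scale=0.5, size=n)

def fitted_values(predictors, y):
    """Least-squares predictions of y from the given predictors plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

pred_original = fitted_values([verbal, math_score], grade)
pred_as_vmd = fitted_values([verbal + math_score, verbal - math_score], grade)
pred_weighted = fitted_values([3 * math_score + 5 * verbal,
                               5 * math_score + 3 * verbal], grade)

print(np.allclose(pred_original, pred_as_vmd))
print(np.allclose(pred_original, pred_weighted))
```

The regression weights differ from one set of predictors to the next, but the predictions do not, because each pair of composites spans the same space as the original two scores.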

Given that there are many ways to get exactly the same predictions, is there any advantage to using one set of linear functions rather than another? Yes, there is: one set may be simpler than another. One particular pair of linear functions may enable many of the course grades to be predicted from just one variable (that is, one linear function) rather than from two. If we regard regressions with fewer predictor variables as simpler, then we can ask this question: Out of all the possible pairs of predictor variables that would give the same predictions, which is simplest to use, in the sense of minimizing the number of predictor variables needed in the typical regression? The pair of predictor variables maximizing some measure of simplicity could be said to have simple structure. In this example involving grades, you might be able to predict grades in some courses accurately from just a verbal test score, and predict grades in other courses accurately from just a math score. If so, then you would have achieved a "simpler structure" in your predictions than if you had used both tests for all predictions.

### Simple Structure in Factor Analysis

In the extreme case of simple structure, each X-variable will have only one large entry, so that all the others can be ignored. But that would be a simpler structure than you would normally expect to achieve; after all, in the real world each variable isn't normally affected by only one other variable. You then name the factors subjectively, based on an inspection of their loadings.

In common factor analysis the process of rotation is actually somewhat more abstract than I have implied here, because you don't actually know the individual scores of cases on factors. However, the statistics for a multiple regression that are most relevant here--the multiple correlation and the standardized regression slopes--can all be calculated just from the correlations of the variables and factors involved. Therefore we can base the calculations for rotation to simple structure on just those correlations, without using any individual scores.

A rotation which requires the factors to remain uncorrelated is an orthogonal rotation, while others are oblique rotations. Oblique rotations often achieve greater simple structure, though at the cost that you must also consider the matrix of factor intercorrelations when interpreting results. Manuals are generally clear which is which, but if there is ever any ambiguity, a simple rule is that if there is any ability to print out a matrix of factor correlations, then the rotation is oblique, since no such capacity is needed for orthogonal rotations.
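One property of orthogonal rotation can be checked directly: the individual loadings change, but the common portion C and the communalities do not. The sketch below uses a made-up 4 x 2 loading matrix (one row per variable, one column per factor) and an arbitrary 30-degree rotation; it does not implement varimax or any other simple-structure criterion.

```python
import numpy as np

# Hypothetical unrotated loadings: 4 variables (rows) on 2 factors (columns).
loadings = np.array([[0.7, 0.3],
                     [0.6, 0.4],
                     [0.3, 0.7],
                     [0.2, 0.8]])

theta = np.deg2rad(30)                        # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal: T @ T.T = I

rotated = loadings @ T

C_before = loadings @ loadings.T              # common portion before rotation
C_after = rotated @ rotated.T                 # common portion after rotation
communalities_before = (loadings ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)

print(np.allclose(C_before, C_after))
print(np.allclose(communalities_before, communalities_after))
```

Since every orthogonal rotation reproduces the same C, the choice among them can be made purely on how interpretable the rotated loadings look.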

### An Example

Table 1. Oblique Promax rotation of 4 factors of 24 mental ability variables, from Gorsuch (1983)

This table reveals quite a good simple structure. Within each of the four blocks of variables, the high values (above about .4 in absolute value) are generally all in a single column--a separate column for each of the four blocks. Further, the variables within each block all seem to measure the same general kind of mental ability. The major exception to both these generalizations comes in the third block. The variables in that block seem to include measures of both visual ability and reasoning, and the reasoning variables (the last four in the block) generally have loadings in column 3 not far above their loadings in one or more other columns. This suggests that a 5-factor solution might be worth trying, in the hope that it might yield separate "visual" and "reasoning" factors. The factor names in Table 1 were given by Gorsuch, but inspection of the variables in the second block suggests that "simple repetitive tasks" might be a better name for factor 2 than "numerical".

I don't mean to imply that you should always try to make every variable load highly on only one factor. For instance, a test of ability to deal with arithmetic word problems might well load highly on both verbal and mathematical factors. This is actually one of the advantages of factor analysis over cluster analysis, since you cannot put the same variable in two different clusters.

## Principal Component Analysis (PCA)

### Basics

The central concept in PCA is representation or summarization. Suppose we want to replace a large set of variables by a smaller set which best summarizes the larger set. For instance, suppose we have recorded the scores of hundreds of pupils on 30 mental tests, and we don't have the space to store all those scores. (This is a very artificial example in the computer age, but was more appealing before then, when PCA was invented.) For economy of storage we would like to reduce the set to 5 scores per pupil, from which we would like to be able to reconstruct the original 30 scores as accurately as possible.

Let p and m denote respectively the original and reduced number of variables--30 and 5 in the current example. The original variables are denoted X, the summarizing variables F for factor. In the simplest case our measure of accuracy of reconstruction is the sum of p squared multiple correlations between X-variables and the predictions of X made from the factors. In the more general case we can weight each squared multiple correlation by the variance of the corresponding X-variable. Since we can set those variances ourselves by multiplying scores on each variable by any constant we choose, this amounts to the ability to assign any weights we choose to the different variables.

We now have a problem which is well-defined in the mathematical sense: reduce p variables to a set of m linear functions of those variables which best summarize the original p in the sense just described. It turns out, however, that infinitely many linear functions provide equally good summaries. To narrow the problem to one unique solution, we introduce three conditions. First, the m derived linear functions must be mutually uncorrelated. Second, any set of m linear functions must include the functions for a smaller set. For instance, the best 4 linear functions must include the best 3, which include the best 2, which include the best one. Third, the squared weights defining each linear function must sum to 1. These three conditions provide, for most data sets, one unique solution. Typically there are p linear functions (called principal components) declining in importance; by using all p you get perfect reconstruction of the original X-scores, and by using the first m (where m ranges from 1 to p) you get the best reconstruction possible for that value of m.

Define each component's eigenvector or characteristic vector or latent vector as the column of weights used to form it from the X-variables. If the original matrix R is a correlation matrix, define each component's eigenvalue or characteristic value or latent value as its sum of squared correlations with the X-variables. If R is a covariance matrix, define the eigenvalue as a weighted sum of squared correlations, with each correlation weighted by the variance of the corresponding X-variable. The sum of the eigenvalues always equals the sum of the diagonal entries in R.

Nonunique solutions arise only when two or more eigenvalues are exactly equal; it then turns out that the corresponding eigenvectors are not uniquely defined. This case rarely arises in practice, and I shall ignore it henceforth.

Each component's eigenvalue is called the "amount of variance" the component explains. The major reason for this is the eigenvalue's definition as a weighted sum of squared correlations. However, it also turns out that the actual variance of the component scores equals the eigenvalue. Thus in PCA the "factor variance" and "amount of variance the factor explains" are always equal. Therefore the two phrases are often used interchangeably, even though conceptually they stand for very different quantities.
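These properties of principal components can be verified on simulated data. The sketch below is a minimal illustration, not a full PCA routine: the data are random, the variable names are my own, and the components are obtained from an eigendecomposition of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 500 cases on 4 correlated variables.
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)                       # center each variable

R = np.cov(Xc, rowvar=False)                  # covariance matrix (divisor N - 1)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]         # sort components by importance
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]         # column j = weights for component j

scores = Xc @ eigenvectors                    # component scores
score_cov = np.cov(scores, rowvar=False)

print(np.allclose((eigenvectors ** 2).sum(axis=0), 1.0))   # squared weights sum to 1
print(np.allclose(score_cov, np.diag(eigenvalues)))        # uncorrelated; variance = eigenvalue
print(np.isclose(eigenvalues.sum(), np.trace(R)))          # eigenvalue sum = sum of diagonal of R
print(np.allclose(scores @ eigenvectors.T, Xc))            # all p components reconstruct X
```

The second check is the point made in the paragraph above: the variance of each component's scores equals its eigenvalue, which is why "factor variance" and "variance explained" coincide in PCA.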

### The Number of Principal Components

1. Sum of eigenvalues = p
if the input matrix was a correlation matrix

Sum of eigenvalues = sum of input variances
if the input matrix was a covariance matrix

2. Proportion of variance explained = eigenvalue / sum of eigenvalues

3. Sum of squared factor loadings for the jth principal component
= eigenvalue j

4. Sum of squared factor loadings for variable i
= variance explained in variable i
= Cii (diagonal entry i in matrix C)
= communality i in common factor analysis
= variance of variable i if m = p

5. Sum of crossproducts between columns i and j of factor loading matrix
= Cij (entry ij in matrix C)

6. The relations in #3, #4 and #5 are still true after rotation.

7. R - C = U. If necessary, rule 4 can be used to find the diagonal entries in C, then rule 7 can be used to find the diagonal entries in U.
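Several of these relations can be checked numerically for an unrotated component analysis. The sketch below assumes a loading matrix laid out with one row per variable and one column per component, so that entry ij of C is the crossproduct of the loading vectors of variables i and j; the 3 x 3 correlation matrix is made up for illustration.

```python
import numpy as np

# Hypothetical correlation matrix for p = 3 variables.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
p = R.shape[0]

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Loadings: each eigenvector scaled by the square root of its eigenvalue.
loadings = eigenvectors * np.sqrt(eigenvalues)

m = 2                                   # keep the first m components
L = loadings[:, :m]
C = L @ L.T                             # common (explained) portion
U = R - C                               # residual portion

proportion = eigenvalues / eigenvalues.sum()
print(np.isclose(eigenvalues.sum(), p))                          # sum of eigenvalues = p
print(np.allclose((loadings ** 2).sum(axis=0), eigenvalues))     # per-component sums
print(np.allclose((L ** 2).sum(axis=1), np.diag(C)))             # per-variable sums = C_ii
print(np.allclose(loadings @ loadings.T, R))                     # m = p reproduces R
print(np.round(proportion, 3), np.round(np.diag(U), 3))
```

With all p components retained, C equals R exactly and U is a matrix of zeros, which matches the last line of the per-variable relation (the explained variance equals the variable's full variance when m = p).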

## Comparing Two Factor Analyses

Actually, several different questions might be phrased as questions about the similarity of two factor analyses. First we must distinguish between two different data formats:

1. Same variables, two groups. The same set of measures might be taken on men and women, or on treatment and control groups. The question then arises whether the two factor structures are the same.

2. One group, two conditions or two sets of variables. Two test batteries might be given to a single group of subjects, and questions asked about how the two sets of scores differ. Or the same battery might be given under two different conditions.

The next two sections consider these questions separately.

### Comparing Factor Analyses in Two Groups

The question, "Do these two groups have the same factor structure?" is actually quite different from the question, "Do they have the same factors?" The latter question is closer to the question, "Do we need two different factor analyses for the two groups?" To see the point, imagine a problem with 5 "verbal" tests and 5 "math" tests. For simplicity imagine all correlations between the two sets of tests are exactly zero. Also for simplicity consider a component analysis, though the same point can be made concerning a common factor analysis. Now imagine that the correlations among the 5 verbal tests are all exactly .4 among women and .8 among men, while the correlations among the 5 math tests are all exactly .8 among women and .4 among men. Factor analyses in the two groups separately would yield different factor structures but identical factors; in each gender the analysis would identify a "verbal" factor which is an equally-weighted average of all verbal items with 0 weights for all math items, and a "math" factor with the opposite pattern. In this example nothing would be gained from using separate factor analyses for the two genders, even though the two factor structures are quite different.
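The verbal/math example can be worked through numerically. In this sketch the correlations (.4 and .8) come from the text, while the helper names `block_corr` and `verbal_and_math_factors` are my own. "Weights" are the eigenvector entries that define each component, and "loadings" are those weights scaled by the square root of the eigenvalue.

```python
import numpy as np

def block_corr(r_verbal, r_math, k=5):
    """Correlation matrix: k verbal tests, k math tests, zero cross-correlations."""
    def block(r):
        return np.full((k, k), r) + (1 - r) * np.eye(k)
    R = np.zeros((2 * k, 2 * k))
    R[:k, :k] = block(r_verbal)
    R[k:, k:] = block(r_math)
    return R

def verbal_and_math_factors(R, k=5):
    """Return {name: (weights, loadings)} for the two largest components."""
    values, vectors = np.linalg.eigh(R)
    out = {}
    for idx in np.argsort(values)[::-1][:2]:
        w = vectors[:, idx]
        w = w * np.sign(w.sum())                  # resolve the arbitrary sign
        name = "verbal" if np.abs(w[:k]).sum() > np.abs(w[k:]).sum() else "math"
        out[name] = (w, w * np.sqrt(values[idx]))
    return out

women = verbal_and_math_factors(block_corr(0.4, 0.8))
men = verbal_and_math_factors(block_corr(0.8, 0.4))

print(np.round(women["verbal"][0], 3))            # weights: equal on verbal, 0 on math
print(np.allclose(women["verbal"][0], men["verbal"][0]))   # same factor in both groups
print(np.round(women["verbal"][1][:5], 3), np.round(men["verbal"][1][:5], 3))
```

The weights defining each factor agree across the two groups, while the loadings on the verbal tests are lower for women than for men, which is the sense in which the factors are identical but the factor structures differ.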

Another important point about the two-group problem is that an analysis which derives 4 factors for group A and 4 for group B has as many factors total as an analysis which derives 8 in the combined group. Thus the practical question may be not whether analyses deriving m factors in each of two groups fit the data better than an analysis deriving m factors in the combined group. Rather the two separate analyses should be compared to an analysis deriving 2m factors in the combined group. To make this comparison for component analysis, sum the first m eigenvalues in each separate group, and compare the mean of those two sums to the sum of the first 2m eigenvalues in the combined group. It would be very rare that this analysis suggests that it would be better to do separate factor analyses for the two groups. This same analysis should give at least an approximate answer to the question for common factor analysis as well.

Suppose the question really is whether the two factor structures are identical. This question is very similar to the question as to whether the two correlation or covariance matrices are identical--a question which is precisely defined with no reference to factor analysis at all. Tests of these hypotheses are beyond the scope of this work, but a test on the equality of two covariance matrices appears in Morrison (1990) and other works on multivariate analysis.

### Comparing Factor Analyses of Two Sets of Variables in a Single Group

As in the case of two separate samples of cases, there is a question which often gets phrased in terms of factors but which is better phrased as a question about the equality of two correlation or covariance matrices--a question which can be answered with no reference to factor analysis. In the present instance we have two parallel sets of variables; that is, each variable in set A parallels one in set B. In fact, sets A and B may be the very same measures administered under two different conditions. The question then is whether the two correlation matrices or covariance matrices are identical. This question has nothing to do with factor analysis, but it also has little to do with the question of whether the AB correlations are high. The two correlation or covariance matrices within sets A and B might be equal regardless of whether the AB correlations are high or low.

Darlington, Weinberg, and Walberg (1973) described a test of the null hypothesis that the covariance matrices for variable sets A and B are equal when sets A and B are measured in the same sample of cases. It requires the assumption that the AB covariance matrix is symmetric. Thus for instance if sets A and B are the same set of tests administered in years 1 and 2, the assumption requires that the covariance between test X in year 1 and test Y in year 2 equal the covariance between test X in year 2 and test Y in year 1. Given this assumption, you can simply form two sets of scores I'll call A+B and A-B, consisting of the sums and differences of parallel variables in the two sets. It then turns out that the original null hypothesis is equivalent to the hypothesis that all the variables in set A+B are uncorrelated with all variables in set A-B. This hypothesis can be tested with MANOVA.
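The algebra behind this equivalence is short: the covariance between the sums and the differences is Cov(A+B, A-B) = S_AA - S_AB + S_BA - S_BB, which vanishes when S_AA = S_BB and S_AB is symmetric. The sketch below checks this with made-up population matrices; it illustrates the identity only and is not the significance test itself.

```python
import numpy as np

def cov_sums_with_differences(S_AA, S_BB, S_AB):
    """Covariance matrix between the A+B scores and the A-B scores."""
    S_BA = S_AB.T
    return S_AA - S_AB + S_BA - S_BB

# Hypothetical population matrices for two parallel variables per set.
S_AA = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
S_AB = np.array([[0.6, 0.3],
                 [0.3, 0.6]])          # symmetric, as the test assumes

equal_case = cov_sums_with_differences(S_AA, S_AA.copy(), S_AB)
unequal_case = cov_sums_with_differences(S_AA, 2.0 * S_AA, S_AB)

print(equal_case)      # all zeros: sums and differences are uncorrelated
print(unequal_case)    # nonzero: the two within-set matrices differ
```

When the within-set matrices are equal the result is a zero matrix, and when they differ the result equals S_AA - S_BB, so any departure from equality shows up as covariance between the sums and the differences.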

## Factor and Component Analysis in SYSTAT 5

### Inputting data

FACTOR will accept data in standard rectangular format. It will automatically compute a correlation matrix and use it for further analysis. If you want to analyze a covariance matrix instead, enter

If you later want to analyze a correlation matrix, enter

The "correlation" type is the default type, so you need not enter that if you want to analyze only correlation matrices.

A second way to prepare data for a factor analysis is to compute and save a correlation or covariance matrix in the CORR menu. SYSTAT will automatically note whether the matrix is a correlation or covariance matrix at the time it is saved, and will save that information. Then FACTOR will automatically use the correct type.

A third way is useful if you have a correlation or covariance matrix from a printed source, and want to enter that matrix by hand. To do this, combine the INPUT and TYPE commands. For instance, suppose the matrix

is the covariance matrix for the four variables ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM. (Normally enter correlations or covariances to more significant digits than this.) In the DATA module you could type

SAVE MATH
INPUT ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM
TYPE COVARIANCE
RUN
.94
.62 .89
.47 .58 .97
.36 .29 .38 .87
QUIT

Notice that you input only the lower triangular portion of the matrix. In this example you input the diagonal, but if you are inputting a correlation matrix so that all diagonal entries are 1.0, then enter the command DIAGONAL ABSENT just before RUN, then omit the diagonal entries.

The fourth way, which won't work, is to enter or scan the correlation or covariance matrix into a word processor, then use SYSTAT's GET command to move the matrix into SYSTAT. In this method SYSTAT will not properly record the matrix TYPE, and will treat the matrix as a matrix of scores rather than correlations or covariances. Unfortunately, SYSTAT will give you output in the format you expect, and there will be no obvious sign that the whole analysis has been done incorrectly.

### Commands for Factor Analysis

FACTOR ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM

To choose common factor analysis instead of principal components, add the option IPA for "iterated principal axis". All options are listed after a slash; IPA is an option but the variable list is not. Thus a command might read

FACTOR ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM / IPA

The ITER (iteration) option determines the maximum number of iterations to estimate communalities in common factor analysis. Increase ITER if SYSTAT warns you that communality estimates are suspect; the default is ITER = 25. The TOL option specifies a change in communality estimates below which FACTOR will stop trying to improve communality estimates; the default is TOL = .001. The PLOT option yields plots of factor loadings for pairs of factors or components. The number of such plots is m(m-1)/2, which may be large if m is large. A command using all these options might read

FACTOR / IPA, TOL = .0001, ITER = 60, PLOT

These are the only options to the FACTOR command; all other instructions to the FACTOR program are issued as separate commands.

There are two commands you can use to control the number of factors: NUMBER and EIGEN. The command

NUMBER = 4

instructs FACTOR to derive 4 factors. The command

EIGEN = .5

instructs FACTOR to choose a number of factors equal to the number of eigenvalues above .5. Thus when you factor a correlation matrix, the command

EIGEN = 1

implements the Kaiser rule for choosing the number of factors. The default is EIGEN = 0, which causes FACTOR to derive all possible factors. If you use both NUMBER and EIGEN commands, FACTOR will follow whichever rule produces the smaller number of factors.
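For readers not using SYSTAT, the logic of these two controls can be imitated in Python. This is not SYSTAT code: `count_factors` is a hypothetical helper that mimics the behavior described above (an eigenvalue threshold, an optional fixed number, and the smaller of the two winning), applied to a made-up correlation matrix.

```python
import numpy as np

def count_factors(R, eigen=0.0, number=None):
    """Number of factors implied by an eigenvalue threshold and an optional cap."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    count = int(np.sum(eigenvalues > eigen))   # eigenvalues above the threshold
    if number is not None:
        count = min(count, number)             # the smaller number of factors wins
    return count

# Hypothetical correlation matrix for three variables.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

print(count_factors(R, eigen=1.0))             # Kaiser rule: eigenvalues above 1
print(count_factors(R, eigen=0.5))             # looser threshold
print(count_factors(R, eigen=0.0, number=2))   # threshold of 0 capped at 2 factors
```

With a threshold of 1 applied to a correlation matrix, a component is retained only if it explains more variance than a single standardized variable does, which is the rationale usually given for the Kaiser rule.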

The ROTATE command allows you to choose a method of rotation. The choices are

The differences among these methods are beyond the scope of this chapter. In any event, rotation does not affect a factor structure's fit to the data, so you may if you wish use them all and choose the one whose results you like best. In fact, that is commonly done. The default method for rotation is varimax, so typing just ROTATE implements varimax.

There are three options for saving the output of factor analysis into files. To do this, use the SAVE command before the FACTOR command. The command

saves scores on principal components into a file named MYFILE. This cannot be used with common factor analysis (the IPA option) since common factor scores are undefined. The command

saves the coefficients used to define components. These coefficients are in a sense the opposite of factor loadings. Loadings predict variables from factors, while coefficients define factors in terms of the original variables. If you specify a rotation, the coefficients are the ones defining the rotated components. The command

saves the matrix of factor loadings; it may be used with either common factor analysis or component analysis. Again, if you specify a rotation, the loadings saved are for rotated factors.

### Output

• eigenvalues
• variance explained by factors (usually equal to eigenvalues)
• proportion of variance explained by factors
• initial communality estimates
• an index of changes in communality estimates
• final communality estimates
• input correlation or covariance matrix R
• matrix of residual covariances--the off-diagonal part of U
• a scree plot

### An Example

use usdata
rotate = varimax
sort
print long
number = 2
factor cardio, cancer, pulmonar, pneu_flu, diabetes, liver / ipa, plot

Except for a scree plot and a plot of factor loadings, which have been omitted, and a few minor edits I have made for clarity, these commands will produce the following output:

### REFERENCES

Gorsuch, Richard L. (1983). Factor Analysis. Hillsdale, NJ: Erlbaum.

Morrison, Donald F. (1990). Multivariate Statistical Methods. New York: McGraw-Hill.

Rubenstein, Amy S. (1986). An item-level analysis of questionnaire-type measures of intellectual curiosity. Cornell University Ph.D. thesis.

## Content Preview

Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.

Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis we model the observed variables as linear functions of the “factors.” In principal components, we create new variables that are linear combinations of the observed variables. In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally we like each variable to contribute significantly to only one component. A technique called factor rotation is employed towards that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology among others.

## Factor Analysis

Factor analysis includes both component analysis and common factor analysis. More than other statistical techniques, factor analysis has suffered from confusion concerning its very purpose. This affects my presentation in two ways. First, I devote a long section to describing what factor analysis does before examining in later sections how it does it. Second, I have decided to reverse the usual order of presentation. Component analysis is simpler, and most discussions present it first. However, I believe common factor analysis comes closer to solving the problems most researchers actually want to solve. Thus learning component analysis first may actually interfere with understanding what those problems are. Therefore component analysis is introduced only quite late in this chapter.

## What Factor Analysis Can and Can't Do

### Some Examples of Factor-Analysis Problems

It was an interesting idea, but it turned out to be wrong. Today the College Board testing service operates a system based on the idea that there are at least three important factors of mental ability--verbal, mathematical, and logical abilities--and most psychologists agree that many other factors could be identified as well.

2. Consider various measures of the activity of the autonomic nervous system--heart rate, blood pressure, etc. Psychologists have wanted to know whether, except for random fluctuation, all those measures move up and down together--the "activation" hypothesis. Or do groups of autonomic measures move up and down together, but separate from other groups? Or are all the measures largely independent? An unpublished analysis of mine found that in one data set, at any rate, the data fitted the activation hypothesis quite well.

3. Suppose many species of animal (rats, mice, birds, frogs, etc.) are trained that food will appear at a certain spot whenever a noise--any kind of noise--comes from that spot. You could then tell whether they could detect a particular sound by seeing whether they turn in that direction when the sound appears. Then if you studied many sounds and many species, you might want to know on how many different dimensions of hearing acuity the species vary. One hypothesis would be that they vary on just three dimensions--the ability to detect high-frequency sounds, ability to detect low-frequency sounds, and ability to detect intermediate sounds. On the other hand, species might differ in their auditory capabilities on more than just these three dimensions. For instance, some species might be better at detecting sharp click-like sounds while others are better at detecting continuous hiss-like sounds.

4. Suppose each of 500 people, who are all familiar with different kinds of automobiles, rates each of 20 automobile models on the question, "How much would you like to own that kind of automobile?" We could usefully ask about the number of dimensions on which the ratings differ. A one-factor theory would posit that people simply give the highest ratings to the most expensive models. A two-factor theory would posit that some people are most attracted to sporty models while others are most attracted to luxurious models. Three-factor and four-factor theories might add safety and reliability. Or instead of automobiles you might choose to study attitudes concerning foods, political policies, political candidates, or many other kinds of objects.

5. Rubenstein (1986) studied the nature of curiosity by analyzing the agreements of junior-high-school students with a large battery of statements such as "I like to figure out how machinery works" or "I like to try new kinds of food." A factor analysis identified seven factors: three measuring enjoyment of problem-solving, learning, and reading; three measuring interests in natural sciences, art and music, and new experiences in general; and one indicating a relatively low interest in money.

### The Goal: Understanding of Causes

1. How many different factors are needed to explain the pattern of relationships among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?

### Absolute Versus Heuristic Uses of Factor Analysis

The previous examples can be used to illustrate a useful distinction--between absolute and heuristic uses of factor analysis. Spearman's g theory of intelligence, and the activation theory of autonomic functioning, can be thought of as absolute theories which are or were hypothesized to give complete descriptions of the pattern of relationships among variables. On the other hand, Rubenstein never claimed that her list of the seven major factors of curiosity offered a complete description of curiosity. Rather those factors merely appear to be the most important seven factors--the best way of summarizing a body of data. Factor analysis can suggest either absolute or heuristic models; the distinction is in how you interpret the output.

### Is Factor Analysis Objective?

A similar balancing problem arises in regression and analysis of variance, but it generally doesn't prevent different workers from reaching nearly or exactly the same conclusions. After all, if two workers apply an analysis of variance to the same data, and both workers drop out the terms not significant at the .05 level, then both will report exactly the same effects. However, the situation in factor analysis is very different. For reasons explained later, there is no significance test in component analysis that will test a hypothesis about the number of factors, as that hypothesis is ordinarily understood. In common factor analysis there is such a test, but its usefulness is limited by the fact that it frequently yields more factors than can be satisfactorily interpreted. Thus a worker who wants to report only interpretable factors is still left without an objective test.

A similar issue arises in identifying the nature of the factors. Two workers may each identify 6 factors, but the two sets of factors may differ--perhaps substantially. The travel-writer analogy is useful here too: two writers might each divide the US into 6 regions, but define the regions very differently.

Another geographical analogy may be more parallel to factor analysis, since it involves computer programs designed to maximize some quantifiable objective. Computer programs are sometimes used to divide a state into congressional districts which are geographically contiguous, nearly equal in population, and perhaps homogeneous on dimensions of ethnicity or other factors. Two different district-creating programs might come up with very different answers, though both answers are reasonable. This analogy is in a sense too good; we believe that factor analysis programs usually don't yield answers as different from each other as district-creating programs do.

### Factor Analysis Versus Clustering and Multidimensional Scaling

Another advantage of factor analysis over these other methods is that factor analysis can recognize certain properties of correlations. For instance, if variables A and B each correlate .7 with variable C, and correlate .49 with each other, factor analysis can recognize that A and B correlate zero when C is held constant because $.7^2 = .49$. Multidimensional scaling and cluster analysis have no ability to recognize such relationships, since the correlations are treated merely as generic "similarity measures" rather than as correlations.

We are not saying these other methods should never be applied to correlation matrices; sometimes they yield insights not available through factor analysis. But they have definitely not made factor analysis obsolete. The next section touches on this point.

### Factors "Differentiating" Variables Versus Factors "Underlying" Variables

One possible meaning of the phrase about "differentiating" is that a set of variables all correlate highly with each other but differ in their means. A rather similar meaning can arise in a different case. Consider several tests A, B, C, D which test the same broadly-conceived mental ability, but which increase in difficulty in the order listed. Then the highest correlations among the tests may be between adjacent items in this list (rAB, rBC and rCD) while the lowest correlation is between items at the opposite ends of the list (rAD). Someone who observed this pattern in the correlations among the items might well say the tests "can be put in a simple order" or "differ in just one factor", but that conclusion has nothing to do with factor analysis. This set of tests would not contain just one common factor.

A third case of this sort may arise if variable A affects B, which affects C, which affects D, and those are the only effects linking these variables. Once again, the highest correlations would be rAB, rBC and rCD while the lowest correlation would be rAD. Someone might use the same phrases just quoted to describe this pattern of correlations; again it has nothing to do with factor analysis.

• Are you above 5 feet 2 inches in height?
• Are you above 5 feet 4 inches in height?
• Are you above 5 feet 6 inches in height?
• Etc.
• Should our nation lower tariff barriers with nation B?
• Should our two central banks issue a single currency?
• Should our armies become one?
• Should we fuse with nation B, becoming one nation?

Applying multidimensional scaling to a correlation matrix could discover all these simple patterns of differences among variables. Thus multidimensional scaling seeks factors which differentiate variables while factor analysis looks for the factors which underlie the variables. Scaling may sometimes find simplicity where factor analysis finds none, and factor analysis may find simplicity where scaling finds none.

## Basic Concepts and Principles

### A Simple Example

Imagine that these are correlations among 5 variables measuring mental ability. Matrix R55 is exactly consistent with the hypothesis of a single common factor g whose correlations with the 5 observed variables are respectively .9, .8, .7, .6, and .5. To see why, consider the formula for the partial correlation between two variables a and b partialing out a third variable g:

$$r_{ab.g} = \frac{r_{ab} - r_{ag}\, r_{bg}}{\sqrt{(1 - r_{ag}^2)(1 - r_{bg}^2)}}$$

This formula shows that $r_{ab.g} = 0$ if and only if $r_{ab} = r_{ag} r_{bg}$. The requisite property for a variable to function as a general factor g is that any partial correlation between any two observed variables, partialing out g, is zero. Therefore if a correlation matrix can be explained by a general factor g, it will be true that there is some set of correlations of the observed variables with g, such that the product of any two of those correlations equals the correlation between the two observed variables. But matrix R55 has exactly that property. That is, any off-diagonal entry $r_{jk}$ is the product of the jth and kth entries in the row .9 .8 .7 .6 .5. For instance, the entry in row 1 and column 3 is .9 × .7 or .63. Thus matrix R55 exactly fits the hypothesis of a single common factor.
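The argument above can be checked numerically. The following is a minimal sketch, assuming R55 is the correlation matrix whose off-diagonal entries are the products of the g correlations .9, .8, .7, .6, .5 (as the text describes) and using the standard partial-correlation formula; the names `g_corr` and `partial_corr` are illustrative choices, not part of the original.

```python
import numpy as np

# Correlations of the five observed variables with the common factor g,
# taken from the text.
g_corr = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

# Off-diagonal entries are products of the g correlations; the diagonal
# of a correlation matrix is 1.
R55 = np.outer(g_corr, g_corr)
np.fill_diagonal(R55, 1.0)


def partial_corr(r_ab, r_ag, r_bg):
    """Partial correlation between a and b, partialing out g."""
    return (r_ab - r_ag * r_bg) / np.sqrt((1 - r_ag**2) * (1 - r_bg**2))


# Partial correlation for every pair of distinct observed variables.
partials = [
    partial_corr(R55[j, k], g_corr[j], g_corr[k])
    for j in range(5)
    for k in range(j + 1, 5)
]

print(np.round(R55, 2))
print(np.allclose(partials, 0.0))  # True if all partials vanish
```

If any off-diagonal entry of the matrix were changed, the corresponding partial correlation would no longer be zero, and the single-factor hypothesis would no longer fit exactly.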

If we found that pattern in a real correlation matrix, what exactly would we have shown? First, the existence of the factor is inferred rather than observed. We certainly wouldn't have proven that scores on these 5 variables are affected by just one common factor. However, that is the simplest or most parsimonious hypothesis that fits the pattern of observed correlations.

Second, we would have an estimate of the factor's correlation with each of the observed variables, so we can say something about the factor's nature, at least in the sense of what it correlates highly with or doesn't correlate with. In this example the values .9 .8 .7 .6 .5 are these estimated correlations.

Third, we couldn't measure the factor in the sense of deriving each person's exact score on the factor. But we can if we wish use methods of multiple regression to estimate each person's score on the factor from their scores on the observed variables.
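As a rough illustration of the regression approach just mentioned, the sketch below assumes standardized observed variables and the one-factor example (g correlations .9 to .5). The regression weights are obtained by solving R w = loadings, which is the usual multiple-regression solution when only correlations are known. The three rows of standardized scores in `z` are made-up values for illustration only.

```python
import numpy as np

# Correlations of the factor with the observed variables (from the text).
loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

# Correlation matrix implied by the single-factor example.
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)

# Regression weights for predicting the factor from the standardized
# observed variables: solve R w = loadings.
weights = np.linalg.solve(R, loadings)

# Squared multiple correlation between the factor and its estimate.
# A value below 1 means the factor is estimated, not measured exactly.
r_squared = float(loadings @ weights)

# Hypothetical standardized scores for three people (illustration only).
z = np.array([
    [1.0, 0.5, 0.8, 0.2, 0.1],
    [-0.3, -0.6, 0.0, -0.4, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])
estimated_scores = z @ weights

print(np.round(weights, 3))
print(round(r_squared, 3))
print(np.round(estimated_scores, 3))
```

The estimated scores can afterwards be rescaled to any convenient mean and standard deviation, since a linear rescaling does not change their ordering.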

Matrix R55 is virtually the simplest possible example of common factor analysis, because the observed correlations are perfectly consistent with the simplest possible factor-analytic hypothesis--the hypothesis of a single common factor. Some other correlation matrix might not fit the hypothesis of a single common factor, but might fit the hypothesis of two or three or four common factors. The fewer factors the simpler the hypothesis. Since simple hypotheses generally have logical scientific priority over more complex hypotheses, hypotheses involving fewer factors are considered to be preferable to those involving more factors. That is, you accept at least tentatively the simplest hypothesis (i.e., involving the fewest factors) that is not clearly contradicted by the set of observed correlations. Like many writers, I'll let m denote the hypothesized number of common factors.

Without getting deeply into the mathematics, we can say that factor analysis attempts to express each variable as the sum of common and unique portions. The common portions of all the variables are by definition fully explained by the common factors, and the unique portions are ideally perfectly uncorrelated with each other. The degree to which a given data set fits this condition can be judged from an analysis of what is usually called the "residual correlation matrix".

The name of this matrix is somewhat misleading because the entries in the matrix are typically not correlations. If there is any doubt in your mind about some particular printout, look for the diagonal entries in the matrix, such as the "correlation" of the first variable with itself, the second with itself, etc. If these diagonal entries are not all exactly 1, then the matrix printed is not a correlation matrix. However, it can typically be transformed into a correlation matrix by dividing each off-diagonal entry by the square roots of the two corresponding diagonal entries. For instance, if the first two diagonal entries are .36 and .64, and the off-diagonal entry in position [1,2] is .3, then the residual correlation is .3/(.6*.8) = 5/8 = .625.
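The rescaling described above is easy to sketch in code. The helper name `to_correlation` is a hypothetical choice; the 2 x 2 matrix uses the numbers from the worked example in the text.

```python
import numpy as np


def to_correlation(matrix):
    """Rescale a covariance-type matrix so that its diagonal becomes 1."""
    sd = np.sqrt(np.diag(matrix))
    return matrix / np.outer(sd, sd)


# The two-variable case from the text: diagonal entries .36 and .64,
# off-diagonal entry .3.
residual = np.array([
    [0.36, 0.30],
    [0.30, 0.64],
])
residual_corr = to_correlation(residual)

print(residual_corr[0, 1])  # .625, up to floating-point rounding
```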

Correlations found in this way are the correlations that would have to be allowed among the "unique" portions of the variables in order to make the common portions of the variables fit the hypothesis of m common factors. If these calculated correlations are so high that they are inconsistent with the hypothesis that they are 0 in the population, then the hypothesis of m common factors is rejected. Increasing m always lowers these correlations, thus producing a hypothesis more consistent with the data.

We want to find the simplest hypothesis (that is, the lowest m) consistent with the data. In this respect, a factor analysis can be compared to episodes in scientific history that took decades or centuries to develop. Copernicus realized that the earth and other planets moved around the sun, but he first hypothesized that their orbits were circles. Kepler later realized that the orbits were better described as ellipses. A circle is a simpler figure than an ellipse, so this episode of scientific history illustrates the general point that we start with a simple theory and gradually make it more complex to better fit the observed data.

The same principle can be observed in the history of experimental psychology. In the 1940s, experimental psychologists widely believed that all the basic principles of learning, that might even revolutionize educational practice, could be discovered by studying rats in mazes. Today that view is considered ridiculously oversimplified, but it does illustrate the general scientific point that it is reasonable to start with a simple theory and gradually move to more complex theories only when it becomes clear that the simple theory fails to fit the data.

This general scientific principle can be applied within a single factor analysis. Start with the simplest possible theory (usually m = 1), test the fit between that theory and the data, and then increase m as needed. Each increase in m produces a theory that is more complex but will fit the data better. Stop when you find a theory that fits the data adequately.

Each observed variable's communality is its estimated squared correlation with its own common portion--that is, the proportion of variance in that variable that is explained by the common factors. If you perform factor analyses with several different values of m, as suggested above, you will find that the communalities generally increase with m. But the communalities are not used to choose the final value of m. Low communalities are not interpreted as evidence that the data fail to fit the hypothesis, but merely as evidence that the variables analyzed have little in common with one another. Most factor analysis programs first estimate each variable's communality as the squared multiple correlation between that variable and the other variables in the analysis, then use an iterative procedure to gradually find a better estimate.
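The starting estimate mentioned above, the squared multiple correlation (SMC) of each variable with all the others, can be computed directly from the correlation matrix as 1 - 1/(diagonal entry of the inverse of R). The sketch below applies this to the one-factor example, whose true communalities are the squared g correlations. It shows only the initial estimate, not the iterative refinement that programs perform afterwards.

```python
import numpy as np

# One-factor example from the text: g correlations and implied matrix.
loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)

# Squared multiple correlation of each variable with all the others:
# SMC_j = 1 - 1 / (j-th diagonal entry of the inverse of R).
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# In this constructed example the true communalities are known exactly.
true_communalities = loadings**2

print(np.round(smc, 3))
print(np.round(true_communalities, 3))
```

In this example the SMCs come out below the true communalities, which is why they serve only as a starting point for the iteration.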

Factor analysis may use either correlations or covariances. The covariance $cov_{jk}$ between two variables numbered j and k is their correlation times their two standard deviations: $cov_{jk} = r_{jk} s_j s_k$, where $r_{jk}$ is their correlation and $s_j$ and $s_k$ are their standard deviations. A covariance has no very important substantive meaning, but it does have some very useful mathematical properties described in the next section. Since any variable correlates 1 with itself, any variable's covariance with itself is its variance--the square of its standard deviation. A correlation matrix can be thought of as a matrix of variances and covariances (more concisely, a covariance matrix) of a set of variables that have already been adjusted to standard deviations of 1. Therefore I shall often talk about a covariance matrix when I really mean either a correlation or covariance matrix. I will use R to denote either a correlation or covariance matrix of observed variables. This is admittedly awkward, but the matrix analyzed is nearly always a correlation matrix, and as explained later we need the letter C for the common-factor portion of R.

### Matrix Decomposition and Rank

The central theorem of factor analysis is that you can do something similar for an entire covariance matrix. A covariance matrix R can be partitioned into a common portion C which is explained by a set of factors, and a unique portion U unexplained by those factors. In matrix terminology, R = C + U, which means that each entry in matrix R is the sum of the corresponding entries in matrices C and U.

As in analysis of variance with equal cell frequencies, the explained component C can be broken down further. C can be decomposed into component matrices c1, c2, etc., explained by individual factors. Each of these one-factor components cj equals the "outer product" of a column of "factor loadings". The outer product of a column of numbers is the square matrix formed by letting entry jk in the matrix equal the product of entries j and k in the column. Thus if a column has entries .9, .8, .7, .6, .5, as in the earlier example, its outer product is

$$c_1 = \begin{pmatrix} .81 & .72 & .63 & .54 & .45 \\ .72 & .64 & .56 & .48 & .40 \\ .63 & .56 & .49 & .42 & .35 \\ .54 & .48 & .42 & .36 & .30 \\ .45 & .40 & .35 & .30 & .25 \end{pmatrix}$$

Earlier I mentioned the off-diagonal entries in this matrix but not the diagonal entries. Each diagonal entry in a cj matrix is actually the amount of variance in the corresponding variable explained by that factor. In our example, g correlates .9 with the first observed variable, so the amount of explained variance in that variable is $.9^2$ or .81, the first diagonal entry in this matrix.

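In the example there is only one common factor, so matrix C for this example (denoted C55) is C55 = c1. Therefore the residual matrix U for this example (denoted U55) is U55 = R55 - c1. This gives the following matrix for U55:

$$U_{55} = \begin{pmatrix} .19 & 0 & 0 & 0 & 0 \\ 0 & .36 & 0 & 0 & 0 \\ 0 & 0 & .51 & 0 & 0 \\ 0 & 0 & 0 & .64 & 0 \\ 0 & 0 & 0 & 0 & .75 \end{pmatrix}$$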

This is the covariance matrix of the portions of the variables unexplained by the factor. As mentioned earlier, all off-diagonal entries in U55 are 0, and the diagonal entries are the amounts of unexplained or unique variance in each variable.

Often C is the sum of several matrices cj, not just one as in this example. The number of c-matrices which sum to C is the rank of matrix C; in this example the rank of C is 1. The rank of C is the number of common factors in that model. If you specify a certain number m of factors, a factor analysis program then derives two matrices C and U which sum to the original correlation or covariance matrix R, making the rank of C equal m. The larger you set m, the closer C will approximate R. If you set m = p, where p is the number of variables in the matrix, then every entry in C will exactly equal the corresponding entry in R, leaving U as a matrix of zeros. The idea is to see how low you can set m and still have C provide a reasonable approximation to R.
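To make the rank idea concrete for a case with more than one factor, here is a minimal sketch with m = 2. The loadings are hypothetical numbers chosen for illustration (they do not come from the text), and the unique variances are set so that R = C + U has a unit diagonal.

```python
import numpy as np

# Hypothetical loadings of four variables on m = 2 factors
# (illustration only; rows are variables, columns are factors).
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.6],
    [0.1, 0.7],
])

# One-factor components c1 and c2 are outer products of the loading columns.
c1 = np.outer(loadings[:, 0], loadings[:, 0])
c2 = np.outer(loadings[:, 1], loadings[:, 1])
C = c1 + c2

# Unique variances chosen so that R = C + U has a unit diagonal.
U = np.diag(1.0 - np.diag(C))
R = C + U

# The rank of C equals the number of common factors.
rank_C = int(np.linalg.matrix_rank(C))

print(rank_C)
print(np.round(R, 2))
```

Each cj on its own has rank 1, and summing two of them built from linearly independent loading columns gives a C of rank 2.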

### How Many Cases and Variables?

The rules about number of variables are very different for factor analysis than for regression. In factor analysis it is perfectly okay to have many more variables than cases. In fact, generally speaking the more variables the better, so long as the variables remain relevant to the underlying factors.

### How Many Factors?

Of the two rules that are discussed in this section, the first uses a formal significance test to identify the number of common factors. Let N denote the sample size, p the number of variables, and m the number of factors. Also RU denotes the residual matrix U transformed into a correlation matrix, |RU| is its determinant, and ln(1/|RU|) is the natural logarithm of the reciprocal of that determinant.

To apply this rule, first compute $G = N - 1 - (2p+5)/6 - (2/3)m$. Then compute

$$\chi^2 = G \,\ln(1/|R_U|)$$

with $df = \tfrac{1}{2}\left[(p-m)^2 - p - m\right]$.

If it is difficult to compute ln(1/|RU|), that expression is often well approximated by $\sum r_U^2$, where the summation denotes the sum of all squared correlations above the diagonal in matrix RU.

To use this formula to choose the number of factors, start with m = 1 (or even with m = 0) and compute this test for successively increasing values of m, stopping when you find nonsignificance; that value of m is the smallest value of m that is not significantly contradicted by the data. The major difficulty with this rule is that in my experience, with moderately large samples it leads to more factors than can successfully be interpreted.
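The quantities in this rule can be sketched as follows. G and ln(1/|RU|) are computed as defined above, and the sum-of-squares approximation is shown alongside. Two parts are assumptions on my side rather than something the surviving text states: that the statistic is G times ln(1/|RU|), and that its degrees of freedom are ((p - m)^2 - p - m)/2, the usual value for this kind of test. The residual correlation matrix and the sample size are made up for illustration.

```python
import math

import numpy as np


def factor_count_statistic(residual_corr, n_cases, n_factors):
    """Return G, ln(1/|RU|), its approximation, the statistic, and df."""
    p = residual_corr.shape[0]
    G = n_cases - 1 - (2 * p + 5) / 6 - (2 / 3) * n_factors
    log_term = math.log(1.0 / np.linalg.det(residual_corr))
    # Sum of squared correlations above the diagonal.
    approx = float(np.sum(np.triu(residual_corr, k=1) ** 2))
    chi_square = G * log_term  # assumed form of the test statistic
    df = 0.5 * ((p - n_factors) ** 2 - p - n_factors)  # assumed df
    return G, log_term, approx, chi_square, df


# Hypothetical residual correlation matrix RU (illustration only).
RU = np.array([
    [1.00, 0.05, -0.03, 0.02],
    [0.05, 1.00, 0.04, -0.06],
    [-0.03, 0.04, 1.00, 0.01],
    [0.02, -0.06, 0.01, 1.00],
])

G, log_term, approx, chi_square, df = factor_count_statistic(
    RU, n_cases=200, n_factors=1
)
print(G, log_term, approx, chi_square, df)
```

The resulting statistic would then be compared with a chi-square table at the computed degrees of freedom; that lookup is left out here to keep the sketch dependent on NumPy only.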

I recommend an alternative approach. This approach was once impractical, but today is well within reach. Perform factor analyses with various values of m, complete with rotation, and choose the one that gives the most appealing structure.

## Rotation

### Linear Functions of Predictors

Now suppose a co-worker suggests summing each student's verbal and math scores to obtain a composite "academic skill" score I'll call AS, and taking the difference between each student's verbal and math scores to obtain a second variable I'll call VMD (verbal-math difference). The co-worker suggests running the same set of regressions to predict grades in individual courses, except using AS and VMD as predictors in each regression, instead of the original verbal and math scores. In this example, you would get exactly the same predictions of course grades from these two families of regressions: one predicting grades in individual courses from verbal and math scores, the other predicting the same grades from AS and VMD scores. In fact, you would get the same predictions if you formed composites of 3 math + 5 verbal and 5 math + 3 verbal, and ran a series of two-variable multiple regressions predicting grades from these two composites. These examples are all linear functions of the original verbal and math scores.

The central point is that if you have m predictor variables, and you replace the m original predictors by m linear functions of those predictors, you generally neither gain nor lose any information--you could if you wish use the scores on the linear functions to reconstruct the scores on the original variables. But multiple regression uses whatever information you have in the optimum way (as measured by the sum of squared errors in the current sample) to predict a new variable (e.g. grades in a particular course). Since the linear functions contain the same information as the original variables, you get the same predictions as before.
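The equivalence of predictions can be demonstrated with simulated data. In the sketch below the test scores and grades are randomly generated (hypothetical data), and `fitted_values` is an illustrative helper that runs an ordinary least-squares regression with an intercept. The weighted composites are two distinct linear functions, 3 math + 5 verbal and 5 math + 3 verbal; the argument requires the two functions to be linearly independent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical verbal and math scores and a course grade.
verbal = rng.normal(500, 100, n)
math_score = rng.normal(500, 100, n)
grade = 0.004 * verbal + 0.002 * math_score + rng.normal(0, 0.3, n)


def fitted_values(predictors, outcome):
    """Least-squares predictions from the given predictors plus intercept."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return X @ coef


pred_original = fitted_values([verbal, math_score], grade)

# AS (sum) and VMD (difference) as predictors.
pred_as_vmd = fitted_values(
    [verbal + math_score, verbal - math_score], grade
)

# Two differently weighted composites.
pred_weighted = fitted_values(
    [3 * math_score + 5 * verbal, 5 * math_score + 3 * verbal], grade
)

print(np.allclose(pred_original, pred_as_vmd))
print(np.allclose(pred_original, pred_weighted))
```

The regression coefficients differ between the three fits; only the predictions coincide.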

Given that there are many ways to get exactly the same predictions, is there any advantage to using one set of linear functions rather than another? Yes there is; one set may be simpler than another. One particular pair of linear functions may enable many of the course grades to be predicted from just one variable (that is, one linear function) rather than from two. If we regard regressions with fewer predictor variables as simpler, then we can ask this question: Out of all the possible pairs of predictor variables that would give the same predictions, which is simplest to use, in the sense of minimizing the number of predictor variables needed in the typical regression? The pair of predictor variables maximizing some measure of simplicity could be said to have simple structure. In this example involving grades, you might be able to predict grades in some courses accurately from just a verbal test score, and predict grades in other courses accurately from just a math score. If so, then you would have achieved a "simpler structure" in your predictions than if you had used both tests for all predictions.

### Simple Structure in Factor Analysis

In the extreme case of simple structure, each X-variable will have only one large entry, so that all the others can be ignored. But that would be a simpler structure than you would normally expect to achieve; after all, in the real world each variable isn't normally affected by only one other variable. You then name the factors subjectively, based on an inspection of their loadings.

In common factor analysis the process of rotation is actually somewhat more abstract than I have implied here, because you don't actually know the individual scores of cases on factors. However, the statistics for a multiple regression that are most relevant here--the multiple correlation and the standardized regression slopes--can all be calculated just from the correlations of the variables and factors involved. Therefore we can base the calculations for rotation to simple structure on just those correlations, without using any individual scores.

A rotation which requires the factors to remain uncorrelated is an orthogonal rotation, while others are oblique rotations. Oblique rotations often achieve greater simple structure, though at the cost that you must also consider the matrix of factor intercorrelations when interpreting results. Manuals are generally clear which is which, but if there is ever any ambiguity, a simple rule is that if there is any ability to print out a matrix of factor correlations, then the rotation is oblique, since no such capacity is needed for orthogonal rotations.

### An Example

Table 1. Oblique Promax rotation of 4 factors of 24 mental ability variables. From Gorsuch (1983).

This table reveals quite a good simple structure. Within each of the four blocks of variables, the high values (above about .4 in absolute value) are generally all in a single column--a separate column for each of the four blocks. Further, the variables within each block all seem to measure the same general kind of mental ability. The major exception to both these generalizations comes in the third block. The variables in that block seem to include measures of both visual ability and reasoning, and the reasoning variables (the last four in the block) generally have loadings in column 3 not far above their loadings in one or more other columns. This suggests that a 5-factor solution might be worth trying, in the hope that it might yield separate "visual" and "reasoning" factors. The factor names in Table 1 were given by Gorsuch, but inspection of the variables in the second block suggests that "simple repetitive tasks" might be a better name for factor 2 than "numerical".

I don't mean to imply that you should always try to make every variable load highly on only one factor. For instance, a test of ability to deal with arithmetic word problems might well load highly on both verbal and mathematical factors. This is actually one of the advantages of factor analysis over cluster analysis, since you cannot put the same variable in two different clusters.

## Principal Component Analysis (PCA)

### Basics

The central concept in PCA is representation or summarization. Suppose we want to replace a large set of variables by a smaller set which best summarizes the larger set. For instance, suppose we have recorded the scores of hundreds of pupils on 30 mental tests, and we don't have the space to store all those scores. (This is a very artificial example in the computer age, but was more appealing before then, when PCA was invented.) For economy of storage we would like to reduce the set to 5 scores per pupil, from which we would like to be able to reconstruct the original 30 scores as accurately as possible.

Let p and m denote respectively the original and reduced number of variables--30 and 5 in the current example. The original variables are denoted X, the summarizing variables F for factor. In the simplest case our measure of accuracy of reconstruction is the sum of p squared multiple correlations between X-variables and the predictions of X made from the factors. In the more general case we can weight each squared multiple correlation by the variance of the corresponding X-variable. Since we can set those variances ourselves by multiplying scores on each variable by any constant we choose, this amounts to the ability to assign any weights we choose to the different variables.

We now have a problem which is well-defined in the mathematical sense: reduce p variables to a set of m linear functions of those variables which best summarize the original p in the sense just described. It turns out, however, that infinitely many linear functions provide equally good summaries. To narrow the problem to one unique solution, we introduce three conditions. First, the m derived linear functions must be mutually uncorrelated. Second, any set of m linear functions must include the functions for a smaller set. For instance, the best 4 linear functions must include the best 3, which include the best 2, which include the best one. Third, the squared weights defining each linear function must sum to 1. These three conditions provide, for most data sets, one unique solution. Typically there are p linear functions (called principal components) declining in importance; by using all p you get perfect reconstruction of the original X-scores, and by using the first m (where m ranges from 1 to p) you get the best reconstruction possible for that value of m.

Define each component's eigenvector or characteristic vector or latent vector as the column of weights used to form it from the X-variables. If the original matrix R is a correlation matrix, define each component's eigenvalue or characteristic value or latent value as its sum of squared correlations with the X-variables. If R is a covariance matrix, define the eigenvalue as a weighted sum of squared correlations, with each correlation weighted by the variance of the corresponding X-variable. The sum of the eigenvalues always equals the sum of the diagonal entries in R.

Nonunique solutions arise only when two or more eigenvalues are exactly equal; it then turns out that the corresponding eigenvectors are not uniquely defined. This case rarely arises in practice, and I shall ignore it henceforth.

Each component's eigenvalue is called the "amount of variance" the component explains. The major reason for this is the eigenvalue's definition as a weighted sum of squared correlations. However, it also turns out that the actual variance of the component scores equals the eigenvalue. Thus in PCA the "factor variance" and "amount of variance the factor explains" are always equal. Therefore the two phrases are often used interchangeably, even though conceptually they stand for very different quantities.
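These properties of principal components can be checked on simulated data. The sketch below generates hypothetical scores on four correlated variables, standardizes them, and takes the eigendecomposition of their correlation matrix with `numpy.linalg.eigh`. It then checks that the squared weights of each component sum to 1, that the eigenvalues sum to the trace of R, and that the variance of each component's scores equals its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 cases on 4 variables sharing a common source.
base = rng.normal(size=(200, 1))
X = base + rng.normal(size=(200, 4))

# Standardize, then form the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# eigh returns eigenvalues in ascending order; reverse to get
# components in declining order of importance.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Component scores are linear functions of the standardized variables.
scores = Z @ eigenvectors
score_variances = scores.var(axis=0, ddof=1)

print(np.round(eigenvalues, 3))
print(np.round(score_variances, 3))
print(round(float(eigenvalues.sum()), 6))
```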

### The Number of Principal Components

1. Sum of eigenvalues = p
if the input matrix was a correlation matrix

Sum of eigenvalues = sum of input variances
if the input matrix was a covariance matrix

2. Proportion of variance explained = eigenvalue / sum of eigenvalues

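3. Sum of squared factor loadings for the jth principal component
= eigenvalue of component j

4. Sum of squared factor loadings for variable i
= variance explained in variable i
= Cii (diagonal entry i in matrix C)
= communality of variable i in common factor analysis
= variance of variable i if m = p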

5. Sum of crossproducts between columns i and j of factor loading matrix
= Cij (entry ij in matrix C)

6. The relations in #3, #4 and #5 are still true after rotation.

7. R - C = U. If necessary, rule 4 can be used to find the diagonal entries in C, then rule 7 can be used to find the diagonal entries in U.
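Several of these rules can be verified numerically. The sketch below uses a hypothetical 4-variable correlation matrix and component-analysis loadings for m = 2, with variables stored as rows of the loading matrix and factors as columns (an assumption about layout; the text does not fix one). It checks the sum of the eigenvalues, the relation between squared loadings and the diagonal of C, R - C = U, and that C is unchanged by an arbitrary orthogonal rotation of the loadings.

```python
import numpy as np

# Hypothetical correlation matrix (illustration only).
R = np.array([
    [1.0, 0.6, 0.3, 0.2],
    [0.6, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.5],
    [0.2, 0.1, 0.5, 1.0],
])
p, m = 4, 2

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loadings: rows are variables, columns are the first m components.
loadings = eigenvectors[:, :m] * np.sqrt(eigenvalues[:m])
C = loadings @ loadings.T  # entry ij is a sum of crossproducts of loadings
U = R - C                  # rule 7

# Rule 1: eigenvalues of a correlation matrix sum to p.
rule1 = bool(np.isclose(eigenvalues.sum(), p))
# Sum of squared loadings per component equals that component's eigenvalue.
rule3 = bool(np.allclose((loadings**2).sum(axis=0), eigenvalues[:m]))
# Rule 4: sum of squared loadings per variable equals diagonal entry of C.
rule4 = bool(np.allclose((loadings**2).sum(axis=1), np.diag(C)))

# An arbitrary orthogonal rotation leaves C, and hence the
# per-variable sums of squared loadings, unchanged.
theta = np.pi / 6
T = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)],
])
rotated = loadings @ T
C_unchanged = bool(np.allclose(rotated @ rotated.T, C))

print(rule1, rule3, rule4, C_unchanged)
```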

## Comparing Two Factor Analyses

Actually, several different questions might be phrased as questions about the similarity of two factor analyses. First we must distinguish between two different data formats:

1. Same variables, two groups. The same set of measures might be taken on men and women, or on treatment and control groups. The question then arises whether the two factor structures are the same.

2. One group, two conditions or two sets of variables. Two test batteries might be given to a single group of subjects, and questions asked about how the two sets of scores differ. Or the same battery might be given under two different conditions.

The next two sections consider these questions separately.

### Comparing Factor Analyses in Two Groups

The question, "Do these two groups have the same factor structure?" is actually quite different from the question, "Do they have the same factors?" The latter question is closer to the question, "Do we need two different factor analyses for the two groups?" To see the point, imagine a problem with 5 "verbal" tests and 5 "math" tests. For simplicity imagine all correlations between the two sets of tests are exactly zero. Also for simplicity consider a component analysis, though the same point can be made concerning a common factor analysis. Now imagine that the correlations among the 5 verbal tests are all exactly .4 among women and .8 among men, while the correlations among the 5 math tests are all exactly .8 among women and .4 among men. Factor analyses in the two groups separately would yield different factor structures but identical factors: in each gender the analysis would identify a "verbal" factor which is an equally-weighted average of all verbal items with 0 weights for all math items, and a "math" factor with the opposite pattern. In this example nothing would be gained from using separate factor analyses for the two genders, even though the two factor structures are quite different.
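The verbal/math example can be reproduced directly. This is a minimal sketch assuming NumPy; the helper names `block_corr` and `top_components` are illustrative. It builds the two correlation matrices described above and compares the component weights (eigenvectors) and the loadings in the two groups.

```python
import numpy as np

def block_corr(r_verbal, r_math, k=5):
    """Correlation matrix for k verbal and k math tests with zero cross-correlations."""
    def block(r):
        return np.full((k, k), r) + (1 - r) * np.eye(k)
    R = np.zeros((2 * k, 2 * k))
    R[:k, :k] = block(r_verbal)
    R[k:, k:] = block(r_math)
    return R

def top_components(R, m=2):
    """Return the m largest eigenvalues and their eigenvectors."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:m]
    return vals[order], vecs[:, order]

R_women = block_corr(0.4, 0.8)
R_men = block_corr(0.8, 0.4)

vals_w, vecs_w = top_components(R_women)
vals_m, vecs_m = top_components(R_men)

# The verbal component is second among women and first among men.
# Absolute values are taken because the sign of an eigenvector is arbitrary.
verbal_weights_w = np.abs(vecs_w[:, 1])
verbal_weights_m = np.abs(vecs_m[:, 0])

# Loadings (correlations with the tests) do differ between the groups.
verbal_loadings_w = verbal_weights_w * np.sqrt(vals_w[1])
verbal_loadings_m = verbal_weights_m * np.sqrt(vals_m[0])

print(np.round(verbal_weights_w, 3), np.round(verbal_weights_m, 3))
print(np.round(verbal_loadings_w, 3), np.round(verbal_loadings_m, 3))
```

The weights agree across groups (equal weights on the five verbal tests, zero on the math tests), while the loadings do not, which is the distinction between "same factors" and "same factor structure".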

Another important point about the two-group problem is that an analysis which derives 4 factors for group A and 4 for group B has as many factors total as an analysis which derives 8 in the combined group. Thus the practical question may be not whether analyses deriving m factors in each of two groups fit the data better than an analysis deriving m factors in the combined group. Rather the two separate analyses should be compared to an analysis deriving 2m factors in the combined group. To make this comparison for component analysis, sum the first m eigenvalues in each separate group, and compare the mean of those two sums to the sum of the first 2m eigenvalues in the combined group. It would be very rare that this analysis suggests that it would be better to do separate factor analyses for the two groups. This same analysis should give at least an approximate answer to the question for common factor analysis as well.
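The eigenvalue comparison just described can be sketched as follows, assuming NumPy and two simulated groups that share one factor structure (the simulation settings and function names are hypothetical, chosen only for illustration).

```python
import numpy as np

def sum_top_eigenvalues(X, k):
    """Sum of the k largest eigenvalues of the correlation matrix of X."""
    vals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return np.sort(vals)[::-1][:k].sum()

def simulate_group(n, rng):
    """Hypothetical group: 8 tests driven by 2 uncorrelated factors."""
    W = np.array([[0.8] * 4 + [0.0] * 4,
                  [0.0] * 4 + [0.7] * 4])
    return rng.normal(size=(n, 2)) @ W + 0.6 * rng.normal(size=(n, 8))

rng = np.random.default_rng(42)
group_a, group_b = simulate_group(300, rng), simulate_group(300, rng)
m = 2

# Mean of the two separate sums of the first m eigenvalues ...
separate = np.mean([sum_top_eigenvalues(group_a, m),
                    sum_top_eigenvalues(group_b, m)])
# ... compared with the sum of the first 2m eigenvalues in the combined group.
combined = sum_top_eigenvalues(np.vstack([group_a, group_b]), 2 * m)

print(f"mean of separate sums (m = {m}): {separate:.3f}")
print(f"combined sum (2m = {2 * m}): {combined:.3f}")
```

If the combined figure is the larger one, as the text suggests is usual, the separate analyses are not buying anything.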

Suppose the question really is whether the two factor structures are identical. This question is very similar to the question as to whether the two correlation or covariance matrices are identical--a question which is precisely defined with no reference to factor analysis at all. Tests of these hypotheses are beyond the scope of this work, but a test on the equality of two covariance matrices appears in Morrison (1990) and other works on multivariate analysis.

### Comparing Factor Analyses of Two Sets of Variables in a Single Group

As in the case of two separate samples of cases, there is a question which often gets phrased in terms of factors but which is better phrased as a question about the equality of two correlation or covariance matrices--a question which can be answered with no reference to factor analysis. In the present instance we have two parallel sets of variables; that is, each variable in set A parallels one in set B. In fact, sets A and B may be the very same measures administered under two different conditions. The question then is whether the two correlation matrices or covariance matrices are identical. This question has nothing to do with factor analysis, but it also has little to do with the question of whether the AB correlations are high. The two correlation or covariance matrices within sets A and B might be equal regardless of whether the AB correlations are high or low.

Darlington, Weinberg, and Walberg (1973) described a test of the null hypothesis that the covariance matrices for variable sets A and B are equal when sets A and B are measured in the same sample of cases. It requires the assumption that the AB covariance matrix is symmetric. Thus for instance if sets A and B are the same set of tests administered in years 1 and 2, the assumption requires that the covariance between test X in year 1 and test Y in year 2 equal the covariance between test X in year 2 and test Y in year 1. Given this assumption, you can simply form two sets of scores, which I'll call A+B and A-B, consisting of the sums and differences of parallel variables in the two sets. It then turns out that the original null hypothesis is equivalent to the hypothesis that all the variables in set A+B are uncorrelated with all variables in set A-B. This hypothesis can be tested with MANOVA.
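The equivalence can be seen from the covariance algebra: Cov(A+B, A-B) = S_AA - S_AB + S_BA - S_BB, which reduces to S_AA - S_BB when S_AB is symmetric. The sketch below assumes NumPy and hypothetical population covariance matrices; it illustrates only this identity, not the MANOVA test itself.

```python
import numpy as np

def cross_cov_sum_diff(S_AA, S_BB, S_AB):
    """Covariance between the sums A+B and the differences A-B.

    Cov(A+B, A-B) = S_AA - S_AB + S_BA - S_BB, where S_BA is the transpose of S_AB.
    """
    return S_AA - S_AB + S_AB.T - S_BB

# Hypothetical covariance matrices for two parallel sets of 2 variables.
S_AA = np.array([[1.0, 0.5],
                 [0.5, 2.0]])
S_AB = np.array([[0.6, 0.3],
                 [0.3, 0.9]])   # symmetric, as the test assumes

# Null hypothesis true: set B has the same covariance matrix as set A.
under_null = cross_cov_sum_diff(S_AA, S_AA.copy(), S_AB)

# Null hypothesis false: the first variable has a larger variance in set B.
S_BB_other = np.array([[1.5, 0.5],
                       [0.5, 2.0]])
under_alternative = cross_cov_sum_diff(S_AA, S_BB_other, S_AB)

print(under_null)         # all zeros: A+B uncorrelated with A-B
print(under_alternative)  # a nonzero entry appears
```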

## Factor and Component Analysis in SYSTAT 5

### Inputting data

FACTOR will accept data in standard rectangular format. It will automatically compute a correlation matrix and use it for further analysis. If you want to analyze a covariance matrix instead, enter

If you later want to analyze a correlation matrix, enter

The "correlation" type is the default type, so you need not enter that if you want to analyze only correlation matrices.

A second way to prepare data for a factor analysis is to compute and save a correlation or covariance matrix in the CORR menu. SYSTAT will automatically note whether the matrix is a correlation or covariance matrix at the time it is saved, and will save that information. Then FACTOR will automatically use the correct type.

A third way is useful if you have a correlation or covariance matrix from a printed source, and want to enter that matrix by hand. To do this, combine the INPUT and TYPE commands. For instance, suppose the matrix

is the covariance matrix for the four variables ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM. (Normally enter correlations or covariances to more significant digits than this.) In the DATA module you could type

SAVE MATH
INPUT ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM
TYPE COVARIANCE
RUN
.94
.62 .89
.47 .58 .97
.36 .29 .38 .87
QUIT

Notice that you input only the lower triangular portion of the matrix. In this example you input the diagonal, but if you are inputting a correlation matrix so that all diagonal entries are 1.0, then enter the command DIAGONAL ABSENT just before RUN, then omit the diagonal entries.
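For readers working outside SYSTAT, the same lower-triangular input can be expanded into a full symmetric matrix. This is a minimal sketch in Python with NumPy, not SYSTAT syntax; it uses the covariance values typed in above.

```python
import numpy as np

names = ["ALGEBRA", "GEOMETRY", "COMPUTER", "TRIGONOM"]

# Lower triangle, including the diagonal, exactly as entered above.
lower_rows = [
    [0.94],
    [0.62, 0.89],
    [0.47, 0.58, 0.97],
    [0.36, 0.29, 0.38, 0.87],
]

p = len(lower_rows)
S = np.zeros((p, p))
for i, row in enumerate(lower_rows):
    S[i, : i + 1] = row

# Mirror the lower triangle into the upper triangle without doubling the diagonal.
S = S + S.T - np.diag(np.diag(S))

print(names)
print(S)
```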

The fourth way, which won't work, is to enter or scan the correlation or covariance matrix into a word processor, then use SYSTAT's GET command to move the matrix into SYSTAT. In this method SYSTAT will not properly record the matrix TYPE, and will treat the matrix as a matrix of scores rather than correlations or covariances. Unfortunately, SYSTAT will give you output in the format you expect, and there will be no obvious sign that the whole analysis has been done incorrectly.

### Commands for Factor Analysis

FACTOR ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM

To choose common factor analysis instead of principal components, add the option IPA for "iterated principal axis". All options are listed after a slash; IPA is an option but the variable list is not. Thus a command might read

FACTOR ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM / IPA

The ITER (iteration) option determines the maximum number of iterations to estimate communalities in common factor analysis. Increase ITER if SYSTAT warns you that communality estimates are suspect; the default is ITER = 25. The TOL option specifies a change in communality estimates below which FACTOR will stop trying to improve communality estimates; the default is TOL = .001. The PLOT option yields plots of factor loadings for pairs of factors or components. The number of such plots is m(m-1)/2, which may be large if m is large. A command using all these options might read

FACTOR / IPA, TOL = .0001, ITER = 60, PLOT

These are the only options to the FACTOR command; all other instructions to the FACTOR program are issued as separate commands.
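To make the roles of ITER and TOL concrete, here is a rough sketch of iterated principal axis factoring in Python with NumPy. It is a generic textbook-style version, not SYSTAT's actual implementation: the function name, the use of squared multiple correlations as starting communalities, and the test matrix are all assumptions of this example.

```python
import numpy as np

def iterated_principal_axis(R, m, tol=1e-3, max_iter=25):
    """Iterated principal axis factoring of a correlation matrix R with m factors.

    tol and max_iter play the roles described for TOL and ITER.
    """
    R = np.asarray(R, dtype=float)
    # Starting communalities: squared multiple correlations (an assumed choice).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for n_iter in range(1, max_iter + 1):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # communalities replace the 1s
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:m]
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (L ** 2).sum(axis=1)            # updated communality estimates
        change = np.max(np.abs(new_h2 - h2))
        h2 = new_h2
        if change < tol:                         # stop once estimates settle
            break
    return L, h2, n_iter

# Hypothetical one-factor correlation matrix built from known loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

L, h2, n_iter = iterated_principal_axis(R, m=1, tol=1e-8, max_iter=200)
print(np.round(np.abs(L[:, 0]), 4), n_iter)
```

With a tight tolerance the estimated loadings should approach the ones used to build R (up to sign); with a small iteration cap the loop may stop before the estimates settle, which is the situation the ITER warning refers to.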

There are two commands you can use to control the number of factors: NUMBER and EIGEN. The command

instructs FACTOR to derive 4 factors. The command

instructs FACTOR to choose a number of factors equal to the number of eigenvalues above .5. Thus when you factor a correlation matrix, the command

implements the Kaiser rule for choosing the number of factors. The default is EIGEN = 0, which causes FACTOR to derive all possible factors. If you use both NUMBER and EIGEN commands, FACTOR will follow whichever rule produces the smaller number of factors.
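The logic of combining the two rules can be sketched in a few lines of Python with NumPy. The function below is a hypothetical illustration of the behaviour described above, not SYSTAT code; its `number` and `eigen` arguments merely echo the command names.

```python
import numpy as np

def n_factors(R, number=None, eigen=0.0):
    """Number of factors to derive from matrix R.

    eigen:  keep factors whose eigenvalue is above this cutoff.
    number: optional upper limit; the smaller of the two rules wins.
    """
    eigenvalues = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    by_eigen = int((eigenvalues > eigen).sum())
    return by_eigen if number is None else min(number, by_eigen)

# Hypothetical correlation matrix: 4 variables, all correlations 0.5.
# Its eigenvalues are 2.5, 0.5, 0.5, 0.5.
R = np.full((4, 4), 0.5) + 0.5 * np.eye(4)

print(n_factors(R))                      # cutoff 0: all possible factors
print(n_factors(R, eigen=1.0))           # Kaiser rule on a correlation matrix
print(n_factors(R, number=2))            # limit of 2 is the smaller rule
print(n_factors(R, number=2, eigen=1.0)) # Kaiser rule is the smaller rule
```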

The ROTATE command allows you to choose a method of rotation. The choices are

The differences among these methods are beyond the scope of this chapter. In any event, rotation does not affect a factor structure's fit to the data, so you may if you wish use them all and choose the one whose results you like best. In fact, that is commonly done. The default method for rotation is varimax, so typing just ROTATE implements varimax.

There are three options for saving the output of factor analysis into files. To do this, use the SAVE command before the FACTOR command. The command

saves scores on principal components into a file named MYFILE. This cannot be used with common factor analysis (the IPA option) since common factor scores are undefined. The command

saves the coefficients used to define components. These coefficients are in a sense the opposite of factor loadings. Loadings predict variables from factors, while coefficients define factors in terms of the original variables. If you specify a rotation, the coefficients are the ones defining the rotated components. The command

saves the matrix of factor loadings; it may be used with either common factor analysis or component analysis. Again, if you specify a rotation, the loadings saved are for rotated factors.
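The contrast between coefficients and loadings can be illustrated for unrotated principal components. The sketch below assumes NumPy, a hypothetical correlation matrix, and components scaled to unit variance; whether SYSTAT scales its saved coefficients in exactly this way is an assumption, not something stated above.

```python
import numpy as np

# Hypothetical correlation matrix for 4 variables.
R = np.array([[1.0, 0.6, 0.5, 0.2],
              [0.6, 1.0, 0.4, 0.1],
              [0.5, 0.4, 1.0, 0.3],
              [0.2, 0.1, 0.3, 1.0]])
m = 2

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Loadings predict (standardized) variables from unit-variance components.
loadings = vecs[:, :m] * np.sqrt(vals[:m])
# Coefficients define unit-variance components from standardized variables.
coefficients = vecs[:, :m] / np.sqrt(vals[:m])

# Components built with these coefficients have unit variance and are uncorrelated.
component_cov = coefficients.T @ R @ coefficients
# Their correlations with the variables are the loadings.
variable_component_corr = R @ coefficients

print(np.round(component_cov, 6))
print(np.allclose(variable_component_corr, loadings))
```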

### Output

• eigenvalues
• variance explained by factors (usually equal to eigenvalues)
• proportion of variance explained by factors
• initial communality estimates
• an index of changes in communality estimates
• final communality estimates
• Input correlation or covariance matrix R
• Matrix of residual covariances--the off-diagonal part of U
• a scree plot

### An Example

use usdata
rotate = varimax
sort
print long
number = 2
factor cardio, cancer, pulmonar, pneu_flu, diabetes, liver / ipa, plot

Except for a scree plot and a plot of factor loadings, which have been omitted, and a few minor edits I have made for clarity, these commands will produce the following output:

### REFERENCES

Gorsuch, Richard L. (1983). Factor Analysis. Hillsdale, NJ: Erlbaum.

Morrison, Donald F. (1990). Multivariate Statistical Methods. New York: McGraw-Hill.

Rubenstein, Amy S. (1986). An item-level analysis of questionnaire-type measures of intellectual curiosity. Cornell University Ph.D. thesis.

## Total variance explained

Each extracted factor has an eigenvalue, and the eigenvalues of all the factors should sum to the number of items subjected to factor analysis. The next item shows all the factors extractable from the analysis along with their eigenvalues.

The Eigenvalue table is divided into three sub-sections: Initial Eigenvalues, Extraction Sums of Squared Loadings, and Rotation of Sums of Squared Loadings. For analysis and interpretation we are concerned only with the Extraction Sums of Squared Loadings. Notice that the first factor accounts for 46.367% of the variance, the second for 18.471%, and the third for 17.013%. All the remaining factors are not significant (Table 5).

1. Component: As can be seen in the Communalities table (Table 3) above, there are 8 components, listed in column 1.
2. Initial Eigenvalues Total: Total variance.
3. Initial Eigenvalues % of variance: The percent of variance attributable to each factor.
4. Initial Eigenvalues Cumulative %: Cumulative variance of the factor when added to the previous factors.
6. Extraction Sums of Squared Loadings % of variance: The percent of variance attributable to each factor after extraction. This value is of significance to us, and in this step we therefore determine that there are three factors which contribute towards why someone would buy a particular product.
7. Extraction Sums of Squared Cumulative %: Cumulative variance of the factor when added to the previous factors after extraction.
8. Rotation of Sums of Squared Loadings Total: Total variance after rotation.
9. Rotation of Sums of Squared Loadings % of variance: The percent of variance attributable to each factor after rotation.
10. Rotation of Sums of Squared Loadings Cumulative %: Cumulative variance of the factor when added to the previous factors.
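The "% of variance" and "Cumulative %" columns are simple arithmetic on the eigenvalues. The sketch below assumes NumPy; the helper `variance_table` and the eight eigenvalues passed to it are hypothetical, while the three percentages in the second part are the ones reported above.

```python
import numpy as np

def variance_table(eigenvalues):
    """Return (% of variance, cumulative %) for a list of eigenvalues."""
    ev = np.asarray(eigenvalues, dtype=float)
    pct = 100.0 * ev / ev.sum()
    return pct, np.cumsum(pct)

# Hypothetical eigenvalues for 8 items (they sum to 8).
pct, cum_pct = variance_table([3.0, 2.0, 1.5, 0.5, 0.4, 0.3, 0.2, 0.1])
print(np.round(pct, 3))
print(np.round(cum_pct, 3))

# Cumulative % for the three extracted factors reported in the text.
reported_pct = np.array([46.367, 18.471, 17.013])
reported_cum = np.cumsum(reported_pct)
print(reported_cum)
```

On the reported figures, the three extracted factors together account for about 81.85% of the variance.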

## Integrating Personality/Character Neuroscience with Network Analysis

### 3.1.1 Factor Analysis

Factor analysis conceptualizes the structure of associations in terms of latent variables or “factors” that give rise to observed, manifested, or measured variables. Factor analysis (and the closely-related principal components analysis) accomplishes this by identifying sets of observed variables that have more in common with each other than with other observed variables in the analysis. Factor analysis begins with a correlation matrix of bivariate associations among observed variables. Conceptually, factor analysis scans the matrix to identify which observed variables go together. It searches for clusters of observed variables that are strongly correlated with each other and that are weakly correlated with observed variables in other clusters. More technically, it extracts factors that account for as much variation in the observed variables as possible.

Exploratory factor analysis can be seen as steps that are often conducted in an iterative, back-and-forth manner: extraction, selection of a number of factors, rotation, and examination of factor loadings and (potentially) factor correlations [79]. The first step involves applying an “extraction method” that identifies combinations of observed variables, and these combinations are called factors. There are several types of extraction methods, but principal axis factor analysis and principal components analysis are the most frequently used. Extraction produces one eigenvalue for each potential factor, with as many potential factors as there are observed variables. A factor’s eigenvalue can be seen as the amount of variance in the observed variables explained by the factor.

In the second step, researchers decide on the number of factors that adequately summarize the relationships between the original variables. The “appropriate” number of factors can be ambiguous, but there are rules-of-thumb to aid in the process [80]. The rules-of-thumb generally depend on the relative magnitudes of the eigenvalues, but information from subsequent steps can be used to inform the decision (e.g., clarity of the factor loadings, see step 4).

In the third step, researchers usually use a “rotation” to clarify the psychological meaning of the factors. Rotation is intended to produce simple structure, a pattern of associations in which each observed variable associates strongly with (i.e., “loads on”) one factor and only one factor. There are two general types of rotation: orthogonal rotation generates factors that are uncorrelated, and oblique rotation generates factors that can be correlated with each other.

Fourth, researchers draw psychological conclusions based on key statistical outcomes, primarily factor loadings and (if relevant) interfactor correlations. Factor loadings are values representing associations between each observed variable and each factor. By noting which observed variables are most strongly associated with each factor, researchers can interpret the psychological meaning of the factors. There are several types of factor loadings that might be produced, but they are all roughly or literally on a correlational metric of −1 to +1, with values closer to −1 or +1 representing strong associations, and values close to 0 indicating no connection between an observed variable and a factor. Interfactor correlations are obtained when researchers extract more than one factor and implement an oblique rotation, and they reveal the degree to which the dimensions underlying the observed variables are themselves associated with each other.

## How To Calculate an Index Score from a Factor Analysis

One common reason for running Principal Component Analysis (PCA) or Factor Analysis (FA) is variable reduction.

In other words, you may start with a 10-item scale meant to measure something like Anxiety, which is difficult to accurately measure with a single question.

You could use all 10 items as individual variables in an analysis–perhaps as predictors in a regression model.

But you’d end up with a mess.

Not only would you have trouble interpreting all those coefficients, but you’re likely to have multicollinearity problems.

And most importantly, you’re not interested in the effect of each of those individual 10 items on your outcome. You’re interested in the effect of Anxiety as a whole.

So we turn to a variable reduction technique like FA or PCA to turn 10 related variables into one that represents the construct of Anxiety.

FA and PCA have different theoretical underpinnings and assumptions and are used in different situations, but the processes are very similar. We’ll use FA here for this example.

So let’s say you have successfully come up with a good factor analytic solution, and have found that indeed, these 10 items all represent a single factor that can be interpreted as Anxiety. There are two similar, but theoretically distinct ways to combine these 10 items into a single index.

### Factor Scores

Part of the Factor Analysis output is a table of factor loadings. Each item’s loading represents how strongly that item is associated with the underlying factor.

Some loadings will be so low that we would consider that item unassociated with the factor and we wouldn’t want to include it in the index.

But even among items with reasonably high loadings, the loadings can vary quite a bit. If those loadings are very different from each other, you’d want the index to reflect that each item has an unequal association with the factor.

One approach to combining items is to calculate an index variable via an optimally-weighted linear combination of the items, called the Factor Scores. Each item’s weight is derived from its factor loading. So each item’s contribution to the factor score depends on how strongly it relates to the factor.

Factor scores are essentially a weighted sum of the items. Because those weights are all between -1 and 1, the scale of the factor scores will be very different from a pure sum. I find it helpful to think of factor scores as standardized weighted averages.

### Factor-Based Scores

The second, simpler approach is to calculate the linear combination ignoring weights. Either a sum or an average works, though averages have the advantage of being on the same scale as the items.

In this approach, you’re running the Factor Analysis simply to determine which items load on each factor, then combining the items for each factor.

The technical name for this new variable is a factor-based score.

Factor-based scores only make sense in situations where the loadings are all similar. In that case, the weights wouldn’t have done much anyway.
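The two kinds of index can be compared on simulated data. The following is a minimal sketch assuming NumPy, five hypothetical items with deliberately unequal loadings, and the regression method for factor-score weights (one common choice; other software or settings may weight items differently). The loadings are treated as if they had been read off a factor analysis output.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Hypothetical loadings of 5 items on a single "Anxiety" factor.
loadings = np.array([0.9, 0.8, 0.7, 0.4, 0.3])

# Simulate the latent factor and items with unit variance.
anxiety = rng.normal(size=n)
noise = rng.normal(size=(n, 5)) * np.sqrt(1 - loadings ** 2)
items = anxiety[:, None] * loadings + noise

Z = (items - items.mean(axis=0)) / items.std(axis=0)
R = np.corrcoef(items, rowvar=False)

# Factor scores: weighted sum of standardized items.
# Regression-method weights solve R w = loadings.
weights = np.linalg.solve(R, loadings)
factor_scores = Z @ weights

# Factor-based score: plain average of the items, ignoring the weights.
factor_based = items.mean(axis=1)

r_scores = np.corrcoef(factor_scores, anxiety)[0, 1]
r_based = np.corrcoef(factor_based, anxiety)[0, 1]
print(f"factor scores vs. latent factor:       {r_scores:.3f}")
print(f"factor-based scores vs. latent factor: {r_based:.3f}")
```

Because the loadings here differ a lot, the weighted scores track the simulated factor more closely than the plain average; with similar loadings the two would be nearly interchangeable.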

### Which Scores to Use?

It’s never wrong to use Factor Scores. If the factor loadings are very different, they’re a better representation of the factor. And all software will save and add them to your data set quickly and easily.

There are two advantages of Factor-Based Scores. First, they’re generally more intuitive. A non-research audience can easily understand an average of items better than a standardized optimally-weighted linear combination.

Second, you don’t have to worry about weights differing across samples. Factor loadings should be similar in different samples, but they won’t be identical. This will affect the actual factor scores, but won’t affect factor-based scores.

But before you use factor-based scores, make sure that the loadings really are similar. Otherwise you can be misrepresenting your factor.

## Factor Analysis

Factor analysis is a multivariate technique designed to analyze correlations among many observed variables and to explore latent factors. This chapter provides an overview of the evolution of factor analysis since the early 20th century and a review of applied research in various fields. Today, factor analysis is widely used not only in the field of psychology but also in fields such as politics, literature, biology, and medical science. For example, in anthropology, morphological knowledge has been obtained through the factor analysis of correlations among the measured traits of human bones and the factor analysis of measured traits of animals and plants. The chapter introduces the factor analysis model and deals with statistical inference in factor analysis. Formulae for the standard errors of parameter estimates in factor analysis are complicated or may not be expressed in closed forms. One of the advantages of the bootstrap methods is that they can be used without analytical derivations. However, caution is needed to use the bootstrap methods in factor analysis. The chapter also covers the various methods of factor rotation and estimation of factor scores.

## Communalities

The next item from the output is a table of communalities, which shows how much of the variance in each variable has been accounted for by the extracted factors. A communality should be more than 0.5 for the variable to be considered for further analysis; otherwise the variable is removed from the remaining steps of the factor analysis. For instance, over 90% of the variance in “Quality of product” is accounted for, while 73.5% of the variance in “Availability of product” is accounted for (Table 4).

## Content Preview

Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.

Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis we model the observed variables as linear functions of the “factors.” In principal components, we create new variables that are linear combinations of the observed variables. In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally we like each variable to contribute significantly to only one component. A technique called factor rotation is employed towards that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology among others.

## Factor Analysis: A Short Introduction, Part 2–Rotations

An important feature of factor analysis is that the axes of the factors can be rotated within the multidimensional variable space. What does that mean?

Here is, in simple terms, what a factor analysis program does while determining the best fit between the variables and the latent factors: Imagine you have 10 variables that go into a factor analysis.

The program looks first for the strongest correlations between variables and the latent factor, and makes that Factor 1. Visually, one can think of it as an axis (Axis 1).

The factor analysis program then looks for the second set of correlations and calls it Factor 2, and so on.

Sometimes, the initial solution results in strong correlations of a variable with several factors or in a variable that has no strong correlations with any of the factors.

In order to make the location of the axes fit the actual data points better, the program can rotate the axes. Ideally, the rotation will make the factors more easily interpretable.

Here is a visual of what happens during a rotation when you only have two dimensions (x- and y-axis):

The original x- and y-axes are in black. During the rotation, the axes move to a position that encompasses the actual data points better overall.

Programs offer many different types of rotations. An important difference between them is that they can create factors that are correlated or uncorrelated with each other.

Rotations that allow for correlation are called oblique rotations; rotations that assume the factors are not correlated are called orthogonal rotations. Our graph shows an orthogonal rotation.

Once again, let’s explore indicators of wealth.

Let’s imagine the orthogonal rotation did not work out as well as previously shown. Instead, we get this result:

| Variables | Factor 1 | Factor 2 |
| --- | --- | --- |
| Income | 0.63 | 0.14 |
| Education | 0.47 | 0.24 |
| Occupation | 0.45 | 0.22 |
| House value | 0.39 | 0.25 |
| Number of public parks in neighborhood | 0.12 | 0.20 |
| Number of violent crimes per year | 0.21 | 0.18 |

Since our first attempt was an orthogonal rotation, we specified that Factor 1 and 2 are not correlated.

But it makes sense to assume that a person with a high “Individual socioeconomic status” (Factor 1) lives also in an area that has a high “Neighborhood socioeconomic status” (Factor 2). That means the factors should be correlated.

Consequently, the two axes of the two factors are probably closer together than an orthogonal rotation can make them. Here is a display of the oblique rotation of the axes for our new example, in which the factors are correlated with each other:

Clearly, the angle between the two factors is now smaller than 90 degrees, meaning the factors are now correlated. In this example, an oblique rotation accommodates the data better than an orthogonal rotation.
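The geometry of rotation can be sketched numerically. The example below assumes NumPy and takes the loadings from the wealth table above; the rotation angle of 20 degrees and the 60-degree angle between oblique axes are arbitrary, hypothetical values. It shows that an orthogonal rotation changes individual loadings but leaves each variable's communality untouched, and that when factors are drawn as unit-length axes, the cosine of the angle between them corresponds to their correlation.

```python
import numpy as np

# Loadings from the table above (rows: Income, Education, Occupation,
# House value, public parks, violent crimes; columns: Factor 1, Factor 2).
loadings = np.array([[0.63, 0.14],
                     [0.47, 0.24],
                     [0.45, 0.22],
                     [0.39, 0.25],
                     [0.12, 0.20],
                     [0.21, 0.18]])

# Orthogonal rotation of the two axes by a hypothetical angle.
theta = np.deg2rad(20)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = loadings @ T

communality_before = (loadings ** 2).sum(axis=1)
communality_after = (rotated ** 2).sum(axis=1)

print(np.round(rotated, 3))
print(np.allclose(communality_before, communality_after))

# Axes at 90 degrees correspond to uncorrelated factors (cosine 0);
# a smaller angle, as in an oblique solution, gives a positive correlation.
angle_between_axes = np.deg2rad(60)   # hypothetical oblique solution
factor_correlation = np.cos(angle_between_axes)
print(round(float(factor_correlation), 3))
```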

## I. Exploratory Factor Analysis (EFA)

• Introduction
1. Motivating example: The SAQ
2. Pearson correlation formula
3. Partitioning the variance in factor analysis
• Extracting factors
1. principal components analysis
2. common factor analysis
• principal axis factoring
• maximum likelihood
1. Simple Structure
2. Orthogonal rotation (Varimax)
3. Oblique (Direct Oblimin)