Principal Component Analysis (PCA)
- Reduces the number of original variables while retaining as much information as possible.
- Creates new variables called principal components, which are combinations of the original variables and are ranked by importance.
- Useful when original variables are correlated, to identify the most important factors driving outcomes.
Definition
Principal Component Analysis (PCA) is a statistical technique for reducing the number of variables in a dataset while retaining as much information as possible. It does this by finding a new set of variables, called principal components, each of which is a linear combination of the original variables. The principal components are ranked by importance, meaning the amount of variation in the data that each one captures: the first principal component is the most important and the last is the least important.
Explanation
PCA transforms the original correlated variables into a new set of uncorrelated variables (principal components). Each principal component is a linear combination of the original variables. Components are ordered so that the first captures the greatest amount of variation in the data, the second captures the next greatest amount, and so on. By selecting the top principal components, PCA reduces dataset complexity while preserving the most relevant information.
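The transformation described above can be sketched with NumPy. This is a minimal illustration, not a reference implementation: it assumes the common covariance-eigendecomposition approach, and the function name `pca` and the synthetic data are made up for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center each variable so variation is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the variables (variables are columns).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each column of `components` holds the weights of one linear combination.
    components = eigvecs[:, :n_components]
    scores = X_centered @ components
    explained_ratio = eigvals / eigvals.sum()
    return scores, components, explained_ratio

# Synthetic data (assumption): two strongly correlated variables plus one independent one.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=200), rng.normal(size=200)])

scores, components, explained_ratio = pca(X, n_components=2)
```

Here `scores` holds the new, uncorrelated variables, and `explained_ratio` gives the share of total variance captured by each component in descending order.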
Examples
Retail customer data
A retail company may have customer data such as age, income, location, and spending habits. These variables can be correlated (for example, customers with higher incomes may tend to spend more). Using PCA, the company could identify which combinations of customer characteristics account for most of the variation among its customers. The first principal component might combine age and income and capture the largest share of that variation; the second principal component might combine location and spending habits, capturing less variation but still contributing. Identifying these components helps the company understand what drives spending habits and make more informed business decisions.
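A sketch of how such an analysis might look, using entirely synthetic customer data. All numbers are invented, `distance_km` is a hypothetical numeric stand-in for location, and the resulting loadings reflect how the data was generated rather than any real retailer. The variables are standardized first because they are measured in very different units.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic, made-up customer data (assumption, not from any real retailer).
age = rng.normal(40, 12, n)
income = 20_000 + 900 * age + rng.normal(0, 8_000, n)  # income correlated with age
spending = 0.05 * income + rng.normal(0, 400, n)       # spending correlated with income
distance_km = rng.uniform(0, 50, n)                    # hypothetical stand-in for location

names = ["age", "income", "spending", "distance_km"]
X = np.column_stack([age, income, spending, distance_km])

# Standardize each variable to mean 0 and standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data; each row of Vt is one principal direction.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Print the weight (loading) of every original variable in the first two components.
for i in range(2):
    loadings = ", ".join(f"{name}={w:+.2f}" for name, w in zip(names, Vt[i]))
    print(f"PC{i + 1}: {explained_ratio[i]:.0%} of variance; {loadings}")
```

Variables with large absolute loadings on a component are the ones that component mostly combines. The sign of a whole component is arbitrary, so only relative signs within a component carry meaning.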
Financial portfolio data
A portfolio of stocks may include indicators such as return on investment, price-to-earnings ratio, and market capitalization. These variables can also be correlated (for example, stocks with higher price-to-earnings ratios may tend to have higher returns on investment). PCA can identify which combinations of financial indicators account for most of the variation across the portfolio. The first principal component might combine return on investment and price-to-earnings ratio and capture the largest share of that variation; the second principal component might combine market capitalization and returns, capturing a smaller share. This helps a portfolio manager understand which factors drive differences between stocks and make more informed investment decisions.
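One practical question in an analysis like this is how many components to keep. The sketch below picks the smallest number whose cumulative share of variance reaches a threshold. The helper name `components_needed`, the 90% threshold, and the synthetic indicator values are all assumptions made for illustration.

```python
import numpy as np

def components_needed(X, threshold=0.90):
    """Return the smallest k whose top-k components reach `threshold` of total variance."""
    # Standardize so indicators on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Squared singular values are proportional to the variance of each component.
    S = np.linalg.svd(Z, compute_uv=False)
    cumulative = np.cumsum(S**2) / np.sum(S**2)
    # First position where the cumulative share reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative

# Synthetic indicators (assumption): a hidden factor drives both ROI and P/E.
rng = np.random.default_rng(7)
n_stocks = 300
factor = rng.normal(size=n_stocks)
roi = 0.08 + 0.03 * factor + rng.normal(0, 0.01, n_stocks)
pe_ratio = 18 + 5 * factor + rng.normal(0, 2, n_stocks)
log_market_cap = rng.normal(22, 1.5, n_stocks)  # independent of the hidden factor

X = np.column_stack([roi, pe_ratio, log_market_cap])
k, cumulative = components_needed(X)
print(k, np.round(cumulative, 3))
```

Because two of the three synthetic indicators move together, fewer components than original variables are enough to reach the threshold, which is the reduction PCA is used for.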
Use cases
- Finance
- Marketing
- Social science research
Related terms
- Principal components
- Correlation