:: Course Summary
Principal Component Analysis (PCA) is a multivariate method that identifies redundancy or correlation among a set of measurements or variables for the purpose of data reduction. This powerful exploratory tool provides insightful graphical summaries, with the ability to include additional information as well.
This training course discusses the limitations of traditional descriptive tools for exploring datasets with several variables and presents how PCA can:
Summarize large sets of data
Identify structure and trends in the data
Identify redundancy and correlation in the data
Produce insightful graphical displays of the results
:: Learning Objectives
Upon completion of this training course, participants will be able to:
Know when, why and how to use Principal Component Analysis (PCA)
Format the data in the appropriate fashion to perform the analysis
Choose between the options available based on the specifics of the dataset
Understand the software output
Extract the pertinent and relevant information from the output
Produce graphical summaries of the data like the Scree plot, correlation circle and biplot
Interpret the results and draw conclusions on the variables and observations
:: Target Audience
An applied training session in statistics intended for all scientific staff who collect large datasets and who wish to summarize them graphically and identify redundancy for the purpose of data reduction.
:: Prerequisite
This training course introduces the important ideas in statistics and multivariate data analysis. It assumes that participants have no previous knowledge of statistics or that they have not used it for a long time.
:: Notes and Other Information
If you are interested in attending the course on Principal Component Analysis and the applied one-day workshop, please consult the bundles category.
If you are interested in:
- acquiring or furthering your skills in multivariate data analysis methods
- putting the theory of multivariate analysis into practice with a wide array of case studies
- and doing so with your very own software of choice
then our summer school on multivariate data analysis is what you are looking for!
All our training sessions are available on-site.
Contact us to learn more.
:: Topics Covered
- Basic Statistical Concepts
- Traditional Methods for Analyzing a Set of Measurements
- Fundamentals of PCA
- Application to Real-Life Data: Determining Components
- Software Output Through a Case Study
- Summary Statistics and Other Table Output
- Univariate Statistics
- Variability Explained by Each Component
- Correlation and Variance/Covariance Matrix
- Loadings of the Initial Variables on the Components
- Coordinates of the Objects on the Components
- Communalities
- Graphical Output
- The Scree Plot
- Component Loadings
- Correlation Circle
- Object Map with the New Components
- Biplots
- Specific Analysis Issues
- Choosing an Adequate Measure of Variability
- Selecting a Subset of Components
- A Step-by-Step Approach to PCA
- Case Studies
- Software Tools for Performing PCA
- Summary
:: Course Content
Principal Component Analysis, commonly known as PCA, is a powerful multivariate data analysis method. Its main purpose is to summarize large datasets by removing any redundancy in the data. Results are then presented using graphical tools.
As the name implies, multivariate methods are used to analyze data in situations where multiple variables of interest are measured. When several continuous variables are measured on each observation, those variables are rarely completely independent of one another. In other words, there is often a degree of overlap in the information provided by the data.
The classic measure of redundancy or association between two continuous variables is the correlation coefficient. Comparing two variables at a time quickly becomes cumbersome as the number of variables grows (p variables give p(p - 1)/2 pairs to examine) and is infeasible when we wish to explore more complex relationships in the data.
Principal component analysis constitutes a tool for evaluating and presenting the redundancies between several variables and is often used to graphically represent and summarize the key features of a dataset. Thanks to this descriptive method, datasets with a large number of variables can be analyzed and summarized graphically, revealing the underlying structure of the data.
PCA is used in many fields, including chemometrics, sensory evaluation, market research and, more generally, R&D divisions where large sets of data are collected.
This training course begins with a review of the relevant statistical concepts, such as defining the types of variables in the dataset, and of traditional descriptive statistical tools such as the mean, standard deviation, correlation coefficient, histogram, box plot and scatter plot. The limitations of the traditional one-variable-at-a-time technique are discussed.
The definition and premise of Principal Component Analysis are introduced with the notion of linear combinations of variables and the reduction of redundancy of information in the data. Issues in data formatting and missing data are also broached. The concept of principal components and the amount of variability or information they explain is presented through simple instructive examples.
The different software options available for conducting the analysis are explained and compared, and recommendations are provided. Tools for determining the number of principal components to retain, such as the scree plot, are illustrated through case studies. The graphical output provided by PCA is examined carefully: correlation circle, object map, biplot. Emphasis is placed on understanding the software output, interpreting the results and drawing conclusions from the graphs.
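The ideas of linear combinations, explained variability and loadings can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it uses Python with NumPy (the course does not prescribe a software tool), a small synthetic dataset invented for the purpose, and a hypothetical helper named `pca_correlation` that performs PCA on the correlation matrix.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = observations, columns = variables).

    Hypothetical helper written for illustration only.
    """
    # Standardize each variable to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # R is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues = np.clip(eigenvalues[order], 0.0, None)
    eigenvectors = eigenvectors[:, order]
    # Share of total variability explained by each component (scree plot heights).
    explained = eigenvalues / eigenvalues.sum()
    # Loadings: correlations between initial variables and components (correlation circle).
    loadings = eigenvectors * np.sqrt(eigenvalues)
    # Coordinates of the observations on the components (object map).
    scores = Z @ eigenvectors
    return explained, loadings, scores


# Synthetic data (an assumption for illustration): three redundant variables
# driven by one underlying signal, plus one unrelated variable.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=200),
    2.0 * signal + 0.1 * rng.normal(size=200),
    -signal + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

explained, loadings, scores = pca_correlation(X)
print("Share of variability per component:", np.round(explained, 3))
print("Loadings on the first component:", np.round(loadings[:, 0], 3))
```

Because three of the four synthetic variables carry nearly the same information, the first component accounts for most of the variability, which is the kind of redundancy a scree plot makes visible. Working from the correlation matrix rather than the covariance matrix is one choice of measure of variability; it is a reasonable default when variables are on different scales.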