by Kilem L. Gwet, PhD

In this book, you will learn how to perform Principal Component Analysis (PCA) using the
R software. R is used through an Integrated Development Environment (IDE) called RStudio.

Some knowledge of R will help, but this book assumes no specific prerequisites beyond good
computer and analytical skills. Because PCA is a collection of advanced statistical techniques, some
familiarity with statistical thinking is essential. Even so, this book is a non-mathematical presentation
of Principal Component Analysis, one of the essential skills you need to become a
productive analyst in the new world of data science.

- **Preliminaries**: Includes the preface, about me, and the table of contents
- **Chapter 1**: Basics of Principal Components
- **Chapter 2**: Overview of R with RStudio
- **Chapter 3**: Computing Principal Components with R
- **Chapter 4**: Visualization of Principal Components
- **Chapter 5**: Basic Use of Principal Components
- **Chapter 6**: Statistical Analysis Based on Principal Components
- **Appendix**: Performing Linear Discriminant Analysis with R
- **Back Matter**: Includes bibliography, author index, subject index

**R Scripts and Datasets for Download**

Chapter 1: Basics of Principal Components

- statex77.csv: Data sets related to the 50 states of the United States of America (in csv format)
- mtcars.csv: Motor Trend Car Road Test (in csv format)
- iris.csv: Edgar Anderson’s Iris Data (in csv format)
- wdbc.data.csv: Breast Cancer Wisconsin (Diagnostic) Data Set (in csv format)
- x1x2data.csv: Measurements of 2 variables $X_1$ and $X_2$ taken on 50 subjects (in csv format)
- x1x2pcs.csv: Principal component scores calculated for 50 subjects (in csv format)
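
For readers who want a feel for the workflow before downloading anything, here is a minimal sketch of a principal component computation in R. It is not taken from the book's scripts. It uses the `mtcars` dataset that ships with base R rather than the `mtcars.csv` file listed above, since the exact layout of that file is not described here; the choice of the four variables is likewise an arbitrary assumption made for illustration.

```r
# Built-in dataset from base R's 'datasets' package (assumed here to be
# comparable to mtcars.csv; the CSV layout itself is not verified).
data(mtcars)

# Keep a few continuous variables (an arbitrary choice for this sketch).
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# prcomp() is part of the base 'stats' package. Centering and scaling keep
# variables measured in different units from dominating the result.
pca <- prcomp(vars, center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Principal component scores: one row per car, one column per component.
print(head(pca$x))
```

The same call should work on a data frame read with `read.csv()`, provided the selected columns are numeric.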

Chapter 2: Overview of R with RStudio

- employee.xls: Employee compensation data by gender, age and marital status, in Excel format.
- employee.csv: Employee compensation data by gender, age and marital status, in CSV (comma-delimited) format.

Chapter 3: Computing Principal Components with R

- HWdata.xlsx: Excel file containing height and weight measurements of 15 individuals.

Chapter 4: Visualization of Principal Components

- wdbc.data.csv: Wisconsin Diagnostic Breast Cancer Dataset, in csv format.
- wdbc.xlsx: Wisconsin Diagnostic Breast Cancer Dataset, in Excel format.
- employee_demo.csv: Demographic data of 7 employees (Employee Id, Gender, Age and Marital Status).
- employee_name.csv: Names of 7 employees along with their State of Residency, Employment Status, and Employee Id.
- gdp_by_state.csv: 2020 US Quarterly GDP Data (in millions of dollars) and 2019 Population Data by State.

Chapter 5: Basic Use of Principal Components

- employee_demo.csv: Demographic data of 7 employees (Employee Id, Gender, Age and Marital Status).

Chapter 6: Statistical Analysis Based on Principal Components

- employee_demo.csv: Demographic data of 7 employees (Employee Id, Gender, Age and Marital Status).