Introductory Principal Component Analysis Using R
by
Kilem L. Gwet, PhD
In this book, you will learn how to perform Principal Component Analysis (PCA) using the
R software. R is used through an Integrated Development Environment (IDE) called RStudio.
Some prior knowledge of R will help, but this book assumes no specific prerequisites beyond good
computer and analytical skills. PCA is a collection of advanced statistical techniques, so some
familiarity with statistical thinking is essential. However, this book is a non-mathematical presentation
of Principal Component Analysis, which is one of the essential skills that you must have to become a
productive analyst in the new world of data science.
R Scripts and Datasets for Download
Chapter 1: Basics of Principal Components
- statex77.csv: Data sets related
to the 50 states of the United States of America (in csv format)
- mtcars.csv: Motor Trend Car Road
Test (in csv format)
- iris.csv: Edgar Anderson’s Iris Data (in csv format)
- wdbc.data.csv: Breast Cancer
Wisconsin (Diagnostic) Data Set (in csv format)
- x1x2data.csv: Measurements of 2
variables $X_1$ and $X_2$ taken on 50 subjects (in csv format)
- x1x2pcs.csv: Principal component
scores calculated for 50 subjects (in csv format)
Chapter 2: Overview of R with RStudio
- employee.xls: Employee
compensation data by gender, age and marital status in Excel format.
- employee.csv: Employee
compensation data by gender, age and marital status in CSV (Comma
delimited) format.
Chapter 3: Computing Principal Components with R
- HWdata.xlsx: Excel file
containing height and weight measurements of 15 individuals.
Chapter 4: Visualization of Principal Components
- wdbc.data.csv: Wisconsin
Diagnostic Breast Cancer Dataset, in csv format.
- wdbc.xlsx: Wisconsin Diagnostic Breast Cancer Dataset,
in Excel format.
- employee_demo.csv:
Demographic data of 7 employees (Employee Id, Gender, Age and Marital status).
- employee_name.csv: Names
of 7 employees along with their State of residency, Employment Status,
and Employee Id.
- gdp_by_state.csv: 2020 US
Quarterly GDP Data (in millions of dollars) and 2019 Population Data by State.
Chapter 5: Basic Use of Principal Components
- employee_demo.csv:
Demographic data of 7 employees (Employee Id, Gender, Age and Marital status).
Chapter 6: Statistical Analysis Based on Principal Components
- employee_demo.csv:
Demographic data of 7 employees (Employee Id, Gender, Age and Marital status).