Introduction to Multivariate Data Analysis

Author

Pulong Ma

Published

Dec 4, 2025

Preface

This book contains the course notes for STAT 4750/5750 (Introduction to Multivariate Data Analysis) at Iowa State University. The course is designed for undergraduate students in statistics and data science and for graduate students from the applied sciences with majors outside statistics. The prerequisite is STAT 3010 or STAT 3260 for undergraduates, or STAT 5101 for graduate students. Knowledge of matrix algebra is recommended but not required to understand the topics covered in this book.

The course STAT 3010 (Intermediate Statistical Concepts and Methods) covers statistical concepts and methods used in the analysis of observational data. Topics include analysis of single-sample, two-sample, and paired-sample data; simple and multiple linear regression; model building and analysis of residuals; one-way ANOVA; tests of independence for contingency tables; and logistic regression.

The course STAT 3260 (Introduction to Business Statistics II) covers multiple regression, regression diagnostics, model building, applications in analysis of variance and time series, random variables, conditional probability, and data visualization.

The course STAT 5101 (Statistical Methods for Research Workers) was previously numbered STAT 5870 and was renamed starting Fall 2025. STAT 5101 is a first course in statistics for graduate students from the applied sciences. It covers basic experimental designs and analysis of variance, analysis of categorical data, logistic and log-linear regression, likelihood-based inference, and the use of simulation.

Students with the necessary background should find the statistical concepts and methods in this book easy to follow. All the methods are illustrated with examples that include step-by-step solutions and extensive R code. Exercises are also given in each chapter to help students better understand the statistical methods and apply them to real data analysis by adapting the corresponding R code in the book.

The material in this book is largely influenced by course notes from instructors who previously taught the course, including Yumou Qiu and Kenneth Koehler in the Department of Statistics at Iowa State University.