Data Analytics, Statistical Learning, and Engineering Statistics

Statistical Learning Slides

Here are lecture slides from a 2-day short course on Statistical Machine Learning.

  1. Day 1 Slides–Session-1-Stat Learning
  2. Day 2 Slides–Session-2-Stat Learning

Here is a set of slides to accompany the Statistical Learning Notes.

Chapter 1-Introduction, Generalities, and Some Background Material

Section 1-Overview/Context

101-Introduction–Notation and Terminology, What is New Here, Representing What is Known

102-Optimal Predictors–Optimal (Unrealizable/Theoretical) Predictors

103-Nearest Neighbors–Nearest Neighbor Rules

104-Error Decompositions–General and SEL Decompositions of Expected Prediction Loss

105-Cross-Validation–Cross-Validation

106-Predictor Choice and CV–Choice of Predictor Complexity and Cross-Validation

107-Penalization and Complexity–Penalized Training Error Fitting and Predictor Complexity

108-Optimal Features for Classification–Classification Models and Optimal Features

109-Quantitative Features for Classification–Quantitative Representation of Qualitative Features for Classification

110-Functions as Features and Kernels–Abstract Feature Spaces (of Functions) and “Kernels”

111-Kernel Mechanics–Making Kernels

112-Feature Engineering (etc.) Perspective–Feature Selection/Engineering and Data “Pre-processing”: Some Perspective and Prediction of Predictor Efficacy

113-More Optimal 2 Class Classification–More on the Form of an Optimal 0-1 Loss Classifier for K=2

114-Other 2 Class Losses–Other Prediction Problems in 2-Class Classification Models

115-Voting Functions for 2 Class Classification–Voting Functions, Losses for them, and Expected 0-1 Loss

116-Density Estimation and Classification–Density Estimation and Approximately Optimal and Naive Bayes Classification

117-Document Features–Document Features

Section 2-Some Linear Theory, Linear Algebra, and Principal Components

201-Inner Product Spaces–Inner Product Spaces

202-Gram Schmidt and QR–The (General) Gram-Schmidt Process and the QR Decomposition of a rank=p Matrix X

203-SVD of X–The Singular Value Decomposition of X

204-SVD and Inner Product Spaces–The Singular Value Decomposition and General Inner Product Spaces

205-Ordinary PCs–“Ordinary” Principal Components

206-Kernel PCs–“Kernel” Principal Components

207-Graphical Spectral Features–“Graphical Spectral” Features

Chapter 2-Supervised Learning I: Basic Prediction Methodology

Section 3-(Non-OLS) SEL Linear Predictors

301-Ridge Regression–Ridge Regression

302-LASSO etc–The Lasso, Etc.

303-PCR–Principal Components Regression

304-PLS–Partial Least Squares

Section 4-SEL Linear Predictors Using Basis Functions

401-p=1 Wavelet Bases–p=1 Wavelet Bases

402-p=1 Regression Splines–p=1 Piecewise Polynomials and Regression Splines

403-Tensor Product Bases and Prediction–Basis Functions and p-Dimensional Inputs (Tensor Product Bases and MARS)

Section 5-Smoothing Splines and SEL Prediction

501-p=1 Smoothing Splines–p=1 Smoothing Splines

502-Multi-Dimensional Smoothing Splines–Multi-Dimensional Smoothing Splines

503-Penalized Fitting in N-space–An Abstraction of Smoothing Splines and Penalized Fitting to N Responses

504-Graph-Based Penalized Smoothing and Semi-supervised Learning–Graph-Based Penalized Fitting/Smoothing (and Semi-Supervised Learning)

Section 6-Kernel and Local Regression Smoothing Methods and SEL Prediction

601-1D Kernel and Local Regression Smoothers–One-Dimensional Kernel and Local Regression Smoothers

602-Local Regression Smoothing in p Dimensions–Local Regression Smoothing in p Dimensions

Section 7-High-Dimensional Use of Low-Dimensional Smoothers and SEL Prediction

701-Structured Regression Functions–Additive Models and Other Structured Regression Functions

702-Projection Pursuit Regression–Projection Pursuit Regression

Section 8-Highly Non-Linear Parametric Regression Methods

801-Neural Network Regression–Neural Network Regression

802-Neural Network Classification–Neural Network Classification

803-Neural Network Fitting–The Back-Propagation Algorithm

804-Regularization of Neural Network Fitting–Formal Regularization of Neural Network Fitting

805-Convolutional Neural Networks–Convolutional Neural Networks

806-Recurrent Neural Networks–Recurrent Neural Networks

807-Radial Basis Function Networks–Radial Basis Function Networks

Section 9-Prediction Methods Based on Rectangles: Trees and PRIM

901-CART and PRIM–Prediction Based on Rectangles

902-Regression Trees–Regression Trees

903-Classification Trees–Classification Trees

904-Optimal Subtrees–Optimal Subtrees

905-Variable Importance for Tree Predictors–Measuring the Importance of Inputs for Trees

906-PRIM–PRIM

Section 10-Predictors Built on Bootstrap Samples

1001-Bagging Generalities–Bagging in General

1002-Random Forests–Random Forests: Special Bagging of Tree Predictors

1003-Measuring the Importance of Inputs for Bagged Predictors–Measuring the Importance of Inputs for Bagged Predictors

1004-Boruta–The Boruta Wrapper/Heuristic for Input Variable Selection

1005-Bumping and Active Set Selection–Bumping and “Active Set Selection”

Section 11-“Ensembles” of Predictors

1101-Bayes Model Averaging–Bayesian Model Averaging for Prediction

1102-Stacking for SEL and 0-1 Loss–Stacking: SEL … and 0-1 Loss

1103-Generalized Stacking and Deep Structures–“Generalized Stacking” and “Deep” Structures for Prediction

1104-Boosting-Successive Approximation in SML–Boosting: Successive Approximation in Prediction

1105-Boosting-AdaBoost.M1–AdaBoost.M1

1106-Quinlan’s Cubist and Divide and Conquer Strategies–Quinlan’s Cubist

Chapter 3-Supervised Learning II: More on Classification (Mostly Linear Methods)

Section 12-Basic Linear Methods

1201-Linear and Quadratic Discriminant Analysis–Linear (and a Bit on Quadratic) Discriminant Analysis

1202-Dimension Reduction in LDA–Dimension Reduction in LDA

1203-Logistic Regression and Classification–Logistic Regression

Section 13-Support Vector Machines

1301-SVMs 1 Maximum Margin Classifiers–The Linearly Separable Case: Maximum Margin Classifiers

1302-SVMs 2 Support Vector Classifiers–The Linearly Non-Separable Case: Support Vector Classifiers

1303-SVMs 3A Support Vector Machines Heuristics–SV Classifiers and Kernels: Heuristics

1304-SVMs 3B Support Vector Machines Penalized Fitting–SV Classifiers and Kernels: A Penalized Fitting Function Space Argument

1305-SVMs 3C Support Vector Machines Geometry–SV Classifiers and Kernels: A Function Space Geometry Argument

1306-SVMs 3D Support Vector Machines Perspective–SVMs: Some Perspective

1307-SVMs 4 Support Vector Machines Other Related Issues–Other SV Stuff

Section 14-Prototype and (More on) Nearest Neighbor Methods of Classification

1401-Prototype and Nearest Neighbor Classification–Prototype and Nearest Neighbor Methods

Chapter 4-More Theory Regarding Supervised Learning

Section 15-Reproducing Kernel Hilbert Spaces: Penalized/Regularized Fitting and Bayes Prediction

1501-RKHSs and Smoothing Splines–RKHSs and p=1 Cubic Smoothing Splines

1502-Development of Kernels from Linear Functionals and Differential Operators–What is Possible Beginning from Linear Functionals and Linear Operators for p=1

1503-Prediction Theory Beginning from a Kernel–What is Common Beginning Directly from a Kernel

1504-Gaussian Spatial Processes Kernels and Predictors–Gaussian Process “Priors,” Bayes Predictors, and RKHSs

Chapter 5-Unsupervised Learning Methods

Section 17-Some Methods of Unsupervised Learning

1701-Association Rules-Market Basket Analysis–Association Rules/Market Basket Analysis

1702-The Apriori Algorithm and its Uses–The “Apriori” Algorithm

1704-Clustering Generalities–Clustering

1705-Partitioning Methods of Clustering–Clustering: Partitioning Methods (“Centroid”-Based Methods)

1706-Hierarchical Clustering Methods–Clustering: Hierarchical Methods

1707-Model-Based Clustering–Clustering: (Mixture) Model-Based Methods

1708-Biclustering–Clustering: Biclustering

1709-Self-Organizing Maps–Clustering: Self-Organizing Maps

1710-Multi-Dimensional Scaling–Multi-dimensional Scaling

1711-More Principal Components Ideas–Sparse Principal Components, Non-Negative Matrix Factorization, Archetypal Analysis, Independent Component Analysis, and Principal Curves and Surfaces

1712-Original PageRanks–Original Google™ PageRanks

Chapter 6-Miscellanea

The materials on this site may be used without charge for non-commercial personal educational purposes. Any copies made of the materials must bear printed acknowledgment of their Analytics Iowa LLC source.