Machine Learning/Statistical Learning, Spring 2021
Course Information
Course Name:
Machine Learning / Statistical Learning
Instructor:
Robert McCulloch, robert.mcculloch@asu.edu
Office hours (online): Thursday 9am.
TA: Shuyi Li, shuyili3@asu.edu
M, 2:30-4:00 pm; Tu, 2:00-3:30 pm in MCC (online)
Where are we and what should I be doing?
where and what
Miscellaneous
The Ultimate Scikit-Learn Machine Learning Cheatsheet
Scikit-Learn Cheat Sheet (2021)
Syllabus
Syllabus
Homework
How_to_Submit_Homework_in_Canvas.pdf
Homework 1, Due February 2
Homework 1, python code
Homework 2, Due February 12
Homework 2, solutions
Homework 3, Due February 22
Homework 3, solutions
Homework 3, solutions, python script
Homework 4, Due March 11
Homework 5, Due April 22
Notes
Readings:
Chapter 2 of ISLR (Introduction to Statistical Learning)
and/or Chapter 2 of ESL (Elements of Statistical Learning)
would be helpful for the first two sections of notes.
But just skim; you don't need to understand everything in these
chapters at this point.
Note: I will sometimes give you the R code I used to make the notes.
In the R code, I use the following files of simple R functions:
robfuns.R
rob-utility-funs.R
mlfuns.R
lift-loss.R (simple lift functions and deviance loss)
So, I might have the line source("../../robfuns.R") near the top of a script.
Simply replace the ../../ with the correct path to where you have put the file.
You may also see source("notes-funs.R").
This is just to write stuff out in a way I can easily pop into a latex script.
It just has one function printfl which you can replace with a simple R print.
For completeness, here is the file:
notes-funs.R
Note that often my scripts are designed so that if you set dpl=FALSE at the top,
then you can just source the script and the whole thing will run.
If dpl=FALSE then printfl is just print.
This setup may look weird, but the scripts are actually designed to run in batch mode.
Probability Review and Naive Bayes
Simple Illustration of Naive Bayes on the sms data (pdf),
Simple Illustration of Naive Bayes on the sms data (Rmd)
This is an ASCII R script where I play around with the Naive Bayes text analysis in more detail:
naive-bayes_notes.R
Naive Bayes in Python:
NB_in_python.html
NB_in_python.py
NB_in_python.ipynb
KNN and the Bias Variance Tradeoff
R script to illustrate the bias-variance tradeoff
Simple R code to do cross validation: docv.R
Simple R code to get fold id's for cross validation
Python code to replicate what is in the notes for the Boston example using sklearn
Note:
Both ESL and ISLR have an introductory overview in Chapter 2 in which general ideas are discussed,
then do regression and some other basic models, and only later
discuss the practical (e.g. cross validation) and theoretical (e.g. MLE) ideas
(ISLR Chapter 5, ESL Chapters 7 and 8).
I would encourage you to ``skip ahead'' and read/skim the discussion of cross-validation (and other topics).
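If you want to see the mechanics of cross-validation before reading those chapters, here is a minimal sketch in Python. It is not the docv.R code linked above; the function names (fold_ids, knn_predict, cv_rmse) and the simulated data are assumptions made up for this illustration. It assigns each observation a random fold id, then for each fold fits kNN on the other folds and predicts the held-out fold.

```python
import numpy as np

def fold_ids(n, n_folds, rng):
    """Return a length-n array of fold labels 0..n_folds-1 in random order."""
    ids = np.arange(n) % n_folds      # balanced labels: 0,1,...,n_folds-1,0,1,...
    return rng.permutation(ids)       # shuffle so each fold is a random subset

def knn_predict(x_train, y_train, x_test, k):
    """kNN regression for one-dimensional x: average y over the k nearest training points."""
    dist = np.abs(x_test[:, None] - x_train[None, :])   # (n_test, n_train) distances
    nearest = np.argsort(dist, axis=1)[:, :k]           # indices of the k nearest
    return y_train[nearest].mean(axis=1)

def cv_rmse(x, y, k, n_folds, rng):
    """Out-of-sample RMSE for kNN with k neighbors, estimated by n_folds-fold CV."""
    ids = fold_ids(len(y), n_folds, rng)
    sq_err = np.empty(len(y))
    for f in range(n_folds):
        test = ids == f                                  # hold out fold f
        pred = knn_predict(x[~test], y[~test], x[test], k)
        sq_err[test] = (y[test] - pred) ** 2
    return np.sqrt(sq_err.mean())

# Simulated data (an assumption for illustration, not the Boston data from the notes).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

ks = [1, 5, 15, 50, 150]
rmse = {k: cv_rmse(x, y, k, n_folds=5, rng=rng) for k in ks}
best_k = min(rmse, key=rmse.get)
print(rmse, best_k)
```

Very large k averages over almost all of the training data (high bias) and k=1 chases the noise (high variance), so the cross-validated RMSE is typically smallest somewhere in between.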
More Probability, Decision Theory, and the Bias-Variance Tradeoff
MLE and Optimization
MLE and a little optimization
Introduction to Bayesian Statistics:
Introduction to Bayesian Statistics and the Beta/Bernoulli Inference
Introduction to Bayesian Regression
Bayesian Regression and Ridge Regression
Regularized Linear Regression:
Linear Models and Regularization, 598
The 1se rule
Simple python script to do Ridge and Lasso
Simple R script to do Ridge and Lasso
Properties of Linear Regression
Properties of Linear Regression (.Rmd)
R script to illustrate all subsets regression using package leaps
R script for ridge and lasso using glmnet, Hitters Data.
R script for reading in diabetes data and looking at y.
R script for Lasso on Diabetes.
R script for comparing Lasso, Ridge, Enet.
R script for forwards stepwise on Diabetes.
do-stepcv.R: R functions for doing stepwise.
R script to learn about formulas and model.matrix
(see Chapter 11, Statistical models in R, in the R-introduction Manual)
Note: AIC and BIC can be confusing. You can see different versions of the formulas.
Since you pick the smallest one, versions that differ by a constant are all correct.
This discusses things correctly and tells you how it works in R:
Cp, AIC, BIC or
the web link .
This link shows how confused the AIC vs. BIC discussion is:
  AIC vs. BIC
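To make the point in the note above concrete (versions of AIC that differ by a constant pick the same model), here is a small Python sketch. The simulated data, the nested set of candidate models, and the two particular ways of writing AIC for a Gaussian linear model are assumptions chosen for illustration; they are not taken from the linked discussion.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))
# Only the first two x's matter in the simulated truth.
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

def rss(y, X_sub):
    """Residual sum of squares from least squares of y on an intercept plus X_sub."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

# Nested candidate models: use the first p columns of X, p = 0,...,5.
sizes = list(range(6))
rss_vals = np.array([rss(y, X[:, :p]) for p in sizes])
n_coef = np.array(sizes) + 1          # slopes plus the intercept

# Version 1: drop every term that does not depend on the model.
aic_short = n * np.log(rss_vals / n) + 2 * n_coef
# Version 2: full -2*loglik at the MLE, and count sigma as a parameter too.
aic_full = n * (np.log(2 * np.pi) + np.log(rss_vals / n) + 1) + 2 * (n_coef + 1)
# BIC in the short form: same fit term, penalty log(n) per coefficient.
bic_short = n * np.log(rss_vals / n) + np.log(n) * n_coef

diff = aic_full - aic_short           # the same number for every model
print(diff)
print(np.argmin(aic_short), np.argmin(aic_full), np.argmin(bic_short))
```

The two AIC versions differ by n(log(2 pi) + 1) + 2 for every model, so the smallest one is the same model either way. BIC uses a different penalty, so it can pick a different (usually smaller) model than AIC.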
R script for seeing Ridge vs Lasso in simple Problem.
R script for plotting Ridge and Lasso shrinkage (thresholding function).
R script to see Lasso coefs plotted against lambda.
R script to see Ridge coefs plotted against lambda.
Regularized Logistic Regression (598)
Simple script to illustrate regularized logit in R
Simple script to illustrate regularized logit in python
R script for Regularized logit fit to simulated data.
R script Lasso fit to w8there data.
R script Ridge fit to w8there data.
Classification Metrics
fglass.R: script using the forensic glass data
tab.R: script using the tabloid data
Trees
simple tree in R
simple tree in python
Random Forests and Gradient Boosting on the California Housing Data in python
Classification with Logit, Trees, Random Forests, and Gradient Boosting in sklearn
xgboost in python
tree-bagging.R
knn-bagging.R
boost-demo.R
R package for plotting rpart trees
Single Layer Neural Nets
Simple Boston Housing example, single layer, L2, keras, in python
simple example of R package magrittr used in keras to pipe
Simple Boston Housing example, single layer, L2, keras, in R
See "Deep Learning with Python", by Chollet, or "Deep Learning with R", by Chollet and Allaire.
Single Layer Neural Nets (R code)
Single Layer Neural Nets XOR (R code)
plot.nnet.R
Deep Neural Nets
Simple single layer gradient computation with chain rule
Backpropagation
In python, Movie Reviews example in keras with 2 layers
In python, simple example of mnist data using keras
In R, keras_simple-Boston-lstat.R
Good discussion of Back-prop
Nice website with an overall discussion and pictures of the uncovered features
Nice visualization of a neural network
Nice tutorial on NN in R and the neuralnet R package
h2o
h2o : click on latest stable release
The R install instructions from the above link
Install h2o (and links to documentation)
h2o in R tutorial
Github for Darren Cook book on h2o
R examples h2o
Simple Example R script for Deep Neural nets in h2o
Similar to the simple script but done in Rmarkdown
the Rmarkdown
Do XOR with h2o and Deep Learning
Do Tabloid with h2o and Deep Learning
yet another version of lift code
deviance loss
Visualize MNIST digits
Fit MNIST digits
R examples keras
See "Deep Learning with R", by Chollet and Allaire.
Note that "Deep Learning with Python", by Chollet is a parallel book.
R-bloggers on classifying digits with keras in R
simple example of R package magrittr used in keras to pipe
keras_simple-Boston-lstat.R
Keras Cheat Sheet
Clustering: Hierarchical and K-means
Lectures:
Undirected Learning
Cereal Data
Distance
kmeans
Dimension Reduction: Principal Components and the Autoencoder
Lectures:
Introduction
Principal Components
Autoencoder
Latent Dirichlet Allocation
R
Information on R
Python
Information on Python
Sample Projects
A nice project option is the drug discovery data I used in ``BART: Bayesian Additive Regression Trees''.
The data is on Rob's data page: rob's data page.
Here is the BART paper where we used the Drug Discovery data, see section 5.3:
BART paper
Some old projects:
Predicting Soybean Yield (pdf)
Predicting Soybean Yield (Rmd)
Drug Discovery Data
Credit Card Fraud
Autoencoder on the MNIST Data
Cancer Classification using Microarray Data
Note that the data for this project is on Rob's data webpage, search for Janss.
Data
Rob's Data Web Page
Sources for example data sets:
This has many data sets collected from different R packages but smallish n and p:
R data sets
Note: this is copied from page 34 of
``Hands-On Machine Learning with Scikit-Learn and TensorFlow'' by Geron.
UC Irvine Machine Learning Repository
Kaggle Data Sets
Amazon's AWS datasets
Meta Portals
dataportals.org
open data monitor