##################################################
##################################################
###Machine Learning
Modern computing power has enabled the development of powerful
tools for uncovering complex high-dimensional relationships.
These tools form a basic component of the interrelated areas known as
statistics, machine learning, big data, artificial intelligence, and data science.
##################################################
##Books:
Our two basic reference books are:
ISL:
``An Introduction to Statistical Learning with Applications in R''
(James, Witten, Hastie, and Tibshirani).
H2O:
``Practical Machine Learning with H2O,
Powerful, scalable techniques for AI and Deep Learning''
(Cook).
##################################################
##Topics:
Our syllabus will be:
1. Optimal Bayes Rules and Naive Bayes
(ISL: 2.2.3; H2O: 10)
2. K Nearest Neighbors, the Bias-Variance Trade-off, and Cross Validation
(ISL: 2, 5; H2O: 4)
3. Regression: Linear, Logit, and Multinomial
(ISL: 3, 4.3; H2O: 7)
4. Metrics for Classification
(ISL: 4.4.3; H2O: 4)
5. Regularized Generalized Linear Models
(ISL: 6; H2O: 7)
6. Deep Neural Nets
(H2O: 8)
7. Trees and Ensemble Methods: Random Forests and Boosting
(ISL: 8; H2O: 5, 6)
8. Clustering
(ISL: 10.3; H2O: 9)
9. Principal Components
(ISL: 10.2; H2O: 9)
10. The Autoencoder
(H2O: 9)
Topics 1-7 are called directed (supervised) learning, in which we develop models
to predict outcomes given observed variables.
Topics 8-10 are called undirected (unsupervised) learning, in which we observe a set
of variables and look for some kind of simplifying structure.
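As a rough illustration of the distinction between directed and undirected learning, here is a minimal Python sketch using only the standard library. The data, the labels, and the function names `knn_predict` and `two_means` are made up for this example and are not taken from either reference book. The first function predicts an outcome from labeled data; the second looks for structure without ever seeing the labels.

```python
# Illustrative sketch only: toy data and hypothetical helper names.
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Directed learning: predict the label of x from labeled training data."""
    # Sort the labeled points by distance to x and keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    # Majority vote among the labels of the k nearest points.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def two_means(xs, iters=10):
    """Undirected learning: split unlabeled 1-D points into two clusters.

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)  # start the two centers at the extremes
    for _ in range(iters):
        # Assign each point to its closer center (ties go to the first).
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        # Move each center to the mean of its assigned points.
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]                  # observed variable
ys = ["low", "low", "low", "high", "high", "high"]   # observed outcomes

print(knn_predict(xs, ys, 1.1))  # directed: uses the outcomes in ys
print(two_means(xs))             # undirected: never looks at ys
```

The nearest-neighbor function needs `ys` to make a prediction, while the clustering function works from `xs` alone.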
Time permitting, additional topics might be:
Graphical Models, Support Vector Machines, Recommender Systems,
Graphical Methods, Causal Inference, Latent Dirichlet Allocation.
##################################################
##Prerequisites:
Computing:
Computing lies at the heart of our course.
R and Python dominate applied statistical science.
Primarily, we will use R.
Hopefully, we will also spend some time with Python.
The H2O book gives R and Python equal time so that you
can work in just R or just Python.
You can use any software environment you want, but we will
spend class time discussing the R implementations, so this
will be a good opportunity to learn R.
If you are not comfortable working fairly intensively with
a language like R or Python, you should not take the class.
If you have programming experience, you should be able to pick
up R quickly.
A background in statistics will be helpful but not essential.
##################################################
##Additional References:
Other potentially useful books:
``Machine Learning with R''
(Brett Lantz)
``Mastering Predictive Analytics''
(Rui Miguel Forte)
``R for Data Science''
(Wickham, Grolemund)
``Machine Learning in Python''
(Bowles)
``Introduction to Machine Learning with Python''
(Muller and Guido)
##################################################
##Grades:
Grades will be based on homework and projects,
which may be done in groups.