##################################################
##################################################
###Machine Learning

Modern computing power has enabled the development of powerful tools for uncovering complex, high-dimensional relationships. These tools form a basic component of the interrelated areas known as statistics, machine learning, big data, artificial intelligence, and data science.

##################################################
##Books:

Our two basic reference books are:

ISL: ``An Introduction to Statistical Learning with Applications in R'' (James, Witten, Hastie, and Tibshirani).
H2O: ``Practical Machine Learning with H2O, Powerful, scalable techniques for AI and Deep Learning'' (Cook).

##################################################
##Topics:

Our syllabus will be:

1. Optimal Bayes Rules and Naive Bayes (ISL: 2.2.3; H2O: 10)
2. K Nearest Neighbors, the Bias-Variance Trade-off, and Cross Validation (ISL: 2, 5; H2O: 4)
3. Regression: Linear, Logit, and Multinomial (ISL: 3, 4.3; H2O: 7)
4. Metrics for Classification (ISL: 4.4.3; H2O: 4)
5. Regularized Generalized Linear Models (ISL: 6; H2O: 7)
6. Deep Neural Nets (H2O: 8)
7. Trees and Ensemble Methods: Random Forests and Boosting (ISL: 8; H2O: 5, 6)
8. Clustering (ISL: 10.3; H2O: 9)
9. Principal Components (ISL: 10.2; H2O: 9)
10. The Autoencoder (H2O: 9)

Topics 1-7 are called directed learning, in which we develop models to predict outcomes given observed variables.
Topics 8-9 are called undirected learning, in which we observe a set of variables and look for some kind of simplifying structure.

Time permitting, additional topics might be:
Graphical Models, Support Vector Machines, Recommender Systems, Graphical Methods, Causal Inference, Latent Dirichlet Allocation.

##################################################
##Prerequisites:

Computing:

Computing lies at the heart of our course. R and Python dominate applied statistical science. Primarily, we will use R.
Hopefully, we will also spend some time with Python. The H2O book gives R and Python equal time, so you can work in just R or just Python if you prefer. You can use any software environment you want, but we will spend class time discussing the R implementations, so this will be a good opportunity to learn R. If you are not comfortable working fairly intensively with a language like R or Python, you should not take the class. If you have programming experience, you should be able to pick up R quickly.

A background in statistics will be helpful but not essential.

##################################################
##Additional References:

Other potentially useful books:

``Machine Learning with R'' (Brett Lantz)
``Mastering Predictive Analytics'' (Rui Miguel Forte)
``R for Data Science'' (Wickham, Grolemund)
``Machine Learning in Python'' (Bowles)
``Introduction to Machine Learning with Python'' (Muller and Guido)

##################################################
##Grades:

Grades will be based on homework and projects, which may be done in groups.