Where are we and what should I be doing?
April 1:
In the single layer neural net notes, about to discuss the basic optimization issues.
Should look at some of the tree code in python.
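The basic optimization issue for a neural net is minimizing a loss by gradient descent. As a warm-up, here is a minimal plain-Python sketch (my own toy example, not from the notes): minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient.

```python
# Toy gradient descent: minimize f(w) = (w - 3)**2.
# The learning rate and iteration count are made-up choices here;
# picking them well is exactly the kind of issue the notes discuss.

def grad(w):
    # derivative of (w - 3)**2
    return 2.0 * (w - 3.0)

w = 0.0    # starting value
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w = w - lr * grad(w)

print(round(w, 4))  # converges toward the minimizer w = 3
```

Each step multiplies the error (w - 3) by (1 - 2*lr), so with lr = 0.1 the iterate shrinks toward 3 geometrically.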
March 25:
About to do variable importance with trees.
March 18:
Finished notes on metrics, about to start trees.
March 16:
About to start multinomial regression.
March 11:
Let's start next time at Regularized Logistic Regression.
No homework due at this time.
March 4:
We are about to do the Lasso with one x, in Section 7, Understanding the Lasso Solution.
About to do the slide on standardization.
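With a single standardized x, the Lasso solution is just the least-squares coefficient soft-thresholded by the penalty weight. A small sketch (my own notation: b is the OLS coefficient, lam the penalty):

```python
# Soft-thresholding: with one standardized x, the Lasso estimate is the
# least-squares coefficient b shrunk toward zero by lam, and set exactly
# to zero when |b| <= lam.  Numbers below are made up for illustration.

def soft_threshold(b, lam):
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

print(soft_threshold(2.5, 1.0))   # 1.5  (shrunk toward zero)
print(soft_threshold(0.5, 1.0))   # 0.0  (zeroed out: variable dropped)
print(soft_threshold(-2.0, 0.5))  # -1.5
```

The middle case is the point: unlike ridge, the Lasso sets small coefficients exactly to zero, which is why it does variable selection.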
Homework 4 is on the webpage and due March 11th.
February 25:
We are about to start section 5 of the Linear Models and Regularization notes.
February 18:
Finished at the max a'x s.t. ||x|| = 1 slide.
Homework 3 deadline extended to Monday.
February 16:
We stopped at the beginning of section 5 of the MLE and a little optimization notes.
February 11:
We stopped at the beginning of 3. Statistical Decision Theory.
Homework 3 is on the webpage and is due February 19.
February 9:
Finished the notes on KNN and the bias-variance tradeoff.
Next time we will have a look at the python code for knn with the Boston housing data.
This should give us a good idea of how sklearn works.
Then we will launch into the next set of notes, More Probability, Decision Theory, and the Bias-Variance Tradeoff.
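Before we look at the sklearn code, here is what KNN regression does in a plain-Python sketch on made-up 1-d data (not the Boston data): sklearn's KNeighborsRegressor fit/predict calls wrap this same idea.

```python
# k-nearest-neighbors regression from scratch on made-up 1-d data.
# For a query point x0, average the y values of the k closest x's.
# (Illustration only; the class code will use sklearn instead.)

def knn_predict(x_train, y_train, x0, k):
    # sort training points by distance to x0 and keep the k nearest
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.1, 1.9, 3.2, 3.9, 5.1]

print(knn_predict(x_train, y_train, 2.5, k=2))  # averages y at x=2 and x=3
```

Small k gives a wiggly, low-bias/high-variance fit; large k averages over more neighbors and smooths things out, which is the tradeoff we just covered.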
Homework 2 due this Friday, February 12.
February 4:
Got to about slide 64 of the KNN, Bias-Variance Tradeoff notes.
Homework 2 on webpage, due February 12.
February 2:
About to start section 5, Cross Validation in the KNN notes.
January 28:
Finished NB.
January 26:
Just about to (finally!) do Naive Bayes.
Note that there was a bad typo in Homework 1.
You want to see if year provides additional information given that mileage is in the model.
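The Naive Bayes idea in a tiny plain-Python sketch (my own made-up binary features and probabilities, not from the notes): pick the class that maximizes the prior times the product of the class-conditional feature probabilities, using the "naive" independence assumption.

```python
# Tiny naive Bayes for binary features: choose the class c maximizing
# P(c) * prod_j P(x_j | c).  The numbers below are invented for
# illustration; a real classifier estimates them from training counts.

def nb_classify(x, priors, cond):
    # priors: {class: P(class)}
    # cond:   {class: [P(x_j = 1 | class) for each feature j]}
    best, best_score = None, -1.0
    for c, pc in priors.items():
        score = pc
        for xj, pj in zip(x, cond[c]):
            score *= pj if xj == 1 else (1.0 - pj)
        if score > best_score:
            best, best_score = c, score
    return best

priors = {"spam": 0.4, "ham": 0.6}
cond = {"spam": [0.8, 0.7], "ham": [0.1, 0.3]}  # P(feature = 1 | class)

print(nb_classify([1, 1], priors, cond))  # "spam"
print(nb_classify([0, 0], priors, cond))  # "ham"
```

The independence assumption is what lets the joint P(x | c) factor into that simple product over features.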
January 21:
We stopped at about "Conditionals from Joints" in the Probability/Naive Bayes notes.
Homework 1 is on the webpage and it is due February 2.
January 19:
We are working through the R Hello World.
We are about to start the section on how R handles categorical variables.
January 14:
Still have to look at the statmod example in the python hello world script.
Next we will go through the R hello world script.
Right now you need to be deciding what software you will use for
the class. We are looking at R and Python.
Notable alternatives are Matlab and Julia, but I don't know whether they support
all the tools we need.
If you have to learn R, have a look at the links on the webpage.
I think swirl is the easiest way to go.
I'm not sure what is the best way to learn python.
Some of the links I have on my python page look pretty good, for example
A Whirlwind Tour of Python, by Jake VanderPlas
(I really like that book).
Again, the help on the Python/NumPy/Pandas/scikit-learn pages is pretty impressive
and the help links in the Jupyter Notebook look great as well.
For my research I use a combination of R and C++.
I've just been picking up Python "randomly".
Overall, I think R is easier, but if you have a programming background you might have a preference for Python.
Clearly, R has more statistics, but Python has scikit-learn, and the neural net stuff seems to be more of a Python thing.