Where are we and what should I be doing?


April 1:

In the single layer neural net notes, about to discuss the basic optimization issues.

Should look at some of the tree code in Python.


March 25:

About to do variable importance with trees.


March 18:

Finished notes on metrics, about to start trees.


March 16:

About to start multinomial regression.


March 11:

Let's start next time at Regularized Logistic Regression.

No homework due at this time.


March 4:

We are about to do the Lasso with one x, in 7. Understanding the Lasso Solution.
Next up is the slide on standardization.
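For reference, the one-x result is soft thresholding, which is also why the standardization slide matters: the penalty treats every coefficient on a common scale. In my notation (the slides may scale things differently), with x standardized so that x'x = n:

```latex
\min_{\beta}\; \frac{1}{2n}\sum_{i=1}^n (y_i - x_i \beta)^2 + \lambda |\beta|
\;\;\Longrightarrow\;\;
\hat\beta_\lambda = \operatorname{sign}(\hat\beta)\,\bigl(|\hat\beta| - \lambda\bigr)_{+},
\qquad
\hat\beta = \frac{1}{n}\sum_{i=1}^n x_i y_i .
```

So the lasso shrinks the least-squares coefficient toward zero by lambda and sets it exactly to zero once |beta-hat| <= lambda.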


Homework 4 is on the webpage and due March 11th.



February 25:

We are about to start section 5 of the Linear Models and Regularization notes.


February 18:

Finished on the slide maximizing a'x subject to ||x|| = 1.
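The result on that slide is a one-liner via Cauchy-Schwarz (standard result; the notation here is mine):

```latex
\max_{\|x\| = 1} a'x = \|a\|,
\qquad \text{attained at } x^{*} = \frac{a}{\|a\|},
```

since a'x <= ||a|| ||x|| = ||a|| by Cauchy-Schwarz, with equality exactly when x is proportional to a.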

Homework 3 deadline extended to Monday.


February 16:

We stopped at the beginning of Section 5 of the "MLE and a Little Optimization" notes.


February 11:

We stopped at the beginning of Section 3, Statistical Decision Theory.

Homework 3 is on the webpage and is due February 19.


February 9:

Finished the notes on kNN and the bias-variance tradeoff.
Next time we will have a look at the Python code for kNN with the Boston housing data.
This should give us a good idea of how sklearn works.
Then we will launch into the next set of notes, More Probability, Decision Theory, and the Bias-Variance Tradeoff.
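If you want a head start on the sklearn pattern before we go through the course script, here is a minimal sketch of kNN regression. This is not the class code: the Boston housing loader has been removed from recent scikit-learn releases, so this uses made-up one-feature data, but the fit/score pattern is the same.

```python
# Sketch of the scikit-learn estimator pattern with kNN regression.
# Synthetic data (a stand-in for the course's Boston housing script).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))            # one feature
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 400)    # nonlinear signal + noise

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Standardize first: kNN distances are scale-sensitive.
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=15))
knn.fit(Xtr, ytr)            # every sklearn estimator: fit, then predict/score
r2 = knn.score(Xte, yte)     # out-of-sample R^2
print(round(r2, 2))
```

Every sklearn model works this way: construct, fit on training data, then predict or score on held-out data.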

Homework 2 due this Friday, February 12.


February 4:

Got to about slide 64 of the KNN, Bias-Variance Tradeoff notes.
Homework 2 on webpage, due February 12.


February 2:

About to start Section 5, Cross-Validation, in the KNN notes.


January 28:

Finished NB.


January 26:

Just about to (finally!) do Naive Bayes.

Note that there was a bad typo in Homework 1.
You want to see if year provides additional information given that mileage is in the model.
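The idea behind that question can be sketched with a partial F-test on nested models: fit price on mileage alone, then on mileage and year, and ask whether the drop in residual sum of squares is bigger than chance. This is illustration only, with made-up data and coefficients, not the homework solution.

```python
# Does `year` add information about price once `mileage` is in the model?
# Partial F-test on nested OLS models, synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
mileage = rng.uniform(10, 150, n)                 # thousands of miles
year = rng.integers(2000, 2020, n).astype(float)
price = 40 - 0.1 * mileage + 0.5 * (year - 2000) + rng.normal(0, 2, n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_small = np.column_stack([ones, mileage])        # mileage only
X_big = np.column_stack([ones, mileage, year])    # mileage + year

# F-statistic for the one extra parameter in the big model.
df_big = n - X_big.shape[1]
F = (rss(X_small, price) - rss(X_big, price)) / (rss(X_big, price) / df_big)
pval = stats.f.sf(F, 1, df_big)
print(F, pval)   # a small p-value says year helps given mileage
```

The same comparison drops out of the t-test on year's coefficient in the big model, since F = t^2 when only one parameter is added.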


January 21:

We stopped at about "Conditionals from Joints" in the Probability/Naive Bayes notes.

Homework 1 is on the webpage and it is due February 2.


January 19:

We are working through the R Hello World.

We are about to start the section on how R handles categorical variables.


January 14:

Still have to look at the statsmodels example in the Python hello world script.

Next we will go through the R hello world script.

Right now you need to be deciding what software you will use to take
the class. In class we are looking at R and Python.
Notable alternatives are Matlab and Julia, but I don't know if they support
all the tools we need.

If you have to learn R, have a look at the links on the webpage.
I think swirl is the easiest way to go.

I'm not sure what the best way to learn Python is.
Some of the links on my Python page look pretty good, for example
A Whirlwind Tour of Python by Jake VanderPlas (I really like that book).
Again, the help on the Python/NumPy/Pandas/scikit-learn pages is pretty impressive,
and the help links in the Jupyter Notebook look great as well.

For my research I use a combination of R and C++.
I have just been picking up Python "randomly".
Overall, I think R is easier, but if you have a programming background you might prefer Python.
Clearly, R has more statistics, but Python has scikit-learn, and the neural net stuff seems to be more of a Python thing.