Problem 1. Basic Optimization, MLE for IID Poisson Data

Suppose \(y_i\) is a count. A very common model is then the Poisson distribution: \[ P(Y=y \;|\; \lambda) = \frac{e^{-\lambda} \, \lambda^y}{y!}, \; y = 0,1,2,\ldots \]

Given \(Y_i \sim \text{Poisson}(\lambda)\) iid for \(i = 1, \ldots, n\), with observed values \(Y_i = y_i\), what is the MLE of \(\lambda\)?
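
One way to check a derived answer numerically is to maximize the log-likelihood \(\sum_i [-\lambda + y_i \log \lambda - \log y_i!]\) directly. The sketch below does this on simulated counts. The sample size, the true rate, the search interval, and the names `loglik` and `lambda_hat` are illustrative assumptions, not part of the problem.

```r
# Simulated counts; the true rate 3.5 and n = 200 are arbitrary illustrative choices.
set.seed(1)
y <- rpois(200, lambda = 3.5)

# Poisson log-likelihood: sum_i [ -lambda + y_i * log(lambda) - log(y_i!) ].
loglik <- function(lambda, y) {
  sum(dpois(y, lambda = lambda, log = TRUE))
}

# One-dimensional numerical maximization over an interval assumed to
# contain the maximizer.
fit <- optimize(loglik, interval = c(0.01, 20), y = y, maximum = TRUE)
lambda_hat <- fit$maximum
print(lambda_hat)
```

Comparing `lambda_hat` with the closed-form expression you derive is a quick sanity check on the algebra.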

Problem 2. Constrained Optimization, Minimum Variance Portfolio

Suppose we are considering investing in \(p\) stocks where the uncertain return on the \(i^{th}\) stock is denoted by \(R_i\), \(i=1,2,\ldots,p\). Let \(R=(R_1,R_2,\ldots,R_p)'\).

A portfolio is given by \(w=(w_1,w_2,\ldots,w_p)'\) where \(w_i\) is the fraction of wealth invested in asset \(i\).

The \(\{w_i\}\) must satisfy \(\sum w_i = 1\).

The return on the portfolio is then \[ P = w'R = \sum w_i R_i. \]

We want to find the global minimum variance portfolio: \[ \underset{w}{\min} \, Var(P), \;\; \text{subject to} \sum w_i = 1. \]

If we let \(\iota = (1,1,\ldots,1)'\), the vector of ones, and \(Var(R) = \Sigma\) then our problem is \[ \underset{w}{\min} \, w'\Sigma w \;\; \text{subject to} \; w' \iota= 1. \] Find the global minimum variance portfolio in terms of \(\Sigma\) and \(\iota\).
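
A numerical version of this problem can help check a closed-form answer. The sketch below assumes the standard Lagrange multiplier approach and a made-up \(3 \times 3\) covariance matrix. The numbers in `Sigma` and the names `K`, `rhs`, and `gamma` are illustrative assumptions.

```r
# Toy 3-asset covariance matrix (illustrative numbers, not from any data set).
Sigma <- matrix(c(0.04, 0.01, 0.00,
                  0.01, 0.09, 0.02,
                  0.00, 0.02, 0.16), nrow = 3, byrow = TRUE)
p <- nrow(Sigma)
iota <- rep(1, p)

# Lagrangian: L(w, gamma) = w' Sigma w - gamma * (w' iota - 1).
# Setting its derivatives to zero gives the linear system
#   2 Sigma w - gamma iota = 0   and   iota' w = 1.
K <- rbind(cbind(2 * Sigma, -iota),
           c(iota, 0))
rhs <- c(rep(0, p), 1)

# Solve for (w, gamma) jointly and keep the first p entries as the weights.
sol <- solve(K, rhs)
w <- sol[1:p]

print(w)                            # portfolio weights
print(sum(w))                       # constraint check, should equal 1
print(drop(t(w) %*% Sigma %*% w))   # variance of the resulting portfolio
```

The weights printed here should match whatever expression in \(\Sigma\) and \(\iota\) you derive, evaluated at this `Sigma`.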

Problem 3. Polynomial Regression

A basic idea in nonlinear regression is to use polynomial terms.

With one \(x\) variable, this means we consider the models: \[ Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \ldots + \beta_p x_i^p + \epsilon_i \]

Using the simple used cars data (with \(n = 1{,}000\)), where \(Y\) = price and \(x\) = mileage, find the best choice of \(p\).

Fit your chosen polynomial model using all the data and plot the fit on top of the data. Do you like it? Also plot the fit for a \(p\) that is “way too big”. What is wrong with it?
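
One common way to choose \(p\) is K-fold cross-validation on out-of-sample error. The sketch below assumes that approach and uses simulated data as a stand-in for the used cars file, which is not loaded here. The data-generating formula, the choice of 5 folds, the range of degrees, and all variable names are illustrative assumptions.

```r
set.seed(99)
n <- 1000

# Simulated stand-in for the used cars data (hypothetical scales and shape).
mileage <- runif(n, 0, 150)
price <- 60 * exp(-mileage / 60) + rnorm(n, sd = 4)
dat <- data.frame(price = price, mileage = mileage)

# Assign each observation to one of K folds at random.
K <- 5
fold <- sample(rep(1:K, length.out = n))

# For each degree, compute the cross-validated root mean squared error.
degrees <- 1:10
cv_rmse <- sapply(degrees, function(p) {
  sq_err <- numeric(n)
  for (k in 1:K) {
    train <- dat[fold != k, ]
    held_out <- dat[fold == k, ]
    fit <- lm(price ~ poly(mileage, p), data = train)
    sq_err[fold == k] <- (held_out$price - predict(fit, newdata = held_out))^2
  }
  sqrt(mean(sq_err))
})
best_p <- degrees[which.min(cv_rmse)]
print(best_p)

# Refit the chosen degree on all the data and draw the fit over the scatter.
fit_all <- lm(price ~ poly(mileage, best_p), data = dat)
grid <- data.frame(mileage = seq(min(dat$mileage), max(dat$mileage),
                                 length.out = 200))
plot(dat$mileage, dat$price, pch = 16, cex = 0.4, col = "grey",
     xlab = "mileage", ylab = "price")
lines(grid$mileage, predict(fit_all, newdata = grid), col = "red", lwd = 2)
```

Replacing `best_p` with a much larger degree in the final fit is one way to see how an overly flexible polynomial behaves, particularly near the edges of the data.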

Problem 4. Regularized Regression

Let’s try ridge and LASSO on the car price data.

cd = read.csv("http://www.rob-mcculloch.org/data/usedcars.csv")
print(dim(cd))
## [1] 20063    11

Note that this version of the cars data has 20 thousand observations and 11 variables.

In addition, many of the x variables are categorical, so you will have to dummy them up.

sapply(cd,is.numeric)
##        price         trim   isOneOwner      mileage         year        color 
##         TRUE        FALSE        FALSE         TRUE         TRUE        FALSE 
## displacement         fuel       region  soundSystem    wheelType 
##         TRUE        FALSE        FALSE        FALSE        FALSE

Although it is stored as numeric, displacement is actually categorical: it takes only 13 distinct values.

table(cd$displacement)
## 
##    3  3.2  3.5  3.7  4.2  4.3  4.6    5  5.4  5.5  5.8    6  6.3 
##  204  274  227  141  239 2787 2794 2661  356 9561  112  213  494
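
One way to dummy up the categorical variables is to convert them to factors and let `model.matrix` expand them. The sketch below assumes that approach and uses a tiny made-up data frame that reuses a few of the column names above. The values, including the `fuel` levels, are hypothetical and are not taken from the real file.

```r
# Tiny made-up data frame reusing four of the column names shown above.
toy <- data.frame(
  price = c(30000, 22000, 41000, 18000, 27000, 35000),
  mileage = c(40000, 85000, 12000, 120000, 60000, 30000),
  displacement = c(4.6, 5.5, 5.5, 3.5, 4.6, 3.5),
  fuel = c("Gasoline", "Diesel", "Gasoline", "Gasoline", "Diesel", "Gasoline")
)

# Treat displacement as categorical even though it is stored as a number.
toy$displacement <- as.factor(toy$displacement)
toy$fuel <- as.factor(toy$fuel)

# model.matrix creates one 0/1 column per non-baseline factor level.
# The first column is the intercept, which is dropped here.
X <- model.matrix(~ mileage + displacement + fuel, data = toy)[, -1]
y <- log(toy$price)

print(colnames(X))
```

The resulting numeric matrix `X` and response `y` are in the form that matrix-based fitting functions expect.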

(a)

Use the LASSO to relate log of price to the features.

(b)

Use ridge regression to relate log of price to the features.

Note that in R, you use glmnet for both LASSO and ridge.

Here is the glmnet help on the parameter alpha.

alpha   :
The elasticnet mixing parameter, with 0 <= alpha <= 1 . The penalty is defined as
(1-alpha)/2||beta||_2^2+alpha||beta||_1.
alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.
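
To connect the help text to parts (a) and (b), the sketch below fits both penalties with `cv.glmnet`, which chooses the penalty weight by cross-validation. It assumes the glmnet package is installed and uses simulated data in place of the dummy-coded car features and log price. The simulated coefficients and all variable names are illustrative assumptions.

```r
library(glmnet)  # assumes the glmnet package is installed

set.seed(14)
n <- 200

# Simulated stand-in for a dummy-coded feature matrix.
# Only the first two columns affect the response.
x <- matrix(rnorm(n * 10), nrow = n, ncol = 10)
y <- 1 + 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

# alpha = 1 gives the LASSO penalty, alpha = 0 the ridge penalty.
cv_lasso <- cv.glmnet(x, y, alpha = 1)
cv_ridge <- cv.glmnet(x, y, alpha = 0)

# Coefficients at the lambda that minimizes cross-validated error.
b_lasso <- coef(cv_lasso, s = "lambda.min")
b_ridge <- coef(cv_ridge, s = "lambda.min")
print(cbind(lasso = as.vector(b_lasso), ridge = as.vector(b_ridge)))

# Cross-validated error as a function of log(lambda) for the LASSO fit.
plot(cv_lasso)
```

With the real data, `x` would be the dummy-coded feature matrix and `y` the log of price.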