Let’s use the LASSO and polynomial terms to fit the cars data (susedcars.csv).

# read in the used cars data
cd = read.csv("http://www.rob-mcculloch.org/data/susedcars.csv")
names(cd)
## [1] "price"        "trim"         "isOneOwner"   "mileage"      "year"        
## [6] "color"        "displacement"

Note that only mileage and year are numeric features.
trim, isOneOwner, color, and displacement are all categorical.
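
You can verify this by checking each column's class (assuming the data frame is named cd, as above):

sapply(cd, class)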

Previously we saw that adding mileage squared helped the model.

But if you look at the predictions for large values of mileage, the predicted price actually went up as mileage increased.
That does not sound right!!

Build a model for predicting the car prices using all of the features available in the data set.
You will have to dummy (one-hot encode) all of the categorical features.
Add in at least the square, cube, and fourth powers of mileage and year, as well as year times mileage.
That is at least 7 more “x” variables.
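
Here is a minimal sketch of one way to set this up in R: model.matrix dummies the factors for you, and I() protects the powers inside the formula. Rescaling mileage and year first is optional, but it keeps the fourth powers from getting numerically huge (the particular rescalings below are just one choice).

# rescale so the higher powers stay numerically reasonable (optional choices)
cd$mileage = cd$mileage/10000
cd$year = cd$year - 2000
# model.matrix one-hot encodes the factors and adds the polynomial and interaction terms
x = model.matrix(price ~ . + I(mileage^2) + I(mileage^3) + I(mileage^4) +
                   I(year^2) + I(year^3) + I(year^4) + mileage:year,
                 data = cd)[, -1]   # drop the intercept column
y = cd$price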

Use the LASSO to regularize the model, and see how good an out-of-sample RMSE you can get.
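
A sketch using the glmnet package (cv.glmnet picks the penalty weight lambda by cross-validation; the 75/25 train/test split and the seed are just one reasonable choice, not part of the assignment):

library(glmnet)
set.seed(99)                                   # make the split reproducible
ntrain = floor(0.75 * nrow(x))
ii = sample(1:nrow(x), ntrain)
cvfit = cv.glmnet(x[ii, ], y[ii], alpha = 1)   # alpha = 1 is the LASSO
yhat = predict(cvfit, newx = x[-ii, ], s = "lambda.min")
sqrt(mean((y[-ii] - yhat)^2))                  # out-of-sample RMSE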

Both R and sklearn have utilities for creating polynomial terms.
See what you can find!!
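
In R, for example, poly() will build the powers for you (raw = TRUE gives the plain powers rather than orthogonal polynomials); in Python, sklearn.preprocessing.PolynomialFeatures plays the same role.

# columns mileage, mileage^2, mileage^3, mileage^4
head(poly(cd$mileage, 4, raw = TRUE))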