glmnet for L1 and L2 Linear Regression in R

Let’s use the glmnet package to do L1 regression in R.
To do L2 regression you just change the alpha parameter used below from 1 to 0.

The same package can also be used for regularized logistic, poisson, and multinomial regression.
The package can do the elastic net with L1 (LASSO) and L2 (ridge) as special cases.

The package does a few important things automatically.
For example, when you elect to have glmnet standardize the x’s, the coefficients are returned on the original scale.
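Concretely, here is a minimal sketch of the key arguments (not run here; it assumes a numeric predictor matrix x and response vector y like the ones we construct below):

library(glmnet)
# alpha = 1 gives the LASSO (L1) penalty, alpha = 0 gives ridge (L2),
#   and values in between give an elastic net mix of the two
fit.lasso = glmnet(x, y, alpha = 1)
fit.ridge = glmnet(x, y, alpha = 0)
fit.enet  = glmnet(x, y, alpha = 0.5)
# standardize = TRUE is the default: the x's are standardized internally,
#   but the coefficients are reported on the original scale
fit.lasso2 = glmnet(x, y, alpha = 1, standardize = TRUE)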

Let’s use our beloved used cars data set as an example.

Note that the vignette for glmnet is pretty good.
Try browseVignettes("glmnet") in R to find it.

Get the data

## read in the data
cd = read.csv("http://www.rob-mcculloch.org/data/susedcars.csv")

## a lot of the variables are categorical, but read.csv leaves them as character, not factor
sapply(cd,is.factor)
##        price         trim   isOneOwner      mileage         year        color 
##        FALSE        FALSE        FALSE        FALSE        FALSE        FALSE 
## displacement 
##        FALSE
sapply(cd,is.numeric)
##        price         trim   isOneOwner      mileage         year        color 
##         TRUE        FALSE        FALSE         TRUE         TRUE        FALSE 
## displacement 
##        FALSE
# for example, let's look at trim; trim is categorical with 4 categories or levels.
table(cd$trim)
## 
##   430   500   550 other 
##   143   127   591   139

#let's make the categorical variables factors in R
iifac = c(2,3,6,7)
for(i in iifac) cd[,i] = as.factor(cd[,i])
summary(cd)
##      price          trim     isOneOwner    mileage            year     
##  Min.   :  995   430  :143   f:841      Min.   :  1997   Min.   :1994  
##  1st Qu.:12995   500  :127   t:159      1st Qu.: 40133   1st Qu.:2004  
##  Median :29800   550  :591              Median : 67920   Median :2007  
##  Mean   :30583   other:139              Mean   : 73652   Mean   :2007  
##  3rd Qu.:43992                          3rd Qu.:100138   3rd Qu.:2010  
##  Max.   :79995                          Max.   :255419   Max.   :2013  
##     color     displacement
##  Black :415   4.6  :137   
##  other :227   5.5  :476   
##  Silver:213   other:387   
##  White :145               
##                           
## 
#Let’s try a simple regression:
# R will automatically dummy up the factor trim
sreg = lm(price~mileage+trim,cd)
summary(sreg)
## 
## Call:
## lm(formula = price ~ mileage + trim, data = cd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -24984  -6270  -1182   5068  36846 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.062e+04  1.239e+03  32.773   <2e-16 ***
## mileage     -2.669e-01  8.499e-03 -31.403   <2e-16 ***
## trim500     -7.689e+02  1.143e+03  -0.673    0.501    
## trim550      1.411e+04  1.006e+03  14.019   <2e-16 ***
## trimother    9.966e+03  1.145e+03   8.706   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9372 on 995 degrees of freedom
## Multiple R-squared:  0.7419, Adjusted R-squared:  0.7409 
## F-statistic:   715 on 4 and 995 DF,  p-value: < 2.2e-16

# we can see the dummies just give a different intercept for the linear relationship
#    between price and mileage for each level of trim
plot(cd$mileage,sreg$fitted.values,col=as.integer(cd$trim))

Looks like the trim variable could matter!!!
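To make the “different intercept for each trim level” idea concrete, here is a small sketch using the sreg fit above: the baseline level is 430, and each dummy coefficient shifts the intercept for its level (the object names here are just illustrative).

b = coef(sreg)
# implied intercept for each trim level; all four lines share the slope b["mileage"]
int430 = b["(Intercept)"]
int500 = b["(Intercept)"] + b["trim500"]
int550 = b["(Intercept)"] + b["trim550"]
intoth = b["(Intercept)"] + b["trimother"]
c(int430, int500, int550, intoth)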

Now let’s try the LASSO with all the variables.
First we need to build the \(X\) matrix since glmnet does not use formulas.

# Now let’s try the LASSO. We will use the package glmnet.
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-4

#glmnet does not use the “formula approach”. 
#You have to give it a matrix x and response vector y corresponding to the model y=Xβ+ϵ.

#For our simple example above, this means we need the matrix with all the dummies. 
#R has a simple way of getting the x matrix given the model formula using the model.matrix function.

x = model.matrix(price~mileage+trim,cd)[,-1] # [,-1] to drop the first column which is the one vector
print(dim(x))
## [1] 1000    4
head(x)
##   mileage trim500 trim550 trimother
## 1   36858       0       1         0
## 2   46883       0       1         0
## 3  108759       0       1         0
## 4   35187       0       1         0
## 5   48153       0       1         0
## 6  121748       1       0         0
#Now we are ready to try using glmnet to run the LASSO. 
#Note that the vignette for glmnet is pretty good (see > browseVignettes() in R.)
#We will use all the x variables, so I will call model.matrix again, but this time with all the variables in the data frame cd. 
#We first call cv.glmnet. This will do the cross validation needed to choose λ.

y = cd$price/1000
x = model.matrix(price~.,cd)[,-1]
print(dim(x))
## [1] 1000   11
head(x)
##   trim500 trim550 trimother isOneOwnert mileage year colorother colorSilver
## 1       0       1         0           0   36858 2008          0           1
## 2       0       1         0           0   46883 2012          0           0
## 3       0       1         0           0  108759 2007          0           0
## 4       0       1         0           0   35187 2007          0           0
## 5       0       1         0           0   48153 2007          0           0
## 6       1       0         0           0  121748 2002          1           0
##   colorWhite displacement5.5 displacementother
## 1          0               1                 0
## 2          0               0                 0
## 3          1               1                 0
## 4          0               1                 0
## 5          0               1                 0
## 6          0               0                 1

So we have 11 variables after we dummy up the categorical variables.

Ok, let’s do the LASSO. We actually use the function cv.glmnet, since it will do the cross validation.

set.seed(14) # Dave Keon
cars.gcv = cv.glmnet(x, y, type.measure = "mse", nfolds = 10,alpha=1)
#alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.
# nfolds=10 gives 10-fold cross-validation
plot(cars.gcv)

In this case the cross-validation minimum is obtained at a very small lambda = little shrinkage = a relatively complicated model.
And the min lambda does not set any of the coefficients to 0.
The 1se lambda is perhaps more interesting since it suggests there might be a simpler model that does about as well.

Let’s pull off the min lambda value and the 1se lambda value.

## let's get the good lambda values
lmin = cars.gcv$lambda.min
l1se = cars.gcv$lambda.1se
cat("lambda min and lambda 1se are: ",lmin,l1se,"\n")
## lambda min and lambda 1se are:  0.02406165 0.2966433

Wow, they are pretty different!!!
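To see how much fit we give up at the 1se lambda, here is a quick sketch comparing the cross-validated RMSE at the two values, using the cvm and lambda components returned by cv.glmnet.

# cross-validated rmse at lambda.min and lambda.1se
rmse.min = sqrt(cars.gcv$cvm[cars.gcv$lambda == lmin])
rmse.1se = sqrt(cars.gcv$cvm[cars.gcv$lambda == l1se])
cat("cv rmse at lambda.min and lambda.1se: ",rmse.min,rmse.1se,"\n")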

Let’s plot the rmse since it is more interpretable than the mse.

#Let’s plot the RMSE.
crmse = sqrt(cars.gcv$cvm) #glmnet used mse
plot(log(cars.gcv$lambda),crmse,xlab="log(lambda)",ylab="rmse")

cat("the min rmse is:",min(crmse),"\n")
## the min rmse is: 6.47855

Let’s look at the LASSO coefficient values.
We could use the glmnet function, but we can just pull the LASSO fit out of the data structure returned by cv.glmnet.

#Now lets get the actual LASSO fit. 
#We can run the function glmnet or we can get the fit on all the data from the cv results. 
#Let’s just pull it out of the cv results and have a look at the coefficients.

cars.las = cars.gcv$glmnet.fit
plot(cars.las)
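For completeness, here is a sketch of the alternative mentioned above: calling glmnet directly on all the data gives essentially the same fitted path as the one stored in the cv results (passing lambda = cars.gcv$lambda would reproduce the exact same grid).

# fit the LASSO path directly; alpha = 1 is the default but is shown for clarity
cars.las2 = glmnet(x, y, alpha = 1)
plot(cars.las2)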

Now let’s get the coefficients at the min and 1se lambda.

## get the 1se coefficients
bhatL1 = coef(cars.las,s=l1se)[,1] #[,1] gets rid of sparse matrix format
names(bhatL1[abs(bhatL1)==0]) #which ones are 0?
## [1] "trim500"    "colorother"

## min mse lambda
bhatLm = coef(cars.las,s=lmin)[,1] #[,1] gets rid of sparse matrix format
names(bhatLm[abs(bhatLm)==0]) #which ones are 0?
## character(0)

The min lambda has no zero coefficients, while the 1se lambda has two zero coefficients.
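It can also be handy to look at the two coefficient vectors side by side; a one-liner using the vectors we just extracted:

# compare the coefficients at lambda.min and lambda.1se
round(cbind(lmin = bhatLm, l1se = bhatL1), 4)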

Now let’s get fitted values at a choice of lambda.
We’ll give it our training x, so these are in-sample fits.
Of course, if we wanted out-of-sample predictions, we would supply a different x.

##Now let’s get some predictions. 
##We’ll predict at our train x, but of course we would be using different x if we were predicting out of sample.

fmat = predict(cars.las,newx=x,s=c(lmin,l1se))
fmat = cbind(y,fmat,sreg$fitted.values/1000) # put the lm fits on the same $1000 scale as y
colnames(fmat) = c("y","yhatmin","yhat1se","linear")
pairs(fmat)
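Finally, a sketch of the out-of-sample version: hold out some rows, run cv.glmnet on the training rows only, and predict at the held-out x. The 75/25 split and the object names here are just illustrative.

# illustrative train/test split for out-of-sample prediction
set.seed(99)
n = nrow(x)
ii = sample(1:n, floor(0.75*n))   # training row indices
gcv.tr = cv.glmnet(x[ii,], y[ii], type.measure="mse", nfolds=10, alpha=1)
yhat.te = predict(gcv.tr, newx = x[-ii,], s = "lambda.min")
cat("out-of-sample rmse: ", sqrt(mean((y[-ii] - yhat.te)^2)), "\n")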