Default Data

To illustrate logistic regression in R, let’s use the default data form the ISLR package.

library(ISLR)
data(Default)
head(Default)
##   default student   balance    income
## 1      No      No  729.5265 44361.625
## 2      No     Yes  817.1804 12106.135
## 3      No      No 1073.5492 31767.139
## 4      No      No  529.2506 35704.494
## 5      No      No  785.6559 38463.496
## 6      No     Yes  919.5885  7491.559
summary(Default)
##  default    student       balance           income     
##  No :9667   No :7056   Min.   :   0.0   Min.   :  772  
##  Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
##                        Median : 823.6   Median :34553  
##                        Mean   : 835.4   Mean   :33517  
##                        3rd Qu.:1166.3   3rd Qu.:43808  
##                        Max.   :2654.3   Max.   :73554

We observation corresponds to a credit card account.

We want to relate the binary outcome default to x=balance.
Is the outstanding balance on the card related to whether or not the account holder defaults.

Plot the data

boxplot(balance~default,Default)

Fit a logit

lgm = glm(default~balance,Default,family=binomial(link = "logit"))
summary(lgm)
## 
## Call:
## glm(formula = default ~ balance, family = binomial(link = "logit"), 
##     data = Default)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2697  -0.1465  -0.0589  -0.0221   3.7589  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.065e+01  3.612e-01  -29.49   <2e-16 ***
## balance      5.499e-03  2.204e-04   24.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2920.6  on 9999  degrees of freedom
## Residual deviance: 1596.5  on 9998  degrees of freedom
## AIC: 1600.5
## 
## Number of Fisher Scoring iterations: 8
phat = predict(lgm,type="response")
summary(phat)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000237 0.0003346 0.0021888 0.0333000 0.0142324 0.9810081
plot(Default$balance,phat)

Let’s check it!!

bhat = lgm$coefficients
cat("bhat:",bhat,"\n")
## bhat: -10.65133 0.005498917
eta = bhat[1] + bhat[2] * Default$balance

phat0 = 1/(1+exp(-eta))
summary(phat-phat0)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1.110e-16 -3.390e-21  0.000e+00 -1.299e-19  0.000e+00  1.110e-16

How do I check bhat is the MLE??