Maybe the easiest way to do neural nets in R is with the nnet package.
This package has been around a long time and is relatively easy to use.
The downside is that it only fits a single hidden layer.
Let’s do a simple example with a numeric output.
Let’s use the Boston housing data with x = lstat and y = medv for a simple example.
library(MASS)
data(Boston)
head(Boston)
## crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
## medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
x = Boston$lstat
xs = scale(x)
cat('check mean is 0: ',mean(xs), ' , check sd is 1: ',sd(xs),'\n')
## check mean is 0: -7.052778e-17 , check sd is 1: 1
y = Boston$medv
plot(xs,y)
library(nnet)
ddf = data.frame(xs,y)
set.seed(99)
nnfit = nnet(y~xs,ddf,size=5,decay=.1,linout=T,maxit=1000)
## # weights: 16
## initial value 295271.940971
## iter 10 value 18804.601760
## iter 20 value 14732.914419
## iter 30 value 13439.088307
## iter 40 value 13262.996876
## iter 50 value 13252.422595
## iter 60 value 13245.401109
## iter 70 value 13244.580698
## iter 80 value 13244.573093
## iter 90 value 13244.553851
## iter 100 value 13244.345324
## iter 110 value 13243.229851
## iter 120 value 13241.492153
## iter 130 value 13240.555915
## iter 140 value 13239.309174
## iter 150 value 13239.094884
## iter 160 value 13237.604361
## iter 170 value 13235.116316
## iter 180 value 13234.562584
## iter 190 value 13233.493623
## iter 200 value 13233.199440
## iter 200 value 13233.199354
## iter 200 value 13233.199334
## final value 13233.199334
## converged
Note that this package does not use stochastic gradient descent.
Given the relative simplicity of the single-layer model, it uses a different optimizer (BFGS, via R’s optim), but the fitting is still iterative.
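The “# weights: 16” printed above is consistent with size=5 and one input: (1+1)*5 = 10 weights into the hidden layer (including biases) plus 5+1 = 6 into the output. A quick way to confirm this, and to look at the fitted weights themselves:
length(nnfit$wts) # 16 weights in total
summary(nnfit)    # prints each weight with labels like b->h1, i1->h1, h1->o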
Let’s check the in-sample fit.
yhat = predict(nnfit,ddf)
plot(xs,y)
points(xs,yhat,col='red')
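A simple numeric summary of the fit is the in-sample RMSE (yhat comes back as an n-by-1 matrix, which is fine here):
sqrt(mean((y - yhat)^2)) # in-sample root mean squared error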
Now let’s predict out of sample at a few new x values (on the standardized scale).
ddfp = data.frame(xs=c(-1,0,1))
ypred = predict(nnfit,ddfp)
plot(xs,y)
points(xs,yhat,col='red')
points(ddfp$xs,ypred,col="blue",cex=2,pch=16)
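Note the prediction points above are already on the standardized scale. If we had new raw lstat values instead, scale() keeps the training mean and sd as attributes, so we can apply the same transformation; a small sketch (xnew is made up):
ctr = attr(xs,'scaled:center') # mean of lstat in the training data
scl = attr(xs,'scaled:scale')  # sd of lstat in the training data
xnew = c(5,10,20)              # hypothetical new raw lstat values
xnews = (xnew - ctr)/scl       # standardized the same way as xs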
Let’s run it again and see if we get something similar. This time without regularization.
set.seed(14)
nnfit1 = nnet(y~xs,ddf,size=5,decay=0,linout=T,maxit=1000)
## # weights: 16
## initial value 273888.612829
## iter 10 value 15532.811481
## iter 20 value 13560.919681
## iter 30 value 13208.332308
## iter 40 value 13166.714402
## iter 50 value 13163.069366
## iter 60 value 13142.226065
## iter 70 value 13106.448501
## iter 80 value 13093.748147
## iter 90 value 13090.484990
## iter 100 value 13088.782103
## iter 110 value 13084.433639
## iter 120 value 13083.886813
## iter 130 value 13083.743524
## iter 140 value 13083.589854
## final value 13083.530899
## converged
yhat1 = predict(nnfit1,ddf)
plot(yhat,yhat1)
abline(0,1,col="red")
plot(xs,y)
points(xs,yhat,col='red')
points(xs,yhat1,col='green')
Quite similar, but a little different.
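We can compare the two fits numerically as well. With linout=T, nnet’s fitting criterion is the residual sum of squares plus the decay penalty, so the decay=0 fit’s final value (13083.5) should be a pure RSS:
c(rss_fit = sum((y - yhat)^2), rss_fit1 = sum((y - yhat1)^2)) # in-sample RSS for each fit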
Let’s try more units, with and without regularization.
No regularization.
set.seed(99)
nnfit11 = nnet(y~xs,ddf,size=100,decay=0,linout=T,maxit=1000)
## # weights: 301
## initial value 284217.335015
## iter 10 value 21983.550364
## iter 20 value 13490.837387
## iter 30 value 13141.644232
## iter 40 value 13079.142522
## iter 50 value 13016.819628
## iter 60 value 12910.796359
## iter 70 value 12834.433625
## iter 80 value 12772.581967
## iter 90 value 12693.807995
## iter 100 value 12650.754775
## iter 110 value 12600.450082
## iter 120 value 12526.420081
## iter 130 value 12473.376973
## iter 140 value 12433.888918
## iter 150 value 12393.833282
## iter 160 value 12356.811560
## iter 170 value 12320.966862
## iter 180 value 12284.785652
## iter 190 value 12251.506529
## iter 200 value 12236.303953
## iter 210 value 12215.139191
## iter 220 value 12176.668301
## iter 230 value 12145.965042
## iter 240 value 12103.077067
## iter 250 value 12061.187876
## iter 260 value 12021.070968
## iter 270 value 11971.793668
## iter 280 value 11928.148753
## iter 290 value 11898.987839
## iter 300 value 11875.547578
## iter 310 value 11852.163328
## iter 320 value 11829.833303
## iter 330 value 11807.186972
## iter 340 value 11778.183263
## iter 350 value 11745.383210
## iter 360 value 11705.738068
## iter 370 value 11676.605650
## iter 380 value 11645.374594
## iter 390 value 11624.033360
## iter 400 value 11605.421281
## iter 410 value 11583.615477
## iter 420 value 11568.320011
## iter 430 value 11539.706277
## iter 440 value 11520.457436
## iter 450 value 11491.296958
## iter 460 value 11474.190307
## iter 470 value 11452.794327
## iter 480 value 11423.389193
## iter 490 value 11394.179587
## iter 500 value 11354.091069
## iter 510 value 11311.436084
## iter 520 value 11256.310741
## iter 530 value 11185.161119
## iter 540 value 11128.291335
## iter 550 value 11087.630726
## iter 560 value 11040.855450
## iter 570 value 10981.339831
## iter 580 value 10905.364627
## iter 590 value 10843.312191
## iter 600 value 10763.458516
## iter 610 value 10716.244800
## iter 620 value 10703.245724
## iter 630 value 10692.104558
## iter 640 value 10681.660322
## iter 650 value 10673.065580
## iter 660 value 10660.410648
## iter 670 value 10648.182657
## iter 680 value 10627.779156
## iter 690 value 10610.290267
## iter 700 value 10588.471305
## iter 710 value 10558.218905
## iter 720 value 10496.592130
## iter 730 value 10398.919688
## iter 740 value 10317.892477
## iter 750 value 10239.155032
## iter 760 value 10154.073408
## iter 770 value 10063.675357
## iter 780 value 9995.743909
## iter 790 value 9915.867944
## iter 800 value 9836.298070
## iter 810 value 9730.237927
## iter 820 value 9647.088337
## iter 830 value 9600.203870
## iter 840 value 9560.202912
## iter 850 value 9535.839449
## iter 860 value 9511.972589
## iter 870 value 9490.689000
## iter 880 value 9461.729666
## iter 890 value 9420.505641
## iter 900 value 9385.414068
## iter 910 value 9371.477306
## iter 920 value 9361.549867
## iter 930 value 9349.360656
## iter 940 value 9337.695082
## iter 950 value 9326.537148
## iter 960 value 9316.836383
## iter 970 value 9303.506454
## iter 980 value 9293.580649
## iter 990 value 9280.705425
## iter1000 value 9270.868840
## final value 9270.868840
## stopped after 1000 iterations
Not converged. We could try more iterations, as in the sketch below.
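A sketch of that option, simply raising maxit (not run here):
## not run:
## nnfit11 = nnet(y~xs,ddf,size=100,decay=0,linout=T,maxit=5000)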
With L2 regularization.
set.seed(99)
nnfit12 = nnet(y~xs,ddf,size=100,decay=.5,linout=T,maxit=1000)
## # weights: 301
## initial value 284241.526620
## iter 10 value 14682.093555
## iter 20 value 14213.790691
## iter 30 value 14162.802738
## iter 40 value 14123.716615
## iter 50 value 14082.192672
## iter 60 value 14038.388089
## iter 70 value 13985.940486
## iter 80 value 13932.933815
## iter 90 value 13866.651102
## iter 100 value 13811.870787
## iter 110 value 13774.539591
## iter 120 value 13739.042924
## iter 130 value 13707.194842
## iter 140 value 13680.477965
## iter 150 value 13653.281745
## iter 160 value 13629.976556
## iter 170 value 13613.261024
## iter 180 value 13599.314320
## iter 190 value 13585.714543
## iter 200 value 13576.023330
## iter 210 value 13570.684218
## iter 220 value 13564.875711
## iter 230 value 13560.576113
## iter 240 value 13556.903618
## iter 250 value 13554.422378
## iter 260 value 13552.428051
## iter 270 value 13551.026799
## iter 280 value 13549.785831
## iter 290 value 13549.222294
## iter 300 value 13548.535472
## iter 310 value 13547.935992
## iter 320 value 13547.534305
## iter 330 value 13547.127067
## iter 340 value 13546.722205
## iter 350 value 13546.304004
## iter 360 value 13545.828396
## iter 370 value 13545.447129
## iter 380 value 13545.060781
## iter 390 value 13544.660355
## iter 400 value 13544.278799
## iter 410 value 13543.879969
## iter 420 value 13543.570838
## iter 430 value 13543.321293
## iter 440 value 13543.097847
## iter 450 value 13542.920727
## iter 460 value 13542.708184
## iter 470 value 13542.442361
## iter 480 value 13542.145593
## iter 490 value 13541.944566
## final value 13541.924838
## converged
yhat11 = predict(nnfit11,ddf)
yhat12 = predict(nnfit12,ddf)
plot(xs,y)
points(xs,yhat11,col='red',pch=16,cex=.5)
points(xs,yhat12,col='green',pch=16,cex=.5)
legend('topright',legend=c('100 units, no L2','100 units, L2'),col=c('red','green'),pch=c(16,16))
With 100 units we have a very flexible model, but the L2 regularization pulls the fit back toward something smoother and simpler.
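As a check on what decay is doing: the final value nnet reports should be the RSS plus decay times the sum of squared weights (the fitted weights are stored in the wts component of the fit):
rss12 = sum((y - yhat12)^2)   # residual sum of squares
pen12 = .5*sum(nnfit12$wts^2) # decay * sum of squared weights
c(rss=rss12, penalty=pen12, total=rss12+pen12, reported=nnfit12$value)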
Let’s try a binary classification problem.
library(ISLR2)
##
## Attaching package: 'ISLR2'
## The following object is masked _by_ '.GlobalEnv':
##
## Boston
## The following object is masked from 'package:MASS':
##
## Boston
data(Default)
y = Default$default
x = Default$balance
xs = scale(x)
ddf = data.frame(xs,y)
First let’s do the logit.
lgfit = glm(y~.,ddf,family=binomial)
phatlg = predict(lgfit,type='response')
Now let’s try two neural net fits where we vary the amount of regularization (decay).
set.seed(99)
nnbfit = nnet(y~.,ddf,size=50,decay=.1,maxit=1000)
## # weights: 151
## initial value 7439.984766
## iter 10 value 805.955919
## iter 20 value 802.530610
## iter 30 value 802.052846
## iter 40 value 801.663396
## iter 50 value 801.370022
## iter 60 value 801.258790
## iter 70 value 801.176774
## iter 80 value 801.137932
## iter 90 value 801.092142
## iter 100 value 800.976669
## iter 110 value 800.941842
## iter 120 value 800.925124
## iter 130 value 800.910568
## iter 140 value 800.900573
## iter 150 value 800.891577
## iter 160 value 800.885951
## iter 170 value 800.880966
## iter 180 value 800.871832
## iter 190 value 800.864372
## iter 200 value 800.858635
## iter 210 value 800.854236
## iter 220 value 800.851419
## iter 230 value 800.848112
## iter 240 value 800.847236
## iter 250 value 800.846657
## iter 260 value 800.845601
## iter 270 value 800.845172
## iter 280 value 800.844939
## iter 280 value 800.844932
## iter 280 value 800.844930
## final value 800.844930
## converged
set.seed(99)
nnbfit1 = nnet(y~.,ddf,size=50,decay=.001,maxit=1000)
## # weights: 151
## initial value 7437.618705
## iter 10 value 799.100959
## iter 20 value 798.171005
## iter 30 value 797.528647
## iter 40 value 796.717417
## iter 50 value 796.478999
## iter 60 value 796.373442
## iter 70 value 796.199470
## iter 80 value 795.937477
## iter 90 value 795.463423
## iter 100 value 794.640970
## iter 110 value 794.237046
## iter 120 value 794.062577
## iter 130 value 793.895390
## iter 140 value 793.723762
## iter 150 value 793.651329
## iter 160 value 793.545982
## iter 170 value 793.365401
## iter 180 value 793.145390
## iter 190 value 792.992474
## iter 200 value 792.854976
## iter 210 value 792.780028
## iter 220 value 792.704691
## iter 230 value 792.652022
## iter 240 value 792.627415
## iter 250 value 792.562894
## iter 260 value 792.322482
## iter 270 value 791.884316
## iter 280 value 791.398555
## iter 290 value 790.833481
## iter 300 value 790.448905
## iter 310 value 790.232298
## iter 320 value 790.035418
## iter 330 value 789.830977
## iter 340 value 789.613534
## iter 350 value 789.517344
## iter 360 value 789.448542
## iter 370 value 789.402009
## iter 380 value 789.386229
## iter 390 value 789.377381
## iter 400 value 789.371269
## iter 410 value 789.365039
## iter 420 value 789.358665
## iter 430 value 789.354676
## iter 440 value 789.351823
## iter 450 value 789.349175
## iter 460 value 789.347738
## iter 470 value 789.345781
## iter 480 value 789.344183
## iter 490 value 789.342351
## iter 500 value 789.340689
## iter 510 value 789.338649
## iter 520 value 789.337286
## iter 530 value 789.336426
## iter 540 value 789.335633
## iter 550 value 789.334921
## iter 560 value 789.334345
## iter 570 value 789.334028
## iter 580 value 789.333323
## iter 590 value 789.333016
## iter 600 value 789.332655
## final value 789.332451
## converged
Now let’s plot the in-sample fitted phats from the two neural net fits and the logit.
phat = predict(nnbfit,ddf)[,1]   # decay=.1, more regularization
phat1 = predict(nnbfit1,ddf)[,1] # decay=.001, very little regularization
plot(xs,phat,ylim=range(c(phat,phat1,phatlg)))
points(xs,phatlg,col='red') #logit
points(xs,phat1,col='green')
legend('topleft',legend=c('nn, decay=.1','nn, decay=.001','logit'),col=c('black','green','red'),pch=c(1,1,1),bty='n')
All three fits look quite similar, so maybe the simple logit is ok here.
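To put a rough number on that, we can compare the average log-loss of the three sets of fitted probabilities; a small sketch (assuming none of the phats are exactly 0 or 1):
ybin = as.numeric(ddf$y == 'Yes') # 0/1 version of default
logloss = function(p) -mean(ybin*log(p) + (1-ybin)*log(1-p))
c(logit=logloss(phatlg), nn_decay.1=logloss(phat), nn_decay.001=logloss(phat1))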