We have data on the sales price of 500 houses.
There are three columns for the variables y, x, and d.
y is the sales price in thousands of dollars, x is the size of the house
in thousands of square feet and d is a dummy variable which is 1 if the
house has a view of the nearby mountains and 0 otherwise.
ddf = read.csv('shousp.csv')
print(dim(ddf))
## [1] 500 3
head(ddf)
##          x d        y
## 1 2.254136 0 326.8835
## 2 0.841345 0 309.7134
## 3 2.552794 0 419.8512
## 4 3.477526 0 387.6628
## 5 2.104981 0 183.0673
## 6 3.399842 0 293.6572
Here is a plot of y=price against x=size. Houses with a view are plotted as red triangles and houses without a view as black circles.
plot(ddf$x,ddf$y,col=ddf$d+1,pch=ddf$d+1,cex=.8,xlab='x=size',ylab='y=price')
legend('topleft',legend=c('view','no view'),col=c(2,1),pch=c(2,1))
Here is the output for the regression of y on x and d.
lmf = lm(y~.,ddf)
summary(lmf)
##
## Call:
## lm(formula = y ~ ., data = ddf)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -183.987  -39.163   -3.546   45.054  216.835
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  197.007      6.969   28.27   <2e-16 ***
## x             51.814      3.146   16.47   <2e-16 ***
## d             96.200      8.797   10.94   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 61.53 on 497 degrees of freedom
## Multiple R-squared: 0.4346, Adjusted R-squared: 0.4323
## F-statistic: 191 on 2 and 497 DF, p-value: < 2.2e-16
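Reading the coefficients off the output above, the fitted relationship is price = 197.007 + 51.814 x + 96.200 d, so houses with and without a view follow parallel lines that are 96.2 (thousand dollars) apart. Here is a minimal sketch of how these estimates could be used. The coefficient and standard error values are copied by hand from the summary, and the house of size 2 (thousand square feet) is a hypothetical example, not a row of the data.

```r
# Estimates copied by hand from the summary(lmf) output above
b    <- c(intercept = 197.007, x = 51.814, d = 96.200)
se_d <- 8.797    # standard error of the coefficient on d
df   <- 497      # residual degrees of freedom

# Hypothetical house size (thousands of square feet); an assumption for illustration
size <- 2

# Predicted price (thousands of dollars) without and with a view
pred_no_view <- unname(b["intercept"] + b["x"] * size)
pred_view    <- unname(pred_no_view + b["d"])
print(c(no_view = pred_no_view, view = pred_view))

# Approximate 95% confidence interval for the view effect: estimate +/- t * se
ci_d <- unname(b["d"]) + c(-1, 1) * qt(0.975, df = df) * se_d
print(ci_d)
```

For the hypothetical size of 2, the predictions are 300.635 without a view and 396.835 with one. If the fitted model object is available, predict(lmf, newdata) and confint(lmf) would give the same kind of quantities directly.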
Here is a plot of the residuals vs. the fitted values.
plot(lmf$fitted.values,lmf$residuals,xlab='fitted values',ylab='residuals')
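A residual plot is often easier to read with a zero reference line and with the points coded by group. Here is a rough sketch of that idea. Because shousp.csv is not reproduced here, it uses simulated stand-in data: the names sim and fit and every parameter value in the simulation are assumptions made up for illustration, not the houses data.

```r
# Simulated stand-in data; NOT the shousp.csv houses. All parameter values are made up.
set.seed(1)
n   <- 500
sim <- data.frame(x = runif(n, 0.5, 4), d = rbinom(n, 1, 0.2))
sim$y <- 200 + 50 * sim$x + 100 * sim$d + rnorm(n, sd = 60)

# Same form of regression as above: y on x and d
fit <- lm(y ~ x + d, data = sim)

# Residuals vs. fitted values, with the same color/symbol coding as the scatter plot
plot(fitted(fit), resid(fit), col = sim$d + 1, pch = sim$d + 1, cex = .8,
     xlab = 'fitted values', ylab = 'residuals')
abline(h = 0, lty = 2)  # dashed reference line at zero
legend('topleft', legend = c('view', 'no view'), col = c(2, 1), pch = c(2, 1))
```

With the real data, the same plot and abline calls could be applied to lmf and ddf in place of fit and sim.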