House price data

We have data on the sales price of 500 houses.

There are three columns for the variables y, x, and d.
y is the sales price in thousands of dollars, x is the size of the house in thousands of square feet, and d is a dummy variable that is 1 if the house has a view of the nearby mountains and 0 otherwise.

ddf = read.csv('shousp.csv')
print(dim(ddf))
## [1] 500   3
head(ddf)
##          x d        y
## 1 2.254136 0 326.8835
## 2 0.841345 0 309.7134
## 3 2.552794 0 419.8512
## 4 3.477526 0 387.6628
## 5 2.104981 0 183.0673
## 6 3.399842 0 293.6572

Here is a plot of x=size vs. y=price. Houses with a view are plotted as red triangles and houses without a view as black circles.

plot(ddf$x,ddf$y,col=ddf$d+1,pch=ddf$d+1,cex=.8,xlab='x=size',ylab='y=price')
legend('topleft',legend=c('view','no view'),col=c(2,1),pch=c(2,1))

Here is the output for the regression of y on x and d. 

lmf = lm(y~.,ddf)
summary(lmf)
## 
## Call:
## lm(formula = y ~ ., data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -183.987  -39.163   -3.546   45.054  216.835 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  197.007      6.969   28.27   <2e-16 ***
## x             51.814      3.146   16.47   <2e-16 ***
## d             96.200      8.797   10.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61.53 on 497 degrees of freedom
## Multiple R-squared:  0.4346, Adjusted R-squared:  0.4323 
## F-statistic:   191 on 2 and 497 DF,  p-value: < 2.2e-16
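
As a rough illustration of how to read these coefficients, the sketch below plugs the rounded estimates from the output above into the fitted equation price = intercept + slope * x + shift * d. The helper name pred_price and the example size x = 2 (a 2,000 square foot house) are assumptions made for illustration; they are not part of the original analysis.

```r
# Rounded coefficient estimates copied from the summary output above
b0 <- 197.007  # intercept
bx <- 51.814   # slope on x (size in thousands of square feet)
bd <- 96.200   # shift when d = 1 (house has a view)

# Hypothetical helper: fitted price in thousands of dollars
pred_price <- function(x, d) b0 + bx * x + bd * d

pred_price(2, 0)  # 2,000 sq ft house without a view
## [1] 300.635
pred_price(2, 1)  # 2,000 sq ft house with a view
## [1] 396.835
```

The two fitted values differ by 96.2, the coefficient on d. In this model a view shifts the fitted price by the same amount (about $96,200) at every size, so the fitted lines for the two groups are parallel.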

Here is a plot of the residuals vs. the fitted values.

plot(lmf$fitted.values,lmf$residuals,xlab='fitted values',ylab='residuals')
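
A possible variation on the residual plot is sketched below: it reuses the color and symbol coding from the scatterplot and adds a reference line at zero, which can make it easier to see whether the two groups of houses behave differently. So that the sketch runs on its own, it uses simulated stand-in data with the same three columns. The data frame sim, the object fit, and every number used to generate the data are assumptions, not the shousp.csv data or results.

```r
# Simulated stand-in data (NOT the shousp.csv data), same columns x, d, y
set.seed(1)
n <- 200
sim <- data.frame(x = runif(n, 0.5, 4), d = rbinom(n, 1, 0.3))
sim$y <- 200 + 50 * sim$x + 100 * sim$d + rnorm(n, sd = 60)

# Same additive regression as above, fit to the simulated data
fit <- lm(y ~ ., sim)

# Residuals vs. fitted values: red triangles = view, black circles = no view
plot(fitted(fit), resid(fit), col = sim$d + 1, pch = sim$d + 1, cex = .8,
     xlab = 'fitted values', ylab = 'residuals')
abline(h = 0, lty = 2)  # dashed reference line at zero
```

With the real data, replacing sim by ddf and fit by lmf should give the corresponding plot.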