In Python, the simple code we looked at that uses sklearn to do cross-validation is at:
In R, you can follow the code on slides 81-84 of the kNN notes if you wish.
Note that you need the code in
We are going to use the used cars data again.
Previously, we used the “eye-ball” method to choose k for a kNN fit for mileage predicting price.
Use 5-fold cross-validation to choose k. How does your fit compare with the eyeball method?
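Here is a minimal Python sketch of the 5-fold cross-validation step, assuming sklearn is installed. The used cars data isn't reproduced here, so the code generates a synthetic placeholder. The `mileage` and `price` arrays are hypothetical stand-ins, so load the real file in their place. For each k from 1 to 100 it computes the cross-validated RMSE and then keeps the k with the smallest value.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic placeholder data (NOT the used cars data): price decays with mileage.
rng = np.random.default_rng(0)
mileage = rng.uniform(0, 200_000, size=500)
price = 50_000 * np.exp(-mileage / 80_000) + rng.normal(0, 2_000, size=500)
X = mileage.reshape(-1, 1)  # sklearn expects a 2-D feature array
y = price

kf = KFold(n_splits=5, shuffle=True, random_state=42)
k_values = range(1, 101)
cv_rmse = []
for k in k_values:
    # sklearn maximizes scores, so the MSE comes back negated
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                             cv=kf, scoring="neg_mean_squared_error")
    cv_rmse.append(np.sqrt(-scores.mean()))

best_k = list(k_values)[int(np.argmin(cv_rmse))]
print("k chosen by 5-fold CV:", best_k, " CV RMSE:", round(min(cv_rmse), 1))
```

Shuffling the folds with a fixed `random_state` keeps the split reproducible. A different seed can move the chosen k somewhat, which is normal because the CV curve is usually flat near its minimum.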
Plot the data, then add the fit using the k you chose by cross-validation and the fit using the k you chose by eyeball.
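One way to draw this comparison is with matplotlib. This sketch again runs on synthetic placeholder data, and `k_cv` and `k_eye` are hypothetical values that you should replace with your own choices. Each fit is evaluated on a fine grid of mileages so that the kNN step function shows up clearly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic placeholder data (NOT the used cars data).
rng = np.random.default_rng(0)
mileage = rng.uniform(0, 200_000, size=500)
price = 50_000 * np.exp(-mileage / 80_000) + rng.normal(0, 2_000, size=500)
X = mileage.reshape(-1, 1)

k_cv = 40   # hypothetical: substitute the k from your cross-validation
k_eye = 10  # hypothetical: substitute your eyeball k

grid = np.linspace(mileage.min(), mileage.max(), 300).reshape(-1, 1)
fig, ax = plt.subplots()
ax.scatter(mileage, price, s=5, alpha=0.4, label="data")
fits = {}
for k, style, name in [(k_cv, "-", "CV"), (k_eye, "--", "eyeball")]:
    fits[k] = KNeighborsRegressor(n_neighbors=k).fit(X, price).predict(grid)
    ax.plot(grid.ravel(), fits[k], style, label=f"kNN, {name} k={k}")
ax.set_xlabel("mileage")
ax.set_ylabel("price")
ax.legend()
fig.savefig("knn_fits.png")  # or plt.show() interactively
```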
Use kNN with the k you chose by cross-validation to predict the price of a used car with 100,000 miles on it. Use all the observations as training data to get your prediction (given your choice of k).
Use kNN with both mileage and year as predictors to get a prediction for a 2008 car with 75,000 miles on it!
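A sketch of the single prediction, assuming sklearn and synthetic placeholder data. `k_cv = 40` is a hypothetical value. The model is fit on every observation and then queried at 100,000 miles. The query must be 2-D (one row, one column), which is why it is written `[[100_000]]`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic placeholder data (NOT the used cars data).
rng = np.random.default_rng(0)
mileage = rng.uniform(0, 200_000, size=500)
price = 50_000 * np.exp(-mileage / 80_000) + rng.normal(0, 2_000, size=500)
X = mileage.reshape(-1, 1)

k_cv = 40  # hypothetical: use the k you chose by cross-validation
model = KNeighborsRegressor(n_neighbors=k_cv).fit(X, price)  # all observations as training data
pred = model.predict(np.array([[100_000]]))[0]
print(f"Predicted price at 100,000 miles: {pred:,.0f}")
```

With uniform weights, the prediction is simply the average price of the `k_cv` cars whose mileage is closest to 100,000.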
Is your predictive accuracy better using (mileage,year) than it was with just mileage?
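The following sketch covers both two-predictor questions, again using synthetic placeholder data in place of the used cars file. With two features, the kNN distance depends on their units: raw miles run into the hundreds of thousands while years span only a couple of decades. The sketch therefore standardizes each feature first. That scaling choice is an assumption here and is not stated above. The code compares the best 5-fold CV RMSE for mileage alone against (mileage, year), refits on all observations using the CV-chosen k, and predicts for the 2008 car with 75,000 miles.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (NOT the used cars data).
rng = np.random.default_rng(0)
n = 500
year = rng.integers(1995, 2014, size=n)
mileage = rng.uniform(0, 200_000, size=n)
price = (30_000 * np.exp(-mileage / 100_000) + 1_500 * (year - 1995)
         + rng.normal(0, 2_000, size=n))

kf = KFold(n_splits=5, shuffle=True, random_state=42)

def knn(k):
    # Standardize each feature so miles don't swamp years in the distance.
    return make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=k))

def cv_rmse(X, k):
    scores = cross_val_score(knn(k), X, price, cv=kf, scoring="neg_mean_squared_error")
    return np.sqrt(-scores.mean())

X1 = mileage.reshape(-1, 1)            # mileage only
X2 = np.column_stack([mileage, year])  # (mileage, year)
ks = range(1, 51)
rmse1 = [cv_rmse(X1, k) for k in ks]
rmse2 = [cv_rmse(X2, k) for k in ks]
k2 = ks[int(np.argmin(rmse2))]
print(f"best CV RMSE  mileage only: {min(rmse1):,.0f}  (mileage, year): {min(rmse2):,.0f}")

# Refit on all observations with the CV-chosen k, then predict the new car.
model = knn(k2).fit(X2, price)
pred = model.predict(np.array([[75_000, 2008]]))[0]
print(f"2008 car with 75,000 miles (k={k2}): predicted price {pred:,.0f}")
```

Because the scaler sits inside the pipeline, it is refit on each training fold and then applied to the query automatically. This keeps the CV estimate honest.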
In our notes examples we used kernel=“rectangular” when calling the R function kknn.
In R, have a look at the help for kknn (?kknn).
In Python, the help for KNeighborsRegressor in sklearn has:
n_neighbors : int, optional (default = 5)
    Number of neighbors to use by default for :meth:`kneighbors` queries.
weights : str or callable
    weight function used in prediction. Possible values:
    - 'uniform' : uniform weights. All points in each neighborhood
      are weighted equally.
    - 'distance' : weight points by the inverse of their distance.
      in this case, closer neighbors of a query point will have a
      greater influence than neighbors which are further away.
    - [callable] : a user-defined function which accepts an
      array of distances, and returns an array of the same shape
      containing the weights.
    Uniform weights are used by default.
So you can weight the y values at the neighbors equally, or weight the closer ones more heavily. In sklearn the default is equal weights ('uniform').
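A tiny worked example makes the weighting options concrete. It uses three training points and k = 2. The query at 0.25 has neighbors x = 0 (distance 0.25, y = 0) and x = 1 (distance 0.75, y = 10). Uniform weights average them to 5. With inverse-distance weights of 4 and 4/3, the prediction becomes (4·0 + (4/3)·10)/(4 + 4/3) = 2.5. The Gaussian function below is a hypothetical example of the callable option and is not something from the notes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [3.0]])
y = np.array([0.0, 10.0, 30.0])
query = np.array([[0.25]])  # two nearest: x=0 (d=0.25) and x=1 (d=0.75)

uniform = KNeighborsRegressor(n_neighbors=2, weights="uniform").fit(X, y)
distance = KNeighborsRegressor(n_neighbors=2, weights="distance").fit(X, y)

def gaussian(d):
    # hypothetical user-defined weights: same-shape array of weights from distances
    return np.exp(-d ** 2)

custom = KNeighborsRegressor(n_neighbors=2, weights=gaussian).fit(X, y)

print(uniform.predict(query)[0])   # → 5.0
print(distance.predict(query)[0])  # ≈ 2.5
print(custom.predict(query)[0])    # between the two: closer point weighted more
```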
Using the used cars data and predictors (features!!) (mileage, year), try a weighting option other than uniform.
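This sketch searches over k and the weighting option together with 5-fold CV. It assumes sklearn's `GridSearchCV`, synthetic placeholder data in place of the used cars file, and the same feature-standardization assumption as before. It reports the best CV RMSE for uniform weights and for inverse-distance weights.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (NOT the used cars data).
rng = np.random.default_rng(0)
n = 500
year = rng.integers(1995, 2014, size=n)
mileage = rng.uniform(0, 200_000, size=n)
price = (30_000 * np.exp(-mileage / 100_000) + 1_500 * (year - 1995)
         + rng.normal(0, 2_000, size=n))
X = np.column_stack([mileage, year])

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsRegressor())])
param_grid = {"knn__n_neighbors": list(range(1, 51)),
              "knn__weights": ["uniform", "distance"]}
search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error",
                      cv=KFold(n_splits=5, shuffle=True, random_state=42))
search.fit(X, price)

res = search.cv_results_
best_rmse = {}
for w in ["uniform", "distance"]:
    # pick out the grid points that used this weighting option
    mask = np.array([p["knn__weights"] == w for p in res["params"]])
    best_rmse[w] = np.sqrt(-res["mean_test_score"][mask].max())
    print(f"{w:8s} best CV RMSE: {best_rmse[w]:,.0f}")
print("overall best:", search.best_params_)
```

Distance weighting often favors a somewhat larger k than uniform weighting, because far-away neighbors contribute less to the prediction. Whether it actually lowers the error on the real data is something to check from the printed RMSEs.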