This homework focuses on using R to make KNN predictions and to roughly reproduce the plots I used in Lecture 7.

1 Setup

For the lecture I used the following code to generate data:

n = 50
set.seed(101)
x1 = runif(n)
x2 = runif(n)
prob  = ifelse(x1 < 0.5 & x1 > 0.25 & x2 > 0.25 & x2<0.75,0.8,0.3)
y  = as.factor(rbinom(n,1,prob))
levels(y) = c("1","2")
df = data.frame(y=y,x1=x1,x2=x2)
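As a quick sanity check on the simulated data, the sketch below tabulates the outcomes inside and outside the high-probability rectangle. It regenerates the data so it runs on its own; in_box and counts are names I made up for illustration. Since levels(y) is relabeled, a draw of 1 from rbinom becomes class "2", so the share of 2s should be somewhere near 0.8 inside the rectangle and near 0.3 outside, though with only 50 points it can be noticeably off.

```r
# Regenerate the data exactly as above so this snippet runs on its own
n = 50
set.seed(101)
x1 = runif(n)
x2 = runif(n)
prob = ifelse(x1 < 0.5 & x1 > 0.25 & x2 > 0.25 & x2 < 0.75, 0.8, 0.3)
y = as.factor(rbinom(n, 1, prob))
levels(y) = c("1", "2")   # a draw of 0 becomes "1", a draw of 1 becomes "2"

# in_box (hypothetical name): TRUE for points inside the high-probability rectangle
in_box = x1 < 0.5 & x1 > 0.25 & x2 > 0.25 & x2 < 0.75

# Counts of each class inside and outside the rectangle
counts = table(in_box, y)
print(counts)

# Row proportions: observed share of each class within each region
print(prop.table(counts, margin = 1))
```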

And then I used the following function to make KNN predictions:

knn_pred = function(point,x1,x2,y,k=5) {
  dists = sqrt((x1-point[1])^2+(x2-point[2])^2) #Euclidean distance from point to every training obs
  bound = sort(dists)[k]                #Find kth smallest distance
  indices = which(dists <= bound)       #Find the obs with the k smallest distances (more than k if distances tie)
  outcomes = as.integer(y[indices])     #Find corresponding outcomes y, coded 1 or 2
  round(mean(outcomes)) #Taking advantage of 2 outcomes: if more 2s, this gives 2; if more 1s, this gives 1; an even split (mean 1.5) rounds to 2.
}
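To see the function in action, here is a small sketch on a toy training set I made up for illustration (toy_x1, toy_x2 and toy_y are not part of the lecture data). The function is copied in so the snippet runs on its own. The last call shows what happens with an even k: when the neighbours split evenly, mean(outcomes) is 1.5 and round() sends it to 2.

```r
# knn_pred copied from above so this snippet runs on its own
knn_pred = function(point,x1,x2,y,k=5) {
  dists = sqrt((x1-point[1])^2+(x2-point[2])^2)
  bound = sort(dists)[k]
  indices = which(dists <= bound)
  outcomes = as.integer(y[indices])
  round(mean(outcomes))
}

# Toy training set (made up): three 1s near (0,0), three 2s near (1,1)
toy_x1 = c(0.0, 0.1, 0.2, 0.8, 0.9, 1.0)
toy_x2 = c(0.0, 0.2, 0.1, 0.9, 0.8, 1.0)
toy_y  = factor(c("1", "1", "1", "2", "2", "2"))

near_origin = knn_pred(c(0.1, 0.1), toy_x1, toy_x2, toy_y, k = 3)  # 3 nearest are all 1s
near_corner = knn_pred(c(0.9, 0.9), toy_x1, toy_x2, toy_y, k = 3)  # 3 nearest are all 2s

# Even k can tie: here the 4 nearest are two 1s and two 2s, so mean(outcomes) is 1.5
tie_case = knn_pred(c(0.5, 0.45), toy_x1, toy_x2, toy_y, k = 4)

print(c(near_origin, near_corner, tie_case))  # 1 2 2
```

With an odd k and two classes, the vote can only tie if several observations share the kth smallest distance, which is one reason to stick with odd values such as the default k=5.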

The following code builds a grid of points and then makes a prediction at each one.

grid.fineness = 201
sequence = seq(0,1,length.out=grid.fineness)
grid = expand.grid(sequence,sequence)
colnames(grid) = c("x1","x2")
yhat = apply(grid,1,knn_pred,x1=x1,x2=x2,y=y,k=5)
yhat = as.factor(yhat)
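If the expand.grid and apply steps are unclear, the sketch below runs the same pattern on a 3-point sequence, with a simple row sum standing in for knn_pred. The names small_seq, small_grid and row_sums are made up for illustration. It shows that expand.grid returns every combination of the two sequences (with the first column varying fastest) and that apply with margin 1 passes each row to the function as a numeric vector, which is why knn_pred can use point[1] and point[2].

```r
small_seq = seq(0, 1, length.out = 3)           # 0, 0.5, 1
small_grid = expand.grid(small_seq, small_seq)  # all 3 x 3 combinations
colnames(small_grid) = c("x1", "x2")
print(small_grid)
print(nrow(small_grid))                         # 9

# apply() with margin 1 hands each row to the function as a numeric vector,
# so point[1] is that row's x1 and point[2] is its x2
row_sums = apply(small_grid, 1, function(point) point[1] + point[2])
print(as.numeric(row_sums))
```

By the same logic the full grid has 201 x 201 = 40401 points, and knn_pred computes 50 distances for each of them, so the apply call may take a few seconds.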

With those predictions, we can build a data frame and plot it. Note that this overwrites the df of training data created earlier.

df = as.data.frame(grid)
df$y = yhat
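library(ggplot2) #Needed for ggplot(); run install.packages("ggplot2") first if it is not installed
ggplot(df,aes(x=x1,y=x2,col=y))+geom_point(size=0.08)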