This homework focuses on using R to make KNN predictions and to approximately reproduce the plots I used in lecture 7.

# 1 Setup

For the lecture I used the following code to generate data:

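``````
n = 50
set.seed(101)
x1 = runif(n)
x2 = runif(n)
# P(class 2) is 0.8 inside the box 0.25 < x1 < 0.5, 0.25 < x2 < 0.75, and 0.3 elsewhere
prob = ifelse(x1 < 0.5 & x1 > 0.25 & x2 > 0.25 & x2 < 0.75, 0.8, 0.3)
y = as.factor(rbinom(n, 1, prob))
levels(y) = c("1","2")  # relabel the 0/1 draws as classes "1" and "2"
df = data.frame(y=y, x1=x1, x2=x2)
``````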

And then I used the following function to make KNN predictions:

``````
knn_pred = function(point,x1,x2,y,k=5) {
  # point is a length-2 vector: point[1] is the x1 coordinate, point[2] the x2 coordinate
  dists = sqrt((x1-point[1])^2+(x2-point[2])^2) #Find all distances to current obs
  bound = sort(dists)[k]                #Find kth smallest distance
  indices = which(dists <= bound)       #Find which obs have dists 1:k
  outcomes = as.integer(y[indices])     #Find corresponding outcomes y
  round(mean(outcomes)) #Taking advantage of 2 outcomes. If more 2s, this gives 2, if more 1s this gives 1.
}
``````
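To sanity-check the function on a case where the answer is obvious, the sketch below runs it on a hypothetical six-point toy dataset. The `toy_*` names and values are made up for illustration and are not from the lecture. The function is restated as `knn_pred_toy`, assuming the query point arrives as a length-2 vector `c(x1, x2)`, so the block runs on its own.

```r
# Restated copy of the KNN function; `point` is assumed to be c(x1, x2)
knn_pred_toy = function(point, x1, x2, y, k = 5) {
  dists = sqrt((x1 - point[1])^2 + (x2 - point[2])^2)  # distances to the query point
  bound = sort(dists)[k]                               # kth smallest distance
  indices = which(dists <= bound)                      # the k nearest obs (more if tied)
  round(mean(as.integer(y[indices])))                  # majority vote between classes 1 and 2
}

# Hypothetical toy data: three class-1 points near (0,0), three class-2 points near (1,1)
toy_x1 = c(0.1, 0.2, 0.1, 0.9, 0.8, 0.9)
toy_x2 = c(0.1, 0.1, 0.2, 0.9, 0.9, 0.8)
toy_y  = factor(c("1", "1", "1", "2", "2", "2"))

pred_low  = knn_pred_toy(c(0, 0), toy_x1, toy_x2, toy_y, k = 3)  # neighbours are all class 1
pred_high = knn_pred_toy(c(1, 1), toy_x1, toy_x2, toy_y, k = 3)  # neighbours are all class 2
print(c(pred_low, pred_high))  # 1 2
```

One design consequence worth noting: with an even `k` the vote can tie at a mean of 1.5, and since R's `round` sends 1.5 to 2, such ties go to class 2. An odd `k` avoids this whenever no distances are tied.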

This code builds a grid of points, and then makes predictions for each of those points.

``````
grid.fineness = 101
sequence = seq(0,1,length.out=grid.fineness)
grid = expand.grid(sequence,sequence)
colnames(grid) = c("x1","x2")
yhat = apply(grid,1,knn_pred,x1=x1,x2=x2,y=y,k=5)
yhat = as.factor(yhat)
``````
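To see how `apply` hands each grid point to the prediction function, here is a small sketch on a hypothetical 3-by-3 grid (the `small_*` and `coord_sums` names are illustrative only). `expand.grid` produces one row per combination of coordinates, and `apply(..., 1, f)` calls `f` once per row, passing that row as a numeric vector.

```r
small_seq  = seq(0, 1, length.out = 3)                   # 0, 0.5, 1
small_grid = expand.grid(x1 = small_seq, x2 = small_seq)
print(nrow(small_grid))                                  # 3 * 3 = 9 rows

# Each row arrives as a length-2 vector: point[1] is x1, point[2] is x2
coord_sums = apply(small_grid, 1, function(point) point[1] + point[2])
print(unname(coord_sums))
```

The full grid above has 101 * 101 = 10201 rows, so `apply` calls `knn_pred` once for each of those 10201 points.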

With those predictions, we can build a data frame and plot it.

``````
library(ggplot2)  # provides ggplot(), aes(), geom_point()
grid.df = as.data.frame(grid)
grid.df$y = yhat
ggplot(grid.df,aes(x=x1,y=x2,col=y))+geom_point()
``````
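One possible refinement, sketched here as an assumption rather than as the plot from lecture, is to draw the training points on top of the prediction grid so the two can be compared. The `toy_grid` and `toy_train` data frames below are hypothetical stand-ins with placeholder labels, kept small so the block runs on its own. In the homework they would correspond to `grid.df` and `df`.

```r
library(ggplot2)

# Hypothetical stand-in for grid.df: an 11-by-11 grid with placeholder labels (not KNN output)
toy_grid = expand.grid(x1 = seq(0, 1, length.out = 11),
                       x2 = seq(0, 1, length.out = 11))
toy_grid$y = as.factor(ifelse(toy_grid$x1 < 0.5, 1, 2))

# Hypothetical stand-in for df: four labelled training points
toy_train = data.frame(x1 = c(0.2, 0.3, 0.7, 0.8),
                       x2 = c(0.4, 0.6, 0.3, 0.7),
                       y  = as.factor(c(1, 1, 2, 2)))

p = ggplot(toy_grid, aes(x = x1, y = x2, col = y)) +
  geom_point(alpha = 0.3) +                # faint grid of predicted classes
  geom_point(data = toy_train, size = 3)   # training points drawn on top, same aesthetics
print(p)
```

The second `geom_point` layer reuses the `x1`, `x2`, and `y` mappings from the `ggplot` call, which works because both data frames share those column names.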