EXERCISE 1:
Let \(Z=e^{\beta_0+\beta_1X}\), so that Equation (4.2) becomes:
Step 1: \(p(X) = \frac{Z}{1+Z}\)
Step 2: \(\frac{1}{p(X)} = \frac{1+Z}{Z} = 1+\frac{1}{Z}\)
Step 3: \(Z = \frac{1}{\frac{1}{p(X)}-1} = \frac{1}{\frac{1-p(X)}{p(X)}} = \frac{p(X)}{1-p(X)}\), so \(\frac{p(X)}{1-p(X)} = e^{\beta_0+\beta_1X}\), which is Equation (4.3).
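As a quick numerical sanity check (a sketch with arbitrarily chosen values of \(\beta_0\), \(\beta_1\), and \(X\), not part of the exercise), the odds computed from \(p(X)\) should match \(e^{\beta_0+\beta_1X}\) directly:
b0 <- -2; b1 <- 0.5; x <- 3
p <- exp(b0 + b1*x) / (1 + exp(b0 + b1*x))  # Equation (4.2)
all.equal(p/(1 - p), exp(b0 + b1*x))        # TRUE: Equation (4.3)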
EXERCISE 2:
Equation (4.12): \(p_k(x) = \frac {\pi_k \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) } {\sum_{l=1}^K \pi_l \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) }\)
Define \(C = \frac { \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} x^2) } {\sum_{l=1}^K \pi_l \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) }\), which does not vary across \(k\); after expanding \((x - \mu_k)^2 = x^2 - 2x\mu_k + \mu_k^2\), the factor involving \(x^2\) can be pulled out of the numerator.
Step 1: Equation becomes \(p_k(x) = C \pi_k \exp(- \frac {1} {2 \sigma^2} (\mu_k^2 - 2x \mu_k))\)
Step 2: Take log of both sides \(log(p_k(x)) = log(C) + log(\pi_k) + (- \frac {1} {2 \sigma^2} (\mu_k^2 - 2x \mu_k))\)
Step 3: Simplify and rearrange: \(log(p_k(x)) = \frac {x \mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2} + log(\pi_k) + log(C)\). Since the log is increasing and \(log(C)\) is the same for every class, maximizing \(p_k(x)\) over \(k\) is equivalent to maximizing \(\delta_k(x) = \frac {x \mu_k} {\sigma^2} - \frac {\mu_k^2} {2 \sigma^2} + log(\pi_k)\), which is Equation (4.13).
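A small numerical check of this equivalence (a sketch with made-up class means, priors, and common variance; not from the text): the class that maximizes the posterior \(p_k(x)\) is also the class that maximizes the linear discriminant \(\delta_k(x)\).
mu <- c(-1, 0, 2); prior <- c(0.3, 0.5, 0.2); sigma <- 1.5; x <- 0.8
posterior <- prior * dnorm(x, mean=mu, sd=sigma)       # numerator of Equation (4.12)
delta <- x*mu/sigma^2 - mu^2/(2*sigma^2) + log(prior)  # Equation (4.13)
which.max(posterior) == which.max(delta)               # TRUE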
EXERCISE 3:
If \(\sigma\) varies by class, so that class \(k\) has its own variance \(\sigma_k^2\), then Equation (4.12) becomes: \(p_k(x) = \frac {\pi_k \frac {1} {\sqrt{2 \pi} \sigma_k} \exp(- \frac {1} {2 \sigma_k^2} (x - \mu_k)^2) } {\sum_{l=1}^K \pi_l \frac {1} {\sqrt{2 \pi} \sigma_l} \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) }\)
The factor that does not vary by \(k\) becomes \(C' = \frac { \frac {1} {\sqrt{2 \pi}}} {\sum_{l=1}^K \pi_l \frac {1} {\sqrt{2 \pi} \sigma_l} \exp(- \frac {1} {2 \sigma_l^2} (x - \mu_l)^2) }\)
Step 1: Equation becomes \(p_k(x) = C' \frac{\pi_k}{\sigma_k} \exp(- \frac {1} {2 \sigma_k^2} (x - \mu_k)^2)\)
Step 2: Take log of both sides \(log(p_k(x)) = log(C') + log(\pi_k) - log(\sigma_k) + (- \frac {1} {2 \sigma_k^2} (x - \mu_k)^2)\)
Step 3: Simplify and rearrange \(log(p_k(x)) = (- \frac {1} {2 \sigma_k^2} (x^2 + \mu_k^2 - 2x\mu_k)) + log(\pi_k) - log(\sigma_k) + log(C')\)
The \(x^2\) term no longer cancels, so the discriminant is a quadratic function of \(x\): when the variances differ by class, the Bayes classifier is not linear.
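The same style of check works here (again with made-up parameters): with class-specific variances, the class maximizing the posterior matches the class maximizing the quadratic discriminant above, \(x^2\) term included.
mu <- c(-1, 0, 2); sigma <- c(0.5, 1, 2); prior <- c(0.3, 0.5, 0.2); x <- 0.8
posterior <- prior * dnorm(x, mean=mu, sd=sigma)            # class-specific sd
delta <- -(x - mu)^2/(2*sigma^2) + log(prior) - log(sigma)  # quadratic in x
which.max(posterior) == which.max(delta)                    # TRUE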
EXERCISE 4:
Part a)
If \(X\) is uniformly distributed on \([0, 1]\), then (ignoring edge effects for test points within 0.05 of the boundary) the fraction of observations used is \((0.65-0.55)/(1-0) = 10\%\).
Part b)
For two features, \(10\% \times 10\% = 1\%\)
Part c)
For 100 features, \(0.1^{100} = 10^{-100}\), an essentially negligible fraction of the observations.
Part d)
When the number of dimensions is large, the fraction of training observations lying "near" any given test point shrinks exponentially toward zero. For a fixed sample size, more features therefore means fewer and fewer nearby neighbors available for the KNN prediction.
Part e)
For the hypercube to contain, on average, 10% of the training observations, each side must have length \(0.1^{1/p}\). For \(p=1\) this is 0.10, for \(p=2\) it is about 0.32, and for \(p=100\) it is about 0.98. In other words, when the number of features is high (e.g. p=100), using 10% of the training observations requires a "neighborhood" that spans nearly the entire range of each individual feature, so it is no longer local at all.
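A one-line check of the side lengths used above:
p <- c(1, 2, 100)
0.1^(1/p)  # ~0.10, ~0.32, ~0.98: for p=100 the cube spans almost the full range of each feature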
EXERCISE 5:
Part a)
If the true decision boundary is linear, we would expect LDA to perform better on the test set. On the training set, QDA may perform better because its extra flexibility allows it to fit the training data more closely (i.e. to overfit).
Part b)
If the true decision boundary is non-linear, QDA would likely perform better on both the training set and the test set, since its quadratic boundary can approximate the non-linear truth while LDA's cannot.
Part c)
In general, QDA benefits more from a large sample size because it must estimate many more parameters (a separate covariance matrix for each class), so we would expect QDA's test accuracy to improve relative to LDA's as \(n\) increases.
Part d)
FALSE: With a linear true boundary, QDA may achieve a lower error rate on the training set, but not on the test set: its extra flexibility ends up fitting noise, so it suffers higher variance with no offsetting reduction in bias, especially when the sample size is small.
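As an optional illustration of parts a) and d) (a simulation sketch, not part of the exercise; the data-generating process and sample sizes are invented), we can compare LDA and QDA on data whose true boundary is linear, i.e. two Gaussian classes sharing a covariance matrix:
require(MASS)
set.seed(1)
gen <- function(n) {  # two classes with equal covariance => linear Bayes boundary
  y <- rbinom(n, 1, 0.5)
  x <- matrix(rnorm(2*n), ncol=2) + 1.5*cbind(y, y)
  data.frame(x1=x[,1], x2=x[,2], y=factor(y))
}
train.sim <- gen(200); test.sim <- gen(10000)
c(LDA=mean(predict(lda(y~x1+x2, data=train.sim), test.sim)$class != test.sim$y),
  QDA=mean(predict(qda(y~x1+x2, data=train.sim), test.sim)$class != test.sim$y))
# with a linear true boundary, QDA's test error is usually no better (and often slightly worse) than LDA's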
EXERCISE 6:
Part a)
For logistic regression, \(p(X) = \frac{e^{\beta_0+\beta_1 X_1+\beta_2 X_2}}{1+e^{\beta_0+\beta_1 X_1+\beta_2 X_2}}\)
Plugging in the values \(p(X) = \frac{e^{-6 + 0.05 \times 40 + 1 \times 3.5}}{1+e^{-6+0.05 \times 40 + 1 \times 3.5}} =\)
exp(-6+0.05*40+1*3.5)/(1+exp(-6+0.05*40+1*3.5)) #0.38
## [1] 0.3775407
Part b)
Solve this equation \(0.5 = \frac{e^{-6 + 0.05 X_1 + 1 \times 3.5}}{1+e^{-6+0.05 X_1 + 1 \times 3.5}}\)
This is equivalent to solving the log-odds (logit) equation \(log(\frac{0.5}{1-0.5}) = 0 = -6 + 0.05 X_1 + 1 \times 3.5\)
(log(0.5/(1-0.5)) + 6 - 3.5*1)/0.05 #50
## [1] 50
The student needs to study 50 hours to have a 50% chance of getting an A.
EXERCISE 7:
For constant variance, \(p_k(x) = \frac {\pi_k \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_k)^2) } {\sum { \pi_l \frac {1} {\sqrt{2 \pi} \sigma} \exp(- \frac {1} {2 \sigma^2} (x - \mu_l)^2) }}\)
Plugging in \(\pi_{yes} = 0.8\), \(\pi_{no} = 1-0.8\), \(\mu_{yes} = 10\), \(\mu_{no} = 0\), \(\sigma^2 = 36\), and \(x = 4\) (the common \(\frac {1} {\sqrt{2 \pi} \sigma}\) factors cancel): \(p_{yes}(4) = \frac {0.8 \exp(- \frac {1} {2 \times 36} (4 - 10)^2)} {0.8 \exp(- \frac {1} {2 \times 36} (4 - 10)^2) + (1-0.8) \exp(- \frac {1} {2 \times 36} (4 - 0)^2)}\)
(0.8*exp(-1/(2*36)*(4-10)^2))/(0.8*exp(-1/(2*36)*(4-10)^2)+(1-0.8)*exp(-1/(2*36)*(4-0)^2))
## [1] 0.7518525
The probability that a company with \(X = 4\) will issue a dividend is about 75.2%.
EXERCISE 8:
The key fact is that KNN with \(K=1\) always has a training error rate of 0%, because each training observation is its own nearest neighbor. If its average of training and test error rates is 18%, its test error rate must therefore be about 36%, which is worse than the 30% test error rate of logistic regression. Since we care about performance on new observations, logistic regression is the better choice here. (The high logistic regression error rate does suggest the true decision boundary may be non-linear, so KNN with a larger, properly tuned \(K\) could still be worth exploring.)
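The arithmetic behind that comparison (a quick check, using the fact that the K=1 training error is 0%):
knn.test.err <- 2*0.18 - 0  # average of training (0%) and test error is 18%
knn.test.err                # 0.36, worse than logistic regression's 30% test error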
EXERCISE 9:
Part a)
We want to solve \(0.37 = \frac{p_{default}}{1-p_{default}}\)
Rearranging, this becomes \(\frac{1}{0.37} = \frac{1-p_{default}}{p_{default}} = \frac{1}{p_{default}}-1\)
Finally \(p_{default} = \frac{1}{\frac{1}{0.37}+1}\)
1/(1/0.37+1)
## [1] 0.270073
Probability of default is 27.0%
Part b)
0.16/(1-0.16)
## [1] 0.1904762
The odds of defaulting are 0.19.
EXERCISE 10:
Part a)
require(ISLR)
data(Weekly)
summary(Weekly)
## Year Lag1 Lag2 Lag3
## Min. :1990 Min. :-18.1950 Min. :-18.1950 Min. :-18.1950
## 1st Qu.:1995 1st Qu.: -1.1540 1st Qu.: -1.1540 1st Qu.: -1.1580
## Median :2000 Median : 0.2410 Median : 0.2410 Median : 0.2410
## Mean :2000 Mean : 0.1506 Mean : 0.1511 Mean : 0.1472
## 3rd Qu.:2005 3rd Qu.: 1.4050 3rd Qu.: 1.4090 3rd Qu.: 1.4090
## Max. :2010 Max. : 12.0260 Max. : 12.0260 Max. : 12.0260
## Lag4 Lag5 Volume
## Min. :-18.1950 Min. :-18.1950 Min. :0.08747
## 1st Qu.: -1.1580 1st Qu.: -1.1660 1st Qu.:0.33202
## Median : 0.2380 Median : 0.2340 Median :1.00268
## Mean : 0.1458 Mean : 0.1399 Mean :1.57462
## 3rd Qu.: 1.4090 3rd Qu.: 1.4050 3rd Qu.:2.05373
## Max. : 12.0260 Max. : 12.0260 Max. :9.32821
## Today Direction
## Min. :-18.1950 Down:484
## 1st Qu.: -1.1540 Up :605
## Median : 0.2410
## Mean : 0.1499
## 3rd Qu.: 1.4050
## Max. : 12.0260
pairs(Weekly)
Year and Volume are positively correlated, similar to the Smarket data set.
Part b)
fit.logit <- glm(Direction~., data=Weekly[,c(2:7,9)], family=binomial)
summary(fit.logit)
##
## Call:
## glm(formula = Direction ~ ., family = binomial, data = Weekly[,
## c(2:7, 9)])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6949 -1.2565 0.9913 1.0849 1.4579
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.26686 0.08593 3.106 0.0019 **
## Lag1 -0.04127 0.02641 -1.563 0.1181
## Lag2 0.05844 0.02686 2.175 0.0296 *
## Lag3 -0.01606 0.02666 -0.602 0.5469
## Lag4 -0.02779 0.02646 -1.050 0.2937
## Lag5 -0.01447 0.02638 -0.549 0.5833
## Volume -0.02274 0.03690 -0.616 0.5377
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1496.2 on 1088 degrees of freedom
## Residual deviance: 1486.4 on 1082 degrees of freedom
## AIC: 1500.4
##
## Number of Fisher Scoring iterations: 4
Lag2 is the only predictor with a statistically significant coefficient (p ≈ 0.03).
Part c)
logit.prob <- predict(fit.logit, Weekly, type="response")
logit.pred <- ifelse(logit.prob > 0.5, "Up", "Down")
table(logit.pred, Weekly$Direction)
##
## logit.pred Down Up
## Down 54 48
## Up 430 557
(54+557)/nrow(Weekly) # Accuracy=0.56
## [1] 0.5610652
The overall (training) accuracy is 56.1%. The model predicts “Up” for most weeks, and it is more accurate when predicting “Up” (557/987 ≈ 56%) than when predicting “Down” (54/102 ≈ 53%).
Part d)
train.yrs <- Weekly$Year %in% (1990:2008)
train <- Weekly[train.yrs,]
test <- Weekly[!train.yrs,]
fit2 <- glm(Direction~Lag2, data=train, family=binomial)
fit2.prob <- predict(fit2, test, type="response")
fit2.pred <- ifelse(fit2.prob > 0.5, "Up", "Down")
table(fit2.pred, test$Direction)
##
## fit2.pred Down Up
## Down 9 5
## Up 34 56
mean(fit2.pred == test$Direction) # Accuracy=0.625
## [1] 0.625
Part e)
require(MASS)
fit.lda <- lda(Direction~Lag2, data=train)
fit.lda.pred <- predict(fit.lda, test)$class
table(fit.lda.pred, test$Direction)
##
## fit.lda.pred Down Up
## Down 9 5
## Up 34 56
mean(fit.lda.pred == test$Direction) # Accuracy=0.625
## [1] 0.625
Part f)
fit.qda <- qda(Direction~Lag2, data=train)
fit.qda.pred <- predict(fit.qda, test)$class
table(fit.qda.pred, test$Direction)
##
## fit.qda.pred Down Up
## Down 0 0
## Up 43 61
mean(fit.qda.pred == test$Direction) # Accuracy=0.587
## [1] 0.5865385
Part g)
require(class)
set.seed(1)
train.X <- as.matrix(train$Lag2)
test.X <- as.matrix(test$Lag2)
knn.pred <- knn(train.X, test.X, train$Direction, k=1)
table(knn.pred, test$Direction)
##
## knn.pred Down Up
## Down 21 30
## Up 22 31
mean(knn.pred == test$Direction) # Accuracy=0.500
## [1] 0.5
Part h)
The logistic regression and LDA models produced the best results, each with 62.5% test accuracy.
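A compact comparison (this sketch assumes the objects fit2.pred, fit.lda.pred, fit.qda.pred, and knn.pred created in parts d) through g) are still in the workspace):
c(logit = mean(fit2.pred == test$Direction),
  lda   = mean(fit.lda.pred == test$Direction),
  qda   = mean(fit.qda.pred == test$Direction),
  knn1  = mean(knn.pred == test$Direction))  # test-set accuracy for each model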
Part i)
knn.pred <- knn(train.X, test.X, train$Direction, k=5)
table(knn.pred, test$Direction)
##
## knn.pred Down Up
## Down 15 20
## Up 28 41
mean(knn.pred == test$Direction)
## [1] 0.5384615
knn.pred <- knn(train.X, test.X, train$Direction, k=10)
table(knn.pred, test$Direction)
##
## knn.pred Down Up
## Down 17 19
## Up 26 42
mean(knn.pred == test$Direction)
## [1] 0.5673077
knn.pred <- knn(train.X, test.X, train$Direction, k=20)
table(knn.pred, test$Direction)
##
## knn.pred Down Up
## Down 21 20
## Up 22 41
mean(knn.pred == test$Direction)
## [1] 0.5961538
knn.pred <- knn(train.X, test.X, train$Direction, k=30)
table(knn.pred, test$Direction)
##
## knn.pred Down Up
## Down 19 21
## Up 24 40
mean(knn.pred == test$Direction)
## [1] 0.5673077
Larger values of k (around 20) produced the best KNN results when using only Lag2 as the predictor. As one further experiment, adding a squared Lag1 term to the LDA model nudges test accuracy up slightly:
fit.lda <- lda(Direction~Lag2+I(Lag1^2), data=train)
fit.lda.pred <- predict(fit.lda, test)$class
table(fit.lda.pred, test$Direction)
##
## fit.lda.pred Down Up
## Down 8 2
## Up 35 59
mean(fit.lda.pred == test$Direction) # Accuracy=0.644
## [1] 0.6442308
EXERCISE 11:
Part a)
require(ISLR)
data(Auto)
mpg01 <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
mydf <- data.frame(Auto, mpg01)
Part b)
pairs(mydf)
displacement, horsepower, weight, and acceleration seem to be highly correlated with mpg01.
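A quick numerical check of that impression from the pairs plot (dropping the non-numeric name column before computing correlations with mpg01):
sort(cor(mydf[, sapply(mydf, is.numeric)])[, "mpg01"])  # correlation of each numeric variable with mpg01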
Part c)
set.seed(1)
trainid <- sample(1:nrow(mydf), nrow(mydf)*0.7 , replace=F) # 70% train, 30% test
train <- mydf[trainid,]
test <- mydf[-trainid,]
Part d)
fit.lda <- lda(mpg01~displacement+horsepower+weight+acceleration, data=train)
fit.lda.pred <- predict(fit.lda, test)$class
table(fit.lda.pred, test$mpg01)
##
## fit.lda.pred 0 1
## 0 47 0
## 1 10 61
mean(fit.lda.pred != test$mpg01) # error rate
## [1] 0.08474576
Part e)
fit.qda <- qda(mpg01~displacement+horsepower+weight+acceleration, data=train)
fit.qda.pred <- predict(fit.qda, test)$class
table(fit.qda.pred, test$mpg01)
##
## fit.qda.pred 0 1
## 0 48 3
## 1 9 58
mean(fit.qda.pred != test$mpg01) # error rate
## [1] 0.1016949
Part f)
fit.logit <- glm(mpg01~displacement+horsepower+weight+acceleration, data=train, family=binomial)
logit.prob <- predict(fit.logit, test, type="response")
logit.pred <- ifelse(logit.prob > 0.5, 1, 0)
table(logit.pred, test$mpg01)
##
## logit.pred 0 1
## 0 50 3
## 1 7 58
mean(logit.pred != test$mpg01) # error rate
## [1] 0.08474576
Part g)
train.X <- cbind(train$displacement, train$horsepower, train$weight, train$acceleration)
test.X <- cbind(test$displacement, test$horsepower, test$weight, test$acceleration)
knn.pred <- knn(train.X, test.X, train$mpg01, k=1)
table(knn.pred, test$mpg01)
##
## knn.pred 0 1
## 0 47 9
## 1 10 52
mean(knn.pred != test$mpg01)
## [1] 0.1610169
knn.pred <- knn(train.X, test.X, train$mpg01, k=10)
table(knn.pred, test$mpg01)
##
## knn.pred 0 1
## 0 43 2
## 1 14 59
mean(knn.pred != test$mpg01)
## [1] 0.1355932
knn.pred <- knn(train.X, test.X, train$mpg01, k=20)
table(knn.pred, test$mpg01)
##
## knn.pred 0 1
## 0 42 2
## 1 15 59
mean(knn.pred != test$mpg01)
## [1] 0.1440678
knn.pred <- knn(train.X, test.X, train$mpg01, k=30)
table(knn.pred, test$mpg01)
##
## knn.pred 0 1
## 0 49 3
## 1 8 58
mean(knn.pred != test$mpg01)
## [1] 0.09322034
knn.pred <- knn(train.X, test.X, train$mpg01, k=50)
table(knn.pred, test$mpg01)
##
## knn.pred 0 1
## 0 46 2
## 1 11 59
mean(knn.pred != test$mpg01)
## [1] 0.1101695
knn.pred <- knn(train.X, test.X, train$mpg01, k=100)
table(knn.pred, test$mpg01)
##
## knn.pred 0 1
## 0 49 2
## 1 8 59
mean(knn.pred != test$mpg01)
## [1] 0.08474576
knn.pred <- knn(train.X, test.X, train$mpg01, k=200)
table(knn.pred, test$mpg01)
##
## knn.pred 0 1
## 0 41 2
## 1 16 59
mean(knn.pred != test$mpg01)
## [1] 0.1525424
KNN performs best around k=30 and k=100, with test error rates of roughly 9.3% and 8.5%.
EXERCISE 12:
Part a)
Power <- function() {
print(2^3)
}
Power()
## [1] 8
Part b)
Power2 <- function(x, a) {
print(x^a)
}
Power2(3,8)
## [1] 6561
Part c)
Power2(10,3)
## [1] 1000
Power2(8,17)
## [1] 2.2518e+15
Power2(131,3)
## [1] 2248091
Part d)
Power3 <- function(x, a) {
return(x^a)
}
Power3(3,8)
## [1] 6561
Part e)
x <- 1:10
plot(x, Power3(x,2), log="y", main="log(x^2) vs. x",
xlab="x", ylab="log(x^2)")
Part f)
PlotPower <- function(x, a) {
plot(x, Power3(x,a), main="x^a versus x",  # use the exponent a, not a hard-coded 2
xlab="x", ylab=paste0("x^",a))
}
PlotPower(1:10,3)
EXERCISE 13:
data(Boston)
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
crim01 <- ifelse(Boston$crim > median(Boston$crim), 1, 0)
mydf <- data.frame(Boston, crim01)
pairs(mydf) # pred1 = age, dis, lstat, medv
sort(cor(mydf)[1,]) # pred2 = tax, rad (highest correlations with crim)
## medv black dis rm zn chas
## -0.38830461 -0.38506394 -0.37967009 -0.21924670 -0.20046922 -0.05589158
## ptratio age indus crim01 nox lstat
## 0.28994558 0.35273425 0.40658341 0.40939545 0.42097171 0.45562148
## tax rad crim
## 0.58276431 0.62550515 1.00000000
set.seed(1)
trainid <- sample(1:nrow(mydf), nrow(mydf)*0.7 , replace=F) # 70% train, 30% test
train <- mydf[trainid,]
test <- mydf[-trainid,]
train.X1 <- cbind(train$age, train$dis, train$lstat, train$medv)
test.X1 <- cbind(test$age, test$dis, test$lstat, test$medv)
train.X2 <- cbind(train$tax, train$rad)
test.X2 <- cbind(test$tax, test$rad)
# Logistic Regression models
fit.logit1 <- glm(crim01~age+dis+lstat+medv, data=train, family=binomial)
logit1.prob <- predict(fit.logit1, test, type="response")
logit1.pred <- ifelse(logit1.prob > 0.5, 1, 0)
mean(logit1.pred != test$crim01) # error rate
## [1] 0.1644737
fit.logit2 <- glm(crim01~tax+rad, data=train, family=binomial)
logit2.prob <- predict(fit.logit2, test, type="response")
logit2.pred <- ifelse(logit2.prob > 0.5, 1, 0)
mean(logit2.pred != test$crim01) # error rate
## [1] 0.2434211
# LDA models
fit.lda1 <- lda(crim01~age+dis+lstat+medv, data=train)
fit.lda1.pred <- predict(fit.lda1, test)$class
mean(fit.lda1.pred != test$crim01) # error rate
## [1] 0.1776316
fit.lda2 <- lda(crim01~tax+rad, data=train)
fit.lda2.pred <- predict(fit.lda2, test)$class
mean(fit.lda2.pred != test$crim01) # error rate
## [1] 0.2763158
# QDA models
fit.qda1 <- qda(crim01~age+dis+lstat+medv, data=train)
fit.qda1.pred <- predict(fit.qda1, test)$class
mean(fit.qda1.pred != test$crim01) # error rate
## [1] 0.1776316
fit.qda2 <- qda(crim01~tax+rad, data=train)
fit.qda2.pred <- predict(fit.qda2, test)$class
mean(fit.qda2.pred != test$crim01) # error rate
## [1] 0.2631579
# KNN models
set.seed(1)
knn1.pred <- knn(train.X1, test.X1, train$crim01, k=1)
mean(knn1.pred != test$crim01)
## [1] 0.25
knn1.pred <- knn(train.X1, test.X1, train$crim01, k=5)
mean(knn1.pred != test$crim01)
## [1] 0.1907895
knn1.pred <- knn(train.X1, test.X1, train$crim01, k=10)
mean(knn1.pred != test$crim01)
## [1] 0.2039474
knn1.pred <- knn(train.X1, test.X1, train$crim01, k=20)
mean(knn1.pred != test$crim01)
## [1] 0.1842105
knn1.pred <- knn(train.X1, test.X1, train$crim01, k=50)
mean(knn1.pred != test$crim01)
## [1] 0.1842105
knn1.pred <- knn(train.X1, test.X1, train$crim01, k=100)
mean(knn1.pred != test$crim01)
## [1] 0.1842105
knn1.pred <- knn(train.X1, test.X1, train$crim01, k=200)
mean(knn1.pred != test$crim01)
## [1] 0.1907895
knn2.pred <- knn(train.X2, test.X2, train$crim01, k=1)
mean(knn2.pred != test$crim01)
## [1] 0.06578947
knn2.pred <- knn(train.X2, test.X2, train$crim01, k=5)
mean(knn2.pred != test$crim01)
## [1] 0.1118421
knn2.pred <- knn(train.X2, test.X2, train$crim01, k=10)
mean(knn2.pred != test$crim01)
## [1] 0.1710526
knn2.pred <- knn(train.X2, test.X2, train$crim01, k=20)
mean(knn2.pred != test$crim01)
## [1] 0.1513158
knn2.pred <- knn(train.X2, test.X2, train$crim01, k=50)
mean(knn2.pred != test$crim01)
## [1] 0.2894737
knn2.pred <- knn(train.X2, test.X2, train$crim01, k=100)
mean(knn2.pred != test$crim01)
## [1] 0.2894737
knn2.pred <- knn(train.X2, test.X2, train$crim01, k=200)
mean(knn2.pred != test$crim01)
## [1] 0.2763158
Surprisingly, the KNN model using the two predictors tax and rad with k=1 had the lowest test error rate, about 6.6%.