Support Vector Machine in R: Hyperplane and Maximal Margin Classifier (Part 01)

Yanan Wu
5 min read · Jun 17, 2023


A boundary between river and rock. Taken at Yellowstone Park — 2017

Support vector machine (SVM) is a machine learning algorithm designed to construct an optimal ‘hyperplane’ to separate data into different classes.

1. What is a hyperplane?

In n-dimensional space, a hyperplane is a flat affine subspace of dimension n-1. For instance, in two-dimensional space, a hyperplane is a one-dimensional line (Figure 1). The mathematical definition of a two-dimensional hyperplane is simple and given by equation (1.1):

beta_0 + beta_1 * x_1 + beta_2 * x_2 = 0

1.1 Equation. A line in 2 dimensions

In p-dimensional space, the dimension of the hyperplane is p-1, and the hyperplane is defined by equation (1.2):

beta_0 + beta_1 * x_1 + beta_2 * x_2 + ... + beta_p * x_p = 0

1.2 Equation. A hyperplane in p dimensions
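As a quick illustration (a hypothetical helper of my own, not used in the code below), equation (1.2) can be evaluated in R for any point: the sign of beta_0 + beta_1 * x_1 + ... + beta_p * x_p tells us which side of the hyperplane the point lies on, and a value of exactly zero means the point lies on the hyperplane itself.

## illustrative helper (names are my own); evaluates equation (1.2) for a single point
hyperplane_value = function(beta0, beta, x){
  beta0 + sum(beta * x)
}
## example in p = 3 dimensions
hyperplane_value(beta0 = 2, beta = c(3, 2, -1), x = c(0.5, -1, 2))
## the result is negative, so this point lies on the negative side of the hyperplane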

2. Classification using a hyperplane

A hyperplane (black solid line: 2 + 3 * x_1 + 2 * x_2 = 0, Equation 1.3) in two-dimensional space is shown in Figure 1. We have n observations falling into two classes, that is, y_1, …, y_n ∈ {1, -1}. The green and red points do not lie on the hyperplane, so they do not satisfy (1.3); instead, each point falls on one side or the other, as in equation (1.4):

2 + 3 * x_1 + 2 * x_2 > 0  (red points)
2 + 3 * x_1 + 2 * x_2 < 0  (green points)

1.4 Equation. Red: points in class 1; Green: points in class -1
## user-defined function to draw a 2-D hyperplane
hyper_func = function(intercept, slope1, slope2, min, max, length){
  data_ = list()
  ## x1 values on a regular grid
  data_[[1]] <- seq(min, max, length.out = length)
  ## solve intercept + slope1*x1 + slope2*x2 = 0 for x2
  data_[[2]] <- (intercept + slope1 * data_[[1]]) / -slope2
  return(data_)
}

## randomly generate points for a specific range
x1 = runif(200, min=-2, max=2)
x2 = runif(200, min=-2, max=2)
input_data = cbind(x1, x2)
plot(x1,x2, pch = 16)

plane_func = 2+3 * x1 + 2 * x2
input_data = cbind(input_data, plane_func)
## assign a class (colour) to each point based on which side of the hyperplane it lies on
group_data = lapply(input_data[,3], function(x) if(x > 0){return('red')}else{return('green')})
input_data = cbind(input_data, groups = unlist(group_data))
plot(input_data[,'x1'],input_data[,'x2'],col = input_data[,'groups'], pch = 16, xlab = 'x1', ylab = 'x2')

## draw a 2-D hyperplane: beta_0 = 2, beta_1 = 3, beta_2 = 2; min = -2, max = 2, length = 200
pts_plane = hyper_func(2,3,2,-2,2,200)
lines(pts_plane[[1]], pts_plane[[2]], col = "black", lwd = 2)
Figure 1. The hyperplane 2 + 3 * x1 + 2 * x2 = 0 is shown as the black line. Red points correspond to 2 + 3 * x1 + 2 * x2 > 0; green points correspond to 2 + 3 * x1 + 2 * x2 < 0.

3. The Maximal Margin Classifier

If the training observations can be perfectly separated, there are infinitely many separating hyperplanes, since a slight shift or rotation of a separating hyperplane typically still intersects none of the observations. Three possible hyperplanes are shown in Figure 2.

## keep only points that lie well away from the plane, so the two classes are clearly separated
sample_pts = input_data[which(abs(as.numeric(input_data[,'plane_func'])) > 3),]
plot(sample_pts[,'x1'],sample_pts[,'x2'], col = sample_pts[,'groups'],pch = 16, xlab = 'x1', ylab = 'x2')

## draw three possible separating hyperplanes
pts_plane = hyper_func(2,3,2,-2,2,200)
lines(pts_plane[[1]], pts_plane[[2]], col = "black", lwd = 2)

pts_plane = hyper_func(1.5,1.5,2,-2,2,200)
lines(pts_plane[[1]], pts_plane[[2]], col = "black", lwd = 2)

pts_plane = hyper_func(2.2,2.2,2,-2,2,200)
lines(pts_plane[[1]], pts_plane[[2]], col = "black", lwd = 2)
We have two groups of points: one shown in red, concentrated mainly in the upper-right corner, and the other shown in green, concentrated mainly in the lower-left corner. All three hyperplanes separate the two groups, and none of them touches any of the points.
Figure 2. Three possible hyperplanes (solid black lines) that separate the points into two different classes

The rule we employ to choose the optimal hyperplane, known as the Maximal Margin Hyperplane (MMH), is based on maximizing the margin. The margin refers to the minimum distance between the training points and the hyperplane: d+ is the shortest distance from the hyperplane to the closest positive point, d- is the shortest distance from the hyperplane to the closest negative point, and the margin is d+ + d-.

By seeking the hyperplane with the largest margin, we aim to find the one that has the farthest minimum distance to the training points. In essence, the MMH represents the linear hyperplane that optimally separates the data. Once the MMH is identified, we can classify the testing points by determining which side of the hyperplane they fall on. This approach is commonly referred to as the Maximal Margin Classifier.
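To make the margin concrete, recall that the perpendicular distance from a point (x_1, x_2) to the hyperplane beta_0 + beta_1 * x_1 + beta_2 * x_2 = 0 is |beta_0 + beta_1 * x_1 + beta_2 * x_2| / sqrt(beta_1^2 + beta_2^2). The short sketch below (my own variable names, reusing sample_pts from the Figure 2 code) computes d+, d-, and the margin of the toy hyperplane 2 + 3 * x1 + 2 * x2 = 0:

## distance from every sample point to the hyperplane 2 + 3*x1 + 2*x2 = 0
beta0 = 2; beta = c(3, 2)
f_vals = beta0 + beta[1] * as.numeric(sample_pts[,'x1']) + beta[2] * as.numeric(sample_pts[,'x2'])
dists = abs(f_vals) / sqrt(sum(beta^2))
## d+ : closest point on the positive (red) side; d- : closest point on the negative (green) side
d_pos = min(dists[sample_pts[,'groups'] == 'red'])
d_neg = min(dists[sample_pts[,'groups'] == 'green'])
margin = d_pos + d_neg
margin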

## This example illustrates two-class classification with svm() from the e1071 package,
## so the x variables must be numeric and the y variable must be a factor.
library(e1071)
svm_data = data.frame(x1 = as.numeric(sample_pts[,'x1']),
                      x2 = as.numeric(sample_pts[,'x2']),
                      groups = as.factor(sample_pts[,'groups']))
## use svm() with a linear kernel to find the optimal hyperplane
svmfit = svm(groups~x1+x2, data = svm_data, kernel = 'linear', cost = 5, scale = FALSE)

## generate a regular grid of points covering the range of the data
make.grid = function(x, n = 75) {
  grange = apply(x, 2, range)
  x1 = seq(from = grange[1,1], to = grange[2,1], length = n)
  x2 = seq(from = grange[1,2], to = grange[2,2], length = n)
  expand.grid(x1 = x1, x2 = x2)
}
## x holds the two predictors x_1 and x_2
x = svm_data[,c(1,2)]
xgrid = make.grid(x)
## predict the class for each grid point
ygrid = predict(svmfit, xgrid)
## plot the grid points, coloured by predicted class
plot(xgrid, col = c("green","red")[as.numeric(ygrid)], pch = 20, cex = .2)
## plot the training points of the two classes
points(svm_data, col = as.character(svm_data[,3]), pch = 19)
## mark the support vectors; svmfit$index gives their row indices
points(x[svmfit$index,], pch = 5, cex = 2, col = 'blue')

## get coefficients of optimal hyperplane (beta_0 (intercept), beta_1 and beta_2)
coeffis = coef(svmfit)
## draw optimal hyperplane
pts_plane = hyper_func(coeffis[1],coeffis[2],coeffis[3],-2,2,200)
lines(pts_plane[[1]], pts_plane[[2]], col = "black", lwd = 2)
## draw the margin boundary on the negative side: intercept plus 1 (why plus 1? see Equation 1.5)
pts_plane = hyper_func(coeffis[1]+1,coeffis[2],coeffis[3],-2,2,200)
lines(pts_plane[[1]], pts_plane[[2]], col = "black", lwd = 2, lty = 3)
## draw the margin boundary on the positive side: intercept minus 1 (why minus 1? see Equation 1.5)
pts_plane = hyper_func(coeffis[1]-1,coeffis[2],coeffis[3],-2,2,200)
lines(pts_plane[[1]], pts_plane[[2]], col = "black", lwd = 2, lty = 3)
Figure 3. Optimal hyperplane, maximal margin, margin boundary planes, and support vectors

In Figure 3, we observe three points positioned at equal distance from the maximal margin hyperplane, lying exactly on the dashed lines. These three points are support vectors: they 'support' the maximal margin hyperplane (each class must have at least one support vector). In other words, a small change in the position of a support vector would shift the optimal hyperplane, while moving any of the other points would not affect it.

The two dashed lines (the planes H_1 and H_-1) are the margin boundaries on either side of the optimal hyperplane.

H_1:  beta_0 + beta_1 * x_1 + beta_2 * x_2 = 1
H_-1: beta_0 + beta_1 * x_1 + beta_2 * x_2 = -1

1.5 Equation. Planes H_1 and H_-1
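Because H_1 and H_-1 are the planes where beta_0 + beta_1 * x_1 + beta_2 * x_2 equals +1 and -1, the perpendicular distance between them is 2 / sqrt(beta_1^2 + beta_2^2), which is exactly the width the maximal margin classifier maximizes. A quick check with the fitted coefficients (a small sketch assuming coeffis from the code above is still in the workspace):

## width of the margin: 2 / ||beta||, with coeffis[1] = beta_0 and coeffis[2:3] = (beta_1, beta_2)
margin_width = 2 / sqrt(sum(coeffis[2:3]^2))
margin_width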

So all the points in the positive class (y_i = 1) satisfy

beta_0 + beta_1 * x_1 + beta_2 * x_2 >= 1

1.6 Equation. Points in Class 1

and all the points in the negative class (y_i = -1) satisfy

beta_0 + beta_1 * x_1 + beta_2 * x_2 <= -1

1.7 Equation. Points in Class -1
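These constraints can be checked numerically with the fitted model (a rough sketch assuming svm_data, svmfit, and coeffis from the code above; since svm() was called with a finite cost, the fit is technically soft-margin, so values slightly below 1 are possible):

## f(x_i) = beta_0 + beta_1*x_1 + beta_2*x_2 for every training point
f_x = coeffis[1] + coeffis[2] * svm_data$x1 + coeffis[3] * svm_data$x2
## the class sitting on the positive side of the fitted hyperplane gets y_i = +1;
## libsvm's internal sign convention need not match the factor level order, so read it off the fit
pos_class = names(which.max(tapply(f_x, svm_data$groups, mean)))
y = ifelse(svm_data$groups == pos_class, 1, -1)
## y_i * f(x_i) should be (approximately) >= 1 for all points,
## with equality for the support vectors highlighted in Figure 3
summary(y * f_x)
range((y * f_x)[svmfit$index])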

Code on GitHub

Coming soon…..

SVM in R: Support Vector Classifiers (Part 02)

SVM in R: Support Vector Machines (Part 03)

Thank you for reading!

Credits

Chapter 9 in James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.

Part of the code is from here
