In this article, we will produce the following R plot that represents the decision boundary of a logistic regression model:

Here’s the full code used to generate it:

set.seed(1) x1 = rnorm(50) x2 = rnorm(50) y = (x1 + x2 + rnorm(50)) > 0 model = glm(y ~ x1 + x2, family = binomial) x1_ticks = seq(min(x1), max(x1), length.out=80) x2_ticks = seq(min(x2), max(x2), length.out=80) background_grid = expand.grid(x1_ticks, x2_ticks) names(background_grid) = c('x1', 'x2') grid_predictions = predict(model, newdata = background_grid, type = 'response') plot(background_grid, pch = 20, cex = 0.5, col = ifelse(grid_predictions > 0.5, '#E75A7C', '#5398BE')) points(x1, x2, pch = 19, col = ifelse(y == TRUE, '#E75A7C', '#5398BE')) slope = coef(model)[2]/(-coef(model)[3]) intercept = coef(model)[1]/(-coef(model)[3]) clip(min(x1),max(x1), min(x2), max(x2)) abline(intercept, slope, lwd = 2, lty = 2)

## Code explanation

First, we create some data (2 continuous variables *x1* and *x2*, and 1 binary variable *y*) and run a logistic regression:

# create data set.seed(1) x1 = rnorm(50) x2 = rnorm(50) y = (x1 + x2 + rnorm(50)) > 0 # run logistic regression model = glm(y ~ x1 + x2, family = binomial)

Next, we will create 2 variables *x1_ticks* and *x2_ticks* based on the ranges of *x1* and *x2*, that contain 80 equally spaced values each. Then, we create a dataframe called *background_grid* that contains all combinations of *x1_ticks* and *x2_ticks* (so 80 × 80 = 6400 rows):

# create 2 variables based on x1 and x2 that contain equally spaced values x1_ticks = seq(min(x1), max(x1), length.out=80) x2_ticks = seq(min(x2), max(x2), length.out=80) # extract all combinations of x1_ticks and x2_ticks background_grid = expand.grid(x1_ticks, x2_ticks) # change the names of the variables of background_grid names(background_grid) = c('x1', 'x2') head(background_grid) # x1 x2 # 1 -2.214700 -1.804959 # 2 -2.166472 -1.804959 # 3 -2.118245 -1.804959 # 4 -2.070017 -1.804959 # 5 -2.021789 -1.804959 # 6 -1.973562 -1.804959

Predict the *y* values of *background_grid* according to the logistic regression model, and plot the background:

# predict y values of background_grid grid_predictions = predict(model, newdata = background_grid, type = 'response') # plot background plot(background_grid, pch = 20, # points shape cex = 0.5, # points size col = ifelse(grid_predictions > 0.5, '#E75A7C', '#5398BE')) # points colored according to their predicted class

**Output:**

Next, we will plot on top of the background grid the data points (x1, x2) and color them according to their real y values:

points(x1, x2, pch = 19, # points shape col = ifelse(y == TRUE, '#E75A7C', '#5398BE')) # points colored according to their real y values

**Output:**

Finally, we will add the decision boundary:

(For information on the formula used to calculate the slope and intercept of the decision boundary line, see this article on StackExchange)

# calculate the slope and intercept of the decision boundary line slope = coef(model)[2]/(-coef(model)[3]) intercept = coef(model)[1]/(-coef(model)[3]) # limit the decision boundary line # by the min and max values of x1 and x2 clip(min(x1),max(x1), min(x2), max(x2)) # plot the decision boundary abline(intercept, slope, lwd = 2, # line thickness lty = 2) # type dashed

**Output:**

The plot shows that the logistic regression model misclassified some points, so if you want, you can try adding quadratic terms to the model to see if you get a better fit.