# Plot Logistic Regression Decision Boundary in R

In this article, we will produce the following R plot that represents the decision boundary of a logistic regression model:

Here’s the full code used to generate it:

```set.seed(1)
x1 = rnorm(50)
x2 = rnorm(50)
y = (x1 + x2 + rnorm(50)) > 0

model = glm(y ~ x1 + x2, family = binomial)

x1_ticks = seq(min(x1), max(x1), length.out=80)
x2_ticks = seq(min(x2), max(x2), length.out=80)

background_grid = expand.grid(x1_ticks, x2_ticks)
names(background_grid) = c('x1', 'x2')

grid_predictions = predict(model,
newdata = background_grid,
type = 'response')

plot(background_grid, pch = 20, cex = 0.5, col = ifelse(grid_predictions > 0.5, '#E75A7C', '#5398BE'))
points(x1, x2, pch = 19, col = ifelse(y == TRUE, '#E75A7C', '#5398BE'))

slope = coef(model)/(-coef(model))
intercept = coef(model)/(-coef(model))

clip(min(x1),max(x1), min(x2), max(x2))
abline(intercept, slope, lwd = 2, lty = 2)```

## Code explanation

First, we create some data (2 continuous variables x1 and x2, and 1 binary variable y) and run a logistic regression:

```# create data
set.seed(1)
x1 = rnorm(50)
x2 = rnorm(50)
y = (x1 + x2 + rnorm(50)) > 0

# run logistic regression
model = glm(y ~ x1 + x2, family = binomial)```

Next, we will create 2 variables x1_ticks and x2_ticks based on the ranges of x1 and x2, that contain 80 equally spaced values each. Then, we create a dataframe called background_grid that contains all combinations of x1_ticks and x2_ticks (so 80 × 80 = 6400 rows):

```# create 2 variables based on x1 and x2 that contain equally spaced values
x1_ticks = seq(min(x1), max(x1), length.out=80)
x2_ticks = seq(min(x2), max(x2), length.out=80)

# extract all combinations of x1_ticks and x2_ticks
background_grid = expand.grid(x1_ticks, x2_ticks)

# change the names of the variables of background_grid
names(background_grid) = c('x1', 'x2')

#          x1        x2
# 1 -2.214700 -1.804959
# 2 -2.166472 -1.804959
# 3 -2.118245 -1.804959
# 4 -2.070017 -1.804959
# 5 -2.021789 -1.804959
# 6 -1.973562 -1.804959```

Predict the y values of background_grid according to the logistic regression model, and plot the background:

```# predict y values of background_grid
grid_predictions = predict(model,
newdata = background_grid,
type = 'response')

# plot background
plot(background_grid,
pch = 20, # points shape
cex = 0.5, # points size
col = ifelse(grid_predictions > 0.5, '#E75A7C', '#5398BE'))
# points colored according to their predicted class```

Output:

Next, we will plot on top of the background grid the data points (x1, x2) and color them according to their real y values:

```points(x1, x2,
pch = 19, # points shape
col = ifelse(y == TRUE, '#E75A7C', '#5398BE'))
# points colored according to their real y values```

Output:

Finally, we will add the decision boundary:

(For information on the formula used to calculate the slope and intercept of the decision boundary line, see this article on StackExchange)

```# calculate the slope and intercept of the decision boundary line
slope = coef(model)/(-coef(model))
intercept = coef(model)/(-coef(model))

# limit the decision boundary line
# by the min and max values of x1 and x2
clip(min(x1),max(x1), min(x2), max(x2))

# plot the decision boundary
abline(intercept, slope,
lwd = 2, # line thickness
lty = 2) # type dashed```

Output:

The plot shows that the logistic regression model misclassified some points, so if you want, you can try adding quadratic terms to the model to see if you get a better fit.