The conditional distribution of a variable, for example heights, is the distribution of heights given the value of another variable, for example gender. Plotting the conditional distribution of heights given gender is a way of visualizing the relationship between the 2 variables.
The marginal distribution of heights is the distribution of heights for everybody, regardless of gender. Plotting the marginal distribution of heights is a way of describing the probability of occurrence of each value of the variable.
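The two definitions can be illustrated numerically with the R sketch below. It uses a tiny made-up dataset: the five heights are hypothetical and are not the data behind the plots. Conditioning on gender means summarizing heights separately within each gender. The marginal summary pools everybody, so its mean equals the conditional means weighted by each gender's share of the data.

```r
# Hypothetical toy data (made-up values): 3 females and 2 males
heights = c(62, 64, 63, 70, 68)
gender  = c('Female', 'Female', 'Female', 'Male', 'Male')

# The heights that make up each conditional distribution
print(split(heights, gender))

# Conditional means: the mean height given each gender
cond.means = tapply(heights, gender, mean)

# Marginal mean: the mean height of everybody, ignoring gender
marg.mean = mean(heights)

# Share of each gender in the data: P(Female) = 0.6, P(Male) = 0.4
p.gender = prop.table(table(gender))

# Weighting the conditional means by the gender shares recovers the marginal mean
# (both vectors are ordered alphabetically: Female, Male)
pooled.mean = sum(as.numeric(cond.means) * as.numeric(p.gender))

print(cond.means)
print(c(marginal = marg.mean, pooled = pooled.mean))
```

In this toy example the conditional means are 63 and 69 inches, and both routes give a marginal mean of 65.4 inches.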
If heights is a numerical variable
The density plot below represents the marginal and conditional distributions of heights:
- The dashed curve is the marginal distribution of heights: the distribution of everybody’s heights regardless of their gender.
- The pink and blue curves are the conditional distributions:
  - The pink curve is the distribution of heights given that the gender is female.
  - The blue curve is the distribution of heights given that the gender is male.

The conditional distribution of heights for males is shifted to the right, which means that, on average, males are taller than females. The marginal distribution is bimodal, which reflects the fact that it is the combination of the 2 conditional distributions.
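The "combination" can be made precise by writing the marginal density as a mixture of the conditional densities: f(h) = P(female) f(h | female) + P(male) f(h | male). The R sketch below makes two assumptions. First, the conditional distributions are exactly the normal distributions used to simulate the data in the R code of this article: mean 63 and sd 2.5 for females, mean 69 and sd 3 for males. Second, the two genders have equal shares. The function names dens.female, dens.male and dens.all are hypothetical. The sketch checks that the mixture is a proper density and that it always lies between the two conditional curves. The plotted curves are estimated from samples, so they will not match these theoretical curves exactly.

```r
# Assumed conditional densities: the normal distributions used to simulate
# the heights in this article's R code
dens.female = function(h) dnorm(h, mean = 63, sd = 2.5)
dens.male   = function(h) dnorm(h, mean = 69, sd = 3)

# Assumed gender shares: 100 females and 100 males
p.female = 0.5
p.male   = 0.5

# Marginal density = weighted sum (mixture) of the conditional densities
dens.all = function(h) p.female * dens.female(h) + p.male * dens.male(h)

# A valid density integrates to 1; the range 40-95 inches extends more than
# 8 standard deviations beyond both means, so the excluded tails are negligible
total = integrate(dens.all, lower = 40, upper = 95, rel.tol = 1e-8)$value
print(total)

# Being a weighted average, the marginal curve lies between the two conditional curves
h = seq(55, 80, by = 0.5)
between = all(dens.all(h) >= pmin(dens.female(h), dens.male(h)) &
              dens.all(h) <= pmax(dens.female(h), dens.male(h)))
print(between)
```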
If heights is a categorical variable
Now consider the case where heights is a binary categorical variable that can take on 2 values: “Tall” or “Normal/Short”.
The following bar plot represents the marginal distribution of heights (regardless of gender):

The bar plot shows that 13% of all participants are tall, and 87% are either normal or short.
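In R, the marginal distribution of a categorical variable is the table of counts divided by its total. This is what prop.table(table(...)) returns. The sketch below rebuilds the split from a hypothetical vector of labels. It assumes the 200 participants simulated in the R code of this article, so 13% corresponds to 26 tall participants.

```r
# Hypothetical labels reproducing the reported split among 200 participants:
# 13% of 200 = 26 tall, the remaining 174 normal/short
is.tall = c(rep('Tall', 26), rep('Normal/Short', 174))

counts   = table(is.tall)      # number of participants in each category
marginal = prop.table(counts)  # each count divided by the total (200)
print(marginal)
```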
The following stacked bar plot represents the distribution of heights conditional on gender:

Given that the person is female, the probability of being tall is only 3%; given that the person is male, this probability is 23%.
The conditional distribution of a categorical variable can also be represented in a table:
| | Female | Male |
|---|---|---|
| Normal/Short | 97% | 77% |
| Tall | 3% | 23% |
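Each column of the table is a conditional distribution: the counts for one gender divided by that gender's total. This is what prop.table() computes with margin = 2. The marginal split can then be recovered with the law of total probability: P(Tall) = P(Tall | Female) P(Female) + P(Tall | Male) P(Male) = 0.03 × 0.5 + 0.23 × 0.5 = 0.13. This matches the 13% in the marginal bar plot. The R sketch below reproduces the calculation. The counts are reconstructed from the percentages above, assuming 100 females and 100 males as in the simulation code.

```r
# Counts reconstructed from the percentages in the table,
# assuming 100 females and 100 males as in the simulation code
counts = matrix(c(97, 3, 77, 23), nrow = 2,
                dimnames = list(isTall = c('Normal/Short', 'Tall'),
                                gender = c('Female', 'Male')))

# Conditional distribution of isTall given gender:
# each column (gender) is divided by its own total
conditional = prop.table(counts, margin = 2)
print(conditional)

# Marginal distribution of gender: P(Female) and P(Male)
p.gender = colSums(counts) / sum(counts)

# Law of total probability:
# P(isTall) = P(isTall | Female) * P(Female) + P(isTall | Male) * P(Male)
marginal = conditional[, 'Female'] * p.gender[['Female']] +
           conditional[, 'Male']   * p.gender[['Male']]
print(marginal)
```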
R code
Here’s the R code that generated these plots:
```r
## marginal and conditional distributions of a numerical variable
#################################################################

set.seed(1)

# sample female heights from a normal distribution
# with mean = 63 and sd = 2.5
female.heights = rnorm(n = 100, mean = 63, sd = 2.5)

# sample male heights from a normal distribution
# with mean = 69 and sd = 3
male.heights = rnorm(n = 100, mean = 69, sd = 3)

# combine female and male data
all.heights = c(male.heights, female.heights)

plot(density(all.heights),   # plot density of all heights
     lwd = 2,                # line thickness
     ylim = c(0, 0.2),       # y-axis limits
     lty = 2,                # dashed line
     main = '',              # remove title
     xlab = 'Height (inch)') # x-axis label

polygon(density(male.heights),                       # plot density of male heights
        col = rgb(0.325, 0.596, 0.745, alpha = 0.5), # fill color
        border = '#5398BE',                          # line color
        lwd = 2)                                     # line thickness

polygon(density(female.heights),                     # plot density of female heights
        col = rgb(0.906, 0.353, 0.486, alpha = 0.5), # fill color
        border = '#E75A7C',                          # line color
        lwd = 2)                                     # line thickness

legend(x = 75, y = 0.2, # coordinates of the legend
       legend = c("females", "males", "everyone"),
       col = c("#E75A7C", "#5398BE", "black"),
       lty = c(1, 1, 2),
       lwd = 2)

## marginal and conditional distributions of a categorical variable
###################################################################

library(scales) # to show percentages on the y-axis

dat = data.frame(gender = c(rep('Female', 100), rep('Male', 100)),
                 heights = c(female.heights, male.heights))

# create the column before filling it in group by group
dat$isTall = NA
# males at or above 71 inches are tall
dat$isTall[dat$gender == 'Male' & dat$heights >= 71] = 'Tall'
# males below 71 inches are normal/short
dat$isTall[dat$gender == 'Male' & dat$heights < 71] = 'Normal/Short'
# females at or above 67 inches are tall
dat$isTall[dat$gender == 'Female' & dat$heights >= 67] = 'Tall'
# females below 67 inches are normal/short
dat$isTall[dat$gender == 'Female' & dat$heights < 67] = 'Normal/Short'

# marginal distribution of isTall
x = barplot(prop.table(table(dat$isTall)),
            col = c('#86DEB7', '#5398BE'),
            legend.text = TRUE,
            axes = FALSE, # remove the default y-axis
            ylab = 'Percent of participants',
            ylim = c(0, 1),
            main = 'Marginal distribution of heights')

# creating a y-axis with percentages
yticks = seq(0, 0.9, by = 0.05)
axis(2, at = yticks, labels = percent(yticks))

# showing the value of each category in the middle of its bar
y = prop.table(table(dat$isTall))
text(x[1], y[1]/2, labels = paste0(round(y[1]*100), '%')) # placing Normal/Short value
text(x[2], y[2]/2, labels = paste0(round(y[2]*100), '%')) # placing Tall value

# distribution of isTall conditional on gender
# (margin = 2 divides each column, i.e. each gender, by its own total)
x = barplot(prop.table(table(dat$isTall, dat$gender), 2),
            col = c('#86DEB7', '#5398BE'),
            legend.text = TRUE,
            axes = FALSE, # remove the default y-axis
            ylab = 'Percent of participants',
            ylim = c(0, 1),
            main = 'Conditional distribution of heights')

# creating a y-axis with percentages
yticks = seq(0, 1, by = 0.05)
axis(2, at = yticks, labels = percent(yticks))

# showing the values of each category
y = prop.table(table(dat$isTall, dat$gender), 2)
text(x, y[1, ]/2, labels = paste0(round(y[1, ]*100), '%'))          # placing Normal/Short values
text(x, y[1, ] + y[2, ]/2, labels = paste0(round(y[2, ]*100), '%')) # placing Tall values
```
For a tutorial on how to use the functions table() and prop.table(), see: How to Describe/Summarize Categorical Data in R (Example).