Working with Sets in R (Tutorial)

A set is an unordered collection of unique elements. It is useful for keeping track of distinct objects.

In this tutorial, you will learn how to:

  1. Create a set
  2. Manipulate sets
  3. Work with subsets
  4. Apply set operations

1. Create a set

1.1. Create a set from scratch

# load the sets package (install it once with install.packages("sets"))
library(sets)

# create a set: A
A = set(1,2,3)

# A is now the set: {1, 2, 3}
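Because a set keeps only unique elements and ignores order, repeated values are dropped, and two sets that list the same elements in different orders are equal. Here is a small check, assuming the sets package is installed:

```r
library(sets)

# duplicates are dropped when the set is created
A = set(1, 2, 2, 3, 3, 3)
length(A)   # 3

# order does not matter: both sets hold the same elements
set(3, 1, 2) == set(1, 2, 3)   # TRUE
```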

A set can contain elements of different types, such as strings, functions, sets, and vectors.

# a set can contain different types of elements
A = set('hi', print, 8, set(3,4), c(1,2))

# print(A) outputs: {"hi", <<function>>, 8, {3, 4}, <<numeric(2)>>}

Notice that the vector c(1,2) is printed as <<numeric(2)>>, which does not show its contents.

For a clearer printout, we can use the function tuple() instead of c() to store a sequence of values inside a set:

# use "tuple" to create vectors inside a set
A = set('hi', print, 8, set(3,4), tuple(1,2))

# A is now: {"hi", <<function>>, 8, {3, 4}, (1, 2)}

1.2. Create a set from objects of other types

Use the function as.set() to create a set from other objects.

# create a set from a vector
A = as.set(c(3,4,5))

# A is now: {3, 4, 5}

1.3. Create a set from letters in a word

word = "noon"

# first split the word into a vector of individual characters
chars = strsplit(word, split = "")[[1]]
# chars is now: "n" "o" "o" "n"

# then use "as.set" to create the set
A = as.set(chars)

# A is now: {"n", "o"}
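The two steps above can be wrapped in a small helper. The name letter_set() is hypothetical and used only for illustration; it returns the set of distinct letters in a word, so comparing two such sets tells whether two words are built from exactly the same letters. This sketch assumes the sets package is installed:

```r
library(sets)

# hypothetical helper: the set of distinct characters in a word
letter_set = function(word) {
  chars = strsplit(word, split = "")[[1]]
  as.set(chars)
}

s = letter_set("mississippi")
length(s)   # 4 distinct letters: "i", "m", "p", "s"

# "listen" and "silent" use the same letters
letter_set("listen") == letter_set("silent")   # TRUE
```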

1.4. Create an empty set

# create an empty set: A = ∅
A = set() # print(A) outputs: {}

# test whether a set is empty
set_is_empty(A) # outputs: TRUE
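An empty set is a natural starting point when elements arrive one at a time: each new element can be added with a union (the | operator from section 4.1), and duplicates are ignored automatically. A sketch, assuming the sets package is installed:

```r
library(sets)

words = c("to", "be", "or", "not", "to", "be")

# start from the empty set and add each word with a union
seen = set()
for (w in words) {
  seen = seen | set(w)
}

length(seen)         # 4 distinct words
set_is_empty(seen)   # FALSE
```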

2. Manipulate sets

2.1. Select and replace elements of a set

The elements of a set are unordered, so we cannot select an element based on its position in the set.

Instead, we use the value of that element as the index to select it.

For example:

# here's a set A
A = set(print, 4, 5, 6, set(7, 8))
# A is now: {<<function>>, 4, 5, 6, {7, 8}}

# 4 is printed in the second position, but a set has no positions:
A[2] # outputs an empty set: {}
# because the value 2 is not an element of A

# to select 4, use:
A[[4]] # outputs the value: 4
# note that if you use single square brackets:
A[4] # the output will be a set: {4}

# to select the set {7,8} inside A, use:
A[[set(7,8)]] # outputs: {7, 8}

# to change the value 4 to 1, use:
A[[4]] = 1
# A becomes: {<<function>>, 1, 5, 6, {7, 8}}

# to change the value 7 to 9, use:
A[[set(7,8)]][[7]] = 9
# A is now: {<<function>>, 1, 5, 6, {8, 9}}
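If you prefer not to modify A in place, the same replacement can be written with the set operations from section 4: remove the old value with a difference, then add the new one with a union. A minimal sketch, assuming the sets package is installed:

```r
library(sets)

A = set(4, 5, 6)

# replace 4 by 1 without modifying A: remove {4}, then add {1}
B = (A - set(4)) | set(1)

B == set(1, 5, 6)   # TRUE
A == set(4, 5, 6)   # TRUE, A itself is unchanged
```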

2.2. Check the number of elements in a set

# number of elements in a set
A = set(print, 4, 5, 6, set(7, 8))
length(A) # outputs: 5

Exercise: How many digits of π must we print for all ten digits (0 to 9) to appear on the screen?

# here are the first 100 digits of Pi (as characters)
Pi = "3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067"

# removing the decimal point
digits = gsub(".", "", Pi, fixed = TRUE) # fixed = TRUE matches the dot "." literally instead of as a regex wildcard

for (i in 1:100) {
  # extract the first i digits of Pi
  digits_char = substr(digits, 1, i)
  
  # create a vector from the first i digits of Pi
  digits_vec = strsplit(digits_char, split = "")[[1]]
  
  # create a set from these first i digits
  digits_set = as.set(digits_vec)
  
  # check if this set has all 10 digits (0-9)
  if (length(digits_set) == 10) {
    print(paste("The answer is:", i, "digits"))
    break
  }
}

# output: "The answer is: 33 digits"
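The loop above rebuilds a set for every prefix of the digits. As an alternative that does not use the sets package, base R's match() returns the position where each digit first appears, and the answer is the largest of those positions:

```r
# first 100 digits of pi (as in the exercise), without the decimal point
Pi = "3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067"
digits = gsub(".", "", Pi, fixed = TRUE)
digits_vec = strsplit(digits, split = "")[[1]]

# position where each digit 0-9 first appears
first_pos = match(as.character(0:9), digits_vec)

# every digit has appeared once we reach the latest of these first positions
max(first_pos)   # 33
```

If some digit never appeared, match() would return NA for it, and max() would return NA as well.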

2.3. Apply a built-in function to a set

# here's a set A
A = set(1, 3, 5)

# find the mean of the set A
mean(A) # outputs: 3

# find the median of the set A
median(A) # outputs: 3

# find the range of the set A
range(A) # outputs: 1 5

# transform all elements of A to characters
as.character(A) # outputs: "1" "3" "5"
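Not every R function necessarily has a method for sets. A safe pattern, assuming the set contains only numbers, is to convert it to a plain vector first with unlist(as.list(...)) and then call any vector function on the result, here sd():

```r
library(sets)

A = set(1, 3, 5)

# convert the set to a plain numeric vector
x = unlist(as.list(A))

sort(x)   # 1 3 5
sd(x)     # 2
```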

2.4. Apply a custom function to all elements of a set

Use the sapply function to apply a custom function to all elements of a set:

# here's a set A
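A = set(-1, 1, 2)

# apply a function that returns the square root of its input
# if the input is positive, and the input unchanged otherwise
# (a plain if/else is enough, since sapply passes one element at a time)
sapply(A, function(x) if (x > 0) sqrt(x) else x)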

# outputs: -1.000000  1.000000  1.414214
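Note that sapply() returns a plain vector, not a set. Wrapping the result in as.set() turns it back into a set, and values that the function maps to the same output collapse into one. For example, squaring -1 and 1 both give 1 (sketch assumes the sets package is installed):

```r
library(sets)

A = set(-1, 1, 2)

# sapply returns an ordinary numeric vector with one value per element
squares = sapply(A, function(x) x^2)
length(squares)   # 3, the value 1 appears twice

# converting back to a set removes the duplicate
as.set(squares)   # {1, 4}
```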

3. Work with subsets

3.1. Check if an element is in a set

Use %e% to check if an element is in a set.

A = set(1, 2, 3)

# check if 1 is an element of A
1 %e% A

# outputs: TRUE
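%e% returns a single logical value, so it combines with ! for a "not an element of" test. The sets package also provides the function form set_contains_element(). A short sketch:

```r
library(sets)

A = set(1, 2, 3)

4 %e% A                      # FALSE
!(4 %e% A)                   # TRUE, 4 is not an element of A
set_contains_element(A, 2)   # TRUE, same as 2 %e% A
```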

3.2. Check if a set is a subset of another set

Use <= to check if a set is a subset of another set.

Use < to check if a set is a proper subset of another set. A is a proper subset of B if all elements of A are in B and B has at least 1 element that is not in A.

A = set(1, 2, 3)
B = set(1, 2, 3)
C = set(1, 2, 3, 4)

A < B # outputs: FALSE
A <= B # outputs: TRUE

A < C # outputs: TRUE
A <= C # outputs: TRUE
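The operators have function equivalents, set_is_subset() and set_is_proper_subset(), which can read more clearly inside larger expressions. The last line spells out the definition of a proper subset using only operators:

```r
library(sets)

A = set(1, 2, 3)
B = set(1, 2, 3)
C = set(1, 2, 3, 4)

set_is_subset(A, C)          # TRUE, same as A <= C
set_is_proper_subset(A, B)   # FALSE, same as A < B

# proper subset: A is a subset of C, and C has at least one element not in A
(A <= C) && (length(C - A) >= 1)   # TRUE, C - A is {4}
```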

3.3. Check if 2 sets are equal

Two sets A and B are equal if all elements of A are in B and all elements of B are in A. So, A must be a subset of B and B a subset of A.

A = set(1, 2, 3)
B = set(1, 2, 3)
C = set(1, 2, 3, 4)

A == B # outputs: TRUE
A == C # outputs: FALSE
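The definition above translates directly into code: A == B gives the same result as checking both A <= B and B <= A, and set_is_equal() is the function form. Note that B below lists its elements in a different order:

```r
library(sets)

A = set(1, 2, 3)
B = set(3, 2, 1)
C = set(1, 2, 3, 4)

# equal means each set is a subset of the other
(A <= B) && (B <= A)   # TRUE
(A <= C) && (C <= A)   # FALSE, 4 is in C but not in A
set_is_equal(A, B)     # TRUE, same as A == B
```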

3.4. Find all subsets of a set (the power set)

The power set of a set A, denoted P(A), is the set of all subsets of A.

If A has n elements, its power set always has 2^n elements.

For example:

A = set(1, 2)

# find the power set of A
set_power(A) # outputs: {{}, {1}, {2}, {1, 2}}

# you can also use the command 2^A to get the power set of A

# get subsets of a specific length:
# to get the subsets of A of length 1 only, use:
set_combn(A, 1) # outputs: {{1}, {2}}
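A quick check of the 2^n rule on a 3-element set, assuming the sets package is installed. The subsets of a given size k come from set_combn(), and there are choose(n, k) of them:

```r
library(sets)

A = set(1, 2, 3)
n = length(A)

P = set_power(A)
length(P)          # 8
length(P) == 2^n   # TRUE

# subsets of size 2: {1, 2}, {1, 3}, {2, 3}
length(set_combn(A, 2))   # 3, which is choose(3, 2)
```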

4. Apply set operations

Suppose we have 2 sets A and B, such that:

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")

Here’s a Venn diagram that represents the relationship between A and B:

[Figure: Venn diagram of the 2 sets A and B]

4.1. Union of sets

The union of 2 sets A and B, denoted A ∪ B, is the set of all elements that are in A, in B, or in both.

The shaded region below represents A ∪ B:

[Figure: Venn diagram representing the union of 2 sets: A and B]

# use | or the function c() to find the union of 2 or more sets
A | B # outputs: {"I", "Love", "R", "Sets"}

c(A, B) # also outputs: {"I", "Love", "R", "Sets"}
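Union works on any number of sets, and set_union() is the function form of |. Because shared elements are kept only once, the size of A ∪ B equals the size of A plus the size of B minus the size of A ∩ B (inclusion-exclusion; the & operator for intersection is covered in section 4.2). A sketch, assuming the sets package is installed:

```r
library(sets)

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")
C = set("Very", "Much")

# union of three sets: 6 distinct words
length(A | B | C)                    # 6
set_union(A, B, C) == (A | B | C)    # TRUE

# inclusion-exclusion: 4 == 3 + 3 - 2
length(A | B) == length(A) + length(B) - length(A & B)   # TRUE
```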

4.2. Intersection of sets

The intersection of A and B, denoted A ∩ B, is the set of all elements that A and B have in common.

The shaded region below represents A ∩ B:

[Figure: Venn diagram representing the intersection of 2 sets: A and B]

# use & to find the intersection of 2 or more sets
A & B # outputs: {"I", "Love"}

Two sets are said to be disjoint if they are non-overlapping, i.e. their intersection is the empty set.

For example:

X = set(1, 2)
Y = set(3, 4)

X & Y # outputs: {}

# so X and Y are disjoint,
# since they don't have any elements in common

4.3. Set difference

The difference between B and A, denoted B – A, is the set of all elements of B which are not in A.

The shaded region below represents B – A:

[Figure: Venn diagram representing the difference of 2 sets: B and A]

B - A # outputs: {"Sets"}

A - B # outputs: {"R"}
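Unlike union and intersection, set difference is not symmetric: A – B and B – A are generally different sets. Subtracting B from A only removes the elements that A shares with B. A sketch, assuming the sets package is installed:

```r
library(sets)

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")

(A - B) == (B - A)         # FALSE: {"R"} versus {"Sets"}
(A - B) == (A - (A & B))   # TRUE: only the shared part is removed
set_is_empty(A - A)        # TRUE
```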

4.4. Symmetric difference

The symmetric difference of A and B, denoted A Δ B, is (A – B) ∪ (B – A), i.e. the set of elements that are in exactly one of the two sets.

The shaded region below represents A Δ B:

[Figure: Venn diagram representing the symmetric difference of 2 sets: A and B]

# use %D% or the function set_symdiff to find the symmetric difference of A and B
set_symdiff(A, B) # outputs: {"R", "Sets"}

B %D% A # also outputs: {"R", "Sets"}
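Both descriptions of the symmetric difference can be checked directly. The second comparison uses the equivalent form (A ∪ B) – (A ∩ B): everything in either set except what the two share. A sketch, assuming the sets package is installed:

```r
library(sets)

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")

# the definition: (A - B) union (B - A)
(A %D% B) == ((A - B) | (B - A))   # TRUE

# equivalent form: (A union B) minus (A intersect B)
(A %D% B) == ((A | B) - (A & B))   # TRUE
```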

4.5. Complement of a set

The complement of a set A, denoted A^c, is the set of all elements of the universal set U that are not in A.

The shaded region below represents A^c:

[Figure: Venn diagram representing the complement of a set: A]

# for simplicity, let the universal set U be: A ∪ B
U = A | B

# to find the complement of A, use:
set_complement(A, U)

# outputs: {"Sets"}
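Since the complement is taken relative to U, it is the same as U – A. The sketch below checks that, and also one of De Morgan's laws, (A ∩ B)^c = A^c ∪ B^c, relative to the same universal set U = A ∪ B (assumes the sets package is installed):

```r
library(sets)

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")
U = A | B

# the complement of A in U is U minus A
set_complement(A, U) == (U - A)   # TRUE

# De Morgan: complement of (A intersect B) = complement of A union complement of B
set_complement(A & B, U) == (set_complement(A, U) | set_complement(B, U))   # TRUE
```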
