A set is an unordered collection of unique elements. Sets are useful for keeping track of distinct objects.
In this tutorial, you will learn how to:
1. Create a set
1.1. Create a set from scratch
# load library
library(sets)

# create a set: A
A = set(1,2,3)
# A is now the set: {1, 2, 3}
A set can contain different types of elements, such as: strings, functions, sets, and vectors.
# a set can contain different types of elements
A = set('hi', print, 8, set(3,4), c(1,2))
# print(A) outputs: {"hi", <<function>>, 8, {3, 4}, <<numeric(2)>>}
Notice that the vector c(1,2) is printed as <<numeric(2)>>, which does not show its contents.
For a more readable printout, we can use the function tuple instead of c to create vectors inside a set:
# use "tuple" to create vectors inside a set
A = set('hi', print, 8, set(3,4), tuple(1,2))
# A is now: {"hi", <<function>>, 8, {3, 4}, (1, 2)}
1.2. Create a set from objects of other types
Use the function as.set() to create a set from other objects.
# create a set from a vector
A = as.set(c(3,4,5))
# A is now: {3, 4, 5}
1.3. Create a set from letters in a word
word = "noon"

# first split the word into a vector of individual characters
chars = strsplit(word, split = "")[[1]]
# chars is now: "n" "o" "o" "n"

# then use "as.set" to create the set
A = as.set(chars)
# A is now: {"n", "o"}
1.4. Create an empty set
# create an empty set: A = ∅
A = set()
# print(A) outputs: {}

# test whether a set is empty
set_is_empty(A) # outputs: TRUE
2. Manipulate sets
2.1. Select and replace elements of a set
The elements of a set are unordered, and so we cannot select an element based on its position in the set.
But we can use the value of that element as an index to select it.
For example:
# here's a set A
A = set(print, 4, 5, 6, set(7, 8))
# A is now: {<<function>>, 4, 5, 6, {7, 8}}

# the value 4 is the second element in A, but:
A[2] # outputs an empty set: {}
# since the value 2 is not an element of A

# to select 4, use:
A[[4]] # outputs the value: 4

# note that if you use single square brackets:
A[4] # the output will be a set: {4}

# to select the set {7,8} inside A, use:
A[[set(7,8)]] # outputs: {7, 8}

# to change the value 4 to 1, use:
A[[4]] = 1
# A becomes: {<<function>>, 1, 5, 6, {7, 8}}

# to change the value 7 to 9, use:
A[[set(7,8)]][[7]] = 9
# A is now: {<<function>>, 1, 5, 6, {8, 9}}
2.2. Check the number of elements in a set
# number of elements in a set
A = set(print, 4, 5, 6, set(7, 8))
length(A) # outputs: 5
Exercise: How many digits of π do we need to print before every digit from 0 to 9 has appeared at least once?
# here are the first 100 digits of Pi (as characters)
Pi = "3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067"

# removing the decimal point
digits = gsub(".", "", Pi, fixed = TRUE)
# we set fixed = TRUE to match the dot "." literally

for (i in 1:100) {
  # extract the first i digits of Pi
  digits_char = substr(digits, 1, i)

  # create a vector from the first i digits of Pi
  digits_vec = strsplit(digits_char, split = "")[[1]]

  # create a set from these first i digits
  digits_set = as.set(digits_vec)

  # check if this set has all 10 digits (0-9)
  if (length(digits_set) == 10) {
    print(paste("The answer is:", i, "digits"))
    break
  }
}
# output: "The answer is: 33 digits"
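As a cross-check, the same answer can be sketched without building a set at every step. This version uses base R's duplicated() and cumsum() instead of the sets package, so it is a different technique from the loop in the exercise. The shortened 41-digit string and the variable names are illustrative choices.

```r
# first 41 digits of Pi (as characters), a shortened version of the string in the exercise
Pi = "3.1415926535897932384626433832795028841971"

# remove the decimal point and split into individual characters
digits_vec = strsplit(gsub(".", "", Pi, fixed = TRUE), split = "")[[1]]

# running count of distinct digits seen so far:
# !duplicated() is TRUE only the first time a digit appears
distinct_so_far = cumsum(!duplicated(digits_vec))

# first position at which all 10 digits have appeared
answer = which(distinct_so_far == 10)[1]
print(answer) # outputs: 33
```

The digit 0 is the last one to show up, which is why the running count only reaches 10 at position 33.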
2.3. Apply a built-in function to a set
# here's a set A
A = set(1, 3, 5)

# find the mean of the set A
mean(A) # outputs: 3

# find the median of the set A
median(A) # outputs: 3

# find the range of the set A
range(A) # outputs: 1 5

# transform all elements of A to characters
as.character(A) # outputs: "1" "3" "5"
2.4. Apply a custom function to all elements of a set
Use the sapply function to apply a custom function to all elements of a set:
# here's a set A
A = set(-1, 1, 2)

# apply a function that returns the square root of its input
# only if the number is positive
sapply(A, function(x) {ifelse(x > 0, sqrt(x), x)})
# outputs: -1.000000 1.000000 1.414214
3. Work with subsets
3.1. Check if an element is in a set
Use %e% to check if an element is in a set.
A = set(1, 2, 3)

# check if 1 is an element of A
1 %e% A # outputs: TRUE
3.2. Check if a set is a subset of another set
Use <= to check if a set is a subset of another set.
Use < to check if a set is a proper subset of another set. A is a proper subset of B if all elements of A are in B and B has at least 1 element that is not in A.
A = set(1, 2, 3)
B = set(1, 2, 3)
C = set(1, 2, 3, 4)

A < B # outputs: FALSE
A <= B # outputs: TRUE
A < C # outputs: TRUE
A <= C # outputs: TRUE
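The subset definition can also be checked element by element. The sketch below assumes the sets package is installed. The helper is_subset_by_definition is hypothetical, written only for illustration and not part of the package; it should agree with the <= operator.

```r
library(sets)

# hypothetical helper (not part of the sets package):
# x is a subset of y if every element of x is also an element of y
is_subset_by_definition = function(x, y) {
  all(sapply(x, function(element) element %e% y))
}

A = set(1, 2, 3)
C = set(1, 2, 3, 4)

is_subset_by_definition(A, C) # every element of A is in C
is_subset_by_definition(C, A) # 4 is in C but not in A

# compare with the built-in operator
A <= C
C <= A
```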
3.3. Check if 2 sets are equal
Two sets A and B are equal if all elements of A are in B and all elements of B are in A. So, A must be a subset of B and B a subset of A.
A = set(1, 2, 3)
B = set(1, 2, 3)
C = set(1, 2, 3, 4)

A == B # outputs: TRUE
A == C # outputs: FALSE
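Equality as "each set is a subset of the other" can be spelled out directly. This is a minimal sketch, assuming the sets package is installed; B is deliberately written in a different order to show that order does not matter.

```r
library(sets)

A = set(1, 2, 3)
B = set(3, 2, 1)     # same elements as A, listed in a different order
C = set(1, 2, 3, 4)

# equality expressed as mutual inclusion
mutual_AB = (A <= B) && (B <= A)
mutual_AC = (A <= C) && (C <= A)

mutual_AB # A and B contain each other
mutual_AC # C has an element that A lacks

# the == operator should give the same answers
A == B
A == C
```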
3.4. Find all subsets of a set (the power set)
The power set of a set A, denoted P(A), is the set of all subsets of A.
The power set always has 2^n elements, where n is the number of elements of A.
For example:
A = set(1, 2)

# find the power set of A
set_power(A) # outputs: {{}, {1}, {2}, {1, 2}}
# you can also use the command 2^A to get the power set of A

# get subsets of a specific length:
# to get the subsets of A of length 1 only, use:
set_combn(A, 1) # outputs: {{1}, {2}}
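The size of the power set can be checked against the number of elements in the original set. This is a small sketch, assuming the sets package is installed; the three-letter set is made up for illustration.

```r
library(sets)

# an illustrative set with n = 3 elements
A = set("a", "b", "c")
n = length(A)

# the power set of A
P = set_power(A)

length(P) # number of subsets of A
2^n       # the count predicted by the formula
```

Each element is either in a given subset or not, which gives two choices per element and is where the power of 2 comes from.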
4. Apply set operations
Suppose we have 2 sets A and B, such that:
A = set("I", "Love", "R")
B = set("I", "Love", "Sets")
Here’s a Venn diagram that represents the relationship between A and B:
4.1. Union of sets
The union of 2 sets A and B, denoted A ∪ B, is the set of all elements that are in A, in B, or in both.
The shaded region below represents A ∪ B:
# use | or the function c() to find the union of 2 or more sets
A | B # outputs: {"I", "Love", "R", "Sets"}
c(A, B) # also outputs: {"I", "Love", "R", "Sets"}
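The union also works for more than 2 sets. This is a minimal sketch, assuming the sets package is installed; the third set C is made up for illustration, and set_union() is the function form of the | operator.

```r
library(sets)

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")
C = set("Sets", "Rock") # hypothetical third set, for illustration only

# chain the | operator ...
U1 = A | B | C

# ... or pass all the sets to set_union()
U2 = set_union(A, B, C)

U1 == U2   # both approaches give the same set
length(U1) # number of distinct elements across A, B and C
```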
4.2. Intersection of sets
The intersection of A and B, denoted A ∩ B, is the set of all elements that are in both A and B.
The shaded region below represents A ∩ B:
# use & to find the intersection of 2 or more sets
A & B # outputs: {"I", "Love"}
Two sets are said to be disjoint if they are non-overlapping, i.e. their intersection is the empty set.
For example:
X = set(1, 2)
Y = set(3, 4)

X & Y # outputs: {}
# so X and Y are disjoint,
# since they don't have any elements in common
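The disjointness test can be wrapped in a small function. In this sketch, are_disjoint is a hypothetical helper written for illustration and not part of the sets package; it combines the & operator with set_is_empty().

```r
library(sets)

# hypothetical helper (not part of the sets package):
# two sets are disjoint if their intersection is the empty set
are_disjoint = function(x, y) {
  set_is_empty(x & y)
}

X = set(1, 2)
Y = set(3, 4)
Z = set(2, 3) # illustrative set that overlaps with X

are_disjoint(X, Y) # X and Y have no elements in common
are_disjoint(X, Z) # 2 is in both X and Z
```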
4.3. Set difference
The difference between B and A, denoted B – A, is the set of all elements of B which are not in A.
The shaded region below represents B – A:
B - A # outputs: {"Sets"}
A - B # outputs: {"R"}
4.4. Symmetric difference
The symmetric difference of A and B, denoted A Δ B, is (A – B) ∪ (B – A): the set of elements that are in exactly one of A and B.
The shaded region below represents A Δ B:
# use %D% or the function set_symdiff to find the symmetric difference of A and B
set_symdiff(A, B) # outputs: {"R", "Sets"}
B %D% A # also outputs: {"R", "Sets"}
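The definition of the symmetric difference can be checked against set_symdiff() by building it from the other operators. This is a sketch, assuming the sets package is installed; the variable names are illustrative.

```r
library(sets)

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")

# definition: elements of A not in B, together with elements of B not in A
by_definition = (A - B) | (B - A)

# equivalent form: everything in the union that is not in the intersection
union_minus_intersection = (A | B) - (A & B)

# both forms should match the built-in function
by_definition == set_symdiff(A, B)
union_minus_intersection == set_symdiff(A, B)
```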
4.5. Complement of a set
The complement of a set A, denoted Aᶜ, is the set of all elements of the universal set U that are not present in A.
The shaded region below represents Aᶜ:
# for simplicity, let the universal set U be: A ∪ B
U = A | B

# to find the complement of A, use:
set_complement(A, U) # outputs: {"Sets"}
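A few sanity checks make the relationship between a set, its complement, and the universal set concrete. This is a sketch, assuming the sets package is installed and using the same simplified universal set U = A ∪ B; the name A_complement is illustrative.

```r
library(sets)

A = set("I", "Love", "R")
B = set("I", "Love", "Sets")
U = A | B # simplified universal set

A_complement = set_complement(A, U)

# the complement of A within U is the same as the difference U - A
A_complement == (U - A)

# A and its complement together make up U ...
(A | A_complement) == U

# ... and they have no elements in common
set_is_empty(A & A_complement)
```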