Extract Numbers from Strings in R

The functions parse_integer(), parse_double(), and parse_number() from the readr package convert a character vector into a numeric vector.

  • Use parse_integer() when every string is a whole number, for example: “1” and “-2”.
  • Use parse_double() when every string is a valid number, for example: “1”, “1.2”, and “1e2”.
  • Use parse_number() when you want to extract the first number from a string that also contains non-numeric characters, for example: “text1”.

Here’s an example that compares these 3 functions:


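library(readr)

# the last five inputs are illustrative values chosen to match the outputs below
n = c('1', '1.2', '1e2', '1,000', '12%', 'text1', '-1.2text', 'text')

parse_integer(n)
#outputs: 1 NA NA NA NA NA NA NA

parse_double(n)
#outputs: 1.0 1.2 100.0 NA NA NA NA NA

parse_number(n)
#outputs: 1.0 1.2 100.0 1000.0 12.0 1.0 -1.2 NA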


1. Extract the number 1000000 from “1 000 000”

This string contains spaces, so parse_integer() cannot read it. We use parse_number() instead, but by default it stops at the first space and returns 1, so we also have to set grouping_mark to a space:

n1 = "1 000 000"
parse_number(n1)
#outputs: 1

parse_number(n1,
             locale = locale(grouping_mark = " "))
#outputs: 1e+06

By default grouping_mark = ",", which is why parse_number("1,000") outputs 1000.
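
The same locale argument also covers formats where the comma and the period swap roles. Here is a minimal sketch, assuming a European-style input; the string "1.234.567,89" and the variable names are made up for illustration:

```r
library(readr)

# Hypothetical input: period as grouping mark, comma as decimal mark
n_eu = "1.234.567,89"

# Tell readr which character groups digits and which one marks the decimals
eu = locale(decimal_mark = ",", grouping_mark = ".")
x_eu = parse_number(n_eu, locale = eu)
# x_eu holds 1234567.89

# With the default locale, "," is treated as the grouping mark
x_default = parse_number("1,000")
# x_default holds 1000
```

Storing the locale in a variable makes it easy to reuse for every column that follows the same convention.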

2. Extract the number 123456 from “123-456”

One simple way to deal with the hyphen is to replace it with an empty string using str_replace() from the stringr library and then pass its output to parse_integer():

library(stringr)

n1 = "123-456"
parse_number(n1) #outputs: 123

n2 = str_replace(n1, "-", "")
# n2 is now: "123456"

parse_integer(n2)
#outputs: 123456
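
Note that str_replace() only replaces the first match, which is enough for a single hyphen. For strings with several hyphens, stringr's str_replace_all() may be a better fit. The sketch below uses a made-up input, "123-456-789", to show the difference:

```r
library(readr)
library(stringr)

# Hypothetical input with two hyphens
n3 = "123-456-789"

# str_replace() removes only the first hyphen
first_only = str_replace(n3, "-", "")
# first_only is now: "123456-789"

# str_replace_all() removes every hyphen
all_gone = str_replace_all(n3, "-", "")
# all_gone is now: "123456789"

x = parse_integer(all_gone)
# x holds 123456789
```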

3. Extract the number 1000 from “1*10^3”

The trick here is to realize that “*10^” can be replaced with “e” which has the same effect but can be read by parse_double() or parse_number():

n1 = "1*10^3"

n2 = str_replace(n1, "\\*10\\^", 'e')
# n2 is now "1e3"

parse_double(n2)
#outputs: 1000

Since * and ^ have special meanings in regular expressions, we have to escape each of them with a backslash to match them literally. The backslash must itself be escaped inside an R string, hence the double backslashes \\.
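
If the escaping feels error-prone, one alternative is to wrap the pattern in stringr's fixed(), which matches the text literally instead of treating it as a regular expression. A minimal sketch of the same example under that approach:

```r
library(readr)
library(stringr)

n1 = "1*10^3"

# fixed() makes stringr match "*10^" literally, so no escaping is needed
n2 = str_replace(n1, fixed("*10^"), "e")
# n2 is now "1e3"

x = parse_double(n2)
# x holds 1000
```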
