# Extract Numbers from Strings in R

The functions `parse_integer()`, `parse_double()`, and `parse_number()` from the readr package convert a character vector into a numeric vector.

• Use `parse_integer()` when the whole string represents an integer, for example: “1” and “-2”.
• Use `parse_double()` when the whole string represents a number, for example: “1”, “1.2”, and “1e2”.
• Use `parse_number()` when you want to extract the first number from a string that also contains non-numeric characters, for example: “text1”.

Here’s an example that compares these three functions on the same input vector:

```r
library(readr)

n = c('1',
      '1.2',
      '1e2',
      '1,000',
      '1,2',
      '1/2',
      'text-1.2text',
      'text')

parse_integer(n)
#outputs: 1 NA NA NA NA NA NA NA

parse_double(n)
#outputs: 1.0 1.2 100.0 NA NA NA NA NA

parse_number(n)
#outputs: 1.0 1.2 100.0 1000.0 12.0 1.0 -1.2 NA
```
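To illustrate the `parse_number()` case further, here is a minimal sketch with a few made-up inputs (the `prices` vector is hypothetical, not part of the example above) where the number is wrapped in a currency symbol, a percent sign, or prose:

```r
library(readr)

# Hypothetical inputs: a number surrounded by non-numeric characters
prices = c("$100", "20%", "It cost $123.45")

# parse_number() ignores the characters before and after the number
parsed = parse_number(prices)
parsed
#outputs: 100.00 20.00 123.45
```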

## Exercises

### 1. Extract the number 1000000 from “1 000 000”

Because of the spaces, this string cannot be parsed directly as an integer, so we will use `parse_number()`. By default it stops at the first space and returns 1, so we also have to set `grouping_mark` to a space through the `locale` argument:

```r
n1 = "1 000 000"
parse_number(n1)
#outputs: 1

parse_number(n1,
             locale = locale(grouping_mark = " "))
#outputs: 1e+06
```

By default, `grouping_mark = ","`, which is why `parse_number("1,000")` returns 1000.
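The same `locale` argument also controls the decimal mark. As a rough sketch, assuming the input uses a comma as the decimal separator (the variable names `eu`, `a`, and `b` are only for illustration), the string “1,2” can be read as 1.2 instead of 12:

```r
library(readr)

# Assumption: the data uses "," as the decimal mark
eu = locale(decimal_mark = ",")

a = parse_number("1,2", locale = eu)
a
#outputs: 1.2

# Both marks set explicitly: "." groups thousands, "," marks decimals
b = parse_number("1.000,5",
                 locale = locale(decimal_mark = ",", grouping_mark = "."))
b
#outputs: 1000.5
```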

### 2. Extract the number 123456 from “123-456”

On its own, `parse_number()` stops at the hyphen and returns 123. One simple way to deal with the hyphen is to replace it with an empty string using `str_replace()` from the stringr package, and then pass the result to `parse_integer()`:

```r
n1 = "123-456"
parse_number(n1) #outputs: 123

library(stringr)
n2 = str_replace(n1, "-", "")
# n2 is now: "123456"

parse_integer(n2)
#outputs: 123456
```
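Note that `str_replace()` only replaces the first match. For a string with several hyphens, one option is `str_remove_all()` from stringr, sketched here on a hypothetical input `n3`:

```r
library(readr)
library(stringr)

n3 = "123-456-789"  # hypothetical input with two hyphens

# str_replace() removes only the first hyphen
first_only = str_replace(n3, "-", "")
first_only
#outputs: "123456-789"

# str_remove_all() removes every hyphen
cleaned = str_remove_all(n3, "-")
result = parse_integer(cleaned)
result
#outputs: 123456789
```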

### 3. Extract the number 1000 from “1*10^3”

The trick here is to realize that `*10^` can be replaced with `e`, which expresses the same value in scientific notation and can be read by `parse_double()` or `parse_number()`:

```r
n1 = "1*10^3"

library(stringr)
# Replace the literal "*10^" with "e" (note: n1, the string defined above)
n2 = str_replace(n1, "\\*10\\^", 'e')
# n2 is now "1e3"

parse_double(n2)
#outputs: 1000
```

Since `*` and `^` have special meanings in regular expressions, we escaped each of them with a double backslash (`\\`) so that they are matched literally.
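An alternative that avoids escaping altogether is to wrap the pattern in stringr's `fixed()`, which matches the text literally instead of treating it as a regular expression. A minimal sketch (the variable `value` is only for illustration):

```r
library(readr)
library(stringr)

n1 = "1*10^3"

# fixed() treats "*10^" as plain text, so no backslashes are needed
n2 = str_replace(n1, fixed("*10^"), "e")
n2
#outputs: "1e3"

value = parse_double(n2)
value
#outputs: 1000
```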