While processing text data, we often need to extract the numbers it contains. In Python, text data is handled as strings, so the task is to find and split out the numbers in a string. For this purpose, we can classify strings into two types. The first type contains only space-separated numbers. The second type also contains alphabets and punctuation marks along with the numbers. In this article, we will see how to extract numbers from each type of string in turn. So, let’s dive into it.

Split a number in a string when the string contains only space-separated numbers.

When the string contains only space-separated numbers, we can simply split it at the whitespace using Python's str.split() method. When called without arguments, split() returns a list of substrings, which in our case are the numbers in string format. After getting this list, we can convert each string to an integer using the int() function. This can be done as follows.

num_string="10 1 23 143 234 108 1117"
print("String of numbers is:")
print(num_string)
str_list=num_string.split()
print("List of numbers in string format is:")
print(str_list)
num_list=[]
for i in str_list:
    num_list.append(int(i))    
print("Output List of numbers is:")
print(num_list)

Output:

String of numbers is:
10 1 23 143 234 108 1117
List of numbers in string format is:
['10', '1', '23', '143', '234', '108', '1117']
Output List of numbers is:
[10, 1, 23, 143, 234, 108, 1117]
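There is a detail worth knowing about the split() call above. Without arguments, it splits on any run of whitespace (spaces, tabs, newlines) and drops empty strings. By contrast, split(" ") splits on every single space. The short sketch below uses a made-up string with irregular spacing to show why the argument-free form is the safer choice before calling int().

```python
# A made-up string with a double space, a tab, a newline and a trailing space
messy = "10  1\t23\n143 "

# split() with no argument treats any run of whitespace as one separator
tokens = messy.split()
print(tokens)  # ['10', '1', '23', '143']

# split(" ") splits on each single space and keeps empty strings
raw_tokens = messy.split(" ")
print(raw_tokens)  # ['10', '', '1\t23\n143', '']

numbers = [int(t) for t in tokens]
print(numbers)  # [10, 1, 23, 143]

# Converting the split(" ") tokens fails because int("") raises ValueError
try:
    [int(t) for t in raw_tokens]
except ValueError as err:
    print("ValueError:", err)
```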

We can also perform the above operation using a list comprehension as follows.

num_string="10 1 23 143 234 108 1117"
print("String of numbers is:")
print(num_string)
str_list=num_string.split()
print("List of numbers in string format is:")
print(str_list)
num_list=[int(i) for i in str_list] 
print("Output List of numbers is:")
print(num_list)

Output:

String of numbers is:
10 1 23 143 234 108 1117
List of numbers in string format is:
['10', '1', '23', '143', '234', '108', '1117']
Output List of numbers is:
[10, 1, 23, 143, 234, 108, 1117]
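A list comprehension can also carry a condition. The sketch below assumes we only want tokens made up entirely of decimal digits. It keeps those tokens and silently skips everything else, including words, numbers with punctuation attached (such as "7."), and negative numbers. This gives a rough alternative for strings that contain words, but it only picks up numbers that stand as separate words. The example sentence is made up for illustration.

```python
# A made-up sentence mixing words, punctuation and numbers
text = "I solved 3 ques102tions in 12 days, scoring 45, rank 7."

# Keep only tokens consisting solely of decimal digits, then convert them.
# str.isdecimal() is used instead of str.isdigit() because isdigit() is also
# True for characters such as "²", which int() cannot convert.
numbers = [int(token) for token in text.split() if token.isdecimal()]
print(numbers)  # [3, 12]
```

Here "45," and "7." are skipped because of the attached punctuation. The regular-expression approach described below handles this case.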

We can also use the map() function to convert the strings in the list to integers. The map() function takes a function and an iterable as arguments and applies the function to each element of the iterable. It returns a map object, which is an iterator that can be converted into a list or another collection. Here we will pass the int() function as the first argument and the list of strings as the second argument, so that each string is converted to an integer. This can be done as follows.

num_string="10 1 23 143 234 108 1117"
print("String of numbers is:")
print(num_string)
str_list=num_string.split()
print("List of numbers in string format is:")
print(str_list)
num_list=list(map(int,str_list))
print("Output List of numbers is:")
print(num_list)

Output:

String of numbers is:
10 1 23 143 234 108 1117
List of numbers in string format is:
['10', '1', '23', '143', '234', '108', '1117']
Output List of numbers is:
[10, 1, 23, 143, 234, 108, 1117]
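Keep in mind that the map object returned by map() is a lazy iterator, so it produces its values only once. That is why the code above wraps it in list(). The sketch below shows the iterator being exhausted after one pass. It also shows that when only an aggregate such as the total is needed, the map object can be passed directly to sum() without building a list.

```python
num_string = "10 1 23 143 234 108 1117"

# map() returns an iterator; values are produced on demand
lazy_numbers = map(int, num_string.split())
first_pass = list(lazy_numbers)
second_pass = list(lazy_numbers)  # the iterator is already exhausted
print(first_pass)   # [10, 1, 23, 143, 234, 108, 1117]
print(second_pass)  # []

# For a one-off aggregate, feed the map object straight into sum()
total = sum(map(int, num_string.split()))
print(total)  # 1636
```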

Split a number in a string when the string contains alphabets.

When the string contains alphabets, we will first extract the numbers out of the string using regular expressions and then we will convert the numbers to integer form.

For extracting the numbers, we will use the findall() function from the re module. re.findall() takes a pattern and a string as input. In our case the pattern is one or more digits, written as \d+. It returns a list of all non-overlapping substrings that match the pattern. Digits embedded inside a word are matched too, which is why 102 is extracted from "ques102tions" below. After extracting the list of numbers as strings, we can convert them into integers as follows.

import re
num_string="I have solved 20 ques102tions in last 23 days and have scored 120 marks with rank 1117"
print("Given String is:")
print(num_string)
pattern=r"\d+"  # raw string, so \d reaches the regex engine unchanged
str_list=re.findall(pattern,num_string)
print("List of numbers in string format is:")
print(str_list)
num_list=[]
for i in str_list:
    num_list.append(int(i)) 
print("Output List of numbers is:")
print(num_list)

Output:

Given String is:
I have solved 20 ques102tions in last 23 days and have scored 120 marks with rank 1117
List of numbers in string format is:
['20', '102', '23', '120', '1117']
Output List of numbers is:
[20, 102, 23, 120, 1117]
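The pattern \d+ matches every run of digits, even inside words, and ignores any sign in front of it. If the task needs something different, the pattern can be adjusted. The sketch below shows two common variations, assuming we want either whole-word numbers only or signed integers. \b\d+\b uses word boundaries to skip digits embedded in words such as "ques102tions". -?\d+ also captures an optional leading minus sign. The second sentence is made up for illustration.

```python
import re

num_string = "I have solved 20 ques102tions in last 23 days and have scored 120 marks with rank 1117"

# \b marks a word boundary, so digits glued to letters are not matched
whole_words = [int(s) for s in re.findall(r"\b\d+\b", num_string)]
print(whole_words)  # [20, 23, 120, 1117]

# -? makes a leading minus sign optional
weather = "The temperature dropped from 5 to -3 degrees overnight"
signed = [int(s) for s in re.findall(r"-?\d+", weather)]
print(signed)  # [5, -3]
```

Note that -?\d+ also reads the hyphen in text such as "2020-21" as a minus sign and returns 2020 and -21. The right pattern depends on the data.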

We can also use a list comprehension to perform the above operation as follows.

import re
num_string="I have solved 20 ques102tions in last 23 days and have scored 120 marks with rank 1117"
print("Given String is:")
print(num_string)
pattern=r"\d+"
str_list=re.findall(pattern,num_string)
print("List of numbers in string format is:")
print(str_list)
num_list=[int(i) for i in str_list]
print("Output List of numbers is:")
print(num_list)

Output:

Given String is:
I have solved 20 ques102tions in last 23 days and have scored 120 marks with rank 1117
List of numbers in string format is:
['20', '102', '23', '120', '1117']
Output List of numbers is:
[20, 102, 23, 120, 1117]
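If we also need to know where each number occurs in the string, we can use re.finditer() in the comprehension instead of re.findall(). It yields match objects rather than strings. Each match gives both the matched text, via group(), and its starting index, via start(). Here is a small sketch with a made-up string:

```python
import re

text = "Room 12, floor 3"

# Each match object exposes the matched digits and their position
numbers_with_positions = [(int(m.group()), m.start()) for m in re.finditer(r"\d+", text)]
print(numbers_with_positions)  # [(12, 5), (3, 15)]
```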

We can perform the same operation using the map() function as follows.

import re
num_string="I have solved 20 ques102tions in last 23 days and have scored 120 marks with rank 1117"
print("Given String is:")
print(num_string)
pattern=r"\d+"
str_list=re.findall(pattern,num_string)
print("List of numbers in string format is:")
print(str_list)
num_list=list(map(int,str_list))
print("Output List of numbers is:")
print(num_list)

Output:

Given String is:
I have solved 20 ques102tions in last 23 days and have scored 120 marks with rank 1117
List of numbers in string format is:
['20', '102', '23', '120', '1117']
Output List of numbers is:
[20, 102, 23, 120, 1117]
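When the same pattern is applied to many strings, we can compile it once with re.compile() and wrap it in a small helper. The helper name extract_ints below is hypothetical, chosen for this sketch. The re module already caches recently used patterns, so the main benefit is readability rather than a large speed-up.

```python
import re

# Compile the digit pattern once and reuse it
NUMBER_PATTERN = re.compile(r"\d+")


def extract_ints(text):
    """Return all runs of digits in text as integers (hypothetical helper)."""
    return list(map(int, NUMBER_PATTERN.findall(text)))


lines = [
    "Order 66 shipped in 3 boxes",
    "No numbers here",
    "Rank 1117",
]
results = [extract_ints(line) for line in lines]
print(results)  # [[66, 3], [], [1117]]
```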

Conclusion

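In this article, we have seen how to split the numbers out of a string and collect them into a list. We used str.split() for space-separated numbers and re.findall() for strings that also contain text. We then converted the results to integers with a for loop, a list comprehension, or the map() function. Stay tuned for more informative articles.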
