How to remove punctuation from a Python String

Data analysis tasks often involve text that has to be cleaned before useful information can be derived from it. During text processing, we may have to extract or remove certain text, or replace certain symbols and terms with other text. In this article, we will look at what punctuation marks are and at several methods to remove them from Python strings.

What is a punctuation mark?

English uses several symbols, such as the comma, hyphen, question mark, dash, exclamation mark, colon, semicolon, parentheses and brackets, which are called punctuation marks. They serve grammatical purposes in written English, but when we process text in Python we generally have to remove them from our strings. Now we will see different methods to remove punctuation marks from a string in Python.

Remove punctuation marks from a Python string using a for loop

In this method, we first create an empty Python string that will hold the output. Then we iterate through each character of the input string and check whether it is a punctuation mark. If the character is a punctuation mark, we skip it. Otherwise, we append it to the output string using string concatenation.

For example, in the code given below, all the punctuation marks are kept in a string named punctuation. We iterate through the input string myString using a for loop and check whether each character is present in the punctuation string. If it is not present, the character is added to the output string newString.

# Raw string keeps the backslash literal instead of starting an escape sequence.
# The space character is left out so that spaces in the text are preserved.
punctuation = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
print("The punctuation marks are:")
print(punctuation)
myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
newString = ""
for x in myString:
    # Keep only the characters that are not punctuation marks
    if x not in punctuation:
        newString = newString + x
print("Output String is:")
print(newString)

Output:

The punctuation marks are:
!()-[]{};:'"\,<>./?@#$%^&*_~
Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
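
The loop above builds the result with repeated concatenation, which creates a new string on every step. As a variation, the same filtering can be sketched with a generator expression and str.join(). The function name remove_punctuation and the constant PUNCTUATION are our own illustrative names, not part of any library, and the set holds the punctuation characters from the example above without a space character.

```python
# Illustrative constant: the punctuation characters from the example, stored
# as a set so that each membership test is fast.
PUNCTUATION = set(r'''!()-[]{};:'"\,<>./?@#$%^&*_~''')


def remove_punctuation(text, marks=PUNCTUATION):
    """Return text with every character found in marks left out."""
    # join() assembles the kept characters in a single pass
    return "".join(ch for ch in text if ch not in marks)


print(remove_punctuation("Python.:F}or{Beg~inn;ers"))  # PythonForBeginners
```

Wrapping the logic in a function also makes it easy to reuse with a different set of marks by passing the marks argument.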

Remove punctuation marks from a Python string using regular expressions

We can also remove punctuation marks from strings in Python using regular expressions. For this we will use the re module, which provides functions for processing strings with regular expressions.

In this method, we use the re.sub() function to replace each character that is neither a word character nor a whitespace character with an empty string, which removes the punctuation. Note that \w matches letters, digits and the underscore, so underscores are kept by this pattern.

The syntax of the function is re.sub(pattern1, pattern2, input_string), where pattern1 is the regular expression that matches the characters to be replaced. In our case, we provide a pattern that matches any character which is neither a word character nor a whitespace character. pattern2 is the replacement string that is substituted for each match of pattern1. In our case, pattern2 is an empty string, as we just have to remove the punctuation marks from our Python string. input_string is the string which has to be processed to remove punctuation.

Example:

import re

myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
emptyString = ""
# [^\w\s] matches any character that is neither a word character nor whitespace
newString = re.sub(r'[^\w\s]', emptyString, myString)
print("Output String is:")
print(newString)

Output:

Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
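
One detail of the pattern [^\w\s] is worth checking before relying on it: \w treats the underscore as a word character, so underscores survive the substitution. The short sketch below compares the pattern with a variant that also removes underscores. The sample text and the variable names are made up for illustration.

```python
import re

text = "user_name: Py-thon!"

# \w counts the underscore as a word character, so it is kept.
kept_underscore = re.sub(r"[^\w\s]", "", text)

# Adding "_" as an alternative removes underscores as well.
no_underscore = re.sub(r"[^\w\s]|_", "", text)

print(kept_underscore)  # user_name Python
print(no_underscore)    # username Python
```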

Remove punctuation marks from a Python string using replace() method

The Python string method replace() takes the substring to be replaced and its replacement as parameters and returns a new string in which every occurrence of the first is replaced by the second.

We can use the replace() method to remove punctuation from a Python string by replacing each punctuation mark with an empty string. We iterate over the punctuation marks one by one and replace each of them with an empty string in our text string.

The syntax of the replace() method is replace(character1, character2), where character1 is the character to be replaced and character2 is the character that replaces it. In our case, character1 will be a punctuation mark and character2 will be an empty string.

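# Raw string keeps the backslash literal instead of starting an escape sequence.
# The space character is left out so that spaces in the text are preserved.
punctuation = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
emptyString = ""
# Remove one punctuation mark at a time from the whole string
for x in punctuation:
    myString = myString.replace(x, emptyString)
print("Output String is:")
print(myString)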

Output:

Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
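
Because Python strings are immutable, replace() never changes the string it is called on. It returns a new string, which is why the loop above assigns the result back to myString. The small sketch below, with made-up sample values, shows this behaviour together with the optional third argument of replace(), which limits how many occurrences are replaced.

```python
original = "Hello, World!"

# replace() returns a new string; the original is left untouched.
without_comma = original.replace(",", "")
print(original)       # Hello, World!
print(without_comma)  # Hello World!

# The optional third argument limits the number of replacements.
limited = "a.b.c.d".replace(".", "", 2)
print(limited)  # abc.d
```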

Remove punctuation marks from a Python string using translate() method

The translate() method replaces characters in the string on which it is invoked according to the translation table passed to it as a parameter. The translation table maps each character that has to be replaced to its replacement. Characters that have no mapping in the table are left unchanged.

The syntax of the translate() method is translate(translation_dictionary), where translation_dictionary is a Python dictionary that maps characters of the input string to the characters by which they will be replaced. The keys of this dictionary are Unicode code points (integers), and a value of None means that the character is deleted.

To create the translation table, we can use the str.maketrans() method. It takes, as strings, the characters to be replaced, their replacements and, optionally, the characters to be deleted, and returns a Python dictionary which works as the translation table.

The syntax of the maketrans() method is maketrans(pattern1, pattern2, optional_pattern). Here pattern1 is a string containing all the characters which are to be replaced, and pattern2 is a string containing the characters by which the characters in pattern1 will be replaced. The length of pattern1 must be equal to the length of pattern2. optional_pattern is a string containing the characters which have to be deleted from the input text. In our case, pattern1 and pattern2 will be empty strings while optional_pattern will be a string containing punctuation marks.

To create a translation table for removing punctuation from a Python string, we pass empty strings as the first two parameters of maketrans() and pass the punctuation marks as the third parameter, which lists the characters to be deleted. In this way, all the punctuation marks are deleted from the output string.
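
To make the translation table less abstract, the following sketch prints what str.maketrans() returns for a small, made-up set of characters and then applies a table that both replaces and deletes characters. The variable names and sample strings are illustrative only.

```python
# Only the third argument is used, so every listed character maps to None,
# which tells translate() to delete it.
table = str.maketrans("", "", "!?")
print(table)  # {33: None, 63: None}

# With all three arguments: "a" -> "x", "b" -> "y", and "!" is deleted.
mixed = str.maketrans("ab", "xy", "!")
print("abba!".translate(mixed))  # xyyx
```

The keys 33 and 63 are the Unicode code points of "!" and "?".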

Example:

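# Raw string keeps the backslash literal instead of starting an escape sequence.
# The space character is left out so that spaces in the text are preserved.
punctuation = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
# Nothing is replaced (first two arguments are empty); every character
# in punctuation is marked for deletion.
translationTable = str.maketrans("", "", punctuation)
newString = myString.translate(translationTable)
print("Output String is:")
print(newString)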

Output:

Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
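
Instead of typing the punctuation marks by hand, we could also take them from the string.punctuation constant of the standard library. This is a variation on the example above rather than the method it uses, and the sample text is made up for illustration.

```python
import string

# string.punctuation holds the ASCII punctuation characters.
table = str.maketrans("", "", string.punctuation)

text = "Wait... what?! (really)"
cleaned = text.translate(table)
print(cleaned)  # Wait what really
```

Note that string.punctuation covers ASCII punctuation only, so characters such as curly quotes would have to be added to the table separately.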

Conclusion

In this article, we have seen how to remove punctuation marks from strings in Python using a for loop, regular expressions and built-in string methods such as replace() and translate(). Stay tuned for more informative articles.
