Python Regular Expressions

A Python regular expression, or regex, is a sequence of characters that forms a search pattern. It can be used to check whether a string in a program contains the specified pattern. Regular expressions are widely used in the UNIX world. The re module raises the exception re.error if an error occurs while compiling or using a regular expression. One of the module's main functions is search(), which takes a regex and a string and returns either the first match or None if there is no match.
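The behaviour described above can be sketched as follows. The sample text, the patterns, and the variable names are illustrative assumptions, not part of the original example.

```python
import re

text = "First solve, then write the code"

# re.search() returns a Match object for the first occurrence of the pattern...
found = re.search(r"solve", text)

# ...and None when the pattern does not occur anywhere in the string.
missing = re.search(r"debug", text)

# An invalid pattern (here, an unclosed group) raises re.error when compiled.
try:
    re.compile(r"(unclosed")
    compile_failed = False
except re.error:
    compile_failed = True

print(found is not None, missing is None, compile_failed)
```

Because search() can return None, it is usually safest to test the result before calling methods such as start() or end() on it.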

Python search(), start() and end() functions

The example below uses search() to find a word in a string, then start() and end() to report where the match begins and ends.

Example :

import re

s = 'Developer Helps: First solve, then write the code'
match = re.search(r'write', s)
print('Start Index:', match.start())
print('End Index:', match.end())

Output :

Start Index: 35
End Index: 40

Here the r prefix in r'write' stands for raw, not regex. A raw string differs from a regular string in that Python does not interpret the \ character in it as an escape character. This matters because the regular expression engine uses the \ character for its own escaping.
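A small sketch of why the raw prefix matters, assuming the word-boundary escape \b as the illustration (it does not appear in the example above; the variable names are also hypothetical):

```python
import re

# In a regular string, "\b" is a single backspace character.
# In a raw string, r"\b" is a backslash followed by "b", which the
# regex engine reads as a word boundary.
regular = "\bcode\b"
raw = r"\bcode\b"

text = "write the code"

regular_match = re.search(regular, text)  # looks for backspace characters
raw_match = re.search(raw, text)          # looks for the whole word "code"

print(len(regular), len(raw))  # 6 8
print(regular_match)           # None
print(raw_match.group())       # code
```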

Metacharacters in Python regular expressions

Metacharacters are special characters that affect how the regular expressions around them are interpreted. They do not match themselves; instead, they indicate rules. Characters such as |, +, and * are metacharacters.

\     drops the special meaning of the character following it
[]    represents a character class
^     matches the beginning of the string
$     matches the end of the string
.     matches any character except a newline
?     matches zero or one occurrence of the preceding element
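The metacharacters listed above can be tried out with a short sketch. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "cat cot c.t colour color"

# . matches any character except a newline
dot = re.findall(r"c.t", text)            # ['cat', 'cot', 'c.t']

# \ drops the special meaning of the dot, so only a literal "." matches
escaped = re.findall(r"c\.t", text)       # ['c.t']

# [] defines a character class: either "a" or "o" here
char_class = re.findall(r"c[ao]t", text)  # ['cat', 'cot']

# ? makes the preceding "u" optional (zero or one occurrence)
optional = re.findall(r"colou?r", text)   # ['colour', 'color']

# ^ anchors the pattern at the start of the string, $ at the end
starts_with_cat = re.search(r"^cat", text) is not None    # True
ends_with_color = re.search(r"color$", text) is not None  # True

print(dot, escaped, char_class, optional, starts_with_cat, ends_with_color)
```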

match function syntax

re.match(pattern, string, flags=0)

This function attempts to match the RE pattern at the beginning of string, with optional flags. Below is a description of the parameters of the match function.

  • pattern: This is the regular expression that the user wants to match.
  • string: This is the string to be searched; the pattern is matched only at the beginning of this string.
  • flags: You can specify different flags, combining them with bitwise OR (|). These are also called modifiers.
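A minimal sketch of how the three parameters fit together. The sample string is borrowed from the earlier search() example; the patterns and variable names are assumptions for illustration.

```python
import re

text = "Developer Helps: First solve, then write the code"

# pattern and string only: match() looks at the beginning of the string
at_start = re.match(r"Developer", text)

# "write" is in the string but not at the beginning, so match() returns None
not_at_start = re.match(r"write", text)

# flags combined with bitwise OR: ignore case, and ignore the literal
# spaces inside the pattern (re.VERBOSE), so \s matches the real space
combined = re.match(r"developer \s helps", text, re.IGNORECASE | re.VERBOSE)

print(at_start.group())   # Developer
print(not_at_start)       # None
print(combined.group())   # Developer Helps
```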

Python program to understand re.match()

The example below shows how match() behaves when the pattern is not at the start of the string and when it is.

Example :

import re

substring = 'string'

# string_a does not begin with 'string', so match() returns None
string_a = '''We are learning regex with Developer Helps
         regex is very useful for string matching.
          It is fast too.'''

# string_b begins with 'string', so match() returns a Match object
string_b = '''string We are learning regex with Developer Helps
         regex is very useful for string matching.
          It is fast too.'''

print(re.match(substring, string_a, re.IGNORECASE))
print(re.match(substring, string_b, re.IGNORECASE))

Output :

None
<re.Match object; span=(0, 6), match='string'>

Python Regular Expression Modifiers

Regular expression functions accept optional modifiers that control various aspects of matching. The modifiers are passed as an optional flags argument. You can provide multiple modifiers by combining them with bitwise OR (|). The table below lists them:

re.I    performs case-insensitive matching
re.L    interprets words according to the current locale
re.M    makes $ match at the end of each line and ^ match at the start of each line
re.S    makes a period match any character, including a newline
re.U    matches letters according to the Unicode character set
re.X    permits “cuter” regular expression syntax: whitespace in the pattern is ignored and comments are allowed
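The effect of a few of these modifiers can be sketched as follows. The two-line sample text and the variable names are assumptions made for the example.

```python
import re

text = "First line\nsecond LINE"

# re.I: case-insensitive matching
ignore_case = re.findall(r"line", text, re.I)       # ['line', 'LINE']

# re.M: ^ matches at the start of every line, not only the first
without_m = re.findall(r"^\w+", text)               # ['First']
with_m = re.findall(r"^\w+", text, re.M)            # ['First', 'second']

# re.S: the period also matches a newline, so the match can span lines
without_s = re.search(r"First.*LINE", text)         # None
with_s = re.search(r"First.*LINE", text, re.S)      # matches the whole text

# Several modifiers combined with bitwise OR
both = re.findall(r"^\w+ line$", text, re.I | re.M)

print(ignore_case, without_m, with_m, without_s, with_s.group() == text, both)
```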

The Match object has properties and methods for retrieving information about the search and its result:

.span()     returns a tuple containing the start and end positions of the match
.string     returns the string passed into the function
.group()    returns the part of the string where there was a match
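As a brief sketch, these three members can be read from the result of the earlier search() example. The guard against None is an addition made here for safety.

```python
import re

s = 'Developer Helps: First solve, then write the code'
match = re.search(r'write', s)

# search() returns None when nothing matches, so check before using the result
if match:
    print(match.span())    # (35, 40)
    print(match.string)    # Developer Helps: First solve, then write the code
    print(match.group())   # write
```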