Introduction to Regular Expression (RE, regex, regexp)
An expression that expresses the set of strings according to certain syntax rules is considered a regular expression. These expressions have certain notations which we can describe in the regular language. This notation involves a combination of strings of symbols from some alphabet Σ, parentheses, and the operators +, ·, and ∗.
In terms of the theory of computation, the regular can be defined as a language or string accepted by a Finite Automata. We know that a finite automaton consists of 5-tuples {Q, Σ, δ, q0, F}. A regular expression is a string on Σ, i.e., it consists of only input alphabets.
Let Σ be an alphabet. The regular expression (r.e) over Σ and the sets they denote are:
- Ф is an r.e and denotes an empty set
- є is an r.e and denotes the set { є }
- For each ‘a’ in Σ, a+ is an r.e and denotes the set {a}
If ‘r’ and ‘s’ are r.e. denoting the languages R and S respectively the (r+s), (r.s) and (r*) are r.e that denote the sets RUS, R.S, and R* respectively
Some Formal Recursive Definitions of RE
Any terminals, i.e., the symbols that belong to Σ are RE. The null string(Λ) and the null set(Ø) are also RE.
- If P and Q are two REs, then the union of the two REs denoted by P+Q is also an RE.
- If P and Q are two REs, then their concatenation denoted by PQ is also an RE.
- If P is an RE, then the iteration (repetition or closure) denoted by P* is also an RE.
- If P is an RE, then (P) is an RE.
Basic Operations on Regular Expression
Three basic operations are union, concatenation, and closure.
The operation of the union on RE:
If R1 and R2 are two REs over Σ, then L(R1 ∪ R2) is denoted by L(R1 + R2).
L(R1 ∪ R2) is a string from R1 or a string from R2.
L(R1 + R2) = L(R1) ∪ L(R2).
The operation of concatenation on RE:
If R1 and R2 are two REs over Σ, then L(R1 ∩ R2) is denoted by L(R1R2).
L(R1 ∩ R2) is a string from R1 followed by a string from R2.
L(R1.R2) = L(R1).L(R2)
The operation of closure on RE:
If R1 and R2 are two REs over Σ, then L(R*) is a string obtained by concatenating n elements for n ≥ 0.
Kleene’s Closure on regular expression:
It is defined as a set of all strings including null (Λ) obtained from the alphabets of the set Σ. It is denoted as Σ*.
If Σ = {0}, then Σ* = {Λ, 0, 00, 000…}
If Σ = {0, 1}, then Σ* = {Λ, 0, 1, 01, 00, 11, 010, 011, 100…}
Example to understand Regular Language and its corresponding Regular Expression
Regular Language | Corresponding Regular Expression |