Regular Expression in Theory of Computation

Introduction to Regular Expression (RE, regex, regexp)

An expression that expresses the set of strings according to certain syntax rules is considered a regular expression. These expressions have certain notations which we can describe in the regular language. This notation involves a combination of strings of symbols from some alphabet Σ, parentheses, and the operators +, ·, and ∗.

In terms of the theory of computation, the regular can be defined as a language or string accepted by a Finite Automata. We know that a finite automaton consists of 5-tuples {Q, Σ, δ, q0, F}. A regular expression is a string on Σ, i.e., it consists of only input alphabets.

Let Σ be an alphabet. The regular expression (r.e) over Σ and the sets they denote are:

    • Ф is an r.e and denotes an empty set
    • є is an r.e and denotes the set { є }
    • For each ‘a’ in Σ, a+ is an r.e and denotes the set {a}

 If ‘r’ and ‘s’ are r.e. denoting the languages R and S respectively the (r+s), (r.s) and (r*)     are r.e that denote the sets RUS, R.S, and R* respectively

Some Formal Recursive Definitions of RE

Any terminals, i.e., the symbols that belong to Σ are RE. The null string(Λ) and the null set(Ø) are also RE.

  1. If P and Q are two REs, then the union of the two REs denoted by P+Q is also an RE.
  2. If P and Q are two REs, then their concatenation denoted by PQ is also an RE.
  3. If P is an RE, then the iteration (repetition or closure) denoted by P* is also an RE.
  4. If P is an RE, then (P) is an RE.

Basic Operations on Regular Expression

Three basic operations are union, concatenation, and closure.

 The operation of the union on RE: 

If R1 and R2 are two REs over Σ, then L(R1 ∪ R2) is denoted by L(R1 + R2).

L(R1 ∪ R2) is a string from R1 or a string from R2.

L(R1 + R2) = L(R1) ∪ L(R2).

The operation of concatenation on RE: 

If R1 and R2 are two REs over Σ, then L(R1 ∩ R2) is denoted by L(R1R2).

L(R1 ∩ R2) is a string from R1 followed by a string from R2.

L(R1.R2) = L(R1).L(R2)

The operation of closure on RE: 

If R1 and R2 are two REs over Σ, then L(R*) is a string obtained by concatenating n elements for n ≥ 0.

Kleene’s Closure on regular expression:

It is defined as a set of all strings including null (Λ) obtained from the alphabets of the set Σ. It is denoted as Σ*.

If Σ = {0}, then Σ* = {Λ, 0, 00, 000…}

If Σ = {0, 1}, then Σ* = {Λ, 0, 1, 01, 00, 11, 010, 011, 100…}   

Example to understand Regular Language and its corresponding Regular Expression

Regular Language

{Λ}
{a, b}∗
{aab}∗{a, ab}
({aa, bb} ∪ {ab, ba}{aa, bb}∗{ab, ba})

Corresponding Regular Expression

Λ
(a
+ b)
(aab)(a + ab)
(aa
+ bb + (ab + ba)(aa + bb)(ab + ba))

Leave a Reply

Your email address will not be published. Required fields are marked *