CS 420
2024fall
ibarland

Regular Expressions
defn; FSM equivalence

Definition: A regular expression (over alphabet Σ) is:

  1. φ — L(φ) = φ
  2. ε — L(ε) = {ε}
  3. c — L(c) = {c}, for any c ∈ Σ

  4. α* — L(α*) = L(α)*, where α is any regexp
  5. α+ — L(α+) = L(α)L(α)*, where α is any regexp
  6. αβ — L(αβ) = L(α)L(β), where α,β are any regexps
  7. α|β — L(α|β) = L(α) ∪ L(β), where α,β are any regexps
  8. (α) — L((α)) = L(α), where α is any regexp

Note how the first three cases are base cases, and the last five are recursively defined. (Also note that rules 2 and 5 are redundant and could be treated as syntactic sugar: ε means the same as φ*, and α+ means the same as αα*.)
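The definition translates directly into a recursive function with one case per rule. Here is a minimal Python sketch. It assumes regexps are encoded as nested tuples tagged by rule name; that encoding and the name `lang` are illustrative, not from the course. Because L(α*) is usually infinite, `lang(r, n)` lists only the strings of L(r) with length at most n. The `plus` case is handled by desugaring α+ to αα*, as in the note above.

```python
# Hypothetical encoding, one tag per rule of the definition:
#   ("empty",)  ("eps",)  ("char", c)                       -- the 3 base cases
#   ("star", α)  ("plus", α)  ("cat", α, β)  ("alt", α, β)  ("paren", α)

def lang(r, n):
    """Return the set of strings in L(r) whose length is at most n."""
    tag = r[0]
    if tag == "empty":                      # L(φ) = φ
        return set()
    if tag == "eps":                        # L(ε) = {ε}
        return {""}
    if tag == "char":                       # L(c) = {c}
        return {r[1]} if len(r[1]) <= n else set()
    if tag == "paren":                      # L((α)) = L(α)
        return lang(r[1], n)
    if tag == "alt":                        # L(α|β) = L(α) ∪ L(β)
        return lang(r[1], n) | lang(r[2], n)
    if tag == "cat":                        # L(αβ) = L(α)L(β)
        left, right = lang(r[1], n), lang(r[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if tag == "plus":                       # α+ is sugar for αα*
        return lang(("cat", r[1], ("star", r[1])), n)
    if tag == "star":                       # L(α)* = {ε} ∪ L(α) ∪ L(α)L(α) ∪ ...
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                     # stop once no new (short enough) strings appear
            new = {u + v for u in frontier for v in base if len(u + v) <= n}
            frontier = new - result
            result |= frontier
        return result
    raise ValueError(f"unknown regexp tag: {tag!r}")

print(sorted(lang(("star", ("char", "a")), 2)))    # ['', 'a', 'aa']
```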

If you give me a regexp, I can create its parse tree:

ab*(aa|bc+)

   concat
  /     \
 a      concat
       /   \
      *    paren
      |    / | \
      b   (  ∪  )
            / \
       concat  concat
        /  \    /   \
        a   a  b     +
                     |
                     c
Note: | has the lowest precedence: * and + bind tightest, then concatenation, then |.
TODO in class: parse tree for (ab*c|d)+
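The parse tree can also be written down as data and turned back into text by a function with one case per rule. This minimal Python sketch assumes a nested-tuple encoding of the tree and a helper named `unparse`; both are hypothetical, for illustration only. The `paren` node is what lets `unparse` put the parentheses back around `aa|bc+`.

```python
def unparse(r):
    """Turn a regexp parse tree (nested tuples) back into regexp text."""
    tag = r[0]
    if tag == "empty":
        return "φ"
    if tag == "eps":
        return "ε"
    if tag == "char":
        return r[1]
    if tag == "star":
        return unparse(r[1]) + "*"
    if tag == "plus":
        return unparse(r[1]) + "+"
    if tag == "cat":
        return unparse(r[1]) + unparse(r[2])
    if tag == "alt":
        return unparse(r[1]) + "|" + unparse(r[2])
    if tag == "paren":
        return "(" + unparse(r[1]) + ")"
    raise ValueError(f"unknown regexp tag: {tag!r}")

# the parse tree of ab*(aa|bc+), node for node as drawn above
tree = ("cat", ("char", "a"),
               ("cat", ("star", ("char", "b")),
                       ("paren", ("alt", ("cat", ("char", "a"), ("char", "a")),
                                         ("cat", ("char", "b"), ("plus", ("char", "c")))))))
print(unparse(tree))   # ab*(aa|bc+)
```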

Although we won't worry much about a regexp's parse tree in this class, it is the basis of any proof by induction on regexps: we can induct on height-of-tree, or (more generally) use "Structural Induction": e.g. show P(αβ) holds, given the inductive hypothesis that P(α) and P(β) hold. (So a proof by structural induction for regexps will have 3 base cases and 5 recursive cases.) The Principle of Structural Induction is equivalent to Mathematical Induction (see the structural def'n of ℕ); it dispenses with the need to artificially shoe-horn tree processing into height-of-tree processing.
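A structurally recursive function has the same shape as a proof by structural induction. It has 3 base cases and 5 recursive cases, and the recursive cases may use the results for their sub-expressions, which is the analogue of the inductive hypothesis. The minimal Python sketch below assumes regexps are nested tuples tagged "empty", "eps", "char", "star", "plus", "cat", "alt", "paren". That encoding and the name `nullable` are illustrative assumptions. `nullable(r)` decides whether ε ∈ L(r).

```python
def nullable(r):
    """Is ε ∈ L(r)?  One case per rule of the regexp definition."""
    tag = r[0]
    # base cases
    if tag == "empty":
        return False                    # L(φ) = φ has no strings at all
    if tag == "eps":
        return True                     # L(ε) = {ε}
    if tag == "char":
        return False                    # L(c) = {c}, and c ≠ ε
    # recursive cases: each may use the answer for its sub-expressions
    if tag == "star":
        return True                     # L(α)* always contains ε
    if tag == "plus":
        return nullable(r[1])           # ε ∈ L(α)L(α)* iff ε ∈ L(α)
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "paren":
        return nullable(r[1])
    raise ValueError(f"unknown regexp tag: {tag!r}")

print(nullable(("cat", ("char", "a"), ("star", ("char", "b")))))   # False: every string starts with a
```

Proving that `nullable(r)` is correct for every r would itself be a structural induction, with one case per branch.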

regexps are equivalent to NDFSMs

=> for a regexp, ∃ equivalent NDFSM

Claim: For any regexp γ, there is an equivalent NDFSM M such that L(γ) = L(M).

Proof by structural induction on γ (8 cases; only some are shown):

  1. Suppose γ=c (where c ∈ Σ). (Show M; argue L(γ) ⊆ L(M) and L(M) ⊆ L(γ).)
  2. Suppose γ=αβ, where α,β are regexps.

    By the inductive hypothesis, there are NDFSMs Mα, Mβ such that L(α)=L(Mα) and L(β)=L(Mβ).

    We construct an NDFSM Mγ out of Mα and Mβ [SEE BOARD]: Mγ = ⟨Kγ, Σγ, sγ, Δγ, Aγ⟩.

    Now we need to argue that our construction guarantees L(γ) = L(Mγ):

    1. Show L(γ) ⊆ L(Mγ):
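      1. Take w ∈ L(γ) = L(αβ) = L(α)L(β) (by the semantics of regexp L(αβ)).
      2. This means (by def'n of language-concat) that ∃u,v ∈ Σ* such that w=uv, u∈L(α), and v∈L(β).
      3. u∈L(α) means that u is accepted by Mα (since L(α)=L(Mα)): i.e. there is a computation ⟨sα,u⟩ ⊢* ⟨aα,ε⟩ for some aα∈Aα.
      4. Likewise, v∈L(β) means there is a computation ⟨sβ,v⟩ ⊢* ⟨aβ,ε⟩ in Mβ, for some aβ∈Aβ.
      5. Mγ contains all of Mα's and Mβ's transitions, plus an ε-transition from each aα∈Aα to sβ, so these chain together: ⟨sγ,uv⟩ ⊢* ⟨aα,v⟩ ⊢ ⟨sβ,v⟩ ⊢* ⟨aβ,ε⟩ with aβ ∈ Aγ. Hence w = uv ∈ L(Mγ).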
    2. Show L(Mγ) ⊆ L(γ).

      See board. Sketch: an accepting computation from ⟨sγ,w⟩ in Mγ must reach some accepting state aα∈Aα of Mα, then take an ε-transition to sβ, then reach some aβ∈Aβ. Break the computation down into three sections, consuming u, ε, and v respectively. So we have w = uεv = uv where u∈L(Mα)=L(α) and v∈L(Mβ)=L(β), and therefore w ∈ L(α)L(β) = L(γ) by def'n of language-concat.
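The [SEE BOARD] construction is not written out in these notes. What follows is a minimal Python sketch of the usual construction, consistent with the proof sketch above. Mγ starts in Mα, has an ε-transition from each state of Aα to sβ, and accepts exactly in Aβ. The sketch also covers the base case γ = c. The helper names (`char_fsm`, `rename`, `concat_fsm`, `eps_closure`, `accepts`) and the prefix-renaming trick for keeping Kα and Kβ disjoint are illustrative assumptions, not the course's code. Machines use the same tuple layout (K, Σ, Δ, s, A) as the `KleeneStarOf` code below.

```python
ε = "ε-transition"   # sentinel label (assumed); being multi-character, it never equals an input char

def char_fsm(c):
    """Base case γ = c: a start state with one c-transition to an accepting state."""
    return ({"start", "accept"}, {c}, {("start", c, "accept")}, "start", {"accept"})

def rename(M, prefix):
    """Prefix every state name, so that two machines' state sets are disjoint."""
    K, Σ, Δ, s, A = M
    def r(q):
        return prefix + q
    return ({r(q) for q in K}, Σ, {(r(p), a, r(q)) for (p, a, q) in Δ}, r(s), {r(q) for q in A})

def concat_fsm(Mα, Mβ):
    """Inductive case γ = αβ: an NDFSM for L(α)L(β), built from Mα and Mβ."""
    Kα, Σα, Δα, sα, Aα = rename(Mα, "α.")
    Kβ, Σβ, Δβ, sβ, Aβ = rename(Mβ, "β.")
    Δγ = Δα | Δβ | {(a, ε, sβ) for a in Aα}   # ε-jump from each accepting state of Mα to sβ
    return (Kα | Kβ, Σα | Σβ, Δγ, sα, Aβ)     # start in Mα; accept only in Mβ

def eps_closure(Δ, states):
    """All states reachable from `states` using only ε-transitions."""
    seen, todo = set(states), list(states)
    while todo:
        p = todo.pop()
        for (q, a, r) in Δ:
            if q == p and a == ε and r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def accepts(M, w):
    """Simulate NDFSM M on w, tracking every state it could currently be in."""
    K, Σ, Δ, s, A = M
    current = eps_closure(Δ, {s})
    for ch in w:
        current = eps_closure(Δ, {r for (q, a, r) in Δ if q in current and a == ch})
    return bool(current & A)

ab = concat_fsm(char_fsm("a"), char_fsm("b"))
print(accepts(ab, "ab"), accepts(ab, "a"), accepts(ab, "ba"))   # True False False
```

Renaming with prefixes is only one way to make the two state sets disjoint. On paper one usually just assumes Kα ∩ Kβ = φ.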

<= for a NDFSM M, ∃ equivalent regexp

Claim: For any NDFSM M, there is an equivalent regexp γ such that L(γ) = L(M).
(See slides)

α* NDFSM construction

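If L1 is regular, then L1* is regular. Again, given an NDFSM M1 accepting L1, we can construct from it an NDFSM M0 accepting L1*. (This is the α* case of the regexp ⇒ NDFSM proof.) We can express this either in math, or in code: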

Given a machine M1 = ⟨K1, Σ, s1, Δ1, A1⟩, construct M0 from it:
[see slide 53]

def KleeneStarOf( M1 ):
    """Return a NDFSM M0 which accepts L1*,
       given a NDFSM M1 which accepts L1."""
    # first, extract & name the elements inside the tuple M1 (python's tuple-unpacking)
    (K1, Σ, Δ1, s1, A1) = M1

    # a new state, not in K1: concat all statenames of K1, then add a letter
    # (the result is longer than every name in K1, so it can't already be in K1)
    s0 = "".join(sorted(K1)) + "s"
    K0 = K1 | {s0}
    A0 = A1 | {s0}      # s0 is accepting, so ε ∈ L(M0)
    Δ0 = set(Δ1)        # a *non*frozen set, for the moment
    Δ0.add( (s0, ε, s1) )       # from the new start, begin a run of M1
    for s in A1:
        Δ0.add( (s, ε, s0) )    # after M1 accepts one piece, return to s0 for another

    M0 = (K0, Σ, frozenset(Δ0), s0, A0)
    return M0

# This function accepts and returns a FSM, where
# type "FSM" is:  tuple<set<state>, set<char>, set<tuple<state,char,state>>, state, set<state>>
#
#    where: type `state` is (say) a string,
#    and by "char" we mean a string-of-length-one *or* ε, where:
ε = "just a sentinel-value representing the empty transition"

This page licensed CC-BY 4.0 Ian Barland
Page last generated
Please mail any suggestions (incl. typos, broken links) to ibarland@radford.edu
Rendered by Racket.