[ITOC-ch2] CONTEXT-FREE LANGUAGES

[ITOC-ch2] CONTEXT-FREE LANGUAGES

In this chapter we present context-free grammars to describe certain features that have a recursive structure.

help us organize and understand relationships of terms.

Application: compilers and interpreters

  • Most compilers and interpreters contain a parser that extracts the meaning of a program before generating the compiled code or performing the interpreted execution.

2.1 CONTEXT-FREE GRAMMAR

Context Free Grammar

  • consists of a collection of substitution rules

Variable

  • symbols on the left hand side

Terminal

  • other symbols
  • can’t appear in the left hand side in CFG ( Ex: aB --> XX )


Derivation

  • The sequence of substitutions to obtain a string
  • u derives v, written $ u ⇒^{ * } v $, if u = v or if a sequence $u 1 , u 2 , . . . , u k $ exists for k ≥ 0 and

$ u ⇒ u 1 ⇒ u 2 ⇒ . . . ⇒ u k ⇒ v. $


Context Free Language

  • strings generated by CFG
  • $ \{ w ∈ Σ^{ ∗ } | S ⇒^{ * } w \}. $

?Parsing

  • the compiler extracts the ?meaning of the code

Parser Tree

  • Representation of the meaning of the code


leftmost derivation

  • Derivation that at every step the leftmost remaining variable is the one replaced

AMBIGUITY

DEF

  • A string w is derived ambiguously in context-free grammar G if it has two or more different leftmost derivations.
  • Grammar G is ambiguous if it generates some string ambiguously.

CHOMSKY NORMAL FORM

Simplified form for CFG, can generate all CFL.


2.2 PUSHDOWN AUTOMATA

nondeterministic PDA (with extra stack )

  • DEF
  • (transition functions return sets)


Computation under PDA

  • A pushdown automaton $ M = (Q, Σ, Γ, δ, q_0 , F ) $ while input $ \in $ $ Σ_ε $, states $ \in Q $, stack strings $ \in Γ^{ ∗ } $


Example


EQUIVALENCE WITH CONTEXT-FREE GRAMMARS

  • 2.21 If a language is context free, then some pushdown automaton recognizes it.
  • 2.27 If a pushdown automaton recognizes some language, then it is context free.

2.21

  • Construct PDA P to accept all strings generated by CFG G

Description of P

  1. stack push: marker $ than Start variable
  2. while ( stack.top != ‘$’ )
    • (stack.top == Variable A) nondeterministically select one of the rules A --> w and substitute A by w
    • (stack.top == Terminal a) read next symbol and compare it to a
  3. pop ‘$’, Accept


2.27

  • construct CFG G that generates all the strings PDA P accepts (go from its Start state to an Accept state)

Prepare: simplify P to have following features

  1. has single accept state $ q_{accept} $
  2. empties its stack before accepting
  3. each transition either push or pop a symbol, but never do both

Description of G

  1. Variables of G are $ \{A_{pq} \ | \ p, q ∈ Q \ \} $. The start variable is $ A_{q0} ,q_{accept} $.

  2. Rules

    • For each $ p ∈ Q $, put the rule $ A_{pp} → ε $ in G.

    • For each $ p, q, r, s ∈ Q $, $ u ∈ Γ $ , and $ a, b ∈ Σ_ε $ , if $ δ(p, a, ε) $ contains $ (r, u) $ and $ δ(s, b, u) $ contains $ (q, ε)$, put the rule $ A_{pq} → aA_{rs} b $ in G.

    • For each $ p, q, r ∈ Q $, put the rule $ A_{pq} → A_{pr} A_{rq} $ in G.

Idea

  • variable $ A_{pq} $ generates all the strings that can take P from p with an empty stack to q with an empty stack.
  • 递归生成串?最后从最开始的变量 $ A_{q0} ,q_{accept} $ 出发生成的串就是PDA所有ac的串?

Proof

  • prove that this construction works by demonstrating that $ A_{pq} $ generates x if and only if (iff) x can bring P from p with empty stack to q with empty stack.
  • Two direction. Induction

2.3 NON-CONTEXT-FREE LANGUAGES

THE PUMPING LEMMA FOR CONTEXT-FREE LANGUAGES

DEF

  • If A is a context-free language, then there is a number p (the pumping length) where, if s is any string in A of length at least p, then s may be divided into five pieces $ s = uvxyz $ satisfying the conditions


Proof

  • Let A be a CFL and let G be a CFG that generates it.

  • We assign the pumping length p to be $ b^{|V| + 1 } $

    • b is maximum number of symbols in the right-hand side of a rule ( maximum number of children per node )
    • |V| is the number of variables in G
    • $ b^{|V| + 1 } $ is the length of the string when its parse tree is|V| + 1 high
    • τ is a parse tree that has the smallest number of nodes.
  • We show that any strings in A of length at least p may be broken into the five pieces uvxyz, satisfying our three conditions.

    1. When τ is at least |V| + 1 high, due to pigeon hole principle, some variable R must appears more than once on that path. And we can divide s as follows

    1. Contradiction: if both y and v are ε than τ won’t be the smallest tree generate s
    2. The subtree where R generates vxy is at most |V| + 1 high, therefore has length at most $ b^{|V| + 1 } = p $

Example

$ C = \{a_i b_j c_k | 0 ≤ i ≤ j ≤ k \ \} $


2.4 DETERMINISTIC CONTEXT-FREE LANGUAGES