[ITOCch2] CONTEXTFREE LANGUAGES
In this chapter we present contextfree grammars to describe certain features that have a recursive structure.
help us organize and understand relationships of terms.
Application: compilers and interpreters
 Most compilers and interpreters contain a parser that extracts the meaning of a program before generating the compiled code or performing the interpreted execution.
2.1 CONTEXTFREE GRAMMAR
Context Free Grammar
 consists of a collection of substitution rules
Variable
 symbols on the left hand side
Terminal
 other symbols
 can’t appear in the left hand side in CFG ( Ex: aB > XX )
Derivation
 The sequence of substitutions to obtain a string
 u derives v, written $ u ⇒^{ * } v $, if u = v or if a sequence $u 1 , u 2 , . . . , u k $ exists for k ≥ 0 and
$ u ⇒ u 1 ⇒ u 2 ⇒ . . . ⇒ u k ⇒ v. $
Context Free Language
 strings generated by CFG
 $ \{ w ∈ Σ^{ ∗ }  S ⇒^{ * } w \}. $
?Parsing
 the compiler extracts the ?meaning of the code
Parser Tree
 Representation of the meaning of the code
leftmost derivation
 Derivation that at every step the leftmost remaining variable is the one replaced
AMBIGUITY
DEF
 A string w is derived ambiguously in contextfree grammar G if it has two or more different leftmost derivations.
 Grammar G is ambiguous if it generates some string ambiguously.
CHOMSKY NORMAL FORM
Simplified form for CFG, can generate all CFL.
2.2 PUSHDOWN AUTOMATA
nondeterministic PDA (with extra stack )
 DEF
 (transition functions return sets)
Computation under PDA
 A pushdown automaton $ M = (Q, Σ, Γ, δ, q_0 , F ) $ while input $ \in $ $ Σ_ε $, states $ \in Q $, stack strings $ \in Γ^{ ∗ } $
Example
EQUIVALENCE WITH CONTEXTFREE GRAMMARS
 2.21 If a language is context free, then some pushdown automaton recognizes it.
 2.27 If a pushdown automaton recognizes some language, then it is context free.
2.21
 Construct PDA P to accept all strings generated by CFG G
Description of P
 stack push: marker $ than Start variable
 while ( stack.top != ‘$’ )
 (stack.top == Variable A) nondeterministically select one of the rules A > w and substitute A by w
 (stack.top == Terminal a) read next symbol and compare it to a
 pop ‘$’, Accept
2.27
 construct CFG G that generates all the strings PDA P accepts (go from its Start state to an Accept state)
Prepare: simplify P to have following features
 has single accept state $ q_{accept} $
 empties its stack before accepting
 each transition either push or pop a symbol, but never do both
Description of G

Variables of G are $ \{A_{pq} \  \ p, q ∈ Q \ \} $. The start variable is $ A_{q0} ,q_{accept} $.

Rules

For each $ p ∈ Q $, put the rule $ A_{pp} → ε $ in G.

For each $ p, q, r, s ∈ Q $, $ u ∈ Γ $ , and $ a, b ∈ Σ_ε $ , if $ δ(p, a, ε) $ contains $ (r, u) $ and $ δ(s, b, u) $ contains $ (q, ε)$, put the rule $ A_{pq} → aA_{rs} b $ in G.

For each $ p, q, r ∈ Q $, put the rule $ A_{pq} → A_{pr} A_{rq} $ in G.

Idea
 variable $ A_{pq} $ generates all the strings that can take P from p with an empty stack to q with an empty stack.
 递归生成串？最后从最开始的变量 $ A_{q0} ,q_{accept} $ 出发生成的串就是PDA所有ac的串？
Proof
 prove that this construction works by demonstrating that $ A_{pq} $ generates x if and only if (iff) x can bring P from p with empty stack to q with empty stack.
 Two direction. Induction
2.3 NONCONTEXTFREE LANGUAGES
THE PUMPING LEMMA FOR CONTEXTFREE LANGUAGES
DEF
 If A is a contextfree language, then there is a number p (the pumping length) where, if s is any string in A of length at least p, then s may be divided into five pieces $ s = uvxyz $ satisfying the conditions
Proof

Let A be a CFL and let G be a CFG that generates it.

We assign the pumping length p to be $ b^{V + 1 } $
 b is maximum number of symbols in the righthand side of a rule ( maximum number of children per node )
 V is the number of variables in G
 $ b^{V + 1 } $ is the length of the string when its parse tree isV + 1 high
 τ is a parse tree that has the smallest number of nodes.

We show that any strings in A of length at least p may be broken into the five pieces uvxyz, satisfying our three conditions.
 When τ is at least V + 1 high, due to pigeon hole principle, some variable R must appears more than once on that path. And we can divide s as follows
 Contradiction: if both y and v are ε than τ won’t be the smallest tree generate s
 The subtree where R generates vxy is at most V + 1 high, therefore has length at most $ b^{V + 1 } = p $
Example
$ C = \{a_i b_j c_k  0 ≤ i ≤ j ≤ k \ \} $