## [ITOC-ch2] CONTEXT-FREE LANGUAGES

In this chapter we present context-free grammars to describe certain features that have a **recursive** structure.

help us organize and understand relationships of terms.

Application: compilers and interpreters

- Most compilers and interpreters contain a
**parser**that extracts the meaning of a program before generating the compiled code or performing the interpreted execution.

### 2.1 CONTEXT-FREE GRAMMAR

Context Free Grammar

- consists of a collection of
**substitution rules**

Variable

- symbols on the
**left hand side**

Terminal

- other symbols
- can’t appear in the left hand side in CFG ( Ex: aB –> XX )

**Derivation**

- The sequence of substitutions to obtain a string
- u derives v, written $ u ⇒^{ * } v $, if u = v or if a sequence $u 1 , u 2 , . . . , u k $ exists for k ≥ 0 and

$ u ⇒ u 1 ⇒ u 2 ⇒ . . . ⇒ u k ⇒ v. $

Context Free Language

- strings generated by CFG
- $ \{ w ∈ Σ^{ ∗ } | S ⇒^{ * } w \}. $

?Parsing

- the compiler extracts the ?meaning of the code

**Parser Tree**

- Representation of the meaning of the code

leftmost derivation

- Derivation that at every step the leftmost remaining variable is the one replaced

**AMBIGUITY**

DEF

- A string w is derived ambiguously in context-free grammar G if it has two or more different leftmost derivations.
- Grammar G is ambiguous if it generates some string ambiguously.

**CHOMSKY NORMAL FORM**

Simplified form for CFG, can generate all CFL.

### 2.2 PUSHDOWN AUTOMATA

**nondeterministic** PDA (with extra **stack** )

- DEF
- (transition functions return sets)

**Computation** under PDA

- A pushdown automaton $ M = (Q, Σ, Γ, δ, q_0 , F ) $ while input $ \in $ $ Σ_ε $, states $ \in Q $, stack strings $ \in Γ^{ ∗ } $

Example

**EQUIVALENCE WITH CONTEXT-FREE GRAMMARS**

**2.21**If a language is context free, then some pushdown automaton recognizes it.**2.27**If a pushdown automaton recognizes some language, then it is context free.

**2.21**

- Construct PDA P to accept all strings generated by CFG G

Description of P

- stack push: marker $ than Start variable
- while ( stack.top != ‘\$’ )
- (stack.top == Variable A) nondeterministically select one of the rules A –> w and substitute A by w
- (stack.top == Terminal a) read next symbol and compare it to a

- pop ‘\$’, Accept

**2.27**

- construct CFG G that generates all the strings PDA P accepts
*(go from its Start state to an Accept state)*

Prepare: simplify P to have following features

- has single accept state $ q_{accept} $
- empties its stack before accepting
- each transition either push or pop a symbol, but never do both

Description of G

Variables of G are $ \{A_{pq} \ | \ p, q ∈ Q \ \} $. The start variable is $ A_{q0} ,q_{accept} $.

Rules

For each $ p ∈ Q $, put the rule $ A_{pp} → ε $ in G.

For each $ p, q, r, s ∈ Q $, $ u ∈ Γ $ , and $ a, b ∈ Σ

*ε $ , if $ δ(p, a, ε) $ contains $ (r, u) $ and $ δ(s, b, u) $ contains $ (q, ε)$, put the rule $ A*{pq} → aA_{rs} b $ in G.- For each $ p, q, r ∈ Q $, put the rule $ A_{pq} → A_{pr} A_{rq} $ in G.

Idea

- variable $ A_{pq} $ generates all the strings that can take P from p with an empty stack to q with an empty stack.
- 递归生成串？最后从最开始的变量 $ A_{q0} ,q_{accept} $ 出发生成的串就是PDA所有ac的串？

Proof

- prove that this construction works by demonstrating that $ A_{pq} $ generates x if and only if (iff) x can bring P from p with empty stack to q with empty stack.
- Two direction. Induction

### 2.3 NON-CONTEXT-FREE LANGUAGES

**THE PUMPING LEMMA FOR CONTEXT-FREE LANGUAGES**

DEF

- If A is a context-free language, then
*there is*a number p (the pumping length) where, if s is any string in A of length at least p, then s may be divided into five pieces $ s = uvxyz $ satisfying the conditions

Proof

Let A be a CFL and let G be a CFG that generates it.

We assign the pumping length p to be $ b^{|V| + 1 } $

- b is maximum number of symbols in the right-hand side of a rule ( maximum number of children per node )
- |V| is the number of variables in G
- $ b^{|V| + 1 } $ is the length of the string when its parse tree is|V| + 1 high
- τ is a parse tree that has the smallest number of nodes.

We show that any strings in A of length at least p may be broken into the five pieces uvxyz, satisfying our three conditions.

- When τ is at least |V| + 1 high, due to pigeon hole principle, some variable R must appears more than once on that path. And we can divide s as follows

- Contradiction: if both y and v are ε than τ won’t be the smallest tree generate s
- The subtree where R generates vxy is at most |V| + 1 high, therefore has length at most $ b^{|V| + 1 } = p $

Example

$ C = \{a_i b_j c_k | 0 ≤ i ≤ j ≤ k \ \} $