nLab
context-free grammar

Contents

Idea

The notion of context-free grammar, now used in linguistics, computer science and mathematics, was introduced in the works of Noam Chomsky.

Traditional definition

We write X+YX + Y for the disjoint union of two sets and use vector notation and the Kleene star vX \vec{v} \in X^\star to denote sequences in the free monoid.

A context-free grammar is a tuple (Σ,X,R,s)(\Sigma, X, R, s) where Σ\Sigma and XX are finite sets called the vocabulary (also called the terminals) and the non-terminals respectively, RX×(X+Σ) R \subseteq X \times (X + \Sigma)^\star is a finite set of production rules and sXs \in X is called the start symbol.

The language of a context-free grammar is given by L(G)={uV |s Ru}L(G) = \{ \vec{u} \in V^\star \vert s \to_R \vec{u} \} where the rewriting relation ( R)(V+X) ×(V+X) (\to_R) \subseteq (V + X)^\star \times (V + X)^\star is traditionally defined as the transitive closure of the following directed graph:

{(uxw,uvw)|u,w(V+X) ,(x,v)R} \big\{ (\vec{u} x \vec{w}, \vec{u v w}) \quad \vert \quad \vec{u}, \vec{w} \in (V + X)^\star, (x, \vec{v}) \in R \big\}

With string diagrams

One may redefine L(G)={uΣ |C G(s,u)}L(G) = \big\{ \vec{u} \in \Sigma^\star \vert \C_G(s, \vec{u}) \neq \emptyset \big\} where C GC_G is the free monoidal category with:

That is, a string uΣ \vec{u} \in \Sigma^\star is grammatical whenever there exists an arrow from the start symbol ss to u\vec{u} in C GC_G. Arrows in C GC_G may be encoded as a syntax tree, seen as a special case of a string diagram, e.g.:

syntax tree for Colorless green ideas sleep furiously

Note that context-free grammar is weakly equivalent to Lambek's pregroup grammar i.e. they generate the same class of languages, see:

Early sources

Here is list of some of the original references in English and also of their Russian translations.

Modern literature

Some chapters in “Handbook of formal language theory” (3 vols.), G. Rozenberg, A. Salomaa (eds.), Springer 1997: