Automata Theory Tutorial on Chomsky Normal Form

a cfg is in chomsky normal form if the productions are in the following forms −

  • a → a
  • a → bc
  • s → ε

where a, b, and c are non-terminals and a is terminal.

algorithm to convert into chomsky normal form −

step 1 − if the start symbol s occurs on some right side, create a new start symbol s’ and a new production s’→ s.

step 2 − remove null productions. (using the null production removal algorithm discussed earlier)

step 3 − remove unit productions. (using the unit production removal algorithm discussed earlier)

step 4 − replace each production a → b1…bn where n > 2 with a → b1c where c → b2 …bn. repeat this step for all productions having two or more symbols in the right side.

step 5 − if the right side of any production is in the form a → ab where a is a terminal and a, b are non-terminal, then the production is replaced by a → xb and x → a. repeat this step for every production which is in the form a → ab.

problem

convert the following cfg into cnf

s → asa | ab, a → b | s, b → b | ε

solution

(1) since s appears in r.h.s, we add a new state s0 and s0→s is added to the production set and it becomes −

s0→s, s→ asa | ab, a → b | s, b → b | ∈

(2) now we will remove the null productions −

b → ∈ and a → ∈

after removing b → ε, the production set becomes −

s0→s, s→ asa | ab | a, a → b | s | ∈, b → b

after removing a → ∈, the production set becomes −

s0→s, s→ asa | ab | a | as | sa | s, a → b | s, b → b

(3) now we will remove the unit productions.

after removing s → s, the production set becomes −

s0→s, s→ asa | ab | a | as | sa, a → b | s, b → b

after removing s0→ s, the production set becomes −

s0→ asa | ab | a | as | sa, s→ asa | ab | a | as | sa

a → b | s, b → b

after removing a→ b, the production set becomes −

s0 → asa | ab | a | as | sa, s→ asa | ab | a | as | sa

a → s | b

b → b

after removing a→ s, the production set becomes −

s0 → asa | ab | a | as | sa, s→ asa | ab | a | as | sa

a → b |asa | ab | a | as | sa, b → b

(4) now we will find out more than two variables in the r.h.s

here, s0→ asa, s → asa, a→ asa violates two non-terminals in r.h.s.

hence we will apply step 4 and step 5 to get the following final production set which is in cnf −

s0→ ax | ab | a | as | sa

s→ ax | ab | a | as | sa

a → b |ax | ab | a | as | sa

b → b

x → sa

(5) we have to change the productions s0→ ab, s→ ab, a→ ab

and the final production set becomes −

s0→ ax | yb | a | as | sa

s→ ax | yb | a | as | sa

a → b a → b |ax | yb | a | as | sa

b → b

x → sa

y → a