page 1  (14 pages)
2to next section

Unifying LL and LR parsing

Wim Pijls
Erasmus University Rotterdam, P.O.Box 1738,
3000 DR Rotterdam, The Netherlands.
[email protected]

Abstract

In parsing theory, LL parsing and LR parsing are regarded to be two distinct methods. In this paper the relation between these methods is clarified. As shown in literature on parsing theory, for every context-free grammar, a so-called non-deterministic LR(0) automaton can be constructed. Here, we show, that traversing this automaton in a special way is equivalent to LL(1) parsing. This automaton can be transformed into a deterministic LR-automaton. The description of a method to traverse this automaton results into a new formulation of the LR parsing algorithm. Having obtained in this way a relationship between LL and LR parsing, the LL(1) class is characterised, using several LR-classes.

1 Introduction

In the theory of parsing, the two main methods are LL and LR parsing respectively. The LL method is implemented mostly by the recursive descent technique. LR parsing is implemented mainly as a stack algorithm, governed by a so-called action/goto-matrix representing the LR automaton. In a lot of text books on parsing or formal language theory, both methods are explained extensively. For every context-free grammar, a deterministic LR-automaton can be built. LR parsing is a special way of traversing this automaton. See e.g. [Aho], [Grune] or [Sippu]. Besides, also a non-deterministic LR-automaton can be constructed, see e.g. [Grune] or [Sippu]. (Both kinds of automatons differ from those, which are used to recognize a regular language.) The main result of our paper is the observation that LL(1) parsing is equivalent to a way of traversing the non-deterministic automaton, whereas LR parsing is equivalent is equivalent to traversing the deterministic automaton. This relationship is exploited to derive correspondances between LL(1) and LR grammar classes.

Now, we discuss some preliminaries. We consider context-free grammars fVN ; VT ; P , Zg, where VN is the set of non-terminals, VT the set of terminals (or tokens), P the set of productions, and Z is the start symbol. V ? and V ?T denote the set of strings consisting of symbols in V and VT respectively. A greek symbol will denote an element of V ?. We demand that the start symbol Z occurs in only one