
Unifying LL and LR parsing
Wim Pijls
Erasmus University Rotterdam, P.O.Box 1738,
3000 DR Rotterdam, The Netherlands.
[email protected]
Abstract
In parsing theory, LL parsing and LR parsing are regarded to be two distinct methods. In this paper the relation between these methods is clarified. As shown in literature on parsing theory, for every contextfree grammar, a socalled nondeterministic LR(0) automaton can be constructed. Here, we show, that traversing this automaton in a special way is equivalent to LL(1) parsing. This automaton can be transformed into a deterministic LRautomaton. The description of a method to traverse this automaton results into a new formulation of the LR parsing algorithm. Having obtained in this way a relationship between LL and LR parsing, the LL(1) class is characterised, using several LRclasses.
1 Introduction
In the theory of parsing, the two main methods are LL and LR parsing respectively. The LL method is implemented mostly by the recursive descent technique. LR parsing is implemented mainly as a stack algorithm, governed by a socalled action/gotomatrix representing the LR automaton. In a lot of text books on parsing or formal language theory, both methods are explained extensively. For every contextfree grammar, a deterministic LRautomaton can be built. LR parsing is a special way of traversing this automaton. See e.g. [Aho], [Grune] or [Sippu]. Besides, also a nondeterministic LRautomaton can be constructed, see e.g. [Grune] or [Sippu]. (Both kinds of automatons differ from those, which are used to recognize a regular language.) The main result of our paper is the observation that LL(1) parsing is equivalent to a way of traversing the nondeterministic automaton, whereas LR parsing is equivalent is equivalent to traversing the deterministic automaton. This relationship is exploited to derive correspondances between LL(1) and LR grammar classes.
Now, we discuss some preliminaries. We consider contextfree grammars fVN ; VT ; P , Zg, where VN is the set of nonterminals, VT the set of terminals (or tokens), P the set of productions, and Z is the start symbol. V ? and V ?T denote the set of strings consisting of symbols in V and VT respectively. A greek symbol will denote an element of V ?. We demand that the start symbol Z occurs in only one