
Grammar Modularization for Efficient Processing:
Language Engineering Devices and Their Instantiations

Axel Theofilidis, Paul Schmid and Thierry Declerck
faxel;[email protected] and [email protected]

Abstract: This paper describes how unification-based grammar development benefits from a modularization methodology that yields significant efficiency gains without giving up the recognized advantages of unification grammars. The gains are of such a scale that industrial deployment of unification grammars begins to look realistic. We introduce a number of devices provided by the ALEP system that support grammar modularization at an operational level, and we illustrate how these devices have been instantiated in the grammatical resources developed in the LS-GRAM project. We conclude by sketching a prospective scenario in which modular grammars serve industrial applications.

1 Unification Grammars for Industrial Applications?

There is a well-known gap between the scientific concept of unification grammar and its deployment in industrial contexts. Only a few, limited examples of unification grammars are in fact known to be used in industrial NLP applications.

Having typically grown up in academic contexts, unification grammars often lack an orientation towards the coverage requirements of real-life documents. They are oriented instead towards the phenomena dealt with in linguistics textbooks, aiming at maximally general models of language competence while at the same time becoming entangled with whatever is currently fashionable in the (computational) linguistics community (both in terms of scientific paradigm and in terms of language data).

Unification is a computationally rather expensive operation. Combined with a failure to anticipate the efficiency, cost and maintenance requirements of industrial NLP applications, this attitude towards developing unification grammars results in an exponential decrease in performance and thus disqualifies unification grammars from real industrial deployment. On the other hand, unification grammars are known to have a number of higher-level properties, such as declarativity and, as a consequence, maintainability and extensibility, which are desirable for large-scale commercial NLP systems in terms of quality standards and, ultimately, commercial success.
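For readers less familiar with the operation at issue, the following minimal Python sketch illustrates unification of untyped feature structures. It is not the ALEP implementation: the function name `unify` and the toy features `cat`, `agr`, `num` and `per` are assumptions introduced here for illustration, and reentrancy (structure sharing) and typing are deliberately left out.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts with atomic leaves.

    Returns the merged structure, or None if the two are incompatible.
    Simplified sketch: no reentrancy (structure sharing), no types.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so that the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # feature clash: unification fails
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a dict) unify only if they are equal.
    return a if a == b else None


# Hypothetical toy descriptions, not taken from the LS-GRAM grammars.
noun = {"cat": "n", "agr": {"num": "sg"}}
constraint = {"agr": {"num": "sg", "per": "3"}}

combined = unify(noun, constraint)    # compatible: information is merged
clash = unify(noun, {"agr": {"num": "pl"}})   # incompatible: sg vs. pl
```

The result does not depend on the order in which the two descriptions are combined, which is one way of picturing the declarativity mentioned above. Every rule application may trigger such a recursive traversal of both structures, which gives a rough idea of why unification is comparatively expensive.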

The LS-GRAM project, funded by the Commission of the EU under the LRE program¹, aimed at narrowing the gap between the scientific concept of unification grammar on the one hand and the requirements of industrial NLP applications on the other. To achieve this, the design of the grammars was based on an investigation of coverage requirements for real-life documents of a specific domain (newspaper articles on economy) and on the exploitation of grammar engineering techniques supporting optimal performance (cf. [Schmidt et al. 1996]).

2 Modular Grammar Design - Devices and Instantiations

The Advanced Language Engineering Platform (ALEP), which served as the implementation framework of the LS-GRAM project, provides a range of devices supporting the modularization of grammars, and thus efficiency-oriented grammar engineering in the paradigm of unification

¹ LRE 61029: Large-Scale Grammars for EU Languages