
Learning Regular Grammars From Noisy Examples
Using Recurrent Neural Networks
M. Gori, M. Maggini, and G. Soda
Dipartimento di Sistemi e Informatica, Universit?a di Firenze
Via di Santa Marta 3  50139 Firenze  Italy
Tel. +39 (55) 479.6265  Fax +39 (55) 479.6363
Email : fmarco,maggini,[email protected]
WWW: http:/wwwdsi.ing.unifi.it/neural
Abstract
Many successful results have recently been reported concerning the application of recurrent neural networks to the induction of simple finite state grammars. These results can be used to explore the computational capabilities of neural models applied to symbolic tasks. Many insights have been given on the links between the continuous dynamics of a recurrent neural network and the symbolic rules that we want to learn.
However, so far, the advantages of dynamical adaptive models and the related gradientdriven learning techniques with respect to classical symbolic inference algorithms have not been clearly shown. In this paper, we explore a class of inductive inference problems that seems to be very wellsuited for optimizationbased learning algorithms. Bearing in mind the idea of optimal rather than perfect solution, we explore how optimality criteria can help a successful development of the learning process when some of the examples are erroneously labeled. Some experimental results show that neural networkbased learning algorithms favor the development of the simplest" solutions, thus eliminating most of the exceptions that arise when dealing with erroneous examples.
1 Introduction
Many researchers have shown the capabilities of recurrent neural networks for inductive inference of regular grammars (Cleeremans et al., 1989; Frasconi et al., 1995; Giles et al., 1992; Giles & Omlin, 1993; Watrous & Kuhn, 1992). The research in this field mainly concerns the study of suitable architectures for this specific task, the analysis of the internal representations developed during learning, and the formulation of algorithms to extract symbolic rules from the continuous dynamics of the neural network.
The results in the literature show that recurrent neural networks can easily learn to classify symbolic strings following the rules of a finite state grammar. To the best of our knowledge, however, there is no satisfactory and balanced comparison between more classical symbolicbased approaches and neural networks. The induction of the rules describing a finite state grammar from a set of examples in the form of a minimal finite state machine is wellknown to be an NPhard problem. Symbolic techniques use to search the solution in the discrete space of finite state automata and, most of
the times, it turns out to be very difficult to conceive criteria acting by efficient heuristics. The approach to grammatical inference by dynamical models is based on the approximation of the behavior of a symbolic finite state machine. Therefore, when using these models we are moving in larger search spaces, but we can exploit useful hints on the search direction by defining an error function and using gradient descent techniques.
However, even if a solution consistent with the set of examples can be found at least for simple grammars, the generalization to new examples usually is not perfect due to instabilities in the recurrent dynamics. From an analysis of the trajectories in the state space of a recurrent network, it can be noticed that its interpretation as finite state machine is a rough approximation (Kolen, 1994). However, using clustering algorithms, it is always possible to extract a finite state automaton that has the same behavior of the network, at least on the learning set.
The application of recurrent neural networks to solve symbolic tasks may seem to be of theoretical interest only. The main criticism that can be addressed to the use of these techniques is that symbolic models