
1–29
© Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Representation of Finite State Automata in

Recurrent Radial Basis Function Networks

PAOLO FRASCONI [email protected]
MARCO GORI [email protected]
MARCO MAGGINI [email protected]
GIOVANNI SODA [email protected]
Dipartimento di Sistemi e Informatica, Università di Firenze
Via di Santa Marta 3 - 50139 Firenze - Italy

Received May 1, 1991

Editor: C. Lee Giles

Abstract. In this paper, we propose some techniques for injecting finite state automata into Recurrent Radial Basis Function networks (R2BF). We show that, given appropriate hints and suitable constraints on the weight space, these networks behave as automata. A technique based on adding a penalty term to the ordinary cost function is suggested for forcing the learning process to develop automaton representations. Successful experimental results are shown for inductive inference of regular grammars.

Keywords: Automata, backpropagation through time, high-order neural networks, inductive inference, learning from hints, radial basis functions, recurrent radial basis functions, recurrent networks
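As a rough illustration of the architecture named in the abstract, the following is a minimal, hypothetical sketch of one time step of a recurrent radial basis function network: Gaussian RBF units respond to the concatenation of the current input and the previous state, and a sigmoidal layer produces the next state. All names, shapes, and the exact wiring are assumptions for illustration, not the paper's precise formulation.

```python
# Hypothetical R2BF state update (illustrative only): Gaussian RBF layer
# over (input, previous state), followed by sigmoidal state units.
import numpy as np

def r2bf_step(x, s_prev, centers, widths, W_out):
    """One time step of an assumed recurrent RBF network."""
    z = np.concatenate([x, s_prev])            # joint input/state vector
    d2 = np.sum((centers - z) ** 2, axis=1)    # squared distances to RBF centers
    h = np.exp(-d2 / (2.0 * widths ** 2))      # Gaussian activations
    return 1.0 / (1.0 + np.exp(-(W_out @ h)))  # next state (sigmoid units)

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
n_in, n_state, n_rbf = 2, 3, 5
centers = rng.normal(size=(n_rbf, n_in + n_state))
widths = np.ones(n_rbf)
W_out = rng.normal(size=(n_state, n_rbf))

s = np.zeros(n_state)
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    s = r2bf_step(x, s, centers, widths, W_out)
```

With bounded sigmoidal state units, a sequence of input symbols drives the state vector through a bounded region; the paper's constructions constrain such dynamics so the state trajectory mimics the transitions of a finite state automaton.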

1. Introduction

The ability to learn from examples is certainly the most appealing feature of neural networks. In the last few years, several researchers have used connectionist models for solving different kinds of problems ranging from robot control to pattern recognition. Coping with the optimization of functions with several thousands of variables is quite common. Surprisingly, in many practical cases, global or near-global optimization is attained even with unsophisticated numerical methods. For example, successful applications of neural nets to the recognition of handwritten characters (le Cun, 1989) and to phoneme discrimination (Waibel et al., 1989) have been proposed which do not report serious convergence problems.
Some attempts have been made to understand the theoretical reasons for the successes and failures of supervised learning schemes, explaining when such schemes are likely to succeed in discovering optimal solutions (Bianchini et al., 1994; Gori & Tesi, 1992; Yu, 1992) and in generalizing to new examples (Baum & Haussler, 1989). These results give some theoretical foundations to learning from tabula rasa configurations, but unfortunately the conditions they provide for optimal convergence and for generalization are of limited practical use. Although