page 1  (39 pages)
2to next section

Recurrent Neural Networks and Prior Knowledge for Sequence

Processing: A Constrained Nondeterministic Approach

Paolo Frasconi, Marco Gori, and Giovanni Soda
Dipartimento di Sistemi e Informatica
Via di Santa Marta 3 - 50139 Firenze (Italy)
e-mail: fpaolo,[email protected]

Knowledge-Based Systems, to appear.

January 20, 1995

In this paper we focus on methods for injecting prior knowledge into adaptive recurrent networks for sequence processing. In order to increase the flexibility needed for specifying partially known rules, we propose a nondeterministic approach for modeling domain knowledge. The algorithms presented in this paper allow to map time-warping nondeterministic automata into recurrent architectures with first-order connections. This kind of automata is suitable for modeling temporal scale distortions in data such as acoustic sequences occurring in problems of speech recognition. The algorithms output a recurrent architecture and a feasible region in the connection weight space. We demonstrate that, as long as the weights are constrained into the feasible region, the nondeterministic rules introduced using prior knowledge are not destroyed by learning. This paper focuses primarily on architectural issues, but the proposed method allows the connection weights to be subsequently tuned to adapt the behavior of the network to data.
Keywords: Recurrent neural networks, prior knowledge, rule insertion, nondeterministic finite automata.

INTRODUCTION

The integration of connectionist and symbolic processing approaches has recently received attention by many researchers [1, 2, 3, 4, 5], mainly because it allows to jointly exploit the bottom-up (learning from data) and top-down (deductive reasoning) kinds of inference. A relevant aspect to such integration is the development of techniques for embedding rulebased domain knowledge into connectionist models. In this paper we focus on processing sequential streams of data by recurrent neural networks.

There are two main benefits that can be gained by introducing prior knowledge into an adaptive network: improving generalization to new instances and simplifying learning.