
An Input Output HMM Architecture

DRAFT - To appear in NIPS 7

Yoshua Bengio
Dept. Informatique et Recherche
Université de Montréal, Qc H3C-3J7
[email protected]

Paolo Frasconi
Dipartimento di Sistemi e Informatica
Università di Firenze (Italy)
[email protected]


We introduce a recurrent architecture with a modular structure and formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports the recurrent-network style of processing and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation.


Learning problems involving sequentially structured data cannot be dealt with effectively by static models such as feedforward networks. Recurrent networks can model complex dynamical systems and can store and retrieve contextual information in a flexible way. Until now, research on supervised learning for recurrent networks has focused almost exclusively on error minimization by gradient descent methods. Although gradient descent is effective for learning short-term memories, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals (Bengio et al., 1994; Mozer, 1992).

Previous work on alternative training algorithms (Bengio et al., 1994) could suggest that the root of the problem lies in the essentially discrete nature of the process of storing information for an indefinite amount of time. Thus, a potential solution is to propagate, backward in time, targets in a discrete state space rather than differential error information. Extending previous work (Bengio & Frasconi, 1994a), in this paper we propose a statistical approach to target propagation, based on the EM algorithm. We consider a parametric dynamical system with discrete states and introduce a modular architecture, with subnetworks associated with discrete states. The architecture can be interpreted as a statistical model and can be trained by the EM or generalized EM (GEM) algorithms (Dempster et al., 1977),