
Input/Output HMMs: A Recurrent
Bayesian Network View
Paolo Frasconi
Dipartimento di Sistemi e Informatica, Università di Firenze
Via di Santa Marta 3 – 50139 Firenze (Italy)
http://wwwdsi.ing.unifi.it/~paolo
Abstract
This paper reviews Markovian models for sequence processing tasks, with particular emphasis on input/output hidden Markov models (IOHMMs) for supervised learning on temporal domains. HMMs and IOHMMs are viewed as special cases of belief networks that might be called recurrent Bayesian networks. This view opens the way to more general structures that could be devised for learning probabilistic relationships among sets of data streams (instead of just input and output data streams) or that might exploit multiple hidden state variables. Introducing the concept of belief network unfolding, it is shown that recurrent Bayesian networks operating on discrete domains are equivalent to recurrent neural networks with higher-order connections and linear units.
1 Introduction
In this paper I focus on problems involving learning temporal relationships. Examples of problems of this kind are numerous and include sequence classification (e.g., speech recognition, DNA sequence classification), sequence production (e.g., generation of actuator signals in robotics), and sequence prediction (e.g., financial time series forecasting). Learning tasks interfaced with a sequential data generation process require special architectures and algorithms. Architectures characterized by an algebraic input/output relationship (such as feedforward neural networks) are inadequate models of sequential data because the absence of adaptive memory prevents temporal context from being taken into account in a flexible way.
The neural network research community attempted quite early to overcome this limitation by allowing feedback in the connection graph, resulting in the so-called recurrent neural networks (Rumelhart et al., 1986; Jordan, 1986). Thanks to cycles in their connections, recurrent networks exhibit a dynamic input/output behavior and can store past information for arbitrary durations. Importantly, the memory depth of recurrent networks is not fixed a priori by the architecture (as happens when time delays are simply inserted in the connections (Waibel et al., 1989)). Instead, memory depth can be learned from data by adjusting the weights of the feedback connections.
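To make the feedback mechanism concrete, the following is a minimal sketch of a simple (Elman-style) recurrent network step, not a specific architecture from the literature cited above. The function and parameter names (`rnn_step`, `run_sequence`, `W_in`, `W_rec`) are illustrative; the point is that the hidden state is fed back through learned weights, so how long past inputs influence the state is determined by those weights rather than by a fixed tapped-delay structure.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_in, W_rec, b):
    # One step of a simple recurrent network: the previous hidden
    # state h_prev re-enters through the learned feedback weights
    # W_rec, which is what gives the network adaptive memory depth.
    return np.tanh(W_in @ x_t + W_rec @ h_prev + b)

def run_sequence(xs, W_in, W_rec, b):
    # Process an input sequence, carrying the hidden state forward
    # in time; returns the hidden state at every step.
    h = np.zeros(W_rec.shape[0])
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_in, W_rec, b)
        states.append(h)
    return states
```

Because the state at time t depends recursively on all earlier inputs, two sequences that differ only at the first step generally end in different final states, illustrating that information can persist for arbitrary durations rather than only over a fixed window of delays.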