Bottom-up Skill Learning in Reactive Sequential Decision Tasks
Ron Sun and Todd Peterson and Edward Merrill
The University of Alabama
Tuscaloosa, AL 35487
This paper introduces a hybrid model that unifies connectionist, symbolic, and reinforcement learning into an integrated architecture for bottom-up skill learning in reactive sequential decision tasks. The model is designed for an agent to learn continuously from on-going experience in the world, without the use of preconceived concepts and knowledge. Both procedural skills and high-level knowledge are acquired through an agent's experience interacting with the world. Computational experiments with the model in two domains are reported.
Skill learning (or skill acquisition) is an important area of cognitive science, as skilled performance (and its acquisition) constitutes the majority of human activities. Such skills range from simple motor movements and routine coping in everyday activities all the way to complex intellectual skills such as writing or proving mathematical theorems. There is a hierarchy of skills of varying complexity and cognitive involvement. Most widely studied in cognitive science is cognitive skill acquisition (VanLehn 1995), that is, the ability to solve problems in more or less intellectual tasks, such as (to mention just a few) arithmetic, elementary geometry, LISP programming, and simulated air-traffic control (e.g., Anderson 1982, 1993, VanLehn 1995, Ackerman 1988). Most of this work assumes a top-down approach; that is, it assumes that subjects first acquire a great deal of knowledge in a domain and that practice then turns this explicit knowledge into a more usable form, which leads to skilled performance. The explicit knowledge acquired before practice is declarative knowledge, while the knowledge directly used in skilled performance is procedural knowledge. It is commonly believed that skills are the result of "proceduralization" of declarative knowledge.
However, there is a substantial body of work demonstrating that the opposite may also be true: subjects can learn skilled performance without being provided explicit knowledge prior to practice (e.g., Berry and Broadbent 1984, Stanley et al 1989, Willingham et al 1992, Reber 1989). Among them, Berry and Broadbent (1984) and Stanley et al (1989) expressly demonstrate the dissociation between prior knowledge and skilled performance in a variety of tasks. Explicit knowledge is not equivalent to skills, but can arise out of them.
Reactive sequential decision tasks (Sun and Peterson 1995) are a suitable domain for studying such bottom-up skill learning. They generally involve selecting and performing a sequence of actions, in order to accomplish an objective, mostly on the basis of moment-to-moment perceptual information. In such tasks, while skills emerge from repeated practice, declarative knowledge is also formed, on the basis of acquired skilled performance. The process is thus the opposite of the commonly assumed top-down approach.
A general specification is as follows: there is an agent that can select, from a finite set of actions, a particular action to perform at each time step. The selection decision is (mainly) based on the current state of the world, presented to the agent through sensory input. The world changes either autonomously or as a result of some action by the agent. Thus, over time, the world is presented to the agent as a sequence of states. At certain points in a sequence, the agent may receive payoffs or reinforcements. The agent may therefore need to perform temporal and structural credit assignment: attributing the payoffs/reinforcements to actions taken at various points in time (the temporal credit assignment problem) and to various aspects of a state (the structural credit assignment problem).
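The specification above can be made concrete with a minimal tabular Q-learning sketch. This is only an illustration of how temporal credit assignment can be handled under such a specification: the toy chain task, the parameter values, and all names below are assumptions for exposition, not taken from the paper. The agent walks a five-state chain and is paid off only upon reaching the final state, so the bootstrapped update must propagate credit backward over the intervening steps.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy task)
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# One value per state-action pair: the structural side of credit assignment
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose(s):
    # epsilon-greedy selection based on the current state only
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = choose(s)
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0   # payoff only at the goal
        # Q-learning update: the bootstrapped next-state estimate carries
        # credit backward in time (temporal credit assignment)
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, stepping right should dominate in every non-goal state
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Note that the update uses only the current state, the next state, and the immediate payoff, so learning proceeds on-line during an episode rather than waiting for its end, which is the mode of learning the task specification demands.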
While performing this kind of task, the agent is often under severe time pressure. Often a decision has to be made in a fraction of a second; the agent therefore cannot do much "information processing", and falls outside of Allen Newell's "rational band". Decision making and learning in the agent thus cannot be too time-consuming. As in humans, the agent may also be severely limited in other resources, such as memory, so that memorizing all previous episodes is impossible. The perceptual ability of an agent may also be extremely limited, so that only very local information is available. Learning in such a domain is an experiential, trial-and-error process; the agent develops knowledge tentatively on an on-going basis, since it cannot wait until the end of an episode. Learning is thus concurrent, or on-line (Nosofsky et al 1994).
In such tasks, with bottom-up learning and without prior knowledge, how can an agent develop a set of coping skills that are highly specific (geared towards particular situations) and thus highly efficient but, at the same time, acquire sufficiently general knowledge that can be readily applied to a variety of different situations? In the current context, one way to learn is through trial-and-error: repeated practice gradually gives rise to a set of procedural skills that deal specifically with the practiced situations and their minor variations. However, such skills may not be transferable to truly novel situations, since they are so embedded in specific contexts and tangled