Dynamic Expectancy: An Approach to Behaviour Shaping Using a New Method of Reinforcement Learning
6th International Symposium on Intelligent Robotic Systems '98 (SIRS98),
Edinburgh, Scotland, UK, July 21-23, 1998
Mark Witkowski
Department of Computer Science,
Queen Mary and Westfield College (University of London),
Mile End Road,
London E1 4NS
Abstract
This paper is concerned with the source of reward and reinforcement, with potential
application to various robot learning and behaviour shaping situations
(Dorigo and Colombetti, 1994; Lin, 1991; Maclin and Shavlik, 1996). The conventional
approach to behaviour shaping by reinforcement learning is to present "reward" to an
animal, animat or robot immediately after it performs some required or desirable
activity. It is a commonplace observation in experimental psychology that if a
trainer repeats this procedure a sufficient number of times, the behaviour of the
animal will come to favour those activities in the circumstances under which they
were reinforced.
This paper describes the Dynamic Expectancy Model, a new approach to issues in
reinforcement learning that emphasises the role of internally generated "reward"
signals, and in which overt behaviour is selected reactively from a policy map
created dynamically in response to motivating "goals". The results of two investigations
that illustrate these facets of learning and behaviour are presented. It is hoped
that this technique will find application in a variety of task areas where animat/robot
and man co-operate to address shared tasks.