I laugh when my sleep-deprived friends try to get their babies to eat something. Even though the kid may be hungry, he never seems to use the food for its intended purpose. Rather than satiate his hunger, he would rather squish the banana slice, crumble the Cheerio, and then toss it all on the floor (a behavior the dog learns instantly).
The little monster displays seemingly pointless behavior in other situations as well: crinkling wrapping paper on Christmas morning instead of playing with the doll, splashing water (hopefully it's water) up in Dad's face at bath time, tapping on the smartphone screen with disastrous results, and so on. Developmental psychologists often refer to this type of behavior as "intrinsically motivated" because it appears to be executed for its own sake instead of as a way to achieve some separate (presumably rewarding) outcome like eating something sweet.
Rather than being "pointless," however, intrinsically motivated behaviors may help the child learn about his surrounding environment. When he plays with the banana slice or the cookie, he learns about their physical properties. He uses that knowledge later when he does want to eat the food (or toss it to the dog): with a firm grip and hard bite for the crunchy cookie, but perhaps a softer touch with the squishy banana slice. Insights gained from studying intrinsic motivation have important implications for understanding human development. Over the past couple of decades, though, they have also contributed to development of another sort: building better artificial agents, including, someday, robots.
Scientists can tell artificial agents to consider something to be rewarding—like finding a specific object or navigating to a specific location—and program them to learn to achieve it. This is extrinsic motivation, in which acquiring something, like food or money, satisfies some known need. Intrinsic motivation, though, is a bit trickier, and a more precise description will help us understand how it can be useful.
One possibility is that an unpredicted, "surprising" sensory event motivates us to repeat whatever we did before the event. However, if that were the case, we'd be "stuck" repeating behavior that's sometimes followed by randomly occurring events but that doesn't actually do anything. Instead, a more useful formulation was described by the computer scientist Jürgen Schmidhuber in the early 1990s in one of the earliest computational accounts of intrinsic motivation in artificial systems: repeat behavior that's followed by an improvement in predicting subsequent events.
As a simple example, if you don't know what a button on the smartphone does, and pushing it turns on an LED light, then every time you push it, your ability to predict what happens next (that the LED light will turn on) improves. So, you're motivated to push the button, at least for a while. At some point, though, your ability to predict the outcome of pushing the button cannot improve any more, so the motivation fades away.
In this formulation, we're motivated to behave to learn to predict events that were previously unpredicted, but not to waste time trying to predict events that already are well predicted or cannot be predicted. (Other formulations of intrinsic motivation in artificial systems, most of which also involve some aspect of prediction, are described in Oudeyer and Kaplan 2007 and Santucci et al. 2013.)
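The prediction-improvement idea can be made concrete with a small simulation. What follows is a minimal sketch, not Schmidhuber's implementation: the class name `LearningProgress`, the learning rate, the five-observation window, and the coin flip standing in for an unpredictable event are all assumptions chosen for illustration. The intrinsic reward here is simply how much the average prediction error has dropped from one window of observations to the next.

```python
import random


class LearningProgress:
    """Intrinsic reward = recent drop in prediction error (illustrative only)."""

    def __init__(self, learning_rate=0.3, window=5):
        self.learning_rate = learning_rate  # assumed value
        self.window = window                # assumed value
        self.estimate = 0.5  # predicted probability that the event follows the action
        self.errors = []     # history of prediction errors

    def observe(self, outcome):
        """Record one outcome (1.0 = event happened, 0.0 = it did not)."""
        self.errors.append(abs(outcome - self.estimate))
        self.estimate += self.learning_rate * (outcome - self.estimate)
        return self.progress()

    def progress(self):
        """Mean error in the older window minus mean error in the newer one."""
        w = self.window
        if len(self.errors) < 2 * w:
            return 0.0
        older = sum(self.errors[-2 * w:-w]) / w
        newer = sum(self.errors[-w:]) / w
        return older - newer


def run(outcomes):
    """Feed a sequence of outcomes to a fresh learner; return its rewards."""
    learner = LearningProgress()
    return [learner.observe(o) for o in outcomes]


button = run([1.0] * 60)  # the LED always follows the push
rng = random.Random(0)
noise = run([float(rng.random() < 0.5) for _ in range(2000)])  # a coin flip

print(f"button, push 10:    {button[9]:.3f}")
print(f"button, push 60:    {button[59]:.3f}")
print(f"coin flip, average: {sum(noise) / len(noise):.3f}")
```

In this sketch the reward for the button is large during the first pushes and shrinks toward zero once the LED is fully expected, while the coin flip averages out to roughly no reward. A rule that rewarded raw surprise instead would keep paying out for the coin flip, since its prediction error never goes away.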
How can this or other types of intrinsic motivation be used to build better robots and artificial agents in general? Much like a child (or any animal), a robot that hasn't been preprogrammed with absolute knowledge must be able to learn things, such as how one observation (like seeing it snow) predicts another observation (that the roads will be slippery). Detecting regularities in sequences of observations can train an internal prediction model—a bit of the robot's "brain" that's dedicated to predicting what will happen next—to help it better understand how the world around it works.
With good internal prediction models, it can make good decisions, like driving slowly (or not at all) when it snows out without having to actually slip and fall on the slippery road. In one experiment with simulated systems, Schmidhuber endowed an artificial learning agent with intrinsic motivation as described earlier: it was motivated to observe events that reliably predict other events, and move on to observe other events once it had learned the prediction well. It was able to learn how the simulated world works better and faster than artificial agents without intrinsic motivation, knowledge that is useful later for when it must decide what to do.
Another potential use of intrinsic motivation was described by the psychologist Robert W. White in the late 1950s: to gain competence by developing behaviors that affect the surrounding environment. For example, a toddler may happen to push a button on the car door while waving her toy around and, unexpectedly, hear a loud clank sound from the door locking or unlocking.
The unexpected clank motivates her to repeat her movements, she ends up pushing the button several times, and she eventually learns that the clank follows the button push, after which she moves on to other things. Also, by repeatedly pushing the button, she learns the behavioral skill of pushing the button proficiently (as opposed to always waving her toy around). She has added the skill of pushing the button in the car to make the clank sound, even if she doesn't immediately know how that could be useful. In other words, she has increased her competence in interacting with her environment.
White's approach influenced work in the early 2000s by the computer scientist Andrew Barto and colleagues, who used it to study skill development in artificial agents acting in simulated environments in which certain sensory events were very salient (like the clank sound in the car). When the artificial learning agent with intrinsic motivation was behaving randomly and happened to achieve that salient sensory event by chance, it repeated and refined the preceding behavior until it could achieve that event reliably, and stored that behavior as a single skill to be recruited later.
As that event became predictable, the motivation to achieve it faded, the agent moved on to other situations, and the whole process repeated for other salient events. Barto and colleagues showed that a learning agent that first acquired an arsenal of skills through this intrinsically motivated process could learn to accomplish new tasks in that environment faster than an agent that never developed such skills. This process is similar to the child learning the skill of pushing the car door button that makes the clank sound, and that skill coming in handy later when Dad accidentally locks the kid, and the keys, in the car.
Intrinsic motivation has also been studied with real robots acting in the real world. Even a robot that has preprogrammed behaviors must learn under which sensory conditions those behaviors might actually accomplish something, like trying to push a button only if it sees a button or closing its fingers to grasp something only if it feels something in its hand.
These sensory conditions are similar to "affordances" as described by the psychologist James Gibson in the late 1970s. About a decade ago, roboticists Stephen Hart and Roderic Grupen endowed a robot with pre-specified behaviors but only a rough estimate of the sensory conditions under which a particular behavior may accomplish something, mimicking the type of conditions that may be expected if a robot were placed in a completely new environment. Intrinsic motivation was implemented by delivering a signal when a behavior was executed successfully. That signal was scaled by the difference between the estimated affordance (the estimate of the sensory conditions under which that behavior was expected to accomplish something) and the actual sensory conditions under which the behavior was successfully executed.
The scaled signal was the intrinsic motivation for the robot to repeat the behavior under similar sensory conditions. With repeated executions, the robot learned accurate affordances for each of its behaviors. With accurate affordances, the robot can interact efficiently with the environment: it won't attempt to close its fingers around something if it doesn't feel something in its hand, and it won't attempt to push a button if it doesn't see a button.
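One way to picture the scaled signal is as a distance between what the robot expected and what it actually sensed when a behavior worked. The following is only a rough sketch under loud assumptions: it treats sensory conditions as a short list of numbers and uses Euclidean distance as the scaling, neither of which is taken from Hart and Grupen's work, and the class `AffordanceEstimate`, its learning rate, and the two grasp features are hypothetical.

```python
import math


class AffordanceEstimate:
    """Running guess of the sensory conditions under which a behavior works."""

    def __init__(self, initial_guess, learning_rate=0.5):
        self.conditions = list(initial_guess)
        self.learning_rate = learning_rate  # assumed value

    def on_success(self, observed):
        """Return the scaled signal for one successful execution, then update."""
        # Signal is large when success happened under unexpected conditions.
        signal = math.dist(self.conditions, observed)
        # Move the estimate part of the way toward the observed conditions.
        self.conditions = [
            c + self.learning_rate * (o - c)
            for c, o in zip(self.conditions, observed)
        ]
        return signal


# Hypothetical features: [pressure on the fingertips, object seen in the hand]
grasp = AffordanceEstimate(initial_guess=[0.2, 0.5])
signals = [grasp.on_success([1.0, 1.0]) for _ in range(6)]

print([round(s, 3) for s in signals])
print([round(c, 3) for c in grasp.conditions])
```

Each successful grasp pulls the estimate closer to the conditions that actually held, so the signal shrinks with repetition and the motivation to keep grasping under those conditions fades as the affordance becomes accurate.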
A different formulation of intrinsic motivation was used by roboticists Adrien Baranes and Pierre-Yves Oudeyer to control a multi-link robot arm working in two dimensions on a table top (like a 2D octopus tentacle). Similar to an infant flapping its arms about and rolling around, a naive robot must learn how to control its body by moving around. In their work, the robot arm generates a target location to which to reach, attempts to reach to it, and adjusts its control mechanisms to make subsequent reaches more accurate.
An intrinsic motivation signal was generated that was higher when the ability to reach a specific target had increased since the last time the robot tried to reach that target. Thus, the robot was more likely to attempt to reach target locations for which its ability had increased (thus improving its motor control) than target locations for which its ability had not increased, either because that target location was too hard to reach at the moment or because the robot could already reach it without any problems. The robot thus efficiently learns to control its body by concentrating its behavior on progressively harder movements, without wasting time trying to learn movements that are too hard or too easy for it at any one time.
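The effect of preferring targets where ability is rising can be sketched with three made-up targets. This is an illustration under stated assumptions, not Baranes and Oudeyer's algorithm: the error values, the 20 percent improvement per practice attempt, the `THRESHOLD` cutoff, and the simple take-turns exploration rule are all invented for the example.

```python
THRESHOLD = 0.01  # assumed cutoff below which progress no longer motivates

# Simulated reaching error per target; these numbers are made up.
error = {"too_easy": 0.0, "learnable": 1.0, "too_hard": 1.0}
names = list(error)
history = {name: [] for name in names}  # errors observed on each target
attempts = {name: 0 for name in names}  # how often each target was tried


def reach(name):
    """Attempt one reach; only the learnable target improves with practice."""
    observed = error[name]
    if name == "learnable":
        error[name] *= 0.8
    return observed


def progress(name):
    """Drop in error between the last two attempts at this target."""
    h = history[name]
    return h[-2] - h[-1] if len(h) >= 2 else 0.0


explore_index = 0
for step in range(30):
    best = max(names, key=progress)
    if progress(best) > THRESHOLD:
        target = best  # practice where ability is rising
    else:
        target = names[explore_index % len(names)]  # otherwise take turns
        explore_index += 1
    history[target].append(reach(target))
    attempts[target] += 1

print(attempts)
```

In this toy run most attempts go to the learnable target while its error is still dropping. The target that is already mastered and the one that never improves generate no progress, so they are visited only during the take-turns exploration, and once the learnable target stops improving noticeably the agent drifts back to sampling all three.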
In all four examples, an intrinsic motivation signal temporarily causes the artificial agent to repeat behavior for reasons other than accomplishing a specific task. Instead, the behavior generated experiences that resulted in learning something useful: internal models that enable accurate predictions, skills that affect the environment, affordances that indicate when a behavior can be executed, or efficient control of the agent's own body.
Once these have been learned to some acceptable degree, the intrinsic motivation signal fades and the learning agent moves on to other things, but what it has learned enables it to better interact with the environment and thus better accomplish tasks in the future. The boy playing with water in plastic cups in the bath develops knowledge and skills useful for when he eventually starts pouring milk on his cereal. The girl building block towers in the living room develops knowledge and skills useful for when she later stacks boxes to carry into the garage.
The research demonstrates how intrinsic motivation thought to underlie "pointless" play behavior can focus behavior to learn about the world and how to better interact with it. Importantly, the learning was self-directed and done with actual experience as opposed to through instruction by an outside teacher, which has limitations in what it can communicate (like how a lecture on how to throw a ball pales in comparison to actually trying to throw the ball).
This column focused on the relatively objective realms of predicting events and executing behaviors for artificial agents, but research in developmental psychology has shown the importance of similar intrinsic motivation processes, along with self-directed learning from actual experience, in cognitive and social development. By trying to describe and understand these processes precisely enough to have them control artificial agents, we gain insights that may better our understanding of human development on many levels as well.