December 13, 2012

What Is Operant Conditioning? (and How Does It Explain Driving Dogs?)

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

While second nature to many of us, driving a car is actually a fairly complex process. In its most stripped-down version, you first sit in the driver's seat, then start the engine, then shift into gear, and then steer while keeping your foot on the gas pedal. That doesn't include things like adjusting your mirrors, verifying that you won't drive into another person or car, and so on.

In one sense, it is incredibly impressive that three dogs in New Zealand have learned - in a fairly rudimentary way - to drive a car. They sit in the driver's seat, shift into gear, operate the steering wheel, and step on the accelerator. Those deserving the true accolades, however, are not the dogs but the human trainers, for their impressive patience and determination.

The training that led man's best friend to operate a car is no different from the kind of training behind the bird shows found at zoos all over the world, or the dolphin, killer whale, seal, or sea lion displays you might see at Sea World. It's the same kind of training that scientists use to probe the emotional and cognitive lives of rats, mice, and the other critters that populate their laboratories. At the end of the day, it all comes down to a form of learning first described by Edward L. Thorndike at the beginning of the 1900s, which was later expanded and popularized by B.F. Skinner and taught to every student of Introductory Psychology: operant conditioning.

What is operant conditioning?

While classical conditioning is a form of learning that binds external stimuli to reflexive, involuntary responses, operant conditioning involves voluntary behaviors, and is maintained over time by the consequences that follow those behaviors. In one experiment, Skinner placed pigeons individually into experimental chambers (sometimes referred to as "Skinner boxes") that were designed to deliver food rewards at systematic intervals. He found that by rewarding a bird after it displayed a desired behavior, he could motivate the bird to increase the frequency of that particular behavior.

The tools used in operant conditioning are known as positive and negative reinforcement and positive and negative punishment.

So, what's the difference between positive reinforcement and negative punishment? Negative reinforcement? Positive punishment? HELP!

More than one introductory psychology student has been confused by the differences between positive and negative, between reinforcement and punishment. Here are the three (and a half) things you need to know:

1. Reinforcement is used to maintain or increase a desired behavior, while punishment is used to reduce or eliminate a behavior. (Skinner argued that reinforcement is more effective than punishment in modifying behavior.)

2. Positive involves introducing or adding a stimulus to the situation. Negative, then, means that a stimulus is withdrawn or removed.

3. If a stimulus is pleasing or rewarding, your psych textbooks might refer to it as "appetitive." If the stimulus is unrewarding or unwanted, it might be referred to as "aversive."

3a. Positive reinforcement and negative punishment involve appetitive stimuli. Positive punishment and negative reinforcement involve aversive stimuli.

Many students think of the stimuli themselves as positive or negative, and this is where things get muddled. Say it with me: positive and negative refer to the addition or removal of a stimulus, not to the stimulus itself.

Positive reinforcement might involve rewarding a child with candy in order to encourage his playing nicely with his brother. Candy is an appetitive stimulus that is used to increase or maintain the desired behavior.

If a child misbehaves, they might have their television privileges revoked. This is negative punishment, because you've removed an appetitive stimulus (TV) in order to eliminate an unwanted behavior.

If the child continues to misbehave, a parent might yell at him or her; this would constitute positive punishment. It involves the application of an aversive stimulus (yelling), in order to eliminate the unwanted behavior.

Finally, the frustrated parent might negotiate with their misbehaving child by offering to reduce the chores that he or she must complete that week in exchange for the desired behavior. This is a form of negative reinforcement, since an aversive stimulus (chores) is removed in the service of increasing good behavior.
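For readers who find a lookup table easier to remember than four definitions, the rules above can be sketched in a few lines of Python. This is only an illustration: the names `Change`, `Stimulus`, and `classify` are invented here and do not come from any psychology library.

```python
from enum import Enum


class Change(Enum):
    ADDED = "positive"    # a stimulus is introduced
    REMOVED = "negative"  # a stimulus is withdrawn


class Stimulus(Enum):
    APPETITIVE = "appetitive"  # pleasing or rewarding
    AVERSIVE = "aversive"      # unrewarding or unwanted


def classify(change: Change, stimulus: Stimulus) -> str:
    """Name the operant conditioning tool for a given consequence."""
    # Adding something pleasant, or removing something unpleasant,
    # encourages the behavior: reinforcement. The other two pairings
    # discourage it: punishment.
    reinforcing = (change is Change.ADDED) == (stimulus is Stimulus.APPETITIVE)
    kind = "reinforcement" if reinforcing else "punishment"
    return f"{change.value} {kind}"


# The four parenting examples from the text.
examples = {
    "candy for playing nicely": (Change.ADDED, Stimulus.APPETITIVE),
    "television privileges revoked": (Change.REMOVED, Stimulus.APPETITIVE),
    "yelling": (Change.ADDED, Stimulus.AVERSIVE),
    "fewer chores": (Change.REMOVED, Stimulus.AVERSIVE),
}

for description, (change, stimulus) in examples.items():
    print(f"{description}: {classify(change, stimulus)}")
```

Note that "positive" and "negative" come straight from whether the stimulus was added or removed, never from whether the stimulus itself is pleasant.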

But wait, there's more

When it comes to training animals (or sometimes, humans), reinforcement is delivered according to a predefined schedule. If a stimulus is delivered after a set number of responses, the schedule is called a fixed ratio schedule. For example, a pigeon might be given a food reward after every tenth time that it pecks a button. The pigeon would learn that ten pecks are required in order to receive a reward.

If the number of responses required to receive a stimulus varies, then you are using a variable ratio schedule. The best example for this is a slot machine, which has a fixed probability of delivering a reward over time, but a variable number of pulls between rewards. It is no wonder that variable ratio reinforcement schedules are the most effective for quickly establishing and maintaining a desired behavior.

If a stimulus is given after a fixed amount of time, regardless of the number of responses, then you've got a fixed interval schedule. No matter how many times the pigeon pecks the button, it only receives one reward every ten minutes. This is the least effective reinforcement schedule.

Finally, if a stimulus is given after a variable amount of time, you've got a variable interval schedule. A stimulus might be applied every week on average, which means sometimes it occurs more often than once per week, and sometimes less often. Pop quizzes are the best known example of variable interval reinforcement schedules, since the precise time at which they occur is unpredictable. The desired response in this case is studying.

In general, ratio schedules are more effective at modifying behavior than interval schedules, and variable schedules are more effective than fixed schedules.
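The four schedules differ only in the rule that decides whether a given response earns a reward, which makes them easy to sketch in code. The Python below is a toy model resting on several assumptions: the function names are invented, the pigeon pecks exactly once per second, the interval schedules reward the first peck after the wait has elapsed (the usual textbook formulation), and the variable schedules draw their randomness from a simple probability or an exponential wait. It shows how rewards are scheduled, not how effective each schedule is at changing behavior.

```python
import random


def fixed_ratio(n):
    """Reward every n-th response."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reward each response with probability 1/mean_n, slot-machine style."""
    def respond(t):
        return rng.random() < 1.0 / mean_n

    return respond


def fixed_interval(period):
    """Reward the first response made at least `period` seconds after the last reward."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= period:
            last = t
            return True
        return False

    return respond


def variable_interval(mean_period, rng):
    """Like fixed_interval, but the required wait is redrawn after every reward."""
    last = 0.0
    wait = rng.expovariate(1.0 / mean_period)

    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.expovariate(1.0 / mean_period)
            return True
        return False

    return respond


def count_rewards(schedule, pecks=3600):
    """Hypothetical pigeon that pecks once per second for an hour."""
    return sum(schedule(t) for t in range(1, pecks + 1))


rng = random.Random(0)
print("fixed ratio, every 10th peck:", count_rewards(fixed_ratio(10)))        # 360
print("fixed interval, every 10 min:", count_rewards(fixed_interval(600)))    # 6
print("variable ratio, 1 in 10 on average:", count_rewards(variable_ratio(10, rng)))
print("variable interval, 10 min on average:", count_rewards(variable_interval(600, rng)))
```

On the fixed interval schedule the pigeon gets six rewards in the hour no matter how furiously it pecks, whereas on the ratio schedules every extra peck brings the next reward closer.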

Putting it all together

Skinner took the lessons he learned from his early pigeon experiments and went on to develop methods for eliciting more complex behaviors by dividing them into segments, each of which could then be individually conditioned. This is called chaining, and forms the basis for training dogs to drive cars. The behaviorists who worked with the driving dogs first trained them to operate a lever, then to use a steering wheel to adjust the direction of a moving cart, then to press or release a pedal to speed up or slow down the cart. As each dog mastered a step, an additional segment was added until they learned the entire target behavior. Unlike pigeons, for whom food is the best reward, the domestication process has meant that dogs can be rewarded with verbal praise alone (though food definitely helps).

How are such unnatural behaviors elicited in the first place? By using a combination of reinforcement and punishment, a trainer can shape a desired behavior by rewarding successively closer approximations. Skinner referred to this process, appropriately, as shaping. In 1953, Skinner described it this way (emphasis added):

We first give the bird food when it turns slightly in the direction of the spot from any part of the cage. This increases the frequency of such behavior. We then withhold reinforcement until a slight movement is made toward the spot. This again alters the general distribution of behavior without producing a new unit. We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot. ... The original probability of the response in its final form is very low; in some cases it may even be zero. In this way we can build complicated operants which would never appear in the repertoire of the organism otherwise. By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. ... The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay.
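Skinner's description reads almost like a loop, and a toy simulation can make the logic concrete. In the Python sketch below, every modeling choice is an assumption made for illustration: the bird's behavior is reduced to a single number (its distance from the spot), each response is a random draw around what the bird typically does, food is given only when a response is closer to the spot than the bird's current typical behavior, and a rewarded response pulls the typical behavior halfway toward itself. None of these details come from Skinner.

```python
import random


def shape(start=10.0, target=0.0, spread=1.0, trials=2000, seed=0):
    """Toy model of shaping by successive approximations.

    Returns the bird's typical distance from the target at the start
    and after each reinforced response.
    """
    rng = random.Random(seed)
    typical = start  # what the bird typically does right now
    errors = [abs(typical - target)]
    for _ in range(trials):
        # Behavior varies a little from one attempt to the next.
        response = rng.gauss(typical, spread)
        # Differential reinforcement: reward only responses that are a
        # closer approximation than the bird's current typical behavior.
        if abs(response - target) < abs(typical - target):
            # A rewarded response becomes more typical.
            typical += 0.5 * (response - typical)
            errors.append(abs(typical - target))
        # Otherwise reinforcement is withheld and nothing changes.
    return errors


errors = shape()
print(f"rewards given: {len(errors) - 1}")
print(f"typical distance from the spot: {errors[0]:.2f} at first, {errors[-1]:.4f} at the end")
```

Because the bar for earning a reward moves along with the bird's own behavior, a response that started out vanishingly unlikely (landing right on the spot) ends up being the norm.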

This is also the way that a dog can be taught to salsa dance, or the way that a mouse, a chicken, or a goat can be trained to navigate an obstacle course. (The original post embedded a video of each.)

The clicker training featured in the chicken and goat videos, and used by many for training dogs, combines classical and operant conditioning. Classical conditioning is used to make the clicking sound into a conditional stimulus, which is then used for positive reinforcement in operant conditioning.

Operant conditioning in the wild

Several real-world examples of operant conditioning have already been mentioned: rewarding a child for good behavior or punishing a child for bad behavior, slot machines, and pop quizzes. In zoos and other animal facilities, keepers use operant conditioning in order to train animals to move between different parts of their enclosures, to present body parts for inspection, or to ensure that veterinary examinations are conducted safely.

Operant conditioning can also explain why some zoo animals display stereotypies or repetitive behaviors. To understand how this works, let's return to Skinner's pigeons. In one experiment, Skinner placed the birds into their boxes and set the food reward to be delivered at a systematic interval regardless of the birds' behaviors. The pigeons went on to develop what Skinner referred to as "superstitious behaviors," as the result of accidental juxtapositions between their overt behaviors and the presentation of the food reward. One pigeon turned counter-clockwise in the cage just before a reward was presented, which led the pigeon to learn an association between the counter-clockwise turn and food. The pigeon spent its time turning 'round and 'round waiting for the reward. Another thrust its head into one corner of the cage to elicit the food. Two birds swayed their heads from left to right, and another had been conditioned to peck toward the floor, almost but not quite touching it.

Stereotypical behaviors in captive animals can result from a number of sources, but accidental operant conditioning might explain a large proportion of them. Indeed, the most common form of stereotypical behavior in zoo animals is pacing (when stereotypic swimming patterns are counted along with it), followed by various forms of swaying or head bobbing. Luckily, principles of operant conditioning can also be used to remedy these sorts of problems.

Can you think of other real-world examples of operant conditioning? Leave them in the comments!

Skinner, B.F. (1948). 'Superstition' in the pigeon. Journal of Experimental Psychology, 38(2), 168-172. DOI: 10.1037/h0055873

Shyne, A. (2006). Meta-analytic review of the effects of enrichment on stereotypic behavior in zoo mammals. Zoo Biology, 25(4), 317-337. DOI: 10.1002/zoo.20091