The Thoughtful Animal

Exploring the evolution and architecture of the mind

What Is Operant Conditioning? (and How Does It Explain Driving Dogs?)

The views expressed are those of the author and are not necessarily those of Scientific American.





While second nature to many of us, driving a car is actually a fairly complex process. In its most stripped-down form, you first sit in the driver’s seat, then start the engine, then shift into gear, and then simultaneously steer while keeping your foot on the gas pedal. That doesn’t include things like adjusting your mirrors, verifying that you won’t drive into another person or car, and so on.

In one sense, it is incredibly impressive that three dogs in New Zealand have learned – in a fairly rudimentary way – to drive a car. They sit in the driver’s seat, shift into gear, operate the steering wheel, and step on the accelerator. The true accolades, however, belong not to the dogs but to their human trainers, whose patience and determination made it possible.

The training that led man’s best friend to operate a car is no different from the kind of training behind the bird shows found at zoos all over the world, or the dolphin, killer whale, seal, or sea lion displays you might see at Sea World. It’s the same kind of training that scientists use to probe the emotional and cognitive lives of rats, mice, and the other critters that populate their laboratories. At the end of the day, it all comes down to a form of learning first described by Edward L. Thorndike at the beginning of the 1900s, which was later expanded and popularized by B.F. Skinner and taught to every student of Introductory Psychology: operant conditioning.

What is operant conditioning?
While classical conditioning is a form of learning that binds external stimuli to reflexive, involuntary responses, operant conditioning involves voluntary behaviors, and is maintained over time by the consequences that follow those behaviors. In one experiment, Skinner placed pigeons individually into experimental chambers (sometimes referred to as “Skinner boxes”) that were designed to deliver food rewards at systematic intervals. He found that by rewarding a bird after it displayed a desired behavior, he could motivate the bird to increase the frequency of that particular behavior.

The tools used in operant conditioning are known as positive and negative reinforcement and positive and negative punishment.

So, what’s the difference between positive reinforcement and negative punishment? Negative reinforcement? Positive punishment? HELP!
More than one introductory psychology student has been confused by the differences between positive and negative, between reinforcement and punishment. Here are the three (and a half) things you need to know:

1. Reinforcement is used to maintain or increase a desired behavior, while punishment is used to reduce or eliminate a behavior. (Skinner argued that reinforcement is more effective than punishment in modifying behavior.)

2. Positive involves introducing or adding a stimulus to the situation. Negative, then, means that a stimulus is withdrawn or removed.

3. If a stimulus is pleasing or rewarding, your psych textbook might refer to it as “appetitive.” If the stimulus is unrewarding or unwanted, it might be referred to as “aversive.”

3a. Positive reinforcement and negative punishment involve appetitive stimuli. Positive punishment and negative reinforcement involve aversive stimuli.

Many students think of the stimuli themselves as positive or negative, and this is where things get muddled. Say it with me: positive and negative refer to the addition or removal of a stimulus, not to the stimulus itself.

Positive reinforcement might involve rewarding a child with candy in order to encourage his playing nicely with his brother. Candy is an appetitive stimulus that is used to increase or maintain the desired behavior.

If a child misbehaves, they might have their television privileges revoked. This is negative punishment, because you’ve removed an appetitive stimulus (TV) in order to eliminate an unwanted behavior.

If the child continues to misbehave, a parent might yell at him or her; this would constitute positive punishment. It involves the application of an aversive stimulus (yelling), in order to eliminate the unwanted behavior.

Finally, the frustrated parent might negotiate with their misbehaving child by offering to reduce the chores that he or she must complete that week in exchange for the desired behavior. This is a form of negative reinforcement, since an aversive stimulus (chores) is removed in the service of increasing good behavior.
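The four parenting examples above map neatly onto the 2×2 taxonomy from points 1 and 2. As a purely illustrative sketch (the function and its argument names are my own invention, not standard terminology), the classification reduces to two questions:

```python
# Classify an operant-conditioning procedure by answering two questions:
#   1) Was a stimulus added ("positive") or removed ("negative")?
#   2) Is the goal to increase the behavior (reinforcement) or decrease it (punishment)?
def classify(stimulus_change, goal):
    """stimulus_change: 'added' or 'removed'; goal: 'increase' or 'decrease'."""
    sign = "positive" if stimulus_change == "added" else "negative"
    kind = "reinforcement" if goal == "increase" else "punishment"
    return f"{sign} {kind}"

# The four examples from the text:
print(classify("added", "increase"))    # candy for playing nicely
print(classify("removed", "decrease"))  # revoking TV privileges
print(classify("added", "decrease"))    # yelling
print(classify("removed", "increase"))  # reducing chores
```

Note that the stimulus itself (candy, TV, yelling, chores) never determines the sign; only whether it was added or removed does.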

But wait, there’s more
When it comes to training animals (or sometimes, humans), reinforcement is delivered according to a predefined schedule. If a stimulus is delivered after a set number of responses, it is considered a fixed ratio schedule. For example, a pigeon might be given a food reward after every tenth time that it pecks a button. The pigeon would learn that ten button presses are required in order to receive a reward.

If the number of responses required to receive a stimulus varies, then you are using a variable ratio schedule. The best example of this is a slot machine, which has a fixed probability of delivering a reward over time, but a variable number of pulls between rewards. It is no wonder that variable ratio reinforcement schedules are the most effective for quickly establishing and maintaining a desired behavior.

If a stimulus is given after a fixed amount of time, regardless of the number of responses, then you’ve got a fixed interval schedule. No matter how many times the pigeon pecks the button, it only receives one reward every ten minutes. This is the least effective reinforcement schedule.

Finally, if a stimulus is given after a variable amount of time, you’ve got a variable interval schedule. A stimulus might be applied every week on average, which means sometimes it occurs more often than once per week, and sometimes less often. Pop quizzes are the best known example of variable interval reinforcement schedules, since the precise time at which they occur is unpredictable. The desired response in this case is studying.

In general, ratio schedules are more effective at modifying behavior than interval schedules, and variable schedules are more effective than fixed schedules.
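The four schedules can be made concrete with a toy simulation (entirely my own construction, not from the post): a subject responds steadily six times per minute for an hour, and we count how many rewards each schedule delivers. The `simulate` helper and its parameters are hypothetical.

```python
import random

random.seed(42)  # for reproducibility of the variable schedules

def simulate(schedule, responses_per_minute=6, minutes=60):
    """Count rewards earned by a steady responder.
    `schedule(count, dt)` receives the responses and minutes elapsed since
    the last reward, and returns True when a reward should be delivered."""
    rewards, count, last = 0, 0, 0.0
    for i in range(minutes * responses_per_minute):
        t = (i + 1) / responses_per_minute  # time of this response, in minutes
        count += 1
        if schedule(count, t - last):
            rewards += 1
            count, last = 0, t
    return rewards

fr10 = lambda count, dt: count >= 10             # fixed ratio: every 10th response
vr10 = lambda count, dt: random.random() < 0.1   # variable ratio: 1-in-10 chance per response
fi10 = lambda count, dt: dt >= 10                # fixed interval: first response after 10 min
vi10 = lambda count, dt: dt >= random.uniform(5, 15)  # rough variable interval, mean 10 min

for name, sched in [("FR-10", fr10), ("VR-10", vr10), ("FI-10", fi10), ("VI-10", vi10)]:
    print(name, simulate(sched))  # FR-10 earns 36 rewards, FI-10 earns 6; VR/VI vary with the seed
```

The simulation only shows when rewards arrive; what makes the variable schedules so powerful is their effect on response rates, since the subject can never tell which response (or which moment) will pay off.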

Putting it all together
Skinner took the lessons he learned from his early pigeon experiments and went on to develop methods for eliciting more complex behaviors by dividing them into segments, each of which could then be individually conditioned. This is called chaining, and forms the basis for training dogs to drive cars. The behaviorists who worked with the driving dogs first trained them to operate a lever, then to use a steering wheel to adjust the direction of a moving cart, then to press or release a pedal to speed up or slow down the cart. As each dog mastered each step, an additional segment was added until they learned the entire target behavior. Unlike pigeons, for whom food is the best reward, the domestication process has meant that dogs can be rewarded with verbal praise alone (though food definitely helps).

How are such unnatural behaviors elicited in the first place? By using a combination of reinforcement and punishment, a trainer can shape a desired behavior by rewarding successively closer approximations. Skinner referred to this process, appropriately, as shaping. In 1953, Skinner described it this way (emphasis added):

We first give the bird food when it turns slightly in the direction of the spot from any part of the cage. This increases the frequency of such behavior. We then withhold reinforcement until a slight movement is made toward the spot. This again alters the general distribution of behavior without producing a new unit. We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot. … The original probability of the response in its final form is very low; in some cases it may even be zero. In this way we can build complicated operants which would never appear in the repertoire of the organism otherwise. By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. … The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay.
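Skinner's description can be caricatured in a toy numerical model (my own construction, purely illustrative): the bird's behavior is a single number from 0 (ignoring the spot) to 100 (beak touching it), reinforcement pulls future behavior toward whatever was just rewarded, and the trainer keeps raising the criterion.

```python
import random

random.seed(1)

mean = 5.0        # the bird's initial tendency: barely oriented toward the spot
target = 100.0    # the final behavior: beak touching the spot
criterion = 10.0  # the trainer's current bar for reinforcement

for trial in range(500):
    response = random.gauss(mean, 10.0)   # behavior varies around its current mean
    if response >= criterion:             # a close-enough approximation: reinforce it
        mean += 0.2 * (response - mean)   # behavior drifts toward what paid off
        criterion = min(target, criterion + 1.0)  # ...and the trainer raises the bar

# After training, behavior that initially had near-zero probability is routine.
print(round(mean), round(criterion))
```

The key property mirrored here is the one Skinner emphasizes: no single step demands an improbable response, yet the chain of successively stricter criteria ends at a behavior that would essentially never occur spontaneously.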

This is also the way that a dog can be taught to salsa dance, the way a mouse can be trained to navigate an obstacle course, and the way chickens and goats can be taught to perform tricks on cue.

The clicker training often used with chickens, goats, and dogs combines classical and operant conditioning: classical conditioning first turns the clicking sound into a conditioned stimulus, which is then used to deliver positive reinforcement during operant conditioning.

Operant conditioning in the wild
Several real-world examples of operant conditioning have already been mentioned: rewarding a child for good behavior or punishing a child for bad behavior, slot machines, and pop quizzes. In zoos and other animal facilities, keepers use operant conditioning in order to train animals to move between different parts of their enclosures, to present body parts for inspection, or to ensure that veterinary examinations are conducted safely.

Operant conditioning can also explain why some zoo animals display stereotypies or repetitive behaviors. To understand how this works, let’s return to Skinner’s pigeons. In one experiment, Skinner placed the birds into their boxes, and set the food reward to be delivered at a systematic interval regardless of the birds’ behaviors. The pigeons went on to develop what Skinner referred to as “superstitious behaviors,” as the result of accidental juxtapositions between their overt behaviors and the presentation of the food reward. One pigeon turned counter-clockwise in the cage just before a reward was presented, which led the pigeon to learn an association between the counter-clockwise turn and food. The pigeon spent its time turning ’round and ’round waiting for the reward. Another thrust its head into one corner of the cage to elicit the food. Two birds swayed their heads from left to right, and another bird had been conditioned to peck towards – almost but not quite touching – the floor.

Stereotypical behaviors in captive animals can result from a number of sources, but accidental operant conditioning might explain a large proportion of them. Indeed, the most common form of stereotypical behavior in zoo animals is pacing (when combined with stereotypic swimming, its aquatic counterpart), followed by various forms of swaying or head bobbing. Luckily, principles of operant conditioning can also be used to remedy these sorts of problems.

Can you think of other real-world examples of operant conditioning? Leave them in the comments!

Skinner, B.F. (1948). ‘Superstition’ in the pigeon. Journal of Experimental Psychology, 38(2), 168–172.

Shyne, A. (2006). Meta-analytic review of the effects of enrichment on stereotypic behavior in zoo mammals. Zoo Biology, 25(4), 317–337.

Related:
What Is Classical Conditioning? (And Why Does It Matter?)

About the Author: Dr. Jason G. Goldman received his Ph.D. in Developmental Psychology at the University of Southern California, where he studied the evolutionary and developmental origins of the mind in humans and non-human animals. Jason is also an editor at ScienceSeeker and Editor of Open Lab 2010. He lives in Los Angeles, CA. Follow on Twitter @jgold85.







Comments (5)

  1. RSW 11:39 pm 12/19/2012

    Why does your article restrict itself to discussing operant learning only in (other) animals? Humans are just as subject to it as dogs, mice, goats, and chickens.

  2. Jason G. Goldman 11:56 pm 12/19/2012

    @RSW: It doesn’t. You may have missed that the majority of the examples of operant conditioning – particularly in the sections regarding reinforcement schedules – refer to the ways in which we use operant conditioning in human culture: misbehaving children, slot machines, pop quizzes, and so on.

  3. RSW 4:30 pm 12/20/2012

    Sorry. I did miss the examples in sections on reinforcement schedules. But still the examples are rather trivial and seem to imply that humans have other mechanisms for learning that are more significant. If so, what are they?

  4. Jason G. Goldman 4:36 pm 12/20/2012

    @RSW: Ah, good question. Humans are subject to basic forms of associative learning, but much of our cultural knowledge is constructed through pedagogy, or explicit teaching. For some background, see this piece.

  5. RSW 12:17 am 12/23/2012

    Thanks for the link to your other article. It was interesting to read, but I disagree with the interpretation of the scholars you cite. What you call pedagogy or explicit learning seems to me to be just another form of operant learning, where a human member (the teacher) of a verbal community is contriving the contingencies of reinforcement, primarily through language, to get another human being (the learner) to either change his/her behavior or acquire a new set of behaviors. If you want to call this teaching, that is fine with me. But may I also suggest that meerkat adults are also arranging contingencies of reinforcement, so that young meerkats learn how to eat and eventually kill scorpions without getting struck by their lethal stingers. The difference is that meerkats, lacking language, but communicating nevertheless, rely on natural contingencies to teach their young. But to learn to do and talk about biochemistry or to play backgammon, for example, humans must rely on contrivances that are only possible for a “languaging” species.

