Instrumental Conditioning

Avoidance Conditioning

Avoidance occurs when a scheduled aversive event (typically a brief footshock in laboratory research with rats) is cancelled as a result of the avoidance response. Avoidance behavior has puzzled theorists in the field of learning and behavior for some time -- since at least the 1940s, when the first avoidance-conditioning studies were performed. In this paper I describe the development of method and theory in the study of avoidance. You will learn why the acquisition, maintenance, and extinction of avoidance behavior puzzled early theorists and how they attempted to solve those puzzles. As you will learn, these solutions often were closely tied to the invention of new avoidance procedures that called earlier explanations into question.

The Shuttlebox and the "Problem of Avoidance"

[Figure: A modern rat shuttlebox]

The earliest studies of avoidance used an apparatus known as a shuttlebox. It consists of a box divided into two compartments by a hurdle, over which a subject can jump to shuttle from one compartment to the other. [Later versions of the apparatus, like the one pictured at right, replace the hurdle with a doorway between compartments.] Shocks are delivered through the metal bars that form the floor of the box. The box also is outfitted with a lamp or speaker that can be used to provide signals. Subjects in the early studies were either laboratory rats or dogs, with the apparatus sized appropriately for the animal.

Two-Way Shuttlebox Avoidance

In the two-way shuttlebox avoidance procedure, the subject is placed in one of the two compartments. After a bit of time has passed, a stimulus (e.g., a tone) occurs. After the tone has been sounding for, say, 20 seconds, the shock is turned on, and the subject begins to run in response to the shock. Eventually it shuttles into the other compartment, an act that has two immediate consequences: (1) the shock ends, and (2) the tone ends. More time passes and the tone again sounds, and 20 seconds later the shock again begins, producing more running and, eventually, a second shuttle, returning the subject to the first compartment.

Very quickly the subject learns to orient toward the other compartment during the tone and quickly shuttles into it as soon as the shock occurs, thus terminating both shock and tone. The subject is now escaping both shock and tone. After more experience, however, something new happens: after orienting toward the other compartment during the tone, the subject shuttles into it before the shock has begun. This act has two consequences: (1) it terminates the tone, and (2) it cancels the shock that was programmed to occur after 20 seconds of tone. The subject has escaped from the tone, but it has avoided the programmed shock.

Notice that the subject can avoid shock by shuttling either way -- from the left to the right compartment or vice versa. For this reason the procedure is called "two-way avoidance."
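The contingencies of a single shuttlebox trial can be sketched in code. This is a hypothetical illustration, not laboratory software; the 20-second tone-shock interval comes from the example above, and the function name `run_trial` and its return values are my own labels.

```python
# Sketch of one two-way shuttlebox trial. Times are in seconds;
# shuttle_time is when the subject crosses to the other compartment.

TONE_SHOCK_INTERVAL = 20.0  # tone sounds this long before shock onset

def run_trial(shuttle_time):
    """Classify a trial by when the shuttle response occurs.

    Shuttling before shock onset terminates the tone and cancels the
    programmed shock (avoidance); shuttling after shock onset
    terminates both tone and shock (escape).
    """
    if shuttle_time < TONE_SHOCK_INTERVAL:
        # Shuttling during the tone cancels the programmed shock.
        return {"tone_escaped": True, "shock_received": False,
                "outcome": "avoidance"}
    else:
        # The shock began at 20 s; shuttling now ends both stimuli.
        return {"tone_escaped": True, "shock_received": True,
                "outcome": "escape"}

# Early in training the subject shuttles only after shock onset;
# with experience it shuttles during the tone and avoids the shock.
print(run_trial(25.0)["outcome"])  # escape
print(run_trial(8.0)["outcome"])   # avoidance
```

Note that in both branches the tone is terminated by the shuttle; the two outcomes differ only in whether the shock was received, which is exactly the distinction between escape and avoidance drawn above.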

The Problem of Avoidance

In the 1940s when research on avoidance conditioning was first being done, most behaviorists subscribed to a "stimulus-response" or "S-R" theory of behavior. According to this theory, all responses (behaviors) are elicited, reflex-fashion, by prior stimuli. If a rat in a maze learned to turn left at a certain choice-point, for example, it was assumed that the stimuli unique to that choice-point had through associative conditioning acquired the ability to elicit a left-turn response. Such responses were reinforced by stimuli that immediately followed the left turn and which, being on the way to the goal-box and food, were associated with reinforcement -- they had become conditioned reinforcers. Left-turning occurred because, in past trials, left-turning had been immediately reinforced by the immediate stimulus change.

But now, here in the shuttlebox, subjects were learning to shuttle, apparently in the absence of any immediate stimulus change that could serve as a reinforcer. There was no shock present before the shuttle, and there continued to be no shock present afterward. What, then, could be serving as the reinforcer? This question came to be known as "the problem of avoidance."

Mowrer's Solution to the Problem of Avoidance

Soon after the problem of avoidance surfaced, Hobart Mowrer proposed a solution that identified an immediate reinforcer for the avoidance response. According to Mowrer, during the early conditioning trials in the shuttlebox, subjects received many pairings between the tone and shock. These pairings constitute a typical classical conditioning procedure, with the tone serving as the initially neutral stimulus and the shock as an unconditioned stimulus arousing the emotional response of fear. After a few pairings of tone and shock, the previously neutral stimulus became a conditioned stimulus (CS), capable of producing a strong conditioned fear response.

Now that the tone produced strong fear, any response that terminated this conditioned stimulus would be strengthened through the process of negative reinforcement. What response terminates the CS? Shuttling to the other compartment! Thus, according to Mowrer's account, the subjects shuttle during the tone and before the shock begins, not to avoid the shock, but rather to escape the fear-inducing tone CS.

Note that Mowrer's account appeals to two processes or "factors." First, there is classical conditioning of fear to the tone, transforming it into an elicitor of strong fear. Second, there is instrumental conditioning of shuttling through negative reinforcement, the termination of the aversive and fear-evoking CS. Because it appeals to both classical and instrumental conditioning, the account is today known as Mowrer's two-factor theory of avoidance.

Another Problem: Extinction of Avoidance Responding

Mowrer's two-factor theory provided a possible answer to the question of what reinforces avoidance responses, but it did not solve a second "problem of avoidance," which is the extremely slow extinction of avoidance behavior. After a subject has received extensive avoidance training in the shuttlebox and is now avoiding nearly every programmed shock, the shocker is turned off. Now, even if the subject fails to shuttle during the tone, a shock still will not occur. Even so, subjects will continue shuttling for hours at a time. You can disconnect the shocker, pack it up, and put it away in a closet and subjects will continue to make avoidance responses.

Two-factor theory not only failed to explain why extinction of avoidance follows such a slow course; it also predicted a phenomenon during extinction that was never observed. In two-factor theory, successful acquisition of avoidance should lead to the extinction of conditioned fear to the CS. That is because, when the subject successfully avoids the shocks, the CS is no longer being paired with them -- which constitutes extinction of the conditioned response to the CS. As fear to the CS extinguishes, escape from the CS by jumping over the hurdle will no longer be reinforcing, and shuttling should extinguish. But this will then lead to further pairings between CS and shock, reinstating fear of the CS and making escape from the CS an effective reinforcer once again. This cycle of acquisition, extinction, reacquisition, extinction, and so on is not observed in shuttlebox avoidance.

To explain the slow extinction of avoidance, some theorists went so far as to suggest that the process of evolution might have selected for organisms whose avoidance responses are extremely resistant to extinction. In this view, continuing to avoid when it is no longer necessary may be far less costly to the organism than failing to avoid when avoidance is necessary (to avoid being eaten, for example). Slowness of extinction was said to be a special property of avoidance responses.

Later it was realized that the slowness of extinction of avoidance could be accounted for without appeal to any special characteristics of avoidance. When placed in the shuttlebox, subjects learn that they receive a shock at the end of the tone if they do not shuttle, and that they do not receive a shock at the end of the tone if they do shuttle. After a bit of practice, they almost always shuttle before the tone has reached its end. Consequently, when the shocker is disconnected, they have little opportunity to learn that the previous contingency between shuttling and shock-omission has changed. Tone-in-the-absence-of-shuttling still signals shock and, just as before, after a shuttle the tone ends and no shock follows. From the subject's perspective, nothing has changed, and so avoidance behavior continues. Only after the subject has experienced a few failures to shuttle that were not followed by shock can it learn that there is no need to shuttle.

Sidman's Avoidance Procedure

In the early 1950s Murray Sidman, one of B. F. Skinner's students, devised a novel procedure for studying avoidance in an operant chamber. The experimental subjects were lab rats and the operant under study was the usual lever-press. A subject was placed in the chamber and soon after began to receive a series of brief (0.5-s) footshocks spaced five seconds apart. As you can imagine, the rat became very active during the shocks and consequently was fairly likely to bump against the lever. A lever-press immediately suspended the shock series and started a timer, which timed an interval of, say, 20 seconds, during which no further shocks were delivered. If the rat failed to press the lever again during the interval, a shock was delivered and the series of closely-spaced shocks began again. However, if the rat did press the lever during the interval, this reset the timer and shock-delivery was once again at least 20 seconds away. By repeatedly pressing the lever during the interval, the rat could keep postponing the next shock indefinitely.

The rats learned to avoid the shocks on this procedure, which came to be known as Sidman avoidance or, less commonly, as free-operant avoidance. (I prefer to reserve the latter term as a generic category label for any free-operant, as opposed to discrete-trials, avoidance procedure.) In Sidman avoidance, the interval between response and shock is called the R-S or response-shock interval (typical value: 20 seconds). The interval between shocks in the absence of responding is called the S-S or shock-shock interval (typical value: 5 seconds). The short S-S interval is designed to break up freezing behavior, which tends to follow shocks, so that the baseline of lever-pressing remains above zero during initial training.
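The R-S/S-S timer logic can be sketched as a small simulation. This is an illustrative sketch only, using the typical values mentioned above (S-S = 5 s, R-S = 20 s); the function name `shocks_delivered` and its arguments are my own.

```python
# Sketch of the Sidman (free-operant) avoidance contingency.
# Times are in seconds; press_times lists the lever-press times,
# and the function returns the times at which shocks are delivered.

S_S = 5.0   # interval between shocks in the absence of responding
R_S = 20.0  # shock-free interval started (and reset) by each press

def shocks_delivered(press_times, session_length):
    presses = sorted(press_times)
    shocks, i = [], 0
    next_shock = S_S  # first shock 5 s after session start
    while next_shock <= session_length:
        if i < len(presses) and presses[i] < next_shock:
            # A press before the scheduled shock resets the timer
            # to a full R-S interval measured from the press.
            next_shock = presses[i] + R_S
            i += 1
        else:
            shocks.append(next_shock)
            next_shock += S_S  # no press: shocks resume every 5 s
    return shocks

print(shocks_delivered([], 20.0))     # no presses: shocks every 5 s
print(shocks_delivered([4.0], 23.0))  # one press postpones shock past session end
```

The key feature of the contingency shows up in the reset branch: each press pushes the next shock a full R-S interval into the future, so a rat that keeps pressing within every 20-second window postpones shock indefinitely.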

Explaining Sidman Avoidance

An important feature of the Sidman avoidance procedure is its lack of a signal warning of an impending shock, as found in two-way shuttlebox avoidance. On the surface, at least, this lack would seem to pose a problem for Mowrer's two-factor theory: If there is no warning signal, then how can there be a classically conditioned CS to arouse fear and motivate the subject to escape from it? However, Douglas Anger proposed a solution based on the fact that the R-S interval in Sidman avoidance is a fixed length (e.g., 20 seconds). In Anger's analysis, time-correlated stimuli within the subject that occur near the end of the interval become classically conditioned CSs owing to their contiguity to shock. Responses that reset the timer remove these fear-evoking stimuli and replace them with stimuli more distant from shock (near the start of the interval) that evoke much less fear. Once again, subjects are responding, not to avoid the shock, but to escape from a fear-provoking CS.

Anger referred to these time-correlated CSs, somewhat tongue-in-cheek, as "conditioned aversive temporal stimuli," or "CATS." Anger's analysis thus came to be known as the "CATS theory of avoidance." The rats pressed the lever to escape from CATS . . .

The CATS theory ran into difficulty from data collected by Sidman in a modified version of his procedure. In this version, a warning signal came on five seconds before the end of the R-S interval and continued until shock occurred, unless a response occurred first and reset the R-S interval timer. This modification is called "signaled Sidman avoidance" to distinguish it from the standard variety in which there is no warning signal.

In unsignaled avoidance, subjects typically respond at a much higher rate than necessary, often resetting the R-S timer every few seconds. However, when the warning signal was added, subjects quickly learned to wait until the signal came on before responding. Even then they tended to respond in no apparent haste, waiting until almost the last possible moment before pressing the lever and resetting the R-S timer.

This behavior is problematic for the CATS theory, because the explicit signal presented just before the end of the R-S interval should have become highly aversive, just as the internal, time-correlated stimuli are supposed to do, and subjects should have been highly motivated to escape quickly from it by pressing the lever. Instead, subjects seemed to be using the signal more as a discriminative stimulus for lever-pressing. Rather than acting out of fear, the subjects seemed to be calmly acting to prevent the shock that would otherwise follow.

Sidman himself proposed a different analysis that did not depend on conditioned fear. Instead, it was based on "differential punishment of other behavior." The idea here is that, when the subject fails to press the lever, whatever other behaviors it engages in sooner or later will be paired with shock. This punishment of "other behaviors" suppresses them, leaving only lever-pressing unpunished and unsuppressed. In other words, the subjects in Sidman avoidance are lever-pressing because it's the only thing they can do that is not paired with shock.

Herrnstein and Hineline's Avoidance Procedure

Anger was able to apply Mowrer's two-factor theory to Sidman avoidance because the fixed R-S interval made it at least theoretically possible that subjects could become conditioned to time-related stimuli closely associated with shock. Herrnstein and Hineline (1966) devised their avoidance procedure to eliminate this possibility.

In the Herrnstein and Hineline avoidance procedure, shocks were programmed to occur on a variable time (VT) schedule. This is like a variable interval schedule except that there is no response requirement: a shock is delivered at the end of each programmed interval regardless of what the subject does. This schedule programmed shocks to occur at an average rate of one per two minutes. However, if the rat pressed a lever, this act would switch control of shock delivery to a second VT schedule, this one programming shocks at an average rate of one per four minutes, or at half the rate of the first schedule. When the interval currently being timed by the VT 4-minute schedule ended, the shock was delivered and control was switched automatically back to the VT 2-minute schedule. At this time, another response on the lever could again switch control over shock-delivery to the VT 4-minute schedule, and so on.

Note that in this procedure, there are no regular time intervals between response and the next shock. In fact, if the subject pressed the lever just as the current VT 4-minute interval was ending, a shock would follow the response almost immediately. However, on the average, the time to shock would be longer following a response than if the response had not been made.

The procedure as designed did not allow subjects to avoid all shocks, but by pressing the lever immediately after each shock, they could keep the VT 4-minute schedule in effect for most of the session and thereby receive about half as many shocks as they would if not responding. The rats required quite a bit of experience with this schedule before they "caught on," but most did eventually learn to press the lever and avoid unnecessary shocks.
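The schedule-switching contingency, and the shock-frequency reduction it makes available, can be sketched as a simulation. This is an assumption-laden illustration, not Herrnstein and Hineline's actual apparatus: I draw VT intervals from exponential distributions (a common way to generate variable-time schedules), and the function `shock_rate` and its policy argument are hypothetical. It compares never responding against pressing immediately after every shock.

```python
import random

# Sketch of the Herrnstein-Hineline contingency: shocks are delivered
# by a VT 2-min schedule; a lever press switches delivery to a VT
# 4-min schedule; each shock switches control back to VT 2-min.

def shock_rate(press_after_each_shock, n_shocks=2000, seed=0):
    """Return shocks per minute under one all-or-none response policy."""
    rng = random.Random(seed)
    t = 0.0
    on_vt4 = press_after_each_shock  # an immediate press switches schedules
    for _ in range(n_shocks):
        mean = 4.0 if on_vt4 else 2.0     # mean minutes to next shock
        t += rng.expovariate(1.0 / mean)  # wait out the current VT interval
        # Shock delivered; control reverts to the VT 2-min schedule,
        # unless the policy presses again right away.
        on_vt4 = press_after_each_shock
    return n_shocks / t

print(shock_rate(False))  # about 0.5 shocks/min (no responding)
print(shock_rate(True))   # about 0.25 shocks/min (press after every shock)
```

The simulation reproduces the point in the text: no response pattern eliminates shocks, but consistent responding roughly halves their long-run rate, which on Herrnstein and Hineline's account is precisely the reinforcer.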

In Herrnstein and Hineline's procedure, about all the subject can learn is the average rate at which shocks occur in the absence of responding, and the average rate at which they occur in the presence of responding. Herrnstein and Hineline proposed that subjects are in fact able to perceive these shock rates. If one makes the reasonable assumption that rats prefer a lower rate of shock-delivery over a higher one, they said, then the rats will learn a response that reduces the shock frequency. In other words, the reinforcer for avoidance responding is shock frequency reduction.

The idea that rats can sense the average rate of events over time, and that their choices are influenced by differences in such long-run outcomes, amounts to a molar account of avoidance behavior. Molar accounts assume that behavior can be sensitive to long-run outcomes. They require us to assume that there are mechanisms inside the organism that can do things like summate or average events over time -- rather sophisticated capabilities. In contrast, molecular accounts assume that behavior is sensitive only to immediate events, such as the following of a response by shock. An example is Sidman's differential punishment analysis, where the immediate association of bits of behavior with shock suppresses those behaviors.