Instrumental Conditioning

Discrimination and the Discriminative Stimulus

In Thorndike's Law of Effect, responses that occur in a particular context or "situation" were said to become "connected" to the situation, "so that, when the situation recurs, the response is more likely to occur" (Thorndike, 1898, 1911). In today's vocabulary, the term "situation" has been replaced by the term "discriminative stimulus" (symbolized as SD -- pronounced "S dee"). This paper discusses the discriminative stimulus and the concept of stimulus control, and describes how to produce a generalization gradient in the context of instrumental conditioning.

The Discriminative Stimulus and Stimulus Control

The term "discriminative stimulus" has been defined in at least two ways. The first definition emphasizes procedure:

Discriminative stimulus or SD: A stimulus in the presence of which a response is reinforced.

When Thorndike's cats operated the latch mechanism of a puzzle box and escaped from the box, the situational stimuli associated with being in the box were present when the response occurred, and the response resulted in the unlatching of the door. Thorndike's situational stimuli thus qualify as a (complex) discriminative stimulus according to this definition.

But there is something wrong with this definition -- it is not enough simply to reinforce a response in the presence of some stimulus. You must also see to it that the relationship between response and reinforcement changes in the absence of the stimulus. What is typically done is that responses are reinforced when the discriminative stimulus is present and not reinforced when the discriminative stimulus is absent. The occurrence of the SD must signal a change in the response-reinforcer contingency.

There is also another problem with the above definition. Although it defines an SD in terms of responses being reinforced in its presence, a perfectly good SD can be created by punishing a response in the presence of a stimulus. To be complete, then, the definition would have to change "reinforced" to "reinforced or punished."

The second definition of "discriminative stimulus" emphasizes process or result:

Discriminative stimulus or SD: A stimulus that controls the probability of a response.

Because responses have been reinforced in the presence of the discriminative stimulus (first definition), and not in its absence, they tend to occur more often when the SD is present than when it is absent.

As an example, consider a pigeon pecking away at a response key for grain reward. When the key is green (illuminated from behind by a green light), we occasionally reinforce key-pecks. After a few minutes, we turn off the green light and switch the schedule to extinction (key-pecks are never reinforced). From time to time we turn the green key-light on or off, continuing to reinforce key-pecks only in the presence of the green light.

What we will observe is that the pigeon at first pecks at the key whether the green light is on or not. As the pigeon gains more experience with the contingencies, however, pecking when the green light is off gradually diminishes until key-pecks occur only in the presence of the green key. The green illumination of the key is serving as a discriminative stimulus, and response probability (as reflected in the rate of responding) is now much higher in its presence than in its absence. We say that the green key-light exerts stimulus control over key-pecking:

Stimulus control: The ability of a discriminative stimulus to alter the probability of a response.

This relationship between discriminative stimulus, response, and reinforcement has been described as the three-term contingency. Because responses always occur and are reinforced (or not) or punished (or not) in some stimulus context, one can expect response probabilities to vary sensitively as contexts change. This accounts for the fact that you are likely to behave very differently in one context (say, the classroom) than you do in another context (say, at a party): Different discriminative stimuli are present, indicating that different reinforcement contingencies are in effect -- what is likely to be rewarded, ignored, or punished changes.
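To make the divergence of responding concrete, here is a minimal simulation sketch in Python. It assumes a simple linear-operator learning rule with arbitrary, illustrative parameter values (the learning rate, trial structure, and starting probabilities are assumptions for the sketch, not part of the pigeon demonstration described above); it simply shows how reinforcing pecks only when the green light is on drives response probability apart in the two contexts.

    import random

    # Hypothetical model for illustration (not the actual experiment): response
    # probability in each stimulus context is nudged toward 1 after a reinforced
    # peck and toward 0 after an unreinforced peck (a linear-operator rule).

    LEARNING_RATE = 0.05                              # assumed value
    p_peck = {"green_on": 0.5, "green_off": 0.5}      # start indifferent

    def trial(context):
        """One brief opportunity to peck in the given stimulus context."""
        pecked = random.random() < p_peck[context]
        if pecked:
            reinforced = (context == "green_on")      # grain only during the SD
            target = 1.0 if reinforced else 0.0
            p_peck[context] += LEARNING_RATE * (target - p_peck[context])
        return pecked

    for block in range(10):
        for _ in range(100):
            trial(random.choice(["green_on", "green_off"]))
        print(f"block {block}: p(peck | green on) = {p_peck['green_on']:.2f}, "
              f"p(peck | green off) = {p_peck['green_off']:.2f}")

Run over successive blocks, the printed probabilities separate in the way the text describes: pecking persists when the green light is on and gradually drops out when it is off.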

Producing a Generalization Gradient

Imagine that we repeat the demonstration described above, but instead of using a green light as the SD, we project a vertical line onto the key while occasionally reinforcing key-pecks. Occasionally we replace the vertical line with a horizontal one, and impose extinction in its presence. (We will refer to the horizontal line as "S-delta" to distinguish it from the vertical line or SD.) After a time, the pigeon will be pecking the key at a goodly rate in the presence of SD and pecking little or not at all in the presence of S-delta. At this point we devise a test.

In the test, we present a series of lines on the key oriented at different angles from the vertical, all during extinction conditions, in a randomized order, and observe the rate of key-pecking in each case. We will find, not surprisingly, that the pigeon responds at its highest rate during SD and at its lowest rate during S-delta. But what about line orientations between the vertical and horizontal? For those lines, the pigeon will respond at average rates that vary smoothly from higher to lower as the line orientation changes from vertical to horizontal, whether the line is rotated clockwise or counterclockwise. The result will be a smooth curve that peaks at SD and declines in either direction of rotation toward S-delta -- a typical generalization gradient similar in appearance to those produced in generalization tests in classical conditioning (see the figure at right, showing some real data). What is changing in this case, however, is the rate at which the pigeon emits an operant -- usually thought of as a voluntary response.
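The shape of such a gradient can be illustrated with a toy model. The sketch below assumes a Gaussian similarity function of orientation difference, with made-up values for the peak rate and the gradient's width; it is not fitted to any real data, but it produces the same qualitative pattern: a peak at the vertical SD that declines smoothly toward the horizontal S-delta in either direction of rotation.

    import math

    # Toy similarity model (assumed form and parameter values; not fitted to data):
    # responding generalizes from the trained SD (a vertical line, 90 degrees)
    # according to a Gaussian function of the difference in orientation.

    SD_DEGREES = 90        # the vertical SD
    PEAK_RATE = 60.0       # assumed rate at the SD, in pecks per minute
    GRADIENT_WIDTH = 30.0  # assumed standard deviation of the gradient, in degrees

    def predicted_rate(test_orientation):
        """Predicted pecking rate (pecks/min) for a line at the given orientation."""
        distance = test_orientation - SD_DEGREES
        similarity = math.exp(-(distance ** 2) / (2 * GRADIENT_WIDTH ** 2))
        return PEAK_RATE * similarity

    # Orientations from 0 (horizontal, S-delta) through 90 (vertical, SD) to 180
    # (horizontal again) cover both directions of rotation away from the SD.
    for degrees in range(0, 181, 15):
        print(f"{degrees:3d} degrees: {predicted_rate(degrees):5.1f} pecks/min")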

If you observe carefully during the generalization tests, you will discover that what changes as the line rotates from vertical to horizontal is not a lowering of an otherwise steady rate of responding; rather, the pigeon begins to alternate between periods of responding and not responding -- in other words, between behaviors appropriate to SD and those appropriate to S-delta. As the line orientation becomes less like SD and more like S-delta, the balance shifts more and more toward S-delta-appropriate behavior (i.e., doing something other than key-pecking). Consequently, the average rate of key-pecking in the presence of the line declines.
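This time-allocation interpretation can be expressed numerically. In the sketch below (with assumed, illustrative rates), the average rate at an intermediate orientation is treated as a mixture of a constant within-bout pecking rate and near-zero S-delta-appropriate behavior; the average falls as the fraction of time spent pecking shrinks, even though the local rate during pecking bouts never changes.

    # Time-allocation sketch with assumed, illustrative numbers: the overall rate
    # is a weighted average of the rate during pecking bouts and the (near-zero)
    # rate during S-delta-appropriate behavior.

    RATE_WHILE_PECKING = 60.0       # assumed local rate within pecking bouts (pecks/min)
    RATE_WHILE_DOING_OTHER = 0.0    # assumed rate while doing something else

    def average_rate(fraction_of_time_pecking):
        """Overall session rate given the fraction of time spent in pecking bouts."""
        return (fraction_of_time_pecking * RATE_WHILE_PECKING
                + (1.0 - fraction_of_time_pecking) * RATE_WHILE_DOING_OTHER)

    # As the line rotates from vertical toward horizontal, the fraction of time
    # spent pecking shrinks, and the average rate falls accordingly.
    for fraction in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(f"pecking {fraction:.0%} of the time -> {average_rate(fraction):4.1f} pecks/min")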