Schedules of Reinforcement

B. F. Skinner's initial research on operant conditioning with laboratory rats had them pressing a lever for small pellets of lab chow. Each lever-press produced a single pellet. Skinner had to make the pellets himself and a typical rat could earn and consume upwards of 300 pellets in a single experimental session. He was having trouble keeping up with the demand.

Then one day, Skinner returned to the lab after an experimental session had finished and discovered that the relay actuating the feeder had become unreliable: only occasionally would one of the rat's lever-presses succeed in producing a pellet -- most went unreinforced. One might have expected the rate of responding on the lever to fall, but instead, it actually went up.

One immediate implication of this result was that Skinner could begin to schedule reinforcers like this on purpose and get by on far fewer pellets per session. But far more important was Skinner's realization that behavior could not be completely predicted simply from the knowledge that certain responses were being reinforced. The rate of responding and its pattern of change over time depends not only on what response is reinforced, but on the way in which reinforcers are scheduled -- on the schedule of reinforcement. With his usual thoroughness, Skinner began the job of determining the pattern of responding characteristic of various reinforcement schedules.

The Cumulative Record

To examine how the rate of responding varies over time on different schedules, Skinner created a device called a cumulative recorder, which automatically produces a type of graph known as a cumulative record. In this graph, each response moves the pen upward one small notch, while the pen moves horizontally at a constant slow speed across the paper. At any given moment during the experimental session, the vertical position of the pen indicates the total number of responses produced since the session started, a number called the cumulative responses. The horizontal distance from the origin at the left edge of the paper indicates the number of minutes and seconds since the session began.

A steady rate of responding causes the pen to move upward at a steady rate while, at the same time, the paper is being moved under the pen at a constant rate. The combination of these two motions causes the pen to draw a line whose slope directly indicates the rate of responding -- steep slopes for high rates, shallow slopes for low rates, and a horizontal line if the response rate falls to zero. A laboratory rat that began the session responding at a high rate and then gradually slowed down would produce a line that began by rising steeply and then gradually curved to become more and more nearly horizontal.
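The geometry of the record is easy to verify with a short simulation. In this sketch (the data and function names are hypothetical, purely for illustration), the pen's height at time t is just the count of responses so far, and the slope over any stretch of the record is responses divided by elapsed time:

```python
# Sketch: computing a cumulative record from a list of response times.
# The data are hypothetical; a real cumulative recorder draws this mechanically.

def cumulative_count(response_times, t):
    """Number of responses that have occurred by time t (the pen's height)."""
    return sum(1 for rt in response_times if rt <= t)

def slope(response_times, t0, t1):
    """Slope of the record between t0 and t1: responses per unit time."""
    return (cumulative_count(response_times, t1)
            - cumulative_count(response_times, t0)) / (t1 - t0)

# A subject responding steadily at 2 responses/second for 10 seconds...
steady = [i * 0.5 for i in range(20)]    # responses at 0.0, 0.5, 1.0, ...

# ...versus one that starts fast and gradually slows down.
slowing = [0.1, 0.3, 0.5, 0.8, 1.2, 2.0, 3.5, 6.0, 9.5]

print(slope(steady, 0, 5), slope(steady, 5, 10))    # roughly equal: straight line
print(slope(slowing, 0, 2), slope(slowing, 5, 10))  # early slope much steeper
```

A steady responder yields the same slope everywhere (a straight line); the slowing responder's record starts steep and flattens, exactly the curved record described above.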

Five Basic Schedules of Reinforcement

The number of possible schedules of reinforcement is infinite. However, relatively few of them have been studied scientifically. We will be learning about some of the more complex schedules a bit later in the course, but here I present only five of the simpler schedules that were initially investigated by Skinner.

Patterns of Responding on Ratio and Interval Schedules

Each of these schedules produces a characteristic pattern of responding, as revealed by the cumulative record. Here I describe each pattern and offer some theoretical explanations for those patterns.

Fixed Ratio

Responses occur at a high rate, usually without significant pauses, while the subject completes the fixed ratio requirement (the ratio run). Following reinforcer delivery, the subject does not immediately resume responding, producing the post-reinforcement pause. The cumulative record resembles a set of stair steps, with alternating runs (steeply sloped lines) and pauses (horizontal lines). The post-reinforcement pauses grow longer as the ratio size is made larger.

Current evidence suggests that the post-reinforcement pauses occur because of the delay between the resumption of responding and the eventual delivery of the reinforcer. With high ratios this delay can be considerable. Delay of reinforcement is known to weaken the effect of reinforcer delivery on the probability of the response. Following reinforcer delivery, the subject has the option of initiating a new ratio run or engaging in some other activity, which may provide its own inherent, immediate reinforcement. Because of the delay between initiating a ratio run and receiving the reinforcer, starting the run may be less attractive than these other activities, so the subject does them first, creating the post-reinforcement pause.

As for the ratio run, it is generally true of ratio schedules that higher response rates yield higher rates of reinforcement. This builds in a positive feedback loop that tends to drive response rates to the highest levels the subject can comfortably sustain.
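The delivery rule on a fixed ratio schedule is simple enough to sketch as a toy model (the closure-based design and names here are my own illustration, not apparatus software): every Nth response produces the reinforcer, and all the others go unreinforced.

```python
# Sketch of the fixed-ratio (FR) rule: every Nth response produces a reinforcer.
# The function names and structure are illustrative assumptions.

def fr_schedule(ratio):
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count >= ratio:
            count = 0
            return True      # reinforcer delivered
        return False         # response goes unreinforced
    return respond

lever = fr_schedule(5)                   # an FR 5 schedule
outcomes = [lever() for _ in range(10)]
print(outcomes)  # reinforcers on the 5th and 10th responses only
```

Note that on FR 5 every fifth response pays off, so a subject that responds twice as fast earns reinforcers twice as fast -- the positive feedback relationship described above.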

Variable Ratio

Variable ratio schedules tend to sustain high rates of responding, with little evidence of pausing, other than the time required to retrieve the reinforcer (if necessary). The cumulative record is a steeply angled line with occasional "blips" marking the moments when the reinforcer was delivered.

The absence of significant post-reinforcement pauses is a striking feature of variable ratio schedules, given how ubiquitous such pauses are on fixed ratio schedules. The lack of pauses appears to be due to the fact that occasionally even the very first response or two after reinforcement will yield another reinforcer, because the schedule includes a few very low ratios among the variable ratio sizes it provides. Consequently, initiating a new ratio run is sometimes strongly reinforced by nearly immediate reinforcer delivery, so this behavior remains a relatively high-probability one even right after completion of the previous ratio. The steeply rising cumulative record is due both to the elimination of pauses and, as on fixed ratio schedules, to the built-in direct relationship between rate of responding and rate of reinforcement.
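The role of those occasional very low ratios can be seen in a toy model of VR delivery (the ratio list, seed, and names are illustrative assumptions, not from any standard procedure): each reinforcer sets a freshly drawn response requirement, and whenever a requirement of 1 is drawn, the very first response of the new run pays off.

```python
import random

# Sketch of variable-ratio (VR) delivery: each reinforcer requires a freshly
# drawn number of responses. The ratio list and names are illustrative.

def vr_schedule(ratios, seed=0):
    rng = random.Random(seed)
    state = {"need": rng.choice(ratios), "count": 0}
    def respond():
        state["count"] += 1
        if state["count"] >= state["need"]:
            state["count"] = 0
            state["need"] = rng.choice(ratios)   # draw the next requirement
            return True      # reinforcer delivered
        return False
    return respond

# With a ratio of 1 in the mix, a run can pay off on its very first response --
# the feature thought to keep post-reinforcement pauses away.
lever = vr_schedule([1, 2, 5, 8, 14])            # mean ratio 6: roughly "VR 6"
earned = sum(lever() for _ in range(6000))
print(earned)  # close to 6000 / 6 = 1000 reinforcers
```

Over many responses the number of reinforcers earned approaches the response count divided by the mean ratio, so here too faster responding directly yields faster reinforcement.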

Fixed Interval

The cumulative record for fixed-interval responding can closely resemble that of the fixed ratio. However, the transition from non-responding (following reinforcement) to high-rate responding (near the end of the fixed interval) usually takes the form of a curve rather than a sudden jump from flat to steeply rising. The rise usually begins at a particular proportion of the interval, such as just past the half-way point, regardless of the interval's actual size in seconds. The maximum rate of responding usually occurs just as the interval ends, guaranteeing that a response will occur almost immediately after the interval ends. (Remember, on fixed interval schedules, it is only the first response after the interval ends that produces the reinforcer!) This pattern of gradual acceleration of responding from beginning to end of each interval produces a fluted or "scalloped" pattern, which has been labeled the fixed-interval scallop.
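The delivery rule itself -- only the first response after the interval has elapsed is reinforced, which also restarts the clock -- can be sketched as a toy model. The response times and names here are hypothetical.

```python
# Sketch of the fixed-interval (FI) rule: the first response at or after
# `interval` seconds since the last reinforcer produces the next one;
# all earlier responses do nothing. A toy model with hypothetical times.

def fi_schedule(interval):
    last = 0.0                          # time of the last reinforcer
    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t                    # reinforcer delivered; clock restarts
            return True
        return False
    return respond

lever = fi_schedule(30.0)               # an FI 30-s schedule
results = [lever(t) for t in (10.0, 29.9, 30.5, 31.0, 61.0)]
print(results)  # [False, False, True, False, True]
```

Notice that the responses at 10.0 s and 29.9 s are wasted; only being the first responder past the 30-s mark matters, which is why a well-timed subject could in principle respond just once per interval.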

The explanation for the scalloped pattern of responding on FI schedules is still a matter of controversy. One explanation begins by noting that subjects are not perfect at timing intervals. If they were, they could simply wait until the interval ended and then respond once to receive the reinforcer. Given some variability in their time estimates, however, a subject using this strategy might wait too long to respond, reducing the overall rate of reinforcement. Responding too soon means wasted responses, but this may be of relatively little consequence to the subject compared to waiting too long. By beginning to respond early, subjects guarantee that they will be responding when the interval ends, and thus maximize the rate of reinforcement.

Another explanation refers to the effect of reinforcement delay on responding. In this view, responses occurring near the end of the interval, because they are soon followed by reinforcement, are strongly reinforced and can be expected to occur at a high rate. Responses occurring earlier in the interval are also followed by reinforcement, but at a greater delay and therefore less effectively, so the rates at which these earlier responses occur will be lower. This analysis thus predicts a gradually rising rate of responding as the interval elapses. However, it cannot easily account for the fact that the point in the interval at which the rate begins to rise is a constant proportion of the interval length.

FI schedules are sometimes used in research on drug effects: any drug-induced disruption of the subject's ability to regulate its behavior according to time will produce a corresponding disruption of the fixed-interval scallop.

Variable Interval

Variable interval schedules tend to produce steady, relatively moderate response rates. This pattern is similar to that produced by variable ratio schedules, except that at the same overall rate of reinforcement, response rates are higher on variable ratio schedules than on variable interval schedules.

The reason for the lower response rates on variable interval schedules is that on these schedules, the rate of reinforcement is almost independent of the rate of responding, given at least a moderate response rate. No matter how rapidly the subject responds, a reinforcer is not going to be delivered until after the interval currently being timed is finished. So response rates will rise to the point that reinforcers are being collected almost as soon as they become available, and no higher. Because the size of the next interval cannot be predicted, the subject has no choice but to keep responding at that rate until reinforcer delivery. As in variable ratio schedules, the return to responding is occasionally reinforced almost immediately (when the interval being timed is very short) so subjects tend not to pause after reinforcement but begin responding again as soon as the reinforcer has been collected.
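This near-independence of reinforcement rate from response rate can be illustrated with a toy simulation (the model, the interval list, and the names are my own assumptions, not a standard procedure): a subject responding four times as fast collects almost no extra reinforcers.

```python
# Sketch illustrating why reinforcement rate on a variable-interval (VI)
# schedule is nearly independent of response rate. Illustrative model only:
# the next interval starts when the previous reinforcer is collected.

def simulate_vi(response_rate, intervals, session=600.0):
    """Reinforcers collected by a subject responding steadily at
    `response_rate` responses/second over a `session`-second session."""
    t, i, collected = 0.0, 0, 0
    available_at = intervals[0]         # reinforcer becomes available then
    step = 1.0 / response_rate
    while t < session:
        t += step                       # the next response occurs
        if t >= available_at:           # a reinforcer was waiting: collect it
            collected += 1
            i = (i + 1) % len(intervals)
            available_at = t + intervals[i]
    return collected

vi = [5.0, 10.0, 15.0, 30.0]            # mean 15 s: roughly a "VI 15-s" schedule
slow, fast = simulate_vi(0.5, vi), simulate_vi(2.0, vi)
print(slow, fast)  # quadrupling the response rate barely changes the payoff
```

Once the subject responds often enough to collect each reinforcer soon after it becomes available, extra responses buy essentially nothing, which is why VI response rates settle at a steady, moderate level.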

The response rate sustained on variable interval schedules is highest when the average interval length is short and declines with increasing average interval size. For example, subjects will respond faster on a VI 10-s schedule than on a VI 30-s schedule.

Because on VI schedules, reinforcement rate tends to be relatively independent of response rate, and because response rates tend to be relatively steady, VI schedules provide excellent baselines against which to evaluate the effects of drugs on behavior. For example, amphetamine produces an increase in response rates on VI schedules. Because the increased response rate produces little or no change in reinforcement rate, the response-rate effects of amphetamine are not confounded by any effect of response rate on rate of reinforcement.