Intelligence Tests

In this paper I trace some of the main events in the historical development of intelligence testing and describe a few of the available tests.

Galton's Attempt

The English aristocrat Francis Galton made the first serious attempt to develop measures that would reflect a person's intelligence. Believing that intelligence was mainly (though not exclusively) a matter of having the right genes, Galton reasoned that superior intelligence would be a reflection of superior physical development of brain and body; if so, then simple physical measures might provide a reliable index of intellectual prowess. To investigate this possibility, he set about measuring a variety of physical variables, such as reaction time and grip strength, and looked for a correlation between these measures and measures of success in endeavors thought to reflect intellectual ability, such as one's class rank in school or one's occupational level. Unfortunately for Galton's hypothesis, no such relationship was evident, and Galton's attempt must be counted a failure.

Binet's Success

The first successful test of intelligence was developed by French psychologist Alfred Binet in response to a request by French public school officials for a test that could identify school children at risk of falling behind their peers in academic achievement. The result was the Binet-Simon intelligence test.

The Binet-Simon test consists of a variety of items intended to reflect knowledge and skills the average French school child of a given age would have. These items are graded in difficulty according to age, so that, for example, a younger child would tend to miss items that the average twelve-year-old would be able to answer. The test is administered individually, one-on-one, by a person trained to do so, and requires upwards of two hours to complete.

The scoring of the test produces a number called the child's mental age. The mental age reflects the level at which the child performed on the test -- if the child performed at the level of the average ten-year-old, for example, then the child would be assigned a mental age of ten, regardless of the child's chronological age (physical age). One compares the child's mental age to his or her chronological age. If the mental age is the same as the chronological age, then the child is average. If the mental age is higher than the chronological age, then the child is mentally "advanced" or gifted. If the mental age is lower than the chronological age, then the child is mentally "retarded," or behind his or her peers in intellectual development.

The Binet-Simon test and its successors measure intelligence by assessing intellectual skills and knowledge. They assume that the individual has had the opportunity to learn these skills and knowledge; if the person had the opportunity to learn them and did not, then this is assumed to reflect a deficit in intelligence. On the other hand, if the person has not had the exposure needed to learn these things, the failure to demonstrate knowledge of them says nothing about the person's intelligence. Ignoring this truth has led to some unwarranted conclusions being drawn based on test results.

The Army Alpha and Beta Tests

During World War I, the U.S. Army saw a need for a quick-to-administer intelligence test to be used when deciding what sort of advanced training a recruit would receive. Psychologists Lewis Terman, Robert Yerkes, and others collaborated to develop two versions of the test, known as the Army Alpha and Army Beta tests. The Alpha test emphasized verbal abilities and was given to everyone. The Beta test emphasized non-verbal abilities and was to be given to those who performed poorly on the Alpha test and were suspected of having language problems.

A large number of Army recruits took the Alpha version of the test, and after the war the data were analyzed, with a surprising result. It appeared that the average recruit had a mental age of around 13 -- a mild level of retardation. This result had mainly to do with the recruits' level of education rather than with low native intelligence, but Yerkes and others concluded incorrectly that the intelligence deficit was real, sounding alarm bells about the "menace of the feeble-minded."

The Stanford-Binet

In 1916, Lewis Terman of Stanford University translated the Binet-Simon test into English, adapted it to the American culture and school curriculum, and called it the Stanford-Binet. This test is still in use today, although it has undergone periodic revision over the years, the last one a significant revision based on a new model of intelligence. Initially the scores were reported in terms of Mental Age, just as in the original. Later, mental age and chronological age were used to compute a new metric called the Intelligence Quotient, or I.Q. This was computed using what is now called the ratio method, which involves plugging the numbers into the following formula:
(Mental Age/Chronological Age) * 100 = I.Q.
No matter what the child's chronological age, if the mental age matches the chronological age, then the I.Q. will equal 100. An I.Q. of 100 thus indicates a child of average intellectual development. If the mental age is above the chronological age (a more gifted than average child), then the I.Q. is above 100; if the mental age is below the chronological age (a developmentally retarded child), then the I.Q. is below 100.
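
The ratio method can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name ratio_iq and the example ages are my own and do not come from any test manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio I.Q. = (mental age / chronological age) * 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply first so that simple whole-number ages give exact results.
    return 100 * mental_age / chronological_age


# Illustrative ten-year-olds with different mental ages.
print(ratio_iq(10, 10))  # 100.0 (average)
print(ratio_iq(12, 10))  # 120.0 (mental age above chronological age)
print(ratio_iq(8, 10))   # 80.0 (mental age below chronological age)
```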

The Wechsler Tests

Psychologist David Wechsler was unhappy with available intelligence tests such as the Stanford-Binet, as he felt that they placed too much emphasis on verbal abilities. To correct this problem he devised his own versions, similar to the Stanford-Binet in some ways, but including a number of tasks, called performance tasks, that did not require much in the way of verbal ability (like the old Army Beta test). Scoring the test yields three separate I.Q.s: a verbal I.Q., which correlates well with the Stanford-Binet I.Q.; a performance I.Q., based on the non-verbal items; and an overall (full-scale) I.Q., which combines the other two.

Comparing the verbal and performance I.Q.s can reveal possible problems that would not show up when using a test that reports only a single I.Q. For example, if the performance I.Q. is quite a bit higher than the verbal I.Q., this could indicate that the person has some sort of specific language problem. Further, more specific tests would then be indicated to identify the problem.

There are two main Wechsler intelligence tests: the Wechsler Adult Intelligence Scale (WAIS) and the Wechsler Intelligence Scale for Children (WISC).

Standardizing Intelligence Test Results

The old "ratio method" of computing I.Q. is no longer used. The currently used method is called the deviation method, and is based on the fact that I.Q. scores tend to closely follow a mathematical distribution known as the normal distribution, otherwise known as the "bell curve." The normal distribution essentially shows the relative number of scores in the population that have each possible value of the variable being plotted (e.g., the number of scores having an I.Q. value of 90, 91, 92, etc.). The curve has the shape of a bell, with few scores appearing at extreme distances on either side of the center and a large bulge of scores at and around the center. Once we "map" the I.Q. scores onto the normal distribution, we can state what percentage of scores fall at or below any given I.Q. value -- the percentile rank of that score. For example, a person having an I.Q. of 115 on a Wechsler test would fall at the 84th percentile, having outscored 84% of the population on the I.Q. test.
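
The percentile rank in the example above can be checked with a short Python sketch. It assumes that I.Q. scores follow a normal distribution exactly, with the Wechsler mean of 100 and standard deviation of 15; the function name percentile_rank is my own, and NormalDist comes from Python's standard statistics module (Python 3.8 or later).

```python
from statistics import NormalDist


def percentile_rank(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentage of scores at or below `iq`, assuming a normal distribution."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(iq)


print(round(percentile_rank(115)))  # 84 (one standard deviation above the mean)
print(round(percentile_rank(100)))  # 50 (exactly at the mean)
```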

To map the I.Q. scores onto the normal distribution, we give the test to a large standardization sample and compute the mean (average) and standard deviation (a measure of score variability) for the group. These statistics are then used in a conversion formula to convert the "raw" scores from the test into standard I.Q. scores having a predetermined mean and standard deviation. (For the Wechsler tests the mean will be set to 100 and the standard deviation to 15 I.Q. points.)
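
The conversion step can be sketched as follows. This is a minimal illustration, not the procedure of any particular test publisher: the raw scores in the sample are invented, a real standardization sample would be far larger, and the function name deviation_iq is my own. The sketch converts a raw score to a z-score (its distance from the sample mean in standard deviation units) and then rescales it to the chosen mean and standard deviation.

```python
from statistics import mean, stdev


def deviation_iq(raw_score, standardization_sample,
                 target_mean=100.0, target_sd=15.0):
    """Convert a raw test score into a deviation I.Q."""
    sample_mean = mean(standardization_sample)
    sample_sd = stdev(standardization_sample)  # sample standard deviation
    z = (raw_score - sample_mean) / sample_sd  # distance from the mean in SD units
    return target_mean + target_sd * z


# Hypothetical raw scores from a (far too small) standardization sample.
sample = [30, 35, 40, 45, 50]

print(deviation_iq(40, sample))  # 100.0, since 40 is the sample mean
print(deviation_iq(48, sample))  # above 100, since 48 is above the sample mean
```

In practice the sample mean and standard deviation would be computed separately for each age group, so that each person is compared only with others of the same age.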

The main advantage of the deviation method is that, regardless of the age group, a given I.Q. value will place a person at the same percentile rank. This is not necessarily true when using the ratio method.
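
To see why the ratio method can fail in this respect, suppose that ratio I.Q.s happen to be more spread out in one age group than in another. The standard deviations used below (12 and 20) are invented purely for illustration, as are the names in the code; the sketch also assumes that the scores within each age group are normally distributed.

```python
from statistics import NormalDist

# Hypothetical spreads of ratio I.Q.s in two age groups (invented values).
ratio_iq_sd_by_age = {"age 6": 12.0, "age 12": 20.0}


def percentile_rank(iq, mean, sd):
    """Percentage of scores at or below `iq`, assuming a normal distribution."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(iq)


# The same ratio I.Q. of 120 lands at a different percentile in each group.
ratio_ranks = {
    group: percentile_rank(120, mean=100.0, sd=sd)
    for group, sd in ratio_iq_sd_by_age.items()
}
for group, rank in ratio_ranks.items():
    print(group, round(rank, 1))

# Under the deviation method every age group is rescaled to the same standard
# deviation (15 on the Wechsler tests), so 120 means the same thing everywhere.
deviation_rank = percentile_rank(120, mean=100.0, sd=15.0)
print("deviation method, any age:", round(deviation_rank, 1))
```

Under these assumed spreads, the same ratio I.Q. of 120 ranks higher among the six-year-olds than among the twelve-year-olds, whereas the deviation I.Q. gives a single percentile rank for all ages.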