You are familiar with devices used to measure physical characteristics: the bathroom scale to measure your body weight, the ruler to measure length or distance. But what about psychological characteristics such as introversion/extroversion, aptitude, or intelligence? For those characteristics, too, there are special devices used to measure them, called psychological tests. In fact, these are real measuring instruments, and are sometimes actually referred to as instruments.
To be of any value, psychological tests must have certain properties. In this paper I describe those characteristics and how we go about assessing them. After that, I review a selected sample of psychological tests.
Properties of a Good Psychological Test
Three important properties of any good psychological test are validity, reliability, and (where appropriate) standardization. Below I define each of these properties and describe ways in which those properties are established.
A psychological test is said to be valid if it measures what it is intended to measure. An intelligence test, for example, is valid to the extent that it does measure intelligence and not simply some other variable, such as knowledge. A number of ways to assess the validity of a test have been developed; here I will describe a few of them.
- Concurrent Validity -- results of the test agree with those of another test of accepted validity as a measure of that characteristic. A newly developed test of intelligence would be considered to have concurrent validity if it gave the same I.Q. values (within measurement error) as an established intelligence test.
- Predictive Validity -- predictions based on the results agree with what one would expect if the test is a valid measure of the characteristic. A newly developed test of intelligence would be considered to have predictive validity if those who score high on the test tend to do very well in academic settings or other areas thought to require high intelligence, while those who score low on the test do poorly in those areas.
- Face Validity -- examination of the test reveals that the test appears to measure what it is intended to measure. For example, a test of mathematical aptitude contains mathematical and logical problems to solve. Face validity is a relatively poor index of the validity of the test as gauged by other methods -- a test may have low face validity and yet prove to have good predictive validity, for example.
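As a rough illustration of the first two checks above, here is a minimal Python sketch. The function names, the data, and the tolerance of 5 points standing in for "measurement error" are all assumptions made for the example, not values from any real test. The first function asks what fraction of people get nearly the same score on a new test and an established one (concurrent validity); the second asks whether high scorers do better than low scorers on some later outcome (predictive validity).

```python
from statistics import mean

def concurrent_agreement(new_scores, established_scores, tolerance):
    """Fraction of people whose two scores differ by at most `tolerance`."""
    pairs = list(zip(new_scores, established_scores))
    within = sum(1 for new, old in pairs if abs(new - old) <= tolerance)
    return within / len(pairs)

def predictive_gap(test_scores, outcomes):
    """Mean outcome of the top half of scorers minus that of the bottom half."""
    ranked = sorted(zip(test_scores, outcomes))  # sort people by test score
    half = len(ranked) // 2
    low = [outcome for _, outcome in ranked[:half]]
    high = [outcome for _, outcome in ranked[-half:]]
    return mean(high) - mean(low)

# Hypothetical data for six people (invented for illustration only).
new_iq = [92, 105, 118, 99, 130, 85]          # scores on the new test
established_iq = [95, 103, 120, 97, 127, 88]  # scores on an accepted test
grade_point = [2.4, 3.0, 3.5, 2.9, 3.8, 2.1]  # a later academic outcome

# tolerance=5 is an assumed stand-in for the test's measurement error.
agreement = concurrent_agreement(new_iq, established_iq, tolerance=5)
gap = predictive_gap(new_iq, grade_point)

print(f"share of scores within tolerance: {agreement:.2f}")
print(f"high-minus-low outcome gap: {gap:.2f}")
```

A share near 1.0 and a clearly positive gap would be consistent with concurrent and predictive validity, respectively. A real validation study would use far larger samples and formal statistics.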
A psychological test is reliable to the extent that it produces similar results when the individual is repeatedly tested under the same conditions. There are two main methods used to assess reliability, described below.
- Test-Retest Reliability -- the same individuals are given the test twice, separated by some interval of time. The Pearson r correlation is then computed on the pairs of scores across individuals. A test is said to have high test-retest reliability if the correlation is 0.95 or better (where 1.0 equals perfect reliability).
Test-retest reliability is useful for tests of characteristics that change only slowly over time, such as intelligence. If the characteristic changes between administrations of the test, then the test reliability will appear to be low, when in fact the test may be reliably tracking real changes in the characteristic.
Another potential problem with this method is that individuals may remember their answers on the first administration of the test and simply repeat those answers on the second. If they do, then the test will appear to be more reliable than it really is. To avoid this problem, testmakers sometimes produce an alternate form of the test, which is supposed to be equivalent to the original but with somewhat different items. However, this introduces another problem, that of assuring that the two versions are indeed equivalent.
- Split-Half Reliability -- individuals take the test and then the items are divided into two equivalent halves, which are then separately scored. The two half-test scores for each individual are then correlated across individuals, as in the test-retest method.
The split-half method has the advantage that no time elapses between "administrations," so the characteristic being measured cannot change. However, the method has the same disadvantage as the use of alternate forms with the test-retest method: the split halves may not be exactly equivalent and, if not, then the true reliability of the test will be underestimated.
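Both reliability methods come down to computing a Pearson r on pairs of scores. The sketch below shows one way this might look in Python. The scores are invented, and splitting the test into odd-numbered and even-numbered items is only one assumed way of forming "equivalent halves"; the text above does not prescribe a particular split.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_half_scores(item_responses):
    """Score the odd- and even-numbered items separately for one person."""
    odd_items = sum(item_responses[0::2])   # items 1, 3, 5, ...
    even_items = sum(item_responses[1::2])  # items 2, 4, 6, ...
    return odd_items, even_items

# Test-retest: hypothetical scores for five people on two administrations.
first_admin = [98, 110, 125, 87, 102]
second_admin = [101, 108, 127, 90, 100]
test_retest_r = pearson_r(first_admin, second_admin)

# Split-half: each row is one person's item scores (1 = correct, 0 = not).
responses = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 0],
]
halves = [split_half_scores(person) for person in responses]
odd_scores = [odd for odd, _ in halves]
even_scores = [even for _, even in halves]
split_half_r = pearson_r(odd_scores, even_scores)

print(f"test-retest r: {test_retest_r:.3f}")
print(f"split-half r:  {split_half_r:.3f}")
```

The test-retest value can be compared directly against the 0.95 criterion mentioned above. With only five people and eight items the numbers here mean little; the point is only to show where the pairs of scores come from in each method.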
In tests of physical characteristics such as weight, it is possible to establish the accuracy of the measurement by comparing measurements against a set of known standards. For example, a scale could be checked against standard weights of 50 grams, 100 grams, 500 grams, and so on. If inaccuracies were found, the scale would be calibrated to remove them. Standard samples for many variables are available from the National Institute of Standards and Technology (formerly the National Bureau of Standards).
For psychological characteristics, there are no standard samples that one can purchase and use to evaluate the accuracy of the test. (For example, you cannot rent a person known to have an I.Q. of exactly 100.) Thus, to standardize psychological tests, a different method is needed. What is actually done is to administer the test to a large sample of individuals from the population for which the test is intended, and then compute certain group statistics, usually the mean and standard deviation. These provide the average value across individuals and the amount of variability, and are used to determine a formula for converting raw scores to standard scores. For example, different I.Q. tests are standardized so that the average I.Q. on the test is 100.
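The conversion from raw scores to standard scores can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw scores are invented, the sample is far smaller than any real norming sample, and the standard deviation of 15 is an assumed target (the text above specifies only the mean of 100). The function name `make_standardizer` is hypothetical.

```python
from statistics import mean, pstdev

def make_standardizer(norm_sample, target_mean=100.0, target_sd=15.0):
    """Build a raw-to-standard-score converter from a norming sample.

    target_sd=15.0 is an assumed value chosen for this example.
    """
    sample_mean = mean(norm_sample)
    sample_sd = pstdev(norm_sample)  # variability of the norming sample

    def to_standard(raw_score):
        # How many standard deviations above or below the sample mean?
        z = (raw_score - sample_mean) / sample_sd
        # Rescale so the norming sample has the target mean and SD.
        return target_mean + target_sd * z

    return to_standard

# Hypothetical raw scores (number of items correct) from a norming sample.
norm_raw_scores = [31, 42, 38, 45, 36, 40, 48, 34, 39, 47]
to_iq = make_standardizer(norm_raw_scores)

# A raw score equal to the sample mean maps to the target mean.
average_person = to_iq(mean(norm_raw_scores))
standardized = [to_iq(raw) for raw in norm_raw_scores]

print(f"standard score for an average raw score: {average_person:.1f}")
print(f"standard score for a raw score of 48: {to_iq(48):.1f}")
```

Once the converter is built from the norming sample, it is applied unchanged to the raw scores of anyone who later takes the test, which is what makes scores comparable across individuals.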
Some Examples of Psychological Tests
Psychological tests abound; here I provide only a few major categories and examples.
- Intelligence Tests -- these measure aspects of intelligence that contribute to good academic performance. I'll provide more information on these later.
- Personality Tests -- these measure personality characteristics. Different tests measure different characteristics, according to the theory of personality on which they were based. Examples include:
  - Minnesota Multiphasic Personality Inventory (MMPI) -- measures personality traits on several scales based on true-false answers to more than 500 statements. Scale values are plotted on a set of parallel scales and the dots are connected by lines to form a "profile" used in diagnosis and assessment. Objectively scored.
  - Thematic Apperception Test (TAT) -- individual is shown a series of 8" X 10" cards, each depicting a scene of some sort, and is asked to tell a story based on that scene. Designed to allow the person to "project" something about himself or herself into the answers (a type of projective test). Not objectively scored.
  - Rorschach Inkblot Test -- individual is shown a series of left-right symmetrical inkblots and is asked to describe what he or she sees there. Another projective test.
- Aptitude Tests -- designed to indicate an individual's aptitude or talent in some area. They work by assessing the degree to which the individual already has the requisite knowledge and skills. The SAT that high-school students take for admission to college assesses aptitude for college-level work. In fact, it was once called the "Scholastic Aptitude Test," but has been renamed for political reasons.
- Achievement Tests -- these measure what an individual knows or can do. A familiar example to Indiana students is the ISTEP test, designed to assess what Indiana primary and secondary school students have learned.
- Interest Inventory -- I like to mention this one because I took it myself as an undergraduate and found it helpful when I was trying to decide on a career. The test asks you to indicate, for each of a large number of activities, whether you are interested or not interested in doing it. Your results are compared to the pattern marked by successful individuals in each of a variety of occupational fields. If your responses match up well with those of, say, a successful architect, then you would probably enjoy the sort of work an architect does. This does not tell you, however, whether you have any aptitude for the work! (For that you need to take other tests.)
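To make the idea of "matching up" with an occupational pattern concrete, here is a toy Python sketch. It is not the scoring method of any actual interest inventory: the activities, the occupational profiles, and the simple proportion-of-agreement score are all invented for illustration.

```python
def match_score(responses, profile):
    """Fraction of shared activities on which person and profile agree."""
    shared = responses.keys() & profile.keys()
    agree = sum(1 for activity in shared if responses[activity] == profile[activity])
    return agree / len(shared)

# Hypothetical answers: True = interested, False = not interested.
my_responses = {
    "sketching buildings": True,
    "solving equations": True,
    "public speaking": False,
    "caring for animals": False,
}

# Hypothetical patterns typical of successful people in two fields.
profiles = {
    "architect": {
        "sketching buildings": True,
        "solving equations": True,
        "public speaking": False,
        "caring for animals": False,
    },
    "veterinarian": {
        "sketching buildings": False,
        "solving equations": True,
        "public speaking": False,
        "caring for animals": True,
    },
}

# Rank occupations from best match to worst.
scores = {job: match_score(my_responses, pattern) for job, pattern in profiles.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for job in ranked:
    print(f"{job}: {scores[job]:.2f}")
```

A high match says only that the pattern of interests is similar, not that the person has the aptitude for the work.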