1.1. Proper linear models
A particularly successful kind of SPR is the proper linear model (Dawes
1982, 391). Proper linear models have the following form:
P = w1c1 + w2c2 + w3c3 + w4c4
where cn is the value for the nth cue, and wn is the weight assigned to the
nth cue. Our favorite proper linear model predicts the quality of the vintage
for a red Bordeaux wine. For example, c1 reflects the age of the vintage,
while c2 , c3 , and c4 reflect climatic features of the relevant Bordeaux
region. Given a reasonably large set of data showing how these cues correlate
with the target property (the market price of mature Bordeaux
wines), weights are then chosen so as to best fit the data. This is what
makes this SPR a proper linear model: The weights optimize the relationship
between P (the weighted sum of the cues) and the target property as
given in the data set. A wine-predicting SPR was developed by Ashenfelter,
Ashmore, and Lalonde (1995). It has done a better job predicting the price
of mature Bordeaux red wines at auction (predicting 83% of the variance)
26 Epistemology and the Psychology of Human Judgment
than expert wine tasters. Reaction in the wine-tasting industry to such
SPRs has been ‘‘somewhere between violent and hysterical’’ (Passell 1990).
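To make "weights chosen so as to best fit the data" concrete, here is a minimal sketch of fitting a four-cue proper linear model by ordinary least squares. It is an illustration under stated assumptions, not the Ashenfelter, Ashmore, and Lalonde model: the data are synthetic, and the names fit_proper_linear_model, predict, variance_explained, and true_weights are hypothetical. The model has no intercept term, matching the form P = w1c1 + w2c2 + w3c3 + w4c4 given above.

```python
import random


def predict(weights, cue_values):
    """Return P, the weighted sum of the cues."""
    return sum(w * c for w, c in zip(weights, cue_values))


def fit_proper_linear_model(cues, targets):
    """Return least-squares weights by solving the normal equations.

    cues is a list of rows (one row of cue values per case) and targets
    holds the observed value of the target property for each case.
    """
    k = len(cues[0])
    # Normal equations A w = b, with A = C^T C and b = C^T y.
    A = [[sum(row[i] * row[j] for row in cues) for j in range(k)]
         for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(cues, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def variance_explained(weights, cues, targets):
    """Return 1 - SS_res / SS_tot, the share of target variance captured."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - predict(weights, row)) ** 2
                 for row, y in zip(cues, targets))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1 - ss_res / ss_tot


# Synthetic data only: the "true" weights and the noise level are invented
# for illustration and have nothing to do with real wine prices.
rng = random.Random(0)
true_weights = [0.5, -1.0, 2.0, 0.25]
cues = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
targets = [predict(true_weights, row) + rng.gauss(0, 0.1) for row in cues]

weights = fit_proper_linear_model(cues, targets)
print("fitted weights:", [round(w, 3) for w in weights])
print("variance explained:", round(variance_explained(weights, cues, targets), 3))
```

The variance_explained function computes the same kind of quantity as the "83% of the variance" figure reported for the wine model; the value it prints here reflects only the synthetic data.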
Whining wine tasters might derive a small bit of comfort from the fact
that they are not the only experts trounced by a mechanical formula. We
have already introduced The Golden Rule of Predictive Modeling: When
based on the same evidence, the predictions of SPRs are at least as reliable
as, and are typically more reliable than, the predictions of human
experts for problems of social prediction. The most definitive case for the
Golden Rule has been made by Grove and Meehl (1996). They report on
an exhaustive search for studies comparing human predictions to those of
SPRs in which (a) the humans and SPRs made predictions about the same
individual cases and (b) the SPRs never had more information than the humans
(although the humans often had more information than the SPRs).
They found 136 studies which yielded 617 distinct comparisons between the two
methods of prediction. These studies concerned a wide range of predictive
criteria, including medical and mental health diagnosis, prognosis, treatment
recommendations and treatment outcomes; personality description; success
in training or employment; adjustment to institutional life (e.g., military,
prison); socially relevant behaviors such as parole violation and violence;
socially relevant behaviors in the aggregate, such as bankruptcy of firms; and
many other predictive criteria. (1996, 297)
Of the 136 studies, 64 clearly favored the SPR, 64 showed approximately
equivalent accuracy, and 8 clearly favored the clinician. The 8 studies that
favored the clinician appeared to have no common characteristics; they
‘‘do not form a pocket of predictive excellence in which clinicians could
profitably specialize’’ (299). What’s more, Grove and Meehl argue plausibly
that these 8 outliers are likely the result of random sampling errors
(i.e., given 136 chances, the better reasoning strategy is bound to lose
sometimes) ‘‘and the clinicians’ informational advantage in being provided
with more data than the actuarial formula’’ (298).
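The study-level tally can be restated as proportions. The short sketch below uses only the three counts reported above; the variable names are our own, and nothing here is drawn from Grove and Meehl's underlying data.

```python
# Study-level counts as reported in the text (136 studies in total).
tally = {
    "favored SPR": 64,
    "approximately equivalent": 64,
    "favored clinician": 8,
}

total = sum(tally.values())
shares = {outcome: count / total for outcome, count in tally.items()}

# Share of studies in which the SPR was at least as accurate as the clinician.
spr_at_least_as_accurate = (
    tally["favored SPR"] + tally["approximately equivalent"]
) / total

for outcome, share in shares.items():
    print(f"{outcome}: {share:.1%}")
print(f"SPR at least as accurate: {spr_at_least_as_accurate:.1%}")
```

On these counts the SPR was at least as accurate as the clinician in roughly 94 percent of the studies, which is the pattern the Golden Rule describes.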
There is an intuitively plausible explanation for the success of proper
linear models. Proper linear models are constructed so as to best fit a large
set of (presumably accurate) data. But the typical human predictor does
not have all the correlational data easily available; and even if he did,
he couldn’t perfectly calculate the complex correlations between the cues
and the target property. As a result, we should not find it surprising that
proper linear models are more accurate than (even expert) humans. While
The Amazing Success of Statistical Prediction Rules 27
this explanation is intuitively satisfying, it is mistaken. To see why, let’s
look at the surprising but robust success of some improper linear models.