4.2. Grounded and ungrounded SPRs
Let’s make a rough distinction between two classes of SPRs. Grounded
SPRs are SPRs for which we have a theoretical explanation for their success.
Ungrounded SPRs are SPRs for which we do not have a theoretical
explanation for their success. Basically, we understand why grounded SPRs
work, but we don’t understand why ungrounded SPRs work. There are
two points to note about this distinction. First, it is not hard-and-fast,
since we can have better and worse understanding of why an SPR works.
Second, for any ungrounded SPR, there may well be a neat causal explanation
for its success that we don’t yet know. So the distinction is not
meant to be a metaphysical one, but an epistemological one. It is a distinction
based on the quality of our understanding of SPRs and the subject
matters on which they are based.
Consider an ungrounded SPR—the F minus F Rule for predicting
marital happiness (discussed in section 1). Why is this rule reliable?
The Amazing Success of Statistical Prediction Rules 47
A reasonable assumption is that the correlation between the combined set
of predictor cues and the target property is sustained by an underlying,
stable network of causes. This is not to say that there is a science that
would treat such ensembles of cues as a natural kind; it is to say, however,
two things. First, their arrangement has a natural explanation. The explanation
may not be unified—indeed, it may be so tortured that it is little
more than a description of causal inventory—but it is an explanation in
terms of causes nonetheless. Second, these arrangements, in general, do
not spontaneously vanish.
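For concreteness, the arithmetic of the rule can be sketched in a few lines. As usually reported, the rule subtracts a couple's rate of fighting from their rate of lovemaking; the function and variable names below are ours, and the sample numbers are invented for illustration:

```python
def f_minus_f(rate_of_lovemaking: float, rate_of_fighting: float) -> float:
    """Signed difference of two observed rates; positive scores
    predict a happy marriage, negative scores an unhappy one."""
    return rate_of_lovemaking - rate_of_fighting

# A (hypothetical) couple reporting 3 affectionate episodes and
# 1 fight per week scores +2, predicting marital happiness.
score = f_minus_f(3, 1)
```

The point of the sketch is how little the rule asks for: two observable frequencies and a subtraction, with no model of why the difference tracks marital happiness.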
Whatever specific facts explain the success of SPRs, they are not metaphysically
exotic. As predictive instruments, SPRs are not like the occasional
‘‘technical’’ stock market indicators offered by gurus who combine
a motley of moon phases, glottal stops, and transfer credits to predict
stock movements. The VRAG test for predicting violent recidivism is an
ungrounded SPR. In its present form, it consists of twelve predictor variables,
each scored with a positive (+) or negative (−) weight. The weights
vary from a −5 to a +12. The VRAG requires such information as the
person’s: Revised Psychopathy Checklist Score, Elementary School Maladjustment
Score, satisfaction of any DSM criteria for a personality disorder,
age at the time of the index offense, separation from either parent
(except by death) by the age of sixteen, failure on prior conditional release,
nonviolent offense history score (using the Cormier-Lang scale), unmarried
status (or equivalent), meeting DSM criteria for schizophrenia, most
serious victim injury (from the index offense), alcohol abuse score, and
any female victim in the index offense (Quinsey et al. 1998). Many of these
categories are independently known to interact richly with social behavior.
It is not as though the diagnostic problem of deciding whether this person
is likely to commit a similarly violent crime is being determined by facts
known to be ontologically unrelated to or isolated from social behavior,
such as the psychic’s interpretation of tarot cards.
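The mechanics of such a rule are simple to sketch. The item names and weights below are invented for illustration (the actual VRAG weights are tabulated in Quinsey et al. 1998); only the additive structure, with weights running from negative to positive, mirrors the real instrument:

```python
# Illustrative sketch of a VRAG-style additive checklist.
# Every weight here is made up; the real weights are published
# in Quinsey et al. (1998) and range from -5 to +12.
ILLUSTRATIVE_WEIGHTS = {
    "psychopathy_checklist_high": 4,
    "elementary_school_maladjustment": 2,
    "personality_disorder": 3,
    "young_at_index_offense": 2,
    "separated_from_parent_before_16": 1,
    "failed_prior_conditional_release": 3,
    "nonviolent_offense_history": 1,
    "never_married": 1,
    "meets_schizophrenia_criteria": -3,
    "serious_victim_injury": -1,
    "alcohol_abuse": 2,
    "female_victim": -1,
}

def vrag_style_score(items: dict) -> int:
    """Sum the weights of the items that apply; a higher total
    means a higher predicted risk of violent recidivism."""
    return sum(w for item, w in ILLUSTRATIVE_WEIGHTS.items()
               if items.get(item, False))
```

Nothing in the scoring procedure explains why these twelve cues, so weighted, track violent recidivism; that is precisely what makes the rule ungrounded.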
Now let’s turn our attention to grounded SPRs. Many good examples
of grounded SPRs come from medicine. In the case of determining the extent
of prostate cancer, for example, there is a four-variable SPR that takes
into account patient age, PSA (prostate specific antigen) test value, the biopsy
Gleason score (arrived at from a pathologist’s assessment of tissue
samples), and the observable properties of the magnetic resonance image.
Each variable makes an incremental improvement in determining the
patient’s prognosis. But we understand very well why three of those variables
help to reliably predict the target property. We don’t understand
much about what mechanisms account for age being a good predictor.

Recall that we said that there was an exception to the general failure of
strategies of selective defection. Grounded SPRs provide that exception.
Experts can sometimes improve on the reliability of SPRs by adopting a
strategy of selective defection (Swets, Dawes, and Monahan 2000). But
notice that the improved reliability comes about because the expert can
apply her well-supported theoretical knowledge to a problem. When
someone is in possession of a theory that has proven to be reliable and that
theory suggests defecting from an SPR (particularly when the expert’s
judgment relies on a cue not used by the SPR), then a strategy of selective
defection can be an excellent one.
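A grounded SPR of the four-variable prostate sort described above is typically a regression model. The sketch below combines the four cues in a logistic model; the coefficients and the MRI coding are invented for illustration and are not those of any published clinical model:

```python
import math

# Hypothetical coefficients: a real clinical model is fit to patient
# data; these numbers are made up to show the additive structure.
def prostate_prognosis_score(age, psa, gleason, mri_finding):
    """Combine four cues into a probability via a logistic model.
    mri_finding: 1 if the MRI shows an adverse feature, else 0."""
    z = -8.0 + 0.02 * age + 0.15 * psa + 0.6 * gleason + 1.2 * mri_finding
    return 1 / (1 + math.exp(-z))
```

Each coefficient contributes an incremental shift in the predicted probability, which is the formal counterpart of each variable making "an incremental improvement" in the prognosis.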
Even when an expert is able to outperform an SPR because of her
superior theoretical knowledge, there are two notes of caution. First, there
is every reason to believe that a new SPR can be developed that takes the
expert’s knowledge into account and that the refined SPR will be more reliable
than the expert. One way to think about this is that when an expert
is able to defeat the best available SPR, this situation is typically temporary:
There is likely another SPR that can take into account the extra
theoretical knowledge being employed by the expert and that is at least as
reliable as the expert. The second note of caution is that even in domains
with grounded SPRs, selective defection is not always a good strategy. The
reasoner who has adopted the selective defection strategy needs to be able
to apply the relevant theoretical understanding well enough to reliably
defect from the SPR. And this will not always be easy to do. Even when the
reasoner knows what variables to look at, he might still have a hard time
weighing and integrating different lines of information (see section 3,
above).
What about the (unfortunately) more common ungrounded SPRs,
such as the Goldberg Rule, the VRAG, and the F minus F Rule? For most
of the variables that make up these rules, there is no well-confirmed theory
that explains their incremental validity, even if we feel we can tell a good
story about why each variable contributes to the accuracy of prediction.
Broken leg problems are particularly acute when it comes to ungrounded
SPRs. Since we don’t know why, specifically, the SPR is reliable, we are
naturally diffident about applying the SPR to cases which seem to us to
have some relevantly different property. For example, as we have noted,
the VRAG was originally developed for violent Canadian psychiatric patients.
But in order to prove its worth, it was tested on other populations
and shown to be robust. A reasoning rule, particularly an ungrounded
rule, that is not tested on a wide variety of different subpopulations is
suspect.
Once we know that an ungrounded rule is robustly more reliable than
unaided human judgment, the selective defection strategy is deeply suspect.
As far as we know, VRAG has not been tested on violent criminals in
India. So suppose we were asked to make judgments of violent recidivism
for violent criminals in India, and suppose we didn’t have the time or
resources to test VRAG on the relevant population. Would it be reasonable
to use VRAG in this situation? Let’s be clear about what the issue is.
The issue is not whether VRAG in the new setting is as reliable as VRAG in
the original setting (where it has been tested and found successful). The
issue is whether VRAG in the new setting is better than our unaided human
judgment in the new setting. Let’s consider this issue in a bit of detail.
When trying to make judgments about a new situation in which we
aren’t sure about the reliability of our reasoning strategies, we are clearly
in a rather poor epistemic position. It is useful to keep in mind that this is
not the sort of situation in which any strategy is likely to be particularly
reliable. But our unaided human judgments often come with a characteristic
that ungrounded SPRs don’t—our deep confidence in their correctness.
When we consider whether to apply an SPR (like VRAG) or our unaided
human judgment to a new situation, it will often seem more reasonable to
employ our judgment than the SPR. But notice, we typically don’t know
why either of them is as reliable as it is in the known cases. So we are not
deciding on the basis of a well-grounded theory that the new situation has
properties that make our judgment more reliable than the SPR. Instead,
we’re probably assuming that our reasoning faculties are capable of
adapting to the new situation (whereas the SPR isn’t), and so our faculties
are likely to be more reliable. But on what grounds do we make such an
assumption? After all, in a wide variety of situations analogous to the new
one (recall, we’re assuming the SPR is robustly more reliable than human
experts), the SPR is more reliable than the expert. Why should we think
that the expert is going to do better than the SPR in a quite defective
epistemic situation? Perhaps neither of them will do any better than
chance; but surely the best bet is that the strategy that has proven itself to
be more reliable in analogous situations is going to be more reliable in the
new situation.
Our tendency to defect from a lovely SPR is related to our tendency to
plump for causal stories. Consider a disturbing example of a catchy story
being accepted as causal fact. For too long, infantile autism was thought to
be caused by maternal rejection. The evidence? Parents of autistic children
could readily recall episodes in which they had not been accepting of their
child (Dawes 2001, 136). It is easy to piece together a story about how
maternal rejection would lead to the characteristic social, emotional, and
communication troubles associated with autism. But it is beyond appalling
that such weak evidence could have been used to justify the view that
mothers were causally responsible for their children’s autism. As this case
makes clear, stories are cheap. But even some of the most inaccurate stories
are irresistible. When we tell a story, we begin to feel we understand. And
when we think we understand, we begin to think we know when to defect
from an SPR. Our unconstrained facility in generating stories, and our
arrogance in accepting them, cause us to defect from far more accurate
predictive rules. Consider another story. There are more ‘‘muscle car’’
purchases in the southeastern U.S. than in any other region. What explains
this southeastern taste for Mustangs, Camaros, and Firebirds? Elements of
an explanation immediately spring to mind. No doubt the Daytona and
Winston-Salem stock car races influence local tastes. And (perhaps making
a bit of a leap here), there’s a good ol’ boy hot-rod culture in the area—
isn’t there? As we fit these images into a more or less coherent assemblage,
centered on a stereotype of rural poverty, poor education, and green bean
casseroles, a gratifying sense of understanding washes over us. We become
confident that we have hit upon an explanation. But as it turns out, the
typical muscle-car purchaser also enjoys wok cooking and oat-bran cereal,
uses fax machines, and buys flowers for special events (Weiss 1994, 62). Is
the stereotype that motivates the story easily integrated with delectation of
wok-prepared cuisine and floral sensibilities? It is hard to see how. Our
‘‘explanation’’ is really just a folksy story, creatively cobbled lore of familiar
anecdotal cast. It is also dead wrong, and the sense of understanding it
conveys, however comforting, is counterfeit. And yet it is hard to shake the
story. Especially when it is fortified with apparently confirming evidence:
The demographic map for muscle-car purchases looks very much like the
demographic map for rates of response to junk mail. Those queried who
aren’t too shy sum it up very simply: It’s what you’d expect from trailer
trash (Weiss 1994).
As we have already admitted, sometimes reasoners should defect from
SPRs, even ungrounded ones. One of our colleagues in psychology has
developed an SPR for predicting recidivism for people convicted of child
sexual abuse. When asked about the broken leg problem, the psychologist
admitted that one should always correct the rule if it doesn’t predict a zero
chance of recidivism for dead people. There are very well-grounded causal
hypotheses for why this sort of situation would call for defection. But in
the absence of documented reasons (not merely
easy causal stories) to believe that the ‘‘broken leg’’ property (e.g., death) is
a powerful predictor of the target property (e.g., crime), defection is
usually a bad idea. The best advice is probably that one should typically
resist defecting well beyond what intuitively seems reasonable. As Paul
Meehl has said, we should defect from a well-tested SPR when the ‘‘situation
is as clear as a broken leg; otherwise, very, very seldom’’ (1957, 273).
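In code, this kind of documented, well-grounded defection is just a guard clause wrapped around the rule. The function below is our illustration, not the colleague's actual SPR; the point is that the override fires only on a cue whose causal bearing on the target property is beyond dispute:

```python
def predict_with_broken_leg_override(spr_score: float, is_dead: bool) -> float:
    """Defect from the SPR only on a documented, well-grounded cue:
    a dead person's chance of recidivism is zero. Otherwise, defer
    to the rule's score."""
    if is_dead:
        return 0.0
    return spr_score
```

Meehl's advice amounts to keeping the list of such guard clauses very short: defect when the case is as clear as this one, and otherwise very, very seldom.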