4.2. Grounded and ungrounded SPRs

Let’s make a rough distinction between two classes of SPRs. Grounded

SPRs are SPRs for which we have a theoretical explanation for their success.

Ungrounded SPRs are SPRs for which we do not have a theoretical

explanation for their success. Basically, we understand why grounded SPRs

work, but we don’t understand why ungrounded SPRs work. There are

two points to note about this distinction. First, it is not hard-and-fast,

since we can have better and worse understanding of why an SPR works.

Second, for any ungrounded SPR, there may well be a neat causal explanation

for its success that we don’t yet know. So the distinction is not

meant to be a metaphysical one, but an epistemological one. It is a distinction

based on the quality of our understanding of SPRs and the subject

matters on which they are based.

Consider an ungrounded SPR—the F minus F Rule for predicting

marital happiness (discussed in section 1). Why is this rule reliable?

The Amazing Success of Statistical Prediction Rules 47

A reasonable assumption is that the correlation between the combined set

of predictor cues and the target property is sustained by an underlying,

stable network of causes. This is not to say that there is a science that

would treat such ensembles of cues as a natural kind; it is to say, however,

two things. First, their arrangement has a natural explanation. The explanation

may not be unified—indeed, it may be so tortured that it is little

more than a description of causal inventory—but it is an explanation in

terms of causes nonetheless. Second, these arrangements, in general, do

not spontaneously vanish.
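The arithmetic of the F minus F Rule is nothing more than a signed difference between two cues. The sketch below is our own illustration, not the rule's original scoring: the function name, the per-week units, and the zero threshold are all assumptions made for the example.

```python
def f_minus_f(lovemaking_per_week: float, fights_per_week: float) -> str:
    """Sketch of the F minus F Rule for marital happiness.

    Predict "happy" when the rate of lovemaking exceeds the rate of
    fighting, "unhappy" otherwise. Units and threshold are illustrative.
    """
    score = lovemaking_per_week - fights_per_week
    return "happy" if score > 0 else "unhappy"

print(f_minus_f(3, 1))  # positive difference -> happy
print(f_minus_f(1, 2))  # negative difference -> unhappy
```

The point of writing it out is to see how little machinery the rule uses: one subtraction, one comparison, and no causal theory anywhere in sight.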

Whatever specific facts explain the success of SPRs, they are not metaphysically

exotic. As predictive instruments, SPRs are not like the occasional

‘‘technical’’ stock market indicators offered by gurus who combine

a motley of moon phases, glottal stops, and transfer credits to predict

stock movements. The VRAG test for predicting violent recidivism is an

ungrounded SPR. In its present form, it consists of twelve predictor variables,

and each is scored on a weighting system of (+) or (−). The weights

vary from a −5 to a +12. The VRAG requires such information as the

person’s: Revised Psychopathy Checklist Score, Elementary School Maladjustment

Score, satisfaction of any DSM criteria for a personality disorder,

age at the time of the index offense, separation from either parent

(except by death) by the age of sixteen, failure on prior conditional release,

nonviolent offense history score (using the Cormier-Lang scale), unmarried

status (or equivalent), meeting DSM criteria for schizophrenia, most

serious victim injury (from the index offense), alcohol abuse score, and

any female victim in the index offense (Quinsey et al. 1998). Many of these

categories are independently known to interact richly with social behavior.

It is not as though the diagnostic problem of deciding whether this person

is likely to commit a similarly violent crime is being determined by facts

known to be ontologically unrelated to or isolated from social behavior,

such as the psychic’s interpretation of tarot cards.
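Structurally, an actuarial instrument like the VRAG is just a weighted sum of coded predictor values. The sketch below shows that form only: the item names and integer weights are invented for illustration and should not be mistaken for the instrument's actual items or weights, which are given in Quinsey et al. 1998.

```python
# Illustrative weights for a VRAG-style actuarial rule. These names and
# numbers are our own invention; the real twelve items and weights are
# published in Quinsey et al. 1998.
ILLUSTRATIVE_WEIGHTS = {
    "psychopathy_checklist_high": 4,
    "elementary_school_maladjustment": 3,
    "separated_from_parent_before_16": 2,
    "failure_on_prior_conditional_release": 3,
    "never_married": 2,
    "meets_schizophrenia_criteria": -3,  # items can carry negative weights
}

def actuarial_score(case: dict) -> int:
    """Sum the weights of every predictor the case satisfies."""
    return sum(w for item, w in ILLUSTRATIVE_WEIGHTS.items() if case.get(item))

case = {"psychopathy_checklist_high": True, "never_married": True}
print(actuarial_score(case))  # 4 + 2 = 6
```

Nothing in the sum requires, or supplies, an explanation of why these cues predict recidivism; that is exactly what makes the rule ungrounded.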

Now let’s turn our attention to grounded SPRs. Many good examples

of grounded SPRs come from medicine. In the case of determining the extent

of prostate cancer, for example, there is a four-variable SPR that takes

into account patient age, PSA (prostate specific antigen) test value, the biopsy

Gleason score (arrived at from a pathologist’s assessment of tissue

samples), and the observable properties of the magnetic resonance image.

Each variable makes an incremental improvement in determining the

patient’s prognosis. But we understand very well why three of those variables

help to reliably predict the target property. We don’t understand

much about what mechanisms account for age being a good predictor.

Recall that we said that there was an exception to the general failure of

strategies of selective defection. Grounded SPRs provide that exception.

Experts can sometimes improve on the reliability of SPRs by adopting a

strategy of selective defection (Swets, Dawes, and Monahan 2000). But

notice that the improved reliability comes about because the expert can

apply her well-supported theoretical knowledge to a problem. When

someone is in possession of a theory that has proven to be reliable and that

theory suggests defecting from an SPR (particularly when the expert’s

judgment relies on a cue not used by the SPR), then a strategy of selective

defection can be an excellent one.

Even when an expert is able to outperform an SPR because of her

superior theoretical knowledge, there are two notes of caution. First, there

is every reason to believe that a new SPR can be developed that takes the

expert’s knowledge into account and that the refined SPR will be more reliable

than the expert. One way to think about this is that when an expert

is able to defeat the best available SPR, this situation is typically temporary:

There is likely another SPR that can take into account the extra

theoretical knowledge being employed by the expert and that is at least as

reliable as the expert. The second note of caution is that even in domains

with grounded SPRs, selective defection is not always a good strategy. The

reasoner who has adopted the selective defection strategy needs to be able

to apply the relevant theoretical understanding well enough to reliably

defect from the SPR. And this will not always be easy to do. Even when the

reasoner knows what variables to look at, he might still have a hard time

weighing and integrating different lines of information (see section 3,

above).

What about the (unfortunately) more common ungrounded SPRs,

such as the Goldberg Rule, the VRAG, and the F minus F Rule? For most

of the variables that make up these rules, there is no well-confirmed theory

that explains their incremental validity, even if we feel we can tell a good

story about why each variable contributes to the accuracy of prediction.

Broken leg problems are particularly acute when it comes to ungrounded

SPRs. Since we don’t know why, specifically, the SPR is reliable, we are

naturally diffident about applying the SPR to cases which seem to us to

have some relevantly different property. For example, as we have noted,

the VRAG was originally developed for violent Canadian psychiatric patients.

But in order to prove its worth, it was tested on other populations

and shown to be robust. A reasoning rule, particularly an ungrounded

rule, that is not tested on a wide variety of different subpopulations is

suspect.

Once we know that an ungrounded rule is robustly more reliable than

unaided human judgment, the selective defection strategy is deeply suspect.

As far as we know, VRAG has not been tested on violent criminals in

India. So suppose we were asked to make judgments of violent recidivism

for violent criminals in India, and suppose we didn’t have the time or

resources to test VRAG on the relevant population. Would it be reasonable

to use VRAG in this situation? Let’s be clear about what the issue is.

The issue is not whether VRAG in the new setting is as reliable as VRAG in

the original setting (where it has been tested and found successful). The

issue is whether VRAG in the new setting is better than our unaided human

judgment in the new setting. Let’s consider this issue in a bit of detail.

When trying to make judgments about a new situation in which we

aren’t sure about the reliability of our reasoning strategies, we are clearly

in a rather poor epistemic position. It is useful to keep in mind that this is

not the sort of situation in which any strategy is likely to be particularly

reliable. But our unaided human judgments often possess a characteristic

that ungrounded SPRs don’t—a deep confidence in their correctness.

When we consider whether to apply an SPR (like VRAG) or our unaided

human judgment to a new situation, it will often seem more reasonable to

employ our judgment than the SPR. But notice, we typically don’t know

why either of them is as reliable as it is in the known cases. So we are not

deciding on the basis of a well-grounded theory that the new situation has

properties that make our judgment more reliable than the SPR. Instead,

we’re probably assuming that our reasoning faculties are capable of

adapting to the new situation (whereas the SPR isn’t), and so our faculties

are likely to be more reliable. But on what grounds do we make such an

assumption? After all, in a wide variety of situations analogous to the new

one (recall, we’re assuming the SPR is robustly more reliable than human

experts), the SPR is more reliable than the expert. Why should we think

that the expert is going to do better than the SPR in a quite defective

epistemic situation? Perhaps neither of them will do any better than

chance; but surely the best bet is that the strategy that has proven itself to

be more reliable in analogous situations is going to be more reliable in the

new situation.

Our tendency to defect from a lovely SPR is related to our tendency to

plump for causal stories. Consider a disturbing example of a catchy story

being accepted as causal fact. For too long, infantile autism was thought to

be caused by maternal rejection. The evidence? Parents of autistic children

could readily recall episodes in which they had not been accepting of their

child (Dawes 2001, 136). It is easy to piece together a story about how

maternal rejection would lead to the characteristic social, emotional, and

communication troubles associated with autism. But it is beyond appalling

that such weak evidence could have been used to justify the view that

mothers were causally responsible for their children’s autism. As this case

makes clear, stories are cheap. But even some of the most inaccurate stories

are irresistible. When we tell a story, we begin to feel we understand. And

when we think we understand, we begin to think we know when to defect

from an SPR. Our unconstrained facility in generating stories, and our

arrogance in accepting them, causes us to defect from far more accurate

predictive rules. Consider another story. There are more ‘‘muscle car’’

purchases in the southeastern U.S. than in any other region. What explains

this southeastern taste for Mustangs, Camaros, and Firebirds? Elements of

an explanation immediately spring to mind. No doubt the Daytona and

Winston-Salem stock car races influence local tastes. And (perhaps making

a bit of a leap here), there’s a good ol’ boy hot-rod culture in the area—

isn’t there? As we fit these images into a more or less coherent assemblage,

centered on a stereotype of rural poverty, poor education, and green bean

casseroles, a gratifying sense of understanding washes over us. We become

confident that we have hit upon an explanation. But as it turns out, the

typical muscle-car purchaser also enjoys wok cooking and oat-bran cereal,

uses fax machines, and buys flowers for special events (Weiss 1994, 62). Is

the stereotype that motivates the story easily integrated with delectation of

wok-prepared cuisine and floral sensibilities? It is hard to see how. Our

‘‘explanation’’ is really just a folksy story, creatively cobbled lore of familiar

anecdotal cast. It is also dead wrong, and the sense of understanding it

conveys, however comforting, is counterfeit. And yet it is hard to shake the

story. Especially when it is fortified with apparently confirming evidence:

The demographic map for muscle-car purchases looks very much like the

demographic map for rates of response to junk mail. Those queried who

aren’t too shy sum it up very simply: It’s what you’d expect from trailer

trash (Weiss 1994).

As we have already admitted, sometimes reasoners should defect from

SPRs, even ungrounded ones. One of our colleagues in psychology has

developed an SPR for predicting recidivism for people convicted of child

sexual abuse. When asked about the broken leg problem, the psychologist

admitted that one should always correct the rule if it doesn’t predict a zero

chance of recidivism for dead people. There are very well-grounded causal

hypotheses for why this sort of situation would call for defection. But in

the absence of a situation in which we have documented reasons (not merely

easy causal stories) to believe that the ‘‘broken leg’’ property (e.g., death) is

a powerful predictor of the target property (e.g., crime), defection is

usually a bad idea. The best advice is probably that one should typically

resist defecting well beyond what intuitively seems reasonable. As Paul

Meehl has said, we should defect from a well-tested SPR when the ‘‘situation

is as clear as a broken leg; otherwise, very, very seldom’’ (1957, 273).
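Meehl's advice can be put in procedural form: follow the SPR's output except when a documented, theoretically secure override condition holds, and otherwise resist the urge to defect. The predicate name and probability values below are our own illustration.

```python
def predict_recidivism(case: dict, spr_probability: float) -> float:
    """Broken-leg override (sketch): defect from the SPR only on a
    documented condition, such as the subject being deceased; otherwise
    return the SPR's output untouched ("very, very seldom" defect).
    """
    if case.get("deceased"):  # the one uncontroversial broken leg
        return 0.0
    return spr_probability

print(predict_recidivism({"deceased": True}, 0.7))  # 0.0
print(predict_recidivism({}, 0.7))                  # 0.7
```

The discipline lies in what the function refuses to do: no appeal to a merely plausible causal story can move it off the SPR's prediction.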
