2. Robust reliability


A rule is robustly reliable to the extent that (a) it makes accurate predictions for the various natural partitions of the rule’s range and (b) it has a wide range. Robustness is a matter of consistency and scope. First, a rule’s robustness is a function of the extent to which its reliability is consistently high across various discriminable partitions of the rule’s range. A rule that is reliable for some problem-types in its range but not for others is not robust. Second, a rule’s robustness is a function of the scope of its range: the wider the range of the rule, the more robust it is. Both features of robustness are matters of degree, and so robustness is a matter of degree as well.

There are at least three reasons epistemology should recommend robust reasoning strategies—strategies that are resilient (retain high truth ratios) under changes in cognizers and environments. First, as more rules are tested and recommended, the probability increases that a rule will seem more reliable than it really is. An epistemological theory that values robustness is best positioned to catch a rule whose real reliability score is relatively low but whose observed reliability score is high by chance. One way to identify lucky rules is to export them to a somewhat different domain and see whether they hold up under the strain. This is essentially the familiar admonition in science that one should test hypotheses on diverse evidence. A second reason robustness is important is that more robust rules can be easier to implement. Other things being equal, applying one rule to a wide range of problems is easier than keeping in mind and applying many rules (with their varying application conditions) to those problems. A third reason to prefer robust rules is that they can be recommended for general use, regardless of the vagaries of an individual’s environment.

Assessing whether a reasoning strategy is robust can be trickier than it appears. Consider Gigerenzer’s recognition heuristic, which we introduced in chapter 3: If S recognizes one of two objects but not the other, and recognition correlates positively (negatively) with the criterion, then S can infer that the recognized object has the higher (lower) value. It has been applied to problems of city size and investment (Gigerenzer, Todd, and the ABC Group 1999). Is the recognition heuristic robust? This is not a well-framed question. Recall that a reasoning strategy is defined in terms of cues, a formula (or algorithm), a target property, and a range. The robustness of a reasoning strategy is a function of the scope of its range and how accurate the strategy is on natural partitions within the rule’s range. Unless we specify the appropriate range of the recognition heuristic, we cannot assess its robustness. If the range of the heuristic is city size problems, then it will not be robust because of its rather narrow scope. If its range is investment strategies, we suspect that it will not be robust because of its failure to be reliable on many discriminable subsets (or partitions) of that range, e.g., rolling 6-month periods from 1960 to 2000 (see chapter 3, section 1). So the recognition heuristic provides a nice example of the ways in which reasoning strategies can fail to be robust.
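The recognition heuristic is simple enough to state as a small decision rule. The sketch below is our own illustrative rendering, not Gigerenzer’s notation; the function name and inputs are assumptions made for the example:

```python
def recognition_heuristic(recognizes_a, recognizes_b, correlation_sign=+1):
    """Apply the recognition heuristic to a pair of objects.

    Returns 'a' or 'b' (the object inferred to have the higher
    criterion value), or None when the heuristic is silent because
    S recognizes both objects or neither.

    correlation_sign: +1 if recognition correlates positively with
    the criterion, -1 if it correlates negatively.
    """
    if recognizes_a == recognizes_b:
        return None  # no recognition asymmetry, so no inference
    recognized = 'a' if recognizes_a else 'b'
    unrecognized = 'b' if recognizes_a else 'a'
    # Positive correlation: infer the recognized object has the higher
    # value; negative correlation: infer the unrecognized one does.
    return recognized if correlation_sign > 0 else unrecognized

# S recognizes city A but not city B, and recognition correlates
# positively with population, so S infers A is the larger city.
print(recognition_heuristic(True, False, +1))  # prints "a"
```

Note that the sketch encodes the formula and cues but says nothing about the range, which is exactly why the bare question “Is the recognition heuristic robust?” is not well framed.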

It is perhaps worthwhile to note that second-order reasoning strategies can be, and can fail to be, robust. Recall that Grove and Meehl (1996) surveyed 136 studies which had 617 distinct comparisons of the reliability of SPRs and clinical prediction (see chapter 2, section 1.1). They concluded that 64 studies favored the SPR and 8 favored the clinician (with the other 64 showing approximately equivalent accuracy). Grove and Meehl then examined the cases in which the clinicians were more accurate and wondered whether they could fathom some coherent set of problems on which human experts are more reliable than SPRs. Here is their conclusion:

The 8 studies favoring the clinician are not concentrated in any one predictive area, do not overrepresent any one type of clinician (e.g., medical doctors), and do not in fact have any obvious characteristics in common. This is disappointing, as one of the chief goals of the meta-analysis was to identify particular areas in which the clinician might outperform the mechanical prediction method. (Grove and Meehl 1996, 298)

Grove and Meehl are after a kind of (second-order) robustness here. They want to know whether there is some set of problems for which clinical prediction is robustly more reliable than SPRs. Since they couldn’t find any such pocket of expertise, they conclude that ‘‘the most plausible explanation of these deviant studies is that they arose by a combination of random sampling errors (8 deviant out of 136) and the clinicians’ informational advantage in being provided with more data than the actuarial formula’’ (1996, 298).
