2. Robust reliability
A rule is robustly reliable to the extent that (a) it makes accurate predictions
for the various natural partitions of the rule’s range and (b) it has a wide
range. Robustness is a matter of consistency and scope. First, a rule’s robustness
is a function of the extent to which its reliability score remains consistently high across the various discriminable partitions of the rule’s range. A rule that is reliable for some problem-types in its range but not for others is not robust. Second, a rule’s robustness is a function of the scope of its range. The
wider the range of the rule, the more robust it is. Both features of robustness
are matters of degree; and so robustness is a matter of degree as well.
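To make the two components concrete, here is a minimal sketch in Python. It is our own illustration: the function name, the inputs, and the way consistency and scope are summarized are assumptions for expository purposes, not anything the definition itself fixes.

```python
from statistics import mean, pstdev

def robustness_report(partition_accuracies: dict[str, float], range_size: int) -> dict:
    """Toy summary of the two components of robustness.

    partition_accuracies: the rule's accuracy (truth ratio) on each natural
        partition of its range, e.g. {"large cities": 0.9, ...}.
    range_size: a rough count of the problem-types the rule covers,
        standing in for the scope of its range.
    """
    scores = list(partition_accuracies.values())
    return {
        # Consistency: high average accuracy with low spread across partitions.
        "mean_accuracy": mean(scores),
        "spread": pstdev(scores),  # a large spread signals a failure of robustness
        "worst_partition": min(partition_accuracies, key=partition_accuracies.get),
        # Scope: the wider the range, the more robust (other things being equal).
        "range_size": range_size,
    }

# A rule that is reliable on some partitions but not others fails the
# consistency component, however wide its range:
print(robustness_report({"city-size": 0.88, "bull markets": 0.85,
                         "bear markets": 0.45}, range_size=3))
```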
There are at least three reasons epistemology should recommend robust
reasoning strategies—strategies that are resilient (retain high truth ratios)
under changes in cognizers and environments. First, as more rules are tested
and recommended, the probability increases that a rule will seem more
reliable than it really is. An epistemological theory that values robustness is
best positioned to catch a rule whose real reliability score is relatively low
but whose observed reliability score is high by chance. One way to identify
lucky rules is to export them to a somewhat different domain and see
whether they hold up under the strain. This is essentially the familiar admonition
in science that one should test hypotheses on diverse evidence. A
second reason robustness is important is that more robust rules can be
easier to implement. Other things being equal, applying one rule to a wide
range of problems is easier than keeping in mind and applying many rules
(with their varying application conditions) to those problems. A third reason
to prefer robust rules is that they can be recommended for general use,
regardless of the vagaries of an individual’s environment.
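The first reason is, at bottom, a selection effect, and a small simulation can make it vivid. The sketch below is ours, with purely illustrative numbers: when many equally reliable rules are scored on finite samples, the best observed score systematically overstates the winner’s true reliability, and retesting in a fresh setting deflates it.

```python
import random

random.seed(0)

TRUE_RELIABILITY = 0.7       # every candidate rule is in fact 70% reliable
N_RULES, N_TRIALS = 50, 40   # many rules, each scored on a modest sample

def observed_score(p: float, n: int) -> float:
    """Fraction of n predictions that happen to come out true."""
    return sum(random.random() < p for _ in range(n)) / n

# Test many rules and recommend the one with the best observed score...
best = max(observed_score(TRUE_RELIABILITY, N_TRIALS) for _ in range(N_RULES))
print(f"best observed score: {best:.2f} vs. true reliability {TRUE_RELIABILITY}")

# ...then export the winner to a fresh domain: the lucky edge tends to vanish.
print(f"score on fresh problems: {observed_score(TRUE_RELIABILITY, N_TRIALS):.2f}")
```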
Assessing whether a reasoning strategy is robust can be trickier than it
appears. Consider Gigerenzer’s recognition heuristic, which we introduced
in chapter 3: If S recognizes one of two objects but not the other, and
recognition correlates positively (negatively) with the criterion, then S can
infer that the recognized object has the higher (lower) value. It has been
applied to problems of city size and investment (Gigerenzer, Todd, and
the ABC Group 1999). Is the recognition heuristic robust? This is not a
well-framed question. Recall that a reasoning strategy is defined in terms
of cues, a formula (or algorithm), a target property, and a range. The
robustness of a reasoning strategy is a function of the scope of its range
and how accurate the strategy is on natural partitions within the rule’s
range. Unless we specify the appropriate range of the recognition heuristic,
we cannot assess its robustness. If the range of the heuristic is city size problems, then it will not be robust because of its rather narrow scope. If
its range is investment strategies, we suspect that it will not be robust
because of its failure to be reliable on many discriminable subsets (or partitions)
of that range, e.g., rolling 6-month periods from 1960 to 2000 (see
chapter 3, section 1). So the recognition heuristic provides a nice example
of the ways in which reasoning strategies can fail to be robust.
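Since the heuristic is an explicit algorithm, it can be stated as code. The following sketch is our own rendering, not Gigerenzer’s; it assumes we already know, for the domain at hand, whether recognition correlates positively or negatively with the criterion.

```python
from typing import Optional

def recognition_heuristic(recognizes_a: bool, recognizes_b: bool,
                          positive_correlation: bool = True) -> Optional[str]:
    """The recognition heuristic for a paired comparison.

    If exactly one of two objects is recognized, infer that the recognized
    object has the higher criterion value (or the lower value, if recognition
    correlates negatively with the criterion). If both or neither are
    recognized, the heuristic does not apply.
    """
    if recognizes_a == recognizes_b:
        return None  # heuristic is silent: recognition does not discriminate
    recognized = "a" if recognizes_a else "b"
    other = "b" if recognizes_a else "a"
    return recognized if positive_correlation else other

# Which of two cities is larger? S recognizes only the first.
print(recognition_heuristic(recognizes_a=True, recognizes_b=False))  # -> 'a'
```

Note that the heuristic simply declines to answer when recognition fails to discriminate, so its reliability, and hence its robustness, must be assessed over the cases where it actually applies.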
It is perhaps worthwhile to note that second-order reasoning strategies
can be, and can fail to be, robust. Recall that Grove and Meehl (1996)
surveyed 136 studies containing 617 distinct comparisons of the reliability
of SPRs and clinical prediction (see chapter 2, section 1.1). They concluded
that 64 studies favored the SPR and 8 favored the clinician (with the other
64 showing approximately equivalent accuracy). Grove and Meehl then
examined the cases in which the clinicians were more accurate and asked whether those cases form some coherent set of problems on which human experts are more reliable than SPRs. Here is their conclusion:
The 8 studies favoring the clinician are not concentrated in any one predictive
area, do not overrepresent any one type of clinician (e.g., medical
doctors), and do not in fact have any obvious characteristics in common.
This is disappointing, as one of the chief goals of the meta-analysis was to
identify particular areas in which the clinician might outperform the
mechanical prediction method. (Grove and Meehl 1996, 298)
Grove and Meehl are after a kind of (second-order) robustness here. They
want to know whether there is some set of problems for which clinical
prediction is robustly more reliable than SPRs. Since they couldn’t find
any such pocket of expertise, they conclude that ‘‘the most plausible explanation
of these deviant studies is that they arose by a combination of
random sampling errors (8 deviant out of 136) and the clinicians’ informational
advantage in being provided with more data than the actuarial
formula’’ (1996, 298).
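Their appeal to random sampling error admits a quick back-of-the-envelope check. The sketch below is our own gloss, not Grove and Meehl’s analysis, and the 10 percent per-study chance of a spurious win for the clinician is purely an illustrative assumption.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# If each of the 136 studies independently had a 10% chance of favoring the
# clinician through sampling error alone (an illustrative assumption), then
# observing 8 or more such studies would be entirely unsurprising:
print(f"P(>= 8 deviant studies out of 136): {prob_at_least(8, 136, 0.10):.3f}")
```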