1. Real reliability scores
A reasoning strategy is a rule for making judgments on the basis of certain
cues. We can characterize the Goldberg Rule (but not necessarily all reasoning
strategies) in terms of four elements: (a) the cues used to make the
prediction; (b) the formula for combining the cues to make the prediction;
(c) the target of the prediction (i.e., what the prediction is about); and (d)
the range of objects (states, properties, processes, etc.), defined by detectable
cues, about which the rule makes judgments that are thought to
be reliable.
Cues : 4 MMPI personality scales (Pa, Sc, Hy, Pt) and one validity scale (L)
Formula : If (L + Pa + Sc) – (Hy + Pt) < 45, diagnose patient as neurotic;
otherwise diagnose patient as psychotic
Target : Neurosis or psychosis
Range : All psychiatric patients (assumed to be either psychotic or neurotic)
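To make the four elements concrete, here is a minimal sketch in Python. The function name and the sample scale scores are hypothetical, introduced only for illustration; the formula and the 45-point cutoff are the ones stated above.

```python
def goldberg_rule(L, Pa, Sc, Hy, Pt):
    """Goldberg Rule: combine one validity scale (L) and four MMPI
    personality scales into a single index and compare it to 45."""
    index = (L + Pa + Sc) - (Hy + Pt)
    return "neurotic" if index < 45 else "psychotic"

# Hypothetical MMPI profile, for illustration only.
print(goldberg_rule(L=50, Pa=65, Sc=70, Hy=60, Pt=55))  # index = 70 -> "psychotic"
```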
A reasoning strategy’s real reliability score is its ratio of true to total judgments
in the limit on its expected range of problems. When tested on a set
of 861 patients, the Goldberg Rule had a 70% hit rate; that is, the ratio of
its true predictions to total predictions was .7. So the Goldberg Rule’s observed
reliability score on this particular set of problems was 70%. On the
assumption that this set of problems is representative of the rule’s entire
range of problems, this observed reliability score can be said to approximate
(to a high degree of confidence, given the sample size) the rule’s real
reliability score.
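As a rough numerical illustration of why a sample of 861 licenses a confident estimate, the sketch below computes the observed reliability score and a standard (Wald) 95% confidence interval. The hit count of 603 is back-calculated from the reported 70% rate and is approximate.

```python
import math

hits, total = 603, 861          # ~70% hit rate reported for the Goldberg Rule
p_hat = hits / total            # observed reliability score
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / total)  # Wald 95% interval

print(f"observed reliability: {p_hat:.3f}")
print(f"approx. 95% CI: [{p_hat - half_width:.3f}, {p_hat + half_width:.3f}]")
# With n = 861 the interval is only about +/-0.03 wide, which is why the
# observed score closely approximates the real score *if* the sample is
# representative of the rule's range.
```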
But things are not so simple. Notice an important fact about the real
reliability score of any empirical reasoning rule: It is essentially dependent
on contingent factors. In one environment, the real reliability score of a
reasoning strategy might be high, whereas in another environment, it might
be low. This is the problem of environmental disparity. To see this problem
clearly, consider another example. The academic success prediction rule
(ASPR) works as follows: It makes relative predictions about applicants’
disposition to succeed in college by taking high school rank and aptitude
test score rank, weighing them equally, and then predicting that the best
students will be those with the highest scores. So if Smith’s high school rank
is 87 and her test score rank is 62, Smith gets a 149; if Jones’s scores are 75
and 73 respectively, he gets a 148; and so the ASPR predicts that Smith will
be more academically successful, as measured by GPA and prospects for
graduation, than Jones. We can characterize ASPR as follows:
Cues : High school rank, test score rank
Formula : Target is an increasing function of (hs rank + ts rank)
Target : Disposition to succeed academically in college
Range : All high school applicants to U.S. colleges and universities
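The Smith/Jones comparison above can be written out as a short sketch. The names and ranks are the ones used in the example; the helper name `aspr_score` is hypothetical.

```python
def aspr_score(hs_rank, ts_rank):
    """ASPR formula: weight high school rank and test score rank equally."""
    return hs_rank + ts_rank

applicants = {"Smith": (87, 62), "Jones": (75, 73)}
scores = {name: aspr_score(hs, ts) for name, (hs, ts) in applicants.items()}

# ASPR predicts the applicant with the higher total will be more successful.
prediction = max(scores, key=scores.get)
print(scores)      # {'Smith': 149, 'Jones': 148}
print(prediction)  # 'Smith'
```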
A rule’s range is just a bunch of objects (states, properties, processes, etc.)
about which the rule allows us to make a judgment (e.g., all U.S. males, all
U.S. children under 10 with reading disorders, NFL football games, etc.). A
typical range can be subdivided into many different natural discriminable
partitions. A natural partition will divide the objects in the range in
terms of properties that could in principle be causally related to the target
property. This restriction is meant to rule out partitions that involve mere-Cambridge properties (e.g., the property of being closer to Des
Moines than to Chicago), grue-ified properties (e.g., the property of being
green before 2010, blue afterward), or other artificial means of carving out
partitions for a range. A discriminable partition of a rule’s range is a partition
based on some feature that can in principle be detected by a reasoner
prior to the rule’s formulation. There are typically going to be many
different ways to divide a rule’s range into discriminable subgroups. For
example, ASPR’s range can be subdivided in terms of many properties of
the applicants (age, geography, quality of high school, etc.). Thus, the requirement
that partitions be discriminable limits the potentially infinite
number of possible partitions of a rule’s range. (One reason to insist on
only discriminable partitions is to avoid objections that might try to partition
a rule’s range into those cases in which the rule gives an accurate
judgment from those in which the rule does not give an accurate judgment.
Permitting such partitions would undermine our view. Rules could
be made to be perfectly reliable if their ranges were to be defined as consisting
of only those cases for which they are accurate. But rules whose
conditions of application cannot be detected cannot be used, and so they
should not play a role in a reason-guiding epistemology.)
The problem of environmental disparity arises when a rule is not consistently
reliable across discriminable partitions of a rule’s range. Suppose
for example that ASPR performs differently when it is applied to native
English speakers and nonnative English speakers. In particular, when applied
to native speakers it has a reliability score of 70%, but when applied
to nonnative speakers it has a reliability score of only 60%. Let’s suppose
that S and S1 have adopted the ASPR for making predictions about the
future academic success of high-school applicants. Even if S and S1 are
disposed to apply the ASPR to the same kinds of problems—to all high
school applicants to U.S. colleges and universities—they might find themselves
in quite different circumstances. Suppose S is a recently hired admissions
officer at a small, prestigious eastern liberal arts college; and S1 is
a recently hired admissions officer at a small community college in a Texas
border town. Because of their systematically different environments, it
is possible that ASPR’s reliability score for S would not be ASPR’s reliability
score for S1. There are many familiar examples of environmental
disparity. For example, the reliability of concluding that a lake trout is safe
to eat will depend on the lake it comes from and perhaps also on the trout's
age. Many examples come from strategies involved in interpreting behaviors
across different cultures. While most of us would interpret being spit
on by a priest in a Christian church to be a very bad sign, we have been told that being spit on by a priest in Senegal is a very good sign (one of
purification).
The problem of environmental disparity is troubling for our view
because it makes it hard to figure out just what a rule’s real reliability score
is supposed to be. Is ASPR’s real reliability score 67% because that’s its
score (let’s suppose) on all high school applicants? Or is it different for
different people? We will argue that real reliability scores attach to reasoning
strategies, or more specifically, to an individual’s use of a reasoning
strategy. So we will argue that if S and S1 are in different environments,
their use of ASPR might well have different real reliability scores.
To handle the problem of environmental disparity, let’s introduce the
notion of a reasoning strategy’s expected range for a subject in an environment.
The intuitive notion is straightforward: Given a person’s disposition to apply
a certain reasoning strategy R, there is a certain distribution of problems she
can expect to face, given her environment. How exactly this expected range is
to be defined will depend on the particulars of the case. We can often expect
counterfactual-supporting generalizations to play an important role in defining
the expected range of a reasoning strategy for a subject in an environment.
For example, small, prestigious, eastern liberal arts colleges tend to
attract a certain distribution of students, while small community colleges in
southern border towns tend to attract a different distribution of students.
There is a quite powerful, complicated web of causal connections that
maintains and explains those distributions. Once we know what a reasoning
strategy’s expected range is (for a person in an environment), we can approximate
the strategy’s real reliability score. We test the strategy on a representative
sample of problems in the expected range. The strategy’s observed
reliability score in that range will approximate the reasoning strategy’s real
reliability score for that person in that environment.
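As a rough numerical sketch of this approximation step, suppose (purely hypothetically) that S's expected range is 95% native English speakers and 5% nonnative speakers, while S1's is an even split. Weighting the partition-specific scores from the earlier example by these proportions yields different real reliability scores for the two reasoners:

```python
# Partition-specific reliability scores for ASPR (from the example above).
reliability = {"native": 0.70, "nonnative": 0.60}

# Hypothetical expected ranges: the mix of applicants each admissions
# officer can expect to face, given her environment.
expected_range_S  = {"native": 0.95, "nonnative": 0.05}  # eastern liberal arts college
expected_range_S1 = {"native": 0.50, "nonnative": 0.50}  # border-town community college

def real_reliability(expected_range):
    """Reliability on the expected range: each partition's score weighted
    by how often that partition shows up for this reasoner."""
    return sum(expected_range[p] * reliability[p] for p in reliability)

print(round(real_reliability(expected_range_S), 3))   # 0.695
print(round(real_reliability(expected_range_S1), 3))  # 0.65
```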
But what about those cases in which there are no generalizations one
can reasonably make about the expected range of problems for a particular
reasoner in an environment? Perhaps the person moves quickly through
relevantly different environments based on whim or unpredictable contingencies.
In such cases, what is our theory to say? To handle these sorts
of cases, we need to introduce the notion of a robustly reliable reasoning
strategy. Intuitively, a robust reasoning strategy is one that is reliable
across a wide range of environments. If there is really no reason to think S
is more likely to face some natural partitions of the rule’s range rather than
others, then the only reasoning strategy that is reliable on S’s expected
range of problems will be a robust reasoning strategy. Let’s turn to this
important notion.