1. Real reliability scores

A reasoning strategy is a rule for making judgments on the basis of certain

cues. We can characterize the Goldberg Rule (but not necessarily all reasoning

strategies) in terms of four elements: (a) the cues used to make the

prediction; (b) the formula for combining the cues to make the prediction;

(c) the target of the prediction (i.e., what the prediction is about); and (d)

the range of objects (states, properties, processes, etc.), defined by detectable

cues, about which the rule makes judgments that are thought to

be reliable.

Cues : 4 MMPI personality scales (Pa, Sc, Hy, Pt) and one validity scale (L)

Formula : If [(L + Pa + Sc) – (Hy + Pt)] < 45, diagnose patient as neurotic;

otherwise diagnose patient as psychotic

Target : Neurosis or psychosis

Range : All psychiatric patients (assumed to be either psychotic or neurotic)
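
To fix ideas, here is a minimal sketch of the Goldberg Rule in Python; the function name and the sample scale scores are illustrative assumptions, not real MMPI data.

```python
def goldberg_rule(L, Pa, Sc, Hy, Pt):
    """Classify a patient from five MMPI scale scores: diagnose 'neurotic'
    if (L + Pa + Sc) - (Hy + Pt) < 45, otherwise 'psychotic'."""
    return "neurotic" if (L + Pa + Sc) - (Hy + Pt) < 45 else "psychotic"

# Made-up scale scores, purely for illustration: 165 - 123 = 42 < 45.
print(goldberg_rule(L=50, Pa=60, Sc=55, Hy=65, Pt=58))  # -> neurotic
```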

A reasoning strategy’s real reliability score is its ratio of true to total judgments

in the limit on its expected range of problems. When tested on a set

of 861 patients, the Goldberg Rule had a 70% hit rate; that is, the ratio of

its true predictions to total predictions was .7. So the Goldberg Rule’s observed

reliability score on this particular set of problems was 70%. On the

assumption that this set of problems is representative of the rule’s entire

range of problems, this observed reliability score can be said to approximate

(to a high degree of confidence, given the sample size) the rule’s real

reliability score.
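
Since an observed reliability score is just a hit rate, the arithmetic can be sketched in a few lines of Python; the prediction and criterion lists below are made up, chosen only so that 603 of 861 judgments come out correct and the score lands at roughly .70.

```python
def observed_reliability(predictions, outcomes):
    """Ratio of true judgments to total judgments on a test set."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)

# Hypothetical labels: 603 of the 861 predictions agree with the
# criterion diagnoses, so the observed score is roughly .70.
predictions = ["neurotic"] * 500 + ["psychotic"] * 361
criterion   = ["neurotic"] * 500 + ["psychotic"] * 103 + ["neurotic"] * 258
print(round(observed_reliability(predictions, criterion), 2))  # 0.7
```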

But things are not so simple. Notice an important fact about the real

reliability score of any empirical reasoning rule: It is essentially dependent

on contingent factors. In one environment, the real reliability score of a

reasoning strategy might be high, whereas in another environment, it might

be low. This is the problem of environmental disparity. To see this problem

clearly, consider another example. The academic success prediction rule

(ASPR) works as follows: It makes relative predictions about applicants’

disposition to succeed in college by taking high school rank and aptitude

test score rank, weighing them equally, and then predicting that the best

students will be those with the highest scores. So if Smith’s high school rank

is 87 and her test score rank is 62, Smith gets a 149; if Jones’s scores are 75

and 73 respectively, he gets a 148; and so the ASPR predicts that Smith will

be more academically successful, as measured by GPA and prospects for

graduation, than Jones. We can characterize ASPR as follows:

Cues : High school rank, test score rank

Formula : Target is an increasing function of (hs rank + ts rank)

Target : Disposition to succeed academically in college

Range : All high school applicants to U.S. colleges and universities
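
A minimal sketch of ASPR in Python (the function and variable names are illustrative assumptions) makes the Smith and Jones comparison explicit:

```python
def aspr_score(hs_rank, ts_rank):
    """ASPR weighs the two cues equally: high school rank + test score rank."""
    return hs_rank + ts_rank

# Ranks for Smith and Jones as given in the text.
applicants = {"Smith": (87, 62), "Jones": (75, 73)}
ranked = sorted(applicants, key=lambda name: aspr_score(*applicants[name]),
                reverse=True)
print(ranked)  # ['Smith', 'Jones']: 149 beats 148, so ASPR favors Smith
```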

A rule’s range is just a bunch of objects (states, properties, processes, etc.)

about which the rule allows us to make a judgment (e.g., all U.S. males, all

U.S. children under 10 with reading disorders, NFL football games, etc.). A

typical range can be subdivided into many different natural discriminable

partitions. A natural partition will divide the objects in the range in

terms of properties that could in principle be causally related to the target

property. This restriction is meant to rule out partitions that involve mere Cambridge properties (e.g., the property of being closer to Des

Moines than to Chicago), grue-ified properties (e.g., the property of being

green before 2010, blue afterward), or other artificial means of carving out

partitions for a range. A discriminable partition of a rule’s range is a partition

based on some feature that can in principle be detected by a reasoner

prior to the rule’s formulation. There are typically going to be many

different ways to divide a rule’s range into discriminable subgroups. For

example, ASPR’s range can be subdivided in terms of many properties of

the applicants (age, geography, quality of high school, etc.). Thus, the requirement

that partitions be discriminable limits the potentially infinite

number of possible partitions of a rule’s range. (One reason to insist on

only discriminable partitions is to avoid objections that might try to partition

a rule’s range into those cases in which the rule gives an accurate

judgment and those in which it does not.

Permitting such partitions would undermine our view. Rules could

be made to be perfectly reliable if their ranges were to be defined as consisting

of only those cases for which they are accurate. But rules whose

conditions of application cannot be detected cannot be used, and so they

should not play a role in a reason-guiding epistemology.)

The problem of environmental disparity arises when a rule is not consistently

reliable across discriminable partitions of its range. Suppose

for example that ASPR performs differently when it is applied to native

English speakers and nonnative English speakers. In particular, when applied

to native speakers it has a reliability score of 70%, but when applied

to nonnative speakers it has a reliability score of only 60%. Let’s suppose

that S and S1 have adopted the ASPR for making predictions about the

future academic success of high-school applicants. Even if S and S1 are

disposed to apply the ASPR to the same kinds of problems—to all high

school applicants to U.S. colleges and universities—they might find themselves

in quite different circumstances. Suppose S is a recently hired admissions

officer at a small, prestigious eastern liberal arts college; and S1 is

a recently hired admissions officer at a small community college in a Texas

border town. Because of their systematically different environments, it

is possible that ASPR’s reliability score for S would not be ASPR’s reliability

score for S1. There are many familiar examples of environmental

disparity. For example, concluding that a lake trout is safe to eat

will depend on the lake it comes from and perhaps also on the trout’s

age. Many examples come from strategies involved in interpreting behaviors

across different cultures. While most of us would interpret being spit

on by a priest in a Christian church as a very bad sign, we have been told that being spit on by a priest in Senegal is a very good sign (one of

purification).

The problem of environmental disparity is troubling for our view

because it makes it hard to figure out just what a rule’s real reliability score

is supposed to be. Is ASPR’s real reliability score 67% because that’s its

score (let’s suppose) on all high school applicants? Or is it different for

different people? We will argue that real reliability scores attach to reasoning

strategies, or more specifically, to an individual’s use of a reasoning

strategy. So we will argue that if S and S1 are in different environments,

their use of ASPR might well have different real reliability scores.
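
The point can be put numerically with a small sketch: if a user's expected range mixes the partitions in certain proportions, the rule's real reliability score for that user is the correspondingly weighted average. The 70% and 60% figures are the supposed partition scores from above; the applicant mixes for S and S1 are purely illustrative assumptions.

```python
def mixed_reliability(score_by_partition, mix):
    """Reliability of a rule for a user whose expected range mixes the
    partitions in the given proportions (a weighted average)."""
    return sum(score_by_partition[p] * mix[p] for p in mix)

scores = {"native": 0.70, "nonnative": 0.60}  # supposed partition scores

# Hypothetical applicant mixes for S (eastern liberal arts college)
# and S1 (Texas border-town community college).
print(mixed_reliability(scores, {"native": 0.95, "nonnative": 0.05}))  # 0.695
print(mixed_reliability(scores, {"native": 0.60, "nonnative": 0.40}))  # 0.66
```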

To handle the problem of environmental disparity, let’s introduce the

notion of a reasoning strategy’s expected range for a subject in an environment.

The intuitive notion is straightforward: Given a person’s disposition to apply

a certain reasoning strategy R, there is a certain distribution of problems she

can expect to face, given her environment. How exactly this expected range is

to be defined will depend on the particulars of the case. We can often expect

counterfactual-supporting generalizations to play an important role in defining

the expected range of a reasoning strategy for a subject in an environment.

For example, small, prestigious, eastern liberal arts colleges tend to

attract a certain distribution of students, while small community colleges in

southern border towns tend to attract a different distribution of students.

There is a quite powerful, complicated web of causal connections that

maintains and explains those distributions. Once we know what a reasoning

strategy’s expected range is (for a person in an environment), we can approximate

the strategy’s real reliability score. We test the strategy on a representative

sample of problems in the expected range. The strategy’s observed

reliability score in that range will approximate the reasoning strategy’s real

reliability score for that person in that environment.
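
The approximation step can be pictured with a small simulation, again under entirely made-up assumptions about S's expected range and ASPR's accuracy on each partition: draw a representative sample from the expected range, score the rule on it, and read off the observed reliability.

```python
import random

random.seed(0)

# Hypothetical expected range for S: 95% native speakers, 5% nonnative,
# with the supposed per-partition accuracies of .70 and .60.
ACCURACY = {"native": 0.70, "nonnative": 0.60}

def sample_problem():
    return "native" if random.random() < 0.95 else "nonnative"

def rule_is_correct(partition):
    return random.random() < ACCURACY[partition]

sample = [sample_problem() for _ in range(10_000)]
observed = sum(rule_is_correct(p) for p in sample) / len(sample)
print(round(observed, 3))  # close to the weighted figure of about .695
```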

But what about those cases in which there are no generalizations one

can reasonably make about the expected range of problems for a particular

reasoner in an environment? Perhaps the person moves quickly through

relevantly different environments based on whim or unpredictable contingencies.

In such cases, what is our theory to say? To handle these sorts

of cases, we need to introduce the notion of a robustly reliable reasoning

strategy. Intuitively, a robust reasoning strategy is one that is reliable

across a wide range of environments. If there is really no reason to think S

is more likely to face some natural partitions of the rule's range than

others, then the only reasoning strategy that is reliable on S’s expected

range of problems will be a robust reasoning strategy. Let’s turn to this

important notion.
