7. Reliability scores


You define the reliability score of a reasoning strategy as the ratio of true to total judgments in the strategy’s expected range. But what about cases (like the frequency formats) in which the strategy makes probabilistic inferences? If a reasoning strategy says that the probability of E is 1/3 (where E is a single event), and E happens (or doesn’t happen), we can’t say on that basis that it’s a true judgment. So reliability scores seem undefined for these sorts of reasoning strategies. And that’s a serious lacuna in your theory.
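In symbols (this notation is ours, added only for clarity; it is not in the original text):

$$
\text{reliability}(S) \;=\; \frac{\#\,\text{true judgments of } S \text{ in its expected range}}{\#\,\text{total judgments of } S \text{ in its expected range}}
$$

A single-case probabilistic judgment resists classification as true or false, so on this definition it never enters either count.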

This worry is analogous to the hoary problem of single-event probabilities facing the frequentist account of probability. Because the frequency interpretation defines ‘‘probability’’ in terms of observed frequency, no probability of coming up heads (or tails) can be assigned to an unflipped coin. And, notoriously, the future posture of unflipped coins has no observed value. Our problem is similar in that we define a reasoning strategy’s reliability score in terms of the relative frequency of true judgments in its expected range. If a reasoning strategy leads one to predict that there is a 1/3 chance of single event E, how do we determine what the probability of E really is? If we can’t assign a probability to E, then we have no way of determining how reliable the probabilistic reasoning strategy is.

Our solution to the problem is analogous to how a frequentist might handle the problem of single-event probabilities. A frequentist will not explain the probability of a single event in terms of an unobserved, independently specifiable disposition or propensity. Instead, a frequentist might say that the probability of a single event is an idealization concerning the observed values yielded under an indefinite (or infinite) number of samplings, or a potentially infinite sequence of trials. Turning to the problem of assigning reliability scores to probabilistic reasoning strategies, we should note that we define reliability scores in terms of a reasoning strategy’s expected range for a subject in an environment. The expected range is an idealization based on the nature of the environment in which a subject finds herself. The reliability score of a reasoning strategy applied to a single case (whether that strategy yields probability judgments or not) is, similarly, based on an idealization: it is the ratio of true to total judgments in the strategy’s expected range, where this range is defined by an indefinite (or infinite) number of samplings, or a potentially infinite sequence of trials.
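One way to picture the idealization is computationally: the score a single probabilistic judgment would earn is whatever the ratio of true to total outcomes converges to over an indefinitely extended run of relevantly similar trials. The following Python sketch is ours, not the authors’ apparatus; the trial setup and the assumed true rate are illustrative assumptions.

```python
import random

# Suppose a strategy judges, of a single event E, that its probability
# is 1/3.  The idealization treats that judgment's score as what an
# indefinitely extended run of relevantly similar trials would show.
# TRUE_RATE is an assumption of this simulation, not something given
# in the text: it plays the role of the environment's actual rate.

TRUE_RATE = 1 / 3   # assumed actual rate of E-type events
ASSIGNED = 1 / 3    # the probability the strategy assigns to E

random.seed(0)  # reproducible runs
for n in (100, 10_000, 1_000_000):
    hits = sum(random.random() < TRUE_RATE for _ in range(n))
    print(f"{n:>9} trials: observed frequency {hits / n:.4f} "
          f"(assigned {ASSIGNED:.4f})")
# As the run lengthens, the observed frequency settles toward the
# idealized value: the fact the 'expected range' is meant to capture.
```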

The introduction of an idealized expected range provides a way (or, more likely, a number of ways) to assess the accuracy of a probabilistic reasoning strategy. Take a probabilistic reasoning strategy, R. Next take all the propositions R judges to have (say) probability 1/3. In R’s expected range, we should expect 1/3 of those propositions to be true. So if we have a perfectly accurate probabilistic reasoning strategy, R, then for all propositions that R takes to have probability n/m, the frequency of those propositions that are true in R’s expected range will be n/m. We can measure R’s accuracy in terms of a correlation coefficient that represents how closely R’s probability judgments reflect the actual frequencies of truths in R’s expected range. (Notice that this is just how overconfidence in subjects was assessed: when we examine those cases in which subjects assign very high probabilities to events, those events turn out to be true at much lower frequencies. See chapter 2, section 3.4.)
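To make the calibration test concrete, here is a minimal Python sketch under our own illustrative assumptions (the toy judgment data and all names are hypothetical, not the authors’ own apparatus): it groups a strategy’s judgments by assigned probability, compares each assigned probability with the observed frequency of truths in that group, and summarizes the fit with a correlation coefficient, as the passage above suggests.

```python
from collections import defaultdict
from statistics import correlation  # Python 3.10+

# Hypothetical record of a strategy R's judgments: each entry pairs
# the probability R assigned to a proposition with whether that
# proposition turned out to be true.  The data are toy values.
judgments = [
    (1 / 3, True), (1 / 3, False), (1 / 3, False),
    (2 / 3, True), (2 / 3, True), (2 / 3, False),
    (0.9, True), (0.9, True), (0.9, False), (0.9, False),
]

# Group the judgments by the probability R assigned.
buckets = defaultdict(list)
for prob, outcome in judgments:
    buckets[prob].append(outcome)

# For each assigned probability, compute the observed truth-frequency.
assigned, observed = [], []
for prob in sorted(buckets):
    outcomes = buckets[prob]
    assigned.append(prob)
    observed.append(sum(outcomes) / len(outcomes))
    print(f"assigned {prob:.2f} -> observed {observed[-1]:.2f}")

# A perfectly accurate strategy lines the two series up exactly; one
# crude summary of the fit is their correlation coefficient.
print(f"calibration correlation: {correlation(assigned, observed):.3f}")
```

The 0.9 group is deliberately overconfident: its propositions come out true only half the time, mirroring the pattern of overconfidence reported in chapter 2, section 3.4.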
