7. Reliability scores
You define the reliability score of a reasoning strategy as the ratio of true to
total judgments in the strategy’s expected range. But what about cases (like
the frequency formats) in which the strategy makes probabilistic inferences? If
a reasoning strategy says that the probability of E is 1/3 (where E is a single
event), and E happens (or doesn’t happen), we can’t say on that basis
that it was a true judgment. So reliability scores seem undefined for these sorts
of reasoning strategies. And that’s a serious lacuna in your theory.
This worry is analogous to the hoary problem of single-event probabilities
facing the frequentist account of probability. Because the frequency
interpretation defines “probability” in terms of observed frequency, no
probability of coming up heads (or tails) can be assigned to an unflipped
coin. And, notoriously, the future posture of unflipped coins has no observed
value. Our problem is similar in that we define a reasoning strategy’s
reliability score in terms of the relative frequency of true judgments
in its expected range. If a reasoning strategy leads one to predict that there
is a 1/3 chance of single event E, how do we determine what the probability
of E really is? If we can’t assign a probability to E, then we have no
way of determining how reliable the probabilistic reasoning strategy is.
Our solution to the problem is analogous to how a frequentist might
handle the problem of single event probabilities. A frequentist will not
explain the probability of a single event in terms of an unobserved,
independently specifiable disposition or propensity. Instead, a frequentist
might say that the probability of a single event is an idealization concerning
the observed values yielded under an indefinite (or infinite) number of
samplings or a potentially infinite sequence of trials. Turning to the problem
of assigning reliability scores to probabilistic reasoning strategies, we
should note that we define reliability scores in terms of a reasoning
strategy’s expected range for a subject in an environment. The expected
range is an idealization based on the nature of the environment in which a
subject finds herself. The reliability score of a reasoning strategy applied to
a single case (whether that strategy yields probability judgments or not) is,
similarly, based on an idealization: It is the ratio of true to total judgments
in the strategy’s expected range, where this range is defined by an indefinite
(or infinite) number of samplings or a potentially infinite sequence of trials.
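Stated schematically (the notation ER(R) is ours, introduced only for exposition), the definition amounts to:

```latex
\mathrm{reliability}(R) \;=\;
  \frac{\bigl|\{\, j \in \mathrm{ER}(R) : j \text{ is true} \,\}\bigr|}
       {\bigl|\mathrm{ER}(R)\bigr|}
```

where ER(R) is the idealized expected range of strategy R, that is, the set of judgments R would deliver over such an idealized run of trials.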
The introduction of an idealized expected range provides a way (or
more likely, a number of ways) to assess the accuracy of a probabilistic
reasoning strategy. Take a probabilistic reasoning strategy, R. Next take all
the propositions R judges to have (say) probability 1/3. In R’s expected
range, we should expect 1/3 of those propositions to be true. So if we have
a perfectly accurate probabilistic reasoning strategy, R, then for all propositions
that R takes to have probability n/m, the frequency of those
propositions that are true in R’s expected range will be n/m. We can
measure R’s accuracy in terms of a correlation coefficient that represents
how closely R’s probability judgments reflect the actual frequencies of
truths in R’s expected range. (Notice, this is just how overconfidence in
subjects was assessed. When we examine those cases in which subjects
assign very high probabilities to events, those events turn out to be true at
much lower frequencies. See chapter 2, section 3.4.)
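To make one such accuracy measure concrete, here is a minimal sketch in Python (our illustration, not drawn from the text; the function name and the sample data are hypothetical). It groups a strategy’s probability judgments by their stated value, computes the observed frequency of truths at each level, and reports the Pearson correlation between stated probabilities and observed frequencies. A perfectly calibrated strategy yields observed frequencies equal to the stated probabilities; overconfidence of the sort just described shows up as high stated probabilities paired with lower observed frequencies.

```python
from collections import defaultdict

def calibration_correlation(judgments):
    """Correlate stated probabilities with observed truth frequencies.

    judgments: iterable of (stated_probability, outcome) pairs, where
    outcome is True if the judged proposition turned out to be true.
    Returns (pairs, r): the per-level (stated, observed) pairs and the
    Pearson correlation coefficient across probability levels.
    """
    # Group outcomes by the probability the strategy assigned to them.
    by_level = defaultdict(list)
    for p, outcome in judgments:
        by_level[p].append(outcome)

    # Observed frequency of truths at each stated probability level.
    pairs = sorted((p, sum(outcomes) / len(outcomes))
                   for p, outcomes in by_level.items())
    stated = [p for p, _ in pairs]
    observed = [f for _, f in pairs]

    # Pearson correlation between stated and observed values
    # (the shared 1/n factors cancel, so sums suffice).
    n = len(pairs)
    mean_s, mean_o = sum(stated) / n, sum(observed) / n
    cov = sum((s - mean_s) * (o - mean_o)
              for s, o in zip(stated, observed))
    sd_s = sum((s - mean_s) ** 2 for s in stated) ** 0.5
    sd_o = sum((o - mean_o) ** 2 for o in observed) ** 0.5
    r = cov / (sd_s * sd_o) if sd_s and sd_o else float("nan")
    return pairs, r

# Hypothetical record of judgments: events assigned 0.9 come true
# only 70% of the time (the overconfidence pattern noted above).
sample = ([(0.2, True)] + [(0.2, False)] * 3
          + [(0.5, True)] * 2 + [(0.5, False)] * 2
          + [(0.9, True)] * 7 + [(0.9, False)] * 3)

pairs, r = calibration_correlation(sample)
print(pairs)  # [(0.2, 0.25), (0.5, 0.5), (0.9, 0.7)]
print(r)      # high but imperfect: miscalibration at the 0.9 level
```

Note that a correlation coefficient measures only linear association, so a systematically overconfident strategy can still correlate fairly well with the truth frequencies; in practice one would also inspect the per-level pairs themselves, as the overconfidence studies do.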