Also titled, It is Impossible To Seek Evidence that Confirms Your Hypothesis, only Evidence that Tests It, Except for Maybe in a Few Odd Cases, And I Think Not Even Then
(I have a feeling I’m getting the titling capitalization wrong.)
(Caveat Lector: I have stolen examples freely in the below from friends, LessWrong posts, books, and other locations.)
Very often, one is not certain whether a particular statement is true or false. To resolve this uncertainty, one can seek evidence. The point of this post is that it is impossible to seek exclusively confirmatory or exclusively disconfirmatory evidence.
To put this another way, it is impossible to seek out confirmatory evidence without at the same time risking disconfirmatory evidence. To put this yet another way, it is (at least almost always, and probably always) only possible to seek to test your hypothesis—to seek evidence which will move you to think either that a hypothesis is more likely or less likely to be true. You cannot seek evidence that will move you to think that a hypothesis is more likely to be true, without running the risk of finding that your hypothesis is less likely; and you cannot do the reverse either.
This post will also show that any procedure that promises to make you more confident in a hypothesis, without the accompanying risk of making you less confident in your hypothesis, cannot actually provide evidence.
There’s a mathematical way of saying this, which is superior to the non-mathematical way. But I’m going to attempt to describe this first with words, rather than with math. After I’ve tried to describe it non-mathematically, I’ll move on to the math, which will, at that point, hopefully make sense.
1. First Examples
Imagine that a doctor thinks that a particular chemical substance is likely to cure cancer.
So he decides to set up an experiment to test this. He would like to have definitive evidence, after the experiment is over, that it cures cancer. So he sets up the experiment very, very carefully: he draws from several different demographics when assembling his control and experimental groups; he tries to ensure that each group is large enough to detect even small differences in outcome with p < 0.01 significance; he is really careful to try to anticipate and avoid potential confounders; he does all the double-blinding that every good scientist would do; he probably does all sorts of other things that I can’t even name because I don’t know enough about experimental design. He wants the evidence that pops out of this experiment to be weighty.
And he goes through all this, performs the experiments, waits five years for a bunch of data to come in, and eventually finds that he can’t reject the null hypothesis—the drug probably does nothing for cancer. And all the careful design of his experimental procedure now only means that this is a more crushing refutation.
Note that the only possible way he could open himself up to evidence in favour of the hoped-for outcome was by opening himself up to evidence against it. If he had screwed up the experimental design, so that the experimental group was more likely than the control group to improve regardless of the efficacy of the drug—well then, obviously the inertness of the drug could not have been indicated by the experiment, but equally obviously the experiment would be useless as evidence that the drug was effective. In short—because finding evidence for his hypothesis necessarily involves the possibility of finding evidence against it, one would ultimately have to characterize running the experiment as an attempt to test his hypothesis, not an attempt to confirm it, even if confirmation is what he wanted.
Here is another example.
Imagine that someone creates a new aluminum-steel alloy. They want to show the world that it is, per unit weight, stronger than titanium.
Well, in this case the only thing they can do is set up an experimental rig that smashes into it and see if it breaks more slowly and rebounds more effectively than titanium. And of course, in doing this, they also run the risk of finding that it actually breaks faster and rebounds less effectively than titanium. To show that it is stronger—as they hope to do—they also necessarily run the risk of showing that it is weaker.
Again, you can’t seek to confirm that what you wish to believe is true; you can only test it. The reason for this is that, if the test that you perform is not causally entangled with the thing that you wish to examine, then it cannot provide evidence for your favoured hypothesis about the examined thing: if the test apparatus shows that the aluminum-steel alloy does not break, regardless of whether it is actually strongest, then it cannot provide evidence for the alloy being the strongest. But if the test must be causally entangled with the thing that you wish to examine, then it also necessarily involves the risk of providing evidence contrary to your hypothesis: it risks showing that the evidence does not turn out as your favoured hypothesis predicts it would.
Well, someone could object—maybe this is true of the hard, experimental sciences. What about something else, such as history?
It works the same in history. Suppose I’m interested in knowing whether Roger Bacon anticipated some optical notion heretofore attributed solely to Christiaan Huygens; I think it would be awesome if he did. So I decide that I’m going to go through every text attributed to Bacon, as well as every questionable text and every oblique reference I can find to him and his works in the corpus of 13th century literature. 25 years, 20 scholarly articles, and 1 dissertation later, I complete my survey—and find, to my dissatisfaction, that Bacon did not anticipate Huygens. This is contrary to my wishes, but now far more definitively proven than it would have been if I did nothing. Again, the opportunity of finding evidence in favour of a hypothesis must be balanced by the opportunity of finding evidence contrary to it.
Ok, well, you could say—maybe this is true of academic things. What about people’s personal lives?
Same here. Suppose you would like to find if someone is trustworthy. If that’s the case, then you’ll probably have to trust them with something—whether big or small. And doing this involves running the risk of having them betray you, and finding that they are not trustworthy.
I could go on with examples forever, but I hope my initial point is clear. You cannot seek confirmatory evidence for a hypothesis; you can seek merely to test a hypothesis.
2. An objection, turned into an advantage
Here’s an objection I suspect some people are thinking about: It really feels like you can seek confirmation of a hypothesis. Let’s look at a case when it feels like you’re successfully seeking confirmation of a hypothesis.
Suppose I think that the Earl of Oxford was Shakespeare. I cannot really give any arguments for it, though, if you ask me for arguments. So I go to the library, check out a dozen Oxfordian books on Shakespeare, and read them all. I also browse through a bunch of Oxfordian websites online, and look through all of their arguments. I also look through their dissection of all the anti-Oxfordian arguments, just for completeness’ sake. So now I’m able to give a dozen different arguments and pieces of evidence in support of the Oxfordian cause, whereas before I wasn’t able to; I was in a prior state of ignorance, and deliberately sought confirmatory evidence that allowed me to be in the posterior state. So it seems to me pretty clear that I was able to seek and to find confirmatory evidence in favor of my hypothesis.
And this is a fairly universal experience. I can start by thinking that some proposition is true, and by wishing that I had more evidence that it was true. To find such evidence, I can read books by other people who think that this proposition is true. And then I find that I apparently have attained confirmatory evidence in favour of my hypothesis, which I have sought. So it seems false to say that I cannot seek out confirmatory evidence for a hypothesis.
I’m not going to address this argument directly, at first. Let me give an analogy.
Suppose you’re interested in whether a particular die is loaded or fair. If it is loaded, it is loaded so that the higher triplet of numbers (4,5,6) comes up far more frequently than the lower triplet (1,2,3) of numbers. If it is fair, then members of each triplet will come up with roughly equal frequency. A good way to test this would be to roll the die 100 or 1000 times, and count how many times it comes up high and how many times it comes up low.
We’re not really interested in doing this tedious experimental work ourselves, though, so we ask a friend to do it and record his results on each throw.
After the experiment, we ask him how it went. And he says “Well, it came up high on the 3rd, 4th, 10th-13th, 16th, [insert long string of rolls], 89th, 91st, 95th-98th, and 99th rolls.”
We say “Um, so you mean it came up low on the other rolls?”
He says, “Oh, no, I don’t mean to imply that. It is conceivable that it could have come up low or high on them. I’m just not reporting them, right now.”
And we wonder to ourselves, “Hmm… so is this evidence in favour of the die being loaded? All of the instances that he reported were instances which were high. The particular series that he gave us is the kind of series you would expect if the die were loaded. But on the other hand, if he has resolved only to report rolls where it came up high, that doesn’t count as evidence pro at all—he’s just tilting his reporting of the experiment so it seems that it came out in favour of the die being loaded.”
So we ask him, “Had you resolved to report only on those aforementioned throws before you actually made the throws?”
There are two ways he could respond at this point.
Suppose he says “Yeah, before I rolled the die, I decided only to report the 3rd, the 4th, the 10th-13th, [etc], rolls. And then I stuck to that resolution after I rolled the die.” If he is telling the truth, then we actually do have evidence that the die is loaded—it’s as if he had performed exactly the experiment we wanted to perform, merely with a smaller number of rolls. If the die were not loaded, we would have expected this sample to indicate as much; we were open to refutation as well as confirmation, and so we have in fact found genuine evidence in favor of the loading.
On the other hand, suppose he says “Nah, I decided just to report on those throws after I had made them,” and then winks. If this is the case, then we have a very strong suspicion that he is only reporting throws which turned up with members of the higher triplet. And if he’s filtering the evidence in this way, then we obviously do not have real evidence in favour of the die being loaded. His resolution to report only evidence that seems to favour the hypothesis that the die is loaded makes the so-called evidence not evidence at all.
So in this case, our friend can only offer real evidence to us if he has resolved to offer all the evidence he finds, whether it supports or does not support the hypothesis that the die is loaded. If he is deliberately filtering out evidence in accord with whether he likes it or not, then he isn’t offering evidence: he’s offering a simulacrum of evidence, which is designed to produce assent but which has nothing to do with the truth of the matter.
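The difference between the honest and the filtered report can be sketched in a few lines of code. This is only an illustrative simulation—the 80% bias, the seed, and the function names are all my own assumptions, not anything from the scenario above—but it shows the point: the honest report distinguishes a fair die from a loaded one, while the filtered report comes out identical either way.

```python
import random

def roll_die(loaded, n=1000, seed=0):
    """Roll a die n times; the loaded die turns up 4-6 eighty percent of the time."""
    rng = random.Random(seed)
    if loaded:
        return [rng.choice([4, 5, 6]) if rng.random() < 0.8
                else rng.choice([1, 2, 3]) for _ in range(n)]
    return [rng.randint(1, 6) for _ in range(n)]

def honest_report(rolls):
    """Report every roll: the fraction that came up high is real evidence."""
    return sum(r >= 4 for r in rolls) / len(rolls)

def filtered_report(rolls):
    """Report only the rolls that came up high: the 'evidence' is 100% high
    no matter which die was actually rolled."""
    highs = [r for r in rolls if r >= 4]
    return sum(r >= 4 for r in highs) / len(highs)

fair_rolls = roll_die(loaded=False)
loaded_rolls = roll_die(loaded=True)

# The honest report differs between the two dice (roughly 0.5 vs 0.8)...
print(honest_report(fair_rolls), honest_report(loaded_rolls))
# ...but the filtered report is 1.0 either way: it carries no information.
print(filtered_report(fair_rolls), filtered_report(loaded_rolls))
```

The filtered report is the same regardless of which die was rolled, which is exactly what it means for it not to be evidence at all.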
Let’s turn to books.
Suppose that a Scientologist is uneasy in his Scientology. So he reads a bunch of books on Scientology by Scientologists, and in the books he finds all sorts of bits of evidence in favour of Scientology. He finds many stories about people who went through their auditing treatment, and whose lives then improved in amazing ways; these stories are evidence that auditing works, he thinks, and if auditing works, well, this is evidence for the historical stories about Thetans that the practice of auditing is based on. He also finds stories about how people tried to dig up scandals about Scientologist leadership, and about how these attempts were thwarted and turned out to be based on lies; and if the arguments against Scientology turn out to be based on lies, he thinks, then surely these arguments are not formidable. And he also finds really insightful and interesting things that L. Ron Hubbard said about human nature, which really help him explain himself to himself and navigate the world better; and if some of the things that L. Ron Hubbard said seemed really enlightening, he thinks, then maybe he should be more trusting when L. Ron Hubbard says something counterintuitive. All the things that the books say might be true as well—he could verify them all through third-party sources. And so, by seeking only bits of information and evidence for Scientology, he keeps himself in his delusion.
It’s pretty obvious that the author of the books on Scientology is acting like our deceptive friend in the second imagined scenario above. Just as the friend reports only evidence which seems to suggest that the die is loaded, the author is only reporting evidence which suggests that Scientology is true: each is only reporting evidence that appears to favor a particular hypothesis.
And, I hope it is now evident, if the author is doing that, then the evidence that they are reporting simply should not count as evidence, just as our friend’s supposed evidence simply should not count as evidence. You can find pieces of evidence to support any hypothesis you please—the world is very large. And this means that, when arguing for a point, everyone has the opportunity to filter and select like the Scientologist author or the die-rolling friend. But if the die-rolling friend is not providing actual evidence, because he filters—and he clearly isn’t—then neither should the Scientologist author, or any author who reports only positive evidence, be seen as actually providing evidence. That they’re acting as this kind of filter rules out the possibility that they are providing evidence: instead, they’re providing what I see as a very dangerous rhetorical simulacrum of evidence.
Of course, this simulacrum of evidence can be improved; both our friend and the Scientologist could really work on presentation. Our friend above could include a few cases when the die roll turned out low—that would make it look more like he was being honest. Similarly, the Scientologist author could include a few cases when Scientology was abusive and deceptive, and admit that some unfortunate abuses have occurred—and thereby gain a greater appearance of honesty.
Our friend could even be really devious—he could include all the die rolls, but then mention that his die-rolling technique was bad in certain particular cases—“Really, I wasn’t rolling the die so much as setting it down; I wasn’t following the ISO-312 Die-Rolling Manual; and there might have been a magnet near the die during these rolls”—and then, having applied unequal scrutiny to different rolls, show how miraculously the really good rolls were only those that turned up high. The Scientologist author could do something similar, and present a few of the genuine arguments that people make against Scientology—but then hold them to much higher standards of evidence than he holds other arguments, or present them in rhetorically unsatisfying ways.
But all this would be merely window-dressing designed to make the evidence-substitute look like real evidence, as the two examples should make clear.
What one is doing, then, in the case of the Oxfordians should be evident. If the authors that you read are acting like the Scientologist author, then what they give is a simulacrum of evidence. If, on the other hand, they present the best evidence in both directions, treating the most deadly attacks on their own position as earnestly and rigorously as the arguments in its favour; if they have spent equal time searching for flaws in their own arguments and for the flaws in the arguments of their opponents; if they have tried to see the world through the eyes of their opponents, and tried to examine how their supposed evidence would seem in such a case, and how their opponents might effortlessly explain it; if they have done all this, then there is a small chance they are presenting real evidence. Even after reading authors like these, it is still best to read counter-arguing authors, as everyone knows—it’s hard to present evidence fairly for a position you disagree with, even if you’ve resolved to try to do so.
And if the authors you read did all this, then reading them might not convince you of the Oxfordian hypothesis—it might convince you of the opposite. So once again, it seems that if you wish to deal with real evidence in favour of a hypothesis, you cannot ensure that you will not also encounter real evidence against it. You cannot search for confirmatory evidence; you can only search for a test.
3. More objections, which are hopefully fairly stated
I just laid out a really, really high standard for writers. Let me attempt to meet it.
For a while I wondered whether what I have said applies to the domain of mathematics. The line between valid and invalid arguments seems sharper in mathematics than it is everywhere else. A valid mathematical proof, barring insanity or a mistake made within the proof, seems like fairly absolute evidence for what it purports to prove. So, supposing that I sought a proof for a mathematical theorem and found it, it seems like I’ve sought confirmatory evidence without running the risk of finding disconfirmatory evidence.
I think ultimately this is wrong. I could have made a mistake in my proof—this happens all the time. Alternately, while searching for a proof, I could stumble across a reason for thinking that the theorem is false—this kind of thing also happens. And if I search for a proof for a while, and fail to find one, then that fact itself could be evidence against the existence of a proof. So searching for a proof in math does not, as far as I can tell, provide a chance for confirmatory evidence without disconfirmatory evidence.
Suppose, though, that we grant that you can do this in math. I’m ok with that. Mathematicians generally aren’t involved, qua mathematicians, in the kind of disputes and cases of tendentious reasoning that the rest of the world is involved in—or at least that’s what it looks like from the outside. And math is pretty far removed from most other fields. So I wouldn’t really mind including an ad-hoc exception for mathematics, although I think it is unnecessary.
One could try to extend this (probably unnecessary) ad-hoc exception to things like philosophy, but that would pretty clearly be unwarranted. If we are to have an ad-hoc exception for mathematics, it would only be because it is (nearly) impossible for a conscientious individual to offer unsound arguments in mathematics. (Again, I think this is pretty clearly false.) The notion of proof, of what constitutes acceptable premises, and of what constitutes acceptable work—all of these are far hazier in philosophy than in mathematics. So the ad-hoc extension does not work, as far as I can tell.
Another counter-argument to this standard is that it would make everything that people write interminably long. I concede the point. But sometimes, if you are interested in the truth, you have to read interminably long things. That’s just how life is.
4. Another counter-argument, which isn’t actually a counter-argument at all
Here’s another counter-argument.
Suppose that Kimiko, a high-school student, thinks her boyfriend Bob is cheating on her with Tracey, even though Bob claims that he never even talks with her. Naturally, as the trusting soul she is, Kimiko decides to install a backdoor on Bob’s cellphone, so she can remotely monitor whether he receives or sends any compromising texts or calls. She installs it early one morning at school, and checks the backdoor four hours later at lunchtime. There are two possible results she could find after checking: Bob might have sent or received a compromising message or call, or he might not have.
If Bob has received or sent a sexy message, this serves as extremely strong evidence that Bob is cheating: if Bob gets a message from Tracey saying “Beast w. 2 backs 2nite!” then Bob is very probably cheating. On the other hand, if Bob has not received or sent such a message, this is only very weak evidence that Bob is not cheating: even if he is cheating, he might not communicate with her every four hours. Both the world where Bob is cheating and the world where Bob is not cheating would very likely lead to a situation where Bob does not receive any message from Tracey—and for this reason, his not receiving a message seems like only weak evidence that he is not cheating. On the other hand, the world where Bob is cheating is far more likely than the world where Bob is not cheating to involve such a message, so receiving such a message is very strong evidence he is cheating.
So it seems as if Kimiko can seek evidence confirming that Bob is cheating, at least inasmuch as the strength of the possible evidence confirming cheating is different from the strength of possible evidence disconfirming cheating.
The problem is that I actually agree with everything stated above—it’s actually quite vital to my point, and not a counter-argument at all. However, it still would be wrong to characterize Kimiko’s actions as an attempt to confirm cheating, for the following reason.
Receiving no message from Tracey is likely both in worlds where Bob is cheating and in worlds where he is not. So we actually have a relatively low expectation of seeing any message, although a message would be relatively strong evidence. Similarly, because a message from Tracey would almost certainly happen only in a world where Bob is cheating, and perhaps not even then, we have a strong expectation of seeing no message, which serves as weak evidence that Bob is not cheating. So even though the possible evidence is strong in one direction and weak in the other, this is balanced by a strong expectation of the weak evidence and a weak expectation of the strong evidence.
Let me give another example to try to illustrate the idea.
Suppose I’m going through Roger Bacon’s letters, over the course of going over the entirety of his work. It might be that many of his letters deal mostly, although not exclusively, with matters of a personal nature. So I have very little expectation that I’ll run into any theories previously attributed solely to Huygens in these letters—although, if I run into an apparent instance of such a theory, I’ll have to revise my certainty about Bacon’s opinions drastically. So I have an (extremely) low expectation of (extremely) strong confirmatory evidence while reading these letters. On the other hand, if I don’t run into any such theory, I’ll only revise downwards very slightly the probability that Bacon anticipated such a discovery, because this is not the first place I would expect such a discovery to be written down. So I have an (extremely) high expectation of (extremely) weak disconfirmatory evidence.
One way to say this is that your anticipated future opinion, after running into evidence, must on average be the same as your current opinion. If you anticipate really strong evidence that might move your opinion a great deal in one direction, but cannot see any way for there to be really strong evidence that moves your opinion in the other direction, then you must have only a very weak expectation of the very strong evidence in the first direction, and a very strong expectation of very weak evidence in the other. This is why you cannot seek confirmatory evidence: your anticipated future average opinion must be the same as your current opinion.
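Bayes’ theorem makes this balance concrete. Here is a small numerical sketch of the Kimiko example; all of the probabilities below are invented purely for illustration, not drawn from anywhere:

```python
# Illustrative (assumed) numbers for the Kimiko example.
p_cheat = 0.3           # prior P(Bob is cheating)
p_msg_if_cheat = 0.2    # P(compromising message in a 4-hour window | cheating)
p_msg_if_not = 0.001    # P(such a message | not cheating)

# Total probability of seeing a message: weak expectation of the strong evidence.
p_msg = p_msg_if_cheat * p_cheat + p_msg_if_not * (1 - p_cheat)

# Posteriors by Bayes' theorem.
post_if_msg = p_msg_if_cheat * p_cheat / p_msg                 # big jump upward
post_if_no_msg = (1 - p_msg_if_cheat) * p_cheat / (1 - p_msg)  # tiny shift downward

print(post_if_msg, post_if_no_msg)

# Conservation of expected evidence: the average posterior equals the prior.
expected_posterior = post_if_msg * p_msg + post_if_no_msg * (1 - p_msg)
print(expected_posterior)  # equals p_cheat
```

The strong update (seeing a message) is rare, and the common outcome (no message) barely moves the needle; weighted by how likely each outcome is, the two exactly cancel.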
Let me give one more example.
Suppose you are a Catholic, and you hear about some supposed miracle. The Catholic Church tends not to actually endorse miracles; it tends to be reluctant to say “This is miraculous and a sign from God.” Furthermore, you know that there are many fraudulent miracles in all religions; the Catholic Church is no exception. So you think that this particular miracle is very likely a fraud as well, but investigate it nevertheless.
Now, if you investigate extremely carefully, and find no conceivable scientific explanation for the phenomenon, then you have pretty good evidence for Catholicism. But you did not expect to find such a miracle when you started to investigate it; you thought it was probably a fraud. So you had a weak expectation of strong evidence, starting off. If, on the other hand, you were to find that the miracle was a fraud; well, that’s what you expected anyhow. You had a very strong expectation of this extremely weak evidence that Catholicism is false, although this evidence is so weak it is probably not even worth keeping track of mentally. (To see that it is nevertheless contrary evidence, imagine what it would be like to investigate a thousand such supposed Catholic miracles and to find every single one of them to be a fraud. The thousand is composed of ones; each of the thousand must shift the probability by some calculable amount, given the amount by which the complete thousand shifts it.)
5. You’ll skip this part
Now, there’s actually a mathematical way of saying, and proving, what I’ve been trying to say above. I was originally going to go through this really slowly, but then I realized that (a) everyone who doesn’t like math would probably skip this section anyhow and (b) everyone who does like math would prefer that I just give the proof. So let me give the proof.
In the below, “H” stands for the hypothesis under consideration and “E” stands for yet-to-be-found evidence bearing on H.
1. For any propositions n and m: P(n) = P(n,m) + P(n,~m) [law of total probability]
2. [From 1, substituting H for n and E for m]: P(H) = P(H,E) + P(H,~E)
3. For any propositions n and m: P(n,m) = P(n|m)P(m) [definition of joint probability]
4. [From 2 and 3]: P(H) = P(H|E)P(E) + P(H|~E)P(~E)
This formula [P(H) = P(H|E)P(E) + P(H|~E)P(~E)] says most precisely what I am trying to say above.
It means that a weak expectation [low P(E)] of strong evidence in favor of the hypothesis [higher positive P(H|E) - P(H)] must be accompanied by a strong expectation [high P(~E)] of weak evidence contrary to the hypothesis [slightly negative P(H|~E) - P(H)]. This is like the case with Kimiko and Bob.
It also means that a medium-strength expectation [middling P(E)] of strong evidence in favor of the hypothesis [higher positive P(H|E) - P(H)] must be matched by a medium-strength expectation [middling P(~E)] of strong evidence against the hypothesis [lower and negative P(H|~E) - P(H)]. This is like the case with the cancer-curing drug.
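The identity can be checked numerically from any joint distribution over H and E; the four joint probabilities below are arbitrary numbers I chose for illustration, constrained only to sum to 1:

```python
# An assumed joint distribution over H (hypothesis) and E (evidence).
p_H_and_E = 0.30
p_H_and_notE = 0.10
p_notH_and_E = 0.15
p_notH_and_notE = 0.45
assert abs(p_H_and_E + p_H_and_notE + p_notH_and_E + p_notH_and_notE - 1.0) < 1e-12

# Marginals, by the law of total probability (steps 1 and 2 of the proof).
p_H = p_H_and_E + p_H_and_notE
p_E = p_H_and_E + p_notH_and_E

# Conditionals, by the definition of joint probability (step 3).
p_H_given_E = p_H_and_E / p_E
p_H_given_notE = p_H_and_notE / (1 - p_E)

# The identity of step 4: P(H) = P(H|E)P(E) + P(H|~E)P(~E).
reconstructed_p_H = p_H_given_E * p_E + p_H_given_notE * (1 - p_E)
print(p_H, reconstructed_p_H)  # the two are equal
```

Because the identity follows from the axioms alone, it holds for any numbers you plug in: no matter how the probability mass is arranged, the expected posterior reconstructs the prior exactly.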
Why should anyone care about the above?
Well, I think it is important to show that you cannot actually seek confirmatory evidence in favor of a hypothesis. You can seek to convince yourself that a particular hypothesis is true—but what you seek in so trying to convince yourself is not evidence, but a simulacrum of evidence. That seems important, because it can help one distinguish, in one’s own actions, between when one is performing an exercise in self-directed rhetoric and when one is attempting to find the truth.
In my experience, making this distinction is difficult to do; I still wonder all the time which one I’m doing.