27  Some issues with hypothesis testing

Example 27.1 A 2007 paper[1] suggests that coin tossing has a “dynamical bias”: a coin is slightly more likely to land the same way up as it started. That is, the paper claims that a coin that starts Heads up is more likely to land Heads up, and a coin that starts Tails up is more likely to land Tails up. Is this really true? Two students at Berkeley investigated this phenomenon by tossing a coin many, many times.

  1. Define the parameter \(p\) in words and state the null and alternative hypotheses using symbols.




  2. Suppose that out of 10000 flips, 5083 landed the same way they started. Based on this data, can you conclude that a coin is more likely to land the same way it started using a strict “significance level” of \(\alpha=0.05\)? Compute the p-value and state your conclusion at level \(\alpha=0.05\) in context. Note: One goal of the example is to show you one reason why using a strict \(\alpha\) is a bad idea.




  3. Repeat the previous part assuming 5082 out of 10000 flips landed the same way they started.




  4. Compare the two previous parts. Give one reason why you should be wary of using a “significance level” (like 0.05) too strictly.




  5. The students actually tossed the coin 40,000 times![2] Of the 40,000 flips, 20,245 landed the same way they started. Compute the p-value and state a conclusion in context. (Do NOT use a strict \(\alpha\) level.) Which result, 20245 out of 40000 or 5083 out of 10000, gives stronger evidence to reject the null hypothesis?




  6. Would you say the results of the real study, in the previous part, are newsworthy? Or important? Or “significant”? Using this problem as an example, explain why “statistically significant” is a poor choice of words. In other words, explain why “statistical significance” is not the same as practical importance.
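
A minimal sketch of the p-value calculations in parts 2, 3, and 5, assuming the one-sided test of \(H_0: p = 0.5\) versus \(H_a: p > 0.5\) and a normal approximation to the sampling distribution of the sample proportion (the course may instead intend simulation or an exact binomial calculation). The function name `upper_tail_p_value` is illustrative only.

```python
import math


def upper_tail_p_value(successes, n, p0=0.5):
    """One-sided p-value for H0: p = p0 vs Ha: p > p0.

    Assumption: normal approximation to the sample proportion,
    with the standard error computed under the null hypothesis.
    """
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (successes / n - p0) / se
    # P(Z > z) for a standard normal Z, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))


# Counts of "landed the same way it started" from parts 2, 3, and 5
p_5083 = upper_tail_p_value(5083, 10000)
p_5082 = upper_tail_p_value(5082, 10000)
p_20245 = upper_tail_p_value(20245, 40000)

for label, value in [("5083/10000", p_5083),
                     ("5082/10000", p_5082),
                     ("20245/40000", p_20245)]:
    print(f"{label}: p-value = {value:.4f}")
```

Under these assumptions the first two p-values fall on opposite sides of 0.05 even though the counts differ by a single flip, while the third is much smaller although its sample proportion is closer to 0.5.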




Example 27.2 (For this exercise, you’ll need to believe it’s possible for a person to have ESP.) “Ganzfeld studies” test psychic ability (like ESP). Ganzfeld studies involve two people, a “sender” and a “receiver”, who are placed in separate acoustically-shielded rooms. The sender looks at a “target” image on a television screen (which may be a static photograph or a short movie segment playing repeatedly) and attempts to mentally transmit information about the target to the receiver. The receiver is then shown four possible targets: the correct one and three “decoys.” The receiver must choose the one transmitted by the sender.[3]

  1. We will now conduct some Ganzfeld tests to see if anyone in the class has ESP.
    • Find a partner. Identify one person as the sender and one as the receiver.
    • The sender rolls a four-sided die, looks at it (without letting the receiver see), and then tries to transmit the number rolled to the mind of the receiver.
    • After receiving the mental image, the receiver identifies the number. Record whether the answer is correct or not.
    • Repeat the above process for a total of 9 trials with the same receiver. The receiver should record the number of trials (out of 9) they got correct.
    • Then switch sender/receiver roles and do it again. Each receiver should record the number of trials (out of 9) that they got correct.




  2. Use your results to conduct a hypothesis test to see if you have evidence that you did better than just guessing (and so you have some kind of ESP!).




  3. Suppose we had set a cutoff of 0.05. Based on this strict cutoff, do you have enough evidence to reject the null hypothesis? That is, based on this cutoff, is your p-value small enough to conclude that you have ESP? (Hint: if you got at least 5 out of 9 correct, your answer should be “yes”.)




  4. Consider a student who rejects the null hypothesis. Could their conclusion be wrong? How?




  5. Consider a student who fails to reject the null hypothesis. Could their conclusion be wrong? How? (Again, you’ll need to believe it’s possible for a person to have ESP.)




  6. Using the cutoff of 0.05, did any receiver in our class (of about 30 students) discover evidence that they have ESP? Should we be surprised that this happened?
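
A minimal sketch for parts 3 and 6, assuming every receiver is purely guessing (probability 0.25 of a correct answer on each of 9 independent trials) and that the roughly 30 receivers are independent of one another. The names `binomial_upper_tail` and `class_size` are illustrative only.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


n_trials = 9      # trials per receiver
p_guess = 0.25    # chance of a correct guess with a four-sided die

# p-value for each possible number of correct answers
for correct in range(n_trials + 1):
    p_value = binomial_upper_tail(correct, n_trials, p_guess)
    print(f"{correct} correct: p-value = {p_value:.4f}")

# Chance that a single guessing receiver gets at least 5 of 9 correct
p_reject = binomial_upper_tail(5, n_trials, p_guess)

# Assumption: 30 independent receivers, all just guessing
class_size = 30
p_at_least_one = 1 - (1 - p_reject) ** class_size
print(f"P(at least one of {class_size} rejects H0) = {p_at_least_one:.3f}")
```

Under these assumptions, 5 correct is the smallest count whose p-value falls below 0.05, and the chance that at least one of 30 guessing receivers reaches it is well above one half.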





  1. http://statweb.stanford.edu/~cgates/PERSI/papers/dyn_coin_07.pdf

  2. https://www.stat.berkeley.edu/~aldous/Real-World/coin_tosses.html

  3. Watch this clip from the movie Ghostbusters