11 Approximating Probabilities
- A probability can be interpreted as a theoretical long run relative frequency.
- The probability of event \(A\) can be approximated by simulating, according to the assumptions corresponding to the probability measure \(\text{P}\), the random phenomenon a large number of times and computing the relative frequency of \(A\). \[ {\small \text{P}(A) \approx \frac{\text{number of repetitions on which $A$ occurs}}{\text{number of repetitions}}, \quad \text{for a large number of $\text{P}$-repetitions} } \]
- In practice, many repetitions of a simulation are performed on a computer to approximate what happens in the “long run”. However, we often start by carrying out a few repetitions (1) by hand, to help make the process more concrete, or (2) by computer, to check that our code is working correctly.
- A probability can be approximated by a relative frequency from a large number of simulated repetitions, but there is some simulation margin of error, due to natural variability in the simulation.
- The margin of error when approximating a single probability based on a simulated relative frequency is roughly on the order of \(1/\sqrt{N}\), where \(N\) is the number of independently simulated values used to calculate the relative frequency.
- Use a margin of error of \(2/\sqrt{N}\) when approximating many probabilities based on a single simulation.
- Be careful when conditioning, especially when conditioning on values of continuous random variables.
- The larger \(N\) is, the more precise the estimate, but simulating and storing more repetitions has a cost in computational time and memory. More importantly, beware a false sense of precision: a precise estimate of a probability under the assumptions of the model is not necessarily a comparably precise estimate of the true probability.
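The relative frequency approximation and its margin of error can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the helper `relative_frequency` and the event `roll_is_four` (a fair four-sided die landing on 4, so the true probability is 0.25) are hypothetical names chosen for this sketch.

```python
import math
import random


def relative_frequency(event_occurs, n_reps, rng):
    """Simulate n_reps independent repetitions; return the fraction on which the event occurs."""
    count = sum(1 for _ in range(n_reps) if event_occurs(rng))
    return count / n_reps


def roll_is_four(rng):
    # Hypothetical event for illustration: a fair four-sided die lands on 4.
    return rng.randint(1, 4) == 4


rng = random.Random(2024)  # fixed seed so the run is reproducible
for n_reps in (100, 10_000, 1_000_000):
    estimate = relative_frequency(roll_is_four, n_reps, rng)
    margin = 1 / math.sqrt(n_reps)  # rough margin of error for a single probability
    print(f"N = {n_reps:>9,}: estimate = {estimate:.4f}, margin of error = {margin:.4f}")
```

Each hundredfold increase in \(N\) shrinks the margin of error only tenfold, which is one way to see the trade-off between precision and computational cost.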
Example 11.1 Roll a fair four-sided die twice and let \(X\) be the sum and \(Y\) the larger of the two rolls.
- Explain how you would use simulation to approximate
- \(\text{P}(X=6)\)
- \(\text{P}(X=6, Y = 4)\)
- In 10000 repetitions of the simulation, \(X=6\) on 1915 repetitions. Approximate \(\text{P}(X=6)\), including a margin of error.
- Explain how you would use simulation to approximate the distribution of \(X\). What would the margin of error be?
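One possible computer version of the simulation in Example 11.1 is sketched below. It assumes only what the example states (two independent rolls of a fair four-sided die); the function name `simulate_xy` and the seed are illustrative choices. A single probability is reported with margin \(1/\sqrt{N}\), while the whole distribution of \(X\), which involves many probabilities from one simulation, uses \(2/\sqrt{N}\).

```python
import math
import random
from collections import Counter


def simulate_xy(rng):
    """One repetition: roll a fair four-sided die twice; return (sum, larger roll)."""
    first, second = rng.randint(1, 4), rng.randint(1, 4)
    return first + second, max(first, second)


rng = random.Random(11)  # fixed seed for reproducibility
n_reps = 10_000
results = [simulate_xy(rng) for _ in range(n_reps)]

# Single probabilities: relative frequency with margin of error 1/sqrt(N)
p_x6 = sum(1 for x, y in results if x == 6) / n_reps
p_x6_y4 = sum(1 for x, y in results if x == 6 and y == 4) / n_reps
single_margin = 1 / math.sqrt(n_reps)

# Whole distribution of X: many probabilities at once, so use 2/sqrt(N)
x_counts = Counter(x for x, _ in results)
x_dist = {x: x_counts[x] / n_reps for x in sorted(x_counts)}
many_margin = 2 / math.sqrt(n_reps)

print(f"P(X=6)      ~ {p_x6:.4f} +/- {single_margin:.2f}")
print(f"P(X=6, Y=4) ~ {p_x6_y4:.4f} +/- {single_margin:.2f}")
for x, freq in x_dist.items():
    print(f"P(X={x}) ~ {freq:.4f} +/- {many_margin:.2f}")
```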
Example 11.2 Recall the three person meeting problem in Example 10.2; let \(W\) be the time the first person waits for the last person.
- Explain in full detail how you could conduct an appropriate simulation using spinners and use the results to approximate the probability \(\text{P}(W < 15)\).
- In 10000 simulated repetitions, \(W<15\) on 1821 repetitions. Approximate and interpret \(\text{P}(W < 15)\); include a margin of error.
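As a worked check of the arithmetic, using only the counts reported above and the \(1/\sqrt{N}\) rule of thumb:

\[ \text{P}(W < 15) \approx \frac{1821}{10000} = 0.1821, \qquad \text{margin of error} \approx \frac{1}{\sqrt{10000}} = 0.01 \]

So, under the assumptions of the model, the simulation suggests that \(\text{P}(W < 15)\) is somewhere between roughly 0.17 and 0.19: in the long run, the first person waits less than 15 minutes for the last person in about 18% of meetings.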
Example 11.3 Consider simulating a randomly selected U.S. adult and determining whether or not the person has a college degree and whether or not they can fix a problem with a car’s engine. Let \(C\) be the event that the selected adult has a college degree, \(E\) be the event that the selected adult can fix an engine, and \(\text{P}\) correspond to randomly selecting a U.S. adult. Suppose that¹ \(\text{P}(C) = 0.40\), \(\text{P}(E) = 0.28\), and \(\text{P}(C\cap E) = 0.07\).
In 10000 repetitions of an appropriate simulation:
- \(C\) occurs on 3989 repetitions
- \(E\) occurs on 2837 repetitions
- Both \(C\) and \(E\) occur on 680 repetitions.
Use the simulation results to approximate \(\text{P}(E | C)\).
What is the margin of error for your estimate in the previous part?
What is another method for performing the simulation and estimating \(\text{P}(E |C)\) that has a margin of error of 0.01? What is the disadvantage of this method?
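The two approaches in Example 11.3 can be compared with a short sketch. The cell probabilities below follow from the stated values (for instance, \(\text{P}(C \cap E^c) = 0.40 - 0.07 = 0.33\)); the names `OUTCOMES`, `WEIGHTS`, and `simulate_adult` are hypothetical, and the second method shown (keep simulating until \(C\) has occurred on 10000 repetitions) is one way to reach a margin of error of 0.01, not necessarily the only one.

```python
import math
import random

# (has degree, can fix engine) cells implied by P(C)=0.40, P(E)=0.28, P(C and E)=0.07
OUTCOMES = [(True, True), (True, False), (False, True), (False, False)]
WEIGHTS = [0.07, 0.33, 0.21, 0.39]


def simulate_adult(rng):
    """One repetition: return (C occurs, E occurs) for a randomly selected adult."""
    return rng.choices(OUTCOMES, weights=WEIGHTS, k=1)[0]


# Method 1: use the reported counts; only repetitions where C occurs are relevant.
estimate_1 = 680 / 3989
margin_1 = 1 / math.sqrt(3989)  # N is the number of repetitions on which C occurs

# Method 2: keep simulating until C has occurred on 10000 repetitions.
rng = random.Random(3)  # fixed seed for reproducibility
target = 10_000
total = c_count = ce_count = 0
while c_count < target:
    has_degree, fixes_engine = simulate_adult(rng)
    total += 1
    if has_degree:
        c_count += 1
        ce_count += fixes_engine  # True counts as 1
estimate_2 = ce_count / c_count
margin_2 = 1 / math.sqrt(c_count)

print(f"Method 1: P(E|C) ~ {estimate_1:.4f} +/- {margin_1:.4f}")
print(f"Method 2: P(E|C) ~ {estimate_2:.4f} +/- {margin_2:.4f} ({total} total repetitions)")
```

The second method buys the smaller margin of error at the price of more total repetitions: since \(C\) occurs on about 40% of repetitions, roughly 25000 repetitions are needed to obtain 10000 on which \(C\) occurs.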
Example 11.4 Continuing Example 11.1, roll two fair four-sided dice and let \(X\) be the sum and \(Y\) the larger of the rolls.
- For each of the following, describe how to approximate the probability with a margin of error of 0.01.
- \(\text{P}(X = 6 | Y = 4)\).
- \(\text{P}(Y = 4 | X = 6)\).
- Explain how to approximate the conditional distribution of \(X\) given \(Y=4\).
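For Example 11.4, one way to get a margin of error of 0.01 for a conditional probability is to keep simulating until 10000 repetitions satisfy the condition, discarding the rest. The sketch below assumes two independent fair four-sided rolls as in the example; `simulate_xy` and `simulate_given` are illustrative helper names.

```python
import random
from collections import Counter


def simulate_xy(rng):
    """One repetition: roll a fair four-sided die twice; return (sum, larger roll)."""
    first, second = rng.randint(1, 4), rng.randint(1, 4)
    return first + second, max(first, second)


def simulate_given(condition, n_kept, rng):
    """Simulate until n_kept repetitions satisfy the condition; discard the others."""
    kept, total = [], 0
    while len(kept) < n_kept:
        outcome = simulate_xy(rng)
        total += 1
        if condition(*outcome):
            kept.append(outcome)
    return kept, total


rng = random.Random(4)  # fixed seed for reproducibility
n_kept = 10_000  # margin of error 1/sqrt(10000) = 0.01 for each conditional probability

given_y4, total_y4 = simulate_given(lambda x, y: y == 4, n_kept, rng)
p_x6_given_y4 = sum(1 for x, _ in given_y4 if x == 6) / n_kept

given_x6, total_x6 = simulate_given(lambda x, y: x == 6, n_kept, rng)
p_y4_given_x6 = sum(1 for _, y in given_x6 if y == 4) / n_kept

# Conditional distribution of X given Y = 4: summarize X over the kept repetitions
cond_counts = Counter(x for x, _ in given_y4)
cond_dist = {x: cond_counts[x] / n_kept for x in sorted(cond_counts)}

print(f"P(X=6 | Y=4) ~ {p_x6_given_y4:.4f} ({total_y4} total repetitions)")
print(f"P(Y=4 | X=6) ~ {p_y4_given_x6:.4f} ({total_x6} total repetitions)")
print("Conditional distribution of X given Y=4:", cond_dist)
```

Conditioning on \(X = 6\) requires more total repetitions than conditioning on \(Y = 4\), because \(X = 6\) is the less likely condition.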
Example 11.5 Continuing Example 11.2, we want to approximate the conditional probability that \(W<15\) given that the first person arrives at 12:10. Explain in detail how you would conduct a simulation to approximate this probability. (Be careful!)
¹ Values are based on a Pew Research survey.