Homework 8

Problem 1

In roulette, a bet on a single number has a 1/38 probability of success and pays 35-to-1. That is, if you bet 1 dollar, your net winnings are -1 with probability 37/38 and +35 with probability 1/38. Consider betting on a single number on each of \(n\) spins of a roulette wheel. Let \(\bar{X}_n\) be your average net winnings per bet.
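For reference, the single-bet moments that the \(\bar{X}_n\) calculations build on follow directly from the stated distribution. Here \(X\) is a label introduced for the net winnings on one 1-dollar bet:

```latex
\begin{aligned}
\text{E}(X)   &= (-1)\cdot\tfrac{37}{38} + 35\cdot\tfrac{1}{38} = -\tfrac{2}{38} = -\tfrac{1}{19} \approx -0.0526 \\
\text{E}(X^2) &= (-1)^2\cdot\tfrac{37}{38} + 35^2\cdot\tfrac{1}{38} = \tfrac{1262}{38} = \tfrac{631}{19} \\
\text{Var}(X) &= \tfrac{631}{19} - \left(\tfrac{1}{19}\right)^2 = \tfrac{11988}{361} \approx 33.21,
\qquad \text{SD}(X) \approx 5.763
\end{aligned}
```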

For each of the values \(n = 10\), \(n = 100\), \(n = 1000\):

Compute \(\text{E}(\bar{X}_n)\)
Compute \(\text{SD}(\bar{X}_n)\)
Coding required. Run a simulation to determine if the distribution of \(\bar{X}_n\) is approximately Normal
Coding required. Use simulation to approximate \(\text{P}(\bar{X}_n >0)\), the probability that you come out ahead after \(n\) bets
For \(n = 1000\), use the Central Limit Theorem to approximate \(\text{P}(\bar{X}_{1000} >0)\), the probability that you come out ahead after 1000 bets.
The casino wants to determine how many bets on a single number are needed before it has (at least) a 99% probability of making a profit. (Remember, the casino profits if you lose; that is, if \(\bar{X}_n <0\).) Use the Central Limit Theorem to determine the minimum number of bets (keeping in mind that \(n\) must be an integer). You can assume that whatever \(n\) is, it’s large enough for the CLT to apply.
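A minimal simulation sketch for the coding parts above, assuming NumPy is available. Rather than drawing each spin, it draws the number of winning spins from a Binomial\((n, 1/38)\) distribution. This gives the same distribution of \(\bar{X}_n\) and runs faster. Drawing individual spins with `rng.choice([-1, 35], size=(reps, n), p=[37/38, 1/38])` would also work. The function names, seed, and number of repetitions are arbitrary choices. Sample skewness serves only as a rough numeric check; a histogram of the simulated values is the more direct way to judge Normality.

```python
import math
import numpy as np
from statistics import NormalDist

P_WIN = 1 / 38
# Per-bet mean and SD of net winnings X, from the stated distribution
MU = 35 * P_WIN + (-1) * (1 - P_WIN)
SIGMA = math.sqrt(35**2 * P_WIN + (-1) ** 2 * (1 - P_WIN) - MU**2)

rng = np.random.default_rng(seed=8)  # arbitrary seed, for reproducibility


def simulate_xbar(n, reps=100_000):
    """Simulate `reps` values of the average net winnings over n bets.

    The number of winning spins W among n independent spins is
    Binomial(n, 1/38), and total net winnings are 35*W - (n - W) = 36*W - n.
    """
    wins = rng.binomial(n, P_WIN, size=reps)
    return (36 * wins - n) / n


def clt_prob_ahead(n):
    """CLT approximation to P(Xbar_n > 0), treating Xbar_n as Normal(MU, SIGMA/sqrt(n))."""
    return 1 - NormalDist(MU, SIGMA / math.sqrt(n)).cdf(0)


for n in (10, 100, 1000):
    xbar = simulate_xbar(n)
    # Sample skewness: values near 0 are one rough sign of approximate Normality
    skew = np.mean((xbar - xbar.mean()) ** 3) / xbar.std() ** 3
    print(f"n={n:5d}  mean={xbar.mean():.4f}  SD={xbar.std():.4f}  "
          f"skew={skew:.2f}  P(ahead)~{np.mean(xbar > 0):.3f}")

print(f"CLT approximation for n=1000: {clt_prob_ahead(1000):.3f}")
```

The `clt_prob_ahead` helper uses the standard-library `statistics.NormalDist`, so SciPy is not needed for the Normal approximation.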

Problem 2

The standard measurement of the alcohol content of drinks is alcohol by volume (ABV), which is given as the volume of ethanol as a percent of the total volume of the drink. In a sample of 67 brands of beer, the mean ABV is 4.61 percent and the standard deviation of ABV is 0.754 percent.

First, to get some practice, answer the following questions by computing by hand using only the information provided here.
Coding required. Then, use Python to produce the summaries and analysis needed to answer the questions. Be sure to also include appropriate plots of the beer data.

Compute a 95% confidence interval for the appropriate population mean.
Write a clearly worded sentence reporting your confidence interval in context.
Is 4.5% a plausible value of the parameter? Explain briefly.
One of the brands of beer in the sample is O’Doul’s, a non-alcoholic beer. The ABV for O’Doul’s is 0.4% (it does contain a bit of alcohol). Suppose O’Doul’s is removed from the data set. Compute the sample mean ABV of the remaining 66 brands.
The sample SD of ABV of the remaining 66 brands is 0.55 percent. Explain intuitively why this value is smaller than the sample SD of all 67 brands.
Compute the 95% confidence interval based on the sample with O’Doul’s removed. Compare it to the original interval in terms of both its center and its width. Explain briefly.
Based on the interval computed with O’Doul’s removed, is 4.5% a plausible value of the parameter? Explain briefly.
Which of the analyses is more appropriate: with or without O’Doul’s? Explain your reasoning.
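A minimal sketch for checking the by-hand calculations, assuming SciPy is available for the \(t\) critical value. It uses only the summary statistics given above. The helper name `t_interval_from_summary` is hypothetical. Because the reported mean of 4.61 is rounded, the recovered mean of the remaining 66 brands is approximate.

```python
import math
from scipy import stats


def t_interval_from_summary(mean, sd, n, level=0.95):
    """One-sample t interval: mean +/- t* * sd / sqrt(n), with n - 1 degrees of freedom."""
    t_star = stats.t.ppf((1 + level) / 2, df=n - 1)
    margin = t_star * sd / math.sqrt(n)
    return mean - margin, mean + margin


# All 67 brands (summary statistics from the problem statement)
n_all, mean_all, sd_all = 67, 4.61, 0.754
ci_all = t_interval_from_summary(mean_all, sd_all, n_all)

# Remove O'Doul's (ABV 0.4): subtract it from the total ABV, then divide by 66
mean_without = (n_all * mean_all - 0.4) / (n_all - 1)
ci_without = t_interval_from_summary(mean_without, 0.55, n_all - 1)

for label, (lo, hi) in [("all 67", ci_all), ("without O'Doul's", ci_without)]:
    print(f"{label:>18}: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```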

Problem 3

Do non-smokers have better lungs than smokers? One measure of lung function is forced expiratory volume (FEV), the amount of air (in liters) an individual can exhale in the first second of a forceful breath. Larger values of FEV are indicative of better functioning lungs. In a study, FEV was measured for a group of 654 subjects, along with whether they smoked and other variables such as age (in years). Note: I’m purposely not yet giving you much information about how the sample was collected.

Coding required. Use Python to summarize the FEV data. Summarize the distribution of FEV both overall and separately for smokers and non-smokers.
Suppose you want to fill in the blanks in the following sentence: 95% of FEV values in the population are between [blank] and [blank]. Assuming this sample is representative, provide reasonable values that fill in the blanks, and explain your reasoning. Hint: it is better to give a correct rough estimate than a precise but incorrect answer, so think before trying to compute anything. (Since you don’t have much information about the sample, you don’t have to clarify yet what “population” is appropriate.)
Using appropriate summary statistics, compute by hand a 95% confidence interval for the appropriate difference in population means.
Coding required. Use Python to compute the confidence interval and compare it to what you computed by hand.
Write a clearly worded sentence interpreting the confidence interval in context.
Assuming this is a representative sample, are we reasonably confident that one group (smokers or non-smokers) tends to have better FEV? Why? Which group?
Coding required. You should have observed something surprising in the previous parts. Investigate some of the other variables in the data set and produce a few appropriate plots or summaries that provide an explanation for the surprising result. Write a clearly worded sentence or two summarizing your explanation.
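The by-hand interval for a difference in means has the form \((\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2/n_1 + s_2^2/n_2}\). Below is a minimal sketch of the Python version, assuming SciPy is available and using the Welch (unequal-variance) approximation for the degrees of freedom. A by-hand convention such as \(\min(n_1, n_2) - 1\) degrees of freedom would give an interval at least as wide. The helper name `welch_t_interval` is hypothetical. Loading the FEV data is omitted because the file and column names are not given here.

```python
import numpy as np
from scipy import stats


def welch_t_interval(x, y, level=0.95):
    """Welch t confidence interval for mean(x) - mean(y).

    Returns (lower, upper, df), where df is the Welch-Satterthwaite
    approximation to the degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    v1 = x.var(ddof=1) / n1  # squared standard error contribution of group 1
    v2 = y.var(ddof=1) / n2  # squared standard error contribution of group 2
    se = np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t_star = stats.t.ppf((1 + level) / 2, df)
    diff = x.mean() - y.mean()
    return diff - t_star * se, diff + t_star * se, df
```

To apply it, pass the FEV values for one smoking group as `x` and the other as `y`. Swapping the order only flips the sign of the interval.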

Problem 4

(I had much longer instructions written, but I think they weren’t helpful. If you have questions about how the applet is working, don’t hesitate to ask.)

You are going to use a simulation applet that randomly generates confidence intervals for a population mean to help you understand how confidence intervals work. The applet calculates a confidence interval for each set of randomly generated data.

In the box for “Statistic”, select “Means”.
In the box next to “Method”, select “t”.
Start with “Number of intervals” equal to 1 to see what is happening, then increase it to a large number.

Experiment with different simulations; be sure to vary each of the following:

Distribution
Population Mean
Population SD
Sample size
Confidence level
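Below is a minimal offline sketch of what the applet appears to do, assuming NumPy and SciPy are available. It draws many samples from a chosen population, builds a \(t\) interval from each, and records the fraction of intervals that contain the true mean. The parameter names are hypothetical stand-ins for the applet’s inputs. Only two population shapes are included (Normal and a shifted, right-skewed exponential), and the applet’s own options and implementation may differ.

```python
import numpy as np
from scipy import stats


def simulate_t_intervals(dist="normal", pop_mean=50.0, pop_sd=10.0, n=25,
                         level=0.95, num_intervals=1000, seed=None):
    """Draw num_intervals samples of size n and build a t interval from each.

    Returns the lower bounds, the upper bounds, and the fraction of
    intervals that contain pop_mean. Parameter names are hypothetical.
    """
    rng = np.random.default_rng(seed)
    size = (num_intervals, n)
    if dist == "normal":
        samples = rng.normal(pop_mean, pop_sd, size=size)
    elif dist == "exponential":
        # Right-skewed: Exp(scale) has mean = SD = scale, so shift it to have mean pop_mean
        samples = pop_mean - pop_sd + rng.exponential(pop_sd, size=size)
    else:
        raise ValueError(f"unsupported distribution: {dist}")
    means = samples.mean(axis=1)
    sds = samples.std(axis=1, ddof=1)
    t_star = stats.t.ppf((1 + level) / 2, df=n - 1)
    margin = t_star * sds / np.sqrt(n)
    lower, upper = means - margin, means + margin
    coverage = np.mean((lower <= pop_mean) & (pop_mean <= upper))
    return lower, upper, coverage


for n in (5, 25, 100):
    for level in (0.90, 0.99):
        lo, hi, cov = simulate_t_intervals(n=n, level=level, num_intervals=5000, seed=0)
        print(f"n={n:3d} level={level:.2f}: coverage={cov:.3f}, "
              f"mean width={np.mean(hi - lo):.2f}")
```

With `dist="exponential"` and a small `n`, coverage typically falls somewhat below the nominal level, because the \(t\) interval relies on the sampling distribution of the mean being roughly Normal.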

Then write a paragraph or two, or a few bullet points, summarizing your main observations. Based on this applet, what can you say about how confidence intervals work? How does changing each of the inputs affect the confidence intervals and the Results? Include a few screenshots from the applet to support your conclusions.