24  Confidence Intervals

Example 24.1 Consider the general problem of estimating a population mean \(\mu\) based on a random sample of size \(n\). A natural point estimator of \(\mu\) is the sample mean \(\bar{X}\). The central limit theorem says that, if \(n\) is large enough, \(\bar{X}\) has an approximate Normal(\(\mu\), \(\sigma/\sqrt{n}\)) distribution, where \(\sigma\) is the population standard deviation. We’ll use this theory to derive a confidence interval for \(\mu\).

  1. Fill in the blank: for 95% of samples, the sample mean \(\bar{X}\) is within [blank] of the population mean \(\mu\).




  2. Fill in the blank: for 95% of samples, the interval with endpoints \(\bar{X}\pm\) [blank] will contain the population mean \(\mu\).




  3. The previous part suggests a formula for a CI for \(\mu\). However, this is not a practically useful formula; explain why. Then suggest how we can fix it.




  4. Everything else being equal, how does the sample size affect the confidence interval and its margin of error: the larger the sample size…? Does this make sense?




  5. Everything else being equal, how does the confidence level (e.g. 95%) affect the confidence interval and its margin of error: the larger the confidence level…? Does this make sense?




Confidence level                90%    95%    99%    99.7%    99.99%
Multiple (\(z^*\approx t^*\))   1.7    2.0    2.6    3.0      4.0
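
The multiples in the table are rounded. As a rough check, the sketch below computes standard Normal multiples with Python's standard library and shows how the margin of error \(z^*\sigma/\sqrt{n}\) responds to the sample size. The helper names `z_multiple` and `margin_of_error` and the standard deviation of 10 are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_multiple(confidence_level):
    """z* such that a standard Normal falls within +/- z* with the given probability."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


def margin_of_error(confidence_level, sd, n):
    """Margin of error z* * sd / sqrt(n) for a CI for a population mean."""
    return z_multiple(confidence_level) * sd / sqrt(n)


levels = [0.90, 0.95, 0.99, 0.997, 0.9999]
multiples = {level: z_multiple(level) for level in levels}
for level, z in multiples.items():
    print(f"{level:.2%} confidence: z* = {z:.3f}")

# Hypothetical standard deviation, used only to show the trend in n.
sd = 10.0
for n in (25, 100, 400):
    print(f"n = {n}: 95% margin of error = {margin_of_error(0.95, sd, n):.2f}")
```

Quadrupling the sample size halves the margin of error, while raising the confidence level increases the multiple and so widens the interval.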

Example 24.2 A frequently used example data set is the Ames housing data set, which consists of residential properties sold in Ames, Iowa, from 2006 to 2010. For more information about the variables in this data set, refer to the data documentation.

Suppose we wish to use this data set to estimate \(\mu\), the population mean sale price of single family homes.

In the sample, there are 2425 single family homes with a sample mean sale price ($K) of 185 and sample standard deviation 83.

  1. Compute a 95% confidence interval.




  2. Write a clearly worded sentence containing the conclusion of the confidence interval in context.




  3. To what extent can you generalize your results? That is, what is the relevant “population”? What are some issues to consider?




  4. Assuming that this is a random sample from the appropriate population, does the 95% confidence interval you computed in part 1 provide an accurate estimate of the corresponding parameter? That is, is \(\mu\) contained within the CI? Explain.




  5. Assuming that this is a random sample from the appropriate population, are we reasonably confident that \(\mu\) is greater than 180K? Explain.




  6. What if we wanted 95% confidence, but we wanted the margin of error to be no bigger than 1K? How could we achieve this?
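
One way to check the arithmetic in this example is sketched below. It uses the rounded summary values quoted above (n = 2425, mean 185, standard deviation 83) and the Normal multiple for 95% confidence, so the last digit may differ slightly from a calculation based on unrounded statistics or on a multiple of 2. The sample size calculation assumes the standard deviation would stay near 83 in a larger sample.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Summary statistics quoted in the example, in thousands of dollars.
n = 2425
sample_mean = 185.0
sample_sd = 83.0

z_star = NormalDist().inv_cdf(0.975)  # multiple for 95% confidence
standard_error = sample_sd / sqrt(n)
margin = z_star * standard_error
ci = (sample_mean - margin, sample_mean + margin)
print(f"95% CI for the mean sale price ($K): ({ci[0]:.1f}, {ci[1]:.1f})")

# Smallest n with margin of error at most 1 ($K), solving z* * sd / sqrt(n) <= 1.
# Assumes the standard deviation stays near 83.
target_margin = 1.0
required_n = ceil((z_star * sample_sd / target_margin) ** 2)
print(f"Sample size needed for a margin of error of {target_margin}: {required_n}")
```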




Example 24.3 For each of the following statements, identify if it is a valid interpretation of the confidence interval from Example 24.2. If not, explain why.

  1. We estimate that 95% of single family homes in the population have sale price ($K) between 181 and 188.




  2. We estimate with 95% confidence that the sample mean sale price ($K) of single family homes is between 181 and 188.




  3. If many samples of this size were selected, about 95% would produce a CI of (181, 188).




  4. In about 95% of samples of this size, the population mean sale price of single family homes will be between 181 and 188.




  5. If many hypothetical unbiased random samples of this size were selected, and a 95% confidence interval computed based on each sample, about 95% of these confidence intervals would contain the population mean sale price.
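
The repeated-sampling idea behind these statements can be explored with a small simulation. The sketch below draws many samples from a hypothetical Normal population (mean 185, standard deviation 83, sample size 100, all chosen only for illustration), computes a 95% interval from each sample, and records how often the interval contains the population mean. The helper `confidence_interval` is not part of the original example.

```python
import random
from math import fsum, sqrt
from statistics import NormalDist

random.seed(24)  # fixed seed so the run is reproducible

# Hypothetical population and design, chosen only for illustration.
POP_MEAN, POP_SD = 185.0, 83.0
N, NUM_SAMPLES = 100, 2000
Z_STAR = NormalDist().inv_cdf(0.975)


def confidence_interval(sample):
    """95% CI for the mean: sample mean +/- z* * s / sqrt(n)."""
    n = len(sample)
    center = fsum(sample) / n
    sd = sqrt(fsum((x - center) ** 2 for x in sample) / (n - 1))
    margin = Z_STAR * sd / sqrt(n)
    return center - margin, center + margin


intervals = [
    confidence_interval([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

# Fraction of intervals that contain the fixed population mean.
coverage = sum(low <= POP_MEAN <= high for low, high in intervals) / NUM_SAMPLES
# The intervals themselves change from sample to sample.
distinct_intervals = len(set(intervals))

print(f"Proportion of intervals containing the population mean: {coverage:.3f}")
print(f"Distinct intervals among {NUM_SAMPLES} samples: {distinct_intervals}")
```

The population mean stays fixed throughout; it is the interval endpoints that vary from sample to sample.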




Example 24.4 Continuing Example 24.2.

  1. Consider the distribution of sale price. What shape do you expect this distribution to have? Why?




  2. A student says, “The distribution of sale price is clearly skewed to the right. Since it’s not Normal the confidence interval we computed, and the resulting conclusion about the population mean, is not valid.” Do you agree with this critique? Explain.




  3. Considering that the distribution of sale price is skewed to the right, suggest a valid criticism of the confidence interval we computed. Hint: what parameter is it estimating?
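
A simulation can help in thinking about these questions. The sketch below assumes a hypothetical right-skewed (lognormal) population with median 165 and mean near 185, loosely resembling the sample; the lognormal shape, its parameters, and the sample size of 400 are assumptions for illustration, not properties of the Ames data. It records how often a 95% interval built around the sample mean contains the population mean, and how often it contains the population median.

```python
import random
from math import exp, fsum, log, sqrt
from statistics import NormalDist

random.seed(244)  # fixed seed so the run is reproducible

# Hypothetical lognormal population (an assumption, not the Ames data).
POP_MEDIAN = 165.0
SIGMA_LOG = 0.48
MU_LOG = log(POP_MEDIAN)
POP_MEAN = exp(MU_LOG + SIGMA_LOG**2 / 2)  # lognormal mean exceeds its median

N, NUM_SAMPLES = 400, 1000
Z_STAR = NormalDist().inv_cdf(0.975)


def confidence_interval(sample):
    """95% CI for the mean: sample mean +/- z* * s / sqrt(n)."""
    n = len(sample)
    center = fsum(sample) / n
    sd = sqrt(fsum((x - center) ** 2 for x in sample) / (n - 1))
    margin = Z_STAR * sd / sqrt(n)
    return center - margin, center + margin


covers_mean = 0
covers_median = 0
for _ in range(NUM_SAMPLES):
    sample = [random.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(N)]
    low, high = confidence_interval(sample)
    covers_mean += low <= POP_MEAN <= high
    covers_median += low <= POP_MEDIAN <= high

print(f"Population mean {POP_MEAN:.1f} vs. median {POP_MEDIAN:.1f}")
print(f"Intervals containing the mean:   {covers_mean / NUM_SAMPLES:.3f}")
print(f"Intervals containing the median: {covers_median / NUM_SAMPLES:.3f}")
```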




Example 24.5 Suppose we wish to estimate the population mean GPA of students at each of the 23 CSU campuses. For each campus, you select a random sample of students, collect the GPAs of the students, and then use the sample data to compute a 95% confidence interval. That is, you compute 23 separate 95% confidence intervals, based on 23 independent random samples, for each of 23 population means.

  1. First just consider Cal Poly. What is the probability that your confidence interval based on the Cal Poly sample contains the true population mean GPA at Cal Poly?




  2. What is the probability that all 23 of your 95% confidence intervals contain their respective population mean?




  3. How confident are you that all 23 of your 95% confidence intervals contain their respective population mean?




    • When computing multiple confidence intervals, the multiple (\(z^*\) or \(t^*\)) of each confidence interval should be increased to account for the issue of multiple comparisons.
      • For example, with a multiple of 2 in a single confidence interval, we are 95% confident that the interval contains the corresponding parameter.
      • However, if we use a multiple of 2 for each of 10 confidence intervals, we are only about 60% confident [1] that all ten 95% confidence intervals contain their respective parameters.
    • A Bonferroni adjustment splits the error rate evenly across all intervals. For example, for simultaneous 95% confidence in 10 intervals (i.e. a total error rate of 0.05), use the \(z^*/t^*\) multiple for 99.5% confidence when constructing each interval.
      • A Bonferroni adjustment guarantees simultaneous confidence.
      • But it is conservative, and tends to produce wider intervals than necessary.
      • There are other methods; for example, the Tukey-Kramer method is often used for all pairwise comparisons of means (e.g., in ANOVA)
  4. Suppose we want simultaneous 95% confidence that all 23 CIs contain their respective mean. Find the \(z^*\) multiple corresponding to a Bonferroni adjustment. How many times wider will each CI be than the unadjusted intervals?




  5. How confident are you that all 23 of your adjusted confidence intervals contain their respective population mean?
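
The quantities in this example can be checked numerically. The sketch below computes the chance that all intervals succeed when the intervals are independent, and the Normal multiple implied by a Bonferroni adjustment; the function name `bonferroni_multiple` is hypothetical, and a \(t^*\) multiple would be slightly larger for small samples.

```python
from statistics import NormalDist


def bonferroni_multiple(simultaneous_level, num_intervals):
    """z* for each interval when the total error rate is split evenly."""
    total_error = 1 - simultaneous_level
    per_interval_error = total_error / num_intervals
    # Each interval is two-sided, so half its error goes in each tail.
    return NormalDist().inv_cdf(1 - per_interval_error / 2)


k = 23
unadjusted = NormalDist().inv_cdf(0.975)   # multiple for a single 95% CI
adjusted = bonferroni_multiple(0.95, k)    # multiple for each adjusted CI
width_ratio = adjusted / unadjusted        # adjusted width / unadjusted width

# Probability that all k intervals contain their means,
# assuming the intervals are independent.
all_cover_unadjusted = 0.95 ** k
all_cover_adjusted = (1 - 0.05 / k) ** k

print(f"Unadjusted z* = {unadjusted:.3f}, Bonferroni z* = {adjusted:.3f}")
print(f"Each adjusted interval is {width_ratio:.2f} times as wide")
print(f"P(all {k} unadjusted intervals cover) = {all_cover_unadjusted:.3f}")
print(f"P(all {k} adjusted intervals cover)   = {all_cover_adjusted:.3f}")
```

Calling `bonferroni_multiple(0.95, 10)` gives the multiple for 99.5% confidence described in the bullets above.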




Example 24.6 Continuing with the Ames housing data set, now suppose we want to compare the sale price of single family homes with that of other homes (including condos, townhouses, etc.). In particular, suppose we wish to estimate \(\mu_S - \mu_N\), the difference in population mean sale price between single family homes and non-single family homes.

  1. Suggest a point estimator, compute its variance, and suggest a formula for a CI for \(\mu_S - \mu_N\).




  2. The table below summarizes the sample data. Compute a 95% confidence interval.

    SingleFamily   count        mean        std     min    25%    50%    75%    max
    False          505.0  161.511398  60.394023  55.000  120.0  147.4  190.0  392.5
    True          2425.0  184.812041  82.821802  12.789  130.0  165.0  220.0  755.0






  3. Write a clearly worded sentence interpreting the confidence interval from the previous part in context.




  4. Are we reasonably confident that single family homes tend to have higher sale prices than other homes? Explain.
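
The interval in this example can be checked with a short calculation. The sketch below takes the difference in sample means as the point estimate and, assuming the two samples are independent, uses the standard error \(\sqrt{s_S^2/n_S + s_N^2/n_N}\) with a Normal multiple. The variable names are illustrative; the numbers come from the summary table.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table, sale price in thousands of dollars.
n_s, mean_s, sd_s = 2425, 184.812041, 82.821802   # single family homes
n_n, mean_n, sd_n = 505, 161.511398, 60.394023    # other homes

# Point estimate of mu_S - mu_N.
diff = mean_s - mean_n

# Standard error of the difference, assuming independent samples:
# the variances of the two sample means add.
standard_error = sqrt(sd_s**2 / n_s + sd_n**2 / n_n)

z_star = NormalDist().inv_cdf(0.975)  # multiple for 95% confidence
margin = z_star * standard_error
ci = (diff - margin, diff + margin)

print(f"Estimated difference ($K): {diff:.2f}")
print(f"Standard error ($K): {standard_error:.2f}")
print(f"95% CI for mu_S - mu_N ($K): ({ci[0]:.1f}, {ci[1]:.1f})")
```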





  1. This number assumes independence of the confidence intervals, in which case the probability that all 10 succeed is \(0.95^{10}\approx 0.60\), but similar ideas apply more generally.