22  Mean Square Error of an Estimator

Example 22.1 We continue the situation of estimating \(\mu\) for a Poisson(\(\mu\)) distribution based on a random sample \(X_1, \ldots, X_n\) of size \(n\). Consider the estimators \[\begin{align*} S^2 & = \frac{1}{n-1}\sum_{i=1}^n \left(X_i - \bar{X}\right)^2\\ \hat{\sigma}^2 & = \frac{1}{n}\sum_{i=1}^n \left(X_i - \bar{X}\right)^2 = \left(\frac{n-1}{n}\right)S^2 \end{align*}\] We have seen that \(S^2\) is an unbiased estimator of \(\mu\) but \(\hat{\sigma}^2\) is not. Does that mean we should prefer \(S^2\) over \(\hat{\sigma}^2\)?

  1. Assume \(n=3\) and \(\mu=2.3\). Use simulation to approximate the distribution of \(\hat{\sigma}^2\) and its expected value and standard deviation.
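
A minimal simulation sketch in Python (assuming NumPy; the seed and the replication count `n_reps` are arbitrary choices of ours, not from the handout):

```python
import numpy as np

rng = np.random.default_rng(12345)
n, mu, n_reps = 3, 2.3, 100_000

# Each row is one simulated sample of size n from Poisson(mu)
samples = rng.poisson(mu, size=(n_reps, n))

# sigma-hat^2 divides by n (np.var's default ddof=0)
sigma2_hat = samples.var(axis=1)

print("E(sigma2_hat) approx:", sigma2_hat.mean())
print("SD(sigma2_hat) approx:", sigma2_hat.std())
```

A histogram of the `sigma2_hat` values approximates the distribution itself.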




  2. Compare the simulation results to those for \(S^2\) (from a previous handout). For which estimator is the bias smaller? For which estimator is the variance smaller? Is there a clear preference between the two estimators when \(\mu = 2.3\)?
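
The same simulation, extended to both estimators on common samples (again a sketch, not the handout's official solution):

```python
import numpy as np

rng = np.random.default_rng(12345)
n, mu, n_reps = 3, 2.3, 100_000
samples = rng.poisson(mu, size=(n_reps, n))

s2 = samples.var(axis=1, ddof=1)   # S^2 divides by n - 1
sigma2_hat = samples.var(axis=1)   # sigma-hat^2 divides by n

for name, est in [("S^2", s2), ("sigma2_hat", sigma2_hat)]:
    print(f"{name}: bias approx {est.mean() - mu:.3f}, SD approx {est.std():.3f}")
```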




    • Estimators can be evaluated based on both bias and variability.
    • Mean square error is a combined measure of both an estimator’s bias and its variability.
    • The mean square error (MSE) of an estimator \(\hat{\theta}\) of a parameter \(\theta\) is \[ \text{MSE}_\theta(\hat{\theta}) = \textrm{E}\left(\left(\hat{\theta}-\theta\right)^2\right), \qquad \text{a function of $\theta$} \]
    • Mean square error measures, on average, how far the estimator deviates from the parameter it is estimating.
    • The MSE of an estimator is a function of the parameter \(\theta\). It can be shown that¹ (a derivation is sketched in the final bullet below) \[\begin{align*} \text{MSE}_\theta(\hat{\theta}) & = \textrm{Var}(\hat{\theta}) + \left(\textrm{E}(\hat{\theta})-\theta\right)^2\\ & = \textrm{Var}(\hat{\theta}) +\left(\text{bias}_\theta(\hat{\theta})\right)^2 \end{align*}\]
    • There can be many “reasonable” estimators of a parameter. Given two estimators, it is often the case that neither one has smaller MSE for all potential values of \(\theta\). Choosing an estimator often involves a tradeoff between bias and variability.
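    • Following the footnote’s hint, the derivation of the decomposition above is short. The computational formula gives \(\textrm{E}(Y^2) = \textrm{Var}(Y) + (\textrm{E}(Y))^2\); applying it to \(Y = \hat{\theta}-\theta\), \[\begin{align*} \text{MSE}_\theta(\hat{\theta}) = \textrm{E}\left(\left(\hat{\theta}-\theta\right)^2\right) & = \textrm{Var}\left(\hat{\theta}-\theta\right) + \left(\textrm{E}\left(\hat{\theta}-\theta\right)\right)^2\\ & = \textrm{Var}(\hat{\theta}) + \left(\textrm{E}(\hat{\theta})-\theta\right)^2, \end{align*}\] since \(\theta\) is a constant: subtracting it does not change the variance, and it passes through the expected value.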
  3. Use the simulation results to compute the MSE of \(S^2\) and \(\hat{\sigma}^2\) when \(\mu = 2.3\). Determine which estimator has smaller MSE when \(\mu=2.3\), but then explain why this information is not very useful.
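
The MSE at \(\mu = 2.3\) can be approximated directly from the definition \(\textrm{E}\left[(\hat{\theta}-\theta)^2\right]\); a sketch under the same simulation setup as before:

```python
import numpy as np

rng = np.random.default_rng(12345)
n, mu, n_reps = 3, 2.3, 100_000
samples = rng.poisson(mu, size=(n_reps, n))

for name, est in [("S^2", samples.var(axis=1, ddof=1)),
                  ("sigma2_hat", samples.var(axis=1))]:
    # Average squared distance from the true parameter value
    print(f"MSE of {name} at mu = {mu}: {np.mean((est - mu) ** 2):.3f}")
```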




  4. Recall \[ \textrm{Var}(S^2) = \frac{2\mu^2}{n-1} + \frac{\mu}{n} \] Find and plot the MSE functions of \(S^2\) and \(\hat{\sigma}^2\). Which estimator is preferred? Discuss.
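
One way to compute and plot the two MSE functions (a sketch assuming Matplotlib). The \(\hat{\sigma}^2\) formula follows from \(\hat{\sigma}^2 = \left(\frac{n-1}{n}\right)S^2\): the variance scales by \(\left(\frac{n-1}{n}\right)^2\) and the bias is \(-\mu/n\).

```python
import numpy as np
import matplotlib.pyplot as plt

n = 3
mu = np.linspace(0.01, 5, 200)

mse_s2 = 2 * mu**2 / (n - 1) + mu / n                    # unbiased: MSE = Var(S^2)
mse_sig2 = ((n - 1) / n) ** 2 * mse_s2 + (mu / n) ** 2   # Var + bias^2

plt.plot(mu, mse_s2, label=r"$S^2$")
plt.plot(mu, mse_sig2, label=r"$\hat{\sigma}^2$")
plt.xlabel(r"$\mu$")
plt.ylabel("MSE")
plt.legend()
plt.show()
```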




Example 22.2 We continue estimating \(\mu\) for a Poisson(\(\mu\)) distribution. Now consider the sample mean \(\bar{X}\) and the constant estimator 2.3 (which ignores the data entirely).

  1. Find the MSE of the constant estimator 2.3.
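
One route: a constant has zero variance, so only the squared bias survives in the decomposition, \[ \text{MSE}_\mu(2.3) = \textrm{Var}(2.3) + \left(\textrm{E}(2.3)-\mu\right)^2 = 0 + (2.3-\mu)^2. \]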




  2. Find the MSE of \(\bar{X}\).
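
Similarly, \(\bar{X}\) is unbiased for \(\mu\), and a Poisson(\(\mu\)) distribution has variance \(\mu\), so \[ \text{MSE}_\mu(\bar{X}) = \textrm{Var}(\bar{X}) + 0^2 = \frac{\textrm{Var}(X_1)}{n} = \frac{\mu}{n}. \]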




  3. Suppose \(n=3\). Plot the MSE functions of the two estimators. Does either estimator have a better MSE? Explain.
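
A plotting sketch for \(n = 3\), using the MSE functions from the two previous parts (Matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

n = 3
mu = np.linspace(0, 5, 200)

plt.plot(mu, mu / n, label=r"$\bar{X}$")             # MSE = mu / n
plt.plot(mu, (2.3 - mu) ** 2, label="constant 2.3")  # MSE = (2.3 - mu)^2
plt.xlabel(r"$\mu$")
plt.ylabel("MSE")
plt.legend()
plt.show()
```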




  4. Donny Don’t says: “neither estimator has smaller MSE for all values of \(\mu\). So we’ll use the constant estimator 2.3 when \(\mu\) is near 2.3 (between 1.57 and 3.36 if \(n=3\)) and we’ll use \(\bar{X}\) for other values of \(\mu\)”. Do you agree with Donny’s strategy? Do you see any problems with it?
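
Donny's endpoints come from setting the two MSE functions equal, \((2.3-\mu)^2 = \mu/n\); a quick numeric check for \(n = 3\):

```python
import numpy as np

n = 3
# (2.3 - mu)^2 = mu/n  <=>  mu^2 - (4.6 + 1/n) mu + 5.29 = 0
print(np.sort(np.roots([1, -(4.6 + 1 / n), 5.29])))  # approx [1.57 3.36]
```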




  5. What happens to the MSEs as \(n\) increases? In particular, what happens to the range of values of \(\mu\) for which the constant estimator 2.3 has smaller MSE than \(\bar{X}\)?
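
The same quadratic shows how the crossover interval behaves as \(n\) grows; a quick numeric sketch:

```python
import numpy as np

# Endpoints where (2.3 - mu)^2 = mu/n, for increasing n
for n in [3, 10, 30, 100, 1000]:
    lo, hi = np.sort(np.roots([1, -(4.6 + 1 / n), 5.29]))
    print(f"n = {n:4d}: constant 2.3 has smaller MSE for mu in ({lo:.3f}, {hi:.3f})")
```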





  1. Recall that, by definition, \(\textrm{Var}(Y)=\textrm{E}[(Y-\textrm{E}(Y))^2]\), and recall the useful computational formula
    \(\textrm{Var}(Y) = \textrm{E}(Y^2) - (\textrm{E}(Y))^2\). Apply the latter to \(Y=\hat{\theta}-\theta\), remembering that \(\hat{\theta}\) is random but \(\theta\) is not.↩︎