Chi-Square Goodness of Fit

Suppose you go to the local casino to play a game based on a single roll of a six-sided die. After playing for a while you begin to suspect that the casino’s dice might not be fair, but how can you be sure? The casino is unlikely to let you take the dice away for physical testing, so you will have to rely on what you can observe in the casino. If the die is fair, you would expect that each outcome has a one in six chance of occurring, so if the die were rolled 60 times, you would expect to see each side about ten times.

What we expect to see (also called our model) gives us the expected values that we want to test our observations (our sample) against. The hypothesis test we perform to check how unusual our observations are is called a Chi-Square Goodness-of-Fit test.

Chi-Square (\chi^{2}) Goodness-of-Fit test

A Chi-Square (\chi^{2}) Goodness-of-Fit test is a hypothesis test in which we test if our observations "fit" the model of expected outcomes. To achieve this we compute a \chi^{2} test statistic and then determine if this test statistic is extreme enough for us to Reject H0. Both the Critical Value method and the p-value method are appropriate ways to reach a decision, whereas the Confidence Interval method is harder to calculate and interpret because the Chi-Square distribution is asymmetric. The \chi^{2} test statistic is given by:

\chi^{2} = \sum \dfrac{\left(Obs - Exp\right)^{2}}{Exp}

In words this formula says:

1. Take each observation and subtract the expected value: Obs - Exp
2. Square that result: \left(Obs - Exp\right)^{2}
3. Divide the squared difference by the expected value: \frac{\left(Obs - Exp\right)^{2}}{Exp}
4. Add up all of the values from Step 3: \sum \frac{\left(Obs - Exp\right)^{2}}{Exp}
5. The sum obtained in Step 4 is our \chi^{2} statistic: \chi^{2} = \sum \frac{\left(Obs - Exp\right)^{2}}{Exp}
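
The five steps above can be sketched in a few lines of Python. This is only an illustration: the function name chi_square_statistic and the counts passed to it are made up for this sketch (they are not data from this page), with 60 rolls and 10 expected per face as in the introduction.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for obs, exp in zip(observed, expected):
        difference = obs - exp      # Step 1: Obs - Exp
        squared = difference ** 2   # Step 2: square the difference
        total += squared / exp      # Steps 3 and 4: divide by Exp, accumulate
    return total                    # Step 5: the chi-square statistic


# Hypothetical counts for 60 rolls of a die (illustration only).
illustrative_observed = [8, 12, 9, 11, 10, 10]
illustrative_expected = [10] * 6
statistic = chi_square_statistic(illustrative_observed, illustrative_expected)
print(statistic)
```

Counts that sit close to their expected values give a small statistic; the further the counts drift from the model, the larger the sum becomes.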

The expected value (Exp) is determined by the model we are testing, as specified by the Null Hypothesis. The model also specifies the categories that we use when collecting our sample data (Obs). A typical Null Hypothesis for a Chi-Square (\chi^{2}) Goodness-of-Fit test states something like "The population distribution of \ldots follows the model \ldots", where the model is typically spelled out in terms of the proportion of observations expected in each category.

A Chi-Square (\chi^{2}) Goodness-of-Fit test has the following assumptions that need to be checked (included is how to check them):

  1. Representative: The samples collected should be representative of the population of interest
    Check the Research Design to see if the sampling method suggests that the samples should be representative of the population of interest
  2. Independence: The samples should be independent of one another
    Check the Research Design to see if the sampling method suggests that the samples should be independent of one another
  3. Counted Data: The data must be counts of observations of each category
    Check that the data collected is a frequency of observations in each category
  4. Expected Frequency: The expected value for each cell must be at least 5
    Compute the expected values for each cell and confirm that all are at least 5
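
The Expected Frequency check is the only one of the four that is pure arithmetic, so it can be sketched in code. The helper names expected_counts and meets_expected_frequency are hypothetical, and the 24-roll case is an invented contrast to the 60-roll example from the introduction.

```python
def expected_counts(sample_size, proportions):
    """Expected count per category: sample size times the model proportion."""
    return [sample_size * p for p in proportions]


def meets_expected_frequency(counts, minimum=5):
    """True when every expected count is at least the minimum."""
    return all(count >= minimum for count in counts)


fair_die = [1 / 6] * 6  # model under H0: each face has probability 1/6

# 60 rolls of a fair die: about 10 expected per face, so the check passes.
counts_60 = expected_counts(60, fair_die)
print(meets_expected_frequency(counts_60))

# Only 24 rolls: about 4 expected per face, so the check fails.
counts_24 = expected_counts(24, fair_die)
print(meets_expected_frequency(counts_24))
```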

The number of degrees of freedom for the test is simply the number of categories minus one, so in our dice-rolling example we would have df = 6 - 1 = 5. Now that we have all the components, let’s work through an example.

Example: Is the casino die fair?

While attending an academic conference at the local casino, you observe (and record) the outcomes of 1200 rolls of a die for a popular gambling game. You notice that the casino seems to be making a lot of money and wonder if the die they are using is perhaps unfairly advantaging the casino. You decide to conduct a hypothesis test using the data you have recorded (shown below).

Outcome     1     2     3     4     5     6
Count     207   189   224   193   215   172

Step 1: State the hypotheses, significance level (\alpha), and any assumptions/definitions

H0: The population distribution of outcomes for the casino die is uniform
(i.e. Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = Pr(6) = \frac{1}{6} )

HA: The population distribution of outcomes for the casino die is not uniform
(i.e. at least one of \left\{Pr(1), Pr(2), Pr(3), Pr(4), Pr(5), Pr(6)\right\} \neq \frac{1}{6} )

To test, using a significance level (\alpha) of 0.05, we will perform a Chi-Square (\chi^{2}) Goodness-of-Fit test.

Assumptions and Definitions:

  1. Representative: We assume that the samples collected are representative of the population of interest
  2. Independence: We assume that the samples collected are independent of one another
  3. Counted Data: The data collected are counts of observations of each outcome
  4. Expected Frequency: The expected value for each outcome is 1200 \times \frac{1}{6} = 200, which is at least 5

Since the necessary conditions for inference are met, we can proceed with our test.

Step 2: Based upon the sample data, compute an appropriate decision statistic

For a Chi-Square (\chi^{2}) Goodness-of-Fit test our test statistic is:

\chi^{2} = \sum \frac{\left(Obs - Exp\right)^{2}}{Exp}

on 6-1=5 degrees of freedom. Using the Chi-Square Critical Value tables we can see that the Critical Value for this test at the 5% level of significance is 11.070. The figure below shows the rejection region for this test.

Inserting the observed and expected values we find:

\chi^{2} = \frac{\left(207 - 200\right)^{2}}{200} + \frac{\left(189 - 200\right)^{2}}{200} + \frac{\left(224 - 200\right)^{2}}{200} + \frac{\left(193 - 200\right)^{2}}{200} + \frac{\left(215 - 200\right)^{2}}{200} + \frac{\left(172 - 200\right)^{2}}{200}
= \frac{49 + 121 + 576 + 49 + 225 + 784}{200} = \frac{1804}{200} = 9.02
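
As a check on the arithmetic, the same calculation can be sketched in Python using the counts from the table in this example. The variable names are our own choices for illustration.

```python
observed = [207, 189, 224, 193, 215, 172]  # counts for faces 1 to 6
expected = sum(observed) / len(observed)    # 1200 / 6 = 200 under H0

# One term of the sum per face: (Obs - Exp)^2 / Exp
contributions = [(obs - expected) ** 2 / expected for obs in observed]
chi_square = sum(contributions)

for face, term in enumerate(contributions, start=1):
    print(f"face {face}: {term:.3f}")
print(f"chi-square = {chi_square:.2f}")
```

Printing the individual terms shows which faces drive the statistic: faces 3 and 6 contribute the most, since their counts are furthest from 200.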

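Indicating this value on our picture, we see that \chi^{2} = 9.02 falls below the Critical Value of 11.070, so it lies outside the rejection region.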

Using the Chi-Square Critical Value tables (where 9.02 falls below the entry of 9.236 for an upper-tail area of 0.100 on 5 degrees of freedom) we find that:

Pr\left(\chi^{2}_{5} \geq 9.02\right) > 0.100

Using technology (e.g. Excel’s \texttt{CHIDIST} function or R’s \texttt{pchisq()} function with \texttt{lower.tail = FALSE}) we find:

Pr\left(\chi^{2}_{5} \geq 9.02\right) \approx 0.108
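
For readers without Excel or R, the upper-tail probability can also be sketched with Python's standard library alone. The function chi_square_upper_tail below is our own hypothetical helper, not part of any library. It assumes integer degrees of freedom and builds the tail area from the recurrence Q(a + 1, z) = Q(a, z) + z^a e^{-z} / \Gamma(a + 1), starting from Q(1/2, z) = erfc(\sqrt{z}) for odd df or Q(1, z) = e^{-z} for even df, where z = \chi^{2} / 2.

```python
import math


def chi_square_upper_tail(x, df):
    """Pr(chi-square on df degrees of freedom >= x), for x >= 0, integer df >= 1."""
    z = x / 2.0
    if df % 2 == 1:
        a = 0.5
        tail = math.erfc(math.sqrt(z))  # Q(1/2, z)
    else:
        a = 1.0
        tail = math.exp(-z)             # Q(1, z)
    # Step a up by 1 until it reaches df / 2, adding one term each time.
    while a < df / 2.0:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return tail


p_value = chi_square_upper_tail(9.02, 5)
print(round(p_value, 3))  # approximately 0.108, in line with the value above
```

Evaluating the same function at the Critical Value 11.070 gives a tail area of about 0.05, which is a useful sanity check on the sketch.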

Step 3: Reject / Fail to Reject the Null Hypothesis (H0). State your conclusion.

Using the Critical Value method:

Since \chi^{2}_{5} = 9.02 is less than {\chi^{2}_{5}}_{crit} = 11.070, we Fail to Reject H0 at \alpha = 0.05.

The sample data do not provide statistically significant evidence that the population distribution of outcomes for the casino die is not uniform (i.e. at least one of \left\{Pr(1), Pr(2), Pr(3), Pr(4), Pr(5), Pr(6)\right\} \neq \frac{1}{6} ).

Using the p-value method:

Since p = 0.108 is greater than \alpha = 0.05, we Fail to Reject H0 with \chi^{2}_{5} = 9.02.

The sample data do not provide statistically significant evidence that the population distribution of outcomes for the casino die is not uniform (i.e. at least one of \left\{Pr(1), Pr(2), Pr(3), Pr(4), Pr(5), Pr(6)\right\} \neq \frac{1}{6} ).
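
The two decision rules can be written side by side as a short sketch. The function names are hypothetical, and treating a statistic exactly equal to the Critical Value (or a p-value exactly equal to \alpha) as a rejection is an assumption of this sketch; conventions differ on that boundary case.

```python
def decide_by_critical_value(statistic, critical_value):
    """Reject H0 when the statistic falls in the upper-tail rejection region."""
    return "Reject H0" if statistic >= critical_value else "Fail to Reject H0"


def decide_by_p_value(p_value, alpha):
    """Reject H0 when the p-value is no larger than the significance level."""
    return "Reject H0" if p_value <= alpha else "Fail to Reject H0"


# Values taken from the casino die example.
print(decide_by_critical_value(9.02, 11.070))
print(decide_by_p_value(0.108, 0.05))
```

Both methods compare the same statistic against the same distribution, so they should always lead to the same decision.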

Note that we did not conduct any post-hoc test in this case. Even when we do decide to Reject H0 in a Chi-Square (\chi^{2}) Goodness-of-Fit test, we are saying only that the sample data suggest that the model is not an appropriate fit. There is nothing else to add.

Last updated: 10 September 2020