This post is the fourth and last in a statistics series that has appeared each Monday in May. The series walks through the main steps in hypothesis testing, which are:

1. Statement of Claim 1, called the null hypothesis ($H_0$), and Claim 2, called the alternative hypothesis ($H_a$).
2. Collection and summary of relevant data from a random sample, using a test statistic.
3. Assessment of how likely the observed data would be if $H_0$ were true, using the p-value.
4. Determination of whether the evidence is strong enough to reject $H_0$.
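The four steps above can be sketched as a small Python function for a lower-tailed, one-sample z-test. This is a minimal illustration and not the study's method. It assumes a known population standard deviation, and both the function name `one_sample_z_test` and the input numbers (a mean of 95 from 36 scores, tested against $\mu_0 = 100$ with $\sigma = 15$) are hypothetical.

```python
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Lower-tailed one-sample z-test of H0: mu = mu0 vs Ha: mu < mu0.

    Hypothetical helper; assumes the population standard deviation
    `sigma` is known.
    """
    # Step 2: summarize the sample with a test statistic
    standard_error = sigma / n ** 0.5
    z = (sample_mean - mu0) / standard_error
    # Step 3: p-value = P(Z <= z) under H0 (lower tail, matching Ha: mu < mu0)
    p_value = NormalDist().cdf(z)
    # Step 4: reject H0 when the p-value falls below the significance level
    reject_h0 = p_value < alpha
    return z, p_value, reject_h0


# Step 1 is stating the hypotheses; here H0: mu = 100 and Ha: mu < 100.
# Hypothetical numbers, not from the study: mean 95 from n = 36, sigma = 15.
z, p, reject = one_sample_z_test(sample_mean=95, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
# prints: z = -2.00, p = 0.0228, reject H0: True
```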

In parts 1, 2, and 3, we established our hypotheses of interest, computed the z-test statistics, and calculated the corresponding p-values for our APA case study, “Meta-Analysis of Intellectual and Neuropsychological Test Performance in Attention-Deficit/Hyperactivity Disorder.”

This study compared the cognitive and neuropsychological performance of subjects with ADHD to that of healthy subjects across a range of assessments. The null hypothesis, $H_0$, stated that there is no difference in cognitive or neuropsychological performance between the two groups, while the alternative hypotheses, $H_a$, that we focused on stated:

1. The overall cognitive ability, as measured by Full Scale IQ (FSIQ), of ADHD subjects is lower than that of healthy subjects, and
2. The total neuropsychological performance, as measured by a variety of neuropsychological assessments, of ADHD subjects is lower than that of healthy subjects.

For the first alternative hypothesis, the z-test gave z = 27.72: the difference between the average FSIQ score of ADHD subjects and that of healthy subjects is 27.72 standard errors. The corresponding p-value is p < 0.00001.

For the second alternative hypothesis, the z-test gave z = 2.5: the difference between the two groups on the neuropsychological assessments is 2.5 standard errors. The corresponding p-value is p = 0.00621.
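To see where these p-values come from, the sketch below converts each reported z statistic into a one-sided p-value using the standard normal tail. It assumes, as the reported p = 0.00621 for z = 2.5 suggests, a one-tailed test with z measured in the direction of the alternative; the helper name `upper_tail_p` is hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z (hypothetical helper)."""
    # erfc stays accurate far into the tail, where 1 - cdf(z) would round to 0
    return 0.5 * math.erfc(z / math.sqrt(2))


p_fsiq = upper_tail_p(27.72)  # reported z for the FSIQ comparison
p_total = upper_tail_p(2.5)   # reported z for the neuropsychological comparison

print(f"FSIQ:  p = {p_fsiq:.3g}")   # astronomically small, far below 0.00001
print(f"Total: p = {p_total:.5f}")  # 0.00621
```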

Now we progress to the last step: interpreting the p-value and drawing conclusions in context. This study uses a 95% confidence level, which corresponds to a significance level of $\alpha = 0.05$. Setting the significance level in advance fixes how unlikely the observed data must be under $H_0$ before we call the result significant; at $\alpha = 0.05$, we accept a 5% risk of rejecting $H_0$ when it is actually true.

Results are significant if they would be very unlikely to be observed under the null hypothesis, which here assumes that the cognitive and neuropsychological abilities of ADHD subjects are the same as those of the general non-ADHD population.

The p-values above tell us that, if the cognitive ability of the two groups were truly equal, the probability of observing a difference in mean FSIQ scores at least as large as the one observed would be less than 0.001%. Likewise, if their neuropsychological ability were truly equal, the probability of observing a difference in mean neuropsychological scores at least as large as the one observed would be 0.621%.

Using a 95% confidence level ($\alpha = 0.05$), we call the results significant if the p-value is less than 0.05; that is, if the probability of observing data at least as extreme as ours under the null hypothesis is less than 5%.
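The decision rule can be written out directly. The sketch below compares each reported p-value with $\alpha = 0.05$ and, equivalently for a one-sided test, each z statistic with the critical value $z^* \approx 1.645$. The `results` dictionary only restates the values reported in this post, using 0.00001 as an upper bound for the FSIQ p-value.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level (1 - 0.95 confidence level)

# One-sided critical value: reject H0 when z exceeds this (about 1.645)
z_critical = NormalDist().inv_cdf(1 - ALPHA)

# Values reported in this post; 0.00001 is an upper bound for the FSIQ p-value
results = {
    "FSIQ": {"z": 27.72, "p": 0.00001},
    "total": {"z": 2.5, "p": 0.00621},
}

for name, r in results.items():
    by_p = r["p"] < ALPHA          # p-value rule
    by_z = r["z"] > z_critical     # equivalent critical-value rule
    print(f"{name}: reject H0 by p-value? {by_p}; by critical value? {by_z}")
```

Both rules always agree for a one-sided z-test, because $p < \alpha$ exactly when $z > z^*$.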

Clearly, p < 0.00001 and p = 0.00621 fall far below the 0.05 threshold. What, then, does this mean?

The p-values for both cases are small enough to reject the null hypothesis that the cognitive and neuropsychological abilities of ADHD subjects are the same as those of healthy subjects. Thus, the data support both alternative hypotheses, which predicted poorer performance by ADHD subjects than by healthy subjects.

$$
\begin{aligned}
H_0&: \mu = \mu_0 \\
H_a&: \mu_{\text{FSIQ}} < \mu_{0(\text{FSIQ})} \\
H_a&: \mu_{\text{total}} < \mu_{0(\text{total})}
\end{aligned}
$$

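That’s the ultimate goal of hypothesis testing: determining statistical significance. Significance is important because it puts the results in context, telling us whether an observed difference is larger than chance variation alone would plausibly produce.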

Say, for example, that a survey found that 90% of Americans opposed stricter gun legislation. Big, important number, right? That depends on how many people were polled. What if the survey involved only 10 participants? The z statistic and, by extension, the p-value would reflect this. A small sample has a large standard error, so the same 90% yields a smaller z statistic (less evidence of deviation from the null value) and a larger p-value (the observed data are more plausible under the null hypothesis) than it would in a large sample.
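To make the sample-size point concrete, the sketch below computes a one-proportion z statistic for the same observed 90% at two sample sizes. The null value of 80% is a hypothetical assumption (the example does not specify one), the helper `one_proportion_z` is hypothetical, and the normal approximation is rough at n = 10, where an exact binomial test would be more appropriate.

```python
import math


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0 vs Ha: p > p0 (normal approximation)."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # shrinks as n grows
    return (p_hat - p0) / standard_error


P_HAT = 0.90  # observed share opposed, from the example
P0 = 0.80     # HYPOTHETICAL null value, chosen only for illustration

results = {}
for n in (10, 1000):
    z = one_proportion_z(P_HAT, P0, n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail P(Z >= z)
    results[n] = (z, p_value)
    print(f"n = {n:4d}: z = {z:.2f}, p = {p_value:.3g}")
```

Because the standard error is proportional to $1/\sqrt{n}$, multiplying the sample size by 100 multiplies z by 10 for the same observed proportion.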

I hope you’ve enjoyed these hypothesis testing posts. By far, this is my favorite topic in statistics.
