This post is the third of four that will appear each Monday in May. The series walks through the main steps in hypothesis testing, which are:

1. Statement of Claim 1, called the null hypothesis ($H_0$), and Claim 2, called the alternative hypothesis ($H_a$).
2. Collection and summary of relevant data from a random sample, using a test statistic.
3. Assessment of the likelihood of observing the data observed if $H_0$ were true, using the p-value.
4. Determination of whether or not the evidence is strong enough to reject $H_0$.

In the first part, we identified the null and alternative hypotheses of interest in our case study, “Meta-Analysis of Intellectual and Neuropsychological Test Performance in Attention-Deficit/Hyperactivity Disorder”. In the second part, we identified the z-test statistics for the variables of interest, namely the mean scores of ADHD groups versus healthy groups on various cognitive and neuropsychological assessments. These test statistics told us how much the observed results deviated from the results expected under the null hypothesis.

Recall that the null hypothesis and alternative hypotheses in our review are as follows:

$H_0: \mu = \mu_0$
$H_a: \mu_{FSIQ} < \mu_{0(FSIQ)}$
$H_a: \mu_{total} < \mu_{0(total)}$

The first alternative hypothesis focuses on overall cognitive ability, while the second focuses on total neuropsychological performance. For the FSIQ [Full Scale IQ] scores of the ADHD groups relative to the control groups, z = 27.72, meaning that the observed difference in mean scores between the ADHD and control groups lay 27.72 standard errors away from what the null hypothesis predicts. For the comparison of neuropsychological performance, $z \geq 2.5$.

Now in step 3 of hypothesis testing, we use a p-value to evaluate the significance of the results.

A p-value basically tells us that we have an x percent chance of observing results at least as extreme as those we did, given that the null hypothesis is true. In our case study, the p-value gives the probability of observing a difference of 27.72 standard errors in FSIQ, and of 2.5 or more standard errors across the range of neuropsychological assessments, between the ADHD group and the non-ADHD group if there really is no difference in the mean scores on a given cognitive or neuropsychological assessment between the two (the $H_0$).

To use a p-value, you must first set a confidence level, which in turn fixes the significance level — the threshold the p-value is compared against. When a study uses, say, a 90% confidence level, its significance level is 1.0 – 0.9, or 0.1. If the researchers successfully rejected the null hypothesis of their study, then the p-value must have been less than 0.1.

Most statistical studies use a confidence level of 95%, and our ADHD case study is no different. With a confidence level of 95%, the p-value must be less than 1.0 – 0.95, or 0.05, for the researchers to reject the null hypothesis. A p-value less than 0.05 says that it is very unlikely we would observe data at least as extreme as those that we did if the null hypothesis were true. Thus, the difference is unlikely to be the product of chance alone and more plausibly reflects the circumstances that the alternative hypotheses posit. Ergo, the null hypothesis is very likely wrong.
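As a minimal sketch of this decision rule (the function name `reject_null` is my own; the threshold is simply 1 minus the chosen confidence level):

```python
# Decision rule: reject H0 when the p-value falls below the
# significance level alpha = 1 - confidence level.
def reject_null(p_value, confidence_level=0.95):
    alpha = 1.0 - confidence_level  # e.g. 0.05 for 95% confidence
    return p_value < alpha

print(reject_null(0.00621))  # True: small p-value, reject H0
print(reject_null(0.2))      # False: data too likely under H0, fail to reject
```

Note that with a 90% confidence level instead, `reject_null(0.08, confidence_level=0.90)` would return `True`, since the threshold loosens to 0.1.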

As the above graph shows, a small enough p-value falls outside the range of “Fail to reject $H_0$”. (The -2 and +2 represent z-values of -2 and +2 standard errors. Any point beyond this degree of deviation falls outside of a 95% confidence interval.)

Unlike test statistics, p-values are not calculated by hand. To employ them in our most scientific and official ADHD research, we turn to statistical packages (Excel, R, Minitab, etc.), to z-tables (obviously only applicable to z-statistics), or to online calculators. We’ll use an online calculator, as the z-tables commonly found online go up only to z = 3.49.

For z = 27.72, which concerns the first $H_a$, p < 0.00001.
For z = 2.5, which concerns the second $H_a$, p = 0.00621.

Both of these p-values fall below the significance level of 0.05. So, what now? In the next and last “Hypothesis Testing” post, we will wrap up this case study by interpreting the p-values in context.

Other posts in “Hypothesis Testing”