# Comparing means of more than two groups

**Analysis of variance (ANOVA)** is a parametric test used to compare the means of more than two groups sampled from normally distributed populations.

## ANOVA basics

**Assumptions**:

- Samples are taken randomly
- Measurements from each population are normally distributed
- The variances are equal between all populations

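The test statistic is the ratio of two mean squares, $F = MS_{groups} / MS_{error}$, where:

$MS_{groups}$: mean square of groups (variation among the group means)

$MS_{error}$: mean square of error (variation within groups)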

## Calculating the ANOVA test statistic

Step 1: Partition the sum of squares

Calculate the **grand mean** $\bar{Y}$: multiply each group mean $\bar{Y}_i$ by its sample size $n_i$, sum the products over all groups, and divide by **N**, the total number of observations.

$$\bar{Y} = \frac{\sum_i n_i \bar{Y}_i}{N}$$
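As a minimal sketch of this step, the grand mean can be computed in plain Python. The `groups` data is hypothetical, invented purely for illustration, and `grand_mean` is an illustrative helper name rather than a library function.

```python
# Hypothetical data: three groups of unequal size.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}


def grand_mean(groups):
    """Mean of the group means, weighted by each group's sample size."""
    n_total = sum(len(values) for values in groups.values())
    weighted_sum = sum(
        len(values) * (sum(values) / len(values)) for values in groups.values()
    )
    return weighted_sum / n_total


print(grand_mean(groups))  # 7.6
```

Because each group mean is weighted by its sample size, the result equals the plain mean of all N observations pooled together.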

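Sum of squares of the groups:

$$SS_{groups} = \sum_i n_i (\bar{Y}_i - \bar{Y})^2$$

Sum of squares of the error:

$$SS_{error} = \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2$$

or

$$SS_{error} = \sum_i s_i^2 (n_i - 1)$$

Here $n_i$, $\bar{Y}_i$, and $s_i^2$ are the sample size, mean, and sample variance of group $i$, $Y_{ij}$ is observation $j$ in group $i$, and $\bar{Y}$ is the grand mean. The two components add up to the total sum of squares: $SS_{total} = SS_{groups} + SS_{error}$.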
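The partition can be sketched in Python under the usual definitions: the group sum of squares is the size-weighted squared deviation of each group mean from the grand mean, and the error sum of squares is either the squared deviations of observations from their own group mean or, equivalently, the sum of each group's sample variance times its sample size minus one. The data and the helper name `sums_of_squares` are hypothetical.

```python
from statistics import mean, variance

# Hypothetical data: three groups of unequal size.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}


def sums_of_squares(groups):
    """Return (SS_groups, SS_error, SS_error via sample variances)."""
    pooled = [y for values in groups.values() for y in values]
    grand = mean(pooled)
    # Among-group variation: group means around the grand mean.
    ss_groups = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    # Within-group variation: observations around their own group mean.
    ss_error = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)
    # Equivalent form: sample variance times (n_i - 1), summed over groups.
    ss_error_alt = sum(variance(v) * (len(v) - 1) for v in groups.values())
    return ss_groups, ss_error, ss_error_alt


ss_groups, ss_error, ss_error_alt = sums_of_squares(groups)
print(ss_groups, ss_error, ss_error_alt)
```

For this data the two error forms agree, and the group and error components sum to the total squared deviation of all observations from the grand mean.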

Step 2: Calculate the mean squares

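$$MS_{groups} = \frac{SS_{groups}}{k - 1}$$

$$MS_{error} = \frac{SS_{error}}{N - k}$$

k = number of groups, N = total number of observations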
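A short sketch of this step, assuming each sum of squares is divided by its degrees of freedom (groups minus one, and observations minus groups) and that the F-ratio is the group mean square over the error mean square. The input values are hypothetical, and `mean_squares` is an illustrative name.

```python
def mean_squares(ss_groups, ss_error, n_total, k):
    """Return (MS_groups, MS_error, F) from the sums of squares."""
    ms_groups = ss_groups / (k - 1)       # df_groups = k - 1
    ms_error = ss_error / (n_total - k)   # df_error = N - k
    return ms_groups, ms_error, ms_groups / ms_error


# Hypothetical values: SS_groups = 44.4, SS_error = 6.0, N = 10, k = 3.
ms_groups, ms_error, f_ratio = mean_squares(44.4, 6.0, n_total=10, k=3)
print(f"MS_groups = {ms_groups:.3f}, MS_error = {ms_error:.3f}, F = {f_ratio:.2f}")
```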

Step 3: Build ANOVA table

| Source of variance | Sum of squares | df | Mean squares | F | P |
|---|---|---|---|---|---|
| Groups | $SS_{groups}$ | groups – 1 | $MS_{groups}$ | $MS_{groups} / MS_{error}$ | P-value |
| Error | $SS_{error}$ | observations – groups | $MS_{error}$ | | |
| Total | $SS_{total}$ | $df_{error} + df_{groups}$ | | | |
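The table can be assembled programmatically. The sketch below assumes SciPy is installed and uses `scipy.stats.f.sf` to get the upper-tail P-value of the F-ratio; the data and the `anova_table` helper are hypothetical.

```python
from statistics import mean

from scipy.stats import f

# Hypothetical data: three groups of unequal size.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}


def anova_table(groups):
    """Return rows of (source, SS, df, MS, F, P) for a one-way ANOVA."""
    pooled = [y for values in groups.values() for y in values]
    grand = mean(pooled)
    k, n_total = len(groups), len(pooled)
    ss_groups = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ss_error = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)
    df_groups, df_error = k - 1, n_total - k
    ms_groups, ms_error = ss_groups / df_groups, ss_error / df_error
    f_ratio = ms_groups / ms_error
    # Upper-tail probability of the F distribution with (df_groups, df_error).
    p_value = float(f.sf(f_ratio, df_groups, df_error))
    return [
        ("Groups", ss_groups, df_groups, ms_groups, f_ratio, p_value),
        ("Error", ss_error, df_error, ms_error, None, None),
        ("Total", ss_groups + ss_error, df_groups + df_error, None, None, None),
    ]


for row in anova_table(groups):
    print(row)
```

As a cross-check, `scipy.stats.f_oneway(*groups.values())` should report the same F statistic and P-value.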

Step 4: If the null hypothesis is rejected, perform a post-hoc test to determine which groups differ

This can be done using a **Tukey-Kramer test**, which compares every pair of group means.
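One way to run the pairwise comparisons is `scipy.stats.tukey_hsd`, sketched below. This assumes SciPy 1.8 or newer, where the function is available and applies the Tukey-Kramer adjustment when sample sizes are unequal. The data is hypothetical.

```python
from scipy.stats import tukey_hsd

# Hypothetical data: three groups of unequal size.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}

labels = list(groups)
result = tukey_hsd(*groups.values())

# result.statistic[i, j] is the difference in means between groups i and j;
# result.pvalue[i, j] is the adjusted P-value for that pair.
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(
            f"{labels[i]} vs {labels[j]}: "
            f"difference = {result.statistic[i, j]:.2f}, "
            f"P = {result.pvalue[i, j]:.4f}"
        )
```

Pairs with a small adjusted P-value are the ones whose means differ.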

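## Determining the variance explained by differences in groups

The proportion of the total variation explained by differences among groups is $R^2 = SS_{groups} / SS_{total}$, which ranges from 0 to 1.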
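A minimal sketch, assuming the explained variance is measured in the usual way as the group sum of squares divided by the total sum of squares. The input values and the helper name `r_squared` are hypothetical.

```python
def r_squared(ss_groups, ss_error):
    """Fraction of total variation attributable to differences among groups."""
    ss_total = ss_groups + ss_error
    return ss_groups / ss_total


# Hypothetical values: SS_groups = 44.4, SS_error = 6.0.
r2 = r_squared(44.4, 6.0)
print(f"R^2 = {r2:.3f}")  # R^2 = 0.881
```

A value near 1 means most of the variation lies among groups; a value near 0 means most of it lies within groups.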

## Checking assumptions

The normality assumption can be checked visually by looking at a **Q-Q plot** of the residuals. If the points fall close to the straight line, the residuals are consistent with a normal distribution.
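The Q-Q coordinates can be computed with `scipy.stats.probplot`, as in the sketch below. It assumes SciPy is installed and takes the residuals to be each observation minus its own group mean. The data is hypothetical.

```python
from statistics import mean

from scipy.stats import probplot

# Hypothetical data: three groups of unequal size.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}

# Residual = observation minus the mean of its own group.
residuals = [y - mean(values) for values in groups.values() for y in values]

# probplot returns the theoretical normal quantiles, the sorted residuals,
# and a least-squares line fitted through the Q-Q points.
(theoretical_q, ordered_residuals), (slope, intercept, r) = probplot(
    residuals, dist="norm"
)
print(f"Q-Q correlation r = {r:.3f}")
```

Passing `plot=matplotlib.pyplot` to `probplot` (with Matplotlib installed) draws the points and the fitted line; here only the numbers are computed.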

Homogeneity of variances can be checked using either **Levene's test** or **Bartlett's test**.
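Both tests are available in SciPy as `scipy.stats.levene` and `scipy.stats.bartlett`. The sketch below assumes SciPy is installed and uses hypothetical data.

```python
from scipy.stats import bartlett, levene

# Hypothetical data: three groups of unequal size.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}

samples = list(groups.values())

# Levene's test (SciPy centers on the median by default).
levene_result = levene(*samples)
# Bartlett's test.
bartlett_result = bartlett(*samples)

print(f"Levene:   W = {levene_result.statistic:.3f}, P = {levene_result.pvalue:.3f}")
print(f"Bartlett: T = {bartlett_result.statistic:.3f}, P = {bartlett_result.pvalue:.3f}")
```

In both tests the null hypothesis is that the group variances are equal, so a small P-value is evidence against homogeneity. Bartlett's test is the more sensitive of the two to departures from normality.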