Parametric tests

Comparing means

This week we will be discussing how to compare the means of normally distributed populations using population parameters (μ and σ). We call these tests parametric tests.

T-distribution

In real-life studies, however, we rarely know the population standard deviation (σ).

We need to estimate it using the standard error of the mean (SE). This then changes our normal sampling distribution to a t-distribution.

t = \frac{\bar{Y} - \mu}{SE_{\bar{Y}}}
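As a sketch, the t-statistic can be computed by hand in Python; the sample values and the proposed mean μ = 15 are illustrative (they match the worked code examples later in the post):

```python
import math

# Hypothetical sample of n = 10 observations (illustrative values only)
sample = [13, 14, 13, 12, 14, 15, 16, 13, 14, 12]
mu = 15  # proposed population mean (an assumption for this example)

n = len(sample)
mean = sum(sample) / n
# Sample standard deviation (n - 1 in the denominator)
s = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))
se = s / math.sqrt(n)  # standard error of the mean
t = (mean - mu) / se
print(t)  # (13.6 - 15) / 0.4 = -3.5
```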

T-critical value

For a two-tailed test at α = 0.05, the critical value of a t-distribution cuts off a combined 5% of the area under the tails of the distribution (2.5% in each tail). For a one-tailed test, the full 5% lies in a single tail.

95% confidence interval for the mean

 \bar{Y} - t_{0.05(2),df} * SE_{\bar{Y}} < \mu < \bar{Y} + t_{0.05(2),df} * SE_{\bar{Y}}
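A minimal sketch of this interval in Python, assuming a hypothetical sample of 10 observations and using scipy's t quantile function for the two-tailed critical value:

```python
import math
from scipy import stats

# Hypothetical sample (illustrative values only)
sample = [13, 14, 13, 12, 14, 15, 16, 13, 14, 12]
n = len(sample)
mean = sum(sample) / n
se = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1)) / math.sqrt(n)

# Two-tailed 5% critical value with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * se, mean + t_crit * se
print(lower, upper)
```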

T-test suite

One-sample t-test: compare the mean of a sample to a proposed population mean.

Assumptions:

  • Data is a random sample from the population
  • Y is normally distributed in the population

t = \frac{\bar{Y} - \mu_{0}}{SE_\bar{Y}}

Paired t-test: compare the mean difference between two samples where observations are paired with each other (e.g., before and after observations from each individual in a sample).

Assumptions:

  • Data is a random sample from the population
  • Paired differences normally distributed in the population

t = \frac{\bar{d} - \mu_{d0}}{SE_{\bar{d}}}
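The paired t-test is equivalent to a one-sample t-test on the within-pair differences. A sketch with hypothetical before/after data, testing μ_d0 = 0 (no mean difference):

```python
import math

# Hypothetical before/after measurements on the same individuals
before = [14, 12, 13, 12, 15, 12, 15, 15, 12, 13]
after = [12, 15, 14, 12, 13, 13, 14, 13, 12, 12]

# Reduce the paired problem to one sample of differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
t = (d_bar - 0) / (s_d / math.sqrt(n))  # mu_d0 = 0
print(t)
```

This hand computation should match `scipy.stats.ttest_rel` (or `t.test(..., paired = TRUE)` in R) on the same data.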

We can also compare the means of two samples that are not paired; most comparisons of means fall under the two-sample t-test.

Assumptions:

  • Data is a random sample from the population
  • Data is normally distributed in the population
  • Variance between samples is equal

t = \frac{\bar{Y}_{1} - \bar{Y}_{2}}{SE_{\bar{Y}_{1} - \bar{Y}_{2}}}

SE_{\bar{Y}_{1} - \bar{Y}_{2}} = \sqrt{s^{2}_{p}(\frac{1}{n_{1}} + \frac{1}{n_{2}})}

s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
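Putting the pooled variance and standard error formulas together, a hand computation in Python (the two samples are hypothetical, matching the code examples below):

```python
import math

# Hypothetical independent samples
group1 = [14, 12, 13, 12, 15, 12, 15, 15, 12, 13]
group2 = [12, 15, 14, 12, 13, 13, 14, 13, 12, 12]

def sample_var(sample):
    m = sum(sample) / len(sample)
    return sum((y - m) ** 2 for y in sample) / (len(sample) - 1)

n1, n2 = len(group1), len(group2)
# Pooled variance: a weighted average of the two sample variances
sp2 = ((n1 - 1) * sample_var(group1) + (n2 - 1) * sample_var(group2)) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t = (sum(group1) / n1 - sum(group2) / n2) / se
print(t)
```

The result should agree with `scipy.stats.ttest_ind(..., equal_var=True)` or `t.test(..., var.equal = TRUE)` in R.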

Assumption tests

Checking normality

Checking normality is relatively straightforward. We produce a histogram of our observations and use the eyeball test: if it looks normal (i.e., like a bell curve), then we can assume our data is approximately normal.

There is also a hypothesis test called the Shapiro-Wilk test, which is a goodness-of-fit test for the normal distribution. A P-value > 0.05 means we fail to reject the null hypothesis that the data come from a normally distributed population.
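A quick sketch of the Shapiro-Wilk test using scipy (the sample values are illustrative):

```python
from scipy import stats

# Hypothetical sample; scipy.stats.shapiro tests H0: data come from a normal population
sample = [13, 14, 13, 12, 14, 15, 16, 13, 14, 12]
stat, p = stats.shapiro(sample)
print(stat, p)
# If p > 0.05, we fail to reject H0 (no evidence against normality)
```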

Checking homogeneity of variances

While we can use the “eyeball test” to judge whether our data come from a normal distribution, we can’t eyeball equal variances. Instead, we use a hypothesis test: one option is the F-test, another is Levene’s test.

# One sample t assuming a population mean of 15
data <- c(13, 14, 13, 12, 14, 15, 16, 13, 14, 12)
t.test(data, mu = 15)
# Paired t test
group1 <- c(14, 12, 13, 12, 15, 12, 15, 15, 12, 13)
group2 <- c(12, 15, 14, 12, 13, 13, 14, 13, 12, 12)
t.test(group1, group2, paired = TRUE)
# Two sample t
group1 <- c(14, 12, 13, 12, 15, 12, 15, 15, 12, 13)
group2 <- c(12, 15, 14, 12, 13, 13, 14, 13, 12, 12)
# Test equal variances
var.test(group1, group2)
# If variances are equal (Student's t-test)
t.test(group1, group2, var.equal = TRUE)
# If variances are not equal (Welch's t-test)
t.test(group1, group2) # This is the default in R
R code
import scipy.stats as stats
import numpy as np
# One sample t assuming a population mean of 15
data = [13, 14, 13, 12, 14, 15, 16, 13, 14, 12]
x = stats.ttest_1samp(a=data, popmean=15)
print(x)
# Two sample t
group1 = np.array([14, 12, 13, 12, 15, 12, 15, 15, 12, 13])
group2 = np.array([12, 15, 14, 12, 13, 13, 14, 13, 12, 12])
# Test equal variances
vartest = stats.levene(group1, group2)
print(vartest)
# If variances are equal
varequal = stats.ttest_ind(a=group1, b=group2, equal_var=True)
print(varequal)
# If variances are not equal
varNOTequal = stats.ttest_ind(a=group1, b=group2, equal_var=False)
print(varNOTequal)
# Paired t test
pairedT = stats.ttest_rel(a=group1, b=group2)
print(pairedT)
Python code