Estimating with uncertainty

Sampling distributions, standard error, and 95% confidence intervals

Sampling distributions

Estimate: an inference about a population parameter based on a sample from that population. Because a sample is only a subset of the population, estimates from repeated samples differ from one another and from the true parameter value. We can visualize this uncertainty by plotting a sampling distribution: the distribution of an estimate across many repeated samples.

For this figure, the true population mean is 43.92 mm.
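As a rough illustration of how such a sampling distribution arises, the sketch below draws repeated samples from a simulated population and records each sample mean. It uses only the Python standard library. The population mean of 43.92 mm comes from the figure; the normal shape, the standard deviation of 5.0 mm, the sample size of 30, and the number of repeats are assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical population of bill lengths (mm).
# POP_MEAN matches the figure; POP_SD is an assumed value, not from the text.
POP_MEAN = 43.92
POP_SD = 5.0


def sample_means(n_samples, sample_size, seed=0):
    """Draw repeated samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Each entry is one estimate of the population mean from a sample of 30.
means = sample_means(n_samples=5000, sample_size=30)

print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"smallest and largest sample mean: {min(means):.2f}, {max(means):.2f}")
```

A histogram of `means` would show the sampling distribution: individual estimates scatter around the true mean, and most fall close to it.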

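Standard error: the standard deviation of an estimate's sampling distribution. For a sample mean, it is estimated as the sample standard deviation divided by the square root of the sample size.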
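To make the definition concrete, here is a minimal sketch comparing three versions of the standard error of the mean: the standard deviation of many simulated sample means, the theoretical value (population standard deviation divided by the square root of n), and the estimate available from a single sample. The population standard deviation, sample size, and seed are assumed values for illustration only.

```python
import math
import random
import statistics

# POP_MEAN follows the figure; POP_SD and N are assumed values.
POP_MEAN, POP_SD, N = 43.92, 5.0, 30
rng = random.Random(1)


def draw_sample():
    """Return one simulated sample of N bill lengths (mm)."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]


# 1. Empirical: standard deviation of many sample means.
sample_means = [statistics.fmean(draw_sample()) for _ in range(5000)]
empirical_se = statistics.stdev(sample_means)

# 2. Theoretical: population SD divided by sqrt(n).
theoretical_se = POP_SD / math.sqrt(N)

# 3. Estimated from one sample, as done in practice: s / sqrt(n).
one_sample = draw_sample()
estimated_se = statistics.stdev(one_sample) / math.sqrt(N)

print(f"empirical:   {empirical_se:.3f}")
print(f"theoretical: {theoretical_se:.3f}")
print(f"estimated:   {estimated_se:.3f}")
```

The first two values should agree closely. The third varies from sample to sample because it relies on the sample standard deviation.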

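95% confidence interval: a range, computed from the sample, that is likely to contain the true population mean. If sampling were repeated many times, about 95% of the intervals built this way would contain the true mean.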
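The repeated-sampling reading of a confidence interval can be checked by simulation. The sketch below builds an approximate 95% interval (mean ± 2 standard errors, a common rule of thumb) from each of many simulated samples and counts how often the interval contains the true mean. The population standard deviation, sample size, and number of trials are assumptions for illustration.

```python
import math
import random
import statistics

# POP_MEAN follows the figure; POP_SD is an assumed value.
POP_MEAN, POP_SD = 43.92, 5.0


def interval_covers(rng, n=30):
    """Build mean ± 2*SE from one sample; report whether it contains POP_MEAN."""
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean - 2 * sem <= POP_MEAN <= mean + 2 * sem


rng = random.Random(2)
trials = 4000
coverage = sum(interval_covers(rng) for _ in range(trials)) / trials

print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should land near 0.95. With small samples the multiplier 2 is slightly too small, so an exact interval would use a t critical value instead.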
R code
# Dealing with uncertainty in Python
import pandas as pd
import numpy as np
from scipy.stats import t
from palmerpenguins import load_penguins
penguins = load_penguins().dropna()
# Split the data frame into groups
groups = penguins.groupby('species')
# Calculate sample means, standard deviations, and standard errors for each group
means = groups['bill_length_mm'].mean()
std_devs = groups['bill_length_mm'].std()
sizes = groups['bill_length_mm'].size()
sems = groups['bill_length_mm'].sem()
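# Calculate approximate 95% CIs for each group (mean ± 2 standard errors)
# 2 approximates the t critical value; t.ppf(0.975, df=size - 1) gives the exact multiplier
cis = {}
for group, mean, std_dev, size, sem in zip(means.index, means, std_devs, sizes, sems):
    ci_1 = mean - (2 * sem)  # lower bound
    ci_2 = mean + (2 * sem)  # upper bound
    cis[group] = (ci_1, ci_2)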
Python code