T-Test Calculator

Free t-test calculator. Run one-sample, two-sample (Student or Welch's), and paired t-tests. Get t-statistic, p-value, degrees of freedom, critical values, confidence interval, and Cohen's d effect size.

Welch's t-Test

t = 1.9297

t = (72.5 − 68.1) / √(s₁²/n₁ + s₂²/n₂)
df: 54.36
p-value: 0.0589
t-crit: ±2.005
Fail to reject H₀ — not statistically significant
At α = 0.05, p-value (0.0589) is greater than α.

Test Statistics

H₀: μ₁ = μ₂ · H₁: μ₁ ≠ μ₂

t-statistic: 1.9297
Degrees of freedom (df): 54.3621
p-value (two-tailed): 0.0589
Critical t-value (α = 0.05, two-tailed): ±2.0046
Standard error of the mean difference: 2.2801
Mean difference (x̄₁ − x̄₂): 4.4000

Confidence Interval & Effect Size

95% CI for the mean difference

95% Confidence Interval: [-0.1706, 8.9706]
Point estimate: 4.4000 · margin: ±4.5706
Cohen's d (effect size): 0.509 (medium effect)
Guideline: |d| < 0.2 negligible · 0.2–0.5 small · 0.5–0.8 medium · ≥ 0.8 large
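
The t-statistic and confidence interval above follow directly from the reported mean difference, standard error, and critical value. A minimal sketch of that arithmetic, using only the rounded numbers shown on this page (the variable names are illustrative, and small last-digit differences from the displayed results are expected because the inputs are rounded):

```python
# Values copied from the Test Statistics panel (already rounded to 4 decimals).
mean_diff = 4.4000   # x̄₁ − x̄₂
std_error = 2.2801   # standard error of the mean difference
t_crit = 2.0046      # two-tailed critical value at α = 0.05, df ≈ 54.36

# t is the mean difference measured in standard errors.
t_stat = mean_diff / std_error

# The margin of error is the critical value times the standard error.
margin = t_crit * std_error
ci_low, ci_high = mean_diff - margin, mean_diff + margin

print(f"t = {t_stat:.4f}")
print(f"95% CI = [{ci_low:.4f}, {ci_high:.4f}], margin = ±{margin:.4f}")
```

Because the interval contains 0, it agrees with the decision to not reject H₀ at α = 0.05.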

Critical t-Values (Two-Tailed)

Reference table for common df and α levels

df     α = 0.10   α = 0.05   α = 0.02   α = 0.01
5      ±2.015     ±2.571     ±3.365     ±4.032
10     ±1.812     ±2.228     ±2.764     ±3.169
15     ±1.753     ±2.131     ±2.602     ±2.947
20     ±1.725     ±2.086     ±2.528     ±2.845
25     ±1.708     ±2.060     ±2.485     ±2.787
30     ±1.697     ±2.042     ±2.457     ±2.750
40     ±1.684     ±2.021     ±2.423     ±2.704
60     ±1.671     ±2.000     ±2.390     ±2.660
120    ±1.658     ±1.980     ±2.358     ±2.617
∞      ±1.645     ±1.960     ±2.326     ±2.576
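
Values like those in the reference table can be approximated without a statistics library. The sketch below is one possible approach, not the method this calculator uses: it integrates the Student's t density with Simpson's rule and finds the critical value by bisection. The function names `t_pdf`, `t_cdf`, and `t_critical`, the step count, and the search bound of 50 are all assumptions chosen for illustration.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t: float, df: float, steps: int = 1000) -> float:
    """P(T <= t), using Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(alpha: float, df: float) -> float:
    """Two-tailed critical value: the t > 0 with P(|T| > t) = alpha."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0  # assumed search bracket, wide enough for the table's range
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Print a few rows in the same layout as the reference table.
for df in (5, 10, 30, 120):
    row = "  ".join(f"±{t_critical(a, df):.3f}" for a in (0.10, 0.05, 0.02, 0.01))
    print(f"df={df:<5d}{row}")
```

The printed rows should agree with the table to about three decimals. In practice a library routine for the inverse t distribution is the more robust choice.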

What Is a T-Test?

The Student's t-test for comparing means

A t-test is a statistical hypothesis test that compares means when the population standard deviation is unknown and must be estimated from the sample. It was developed by William Sealy Gosset (under the pen name “Student”) in 1908 while working at the Guinness Brewery, and is one of the most widely used tools in applied statistics.

The t-test tells you whether an observed difference between means is statistically significant, meaning a difference that large would be unlikely to arise from random sampling variation alone if the true means were equal, or whether the evidence is too weak to rule out chance.

Three Flavors of t-Test
One-Sample
Compare one sample mean to a known or hypothesized population mean.
Two-Sample
Compare the means of two independent groups (e.g. treatment vs. control).
Paired
Compare matched pairs — typically before/after measurements on the same subjects.

T-Test Formulas

One-sample, two-sample (Student & Welch's), and paired

One-Sample t-Test
t = (x̄ − μ₀) / (s / √n)
x̄ = sample mean
μ₀ = hypothesized mean
s = sample SD
df = n − 1
Two-Sample t-Test (Student, pooled)
t = (x̄₁ − x̄₂) / √(s_p² · (1/n₁ + 1/n₂))
where s_p² = ((n₁−1)s₁² + (n₂−1)s₂²) / (n₁ + n₂ − 2) is the pooled variance, and df = n₁ + n₂ − 2.
Welch's t-Test (unequal variance)
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
Welch's uses the Welch–Satterthwaite approximation for degrees of freedom, which is typically non-integer and smaller than n₁+n₂−2. It's the recommended default when variances or sample sizes differ.
Paired t-Test
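t = d̄ / (s_d / √n)
Compute the per-pair differences d = before − after, then run a one-sample t-test on those differences against a hypothesized mean of 0. Here d̄ is the mean and s_d the standard deviation of the differences, and df = n − 1, where n is the number of pairs.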
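
The four formulas above translate almost line for line into code. The following is a minimal sketch using only Python's standard library. The function names and the sample data are hypothetical, and the Welch degrees of freedom use the standard Welch–Satterthwaite expression, which the text names but does not spell out.

```python
import math
from statistics import mean, stdev, variance

def one_sample_t(x, mu0):
    """t = (x̄ − μ₀) / (s / √n), df = n − 1."""
    n = len(x)
    return (mean(x) - mu0) / (stdev(x) / math.sqrt(n)), n - 1

def pooled_t(x1, x2):
    """Student's two-sample test with pooled variance, df = n₁ + n₂ − 2."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x1, x2):
    """Welch's test; df from the Welch–Satterthwaite approximation."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2  # s₁²/n₁ and s₂²/n₂
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def paired_t(before, after):
    """One-sample test on the per-pair differences d = before − after."""
    diffs = [b - a for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)

# Hypothetical data, for illustration only.
group_a = [72, 75, 68, 80, 71, 77]
group_b = [65, 70, 66, 74, 69]
print("one-sample:", one_sample_t(group_a, 70))
print("pooled:    ", pooled_t(group_a, group_b))
print("welch:     ", welch_t(group_a, group_b))
print("paired:    ", paired_t([10, 12, 14], [9, 10, 11]))
```

Each function returns the pair (t, df). Turning t into a p-value additionally requires the t distribution's CDF, which the standard library does not provide.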
Worked Example (One-Sample)

A coffee roaster claims each bag weighs 340 g. A sample of n = 25 bags gives x̄ = 336 g with s = 6 g.

t = (336 − 340) / (6 / √25) = −4 / 1.2 = −3.333
df = 24, two-tailed p ≈ 0.0028

p < 0.05, so you'd reject H₀ and conclude the mean weight differs from 340 g.
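
The worked example can be checked numerically. This sketch recomputes t from the figures in the text and approximates the two-tailed p-value by integrating the t density with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` and the step count are assumptions made for illustration, and a statistics library would normally supply the p-value directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 − 2 · (area under the density on [0, |t|])."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Figures from the coffee-bag example.
claimed_mean, sample_mean, s, n = 340.0, 336.0, 6.0, 25

std_error = s / math.sqrt(n)                    # 6 / 5 = 1.2
t_stat = (sample_mean - claimed_mean) / std_error
df = n - 1
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, two-tailed p ≈ {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The result should land close to the t = −3.333 and p ≈ 0.0028 quoted above.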

How to Interpret the Results

t, p-value, critical value, confidence interval, and effect size

t-statistic
Number of standard errors between the observed and hypothesized mean. Larger |t| = stronger evidence against H₀.
p-value
Probability of observing a t-value at least as extreme as yours if H₀ were true. Reject H₀ when p < α.
Critical t-value
The t-value whose tail probability equals your α; in a two-tailed test each tail holds α/2. The rejection region is |t| > t_crit.
Confidence Interval
Range of plausible values for the true mean difference at the chosen confidence level (e.g. 95%).
Cohen's d — Effect Size

Statistical significance answers “is there a difference?” — effect size answers “how big is it?”. Cohen's d standardizes the mean difference by the standard deviation.

|d| < 0.2
Negligible
0.2–0.5
Small
0.5–0.8
Medium
≥ 0.8
Large
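
A small sketch of how Cohen's d and the guideline labels above might be computed for two independent samples. It assumes the pooled standard deviation as the standardizer, which is a common convention but is not stated on this page, and it treats each band's lower bound as inclusive. The function names and data are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(x1, x2):
    """Mean difference divided by the pooled SD (assumed standardizer)."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / math.sqrt(pooled_var)

def effect_label(d):
    """Map |d| to the guideline bands listed above (lower bounds inclusive)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Hypothetical samples, for illustration only.
treatment = [72, 75, 68, 80, 71, 77]
control = [65, 70, 66, 74, 69]
d = cohens_d(treatment, control)
print(f"d = {d:.3f} ({effect_label(d)})")

# The d reported in the example results on this page.
print(effect_label(0.509))
```

The sign of d only records which group has the larger mean, so the label depends on |d|.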

Assumptions & When to Use a T-Test

Conditions that must hold for results to be reliable

Continuous data
The dependent variable is measured on an interval or ratio scale (height, score, reaction time).
Approximate normality
The sampling distribution of the mean is approximately normal. For n ≥ 30 the central limit theorem (CLT) usually makes the test robust to mild skew.
Independent observations
Each data point is independent of the others. Paired designs relax this for the two groups but require independent pairs.
Equal variances (Student only)
Required for the pooled Student's test. If unmet, use Welch's — it's safer and almost always the better default.
A/B testing
Compare user-level continuous metrics — revenue per user, session duration, or order value — between a control and a variant. For binary conversion rates, use a proportion z-test instead.
Education research
Compare test scores before and after an intervention (paired) or between two teaching methods (two-sample).
Clinical trials
Test whether a drug reduces blood pressure versus placebo, or evaluate repeated biomarker measurements.
Quality control
Verify that a batch's mean weight, length, or tensile strength matches the specification (one-sample).

T-Test vs. Z-Test: When to Use Which

Choosing the right test for your data

Criterion                         T-Test                                     Z-Test
Population σ                      Unknown (estimated from the sample as s)   Known
Sample size                       Any, especially n < 30                     Usually n ≥ 30
Distribution                      Student's t (heavier tails, df-dependent)  Standard normal
Critical value (95%, two-tailed)  2.045 (df = 29)                            1.960
Best default                      Almost always the safer choice             Only when σ is truly known

As n grows, the t-distribution converges to the standard normal: by df = 120 the 95% critical value is 1.980 vs 1.960. In practice, default to the t-test unless you have compelling reason to use a z-test.

Common Mistakes to Avoid

Pitfalls that invalidate your t-test results

Using Student's t-test when variances differ
Mistake: Defaulting to pooled Student's t-test without checking variance equality
Correct: Use Welch's t-test by default — it handles unequal variances and sample sizes gracefully with minimal loss of power
Running an independent test on paired data
Mistake: Treating before/after measurements on the same people as two independent samples
Correct: Use a paired t-test — it removes between-subject variability and gives much more power
One-tailed test after seeing the direction
Mistake: Peeking at the data, then switching to a one-tailed test to lower the p-value
Correct: Pre-specify a one-tailed test only when your hypothesis is directional before data collection
Ignoring effect size
Mistake: Declaring a finding important just because p < 0.05
Correct: Always report Cohen's d or a confidence interval — a tiny effect with huge n can be 'significant' yet meaningless
Applying the t-test to non-continuous data
Mistake: Running a t-test on counts, proportions, or ordinal Likert scales
Correct: Use a chi-square test for counts, a proportion z-test for percentages, or a Mann-Whitney U for ordinal data
Ignoring normality with small samples
Mistake: Running a t-test on highly skewed data with n < 20
Correct: For small, non-normal samples use a non-parametric alternative (Wilcoxon signed-rank or Mann-Whitney U)
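
To make the paired-data pitfall concrete, the sketch below runs both a paired and an independent (Welch) t-statistic on the same made-up before/after measurements. The data are invented for illustration: subjects differ widely from one another, but each one changes by a small, consistent amount.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical before/after readings on the same five subjects.
before = [100, 120, 140, 160, 180]
after = [98, 117, 137, 158, 176]

# Paired: one-sample t-test on the per-subject differences.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
paired_t = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Independent (Welch): wrongly treats the two columns as unrelated groups.
se_indep = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
welch_t = (mean(before) - mean(after)) / se_indep

print(f"paired t      = {paired_t:.3f} (df = {n - 1})")
print(f"independent t = {welch_t:.3f}")
```

The mean difference is identical in both calculations. Only the standard error changes: the paired version measures variability in the differences, while the independent version is swamped by between-subject spread, so its t-statistic is far smaller.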
