What is a t-test used for?

A t-test compares means to determine whether an observed difference is statistically significant. You use a one-sample t-test to compare a sample mean to a known or hypothesized population mean, a two-sample t-test to compare the means of two independent groups, and a paired t-test to compare matched pairs (for example, before-and-after measurements on the same subjects).

When should I use Welch's t-test instead of Student's t-test?

Use Welch's t-test whenever your two samples might have unequal variances or unequal sample sizes, which is almost always. Welch's uses Welch–Satterthwaite-adjusted degrees of freedom and is nearly as powerful as Student's when variances really are equal, but stays valid when they aren't. Defaults differ between tools: R's t.test uses Welch's unless you pass var.equal = TRUE, while SciPy's ttest_ind assumes equal variances unless you pass equal_var=False, so check the setting before you run the test.

What is a good sample size for a t-test?

For a t-test to be reliable, a rule of thumb is n ≥ 30 per group so the Central Limit Theorem makes normality assumptions safe, though smaller samples are fine when the underlying data is roughly normal. For detecting a medium effect (Cohen's d = 0.5) at α = 0.05 with 80% power, you need about 64 per group for a two-sample test and about 34 for a paired test.
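The sample sizes quoted above can be roughly reproduced with a normal approximation plus a small-sample correction term. The sketch below assumes that approach, which is not necessarily how the figures on this page were derived. The function names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample t-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Normal-approximation term plus a correction for estimating the SD.
    n = 2 * ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


def n_pairs_paired(d, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-sided paired t-test.

    Here d is the mean difference divided by the SD of the differences.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


print(n_per_group_two_sample(0.5))  # 64
print(n_pairs_paired(0.5))          # 34
```

For precise planning, a dedicated power-analysis tool based on the noncentral t-distribution is the safer choice. This approximation is only meant to show where the numbers come from.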

How do I know if my t-test result is statistically significant?

Compare your p-value to your significance level α (usually 0.05). If p < α, reject the null hypothesis — the difference is statistically significant. Equivalently, if |t| exceeds the critical t-value for your degrees of freedom and α, reject H₀. This calculator displays both the p-value and the critical value, plus a clear reject / fail-to-reject banner.

What does a negative t-value mean?

The sign of t only reflects the direction of the difference, not its strength. A negative t means the sample mean is below the hypothesized value (one-sample) or the first group's mean is below the second group's (two-sample). For a two-tailed test, the decision depends only on |t|, so the sign doesn't affect significance — only interpretation.

What is the difference between a t-test and an ANOVA?

A t-test compares the means of one or two groups. An ANOVA (Analysis of Variance) generalizes this to three or more groups. Running multiple pairwise t-tests on more than two groups inflates your false-positive rate — use ANOVA instead, then follow up with post-hoc tests (like Tukey's HSD) if the ANOVA is significant.
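To see how quickly the false-positive rate inflates, here is a small sketch. It assumes every pairwise test is independent and run at the same α, which is only an approximation because pairwise comparisons share data. The function name is illustrative.

```python
from math import comb


def familywise_error_rate(groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise tests,
    assuming the tests are independent (an approximation)."""
    comparisons = comb(groups, 2)
    return 1 - (1 - alpha) ** comparisons


for k in (3, 4, 5):
    print(k, "groups:", comb(k, 2), "tests, error rate",
          round(familywise_error_rate(k), 3))
```

With three groups the approximate rate is already about 0.14 rather than 0.05, and with five groups it is about 0.40.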

How is Cohen's d calculated and why does it matter?

Cohen's d = (mean difference) / (standard deviation). For two samples it uses the pooled SD. Because d is expressed in standard-deviation units, it doesn't grow with sample size the way t does. By Cohen's conventional benchmarks, |d| < 0.2 is negligible, 0.2–0.5 is small, 0.5–0.8 is medium, and ≥ 0.8 is large. Statistical significance tells you whether the data are inconsistent with no effect; Cohen's d tells you how big the effect is. Report both.

Can I run a t-test in Excel or Google Sheets?

Yes. In Excel, use =T.TEST(array1, array2, tails, type), where type 1 = paired, 2 = Student's pooled, and 3 = Welch's, or run Data > Data Analysis > t-Test. Google Sheets uses the same T.TEST function. Note that T.TEST returns only the p-value, whereas a dedicated calculator like this one shows t, df, p-value, critical value, confidence interval, and Cohen's d in one view.

T-Test Calculator

Welch's t-Test

t = 1.9297

t = (72.5 − 68.1) / √(s₁²/n₁ + s₂²/n₂)

df = 54.36 · p-value = 0.0589 · t-crit = ±2.005

Fail to reject H₀ — not statistically significant

At α = 0.05, p-value (0.0589) is greater than α.

Test Statistics

H₀: μ₁ = μ₂ · H₁: μ₁ ≠ μ₂

Statistic	Value
t-statistic	1.9297
Degrees of freedom	54.3621
p-value (two-tailed)	0.0589
Critical t-value (α = 0.05, two-tailed)	±2.0046
Standard error of the mean difference	2.2801
Mean difference (x̄₁ − x̄₂)	4.4000

Confidence Interval & Effect Size

95% CI for the mean difference: [−0.1706, 8.9706]

Point estimate: 4.4000 · margin: ±4.5706

Cohen's d (effect size): 0.509 (medium effect)

Guideline: |d| < 0.2 negligible · 0.2–0.5 small · 0.5–0.8 medium · ≥ 0.8 large

Critical t-Values (Two-Tailed)

Reference table for common df and α levels

df	α = 0.10	α = 0.05	α = 0.02	α = 0.01
5	±2.015	±2.571	±3.365	±4.032
10	±1.812	±2.228	±2.764	±3.169
15	±1.753	±2.131	±2.602	±2.947
20	±1.725	±2.086	±2.528	±2.845
25	±1.708	±2.060	±2.485	±2.787
30	±1.697	±2.042	±2.457	±2.750
40	±1.684	±2.021	±2.423	±2.704
60	±1.671	±2.000	±2.390	±2.660
120	±1.658	±1.980	±2.358	±2.617
∞	±1.645	±1.960	±2.326	±2.576

What Is a T-Test?

The Student's t-test for comparing means

A t-test is a statistical hypothesis test that compares means when the population standard deviation is unknown and must be estimated from the sample. It was developed by William Sealy Gosset (under the pen name “Student”) in 1908 while working at the Guinness Brewery, and is one of the most widely used tools in applied statistics.

The t-test tells you whether an observed difference between means is statistically significant, meaning a difference that large would be unlikely to arise from random sampling variation alone if the true means were equal, or whether the evidence is too weak to rule out chance.

Three Flavors of t-Test

One-Sample

Compare one sample mean to a known or hypothesized population mean.

Two-Sample

Compare the means of two independent groups (e.g. treatment vs. control).

Paired

Compare matched pairs — typically before/after measurements on the same subjects.

T-Test Formulas

One-sample, two-sample (Student & Welch's), and paired

One-Sample t-Test

t = (x̄ − μ₀) / (s / √n)

x̄ = sample mean

μ₀ = hypothesized mean

s = sample SD

df = n − 1

Two-Sample t-Test (Student, pooled)

t = (x̄₁ − x̄₂) / √(s_p² · (1/n₁ + 1/n₂))

where s_p² = ((n₁−1)s₁² + (n₂−1)s₂²) / (n₁ + n₂ − 2) is the pooled variance, and df = n₁ + n₂ − 2.
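As a minimal sketch, the pooled formula can be computed from summary statistics as follows. The means 72.5 and 68.1 come from the calculator example on this page. The SDs (8.2, 9.1) and sample sizes (30, 28) are assumed values for illustration, and the function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t-test with a pooled variance estimate."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# SDs and sample sizes are assumed, not taken from the page.
t, df = pooled_t(72.5, 8.2, 30, 68.1, 9.1, 28)
print(f"t = {t:.4f}, df = {df}")
```

Because the degrees of freedom depend only on the sample sizes, df is always an integer for the pooled test.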

Welch's t-Test (unequal variance)

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

Welch's uses the Welch–Satterthwaite approximation for degrees of freedom, which is typically non-integer and smaller than n₁+n₂−2. It's the recommended default when variances or sample sizes differ.
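The sketch below computes Welch's t together with the Welch–Satterthwaite degrees of freedom from summary statistics. The means 72.5 and 68.1 appear in the calculator example on this page. The SDs (8.2, 9.1) and sample sizes (30, 28) are assumptions, chosen because they are consistent with the displayed results. The function name is illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-test from summary statistics: returns (t, df, se)."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# SDs and sample sizes are assumed, not taken from the page.
t, df, se = welch_t(72.5, 8.2, 30, 68.1, 9.1, 28)
print(f"t = {t:.4f}, df = {df:.2f}, SE = {se:.4f}")
```

With these assumed inputs the result is close to t = 1.93 on roughly 54.4 degrees of freedom, below the 56 that the pooled test would use.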

Paired t-Test

t = d̄ / (s_d / √n)

Compute the per-pair differences d = before − after, then run a one-sample t-test on those differences. df = n − 1 where n is the number of pairs.
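A short sketch of the paired procedure, using only the standard library. The before/after values are made-up illustration data and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t-test: a one-sample t-test on the per-pair differences."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / se, n - 1


# Hypothetical measurements on four subjects.
before = [10, 12, 9, 11]
after = [8, 11, 9, 8]
t, df = paired_t(before, after)
print(f"t = {t:.4f}, df = {df}")
```

Note that the sign of t follows the direction of subtraction: with d = before − after, a positive t means values dropped after the intervention.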

Worked Example (One-Sample)

A coffee roaster claims each bag weighs 340 g. A sample of n = 25 bags gives x̄ = 336 g with s = 6 g.

t = (336 − 340) / (6 / √25) = −4 / 1.2 = −3.333

df = 24, two-tailed p ≈ 0.0028

p < 0.05, so you'd reject H₀ and conclude the mean weight differs from 340 g.
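The arithmetic in this worked example can be checked with a few lines of Python. The function name is illustrative.

```python
import math


def one_sample_t(sample_mean, mu0, sd, n):
    """One-sample t-statistic and its degrees of freedom."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return (sample_mean - mu0) / se, n - 1


t, df = one_sample_t(336, 340, 6, 25)
print(f"t = {t:.3f}, df = {df}")  # t = -3.333, df = 24
```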

How to Interpret the Results

t, p-value, critical value, confidence interval, and effect size

t-statistic

Number of standard errors between the observed and hypothesized mean. Larger |t| = stronger evidence against H₀.

p-value

Probability of observing a t-value at least as extreme as yours if H₀ were true. Reject H₀ when p < α.

Critical t-value

The t-value that corresponds exactly to your α. The rejection region is |t| > t_crit (two-tailed).

Confidence Interval

Range of plausible values for the true mean difference at the chosen confidence level (e.g. 95%).
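As a sketch of how the confidence interval relates to the other outputs, the interval is the point estimate plus or minus the critical t-value times the standard error. The inputs below are the rounded values from the calculator example on this page, so the last digit may differ slightly from the displayed interval. The function name is illustrative.

```python
def confidence_interval(mean_diff, se, t_crit):
    """Two-sided confidence interval for a mean difference."""
    margin = t_crit * se
    return mean_diff - margin, mean_diff + margin


low, high = confidence_interval(mean_diff=4.4, se=2.2801, t_crit=2.0046)
print(f"[{low:.4f}, {high:.4f}]")

# For a two-tailed test, an interval that contains 0 corresponds to
# failing to reject H0 at the matching alpha.
print("contains 0:", low <= 0 <= high)
```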

Cohen's d — Effect Size

Statistical significance answers “is there a difference?” — effect size answers “how big is it?”. Cohen's d standardizes the mean difference by the standard deviation.

|d| < 0.2 negligible · 0.2–0.5 small · 0.5–0.8 medium · ≥ 0.8 large
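A minimal sketch of the calculation for two independent samples, using the pooled SD and the thresholds listed above. The means come from the calculator example on this page. The SDs and sample sizes are assumed for illustration, and both function names are hypothetical.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent samples, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def effect_label(d):
    """Map |d| onto the conventional size categories."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# SDs and sample sizes are assumed, not taken from the page.
d = cohens_d(72.5, 8.2, 30, 68.1, 9.1, 28)
print(f"d = {d:.3f} ({effect_label(d)})")
```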

Assumptions & When to Use a T-Test

Conditions that must hold for results to be reliable

Continuous data

The dependent variable is measured on an interval or ratio scale (height, score, reaction time).

Approximate normality

The sampling distribution of the mean is normal. For n ≥ 30 the CLT makes this robust to mild skew.

Independent observations

Each data point is independent of the others. Paired designs relax this for the two groups but require independent pairs.

Equal variances (Student only)

Required for the pooled Student's test. If unmet, use Welch's — it's safer and almost always the better default.

A/B testing

Compare user-level continuous metrics — revenue per user, session duration, or order value — between a control and a variant. For binary conversion rates, use a proportion z-test instead.

Education research

Compare test scores before and after an intervention (paired) or between two teaching methods (two-sample).

Clinical trials

Test whether a drug reduces blood pressure versus placebo, or evaluate repeated biomarker measurements.

Quality control

Verify that a batch's mean weight, length, or tensile strength matches the specification (one-sample).

T-Test vs. Z-Test: When to Use Which

Choosing the right test for your data

Criterion	T-Test	Z-Test
Population σ	Unknown — estimated from sample (s)	Known
Sample size	Any — especially n < 30	Usually n ≥ 30
Distribution	Student's t (heavier tails, df-dependent)	Standard normal
Critical value (95%, two-tailed)	2.045 (df = 29)	1.960
Best default	Almost always the safer choice	Only when σ is truly known

As n grows, the t-distribution converges to the standard normal: by df = 120 the 95% critical value is 1.980 vs 1.960. In practice, default to the t-test unless you have a compelling reason to use a z-test.

Common Mistakes to Avoid

Pitfalls that invalidate your t-test results

Using Student's t-test when variances differ

Mistake: Defaulting to pooled Student's t-test without checking variance equality

Correct: Use Welch's t-test by default — it handles unequal variances and sample sizes gracefully with minimal loss of power

Running an independent test on paired data

Mistake: Treating before/after measurements on the same people as two independent samples

Correct: Use a paired t-test — it removes between-subject variability and gives much more power

One-tailed test after seeing the direction

Mistake: Peeking at the data, then switching to a one-tailed test to lower the p-value

Correct: Pre-specify a one-tailed test only when your hypothesis is directional before data collection

Ignoring effect size

Mistake: Declaring a finding important just because p < 0.05

Correct: Always report Cohen's d or a confidence interval — a tiny effect with huge n can be 'significant' yet meaningless

Applying the t-test to non-continuous data

Mistake: Running a t-test on counts, proportions, or ordinal Likert scales

Correct: Use a chi-square test for counts, a proportion z-test for percentages, or a Mann-Whitney U for ordinal data

Ignoring normality with small samples

Mistake: Running a t-test on highly skewed data with n < 20

Correct: For small, non-normal samples use a non-parametric alternative: Wilcoxon signed-rank for paired data or Mann-Whitney U for two independent groups

Frequently Asked Questions

Common questions and detailed answers

How do I calculate a t-statistic by hand?

First, compute the mean difference you care about: x̄ − μ₀ for a one-sample test, x̄₁ − x̄₂ for two samples, or d̄ for paired data. Next, compute the standard error: s/√n for one-sample, s_d/√n for paired, or √(s₁²/n₁ + s₂²/n₂) for Welch's. Then divide: t = (mean difference) / SE. Finally, compare |t| to the critical t-value at your chosen degrees of freedom and α level, or look up the exact p-value from the t-distribution.

How is the p-value calculated?

The p-value is the probability of observing a t-statistic at least as extreme as yours if the null hypothesis were true. For a two-tailed test, p = 2 × P(T > |t|) where T follows a Student's t-distribution with your degrees of freedom. For a one-tailed test, use the appropriate single tail. This calculator computes the exact p-value using the regularized incomplete beta function, so no tables are needed.
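The following is a sketch of one way to get a two-tailed p-value from the regularized incomplete beta function, using the identity p = I_x(df/2, 1/2) with x = df / (df + t²). It is not the calculator's actual code. The continued-fraction evaluation is a standard textbook approach, and all function names are illustrative.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log(1.0 - x)
    )
    # Use the symmetry relation where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed p-value for a t-statistic with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


print(round(two_tailed_p(1.9297, 54.3621), 4))  # calculator example above
print(round(two_tailed_p(-3.3333, 24), 4))      # one-sample worked example
```

Because the formula works with t², the sign of t does not matter, and df may be non-integer as it is for Welch's test. For a one-tailed p-value in the direction of the observed t, halve the result.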

What is the difference between a paired and an independent two-sample t-test?

A paired t-test is used when the two sets of measurements are linked: the same subjects measured twice (before/after), matched controls, or twin pairs. By operating on the per-pair differences it removes between-subject variability and gives more power. An independent two-sample test is used when the two groups contain different, unrelated subjects. Using the wrong one can seriously mislead you.

What are the assumptions of a t-test?

Four core assumptions: (1) the dependent variable is continuous (interval or ratio scale); (2) observations are independent; (3) the sampling distribution of the mean is approximately normal, which the CLT handles for n ≥ 30; and (4) for the pooled Student's two-sample test, variances are roughly equal (use Welch's if not). Paired tests additionally assume the differences are approximately normally distributed.

Last updated Apr 15, 2026
