MythStats

Methodology

How we run the statistics, how we cite the data, and what we do when we get something wrong.

Paired t-test (default)

Every timing-myth analysis on MythStats uses the paired t-test on year-matched differences as the primary test. Treatment and control are paired on year and on city, so year-on-year level effects (a high-crime year, a pandemic year, a population shift) drop out of the comparison.

Run on the same data, Welch's independent-samples t-test is less powerful: it discards the pairing information, so year-level variation inflates the standard error in the denominator. We use Welch only when the structural pairing is missing.

This lesson comes directly from the launch piece: in the Mother's Day analysis, the per-city Welch tests are individually not significant, while the paired test on year-matched differences is significant, because pairing removes the year-level noise that swamps the unpaired comparison.
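
As a minimal sketch of why pairing matters, the following Python computes both t statistics on made-up numbers. The values, variable names, and helper functions are illustrative assumptions, not MythStats data or code. Year-to-year swings are large, while the treatment-minus-control gap is small and steady.

```python
import math
import statistics

# Synthetic, illustrative numbers only (not MythStats data): one city, five years.
treatment = [112.0, 131.0, 98.0, 145.0, 120.0]
control = [108.0, 128.0, 95.0, 140.0, 117.0]


def paired_t(a, b):
    """t statistic for the paired t-test, computed on per-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def welch_t(a, b):
    """t statistic for Welch's unequal-variance t-test, ignoring the pairing."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


t_paired = paired_t(treatment, control)
t_welch = welch_t(treatment, control)
print(round(t_paired, 2), round(t_welch, 2))
```

On these illustrative numbers the paired statistic is about 9.0 and the Welch statistic about 0.32: the same mean gap, judged against very different amounts of noise.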

Sign-test (robustness)

The sign-test asks a single question: across N year-matched comparisons, how often did the treatment fall on the predicted side of the control? It makes no distributional assumption and produces a binomial p-value from the directional count alone.

We report the sign-test result as our most defensible single-number claim. On the launch piece, 10 out of 10 city-years went in the predicted direction, giving a two-sided sign-test p of 0.002 with no normality assumption.
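
The arithmetic behind that p-value can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes a two-sided exact binomial test with ties already dropped from the count.

```python
from math import comb


def sign_test_p(successes, n):
    """Two-sided exact binomial p-value under H0: P(predicted direction) = 0.5."""
    k = max(successes, n - successes)  # count on the more extreme side
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the one-sided tail, capped at 1


p = sign_test_p(10, 10)
print(round(p, 3))  # 0.002
```

With 10 of 10 comparisons in the predicted direction, the one-sided tail is 1/1024, and doubling it gives 2/1024, which rounds to 0.002.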

Welch (when the pairing is absent)

Welch's independent-samples t-test compares two groups with unequal variances. It is the right test when treatment and control are not naturally paired across a structural axis. It is the wrong primary test for timing-myth analyses where pairing on year and city is straightforward.
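
For the unpaired case, a minimal sketch of Welch's statistic and its Welch-Satterthwaite degrees of freedom follows. The group values and the function name are illustrative assumptions; the two groups deliberately differ in size and spread, and no pairing exists between them.

```python
import math
import statistics

# Synthetic, illustrative groups with no natural pairing (not MythStats data).
group_a = [14.0, 18.0, 11.0, 20.0, 16.0, 15.0]
group_b = [9.0, 12.0, 10.0, 13.0]


def welch(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch(group_a, group_b)
print(round(t_stat, 2), round(df, 1))
```

The degrees of freedom always fall between the smaller group's n - 1 and the pooled n1 + n2 - 2, which is how the test accounts for unequal variances.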

Sources and data provenance

Each article cites primary data sources at the bottom of the article and in the JSON-LD isBasedOn field. We prefer government open-data portals, peer-reviewed datasets, and clearinghouses with documented release schedules. When we use a proxy variable, we name it and quantify the proxy weakness in the methodology drawer.
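
As a rough illustration of the isBasedOn citation, the sketch below builds a JSON-LD object in Python. The headline, dataset name, and URL are placeholders, and the exact shape of MythStats' own markup is an assumption; only the schema.org isBasedOn property itself comes from the text above.

```python
import json

# Hypothetical article metadata; every name and URL here is a placeholder.
article_jsonld = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example timing-myth analysis",
    "isBasedOn": [
        {
            "@type": "Dataset",
            "name": "Example city open-data incident counts",
            "url": "https://example.org/open-data/incidents",
        }
    ],
}

jsonld_text = json.dumps(article_jsonld, indent=2)
print(jsonld_text)
```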

Corrections policy

When a verdict materially changes on an annual data refresh, when a methodology error is found, or when a reader-reported data error is verified, we publish a correction in the corrections log. Each correction has its own permalink and records the original verdict, the new verdict, what changed, when, and why.

We frame corrections as the system working, not as the system failing. Refreshing analysis when new data arrives is the point.
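
The fields recorded for each correction could be modeled as a simple record type. This is only a sketch: the class name, field names, and example values are hypothetical and do not describe the actual corrections log format.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class Correction:
    """One corrections-log entry (hypothetical field names)."""
    permalink: str
    original_verdict: str
    new_verdict: str
    what_changed: str
    changed_on: date
    reason: str


# Placeholder entry for illustration only; not a real correction.
example = Correction(
    permalink="https://example.org/corrections/0001",
    original_verdict="Supported",
    new_verdict="Inconclusive",
    what_changed="Effect no longer significant after adding one more year",
    changed_on=date(2000, 1, 1),
    reason="Annual data refresh",
)

print(asdict(example))
```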

Public dataset access

Every article links to the underlying dataset, released under CC BY 4.0. Replications, alternative cuts, and challenges are welcome. The methodology drawer on each article links back here for the analytical pattern; the article's download link points to the raw data.