
Minimum Sample Size for Clinical Prediction Models (Binary)

⚗️ Frontier method: Based on the Riley et al. (2019) prediction-model sample-size criteria — relatively recent and parameter-sensitive. This tool implements the three criteria for a binary outcome; before formal submission, cross-check with the official R package pmsampsize and confirm the source of your R²cs value.

Developing a clinical prediction model (a diagnostic or prognostic risk model) requires a sample large enough to limit overfitting. This tool returns the minimum sample size for a binary outcome under Riley's three criteria: ① a global shrinkage factor S of at least 0.9; ② an absolute difference of at most 0.05 between the apparent and adjusted Nagelkerke R²; ③ estimation of the overall mean risk to within ±0.05. The required sample size is the largest of the three. Everything is computed locally in your browser; no data are uploaded.

① Input parameters

Number of candidate predictor parameters P: all coefficients to be estimated, including dummy variables, spline terms and interactions
Outcome prevalence / incidence
Expected Cox-Snell R² (adjusted): from a previous model or pilot data; if only the C-statistic is available, it must be converted first

How to use & methodology

Why not just use "10 events per variable (EPV=10)"?

EPV=10 is a rule of thumb that ignores the expected model performance and the way the number of predictor parameters interacts with the outcome proportion. Riley et al. (2019) instead proposed sample-size criteria based on shrinkage, optimism and the precision of the risk estimate. These are better justified, recommended by TRIPOD and often required by reviewers.
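To make the limitation concrete, here is a minimal sketch of what an EPV rule computes. The function names and the inputs (P = 24, prevalence 0.174, a sample of 662) are illustrative assumptions, not values taken from this tool. Note that the expected R²cs never enters the calculation.

```python
import math

def epv_sample_size(p, prevalence, epv=10):
    """Sample size implied by a fixed events-per-variable rule.

    Depends only on the number of parameters and the outcome proportion;
    expected model performance plays no role.
    """
    return math.ceil(epv * p / prevalence)

def events_per_parameter(n, prevalence, p):
    """Events per predictor parameter implied by a given sample size."""
    return n * prevalence / p

# Illustrative inputs (assumed for this example only).
print(epv_sample_size(24, 0.174))                      # 1380
print(round(events_per_parameter(662, 0.174, 24), 1))  # 4.8
```

A criteria-based calculation can land well above or below the EPV=10 figure, depending on the expected R²cs.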

Where does R²cs come from?

Ideally, take it from the Cox-Snell R² reported for a previous, similar prediction model, or estimate it by fitting a model to pilot data. If only an expected C-statistic (AUC) is available, convert it to R²cs with the Riley (2020) method built into pmsampsize. The conversion is approximate, so it is best done directly in pmsampsize.
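For intuition only, the following is a rough sketch of the simulation idea behind such a conversion. It assumes the linear predictor is normally distributed with unit variance in both outcome groups, with means separated by μ = √2·Φ⁻¹(C). The function name, the default n and seed, and the hand-rolled Newton-Raphson fit are all illustrative choices, not the pmsampsize implementation, and the result will differ slightly from what pmsampsize reports.

```python
import math
import random
from statistics import NormalDist

def approx_r2cs_from_cstat(c_stat, prevalence, n=20000, seed=1):
    """Approximate Cox-Snell R² implied by a C-statistic (illustrative sketch)."""
    rng = random.Random(seed)
    mu = math.sqrt(2) * NormalDist().inv_cdf(c_stat)
    n_events = round(n * prevalence)

    # Simulated linear predictor: N(mu, 1) in events, N(0, 1) in non-events.
    lp = [rng.gauss(mu, 1) for _ in range(n_events)]
    lp += [rng.gauss(0, 1) for _ in range(n - n_events)]
    y = [1] * n_events + [0] * (n - n_events)

    # Fit logit(p) = a + b * lp by Newton-Raphson (two parameters).
    a, b = math.log(n_events / (n - n_events)), 0.0
    for _ in range(50):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = 1 / (1 + math.exp(-(a + b * x)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-8:
            break

    # Cox-Snell R² = 1 - exp(-LR / n), with LR the likelihood-ratio statistic.
    ll_model = sum(yi * (a + b * x) - math.log1p(math.exp(a + b * x))
                   for x, yi in zip(lp, y))
    phi = n_events / n
    ll_null = n * (phi * math.log(phi) + (1 - phi) * math.log(1 - phi))
    return 1 - math.exp(-2 * (ll_model - ll_null) / n)

print(approx_r2cs_from_cstat(0.8, 0.2))
```

Because the estimate comes from a simulated sample, it varies with the seed and with n; a larger n reduces that noise at the cost of run time.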

Which parameters should P count?

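P is the total number of predictor parameters to be estimated: a continuous variable counts as 1 (or as the number of spline terms if it is modelled with a spline), a k-level categorical variable counts as k−1 dummy variables, and each interaction term counts as 1 (an interaction involving a splined or multi-level variable contributes one parameter per product term). Do not simply count the number of variables.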
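The counting rule can be sketched as a small helper. The function names and the example variables are hypothetical. An interaction is counted here as the product of the parameter counts of its two components, which reduces to 1 when both are single-parameter variables.

```python
def continuous(spline_terms=1):
    """Parameters used by a continuous variable (1 if entered linearly)."""
    return spline_terms

def categorical(levels):
    """Parameters used by a k-level categorical variable: k - 1 dummies."""
    if levels < 2:
        raise ValueError("a categorical variable needs at least 2 levels")
    return levels - 1

def count_parameters(main_effects, interactions=()):
    """Total P: main-effect parameters plus one per interaction product term."""
    total = sum(main_effects.values())
    for a, b in interactions:
        total += main_effects[a] * main_effects[b]
    return total

# Hypothetical model: four variables, one interaction.
model = {
    "age": continuous(spline_terms=3),  # 3 parameters
    "sex": categorical(2),              # 1 parameter
    "stage": categorical(4),            # 3 parameters
    "biomarker": continuous(),          # 1 parameter
}
print(count_parameters(model, interactions=[("sex", "biomarker")]))  # 9
```

Four variables give nine parameters here, which is why counting variables alone understates P.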

Why take the maximum of the three criteria?

The three criteria respectively ensure a small amount of overfitting (limited shrinkage), close agreement between apparent and adjusted performance, and a sufficiently precise estimate of the overall risk. If any one of them is unmet, the sample is insufficient, so the minimum is the largest of the three required sample sizes.
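The three criteria and the final maximum can be sketched as follows, using the closed-form expressions from Riley et al. (2019) as understood here. The function and argument names are illustrative, the inputs are example values rather than defaults of this tool, and the result should still be cross-checked against pmsampsize.

```python
import math

def riley_binary_min_n(p, prevalence, r2_cs,
                       shrinkage=0.9, r2_gap=0.05, margin=0.05):
    """Minimum n for a binary-outcome prediction model (illustrative sketch)."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    # Largest attainable Cox-Snell R² for this outcome proportion.
    ln_l_null = (prevalence * math.log(prevalence)
                 + (1 - prevalence) * math.log(1 - prevalence))
    max_r2_cs = 1 - math.exp(2 * ln_l_null)
    if not 0 < r2_cs < min(shrinkage, max_r2_cs):
        raise ValueError("r2_cs must be positive and below both S and max R²cs")

    def n_for_shrinkage(s):
        # Sample size giving an expected global shrinkage factor of s.
        return p / ((s - 1) * math.log(1 - r2_cs / s))

    # Criterion 1: target shrinkage factor S.
    n1 = n_for_shrinkage(shrinkage)
    # Criterion 2: shrinkage that keeps the Nagelkerke R² gap within r2_gap.
    s2 = r2_cs / (r2_cs + r2_gap * max_r2_cs)
    n2 = n_for_shrinkage(s2)
    # Criterion 3: overall mean risk estimated to within +/- margin.
    n3 = (1.96 / margin) ** 2 * prevalence * (1 - prevalence)

    criteria = {
        "shrinkage": math.ceil(n1),
        "r2_difference": math.ceil(n2),
        "mean_risk_precision": math.ceil(n3),
    }
    return criteria, max(criteria.values())

criteria, n_min = riley_binary_min_n(p=24, prevalence=0.174, r2_cs=0.288)
print(criteria)  # {'shrinkage': 623, 'r2_difference': 662, 'mean_risk_precision': 221}
print(n_min)     # 662
```

In this example the second criterion is the binding one. With a different P, prevalence or R²cs, any of the three can dominate, which is why all of them are evaluated.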