
Propensity Score (PSM / IPTW)

A mainstream method for controlling confounding in observational research. It first estimates each subject's propensity score (PS), the probability of receiving treatment given the covariates, by logistic regression. It then uses either 1:1 caliper matching (PSM) or inverse probability of treatment weighting (IPTW) to balance the covariate distributions of the treated and control groups. The tool reports covariate balance as the standardised mean difference (SMD) before and after matching or weighting, draws a Love plot, and can optionally estimate the effect on the outcome. Everything is computed locally in your browser; data are not uploaded.
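The two steps above (fit a PS model, then match on it) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not this tool's implementation. It assumes Newton-Raphson fitting of an unpenalised logistic regression, greedy nearest-neighbour matching without replacement on the logit of the PS, and a caliper of 0.2 pooled standard deviations of the logit PS, a width commonly used in the literature. The function names and toy data are hypothetical.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def propensity(beta, x):
    """Fitted P(treated | x) = 1 / (1 + exp(-(b0 + b1*x1 + ...)))."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, t, iters=25):
    """Logistic regression of treatment t (0/1) on covariates X by Newton-Raphson (no penalty)."""
    p = len(X[0]) + 1
    beta = [0.0] * p
    for _ in range(iters):
        g = [0.0] * p                      # score (gradient of the log-likelihood)
        H = [[0.0] * p for _ in range(p)]  # information matrix
        for x, ti in zip(X, t):
            r = [1.0] + list(x)            # intercept + covariates
            mu = propensity(beta, x)
            for j in range(p):
                g[j] += (ti - mu) * r[j]
                for k in range(p):
                    H[j][k] += mu * (1.0 - mu) * r[j] * r[k]
        beta = [b + s for b, s in zip(beta, solve(H, g))]
    return beta

def caliper_match(ps, t, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching on logit(PS), without replacement.
    Assumed caliper: caliper_sd * pooled SD of logit(PS) in the two groups."""
    logit = [math.log(e / (1.0 - e)) for e in ps]

    def sd(v):
        m = sum(v) / len(v)
        return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

    l1 = [l for l, ti in zip(logit, t) if ti == 1]
    l0 = [l for l, ti in zip(logit, t) if ti == 0]
    caliper = caliper_sd * math.sqrt((sd(l1) ** 2 + sd(l0) ** 2) / 2)
    controls = [i for i, ti in enumerate(t) if ti == 0]
    treated = [i for i, ti in enumerate(t) if ti == 1]
    pairs = []
    for i in treated:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(logit[c] - logit[i]))
        if abs(logit[j] - logit[i]) <= caliper:  # treated subjects with no close control are dropped
            pairs.append((i, j))
            controls.remove(j)
    return pairs

# Hypothetical toy data: one covariate, 4 treated and 4 controls
X = [[0.5], [1.0], [1.1], [2.0], [2.1], [3.0], [3.05], [4.0]]
t = [0, 0, 1, 0, 1, 1, 0, 1]
beta = fit_logistic(X, t)
ps = [propensity(beta, x) for x in X]
pairs = caliper_match(ps, t)
print(pairs)
```

On this toy data the treated subject at index 7 has no control within the caliper and is left unmatched, which is exactly how PSM shrinks the sample. Real analyses would normally use a tested library for the logistic model rather than a hand-rolled fitter.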

① Data

Paste data with a header row: the first row holds the column names, followed by one record per row (tab- or comma-separated). Code the treatment and outcome columns as 0/1; covariates must be numeric (dummy-code categorical variables first).
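The input format described above can be checked before analysis with a small parser like the one below. It is a hypothetical sketch of the stated rules, not the tool's own parser. It assumes the delimiter can be guessed from the header line (tab if present, otherwise comma), and the column names in the example are made up.

```python
def parse_table(text):
    """Parse pasted text: a header row of column names, then one record per row.
    The delimiter (tab or comma) is guessed from the header line."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    sep = "\t" if "\t" in lines[0] else ","
    header = [h.strip() for h in lines[0].split(sep)]
    cols = {h: [] for h in header}
    for n, ln in enumerate(lines[1:], start=1):
        cells = [c.strip() for c in ln.split(sep)]
        if len(cells) != len(header):
            raise ValueError(f"record {n}: expected {len(header)} fields, got {len(cells)}")
        for h, c in zip(header, cells):
            cols[h].append(float(c))  # non-numeric cells raise ValueError
    return cols

def check_binary(cols, name):
    """Treatment and outcome columns must contain only 0 and 1."""
    bad = sorted(set(cols[name]) - {0.0, 1.0})
    if bad:
        raise ValueError(f"column {name!r} must be coded 0/1, found {bad}")

# Hypothetical columns: treatment, two covariates (diabetes already dummy-coded), outcome
data = parse_table("treat\tage\tdiabetes\toutcome\n1\t63\t1\t0\n0\t58\t0\t1\n1\t71\t0\t1")
check_binary(data, "treat")
check_binary(data, "outcome")
print(data["age"])
```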

How to use & methodology

How do I choose between PSM and IPTW?

PSM (matching) is intuitive and yields a matched sample that is easy to present, but it discards subjects who cannot be matched, which shrinks the sample and changes the target population (the estimate then applies to the matched subjects rather than to everyone). IPTW (weighting) keeps every subject and estimates the average treatment effect (ATE), but it is sensitive to extreme weights, which arise when a PS is close to 0 or 1. The two are often run together so that each corroborates the other.
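To see why a PS near 0 or 1 is a problem for IPTW, here is a minimal sketch. It computes ATE weights (1/PS for treated subjects, 1/(1 − PS) for controls), optionally stabilised by the marginal treatment probability, together with the Kish effective sample size and a normalised (Hajek) weighted estimate of the ATE. The function names are hypothetical, and this is one common form of IPTW, not necessarily what the tool computes internally.

```python
def iptw_weights(ps, t, stabilized=False):
    """ATE weights: 1/e for treated, 1/(1-e) for controls (e = propensity score).
    Stabilised weights multiply the numerators by P(T=1) or P(T=0)."""
    p1 = sum(t) / len(t)
    num1, num0 = (p1, 1.0 - p1) if stabilized else (1.0, 1.0)
    return [num1 / e if ti == 1 else num0 / (1.0 - e) for e, ti in zip(ps, t)]

def effective_sample_size(w):
    """Kish effective sample size (sum w)^2 / sum w^2; it drops when a few weights dominate."""
    return sum(w) ** 2 / sum(x * x for x in w)

def iptw_ate(y, t, w):
    """Normalised (Hajek) IPTW estimate of the ATE: weighted mean outcome, treated minus control."""
    def wmean(group):
        num = sum(wi * yi for wi, yi, ti in zip(w, y, t) if ti == group)
        den = sum(wi for wi, ti in zip(w, t) if ti == group)
        return num / den
    return wmean(1) - wmean(0)

t = [1, 0, 1, 0]
balanced = iptw_weights([0.5, 0.5, 0.5, 0.5], t)
extreme = iptw_weights([0.5, 0.5, 0.02, 0.5], t)  # one treated subject with PS = 0.02
print(effective_sample_size(balanced))  # 4.0
print(effective_sample_size(extreme))   # about 1.25: one weight of 50 dominates
```

With four subjects, a single PS of 0.02 gives that subject a weight of 50 and cuts the effective sample size from 4 to about 1.25. This is why weight truncation or stabilisation is often considered alongside IPTW.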

Why is SMD better than the P value for assessing balance?

The P value for a between-group covariate difference depends on sample size: in a large sample even a trivial difference becomes significant. The SMD does not depend on sample size and directly measures the size of the difference in standard-deviation units. An absolute SMD below 0.1 is usually taken to indicate good balance, which is why the SMD is the recommended balance metric for matching and weighting.
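The contrast between SMD and P value can be demonstrated directly. The sketch below computes SMD = (mean_T − mean_C) / sqrt((var_T + var_C) / 2), optionally with IPTW weights, and a normal-approximation two-sample P value, then replicates the same toy data 1000 times. It assumes weight-normalised variances (dividing by the sum of weights); some software uses n − 1 instead, which differs negligibly in moderate samples. For a 0/1 covariate this variance reduces to p(1 − p). The names and data are hypothetical.

```python
import math

def _mean_var(x, w):
    """Weighted mean and weight-normalised variance (divide by the sum of weights)."""
    sw = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / sw
    return m, sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sw

def smd(x, t, w=None):
    """SMD = (mean_T - mean_C) / sqrt((var_T + var_C) / 2); pass IPTW weights w for the weighted SMD."""
    w = w or [1.0] * len(x)
    m1, v1 = _mean_var([a for a, g in zip(x, t) if g == 1], [b for b, g in zip(w, t) if g == 1])
    m0, v0 = _mean_var([a for a, g in zip(x, t) if g == 0], [b for b, g in zip(w, t) if g == 0])
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)

def z_test_p(x, t):
    """Two-sided P value of an unweighted normal-approximation two-sample test."""
    g1 = [a for a, g in zip(x, t) if g == 1]
    g0 = [a for a, g in zip(x, t) if g == 0]
    m1, v1 = _mean_var(g1, [1.0] * len(g1))
    m0, v0 = _mean_var(g0, [1.0] * len(g0))
    z = (m1 - m0) / math.sqrt(v1 / len(g1) + v0 / len(g0))
    return math.erfc(abs(z) / math.sqrt(2))

x = [1.0, 2.0, 3.0, 4.0, 1.1, 2.1, 3.1, 4.1]
t = [1, 1, 1, 1, 0, 0, 0, 0]
for k in (1, 1000):  # replicate the same data k times
    xs, ts = x * k, t * k
    print(len(xs), round(smd(xs, ts), 3), z_test_p(xs, ts))
```

Here the SMD stays at about -0.09 (balanced by the 0.1 rule) at both sizes, while the P value falls from roughly 0.9 with 8 subjects to well below 0.001 with 8000.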

How do I choose covariates for the propensity-score model?

Include the confounders (variables that affect both treatment and outcome), as well as pre-treatment variables that are related to the outcome. Do not include intermediate variables that occur after treatment, or variables related only to treatment and not to the outcome (they increase variance without removing confounding). Variable selection should be based on subject-matter knowledge and a causal diagram.

Can the propensity score remove all confounding?

No. It can only balance the covariates you measured and included in the model; it does nothing about unmeasured confounding, and this is the fundamental difference from randomisation. Conclusions should therefore be accompanied by a sensitivity analysis (for example, the E-value) showing how robust they are to unmeasured confounding.
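As a worked illustration of the E-value mentioned above, the sketch below uses the formula of VanderWeele and Ding for a risk ratio: E = RR + sqrt(RR × (RR − 1)), with protective estimates (RR < 1) inverted first. The E-value is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away the observed effect. For the confidence interval, it is applied to the limit closest to the null and equals 1 if the interval includes 1. The function names are hypothetical. Effect measures other than a risk ratio would need converting to an approximate risk ratio first.

```python
import math

def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), after inverting RR < 1 to 1/RR."""
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null; 1 if the interval includes 1."""
    if lower > 1:
        return e_value(lower)
    if upper < 1:
        return e_value(upper)
    return 1.0

print(e_value(2.0))          # 2 + sqrt(2), about 3.41
print(e_value_ci(1.3, 3.1))  # uses the lower limit 1.3, about 1.92
```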