Kappa Agreement Test (Inter-Rater Agreement)

Measures how well two raters (or two methods) agree on a classification after removing the agreement expected by chance. It reports Cohen's Kappa for nominal categories; when the ratings are ordered grades (e.g. 0/1/2/3), it also reports linear and quadratic weighted Kappa. Suited to observer-agreement studies such as imaging grades and pathology reads.

① Enter paired ratings

One subject per line, with the two raters' ratings separated by a space or a comma. Ratings may be numbers or text labels; if every rating is numeric, the ratings are treated as ordered and weighted Kappa is reported automatically.
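As an illustration of the input format described above, here is a minimal Python sketch of how such lines could be parsed. The function names `parse_pairs` and `is_numeric` are hypothetical and are not taken from this tool; the sketch also assumes that text labels contain no spaces or commas.

```python
import re

def parse_pairs(text):
    """Parse one pair of ratings per line, separated by spaces and/or a comma."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on any run of commas and whitespace, dropping empty pieces.
        parts = [p for p in re.split(r"[,\s]+", line) if p]
        if len(parts) != 2:
            raise ValueError(f"expected two ratings per line, got: {line!r}")
        pairs.append((parts[0], parts[1]))
    return pairs

def is_numeric(pairs):
    """Return True if every rating parses as a number (assumed to mean 'ordered')."""
    try:
        for a, b in pairs:
            float(a)
            float(b)
    except ValueError:
        return False
    return True

numeric_pairs = parse_pairs("1 2\n2,2\n\n3 , 3")
label_pairs = parse_pairs("benign,malignant\nbenign benign")
```

Here `numeric_pairs` would be treated as ordered grades, while `label_pairs` would be treated as nominal categories.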

How to use & methodology

How does Kappa differ from a raw agreement rate?

A raw agreement percentage includes the agreement that would occur by chance alone. Kappa subtracts this expected chance agreement from the observed agreement and rescales what remains, so it better reflects true agreement and is usually lower than the raw percentage.
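To make the chance correction concrete: Cohen's Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category frequencies. The following is a minimal Python sketch, not this tool's implementation; the function name `cohen_kappa` and the example data are illustrative assumptions.

```python
from collections import Counter

def cohen_kappa(pairs):
    """Unweighted Cohen's Kappa for a list of (rater_a, rater_b) ratings."""
    n = len(pairs)
    # Observed agreement: share of subjects given the same category by both raters.
    p_o = sum(a == b for a, b in pairs) / n
    # Chance agreement: product of the two raters' marginal proportions, per category.
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if both raters use one identical category throughout.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: 50 subjects rated "yes"/"no" by two raters.
example = (
    [("yes", "yes")] * 20 + [("yes", "no")] * 5
    + [("no", "yes")] * 10 + [("no", "no")] * 15
)
raw_agreement = sum(a == b for a, b in example) / len(example)  # 35/50 = 0.70
kappa = cohen_kappa(example)  # p_e = 0.50, so Kappa is about 0.40
```

In this example the raters agree on 70% of subjects, but half of all agreement would be expected by chance, which brings Kappa down to about 0.40.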

When do I use weighted Kappa?

When ratings are ordered grades (e.g. BI-RADS categories, pathology grades), a one-grade disagreement is less serious than a three-grade disagreement, so weighted Kappa should be used. Compared with linear weighting, quadratic weighting penalizes small disagreements less and large disagreements relatively more; it is the common choice for ordered-grade studies.
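The difference between the two weighting schemes can be sketched in Python. This is a minimal illustration under stated assumptions, not this tool's implementation: the function name `weighted_kappa` is hypothetical, and the disagreement weight for grades i and j is taken as |i - j| / (k - 1) for linear weighting and its square for quadratic weighting, where k is the number of grades. Weighted Kappa is then 1 minus the ratio of weighted observed disagreement to weighted chance-expected disagreement.

```python
def weighted_kappa(pairs, categories, weighting="quadratic"):
    """Weighted Kappa for ordered grades; `categories` lists the grades in order."""
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)  # needs at least two grades
    n = len(pairs)
    # Build the k x k table of observed counts.
    observed = [[0] * k for _ in range(k)]
    for a, b in pairs:
        observed[index[a]][index[b]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 2 if weighting == "quadratic" else 1
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            num += w * observed[i][j]            # weighted observed disagreement
            den += w * row[i] * col[j] / n       # weighted expected disagreement
    return 1 - num / den

# Hypothetical data: 15 subjects graded 0/1/2; every disagreement is one grade apart.
example = (
    [(0, 0)] * 4 + [(0, 1)]
    + [(1, 0)] + [(1, 1)] * 3 + [(1, 2)]
    + [(2, 1)] + [(2, 2)] * 4
)
linear = weighted_kappa(example, [0, 1, 2], "linear")        # about 0.70
quadratic = weighted_kappa(example, [0, 1, 2], "quadratic")  # about 0.80
```

For the same data, unweighted Kappa is 0.60. Because every disagreement here is only one grade apart, the weighted versions give more credit, and quadratic weighting gives the most.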

What Kappa counts as good?

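The commonly used Landis & Koch benchmarks: below 0 poor, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect. Kappa is also sensitive to category imbalance: when one category dominates, Kappa can be low even though raw agreement is high, so interpret it in context.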
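As a small illustration, the Landis & Koch bands can be written as a lookup in Python. The function name `landis_koch_label` is hypothetical; the bands below 0.41 (poor, slight, fair) come from the same published scale, and treating each band's upper bound as inclusive is an assumption made here for values that fall between the published bands.

```python
def landis_koch_label(kappa):
    """Map a Kappa value to its Landis & Koch descriptive label."""
    if kappa < 0:
        return "poor"
    # (upper bound, label) pairs; upper bounds treated as inclusive (assumption).
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("Kappa cannot exceed 1")

labels = [landis_koch_label(k) for k in (-0.1, 0.15, 0.35, 0.55, 0.75, 0.9)]
```

The labels are a convention rather than a statistical test, so the context caveat above still applies.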

Is Kappa used for continuous-measurement agreement?

No. Kappa is for categorical or graded data. For inter-rater agreement on continuous variables (e.g. measurements), use the intraclass correlation coefficient (ICC); this site's Labs section has a separate ICC tool.