Gwet's AC1 Inter-Rater Agreement
When two raters classify the same set of objects, Cohen's kappa (κ) can come out paradoxically low even though the raters agree on most objects. This happens when the categories are very imbalanced. Gwet's AC1 estimates expected (chance) agreement in a way that is less sensitive to category prevalence, so it gives a more robust result. This tool reports both AC1 and Cohen's κ for comparison, each with a bootstrap 95% confidence interval.
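For readers curious how a bootstrap interval of this kind can be obtained, here is a minimal sketch of a percentile bootstrap over rated objects. The tool's exact resampling scheme is not described here, so the percentile method, the names `bootstrap_ci` and `percent_agreement`, and the defaults (2000 resamples, fixed seed) are all assumptions made for illustration. Raw percent agreement stands in for the statistic; any coefficient function that takes a list of pairs could be passed instead.

```python
import random

def percent_agreement(pairs):
    """Share of objects on which both raters chose the same category."""
    return sum(a == b for a, b in pairs) / len(pairs)

def bootstrap_ci(pairs, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample objects (pairs) with replacement.

    Assumption: this is one common way to build the interval, not
    necessarily the scheme the tool itself uses.
    """
    rng = random.Random(seed)
    n = len(pairs)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=n)  # resample whole objects
        try:
            estimates.append(statistic(sample))
        except (ValueError, ZeroDivisionError):
            continue  # skip degenerate resamples (e.g. a single category)
    if not estimates:
        raise ValueError("statistic failed on every resample")
    estimates.sort()
    m = len(estimates)
    lower = estimates[int((alpha / 2) * m)]
    upper = estimates[min(m - 1, int((1 - alpha / 2) * m))]
    return lower, upper

# Hypothetical data: 90 agreements and 10 disagreements.
pairs = [("neg", "neg")] * 90 + [("neg", "pos")] * 5 + [("pos", "neg")] * 5
lower, upper = bootstrap_ci(pairs, percent_agreement)
print(f"95% CI for percent agreement: [{lower:.3f}, {upper:.3f}]")
```

Resampling whole pairs, rather than each rater's column separately, keeps the two ratings of each object together, which is what the agreement statistic depends on.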
① Paste paired ratings
One object per line, with two columns: the category assigned by rater 1 and the category assigned by rater 2. Categories can be text or numbers; separate the columns with a space, tab, or comma.
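As an illustration of this input format, the following sketch parses pasted text into a list of (rater 1, rater 2) pairs. The function name `parse_pairs` is hypothetical, and the sketch assumes that category labels contain no spaces, tabs, or commas themselves; the tool's own parser may behave differently.

```python
import re

def parse_pairs(text):
    """Parse pasted ratings into a list of (rater1, rater2) label pairs."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on commas, tabs, or runs of spaces (assumed separators).
        fields = [f for f in re.split(r"[,\t ]+", line) if f]
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs

example = "positive negative\npositive,positive\n0\t1\n"
print(parse_pairs(example))
```

Labels are kept as strings, so `0` and `1` are treated as category names rather than numbers, which suits nominal ratings.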
How to use & methodology
How do I choose between AC1 and Kappa?
Both measure agreement between raters beyond what chance alone would produce. When one category is very common (imbalanced data), Cohen's κ is deflated because its expected-agreement term becomes very large, producing the paradox of high observed agreement but low κ. Gwet's AC1 uses an expected-agreement estimate that stays small in that situation, so it avoids the paradox, and it is increasingly recommended. The safest practice is to report both and to note how balanced the categories are.
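The difference between the two coefficients lies entirely in the expected-agreement term, which a short sketch can make concrete. The code below uses the standard two-rater definitions: Cohen's κ takes expected agreement as the sum over categories of the product of the two raters' marginal proportions, while AC1 takes it as the sum of π(1 − π) over categories divided by (q − 1), where π is the average of the two raters' marginal proportions and q is the number of categories. The function name `agreement_coefficients` and the sample data are hypothetical, and this is not necessarily how the tool is implemented.

```python
from collections import Counter

def agreement_coefficients(pairs):
    """Return (observed agreement, Cohen's kappa, Gwet's AC1) for two raters."""
    n = len(pairs)
    if n == 0:
        raise ValueError("no ratings supplied")
    categories = sorted({label for pair in pairs for label in pair})
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two categories")

    p_o = sum(a == b for a, b in pairs) / n   # observed agreement
    r1 = Counter(a for a, _ in pairs)         # rater 1 marginal counts
    r2 = Counter(b for _, b in pairs)         # rater 2 marginal counts

    # Cohen: product of the two raters' marginal proportions per category.
    pe_kappa = sum((r1[c] / n) * (r2[c] / n) for c in categories)

    # Gwet: pi_k is the average marginal proportion of category k.
    pi = [(r1[c] + r2[c]) / (2 * n) for c in categories]
    pe_ac1 = sum(p * (1 - p) for p in pi) / (q - 1)

    kappa = (p_o - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return p_o, kappa, ac1

# Hypothetical imbalanced data: "neg" dominates both raters' ratings.
pairs = [("neg", "neg")] * 90 + [("neg", "pos")] * 5 + [("pos", "neg")] * 5
p_o, kappa, ac1 = agreement_coefficients(pairs)
print(f"observed={p_o:.3f}  kappa={kappa:.3f}  AC1={ac1:.3f}")
```

On data like these, the two coefficients diverge sharply even though the observed agreement is the same for both, which is why reporting them side by side is informative.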
What is the Kappa paradox?
When nearly all objects fall in one category, even two highly agreeing raters can produce a very low or near-zero Cohen's κ. The reason is that the expected agreement computed from the raters' marginal totals is itself close to 1, which leaves almost no room for agreement "beyond chance". This can mislead readers into thinking agreement is poor. AC1 was designed to solve this.
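A worked example with hypothetical numbers shows the effect. Suppose 100 objects are rated: both raters say "negative" for 90, they disagree on 10 (5 each way), and neither ever agrees on "positive". Each rater then labels 95% of objects negative and 5% positive. Using the standard two-rater definitions, with q = 2 categories:

```latex
\begin{align*}
p_o &= \frac{90 + 0}{100} = 0.90 \\[4pt]
p_e^{\kappa} &= (0.95)(0.95) + (0.05)(0.05) = 0.905 \\
\kappa &= \frac{p_o - p_e^{\kappa}}{1 - p_e^{\kappa}}
        = \frac{0.90 - 0.905}{1 - 0.905} \approx -0.05 \\[4pt]
\pi_{\text{neg}} &= \tfrac{1}{2}(0.95 + 0.95) = 0.95, \qquad
\pi_{\text{pos}} = \tfrac{1}{2}(0.05 + 0.05) = 0.05 \\
p_e^{\mathrm{AC1}} &= \frac{1}{q-1}\sum_{k}\pi_k(1-\pi_k)
        = (0.95)(0.05) + (0.05)(0.95) = 0.095 \\
\mathrm{AC1} &= \frac{p_o - p_e^{\mathrm{AC1}}}{1 - p_e^{\mathrm{AC1}}}
        = \frac{0.90 - 0.095}{1 - 0.095} \approx 0.89
\end{align*}
```

The raters agree on 90% of objects, yet κ is slightly negative because its expected agreement (0.905) already exceeds the observed agreement. AC1's expected agreement is only 0.095, so the coefficient stays close to the observed agreement.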
Can categories be text?
Yes. Use the same set of category labels in both columns (e.g. positive/negative, 0/1/2, mild/moderate/severe); the tool detects the category set and builds the table automatically.
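The detection step can be pictured as follows: collect every distinct label that appears in either column, then count how often each (rater 1, rater 2) combination occurs. This is a sketch under assumptions, not the tool's actual code; the name `build_table` is hypothetical, and sorting the labels alphabetically is just one possible ordering.

```python
def build_table(pairs):
    """Detect the category set and build a rater1-by-rater2 count table."""
    # Union of labels from both columns, so a category used by only
    # one rater still gets its own row and column.
    categories = sorted({label for pair in pairs for label in pair})
    index = {c: i for i, c in enumerate(categories)}
    table = [[0] * len(categories) for _ in categories]
    for a, b in pairs:
        table[index[a]][index[b]] += 1  # row = rater 1, column = rater 2
    return categories, table

pairs = [
    ("mild", "mild"),
    ("mild", "moderate"),
    ("severe", "severe"),
    ("moderate", "moderate"),
]
categories, table = build_table(pairs)
print(categories)
for label, row in zip(categories, table):
    print(label, row)
```

Because labels are matched as exact strings, variants such as `Mild` and `mild` would count as different categories in this sketch, so consistent spelling across both columns matters.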
What about three or more raters?
As implemented in this tool, Cohen's κ and AC1 apply to exactly two raters. For agreement on nominal categories among three or more raters, use the Fleiss Kappa tool.