Inter-Rater Reliability Discussion Corner
by Kilem L. Gwet, Ph.D.

For new posts, please refer to my new blog at:

Sample Size Calculation for Kappa-Like Coefficients (Posted: May 6, 2013)
On June 28, 2010, I posted a note outlining an approach for calculating the number of subjects required in an inter-rater reliability study to ensure a predetermined error margin for a chance-corrected agreement coefficient. That approach requires knowledge of some parameters that are generally unknown at the design stage. In this post, I recommend a simpler and more practical approach for estimating the number of subjects, as well as the number of raters, in a multiple-rater study. (read more... or download PDF file)

Calculating Intraclass Correlation with AgreeStat 2011.1 (Posted: October 24, 2011)
AgreeStat 2011.1 for Excel (Windows) provides the simplest way for researchers to compute the Intraclass Correlation Coefficient (ICC). It is a self-contained workbook driven by a Visual Basic for Applications program, and requires no installation. You simply need MS Excel 2007 or 2010 for Windows. The Mac version of AgreeStat 2011.1 will be released at the beginning of 2012. Download the trial version of AgreeStat 2011.1 here, and you will be surprised to see how easy and intuitive it is. (read more... or download PDF file)

Fleiss' Generalized Kappa is NOT Kappa. It is a Generalized Version of Scott's Pi (Posted: Tuesday, May 3, 2011)
The term "Kappa" has been used in the inter-rater reliability literature to refer to almost any chance-corrected agreement coefficient. This is unfortunate, because it has left researchers confused about what is Kappa and what is not (read more...)
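
The distinction in the title can be checked numerically. The following is a minimal sketch, not taken from the full post: the function names and the ratings are made up for illustration, and the formulas are the standard textbook definitions of Cohen's Kappa, Scott's Pi, and Fleiss' generalized Kappa. With two raters whose marginal distributions differ, Fleiss' coefficient reproduces Scott's Pi rather than Cohen's Kappa.

```python
from collections import Counter

def observed_agreement(a, b):
    """Proportion of subjects that both raters put in the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's Kappa: chance agreement uses each rater's own marginals."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))
    return (observed_agreement(a, b) - p_e) / (1 - p_e)

def scott_pi(a, b):
    """Scott's Pi: chance agreement uses the marginals pooled over both raters."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed_agreement(a, b) - p_e) / (1 - p_e)

def fleiss_kappa(ratings_per_subject):
    """Fleiss' generalized Kappa; every subject needs the same number of ratings."""
    n_subjects = len(ratings_per_subject)
    r = len(ratings_per_subject[0])
    pooled = Counter()
    p_bar = 0.0
    for ratings in ratings_per_subject:
        counts = Counter(ratings)
        pooled.update(counts)
        # Share of agreeing rater pairs for this subject.
        p_bar += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
    p_bar /= n_subjects
    p_e = sum((count / (n_subjects * r)) ** 2 for count in pooled.values())
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical ratings of 10 subjects; the raters have different marginals.
rater_a = ["yes"] * 7 + ["no"] * 3
rater_b = ["yes"] * 4 + ["no"] * 5 + ["yes"]

print("Cohen's Kappa :", cohen_kappa(rater_a, rater_b))
print("Scott's Pi    :", scott_pi(rater_a, rater_b))
print("Fleiss' Kappa :", fleiss_kappa(list(zip(rater_a, rater_b))))
```

On these made-up data the last two values coincide while Cohen's Kappa differs, because Cohen's chance-agreement term is the only one that keeps the two raters' marginals separate.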

Calculating the AC2 Coefficient using AgreeStat (Posted: Monday, May 2, 2011)
In this post, I would like to show how the AC2 coefficient can be calculated using AgreeStat. I indicated in the Handbook of Inter-Rater Reliability (2nd edition, page 80) that AC2 is actually a weighted AC1 based on the specific set of weights known as Quadratic Weights (read more ...)
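
For readers without AgreeStat at hand, the relationship between AC1 and AC2 can be sketched in a few lines of Python. This is an illustrative sketch only, not AgreeStat's implementation: the function names and the 3x3 table are hypothetical, and the chance-agreement formula used here (the sum of the weights, divided by q(q-1), times the sum of pi_k(1 - pi_k)) is assumed from the commonly published two-rater definition of the weighted coefficient. With identity weights it reduces to AC1.

```python
def quadratic_weights(q):
    """Quadratic weights for q ordered categories: 1 - ((k - l) / (q - 1))^2."""
    return [[1 - ((k - l) / (q - 1)) ** 2 for l in range(q)] for k in range(q)]

def gwet_ac(table, weights=None):
    """Two-rater AC coefficient from a q x q contingency table of counts.

    table[k][l] is the number of subjects that rater A put in category k
    and rater B put in category l. With weights=None (identity weights)
    the result is AC1; with quadratic weights it is AC2 as described above.
    """
    q = len(table)
    n = sum(sum(row) for row in table)
    if weights is None:
        weights = [[1.0 if k == l else 0.0 for l in range(q)] for k in range(q)]
    # Weighted observed agreement.
    p_a = sum(weights[k][l] * table[k][l] / n for k in range(q) for l in range(q))
    # Average of the two raters' marginal proportions for each category.
    pi = [
        (sum(table[k]) + sum(table[j][k] for j in range(q))) / (2 * n)
        for k in range(q)
    ]
    # Assumed chance-agreement term: T_w / (q (q - 1)) * sum of pi_k (1 - pi_k).
    t_w = sum(sum(row) for row in weights)
    p_e = t_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi)
    return (p_a - p_e) / (1 - p_e)

# Hypothetical table: 30 subjects, 3 ordered categories.
example_table = [
    [10, 2, 0],
    [3, 8, 2],
    [0, 1, 4],
]

ac1 = gwet_ac(example_table)
ac2 = gwet_ac(example_table, quadratic_weights(3))
print("AC1:", ac1)
print("AC2:", ac2)
```

In this made-up table all disagreements fall in adjacent categories, so the quadratic weights give them partial credit and AC2 comes out higher than AC1.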

Correcting Inter-Rater Reliability for Chance Agreement: Why? (Posted: Monday, July 5, 2010)
In this post, I would like to address the question of whether agreement coefficients should be adjusted (or corrected) for the possibility of agreement occurring by pure chance between two raters. A natural and crude way to quantify the extent of agreement between two raters is to compute the proportion of subjects that both raters classify into the same category (read more ...)
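
As a small illustration of the crude measure and of what a chance correction does to it, here is a hedged sketch. The ratings and function names are invented for the example, and Cohen's Kappa is used only as one familiar chance-corrected coefficient; the full post may argue for a different one.

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Proportion of subjects that both raters classify into the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must classify the same non-empty set of subjects")
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def chance_corrected(p_a, p_e):
    """Generic chance correction: (p_a - p_e) / (1 - p_e). Undefined when p_e == 1."""
    return (p_a - p_e) / (1 - p_e)

def cohen_chance_agreement(ratings_a, ratings_b):
    """Cohen's chance-agreement term: sum over categories of the product of marginals."""
    n = len(ratings_a)
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    return sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

# Hypothetical classifications of 10 subjects by two raters.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]

p_a = percent_agreement(rater_a, rater_b)
p_e = cohen_chance_agreement(rater_a, rater_b)
print("percent agreement:", p_a)
print("chance agreement (Cohen):", p_e)
print("chance-corrected agreement:", chance_corrected(p_a, p_e))
```

Here the raters agree on 8 of 10 subjects, but a sizable part of that agreement is what their marginal frequencies alone would produce, so the corrected value is noticeably lower than the raw proportion.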

Sample Size Determination (Posted: Monday, June 28, 2010)
I have received several e-mails from researchers asking how the sample size should be determined to ensure the validity of inter-rater reliability results. In many instances, researchers worry about the validity of their Kappa coefficient (read more ...)