AgreeStat Analytics
Research & Software for Analyzing Inter-Rater Reliability Data

4-Rater Agreement: Unweighted Analysis of Raw Scores with Benchmarking

Input Data

The dataset shown below is included in one of the example worksheets of AgreeStat 360 and can be downloaded from the program. It contains the ratings that 4 raters assigned to 12 units. None of the 4 raters rated all 12 units, so the dataset contains several missing ratings.

The objective is to compute the unweighted extent of agreement among the 4 raters, benchmarked with the Landis-Koch scale, using AgreeStat360 for Excel/Windows. More information about the benchmarking of agreement coefficients is available if needed.

raw ratings from 4 raters
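
For readers who want to reproduce this kind of layout outside the program, the following is a minimal sketch of how raw ratings with missing values might be represented and tallied. The ratings, category labels and function names are hypothetical and are not the AgreeStat360 example data; a missing rating is assumed to be stored as None.

```python
from collections import Counter

# Hypothetical ratings, NOT the AgreeStat360 example dataset.
# One row per unit, one column per rater; None marks a missing rating.
RATERS = ["Rater1", "Rater2", "Rater3", "Rater4"]
ratings = [
    ["a", "a", None, "a"],
    ["b", "b", "c", "b"],
    [None, "c", "c", "c"],
    ["a", "b", "a", None],
]


def ratings_per_rater(rows):
    """Number of units each rater actually rated (missing ratings are skipped)."""
    n_raters = len(rows[0])
    return [sum(row[j] is not None for row in rows) for j in range(n_raters)]


def distribution_by_rater(rows):
    """One Counter per rater: how many units the rater put into each category."""
    n_raters = len(rows[0])
    return [
        Counter(row[j] for row in rows if row[j] is not None)
        for j in range(n_raters)
    ]


counts = ratings_per_rater(ratings)
dist = distribution_by_rater(ratings)
for name, n, d in zip(RATERS, counts, dist):
    print(name, n, dict(sorted(d.items())))
```

Keeping missing ratings as explicit None values, rather than dropping incomplete units, lets every available rating contribute to the analysis.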

Analysis with AgreeStat/360

To see how AgreeStat360 processes this dataset to produce various agreement coefficients, please play the video below. The video can also be watched separately for more clarity if needed.


The output that AgreeStat360 produces is shown below and contains 3 parts:

  • Summary data: The first part of this output shows the distribution of subjects by rater and category. The row marginal totals show the number of subjects each rater rated, while the column marginal averages show how many subjects, on average, a rater classified into each category.

  • Unweighted analysis: Six agreement coefficients are calculated, including Conger's kappa, Gwet's AC1, and more. Each agreement coefficient is associated with precision measures calculated with respect to subjects (i.e., raters are fixed and do not constitute a source of variation), and with respect to both subjects and raters. These precision measures are the standard error, the 95% confidence interval and the p-value.

  • Benchmarking: The last table of this output shows, for each agreement coefficient, the cumulative probability of belonging to the corresponding benchmark level in column 1. Typically, you would retain the highest level for which the cumulative membership probability exceeds 0.95.
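
To illustrate what an unweighted coefficient looks like when some ratings are missing, here is a minimal sketch of the percent agreement and Gwet's AC1 for multiple raters. It assumes the usual multi-rater formulation in which percent agreement is averaged over units rated by at least two raters and the chance-agreement term uses the average share of raters choosing each category. The data and the function name gwet_ac1 are hypothetical, and this is not a description of AgreeStat360's internal implementation; standard errors, confidence intervals and p-values are not covered.

```python
def gwet_ac1(rows, categories):
    """Return (percent agreement, chance agreement, AC1) for unweighted ratings.

    rows: one list per unit, one entry per rater, None for a missing rating.
    categories: list of all possible category labels (at least two).
    """
    q = len(categories)
    # r[i][k]: number of raters who classified unit i into category k
    r = [[sum(x == c for x in row) for c in categories] for row in rows]
    r = [ri for ri in r if sum(ri) > 0]  # drop units that nobody rated
    n = len(r)

    # Percent agreement, averaged over units rated by at least 2 raters
    pairable = [ri for ri in r if sum(ri) >= 2]
    pa = sum(
        sum(rik * (rik - 1) for rik in ri) / (sum(ri) * (sum(ri) - 1))
        for ri in pairable
    ) / len(pairable)

    # pi[k]: average share of raters who chose category k
    pi = [sum(ri[k] / sum(ri) for ri in r) / n for k in range(q)]
    # Chance agreement under AC1
    pe = sum(p * (1 - p) for p in pi) / (q - 1)

    return pa, pe, (pa - pe) / (1 - pe)


# Hypothetical ratings (4 units, 4 raters), NOT the AgreeStat360 example data
ratings = [
    ["a", "a", None, "a"],
    ["b", "b", "c", "b"],
    [None, "c", "c", "c"],
    ["a", "b", "a", None],
]
pa, pe, ac1 = gwet_ac1(ratings, ["a", "b", "c"])
print(f"pa={pa:.4f}  pe={pe:.4f}  AC1={ac1:.4f}")
```

Other coefficients in the output, such as Conger's kappa, keep the same (pa - pe) / (1 - pe) form and differ in how the chance-agreement term pe is defined.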

benchmarked agreement coefficients among 4 raters
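
The benchmarking rule described above can be sketched as follows. This is a simplified illustration that assumes the coefficient estimate is approximately normal with the reported standard error, and it ignores the small probability mass falling above 1; the exact computation used by AgreeStat360 is not stated here and may differ. The coefficient and standard error passed in at the bottom are made-up inputs, and the function name benchmark is hypothetical.

```python
from statistics import NormalDist

# Landis-Koch benchmark levels as (lower bound, upper bound, label),
# listed from the highest level to the lowest.
LANDIS_KOCH = [
    (0.8, 1.0, "Almost Perfect"),
    (0.6, 0.8, "Substantial"),
    (0.4, 0.6, "Moderate"),
    (0.2, 0.4, "Fair"),
    (0.0, 0.2, "Slight"),
    (-1.0, 0.0, "Poor"),
]


def benchmark(coeff, se, scale=LANDIS_KOCH, threshold=0.95):
    """Return ([(label, cumulative probability), ...], retained label)."""
    z = NormalDist()  # standard normal distribution
    cumulative = 0.0
    table = []
    for lower, upper, label in scale:
        # Probability that the level [lower, upper] contains the true
        # coefficient, under the normal approximation (an assumption)
        prob = z.cdf((coeff - lower) / se) - z.cdf((coeff - upper) / se)
        cumulative += prob
        table.append((label, cumulative))
    # Walking down from the top, the first level whose cumulative
    # probability exceeds the threshold is the highest such level.
    retained = next(
        (label for label, cum in table if cum > threshold), scale[-1][2]
    )
    return table, retained


# Made-up coefficient and standard error, for illustration only
table, retained = benchmark(coeff=0.75, se=0.05)
for label, cum in table:
    print(f"{label:15s} {cum:.4f}")
print("Retained level:", retained)
```

With these made-up inputs the cumulative probability first exceeds 0.95 at the Substantial level, so that is the level the sketch retains, even though the point estimate alone sits close to the Almost Perfect boundary.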