AgreeStat Analytics
Research & Software for Analyzing Inter-Rater Reliability Data

Intraclass Correlation Sample Size Determination
Prospective Power Analysis/2-Way Random Factorial Model


Assume you are in the planning stage of an inter-rater or an intra-rater reliability experiment; you must first decide which of these two studies you are planning. Assume also that the two-way random effects design will be adopted, but that you do not yet know how many subjects, raters and replicates to use.
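For readers who want the quantities behind the two study types, here is one common textbook formulation of the two-way random effects model with interaction. The symbols ($\mu$, $s_i$, $r_j$, $(sr)_{ij}$, $e_{ijk}$ and the variance components) are notation introduced here for illustration; this page does not state the exact parameterization AgreeStat360 uses, so treat this as a sketch of the usual definitions.

```latex
% Rating k given by rater j to subject i (two-way random effects model)
y_{ijk} = \mu + s_i + r_j + (sr)_{ij} + e_{ijk},
\qquad
s_i \sim (0, \sigma_s^2),\;
r_j \sim (0, \sigma_r^2),\;
(sr)_{ij} \sim (0, \sigma_{sr}^2),\;
e_{ijk} \sim (0, \sigma_e^2)

% Inter-rater reliability: share of total variance due to subjects
\rho = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_r^2 + \sigma_{sr}^2 + \sigma_e^2}

% Intra-rater reliability: share of total variance not due to replication error
\gamma = \frac{\sigma_s^2 + \sigma_r^2 + \sigma_{sr}^2}{\sigma_s^2 + \sigma_r^2 + \sigma_{sr}^2 + \sigma_e^2}
```

Under this formulation the error variance $\sigma_e^2$ can only be separated from the interaction when each rater rates each subject more than once, which is why the number of replicates is a design choice alongside the numbers of subjects and raters.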

AgreeStat360 can be used to determine the optimal number of subjects and raters, as well as the optimal number of ratings (replicates) to take per subject and per rater. The inputs needed to run this module are shown in the figure below. For the software to produce meaningful recommendations, you must provide the following:

  • The maximum number of subjects, raters and replicates you would consider, as shown in the figure below. In this example I assume that a maximum of 500 subjects, 10 raters and 100 replicates will be considered for the experiment.

  • The desired statistical power for detecting the Minimum Detectable Difference (MDD). In this example the MDD is 0.1 and the desired power is 90%. Feel free to modify both values.

intraclass correlation sample size determination
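The inputs listed above can be written down as a small data structure, which also makes the size of the search space explicit. The sketch below is not AgreeStat360 code: the class `PlanningInputs`, the function `candidate_designs`, and the lower bounds (at least 2 subjects, 2 raters and 1 replicate) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Tuple


@dataclass(frozen=True)
class PlanningInputs:
    """Hypothetical container for the planning inputs used in this example."""
    max_subjects: int = 500
    max_raters: int = 10
    max_replicates: int = 100
    mdd: float = 0.1             # Minimum Detectable Difference
    target_power: float = 0.90   # desired statistical power

    def validate(self) -> None:
        """Reject inputs that cannot describe a usable design."""
        if min(self.max_subjects, self.max_raters, self.max_replicates) < 1:
            raise ValueError("maximum counts must be positive integers")
        if not 0.0 < self.mdd < 1.0:
            raise ValueError("MDD must lie strictly between 0 and 1")
        if not 0.0 < self.target_power < 1.0:
            raise ValueError("target power must lie strictly between 0 and 1")


def candidate_designs(
    inputs: PlanningInputs,
    min_subjects: int = 2,     # assumed lower bound, not from the source
    min_raters: int = 2,       # assumed lower bound, not from the source
    min_replicates: int = 1,   # assumed lower bound, not from the source
) -> Iterator[Tuple[int, int, int]]:
    """Yield every (subjects, raters, replicates) triple within the bounds."""
    inputs.validate()
    return product(
        range(min_subjects, inputs.max_subjects + 1),
        range(min_raters, inputs.max_raters + 1),
        range(min_replicates, inputs.max_replicates + 1),
    )


inputs = PlanningInputs()
n_designs = sum(1 for _ in candidate_designs(inputs))
print(f"{n_designs} candidate designs within the stated maxima")
```

Keeping the maxima realistic matters because they bound the set of designs the software can recommend; a design with more subjects, raters or replicates than you could actually recruit is of no practical use.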

Analysis with AgreeStat/360

To see how AgreeStat360 processes this dataset to produce various agreement coefficients, please play the video below.  This video can also be watched on for more clarity if needed.


The output that AgreeStat360 produces is shown below. Among other things, it contains the 4-column "Power Table" on the right side. The first 3 columns give the number of subjects, raters and replicates, and the fourth gives the power achieved by that combination.

  • Using the radio buttons at the top of the input form, indicate whether it is the inter-rater or the intra-rater reliability experiment that is being planned.

  • The highlighted row shows that with 36 subjects, 2 raters and 7 ratings per rater and per subject, for a total of 36 × 2 × 7 = 504 ratings, one can expect to achieve a power of 0.9015, just above the 90% target.

  • If a particular row is of interest due to the number of raters and replicates deemed adequate (e.g. 2 and 7 respectively), you may then use the box at the bottom to modify the number of subjects only and observe how that affects the power.

intraclass correlation sample size calculation with AgreeStat360
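The arithmetic behind a Power Table row can be checked with a few lines of code. The sketch below is an illustration only: `PowerRow` and `cheapest_adequate` are hypothetical names, not part of AgreeStat360, and the only data used is the highlighted row quoted above. It computes the total number of ratings for a row and picks, among rows that reach the target power, the one requiring the fewest ratings.

```python
from typing import Iterable, NamedTuple, Optional


class PowerRow(NamedTuple):
    """One row of a power table (hypothetical representation)."""
    subjects: int
    raters: int
    replicates: int
    power: float

    @property
    def total_ratings(self) -> int:
        # Every rater rates every subject `replicates` times.
        return self.subjects * self.raters * self.replicates


def cheapest_adequate(rows: Iterable[PowerRow], target_power: float) -> Optional[PowerRow]:
    """Return the row meeting the target power with the fewest total ratings."""
    adequate = [row for row in rows if row.power >= target_power]
    return min(adequate, key=lambda row: row.total_ratings, default=None)


# The highlighted row from the example: 36 subjects, 2 raters, 7 replicates.
highlighted = PowerRow(subjects=36, raters=2, replicates=7, power=0.9015)

print(highlighted.total_ratings)                     # 504
print(cheapest_adequate([highlighted], 0.90))        # the highlighted row
print(cheapest_adequate([highlighted], 0.95))        # None: target not reached
```

Ranking by total ratings is only one possible criterion. If raters are more expensive to recruit than subjects, for example, a different cost function could be passed as the `key` instead.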