To measure agreement in plus disease diagnosis among retinopathy of prematurity (ROP) experts.
A set of 34 wide-angle retinal photographs from infants with ROP was compiled on a secure Web site and was interpreted independently by 22 recognized ROP experts. Diagnostic agreement was analyzed using 3-level (plus, pre-plus, or neither) and 2-level (plus or not plus) categorizations.
In the 3-level categorization, all experts agreed on the same diagnosis in 4 of 34 images (12%), and the mean weighted κ statistic for each expert compared with all others was between 0.21 and 0.40 (fair agreement) for 7 experts (32%) and between 0.41 and 0.60 (moderate agreement) for 15 experts (68%). In the 2-level categorization, all experts who provided a diagnosis agreed in 7 of 34 images (21%), and the mean κ statistic for each expert compared with all others was between 0 and 0.20 (slight agreement) for 1 expert (5%), between 0.21 and 0.40 (fair agreement) for 3 experts (14%), between 0.41 and 0.60 (moderate agreement) for 12 experts (55%), and between 0.61 and 0.80 (substantial agreement) for 6 experts (27%).
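The per-expert statistic reported above (each expert's mean κ against all other experts, binned into the slight/fair/moderate/substantial ranges) can be illustrated with a minimal sketch. It rests on several assumptions that the text does not state: linear disagreement weights for the weighted κ, diagnoses coded as integers (0 = neither, 1 = pre-plus, 2 = plus), and a complete rating matrix with no missing diagnoses. The function names and the toy data are hypothetical and are not the study's data or software.

```python
from itertools import product


def cohen_kappa(a, b, n_levels, weighted=True):
    """Cohen's kappa for two raters' labels coded 0..n_levels-1.

    weighted=True uses linear disagreement weights |i - j| (an assumption;
    the source does not say which weighting scheme was used).
    """
    n = len(a)
    # Observed joint proportions of (rater a label, rater b label).
    obs = [[0.0] * n_levels for _ in range(n_levels)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    row = [sum(obs[i]) for i in range(n_levels)]
    col = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]

    def w(i, j):
        if weighted:
            return float(abs(i - j))
        return 0.0 if i == j else 1.0

    cells = list(product(range(n_levels), repeat=2))
    d_obs = sum(w(i, j) * obs[i][j] for i, j in cells)
    d_exp = sum(w(i, j) * row[i] * col[j] for i, j in cells)
    if d_exp == 0:
        return float("nan")  # kappa is undefined when no disagreement is expected
    return 1.0 - d_obs / d_exp


def mean_kappa_per_expert(ratings, n_levels, weighted=True):
    """For each expert, average kappa against every other expert.

    ratings[e][i] is expert e's label for image i (complete data assumed).
    """
    means = []
    for e, mine in enumerate(ratings):
        ks = [cohen_kappa(mine, other, n_levels, weighted)
              for o, other in enumerate(ratings) if o != e]
        means.append(sum(ks) / len(ks))
    return means


def agreement_label(k):
    """Map kappa to the conventional Landis and Koch descriptive ranges."""
    if k < 0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


def unanimous_count(ratings):
    """Number of images on which every expert gave the same label."""
    return sum(1 for labels in zip(*ratings) if len(set(labels)) == 1)


# Toy 2-level example (hypothetical): 3 experts, 4 images, 1 = plus, 0 = not plus.
toy = [[1, 1, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0]]
toy_means = mean_kappa_per_expert(toy, n_levels=2)
toy_labels = [agreement_label(k) for k in toy_means]
```

With two levels the linear weights reduce to the unweighted κ, so the same function covers both categorizations. Handling images for which an expert gave no diagnosis, as the 2-level results imply occurred, would require restricting each pairwise comparison to images both experts rated; the sketch omits that step.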
Interexpert agreement on plus disease diagnosis is imperfect. This may have important implications for clinical ROP management, continued refinement of the international ROP classification system, development of computer-based diagnostic algorithms, and implementation of ROP telemedicine systems.