Abstract
The justification for making a measurement can be sought by asking what decisions are based on it, for example when assessing whether a quality characteristic of an entity complies with a specification limit (SL). The relative performance of testing devices and classification algorithms used in assessing compliance is often evaluated using the venerable and ever-popular receiver operating characteristic (ROC). However, the ROC tool potentially has all the limitations of classical test theory (CTT), such as non-linearity, effects of ordinality, and the confounding of task difficulty with instrument ability. These limitations, inherent and often unacknowledged when using the ROC tool, are tackled here for the first time with a modernised approach combining measurement system analysis (MSA) and item response theory (IRT), using data from pregnancy testing as an example. The new method, which assesses device ability from separate Rasch IRT regressions for each axis of the ROC curve, is found to perform significantly better: its correlation coefficients with traditional area-under-curve metrics are at least 0.92, exceeding those of linearised ROC plots such as Linacre's. It is recommended to replace other approaches for device assessment. The improved measurement quality of each ROC curve achieved with this original approach should enable more reliable decision-making in conformity assessment in many scenarios, including machine learning, where the ROC has become almost indispensable as a metric for assessing classification algorithms.
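For orientation, the dichotomous Rasch model on which such regressions rest can be sketched as follows. This is the standard textbook form, not necessarily the exact parameterisation used in this work; the symbols $\theta$ (device ability), $\delta$ (task difficulty) and $P_{\mathrm{success}}$ (probability of a correct classification) are notation assumed here for illustration.

```latex
% Dichotomous Rasch model (standard form; symbols are illustrative assumptions)
P_{\mathrm{success}} = \frac{e^{\theta - \delta}}{1 + e^{\theta - \delta}}
\qquad\Longleftrightarrow\qquad
\ln\!\left(\frac{P_{\mathrm{success}}}{1 - P_{\mathrm{success}}}\right) = \theta - \delta
```

On the logit scale, ability and difficulty enter additively and can therefore be estimated separately, whereas raw classification rates of the kind plotted on ROC axes leave the two confounded.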