TY - JOUR
T1 - Retrospective use of the Pragmatic-Explanatory Continuum Indicator Summary-2 trial design tool to assess design choices in randomized controlled trials
T2 - an empirical review
AU - Willis, Andrew
AU - Shiely, Frances
AU - Howie, Alison H.
AU - Treweek, Shaun
AU - Taljaard, Monica
AU - Loudon, Kirsty
AU - Murphy, Ellen
AU - Bhakoo, Aarian
AU - Yazdani, Yasaman
AU - Ward, Frank
AU - Janiaud, Perrine
AU - Haren, Andrea
AU - Liang, Aileen Yining
AU - Robinson, Clare
AU - Deng, Daisy
AU - Hemkens, Lars
AU - Greene, Evelyn O'Sullivan
AU - Slattery, Laura
AU - Zwarenstein, Merrick
N1 - Publisher Copyright:
© 2025 The Author(s)
PY - 2025/11
Y1 - 2025/11
N2 - Background and Objective: The Pragmatic-Explanatory Continuum Indicator Summary-2 (PRECIS-2) tool has been widely used to help investigators design randomized trials, facilitating the task of aligning design choices with an explanatory or pragmatic primary trial intention. PRECIS-2 is increasingly being used to retrospectively assess the degree of pragmatism or explanatoriness among published trials within reviews. There is little information on the interrater reliability of the tool and no consensus on the preferred method of achieving an accurate and reliable judgment of trial “pragmatism” when using PRECIS-2 retrospectively. The aims of this study were to assess the level of pragmatism or explanatoriness of trials that cite PRECIS-2 and to assess interrater reliability of PRECIS-2 using different scoring approaches. We compared agreement between two independent ratings within a single pair with agreement between consensus scores reached by two independent pairs of reviewers, and examined whether widening the agreement criteria increased interrater reliability. Methods: Thirty randomized controlled trials (RCTs) were randomly selected from trials citing the PRECIS-2 tool. Two pairs of reviewers, a clinician paired with a methodologist in each case, were trained and independently scored each trial and reached a consensus score within pairs. Agreement between reviewers within pairs and between consensus scores across pairs was assessed using kappa statistics for each of the nine PRECIS-2 domains. Results: RCTs citing PRECIS-2 had predominantly pragmatic design features. Interrater reliability within pairs was low across all domains, with the highest levels found in the two domains of analysis (0.32) and follow-up (0.33). Agreement across pairs on the consensus scores was similarly low. 
Agreement between reviewers and reviewer pairs was above 70% when agreement was reclassified as “within 1-point difference on the scoring scale” for eight domains, but no improvement was obtained for the remaining domain. Conclusion: Trials citing PRECIS-2 tend to have predominantly pragmatic design features. When using PRECIS-2 to retrospectively score trial publications, agreement between consensus scores across pairs of reviewers was no better than agreement within pairs. Reconfiguring the PRECIS scoring scale and improving scoring guidance may provide a more meaningful, easily interpreted measure of “pragmatism” for trialists wishing to use PRECIS-2 as a review tool. Plain Language Summary: The Pragmatic-Explanatory Continuum Indicator Summary-2 (PRECIS-2) tool is designed to help researchers match their design decisions to the intended purpose of their trial. The intention of a trial can be “explanatory,” which improves our understanding of how an intervention works, or “pragmatic,” which supports decision-making in health care. Increasingly, the tool has been used for a secondary purpose: in systematic reviews. Here the tool is used to judge the level of “pragmatism” or “explanatoriness” of trials included in the review to aid the understanding of trial results. However, there is debate on the most reliable means of making this judgment. Sometimes judgments are made using one reviewer; other times, multiple reviewers. Our study evaluated interrater reliability of two methods of scoring trial publications using PRECIS-2: individual reviewer scores and pairs of reviewers agreeing on a consensus score. We found that neither method we tested produced a reliable judgment using PRECIS-2, and the scores from two reviewers agreeing on a consensus were no more reliable than scores from a single reviewer. 
We performed an additional analysis that showed that simplifying the scoring from the original five-point scale to a three-point scale may give a more reliable judgment of the “pragmatism” or “explanatoriness” of published trials. This simpler method of scoring should be encouraged for retrospective use of PRECIS-2 in systematic reviews.
KW - Applicability
KW - Explanatory trial
KW - External validity
KW - Pragmatic trial
KW - Randomized controlled trial
KW - Trial design
KW - Trial intention
UR - https://www.scopus.com/pages/publications/105016458418
U2 - 10.1016/j.jclinepi.2025.111959
DO - 10.1016/j.jclinepi.2025.111959
M3 - Article
C2 - 40902865
AN - SCOPUS:105016458418
SN - 0895-4356
VL - 187
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
M1 - 111959
ER -