Randomized Controlled Trials in Medical AI
A Methodological Critique
Keywords: Artificial Intelligence, Randomised Controlled Trials, Clinical Methodology, Machine Learning, Medical Diagnosis
Various publications claim that medical AI systems perform as well as, or better than, clinical experts. However, there have been very few controlled trials, and the quality of existing studies has been called into question. There is growing concern that existing studies overestimate the clinical benefits of AI systems. This has led to calls for more, and higher-quality, randomized controlled trials of medical AI systems. While this is a welcome development, AI RCTs raise novel methodological challenges that have seen little discussion. We discuss some of the challenges arising in the context of AI RCTs and make some suggestions for how to meet them.
Grant numbers (BE5601/4-1; Cluster of Excellence “Machine Learning—New Perspectives for Science”, EXC 2064, project number 390727645)