Research Overview

Timothy Miller's work in clinical natural language processing (NLP) has covered a broad array of applications, from phenotyping applications that enable clinical research, developed as part of the i2b2 center for biomedical computing, to semantic processing of clinical texts, to core contributions to NLP and machine learning. A major thread that ties all this work together is an interest in the value of syntax. He has been responsible for syntactic contributions in temporal relation extraction (Lin et al., 2014; Miller et al., 2013; Miller et al., in preparation), UMLS relation extraction (Dligach et al., 2013), coreference resolution (Miller et al., 2012; Zheng et al., 2012), and negation detection (Miller et al., in preparation). This work also includes code contributions to the open source projects Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) and ClearTK. In cTAKES he developed a constituency parser module and contributed syntactic features to all the relation extraction modules. In ClearTK he contributed Java tree kernel code (part of the version 2.0 release) that dramatically improves tree kernel learning and enables new kernel development. This code was the backbone for a new kernel, the Descending Path Kernel, described in Lin et al. (2014).

Despite these advances, he is struck by the diversity of clinical sub-domains and how it affects performance. He has been involved with several clinical language annotation projects and has been fortunate to work with the resulting syntactic and semantic annotations. However, the difficulty of distributing clinical data and the differences between domains limit the applicability of methods developed on only one corpus. He saw first-hand evidence of this while working on different coreference corpora (ODIE and i2b2 Challenge), where performance suffered greatly across corpora. As a result, he has become interested in approaches that make use of unsupervised structure learning and world knowledge extraction.

Publications

  1. Detecting stigmatizing language in clinical notes with large language models for addiction care. Npj Health Syst. 2026; 3(1):15.
  2. Scaling Biomedical Knowledge Graph Retrieval for Interpretable Reasoning: Applications to Clinical Diagnosis Prediction. medRxiv. 2026 Jan 13.
  3. Toward Digital Twins in the Intensive Care Unit: A Medication Management Case Study. medRxiv. 2025 Aug 01.
  4. FDA Approval of Cardiac Valve Devices Implanted in a National Cohort of Pediatric Patients, 2016-2022. JAMA Pediatr. 2025 May 01; 179(5):570-573.
  5. Lessons learned on information retrieval in electronic health records: a comparison of embedding models and pooling strategies. J Am Med Inform Assoc. 2025 02 01; 32(2):357-364.
  6. When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? Find ACL EMNLP. 2024 Nov; 2024:5414-5428.
  7. Generalizable clinical note section identification with large language models. JAMIA Open. 2024 Oct; 7(3):ooae075.
  8. Cumulus: a federated electronic health record-based learning system powered by Fast Healthcare Interoperability Resources and artificial intelligence. J Am Med Inform Assoc. 2024 Aug 01; 31(8):1638-1647.
  9. Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study. J Med Internet Res. 2024 04 04; 26:e53367.
  10. Cumulus: A federated EHR-based learning system powered by FHIR and AI. medRxiv. 2024 Feb 06.
  11. The SMART Text2FHIR Pipeline. AMIA Annu Symp Proc. 2023; 2023:514-520.
  12. Improving model transferability for clinical note section classification models using continued pretraining. J Am Med Inform Assoc. 2023 12 22; 31(1):89-97.
  13. A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital. JAMIA Open. 2023 Oct; 6(3):ooad047.
  14. Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles. Proc Conf Assoc Comput Linguist Meet. 2023 Jul; 2023:125-130.
  15. Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy. JCO Clin Cancer Inform. 2023 07; 7:e2300048.
  16. Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes. JCO Clin Cancer Inform. 2023 05; 7:e2200196.
  17. Improving Model Transferability for Clinical Note Section Classification Models Using Continued Pretraining. medRxiv. 2023 Apr 24.
  18. The SMART Text2FHIR Pipeline. medRxiv. 2023 Mar 27.
  19. Classifying unstructured electronic consult messages to understand primary care physician specialty information needs. J Am Med Inform Assoc. 2022 08 16; 29(9):1607-1617.
  20. US Food and Drug Administration Approval of High-risk Cardiovascular Devices for Use in Children and Adolescents, 1977-2021. JAMA. 2022 08 09; 328(6):580-582.
  21. Confederated learning in healthcare: Training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale health system Intelligence. J Biomed Inform. 2022 10; 134:104151.
  22. Improving FDA postmarket adverse event reporting for medical devices. BMJ Evid Based Med. 2023 04; 28(2):83-84.
  23. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys. 2021 Jul 01; 110(3):641-655.
  24. Pre-training phenotyping classifiers. J Biomed Inform. 2021 01; 113:103626.
  25. Incorporating Risk Factor Embeddings in Pre-trained Transformers Improves Sentiment Prediction in Psychiatric Discharge Summaries. Proc Conf Empir Methods Nat Lang Process. 2020 Nov; 2020:35-40.
  26. Classifying Electronic Consults for Triage Status and Question Type. Proc Conf Assoc Comput Linguist Meet. 2020 Jul; 2020:1-6.
  27. Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale. JAMIA Open. 2020 Jul; 3(2):185-189.
  28. Rethinking domain adaptation for machine learning over clinical language. JAMIA Open. 2020 Jul; 3(2):146-150.
  29. Does BERT need domain adaptation for clinical negation detection? J Am Med Inform Assoc. 2020 04 01; 27(4):584-591.
  30. Supervised methods to extract clinical events from cardiology reports in Italian. J Biomed Inform. 2019 07; 95:103219.
  31. Towards generalizable entity-centric clinical coreference resolution. J Biomed Inform. 2017 05; 69:251-258.
  32. Multilayered temporal modeling for the clinical domain. J Am Med Inform Assoc. 2016 Mar; 23(2):387-95.
  33. Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record. J Am Med Inform Assoc. 2015 Apr; 22(e1):e151-61.
  34. ClinicalTrials.gov as a data source for semi-automated point-of-care trial eligibility screening. PLoS One. 2014; 9(10):e111055.
  35. Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records. PLoS One. 2013; 8(8):e69932.
  36. A system for coreference resolution for the clinical narrative. J Am Med Inform Assoc. 2012 Jul-Aug; 19(4):660-7.
