Information

Research Background

Paul Avillach holds an MD in public health and epidemiology from the University of Bordeaux and a PhD in biomedical informatics from the University of Marseilles. His research focuses on developing novel methods and techniques for integrating multiple heterogeneous clinical cohorts, electronic health record data, and multiple types of genomics data to encompass biological observations. He is PI and Co-Investigator on several large projects at DBMI, including the BD2K PIC-SURE Center of Excellence, the Global Rare Diseases Registry project, the PCORI ARCH project, and the PCORI Phelan-McDermid Syndrome project.

Visit the Avillach Lab for more information.

Publications

  1. PIC-SURE: an open-source platform for integrating clinical and genomic data. NPJ Digit Med. 2025 Dec 30; 9(1):96.
  2. Use of Computational Phenotypes for Predicting Genetic Subgroups of Cerebral Palsy. Pediatr Neurol. 2025 Dec; 173:149-155.
  3. VarPPUD: Pinpointing diagnostic variants from sets of prioritized, strong candidate variants. PLoS Comput Biol. 2025 Sep; 21(9):e1013414.
  4. Use of Computational Phenotypes for Predicting Genetic Subgroups of Cerebral Palsy. medRxiv. 2025 Feb 13.
  5. Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study. JMIR Med Inform. 2025 Jan 22; 13:e54133.
  6. Phenome-wide profiling identifies genotype-phenotype associations in Phelan-McDermid syndrome using family-sourced data from an international registry. Mol Autism. 2024 09 30; 15(1):40.
  7. Development and validation of an open-source pipeline for automatic population of case report forms from electronic health records: a pediatric multi-center prospective study. EBioMedicine. 2024 Oct; 108:105337.
  8. VarPPUD: Variant post prioritization developed for undiagnosed genetic disorders. medRxiv. 2024 Apr 20.
  9. Neurological diagnoses in hospitalized COVID-19 patients associated with adverse outcomes: A multinational cohort study. PLOS Digit Health. 2024 Apr; 3(4):e0000484.
  10. Clinical phenotypes and outcomes in children with multisystem inflammatory syndrome across SARS-CoV-2 variant eras: a multinational study from the 4CE consortium. EClinicalMedicine. 2023 Oct; 64:102212.
  11. Building a collaborative cloud platform to accelerate heart, lung, blood, and sleep research. J Am Med Inform Assoc. 2023 06 20; 30(7):1293-1300.
  12. Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? J Biomed Inform. 2023 03; 139:104306.
  13. Acute respiratory distress syndrome after SARS-CoV-2 infection on young adult population: International observational federated study based on electronic health records through the 4CE consortium. PLoS One. 2023; 18(1):e0266985.
  14. Hospitalizations Associated With Mental Health Conditions Among Adolescents in the US and France During the COVID-19 Pandemic. JAMA Netw Open. 2022 12 01; 5(12):e2246548.
  15. Long-term kidney function recovery and mortality after COVID-19-associated acute kidney injury: An international multi-centre observational cohort study. EClinicalMedicine. 2023 Jan; 55:101724.
  16. SurvMaximin: Robust federated approach to transporting survival risk prediction models. J Biomed Inform. 2022 10; 134:104176.
  17. Multiview Incomplete Knowledge Graph Integration with application to cross-institutional EHR data harmonization. J Biomed Inform. 2022 09; 133:104147.
  18. International electronic health record-derived post-acute sequelae profiles of COVID-19 patients. NPJ Digit Med. 2022 Jun 29; 5(1):81.
  19. Changes in laboratory value improvement and mortality rates over the course of the pandemic: an international retrospective cohort study of hospitalised patients infected with SARS-CoV-2. BMJ Open. 2022 06 23; 12(6):e057725.
  20. International comparisons of laboratory values from the 4CE collaborative to predict COVID-19 mortality. NPJ Digit Med. 2022 Jun 13; 5(1):74.
  21. Distinguishing Admissions Specifically for COVID-19 From Incidental SARS-CoV-2 Admissions: National Retrospective Electronic Health Record Study. J Med Internet Res. 2022 05 18; 24(5):e37931.
  22. Validation of a computational phenotype for finding patients eligible for genetic testing for pathogenic PTEN variants across three centers. J Neurodev Disord. 2022 03 23; 14(1):24.
  23. Long-term Survival after Hematopoietic Cell Transplant for Sickle Cell Disease Compared to the United States Population. Transplant Cell Ther. 2022 06; 28(6):325.e1-325.e7.
  24. Streamlining statistical reproducibility: NHLBI ORCHID clinical trial results reproduction. JAMIA Open. 2022 Apr; 5(1):ooac001.
  25. Multi-PheWAS intersection approach to identify sex differences across comorbidities in 59 140 pediatric patients with autism spectrum disorder. J Am Med Inform Assoc. 2022 01 12; 29(2):230-238.
  26. Authorship Correction: International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res. 2021 Nov 30; 23(11):e34625.
  27. Multinational characterization of neurological phenotypes in patients hospitalized with COVID-19. Sci Rep. 2021 10 12; 11(1):20238.
  28. International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res. 2021 10 11; 23(10):e31400.
  29. Medication Use in the Management of Comorbidities Among Individuals With Autism Spectrum Disorder From a Large Nationwide Insurance Database. JAMA Pediatr. 2021 09 01; 175(9):957-965.
  30. Finding commonalities in rare diseases through the undiagnosed diseases network. J Am Med Inform Assoc. 2021 07 30; 28(8):1694-1702.
  31. Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data. J Am Med Inform Assoc. 2021 07 14; 28(7):1411-1420.
  32. National Trends in Disease Activity for COVID-19 Among Children in the US. Front Pediatr. 2021; 9:700656.
  33. A high-throughput phenotyping algorithm is portable from adult to pediatric populations. J Am Med Inform Assoc. 2021 06 12; 28(6):1265-1269.
  34. International Analysis of Electronic Health Records of Children and Youth Hospitalized With COVID-19 Infection in 6 Countries. JAMA Netw Open. 2021 06 01; 4(6):e2112596.
  35. Retracted: Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning. JMIR Med Inform. 2021 04 07; 9(4):e24754.
  36. What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask. J Med Internet Res. 2021 03 02; 23(3):e22219.
  37. International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study. medRxiv. 2021 Feb 05.
  38. Multinational Prevalence of Neurological Phenotypes in Patients Hospitalized with COVID-19. medRxiv. 2021 Jan 29.
  39. GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets. Brief Bioinform. 2021 01 18; 22(1):55-65.
  40. Population attitudes toward contraceptive methods over time on a social media platform. Am J Obstet Gynecol. 2021 06; 224(6):597.e1-597.e14.
  41. Vascular and metabolic risk factor differences prior to dementia diagnosis: a multidatabase case-control study using European electronic health records. BMJ Open. 2020 11 14; 10(11):e038753.
  42. The urgent need for research coordination to advance knowledge on COVID-19 in children. Pediatr Res. 2021 08; 90(2):250-252.
  43. Association of Affordable Care Act Implementation With Ambulance Utilization for Asthma Emergencies in New York City, 2008-2018. JAMA Netw Open. 2020 11 02; 3(11):e2025586.
  44. The case for open science: rare diseases. JAMIA Open. 2020 Oct; 3(3):472-486.
  45. Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services. J Am Med Inform Assoc. 2020 09 01; 27(9):1425-1430.
  46. Adverse drug event presentation and tracking (ADEPT): semiautomated, high throughput pharmacovigilance using real-world data. JAMIA Open. 2020 Oct; 3(3):413-421.
  47. International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. NPJ Digit Med. 2020; 3:109.
  48. A multidimensional precision medicine approach identifies an autism subtype characterized by dyslipidemia. Nat Med. 2020 09; 26(9):1375-1379.
  49. EHRtemporalVariability: delineating temporal data-set shifts in electronic health records. Gigascience. 2020 08 01; 9(8).
  50. Development and validation of a Paediatric Early Warning Score for use in the emergency department: a multicentre study. Lancet Child Adolesc Health. 2020 08; 4(8):583-591.
  51. A Semi-Automated Approach for Multilingual Terminology Matching: Mapping the French Version of the ICD-10 to the ICD-10 CM. Stud Health Technol Inform. 2020 Jun 16; 270:18-22.
  52. Treatment pathway analysis of newly diagnosed dementia patients in four electronic health record databases in Europe. Soc Psychiatry Psychiatr Epidemiol. 2021 Mar; 56(3):409-416.
  53. Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale. JAMIA Open. 2020 Jul; 3(2):185-189.
  54. Methotrexate and relative risk of dementia amongst patients with rheumatoid arthritis: a multi-national multi-database case-control study. Alzheimers Res Ther. 2020 04 06; 12(1):38.
  55. dbgap2x: an R package to explore and extract data from the database of Genotypes and Phenotypes (dbGaP). Bioinformatics. 2020 02 15; 36(4):1305-1306.
  56. Correction: The Genomics Research and Innovation Network: creating an interoperable, federated, genomics learning system. Genet Med. 2020 Feb; 22(2):449.
  57. FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Syst. 2019 11 27; 9(5):417-421.
  58. Non-alcoholic fatty liver disease and risk of incident acute myocardial infarction and stroke: findings from matched cohort study of 18 million European adults. BMJ. 2019 10 08; 367:l5367.
  59. A framework for the investigation of rare genetic disorders in neuropsychiatry. Nat Med. 2019 10; 25(10):1477-1487.
  60. The Genomics Research and Innovation Network: creating an interoperable, federated, genomics learning system. Genet Med. 2020 02; 22(2):371-380.
  61. An exploratory phenome wide association study linking asthma and liver disease genetic variants to electronic health records from the Estonian Biobank. PLoS One. 2019; 14(4):e0215026.
  62. Associations of antepartum suicidal behaviour with adverse infant and obstetric outcomes. Paediatr Perinat Epidemiol. 2019 03; 33(2):137-144.
  63. Comparison of variation in frequency for SNPs associated with asthma or liver disease between Estonia, HapMap populations and the 1000 genome project populations. Int J Immunogenet. 2019 Apr; 46(2):49-58.
  64. Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol. 2019 Feb; 34(2):153-162.
  65. Real-world data reveal a diagnostic gap in non-alcoholic fatty liver disease. BMC Med. 2018 08 13; 16(1):130.
  66. Screening pregnant women for suicidal behavior in electronic medical records: diagnostic codes vs. clinical notes processed by natural language processing. BMC Med Inform Decis Mak. 2018 05 29; 18(1):30.
  67. Adverse obstetric and neonatal outcomes complicated by psychosis among pregnant women in the United States. BMC Pregnancy Childbirth. 2018 05 02; 18(1):120.
  68. Rcupcake: an R package for querying and analyzing biomedical data through the BD2K PIC-SURE RESTful API. Bioinformatics. 2018 04 15; 34(8):1431-1432.
  69. Adverse obstetric outcomes during delivery hospitalizations complicated by suicidal behavior among US pregnant women. PLoS One. 2018; 13(2):e0192943.
  70. Development of the Precision Link Biobank at Boston Children's Hospital: Challenges and Opportunities. J Pers Med. 2017 Dec 15; 7(4).
  71. Health assessment of French university students and risk factors associated with mental health disorders. PLoS One. 2017; 12(11):e0188187.
  72. Phelan-McDermid syndrome data network: Integrating patient reported outcomes with clinical notes and curated genetic reports. Am J Med Genet B Neuropsychiatr Genet. 2018 10; 177(7):613-624.
  73. Dementia prevalence and incidence in a federation of European Electronic Health Record databases: The European Medical Informatics Framework resource. Alzheimers Dement. 2018 02; 14(2):130-139.
  74. CodeMapper: semiautomatic coding of case definitions. A contribution from the ADVANCE project. Pharmacoepidemiol Drug Saf. 2017 Aug; 26(8):998-1005.
  75. Combining clinical and genomics queries using i2b2 - Three methods. PLoS One. 2017; 12(4):e0172187.
  76. The Georges Pompidou University Hospital Clinical Data Warehouse: A 8-years follow-up experience. Int J Med Inform. 2017 06; 102:21-28.
  77. A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey. Sci Data. 2016 10 25; 3:160096.
  78. Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project. PLoS One. 2016; 11(8):e0160648.
  79. An informatics research agenda to support precision medicine: seven key areas. J Am Med Inform Assoc. 2016 07; 23(4):791-5.
  80. Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies. EGEMS (Wash DC). 2016; 4(1):1189.
  81. Prevalence of Inflammatory Bowel Disease Among Patients with Autism Spectrum Disorders. Inflamm Bowel Dis. 2015 Oct; 21(10):2281-8.
  82. Evaluating the Impact of Computerized Provider Order Entry on Medical Students Training at Bedside: A Randomized Controlled Trial. PLoS One. 2015; 10(9):e0138094.
  83. Detection of Drug-Drug Interactions Inducing Acute Kidney Injury by Electronic Health Records Mining. Drug Saf. 2015 Sep; 38(9):799-809.
  84. [Limiting a Medline/PubMed query to the "best" articles using the JCR relative impact factor]. Rev Epidemiol Sante Publique. 2014 Dec; 62(6):361-5.
  85. Guide to good practices to ensure privacy protection in secondary use of medical records. Rev Epidemiol Sante Publique. 2014 Jun; 62(3):207-14.
  86. Etiologies and diagnostic work-up of extreme macrocytosis defined by an erythrocyte mean corpuscular volume over 130 fL: A study of 109 patients. Am J Hematol. 2014 Jun; 89(6):665-6.
  87. Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Brief Bioinform. 2015 Mar; 16(2):280-90.
  88. Signal detection of potentially drug-induced acute liver injury in children using a multi-country healthcare database network. Drug Saf. 2014 Feb; 37(2):99-108.
  89. Urinary retinol binding protein is a marker of the extent of interstitial kidney fibrosis. PLoS One. 2014; 9(1):e84708.
  90. Phenome-wide association studies on a quantitative trait: application to TPMT enzyme activity and thiopurine therapy in pharmacogenomics. PLoS Comput Biol. 2013; 9(12):e1003405.
  91. Gathering and exploring scientific knowledge in pharmacovigilance. PLoS One. 2013; 8(12):e83016.
  92. Characteristics and outcomes of sudden cardiac arrest during sports in women. Circ Arrhythm Electrophysiol. 2013 Dec; 6(6):1185-91.
  93. Drug-induced acute myocardial infarction: identifying 'prime suspects' from electronic healthcare records-based surveillance system. PLoS One. 2013; 8(8):e72148.
  94. Major regional disparities in outcomes after sudden cardiac arrest during sports. Eur Heart J. 2013 Dec; 34(47):3632-40.
  95. Pilot evaluation of an automated method to decrease false-positive signals induced by co-prescriptions in spontaneous reporting databases. Pharmacoepidemiol Drug Saf. 2014 Feb; 23(2):186-94.
  96. A reference standard for evaluation of methods for drug safety signal detection using electronic healthcare record databases. Drug Saf. 2013 Jan; 36(1):13-23.
  97. The EU-ADR Web Platform: delivering advanced pharmacovigilance tools. Pharmacoepidemiol Drug Saf. 2013 May; 22(5):459-67.
  98. Design and validation of an automated method to detect known adverse drug reactions in MEDLINE: a contribution from the EU-ADR project. J Am Med Inform Assoc. 2013 May 01; 20(3):446-52.
  99. Effect of competition bias in safety signal generation: analysis of a research database of spontaneous reports in France. Drug Saf. 2012 Oct 01; 35(10):855-64.
  100. Risk factors and clinical outcome of unsuspected pulmonary embolism in cancer patients: a case-control study. J Thromb Haemost. 2012 Oct; 10(10):2032-8.
  101. Harmonization process for the identification of medical events in eight European healthcare databases: the experience from the EU-ADR project. J Am Med Inform Assoc. 2013 Jan 01; 20(1):184-92.
  102. Automatic filtering and substantiation of drug safety signals. PLoS Comput Biol. 2012; 8(4):e1002457.
  103. EU-ADR healthcare database network vs. spontaneous reporting system database: preliminary comparison of signal detection. Stud Health Technol Inform. 2011; 166:25-30.
  104. A potential competition bias in the detection of safety signals from spontaneous reporting databases. Pharmacoepidemiol Drug Saf. 2010 Nov; 19(11):1166-71.
  105. Design and evaluation of a semantic approach for the homogeneous identification of events in eight patient databases: a contribution to the European EU-ADR project. Stud Health Technol Inform. 2010; 160(Pt 2):1085-9.
  106. A semantic approach for the homogeneous identification of events in eight patient databases: a contribution to the European eu-ADR project. Stud Health Technol Inform. 2009; 150:190-4.
  107. Using discharge abstracts to evaluate a regional perinatal network: assessment of the linkage procedure of anonymous data. Int J Telemed Appl. 2009; 2009:181842.
  108. Improving the quality of the coding of primary diagnosis in standardized discharge summaries. Health Care Manag Sci. 2008 Jun; 11(2):147-51.
  109. Using knowledge for indexing health web resources in a quality-controlled gateway. Stud Health Technol Inform. 2008; 136:205-10.
  110. Building application-related patient identifiers: what solution for a European country? Int J Telemed Appl. 2008; 678302.
  111. A model for indexing medical documents combining statistical and symbolic knowledge. AMIA Annu Symp Proc. 2007 Oct 11; 31-5.
  112. Interoperability issues regarding patient identification in Europe. Annu Int Conf IEEE Eng Med Biol Soc. 2007; 2007:6161.
  113. Proposal of a French health identification number interoperable at the European level. Stud Health Technol Inform. 2007; 129(Pt 1):503-7.
  114. How to manage secure direct access of European patients to their computerized medical record and personal medical record. Stud Health Technol Inform. 2007; 127:246-55.

Contact Paul Avillach

Phone: 857-891-0512