Paul Avillach holds an MD in public health and epidemiology from the University of Bordeaux and a PhD in biomedical informatics from the University of Marseilles. Avillach's research focuses on developing novel methods and techniques for integrating multiple heterogeneous clinical cohorts, electronic health records data, and multiple types of genomics data to encompass biological observations. He is PI and Co-Investigator on several large projects at DBMI, including the BD2K PIC-SURE Center of Excellence, the Global Rare Diseases Registry project, the PCORI ARCH project, and the PCORI Phelan-McDermid Syndrome project.
PIC-SURE: an open-source platform for integrating clinical and genomic data. NPJ Digit Med. 2025 Dec 30; 9(1):96.
Use of Computational Phenotypes for Predicting Genetic Subgroups of Cerebral Palsy. Pediatr Neurol. 2025 Dec; 173:149-155.
VarPPUD: Pinpointing diagnostic variants from sets of prioritized, strong candidate variants. PLoS Comput Biol. 2025 Sep; 21(9):e1013414.
Use of Computational Phenotypes for Predicting Genetic Subgroups of Cerebral Palsy. medRxiv. 2025 Feb 13.
Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study. JMIR Med Inform. 2025 Jan 22; 13:e54133.
Phenome-wide profiling identifies genotype-phenotype associations in Phelan-McDermid syndrome using family-sourced data from an international registry. Mol Autism. 2024 09 30; 15(1):40.
Development and validation of an open-source pipeline for automatic population of case report forms from electronic health records: a pediatric multi-center prospective study. EBioMedicine. 2024 Oct; 108:105337.
VarPPUD: Variant post prioritization developed for undiagnosed genetic disorders. medRxiv. 2024 Apr 20.
Neurological diagnoses in hospitalized COVID-19 patients associated with adverse outcomes: A multinational cohort study. PLOS Digit Health. 2024 Apr; 3(4):e0000484.
Clinical phenotypes and outcomes in children with multisystem inflammatory syndrome across SARS-CoV-2 variant eras: a multinational study from the 4CE consortium. EClinicalMedicine. 2023 Oct; 64:102212.
Building a collaborative cloud platform to accelerate heart, lung, blood, and sleep research. J Am Med Inform Assoc. 2023 06 20; 30(7):1293-1300.
Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? J Biomed Inform. 2023 03; 139:104306.
Acute respiratory distress syndrome after SARS-CoV-2 infection on young adult population: International observational federated study based on electronic health records through the 4CE consortium. PLoS One. 2023; 18(1):e0266985.
Hospitalizations Associated With Mental Health Conditions Among Adolescents in the US and France During the COVID-19 Pandemic. JAMA Netw Open. 2022 12 01; 5(12):e2246548.
Long-term kidney function recovery and mortality after COVID-19-associated acute kidney injury: An international multi-centre observational cohort study. EClinicalMedicine. 2023 Jan; 55:101724.
Multiview Incomplete Knowledge Graph Integration with application to cross-institutional EHR data harmonization. J Biomed Inform. 2022 09; 133:104147.
International electronic health record-derived post-acute sequelae profiles of COVID-19 patients. NPJ Digit Med. 2022 Jun 29; 5(1):81.
Changes in laboratory value improvement and mortality rates over the course of the pandemic: an international retrospective cohort study of hospitalised patients infected with SARS-CoV-2. BMJ Open. 2022 06 23; 12(6):e057725.
International comparisons of laboratory values from the 4CE collaborative to predict COVID-19 mortality. NPJ Digit Med. 2022 Jun 13; 5(1):74.
Distinguishing Admissions Specifically for COVID-19 From Incidental SARS-CoV-2 Admissions: National Retrospective Electronic Health Record Study. J Med Internet Res. 2022 05 18; 24(5):e37931.
Validation of a computational phenotype for finding patients eligible for genetic testing for pathogenic PTEN variants across three centers. J Neurodev Disord. 2022 03 23; 14(1):24.
Long-term Survival after Hematopoietic Cell Transplant for Sickle Cell Disease Compared to the United States Population. Transplant Cell Ther. 2022 06; 28(6):325.e1-325.e7.
Multi-PheWAS intersection approach to identify sex differences across comorbidities in 59 140 pediatric patients with autism spectrum disorder. J Am Med Inform Assoc. 2022 01 12; 29(2):230-238.
Authorship Correction: International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res. 2021 Nov 30; 23(11):e34625.
Multinational characterization of neurological phenotypes in patients hospitalized with COVID-19. Sci Rep. 2021 10 12; 11(1):20238.
International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res. 2021 10 11; 23(10):e31400.
Medication Use in the Management of Comorbidities Among Individuals With Autism Spectrum Disorder From a Large Nationwide Insurance Database. JAMA Pediatr. 2021 09 01; 175(9):957-965.
Finding commonalities in rare diseases through the undiagnosed diseases network. J Am Med Inform Assoc. 2021 07 30; 28(8):1694-1702.
Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data. J Am Med Inform Assoc. 2021 07 14; 28(7):1411-1420.
National Trends in Disease Activity for COVID-19 Among Children in the US. Front Pediatr. 2021; 9:700656.
A high-throughput phenotyping algorithm is portable from adult to pediatric populations. J Am Med Inform Assoc. 2021 06 12; 28(6):1265-1269.
International Analysis of Electronic Health Records of Children and Youth Hospitalized With COVID-19 Infection in 6 Countries. JAMA Netw Open. 2021 06 01; 4(6):e2112596.
Retracted: Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning. JMIR Med Inform. 2021 04 07; 9(4):e24754.
What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask. J Med Internet Res. 2021 03 02; 23(3):e22219.
International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study. medRxiv. 2021 Feb 05.
Multinational Prevalence of Neurological Phenotypes in Patients Hospitalized with COVID-19. medRxiv. 2021 Jan 29.
GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets. Brief Bioinform. 2021 01 18; 22(1):55-65.
Population attitudes toward contraceptive methods over time on a social media platform. Am J Obstet Gynecol. 2021 06; 224(6):597.e1-597.e14.
Vascular and metabolic risk factor differences prior to dementia diagnosis: a multidatabase case-control study using European electronic health records. BMJ Open. 2020 11 14; 10(11):e038753.
The urgent need for research coordination to advance knowledge on COVID-19 in children. Pediatr Res. 2021 08; 90(2):250-252.
Association of Affordable Care Act Implementation With Ambulance Utilization for Asthma Emergencies in New York City, 2008-2018. JAMA Netw Open. 2020 11 02; 3(11):e2025586.
The case for open science: rare diseases. JAMIA Open. 2020 Oct; 3(3):472-486.
Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services. J Am Med Inform Assoc. 2020 09 01; 27(9):1425-1430.
Adverse drug event presentation and tracking (ADEPT): semiautomated, high throughput pharmacovigilance using real-world data. JAMIA Open. 2020 Oct; 3(3):413-421.
International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. NPJ Digit Med. 2020; 3:109.
A multidimensional precision medicine approach identifies an autism subtype characterized by dyslipidemia. Nat Med. 2020 09; 26(9):1375-1379.
EHRtemporalVariability: delineating temporal data-set shifts in electronic health records. Gigascience. 2020 08 01; 9(8).
Development and validation of a Paediatric Early Warning Score for use in the emergency department: a multicentre study. Lancet Child Adolesc Health. 2020 08; 4(8):583-591.
A Semi-Automated Approach for Multilingual Terminology Matching: Mapping the French Version of the ICD-10 to the ICD-10 CM. Stud Health Technol Inform. 2020 Jun 16; 270:18-22.
Treatment pathway analysis of newly diagnosed dementia patients in four electronic health record databases in Europe. Soc Psychiatry Psychiatr Epidemiol. 2021 Mar; 56(3):409-416.
Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale. JAMIA Open. 2020 Jul; 3(2):185-189.
Methotrexate and relative risk of dementia amongst patients with rheumatoid arthritis: a multi-national multi-database case-control study. Alzheimers Res Ther. 2020 04 06; 12(1):38.
dbgap2x: an R package to explore and extract data from the database of Genotypes and Phenotypes (dbGaP). Bioinformatics. 2020 02 15; 36(4):1305-1306.
Correction: The Genomics Research and Innovation Network: creating an interoperable, federated, genomics learning system. Genet Med. 2020 Feb; 22(2):449.
FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Syst. 2019 11 27; 9(5):417-421.
Non-alcoholic fatty liver disease and risk of incident acute myocardial infarction and stroke: findings from matched cohort study of 18 million European adults. BMJ. 2019 10 08; 367:l5367.
A framework for the investigation of rare genetic disorders in neuropsychiatry. Nat Med. 2019 10; 25(10):1477-1487.
The Genomics Research and Innovation Network: creating an interoperable, federated, genomics learning system. Genet Med. 2020 02; 22(2):371-380.
An exploratory phenome wide association study linking asthma and liver disease genetic variants to electronic health records from the Estonian Biobank. PLoS One. 2019; 14(4):e0215026.
Associations of antepartum suicidal behaviour with adverse infant and obstetric outcomes. Paediatr Perinat Epidemiol. 2019 03; 33(2):137-144.
Comparison of variation in frequency for SNPs associated with asthma or liver disease between Estonia, HapMap populations and the 1000 genome project populations. Int J Immunogenet. 2019 Apr; 46(2):49-58.
Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol. 2019 Feb; 34(2):153-162.
Real-world data reveal a diagnostic gap in non-alcoholic fatty liver disease. BMC Med. 2018 08 13; 16(1):130.
Screening pregnant women for suicidal behavior in electronic medical records: diagnostic codes vs. clinical notes processed by natural language processing. BMC Med Inform Decis Mak. 2018 05 29; 18(1):30.
Adverse obstetric and neonatal outcomes complicated by psychosis among pregnant women in the United States. BMC Pregnancy Childbirth. 2018 05 02; 18(1):120.
Rcupcake: an R package for querying and analyzing biomedical data through the BD2K PIC-SURE RESTful API. Bioinformatics. 2018 04 15; 34(8):1431-1432.
Adverse obstetric outcomes during delivery hospitalizations complicated by suicidal behavior among US pregnant women. PLoS One. 2018; 13(2):e0192943.
Development of the Precision Link Biobank at Boston Children's Hospital: Challenges and Opportunities. J Pers Med. 2017 Dec 15; 7(4).
Health assessment of French university students and risk factors associated with mental health disorders. PLoS One. 2017; 12(11):e0188187.
Phelan-McDermid syndrome data network: Integrating patient reported outcomes with clinical notes and curated genetic reports. Am J Med Genet B Neuropsychiatr Genet. 2018 10; 177(7):613-624.
Dementia prevalence and incidence in a federation of European Electronic Health Record databases: The European Medical Informatics Framework resource. Alzheimers Dement. 2018 02; 14(2):130-139.
CodeMapper: semiautomatic coding of case definitions. A contribution from the ADVANCE project. Pharmacoepidemiol Drug Saf. 2017 Aug; 26(8):998-1005.
Combining clinical and genomics queries using i2b2 - Three methods. PLoS One. 2017; 12(4):e0172187.
The Georges Pompidou University Hospital Clinical Data Warehouse: A 8-years follow-up experience. Int J Med Inform. 2017 06; 102:21-28.
A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey. Sci Data. 2016 10 25; 3:160096.
Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project. PLoS One. 2016; 11(8):e0160648.
An informatics research agenda to support precision medicine: seven key areas. J Am Med Inform Assoc. 2016 07; 23(4):791-5.
Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies. EGEMS (Wash DC). 2016; 4(1):1189.
Prevalence of Inflammatory Bowel Disease Among Patients with Autism Spectrum Disorders. Inflamm Bowel Dis. 2015 Oct; 21(10):2281-8.
Evaluating the Impact of Computerized Provider Order Entry on Medical Students Training at Bedside: A Randomized Controlled Trial. PLoS One. 2015; 10(9):e0138094.
Detection of Drug-Drug Interactions Inducing Acute Kidney Injury by Electronic Health Records Mining. Drug Saf. 2015 Sep; 38(9):799-809.
[Limiting a Medline/PubMed query to the "best" articles using the JCR relative impact factor]. Rev Epidemiol Sante Publique. 2014 Dec; 62(6):361-5.
Guide to good practices to ensure privacy protection in secondary use of medical records. Rev Epidemiol Sante Publique. 2014 Jun; 62(3):207-14.
Etiologies and diagnostic work-up of extreme macrocytosis defined by an erythrocyte mean corpuscular volume over 130 fL: A study of 109 patients. Am J Hematol. 2014 Jun; 89(6):665-6.
Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Brief Bioinform. 2015 Mar; 16(2):280-90.
Signal detection of potentially drug-induced acute liver injury in children using a multi-country healthcare database network. Drug Saf. 2014 Feb; 37(2):99-108.
Urinary retinol binding protein is a marker of the extent of interstitial kidney fibrosis. PLoS One. 2014; 9(1):e84708.
Phenome-wide association studies on a quantitative trait: application to TPMT enzyme activity and thiopurine therapy in pharmacogenomics. PLoS Comput Biol. 2013; 9(12):e1003405.
Gathering and exploring scientific knowledge in pharmacovigilance. PLoS One. 2013; 8(12):e83016.
Characteristics and outcomes of sudden cardiac arrest during sports in women. Circ Arrhythm Electrophysiol. 2013 Dec; 6(6):1185-91.
Major regional disparities in outcomes after sudden cardiac arrest during sports. Eur Heart J. 2013 Dec; 34(47):3632-40.
Pilot evaluation of an automated method to decrease false-positive signals induced by co-prescriptions in spontaneous reporting databases. Pharmacoepidemiol Drug Saf. 2014 Feb; 23(2):186-94.
A reference standard for evaluation of methods for drug safety signal detection using electronic healthcare record databases. Drug Saf. 2013 Jan; 36(1):13-23.
The EU-ADR Web Platform: delivering advanced pharmacovigilance tools. Pharmacoepidemiol Drug Saf. 2013 May; 22(5):459-67.
Design and validation of an automated method to detect known adverse drug reactions in MEDLINE: a contribution from the EU-ADR project. J Am Med Inform Assoc. 2013 May 01; 20(3):446-52.
Effect of competition bias in safety signal generation: analysis of a research database of spontaneous reports in France. Drug Saf. 2012 Oct 01; 35(10):855-64.
Risk factors and clinical outcome of unsuspected pulmonary embolism in cancer patients: a case-control study. J Thromb Haemost. 2012 Oct; 10(10):2032-8.
Harmonization process for the identification of medical events in eight European healthcare databases: the experience from the EU-ADR project. J Am Med Inform Assoc. 2013 Jan 01; 20(1):184-92.
Automatic filtering and substantiation of drug safety signals. PLoS Comput Biol. 2012; 8(4):e1002457.
EU-ADR healthcare database network vs. spontaneous reporting system database: preliminary comparison of signal detection. Stud Health Technol Inform. 2011; 166:25-30.
A potential competition bias in the detection of safety signals from spontaneous reporting databases. Pharmacoepidemiol Drug Saf. 2010 Nov; 19(11):1166-71.
Design and evaluation of a semantic approach for the homogeneous identification of events in eight patient databases: a contribution to the European EU-ADR project. Stud Health Technol Inform. 2010; 160(Pt 2):1085-9.
A semantic approach for the homogeneous identification of events in eight patient databases: a contribution to the European eu-ADR project. Stud Health Technol Inform. 2009; 150:190-4.
Using discharge abstracts to evaluate a regional perinatal network: assessment of the linkage procedure of anonymous data. Int J Telemed Appl. 2009; 2009:181842.
Improving the quality of the coding of primary diagnosis in standardized discharge summaries. Health Care Manag Sci. 2008 Jun; 11(2):147-51.
Using knowledge for indexing health web resources in a quality-controlled gateway. Stud Health Technol Inform. 2008; 136:205-10.
Building application-related patient identifiers: what solution for a European country? Int J Telemed Appl. 2008; 678302.
A model for indexing medical documents combining statistical and symbolic knowledge. AMIA Annu Symp Proc. 2007 Oct 11; 31-5.
Interoperability issues regarding patient identification in Europe. Annu Int Conf IEEE Eng Med Biol Soc. 2007; 2007:6161.
Proposal of a French health identification number interoperable at the European level. Stud Health Technol Inform. 2007; 129(Pt 1):503-7.
How to manage secure direct access of European patients to their computerized medical record and personal medical record. Stud Health Technol Inform. 2007; 127:246-55.