An atlas of exposome–phenome associations in health and disease risk

📆 3/30/2026 3:23 PM

📰 NatureMedicine

Nongenetic exposures comprising the ‘exposome’, including diet, lifestyle, infections and pollutants, shape many clinical phenotypes yet the evidence remains fragmented. Here we conducted an exposome-wide association study incorporating 619 exposure indicators and 305 quantitative phenotypes across ten independent waves of the US Centers for Disease Control and Prevention National Health and Nutrition Examination Survey.

Replicable and stable signals were most concentrated in cardiometabolic and anthropometric phenotypes, linking objective nutrient biomarkers and lipophilic pollutants with body mass index, glycated hemoglobin and lipid profiles. Triglycerides, an important marker for cardiovascular risk, emerged as the phenotype most strongly associated with multidomain exposures, notably trans fatty acids, persistent pollutants and vitamin E isoforms. In pulmonary traits, tobacco-specific and carcinogen biomarkers were more prominently associated with reduced lung function than short-lived nicotine metabolites, refining exposomic links to forced expiratory volume in 1 s. Whereas individual exposures showed modest effects, aggregate ‘poly-exposomic’ models explained phenotypic variation comparable to genome-wide polygenic scores. Exposome globes further reveal an interconnected architecture where exposures rarely act in isolation, complicating causal attribution while providing a more holistic view of environmental risk. Our findings highlight which exposures are most likely to add value to disease risk assessment, population surveillance as well as further exposure prioritization and next-generation longitudinal exposomics.
Despite this, the structural relationship between the exposome (defined as the totality of environmental exposures in broad physical, chemical and psychosocial domains) and human health remains obscure, characterized by a lack of systematic mapping across its broad domains. Until now, interrogating exposome–phenome relationships has been limited to studies that target a few candidate exposures and phenotypes. These candidate studies are presented selectively in millions of papers on claimed associations, yielding fragmented and often biased snapshots of the exposome–phenotype maze.
For example, disciplines such as nutritional epidemiology have yielded numerous associations regarding single dietary factors and patterns in disease outcomes that have been nonrobust. Conventional criteria for judging such associations may not be readily applicable in new exposome epidemiology scenarios, for example, if most of the true associations to be discovered have small effect sizes and not readily discernible biological plausibility, analogy, coherence and specificity, and there is no possibility to validate in experimental studies. Consequently, the opportunity to integrate environmental data into precision medicine remains underrealized. Precision medicine approaches, however, are dominated by genetic factors. Which exposures, if measured, would meaningfully improve risk stratification or refine prognosis, and how large are those effects relative to demographics and genetics? Many phenotypes routinely used for care, diagnosis, staging and risk prediction, such as lipids, hemoglobin A1C% and fasting glucose, estimated glomerular filtration rate/creatinine, inflammatory markers and spirometry, may be partially driven by modifiable exposures. Evaluating the role of the exposome in precision medicine requires prioritizing clinical phenotypes by the magnitude and replicability of associations, contextualizing connections between smoking and nutrient biomarkers, and quantifying variance explained to gauge utility for risk equations.
Here, we hypothesize that the exposome exhibits a replicable associational architecture where aggregate factors explain clinically relevant phenotypic variance and disease risk. To evaluate this, we systematically quantify these relationships, executing an ‘exposome-wide association study’ and establishing the data-driven foundation required to integrate the exposome into precision medicine. The analysis draws on ten serial cross-sectional surveys that were sampled in years 1999–2000, 2001–2002, 2003–2004, 2005–2006, 2007–2008, 2009–2010, 2011–2012, 2013–2014, 2015–2016 and 2017–2018.
We cataloged a total of 374 real-valued continuous phenotypes and 810 biomarkers or self-report questionnaire responses that measure pollutant, dietary, infectious or smoking-related exposures across all ten surveys. A supplementary figure shows the distribution of demographic characteristics for each association. The median age was 40 (34 to 42) years and the median income-to-poverty ratio was 2.9. All associations are presented in a supplementary table.
Figure caption: Top left: Samples of the phenotypic domain comprising 305 phenotypes. Top right: Samples of the exposomic domain comprising 619 exposures. These data are harmonized across eight cohort samples of the NHANES 1999–2018. Bottom: Resources to describe the architecture of phenome–exposome associations, including exposome globes, the Exposome–Phenome Atlas and digital resources for conducting P-ExWAS. Figure created in BioRender; Patel, C.
We used survey-weighted regression to associate phenotypes with all exposures under nine different modeling scenarios that adjust for demographic and social attributes: the main reported model, which consists of age, age², sex, income, ethnicity, education and survey year; base model, with no adjustments; sex and survey year; age, age² […]
Figure caption: A total of 305 phenotypes across 18 categories are depicted in the columns and 625 exposures across 18 categories are depicted in the rows. Each entry in the matrix is the linear association between exposure and phenotype. Gray shading denotes associations that could not be estimated owing to pairwise missingness or a total sample size lower than 500.
The number of associations across all phenotype–exposure associations that passed the Bonferroni threshold was 5,674 […] was 16–654. The average percentage of associations that were Bonferroni significant was 5%. The most associations were found for serum bilirubin, waist circumference and body mass index. For example, the anthropometric phenome category saw the highest number of associations.
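The survey-weighted regression with demographic adjustment described above can be illustrated with a minimal weighted least-squares sketch. It is written in Python with NumPy rather than the R tooling the study used, it covers only the weighted point estimate (the NHANES strata and cluster design needed for correct standard errors is ignored), and every variable name and the simulated data are hypothetical.

```python
import numpy as np

def weighted_ols(y, X, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i @ beta)^2."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    sw = np.sqrt(w)                            # scale rows by sqrt(weight)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def weighted_zscore(v, w):
    """Standardize using the weighted mean and weighted standard deviation."""
    mean = np.average(v, weights=w)
    sd = np.sqrt(np.average((v - mean) ** 2, weights=w))
    return (v - mean) / sd

# Simulated stand-ins for one exposure-phenotype pair (assumption, not NHANES data).
rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n).astype(float)
exposure = rng.normal(size=n)
weights = rng.uniform(0.5, 2.0, n)  # hypothetical sampling weights
phenotype = 0.15 * exposure + 0.02 * age + 0.3 * sex + rng.normal(size=n)

# Adjust for age, age squared and sex; exposure and phenotype are standardized.
covariates = np.column_stack([weighted_zscore(exposure, weights), age, age**2, sex])
beta = weighted_ols(weighted_zscore(phenotype, weights), covariates, weights)
exposure_beta = beta[1]  # standardized association for the exposure
print(f"standardized exposure coefficient: {exposure_beta:.3f}")
```

Dropping columns from `covariates` gives the less adjusted scenarios, which is one way to see how much an estimate moves under different adjustment sets.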
Of the exposome variables, smoking and dietary/nutrient biomarkers were implicated in the most phenotype–exposure associations: ~15% and 13%, respectively. […] occurred in two surveys 5% of the time. By contrast, if a phenotype–exposure association achieved an FDR significance across all surveys […]. However, phenotype–exposure replication rates vary depending on the number of surveys available for a phenotype–exposure association. Specifically, FDR-significant phenotype–exposure associations assessed in only two surveys were found in both surveys 39% of the time at a […] was 0, 5, 0, 26, 6, 14, 14, 20 and 18% for associations in 2, 3, 4, 5, 6, 7, 8, 9 and 10 surveys. We also assessed the percentage that were nominally significant in multiple survey waves. For the 1,211 associations estimated in ten survey waves, there were 76%, 11%, 4%, 2.5%, 1%, 1%, 1%, 0.5%, 1% and 1% of associations that were nominally significant in exactly 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 surveys. In other words, the replication rate among 1,211 associations estimated in ten different cohort samples was 13%.
For example, phenotypes in the inflammation category had exposures that explained 3% of variance on average across all phenotype–exposure associations that were Bonferroni significant. For exposures, pollutant factors explained on average 0–3% of variation across all phenotypes. Organochlorine exposures explained ~3% of variance on average across all phenotypes. Dietary biomarkers accounted for 1% of variation on average; however, dietary factors measured through an interview explained on average 0.5% of variation in phenotype.
[…] owing to the additive contribution of 20 exposures simultaneously. For phenotypes that had greater than 20 exposures associated at an FDR level of significance, we imputed exposure data where missing. When considering 20 exposome factors simultaneously in 119 phenotypes, the median variance explained is 3.5%.
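One common way to express the variance explained by a set of exposures modeled simultaneously is the gain in R² over a demographics-only model. The sketch below takes that approach as an assumption; the source does not spell out its exact estimator. It uses unweighted least squares on simulated data, and the effect sizes, sample size and variable names are all hypothetical.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Simulated data: 2 demographic covariates and 20 exposures with small effects.
rng = np.random.default_rng(1)
n, k = 5000, 20
demographics = rng.normal(size=(n, 2))
exposures = rng.normal(size=(n, k))
effects = np.full(k, 0.04)  # hypothetical, individually modest effects
y = demographics @ np.array([0.5, 0.3]) + exposures @ effects + rng.normal(size=n)

r2_base = r_squared(y, demographics)                               # demographics only
r2_full = r_squared(y, np.column_stack([demographics, exposures]))  # plus 20 exposures
incremental_r2 = r2_full - r2_base
print(f"incremental variance explained: {incremental_r2:.3f}")
```

Each simulated exposure contributes well under 1% of variance on its own, yet the 20 together add a few percent, which mirrors the contrast the text draws between single and aggregate exposures.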
Triglycerides are an important clinical phenotype used to screen for cardiovascular disease. We found that the aggregate exposome, particularly lipophilic dietary and pollutant-related exposures, described large variance in levels of triglycerides in the USA. […] alpha-tocopherol and gamma-tocopherol independently contributed the most to the variance and were positively associated with triglycerides.
For associations between Bonferroni-significant exposome factors and phenotypes, the 5th to 95th percentile range was −0.17 to 0.19. The median correlation between exposures was 0.01, and the median absolute value correlation was 0.05 across all correlations. For exposure–exposure correlations that passed the Bonferroni threshold […].
Figure caption: Exposome correlation globe for exposures associated with hemoglobin A1C or BMI. Node colors include: pollutants, infection and nutrients. Distribution of exposure–exposure correlations. Gray color denotes randomly selected correlations. Blue line denotes exposome correlations for exposures associated with BMI or hemoglobin A1C. Black denotes correlations that achieved Bonferroni significance. Lines depict correlations at −0.25 and 0.25.
[…] depicts exposures identified in their association with BMI and A1C%. Correlations are only drawn between nodes whose absolute value of correlation was greater than 0.25; these correlations are among the top 15% of the distribution of correlations.
[…] subtracted from univariate estimates to assess the impact of adjustment. The average difference between a minimally adjusted and a fully adjusted model and each scenario was 0.01, demonstrating some bias due to demographic adjustment. The s.d. sizes were the largest for the fully adjusted minus univariate model and fully adjusted minus ethnicity model. The s.d. sizes were the smallest for the fully adjusted minus age, sex and ethnicity models and the fully adjusted minus age, sex and income/education models. […] exhibited a switch of coefficient sign between the univariate model and the adjusted model. For example, BMI and blood cadmium had positive associations; however, when controlling or adjusting for factors in the ‘main’ model, the association becomes stronger and opposite in direction.
We estimated the correlations of associations. Across all levels of significance, we observed a 0.84 correlation between day 1 self-report versus day 2. The correlations between the self-reported values and biomarkers were smaller, and we observed a Pearson correlation of 0.52. For those blood nutrient variables that were Bonferroni significant, we observed a correlation of 0.60. Blood and urine pollutant biomarkers reflect the biological relationship between exposure and excretion. We observed a positive and strong correlation between associations. For example, for blood versus urinary biomarkers, we observed a 0.72 Pearson correlation. When only considering blood biomarkers that were Bonferroni significant, the correlation of phenotype–exposure associations between blood and urine biomarkers was 0.78 for cadmium, 0.96 for cotinine and 0.71 for mercury.
The difference in association between cotinine and NNAL is consistent with their biological properties: cotinine, a short half-life metabolite of nicotine, primarily captures recent exposure and is subject to greater day-to-day variability, whereas NNAL, a metabolite of the tobacco-specific nitrosamine with a half-life of 10–16 days, provides a more stable marker of cumulative exposure. Nevertheless, numerous densely correlated exposures emerged associated with FEV1.
We deployed our ExWAS procedure across timely biomarkers of aging, including epigenetic age. We also interrogated the indicators of cognitive recall, as they are used in the clinic to stage cognitive decline in older adults.
We found shared exposure associations between better cognitive function and other phenotypes interrogated in the population, including higher exhaled nitric oxide. […] with urinary creatinine was associated with smoking, heavy metals and physical activity behavior. Among these domains, physical activity explained the most variance.
[…] tended to have similar phenotypic associations; the degree of shared associational architecture is larger within categories than across categories. For all correlations within exposure variables that were dietary biomarkers, the median absolute value of shared architecture was 0.2. Similar shared associational architecture was observed within smoking biomarkers. The shared associational architecture between dietary biomarkers and self-reported dietary nutrients had a median absolute value correlation of 0.24. We examined the degree of shared associational architecture between phenotypes. The shared associational architecture between BMI and body weight was 0.98. BMI and cardiorespiratory fitness had opposite associational architecture: a correlation of −0.83. A1C% had an opposite architecture compared with high-density lipoprotein cholesterol.
We used data from participants of the UK Biobank to compare genetic versus exposomic predictions. We compared the variance explained due to ~1 M imputed genotypes from genome-wide association studies performed on 29 of the phenotypes. […] due to exposome was 7.9%. We found that the multiple exposome factors, when modeled simultaneously, had explained variance comparable to the entire genomic array across the phenotypes. For example, the total genetic and exposomic variance explained for BMI was similar, at ~10% for both.
Our systematic mapping of the exposome onto the phenome reveals three insights with direct clinical and biomedical implications.
First, robust environmental signals are highly concentrated in cardiometabolic and pulmonary phenotypes used to stage and gauge care, establishing lipids, glycemic markers and lung function as the highest-yield targets for our data-driven environmental risk assessment. Objective nutrient biomarkers and lipophilic pollutants are reproducible correlates of BMI, glycated hemoglobin and lipid profiles. Triglycerides stood out as the phenotype most strongly linked to multidomain exposure patterns, with trans fatty acids, banned persistent pollutants and vitamin E isoforms among the most informative contributors, suggesting that lipid risk assessment, which is important for staging cardiovascular disease, may be sensitive to integrated dietary and pollutant chemical contexts. In pulmonary traits, tobacco-specific biomarkers showed stronger and more stable associations with reduced lung function than short-lived nicotine metabolites, supporting the clinical utility of longer half-life biomarkers when refining smoking-related risk for FEV1 and related outcomes. Importantly, we also demonstrate that while single exposures have modest association sizes, aggregate ‘poly-exposomic’ profiles explain phenotypic variance comparable to genome-wide polygenic scores; this suggests that multifactor environmental exposome integration is required to meaningfully improve precision risk models beyond age and sex. Third, we identify a critical reliance on objective measurement: biomarkers consistently revealed biomedical associations that self-reported history failed to capture. Collectively, these findings move the field beyond fragmented and nonobjective associations, defining the specific clinical domains and measurement modalities necessary to operationalize the exposome in biomedical research to evaluate medical decision-making. Our findings have several implications.
Estimating association sizes and their replication rate across exposure domains helps to prioritize which exposures are most likely to yield clinically meaningful signals and can guide study design and power planning for ExWAS in new cohorts. Most of the exposome tabulated here adds little incremental clinically relevant predictive value for many phenotypes, but a smaller set, especially in cardiometabolic and smoking-related domains, appears more promising for refining risk equations. We note that cancer-related phenotypes are underrepresented in our atlas and are ripe for further research. […] rather than isolated markers. Demographics remain essential for risk adjustment, and modest exposome signal strength probably reflects measurement limitations and the cross-sectional nature of many exposure assessments.
[…] complicating causal interpretation and attribution, underscoring the need to view high-priority signals in the context of exposure ‘mixtures’, globes and bundles. This is exemplified by smoking, where biomarker indicators of the behavior with different half-lives may capture distinct time windows of exposure relevant to lung function; however, the exposures are all related to one another and make a complex globe.
[…] especially for age, sex and socioeconomic factors, reinforcing the need for transparent modeling and framing. Future work should test whether top signals persist under alternative adjustment strategies and in longitudinal settings, and the exposome field may need to configure model specifications per exposome or phenotype domain and explore mediation and interaction.
Objective biomarkers appear more consistent and informative than self-reported measures, with strong concordance across blood and urine heavy metal indicators and far weaker signals for dietary recalls compared with nutrient biomarkers.
This supports prioritizing standardized biospecimen assays when the goal is clinical translation for precision medicine or robust population surveillance.
[…] for targeted follow-up. Next, temporality in disease-specific longitudinal cohorts should be established by measuring exposures at baseline and relating them to time-to-event or longitudinal trajectories. One can apply instrumental-variable approaches, including Mendelian randomization, where genetic variants serve as stand-ins for exposures. Third, aim for ‘functional exposomics’ and measure more granularly and precisely, using proteomics, metabolomics and methylomics to map exposure–responsive pathways and test mediation.
[…] but the next major advance will be enhanced measurement paired with expanded study designs. In particular, repeated and personalized exposure profiling that spans chronic burdens and acute signals will be essential for moving to causal implication of the exposome in disease. Such measurement-rich longitudinal studies will further prioritize the most promising exposure domains for follow-up, clarify temporality, explain personal heterogeneity and identify modifiable drivers of clinical phenotypes. As these studies mature, the field will be positioned to define causally attributable exposures that can be targeted through behavioral, pharmaceutical, environmental or policy interventions and/or incorporated into predictive models for individual risk assessment in the clinic. In this way, our ExWAS serves as a prerequisite for systematic, large-scale integration of exposomic information at the point of care.
This study was deemed ‘not human subjects’ research by the Harvard Institutional Review Board: IRB24-1004. NHANES participants have previously consented to the use of their data in research, and the protocol has been approved by the National Center for Health Statistics ethics board: […]
[…] shows the pipeline workflow.
To enhance replication, the findings are deployed as a database called a ‘Phenome-Exposome Atlas’ […] to conduct all analyses. The features of nhanespewas include: cataloging of phenotypes and exposures to associate within the NHANES surveys; an R package to associate all of the exposures with phenotypes using a survey-weighted linear model and providing a user-specified array of modeling assumptions; aggregating pairwise associations across independent surveys to output an overall estimate across surveys and assess replicability; and providing a browsable database of summary statistics that contain the overall association size or correlation across all survey years interrogated, the standard error of the association, P value and variance explained attributable to the exposure. The code and package are available via GitHub.
This study used an observational, cross-sectional design leveraging ten serial waves of the NHANES from 1999 to 2018. No statistical method was used to predetermine sample size. Sample sizes were determined by the availability of participant data within the public-use NHANES database. To ensure robust associational estimates, we limited our analysis to phenotype–exposure pairs with a minimum of 500 participants across at least two survey cycles. Post hoc power calculations were performed to determine the minimum detectable effect sizes. As this was an observational study using secondary public health data, the experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. Statistical analyses were conducted using survey-weighted linear regression to account for the complex, multistage sampling design of NHANES. Standardized coefficients, […] and the Benjamini–Yekutieli FDR […]. Missing data for multiexposure models were handled via multiple imputation with chained equations. All analyses were performed using R and the survey package.
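For readers unfamiliar with the two multiplicity corrections named in the text, the sketch below implements the standard Benjamini–Yekutieli step-up procedure alongside a Bonferroni threshold. It is a generic Python illustration, not the study's R code, and the P values are made up for the example.

```python
import numpy as np

def benjamini_yekutieli(pvalues, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Yekutieli procedure."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))         # harmonic-sum correction
    thresholds = np.arange(1, m + 1) * q / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])            # largest rank meeting its threshold
        reject[order[: k + 1]] = True               # reject everything up to that rank
    return reject

# Hypothetical P values for ten exposure-phenotype associations.
pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
         0.0278, 0.0298, 0.0344, 0.0459, 0.3240]
by_reject = benjamini_yekutieli(pvals, q=0.05)
bonferroni_reject = np.array(pvals) <= 0.05 / len(pvals)
print("BY rejections:", int(by_reject.sum()))
print("Bonferroni rejections:", int(bonferroni_reject.sum()))
```

The harmonic-sum term is what distinguishes Benjamini–Yekutieli from Benjamini–Hochberg; it keeps the false discovery rate controlled under arbitrary dependence between tests, which matters when exposures are densely correlated.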
To ensure the reproducibility of these findings, the full analytic pipeline is provided as an open-source R package, nhanespewas […].
[…] exogenous in origin, biomarkers of exposure or reflective of lifestyle or behavior. While many of these variables do exhibit heritability, it is secondary, as the origin of the factor is still external.
We processed some phenotypic variables and exposure variables. Blood pressure was a phenotypic variable whose measurement was repeated multiple times: we took the average of the measurements. Physical activity questionnaire information was collected in a series of questionnaire items; we processed these measurements to be in total metabolic equivalent hours. […] to be a variable that denotes past, current or never smoking. We next tabulated the sample size for each exposure and phenotype variable pair used in downstream pipelines.
We programmatically ingested NHANES data dictionaries and public data files for ten cycles into a SQLite database and built a canonical catalog that maps each exposure and phenotype to all cycle-specific variable names and labels referring to the same construct. The code paths were: download/ and select/. Tutorials illustrate end-to-end usage. The repository and compiled database links are provided in the ‘Data availability’ and ‘Code availability’ sections.
When the NHANES supplies both conventional and International System of Units variables, we converted to a common unit before transformation; for categorical variables whose levels changed across cycles, we recoded to harmonized bins listed in the catalog. For analytes with known method revisions, we preferentially used NHANES-standardized variables when available (for example, liquid chromatography–tandem mass spectrometry-standardized 25-hydroxy-vitamin D and isotope dilution mass spectrometry-standardized serum creatinine), so that pre- and postchange data are comparable on a common scale.
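As a rough picture of what a canonical catalog in SQLite could look like, the sketch below maps one construct to cycle-specific variable names and a unit conversion factor. The table name, column names, variable names and conversion factor are all hypothetical placeholders and do not reflect the actual schema of the nhanespewas database.

```python
import sqlite3

# In-memory database standing in for the compiled catalog (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE variable_catalog (
        construct     TEXT,  -- harmonized exposure or phenotype name
        cycle         TEXT,  -- survey cycle, e.g. '1999-2000'
        variable_name TEXT,  -- cycle-specific variable name (placeholder values)
        unit_factor   REAL   -- multiplier to reach the common unit
    )
    """
)
rows = [
    ("example_analyte", "1999-2000", "XVAR_A", 1.0),
    ("example_analyte", "2001-2002", "XVAR_B", 1.0),
    ("example_analyte", "2003-2004", "XVAR_C", 0.1),  # reported in another unit
]
conn.executemany("INSERT INTO variable_catalog VALUES (?, ?, ?, ?)", rows)

def variables_for(construct):
    """Return (cycle, variable_name, unit_factor) rows for one construct."""
    cur = conn.execute(
        "SELECT cycle, variable_name, unit_factor FROM variable_catalog "
        "WHERE construct = ? ORDER BY cycle",
        (construct,),
    )
    return cur.fetchall()

def to_common_unit(value, unit_factor):
    """Convert a raw value to the catalog's common unit."""
    return value * unit_factor

mapping = variables_for("example_analyte")
for cycle, name, factor in mapping:
    print(cycle, name, factor)
```

Keeping the conversion factor next to the variable mapping means a single lookup resolves both the renaming and the unit change for a given cycle.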
For laboratory measures, the NHANES provides a per-analyte lower limit of detection (LLOD) and an ‘…LC’ comment code. Values below the LLOD are set to LLOD/√2, and the proportion […]
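The LLOD/√2 substitution can be written out in a few lines. This is a minimal Python sketch on made-up values; the function name is hypothetical, and it assumes the comment code is 1 when a measurement falls below the detection limit and 0 otherwise.

```python
import math

def substitute_below_llod(value, llod, below_flag):
    """Return LLOD / sqrt(2) when the comment code flags a value below the LLOD."""
    return llod / math.sqrt(2) if below_flag else value

# Hypothetical (value, comment_code) pairs for one analyte; code 1 is assumed
# to mean "below the lower limit of detection".
measurements = [(0.8, 0), (0.02, 1), (1.2, 0), (0.01, 1)]
llod = 0.05

filled = [substitute_below_llod(v, llod, bool(code)) for v, code in measurements]
print(filled)
```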
