Fig. 4 | Human Genomics

Fig. 4

From: Multi-omics approaches for understanding gene-environment interactions in noncommunicable diseases: techniques, translation, and equity issues


Schematic Overview of AI/ML-based Multi-Omics Data Integration Workflow. This schematic illustrates a simplified workflow for multi-omics data integration, highlighting the key steps in processing, analyzing, and translating multi-omics datasets. The process begins with individual omics layers (e.g., genomics, transcriptomics, proteomics, metabolomics), which are integrated using early integration (merging raw data), mixed integration (combining intermediate features), or late integration (aggregating model outputs). The integrated data are then analyzed with unsupervised learning methods, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), clustering, Non-Negative Matrix Factorization (NMF), Canonical Correlation Analysis (CCA), autoencoders, and Latent Dirichlet Allocation (LDA), and with supervised methods such as regression, Support Vector Machines (SVMs), Random Forests, Neural Networks, k-Nearest Neighbors (k-NN), Elastic Net, and deep learning. Model performance is evaluated with classification metrics such as the F-measure, Area Under the Receiver Operating Characteristic Curve (AUROC), and Cohen’s kappa, and with regression error measures such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Validation in larger cohorts and model organisms, together with functional annotation and perturbation analyses, assesses robustness and biological relevance. Finally, insights are translated into diagnostic classification, clinical outcome prediction, treatment response prediction, and gene-environment (GxE) interaction analysis. This schematic is not exhaustive; it provides a simplified guide to the manuscript’s discussion of multi-omics data integration.
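The difference between early and late integration can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' pipeline: two toy omics layers share the same four samples, a nearest-centroid classifier stands in for the supervised models named in the caption, and the function names (`concat_layers`, `fit_centroids`, `predict_proba`, `early_predict`, `late_predict`) are hypothetical. Early integration concatenates each sample's features before fitting one model; late integration fits one model per layer and averages the predicted class probabilities. Mixed integration, which combines intermediate per-layer representations, is not shown.

```python
from math import dist, exp
from statistics import fmean


def concat_layers(layers):
    """Early integration: join each sample's features across omics layers."""
    n = len(layers[0])
    assert all(len(layer) == n for layer in layers), "layers must share samples"
    return [[x for layer in layers for x in layer[i]] for i in range(n)]


def fit_centroids(X, y):
    """Toy nearest-centroid model (hypothetical stand-in for SVM, RF, etc.)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def predict_proba(centroids, x):
    """Convert negative distances to class probabilities with a softmax."""
    scores = {lab: -dist(x, c) for lab, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {lab: exp(s - top) for lab, s in scores.items()}
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}


def early_predict(layers, y, new_layers):
    """Early integration: one model on the concatenated feature matrix."""
    model = fit_centroids(concat_layers(layers), y)
    x_new = [v for layer in new_layers for v in layer]
    return predict_proba(model, x_new)


def late_predict(layers, y, new_layers):
    """Late integration: one model per layer, then average the probabilities."""
    per_layer = [predict_proba(fit_centroids(X, y), x_new)
                 for X, x_new in zip(layers, new_layers)]
    labels = per_layer[0].keys()
    return {lab: fmean(p[lab] for p in per_layer) for lab in labels}


# Hypothetical toy data: rows are samples, each layer has its own features.
genes = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [1.5, 5.0]]  # layer 1: 2 features
metab = [[0.9], [0.8], [0.1], [0.2]]                      # layer 2: 1 feature
y = ["case", "case", "control", "control"]
new_sample = ([5.5, 1.2], [0.85])  # one new sample, split by layer

early = early_predict([genes, metab], y, new_sample)
late = late_predict([genes, metab], y, new_sample)
print(round(early["case"], 3), round(late["case"], 3))
```

Both strategies label the new sample a case here, but late integration is less confident: the metabolite layer's small numeric range yields small distances, so its softmax is nearly flat and pulls the average down. Conversely, in early integration the layer with the larger numeric range dominates the distances, which is one reason features are usually scaled per layer before concatenation.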
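Among the unsupervised methods in the caption, PCA is the simplest to sketch on an early-integrated (concatenated) feature matrix. The following is a minimal, dependency-free illustration rather than a production implementation. It assumes rows are samples and columns are features already concatenated across omics layers, estimates only the first principal component by power iteration on the sample covariance matrix, and uses a hypothetical function name, `first_principal_component`. Real analyses would typically standardize features first and call a library routine based on singular value decomposition.

```python
from math import sqrt
from statistics import fmean


def first_principal_component(X, iters=200):
    """Return (loading vector, sample scores) for PC1 via power iteration.

    Assumes at least two samples; the all-ones start vector must not be
    orthogonal to the leading eigenvector (fine for this illustration).
    """
    n, p = len(X), len(X[0])
    means = [fmean(col) for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]  # center columns
    # Sample covariance matrix (p x p).
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("features have zero variance")
        v = [x / norm for x in w]
    # Project each centered sample onto the loading vector.
    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    return v, scores


# Hypothetical toy matrix: two transcript features and one metabolite
# feature that rise together, so PC1 should point along (1, 2, 0.5).
X = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 1.5],
    [4.0, 8.0, 2.0],
]
loadings, scores = first_principal_component(X)
print([round(v, 3) for v in loadings])
```

Power iteration is used only to keep the sketch self-contained; it finds one component at a time and can converge slowly when the top two eigenvalues are close.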
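The evaluation metrics listed in the caption have short, standard definitions. The sketch below computes them in plain Python for a binary classifier and a regression model; the function names and toy values are hypothetical, and in practice a library such as scikit-learn would normally be used. The F-measure is taken here as the F1 score (harmonic mean of precision and recall), AUROC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), and Cohen's kappa compares observed agreement with the agreement expected by chance.

```python
from math import sqrt


def f1_score(y_true, y_pred, positive=1):
    """F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores, positive=1):
    """AUROC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def cohens_kappa(y_true, y_pred):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    if p_e == 1:
        return float("nan")  # undefined when chance agreement is 1
    return (p_o - p_e) / (1 - p_e)


def regression_errors(y_true, y_pred):
    """MSE, MAE, and RMSE from the residuals of a regression model."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    return {"MSE": mse, "MAE": mae, "RMSE": sqrt(mse)}


# Hypothetical toy predictions for a case (1) vs. control (0) classifier.
labels = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1]
probs = [0.9, 0.4, 0.2, 0.3, 0.8, 0.6]
print(f1_score(labels, predicted), auroc(labels, probs), cohens_kappa(labels, predicted))

# Hypothetical toy predictions for a continuous outcome.
print(regression_errors([2.0, 3.0, 5.0], [2.5, 3.0, 4.0]))
```

AUROC is computed from continuous scores, whereas F1 and kappa need hard class labels, so the two groups of metrics can rank the same models differently.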
