- Research
- Open access
- Published:
Integration of single-cell sequencing and drug sensitivity profiling reveals an 11-gene prognostic model for liver cancer
Human Genomics volume 18, Article number: 132 (2024)
Abstract
Background
Liver cancer has a high global incidence, particularly in East Asia. Early detection difficulties lead to poor prognosis. Single-cell sequencing precisely identifies gene expression differences in specific cell types, making it valuable in tumor microenvironment research and immune drug development. However, the characteristics of tumor cells themselves are equally important for patient prognosis and treatment.
Methods
We downloaded single-cell sequencing data from GSE189903, grouped cells by cluster markers, and classified epithelial cells into adjacent non-tumor, normal, and tumor cells. Differential gene and survival analyses identified significant differential genes. Using TCGA-LIHC data, we divided 370 patients into test and training sets. We constructed and validated a LASSO model based on these genes in both sets and two external datasets. Functional, immune infiltration, and mutation analyses were performed on high and low-risk groups. We also used RNA-seq and IC50 data of 15 liver cancer cell lines from GDSC, scoring them with our prognostic model to identify potential drugs for high-risk patients.
Results
Dimensionality reduction and clustering of 34 single-cell samples identified five subgroups, with epithelial cells further classified. Differential gene analysis identified 124 significant genes. An 11-gene prognostic model was constructed, effectively stratifying patient prognosis (p < 0.05) and achieving an AUC above 0.6 for 5 year survival prediction in multiple cohorts. Functional analysis revealed that upregulated genes in high-risk groups were enriched in cell adhesion pathways, while downregulated genes were enriched in metabolic pathways. Mutation analysis showed more TP53 mutations in the high-risk group and more CTNNB1 mutations in the low-risk group. Immune infiltration analysis indicated higher immune scores and less CD8 + naive T cell infiltration in the high-risk group. Drug sensitivity analysis identified 14 drugs with lower IC50 in the high-risk group, including clinically approved Sorafenib and Axitinib for treating unresectable HCC.
Conclusion
We established an 11-gene prognostic model that effectively stratifies liver cancer patients based on differentially expressed genes between tumor and adjacent non-tumor cells clustered by scRNA-seq data. The two risk groups had significantly different molecular characteristics. We identified 14 drugs that might be effective for high-risk HCC patients. Our study provides novel insights into tumor cell characteristics, aiding in research on tumor development and treatment.
Introduction
Hepatocellular carcinoma (HCC), or liver cancer, is a serious malignant tumor that typically originates in liver tissue and poses a significant health challenge worldwide. According to data from the World Health Organization (WHO) and global epidemiological studies, Asia is a high-incidence area for liver cancer, particularly in countries such as China, Japan, and Southeast Asian nations [1, 2]. The incidence of liver cancer may vary among different regions and populations, primarily influenced by various factors including dietary habits, alcohol consumption, history of liver disease, family genetic factors, and environmental exposure. In addition, infection with hepatitis B virus (HBV) and hepatitis C virus (HCV) stand as prominent contributing factors for liver cancer [3,4,5]. Due to the complexity and uncertainty of these influencing factors, liver cancer incidence rates remain a significant public health challenge globally. Additionally, the prognosis for liver cancer is poor: Only 5% to 15% of patients are eligible for surgical removal, which is suitable only for early-stage patients without cirrhosis due to diminished hepatic regenerative capacity. However, liver cancer is very hard to detect in the early stages, and over 85% of patients are diagnosed with advanced-stage cancer at the first diagnosis [6]. Therefore, ongoing research and efforts to improve prevention, early detection, and treatment outcomes are necessary.
In recent decades, the swift advancement of cancer genomics has propelled bulk transcriptome sequencing (bulk RNA-seq) into a prominent position in the field of transcriptomics. An increasing number of studies have shown that oncogene expression is higher in peritumoral tissue compared to actual tumor tissue. H. R. Frost highlights that the genetic alterations driving cancer are highly tissue-specific, with normal tissues often showing gene expression patterns closely related to those seen in associated cancers when adjusted for relative expression values. This association can reveal prognostic markers and improve cancer genomic analyses by leveraging data from normal tissues to better understand cancer biology [7]. Zhu et al. supported this by identifying differentially expressed genes (DEGs) in both tumor and peritumoral tissues, noting significant overlaps that can aid in pan-cancer diagnosis and therapy [8]. Yang et al. discovered key genes and pathways in peritumoral tissues that influence tumor prognosis and suggested that adjacent tissues contribute to the tumor microenvironment in lung adenocarcinoma [9]. Lastly, Ke Ming-yao discussed how drug resistance proteins expressed in tumors and adjacent tissues can differ, implying a complex interaction between tumors and surrounding tissues affecting gene expression [10]. Overall, these studies suggest that the tissue-specific characteristics of normal tissues play a crucial role in cancer progression.
Single-cell RNA sequencing (scRNA-seq) is a powerful technology that enables the precise identification of gene expression differences at the individual cell level. By analyzing the genomic information of each cell, this technique provides detailed insights into the cellular heterogeneity within a tissue or organism. It allows researchers to detect subtle variations among different cell populations, enhancing our understanding of cellular functions in both normal and pathological states [11,12,13]. Recent research supports the effectiveness of PD-L1 immunotherapy and highlights the crucial role of scRNA-seq in studying the tumor microenvironment (TME). For example, Sun et al. profiled the comprehensive picture of the early-relapse HCC ecosystem using scRNA-seq, offering deeper insights into immune evasion mechanisms associated with tumor relapse in HCC [14]. However, the intrinsic properties of tumor cells are crucial for understanding the prognosis and treatment outcomes in cancer patients. Many studies have revealed that the differences between peritumoral tissue and tumor tissue play a crucial role in cancer tumorigenesis and development. Compared to bulk RNA-seq, scRNA-seq enables precise identification of gene expression differences in specific cell types, providing cell-specific insights. Therefore, profiling the landscape of peritumoral cells and tumor cells in HCC using scRNA-seq is helpful for prognostic marker identification and drug development in HCC.
In this study, we generated a scRNA-seq cohort containing 34 samples from seven primary liver cancer patients and three bulk RNA-seq cohorts. Differentially expressed genes between adjacent non-tumor cells and tumor cells identified in the scRNA-seq cohort were enrolled for model establishment. The bulk RNA-seq cohorts were used for model establishment and validation. Additionally, we compared the molecular characteristics between patients with high-risk and low-risk to explore the potential mechanism. Furthermore, we identified potential drugs that might be effective for high-risk patients, providing new clues for treatment development in liver cancer.
Methods
Data acquisition and pre-processing
The scRNA-seq dataset GSE189903 was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/), containing scRNA-seq data of 34 samples from seven primary liver cancer patients. The “Seurat” and “harmony” R packages were used to pre-process the scRNA-seq data. The samples were obtained from the tumor core, tumor border, and adjacent non-tumor tissue in the liver. Bulk RNA-seq data, clinical information, immune infiltration data, and SNP mutation site data of TCGA-LIHC were downloaded from the TCGA database (https://portal.gdc.cancer.gov/). Samples with incomplete survival and clinical information were excluded, resulting in a cohort of 370 liver cancer patients for this study. Additionally, a cohort (CHCC-HBV) consisting of 159 HCC samples with hepatitis B virus (HBV) infection was included [15]. The bulk RNA-seq data and clinical information of the CHCC-HBV cohort were downloaded from the National Omics Data Encyclopedia (NODE) database (https://www.biosino.org/node). Another bulk RNA-seq cohort, GSE20140, was also downloaded from the GEO database, containing RNA-seq data and survival data of 80 patients with HCC.
Dimensionality reduction and clustering analysis in scRNA-seq cohort
We utilized the UMAP algorithm to perform dimensionality reduction and clustering analysis on scRNA-seq cohort data. UMAP (Uniform Manifold Approximation and Projection) is a widely-used dimensionality reduction technique in machine learning and data visualization. It is particularly effective for visualizing high-dimensional data in two or three dimensions [16]. Compared to methods like PCA or t-SNE, UMAP offers significant advantages in terms of speed and data structure retention [17]. Initially, we applied the UMAP algorithm to group similar cells from 34 samples in the scRNA-seq cohort into clusters and annotated the different cell groups using the Cell Marker database. Since the goal of this study is to construct a prognostic model for liver cancer, which originates from epithelial or mesenchymal malignant tumors in the liver, we extracted epithelial cells for secondary dimensionality reduction and clustering, categorizing these cells into adjacent non-tumor cells, tumor cells, and normal cells.
Pseudo-temporal analysis
Pseudo-temporal analysis, also known as trajectory analysis, is a method used in single-cell RNA sequencing (scRNA-seq) to infer the temporal ordering of cells based on their gene expression profiles. Unlike traditional time-series experiments, pseudo-temporal analysis does not require actual time points; instead, it reconstructs a sequence of cell states that represent a developmental or differentiation process [18]. We conducted the pseudo-temporal analysis by using the “monocle” R package to help further cluster the epithelial cells into adjacent non-tumor cells, tumor cells, and normal cells.
Establishment and validation of a prognostic model
The DESeq2 package is designed for normalization, visualization, and differential analysis of high-dimensional count data. It employs empirical Bayes techniques to estimate log fold changes and dispersion priors, and computes posterior estimates of these quantities [19, 20]. We used DESeq2 for differential gene analysis of RNA-seq data from adjacent non-tumor cells and tumor cells, with screening criteria of |log fold change (FC)|> 0.5 and p-value < 0.05. The resulting differentially expressed genes were further analyzed for survival significance in the TCGA-LIHC cohort using the survival R package. Genes with significant survival differences (p-value < 0.05) were included in model construction.
In the TCGA-LIHC cohort, 370 patients with complete clinical information were randomly divided into training and testing groups. Using the glmnet R package, we constructed a LASSO-Cox regression model in the training group based on the candidate genes, and tenfold cross validation was used to determine the optimal regularization parameters. LASSO regression adds an L1 penalty to the optimization function, driving many coefficients to zero and selecting features with maximum predictive power, simplifying the model and improving its generalization ability [21]. The Cox model based on the variables selected by LASSO can further improve the prediction accuracy and explanatory ability.
We then scored the patients in the training group using the model and divided them into high-risk and low-risk groups based on the median risk score. Survival analysis and Kaplan–Meier survival curves were generated for these groups, and the log-rank test was used to compare survival differences. The model's prognostic performance was validated using the testing group and the TCGA-LIHC cohort as internal validation, and the CHCC-HBV cohort and GSE20140 as external validation. In addition, multivariable analysis was performed in the TCGA-LIHC cohort with risk score and some clinical indexes including age, gender, TNM stage and tumor stage to reveal the prognostic independence of our model in HCC. Besides, we fabricated a nomogram and validated it with calibration curves by using the “rms” R package.
Comparison of molecular features between high and low risk groups
Based on the constructed model, we evaluated the 370 patients in the TCGA-LIHC cohort, stratifying them into high- and low-risk groups according to the median risk score. Differentially expressed genes between these groups were identified using DESeq2, applying criteria of |logFC|> 0.5 and p-value < 0.05. Besides, differentially expressed proteins between the two groups were identified with a criteria of |logFC|> 0.25 and p-value < 0.05. These genes and proteins subsequently underwent Gene Ontology (GO) functional analysis using the “ClusterProfiler” R package, respectively. GO enrichment analysis elucidates significantly enriched functions or processes within a gene set, encompassing categories such as Molecular Function, Cellular Component, and Biological Process [22].
Extensive research underscores the pivotal role of the tumor immune microenvironment in cancer development and progression. Therefore, we utilized the xCell R package, an advanced tool for immune cell infiltration assessment based on gene expression data [23], to identify potential immune cell subpopulations and quantify their relative abundance within tissues. Immune scores were computed for patients in the TCGA-LIHC cohort, and comparative analyses between the high- and low-risk groups were performed to elucidate the influence of immune cells in the microenvironment on prognosis. The therapeutic response to immune checkpoint inhibitors (ICIs) was predicted by using the Tumor Immune Dysfunction and Exclusion (TIDE) webtools which is a computational tool for evaluating tumor immune escape mechanisms [24]. Moreover, given that gene mutations are fundamental to tumorigenesis, a comparative analysis of mutation differences between the high- and low-risk groups was conducted to enhance our understanding of the molecular mechanisms underlying liver cancer development.
Drug sensitivity analysis
The Genomics of Drug Sensitivity in Cancer (GDSC) database [25], developed by the Sanger Institute in the UK, is an extensive repository that aggregates data on tumor cell sensitivity and responses to various drugs. Currently, it stands as the largest public resource for tumor cell drug sensitivity, detailing the effects of 621 anticancer compounds across over 1,000 cancer cell lines and encompassing 24 pathways. The database is divided into two sections: GDSC1 and GDSC2. GDSC1 primarily comprises sequencing and experimental results from 2010 to 2015, while GDSC2 includes data from 2015 to the present. The latest iteration of GDSC1 (version 8.5) contains information on 970 cell lines, 403 anticancer compounds, and 333,292 drug response IC50 and AUC values. In contrast, GDSC2 encompasses data on 969 cell lines, 297 anticancer compounds, and 243,466 drug response IC50 values, including 15 liver cancer cell lines.
Utilizing our model, we calculated risk scores for the RNA-seq data of 15 liver cancer cell lines in GDSC2, subsequently categorizing them into high-risk and low-risk groups based on the median score. By analyzing the IC50 data for various drugs within these groups, we identified compounds that are significantly more effective in the high-risk group. These findings suggest that these drugs may serve as potential candidates for the clinical treatment of high-risk liver cancer patients.
Statistical analysis
The survival curves were drawn using Kaplan–Meier survival analysis via the “Survival” R package. The difference in survival time between the two groups was tested by a log-rank test. The time-dependent receiver operating characteristic (ROC) curve analysis was performed using the “timeROC” R package, and the area under the curve (AUC) value was obtained as the criterion for assessing the accuracy of the prognostic signature. Multivariate Cox analysis was employed to evaluate the independence of our prognostic signature on prognosis in LIHC. A nomogram and calibration curve were constructed using the “rms” R package. All statistical analyses and visualizations were performed using R (version 4.0.2). A two-tailed p-value of < 0.05 was considered statistically significant.
Results
Identification of differentially expressed genes between peritumoral cells and tumor cells using scRNA-seq data
We mapped the spatial landscape of cells in 34 samples from the scRNA-seq cohort using dimensional reduction via the Uniform Manifold Approximation and Projection (UMAP) method. The cells were grouped based on their cell lineage, including T cells, B cells, tumor-associated macrophages (TAMs), cancer-associated fibroblasts (CAFs), and epithelial cells, which were determined using lineage-specific marker genes (Fig. 1A–B). Among these cells, we aimed to reveal the relationship between tumor cells and peritumoral cells in liver cancer. Therefore, we further clustered the epithelial cells into three types: tumor cells, peritumoral cells, and normal cells (Fig. 1C), according to the pseudo-temporal analysis (Fig. 1D–E). Differential gene expression analysis was performed between the tumor cell group and the peritumoral cell group. One hundred and eighty-three genes were identified as significantly up-regulated, while 199 genes were identified as significantly down-regulated (Fig. 1F). Subsequently, the survival significance of the differentially expressed genes was analyzed in the TCGA-LIHC cohort, identifying 124 genes as prognostic (Fig. 1G), which were candidate genes for prognostic model establishment.
Identification of prognostic differentially expressed genes between tumor and adjacent non-tumor cells based on scRNA-seq data. A The UMAP algorithm was applied for dimensionality reduction, and five cell clusters were successfully classified and annotated with SingleR and CellMarker according to the composition of marker genes. B Expression levels of marker genes for each cell cluster. C The UMAP algorithm was applied to the epithelial cells for dimensionality reduction, and the epithelial cells were successfully classified into adjacent non-tumor (green dots), normal (orange dots), and tumor (blue dots) cells according to the pseudo-temporal analysis. D–E The pseudo-temporal analysis of epithelial cells. F Volcano plot of DEGs between tumor cells and adjacent non-tumor cells. P < 0.05 and |log2FoldChange|> 0.5 were identified as significant DEGs. The red dots represent upregulated genes and the blue dots represent downregulated genes. G Identification of DEGs with prognosis in TCGA-LIHC. The blue part represents DEGs, the yellow part represents prognostic genes, and the overlapped part represents DEGs with prognostic significance
Establishment and validation of a prognostic model in TCGA-LIHC
The TCGA-LIHC cohort was randomly divided into a training set and a testing set. The LASSO-COX method was used to construct a molecular model in the training set based on the candidate genes mentioned above. A model containing eleven genes was established, including CCT3, CRYAB, ENO1, LDHA, MARCKSL1, MMP7, SAT2, SPP2, STOM, TMEM205, and TRAPPC9 (Fig. 2A). We then calculated the risk score of each patient in the training set and divided them into high and low-risk score groups. Comparison of survival status showed that patients with high-risk scores had significantly shorter survival times (Fig. 2B). Additionally, we evaluated the predictive ability of the model using the ROC curve. The AUC for 1-year, 3-year, and 5-year survival were 0.77, 0.76, and 0.73, respectively (Fig. 2C), suggesting good predictive efficiency. For internal validation, the same process was performed in the testing set and the TCGA-LIHC cohort separately. Without exception, the high-risk score group had a significantly worse prognosis compared to the low-risk score group in both the testing set (Fig. 2D) and the TCGA-LIHC cohort (Fig. 2E, Supplementary Fig. 1). The AUC for 1-year, 3-year, and 5-year survival were all greater than 0.6 (Fig. 2F–G). Additionally, a GEO dataset with early-stage HCC who underwent surgical resection and a cohort with HBV-infected HCCs were also included for external validation. The results were similar to those of the other cohorts (Figs. 2H–I). In addition, multivariable analysis showed that the risk score based on our model was the only independent prognostic factor (Fig. 3A, p-value < 0.001), suggesting that the prognostic ability of our model would not be affected by some other clinical features including age, gender, TNM stage, and tumor stage. Besides, we fabricated a nomogram based on the risk score and the clinical features mentioned above (Fig. 3B), which might help clinicians to predict the risk of a certain patient. The Calibration curve further showed that the nomogram could predict 1-year survival perfectively, but might overstate the risk of patients’ 3-year and 5-year survival (Fig. 3C), suggesting that the nomogram might be more suitable for short-term survival prediction.
Development and validation of the 11-gene model in multiple cohorts. A LASSO coefficient profile, from which the optimal λ is chosen based on the plot. The two dashed lines indicate two specific λ values: lambda.min and lambda.1se. The λ values between these two are considered suitable. The model built with lambda.1se is the simplest, using fewer genes, while the model built with lambda.min has slightly higher accuracy, using more genes. By default, lambda.min is selected. B Survival curve between high-risk and low-risk groups in the training set. C ROC curve of 1 year, 3 year, and 5 year survival of patients in the training set. D–E Survival curves between high-risk and low-risk groups in the testing set D and the TCGA-LIHC cohort E. F–G ROC curves of 1 year, 3 year, and 5 year survival of patients in the testing set F and the TCGA-LIHC cohort G. H–I Survival curves between high-risk and low-risk groups in the CHCC-HBV cohort H and the GSE20140 cohort I
Molecular difference between patients with high and low risk scores
To further explore the molecular differences between the high-risk and low-risk groups in the TCGA-LIHC cohort, differential gene expression analysis was performed between the two groups. A total of 1906 genes were identified as differentially expressed, with 1333 up-regulated genes and 573 down-regulated genes (Fig. 4A). GO analysis was conducted on the up-regulated and down-regulated genes separately. Notably, the up-regulated genes were primarily enriched in pathways related to cell–cell adhesion (Fig. 4B, Supplementary Table 1), while the down-regulated genes were enriched in pathways related to metabolic processes (Fig. 4C, Supplementary Table 2). Besides, differentially up-regulated proteins were enriched in the immune-related pathway (Supplementary Table 3). Additionally, genetic mutations play a crucial role in the formation and development of tumors. We compared the genetic mutations between the high-risk and low-risk groups. Interestingly, the results showed that most mutations occurred in TP53 and CTNNB1 (Fig. 4D), but patients with high-risk scores had significantly more mutations in TP53, while patients with low-risk scores had significantly more mutations in CTNNB1 (Fig. 4E), suggesting that the high and low-risk groups might have distinct mechanisms of tumorigenesis. Moreover, the tumor microenvironment (TME) plays a complex and crucial role in tumor immune evasion and treatment resistance. Comparison of immune cell infiltration between the high-risk and low-risk groups showed that patients with high-risk scores had significantly higher immune scores and higher infiltration of natural killer T cells and Th2 cells, while lower infiltration of CD8 + naive T cells was also found in the high-risk group (Fig. 4 F, G), suggesting that patients with high-risk scores might benefit from immunotherapies. In addition, we further compared the TIDE between the two groups. Five modules were used to estimate TIDE score. We found that high-risk group had significant higher infiltration of myeloid derived suppressor cells (MDSCs) and tumor-associated fibroblasts (CAF), and higher exclusion score (p < 0.05) while no significant difference of tumor-associated macrophages (TAM) subtype M2 infiltration and dysfunction score was found between the two groups(Supplementary Fig. 2).
Molecular differences between patients with high and low risk scores in the TCGA-LIHC cohort. A Volcano plot of DEGs between high-risk and low-risk groups. P < 0.05 and |log2FoldChange|> 0.5 were identified as significant DEGs. The red dots represent upregulated genes and the blue dots represent downregulated genes. B GO analysis of the upregulated genes. C GO analysis of the downregulated genes. D Forest plot of mutations between high-risk and low-risk groups. E Significant differences in mutations between high-risk and low-risk groups by Fisher test. An odds ratio greater than 1 indicates that the high-risk group has more mutations in certain genes, while an odds ratio smaller than 1 indicates the opposite. F Heatmap of immune cells that are significantly different in infiltration between high-risk and low-risk groups with P < 0.05. G The immune score and infiltration of CD8 + naive T cells, Th2 cells, and natural killer T cells between high-risk and low-risk groups. Significant differences tested by Wilcoxon test
Identification of potential drugs for HCC patients with high risk scores
Fifteen LIHC cell lines were enrolled for drug exploration. Based on the median risk score calculated by our model, the fifteen cell lines were divided into two groups: high-risk and low-risk. Comparing the IC50 of drugs recorded in the GDSC database between the two groups showed that twenty drugs had significant differences between the two groups (Fig. 5 A P-value < 0.05), forteen of which had lower IC50 in the high-risk group, including Wnt-C59, TAF1_5496, Sorafenib, SN-38, Savolitinib, OF-1, LGK974, Fulvestrant, EHT-1864, Dihydrorotenone, AZD4547, Axitinib, AGK2, and 720,427, suggesting the effectiveness of these drugs in high-risk HCC patients (Fig. 5B).
Drug identification in the GDSC database. A Heatmap of the 20 drugs with significantly different IC50 values between the high-risk and low-risk groups. B Comparison of fourteen drugs that have significantly lower IC50 values in the high-risk group compared to the low-risk group. Significant differences tested by Wilcoxon test
Discussion
Prognostic models for liver cancer based on bulk RNA-seq have been widely studied and applied in recent years. This approach analyzes the whole transcriptome data of tumor tissues, identifies gene expression features related to prognosis, and uses these features to construct prognostic models [26, 27]. However, with the development of single-cell sequencing technology, many studies have utilized both scRNA-seq and bulk RNA-seq data for marker exploration and model construction to address the low accuracy caused by tumor heterogeneity. For example, Li et al. identified TXNL4A as a new biomarker through a combined analysis of bulk and scRNA-seq data, highlighting its association with CD8 T cell infiltration and HCC survival [28]. Another study by Li et al. developed a model based on NK cell marker genes, demonstrating its efficacy in predicting patient outcomes and therapeutic responses [29]. Tang et al. (2023) explored the prognostic value of exhausted T cells, constructing a robust model from bulk and scRNA-seq data to predict patient survival [30]. However, most of these studies focused on the relationship between tumor and the tumor microenvironment (TME), which might cause researchers to overlook the impact of the characteristics of tumor cells themselves on tumor development.
Therefore, in this study, we established an eleven-gene model in HCC based on the differentially expressed prognostic genes between tumor cells and adjacent cells clustered by scRNA-seq data, which is better able to reflect the intrinsic malignancy of tumor cells and exclude the confounding effects of other cells, such as those in the immune microenvironment. The model genes influence the occurrence, progression, invasion, and metastasis of liver cancer through various mechanisms. For example, elevated levels of CCT3 promoting tumor growth and survival by stabilizing key oncogenic proteins such as YAP and TFCP2 [31, 32]. CRYAB influences tumor prognosis through mechanisms related to anti-apoptotic activity [33]. LDHA is upregulated in cancer cells and plays a significant role in glycolysis and cancer metabolism [34]. The role of other model genes has not been investigated in liver cancer, and our findings provide new clues for a deeper understanding of these genes in liver cancer and help complete the mechanism of liver cancer.
To provide a more comprehensive understanding of the mechanisms driving the differences between the high- and low-risk groups, we compared the molecular characteristics between patients with high and low risk scores and found that patients with high risk scores had more mutations in the TP53 gene, while those with low risk scores had more mutations in the CTNNB1 gene. It is well-known that TP53 mutation is associated with worse outcomes, and patients with CTNNB1 mutations generally have better prognoses [35, 36]. This was consistent with our findings that patients with high risk scores had significantly shorter survival times compared to those with low risk scores. Additionally, we found that patients with high risk scores had significantly higher immune scores, higher infiltration of NK cells and Th2 cells, but lower infiltration of CD8 + naive T cells. Immune score is a comprehensive indicator used to evaluate the prognosis of tumor patients based on the density and distribution of different types of immune cells in the TME. A higher immune score represents higher infiltration of immune cells in the TME [37], suggesting that patients with high risk scores might benefit more from immunotherapies. NK cells, as part of the innate immune system, can rapidly respond to tumor signals and directly kill tumor cells [38]. Th2 cells are associated with anti-inflammatory responses and tissue repair, and can, in certain cases, promote tumor growth [39]. In contrast, CD8 + naive cells require antigen presentation and activation to exert their effector functions, a process easily inhibited by the tumor microenvironment [40]. In addition, the TIDE analysis showed that high-risk group had significant higher infiltration of MDSC and CAF, and higher exclusion score, but the p-value just less than 0.05, while no significant difference of TAM M2 subtype infiltration and dysfunction score was found between the two groups. MDSC, CAF and TAM subtype M2 are three cell types that limit T cell infiltration in tumors. Exclusion score and dysfunction score were used to reflect the T cell clearance/exclusion and function of tumor patients respectively. Combining the findings, it is suggested that the immune system might have less contribution to driving the differences between the high- and low-risk groups.
Finding effective tumor drugs is of great importance. It can increase cure rates, extend patient survival, reduce side effects, overcome resistance, and promote personalized treatment and scientific research. Our study uniquely utilized single-cell RNA sequencing (scRNA-seq) data from tumor cells, ensuring a high degree of purity and specificity in the gene expression profiles analyzed. This approach contrasts with bulk RNA sequencing (bulk RNA-seq) methods, which can include mixed cell populations, potentially confounding the results. By focusing solely on tumor cells, we achieved a more accurate identification of gene expression differences relevant to HCC prognosis and treatment. The use of scRNA-seq data provided a refined understanding of tumor cell-specific drug responses, which is critical for developing personalized treatment strategies. Notably, the drugs identified were tested on liver cancer cell lines, ensuring that the sensitivity results are directly relevant to tumor cells, further validating our findings. The drug sensitivity analysis identified several compounds that may be more effective in high-risk HCC patients, including Sorafenib and Axitinib which are already clinically approved for treating unresectable HCC [41, 42]. The presence of clinically approved drugs among our results offers a strong positive control to underscore the robustness of our approach and enhances confidence in the potential of other identified compounds, which are not yet in clinical use, to offer effective treatment options. Our findings emphasize the importance of molecularly targeted therapies based on precise tumor cell characteristics. The drugs identified through our model should be prioritized for clinical trials to evaluate their efficacy in high-risk HCC patient populations.
Even so, there are still some limitations to our study. Firstly, validating our model in a cohort with a larger sample size would be advantageous. Secondly, validating the fourteen drugs identified in this study through clinical trials would accelerate translating the results into clinical applications. In the future, we will attempt to overcome these shortcomings.
Conclusion
We established an 11-gene prognostic model that effectively stratifies liver cancer patients based on differentially expressed genes between tumor and adjacent non-tumor cells clustered by scRNA-seq data. The two risk groups had significantly different molecular characteristics, offering new clues for a deeper understanding of tumor development in HCC. Additionally, we identified 14 drugs that might be effective for high-risk HCC patients. Our study provides novel insights into tumor cell characteristics, aiding in research on tumor development and treatment.
Availability of data and materials
The datasets generated and/or analysed during the current study are available in the GEO repository https://www.ncbi.nlm.nih.gov/, the TCGA repository https://portal.gdc.cancer.gov/, the NODE repository https://www.biosino.org/node.
References
Ferlay JE M, Lam F, Colombet M, Mery L, Piñeros M, Znaor A, Soerjomataram I, Bray F. Global cancer observatory: cancer today. 2019; Available from: https://gco.iarc.fr/today.
Bray F, et al. Global cancer statistics 2018: globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
El-Serag HB. Epidemiology of viral hepatitis and hepatocellular carcinoma. Gastroenterology. 2012;142(6):1264-1273e1.
Kulik L, El-Serag HB. Epidemiology and management of hepatocellular Carcinoma. Gastroenterology. 2019;156(2):477–91.
Llovet JM, et al. Hepatocellular carcinoma. Nat Rev Dis Primers. 2021;7(1):6.
Marrero JA, et al. Diagnosis, staging, and management of hepatocellular carcinoma: 2018 practice guidance by the american association for the study of liver diseases. Hepatology. 2018;68(2):723–50.
Frost HR. Analyzing cancer gene expression data through the lens of normal tissue-specificity. PLoS Comput Biol. 2021;17(6):e1009085.
Zhu L, et al. Identification of potential biomarkers for pan-cancer diagnosis and prognosis through the integration of large-scale transcriptomic data. Front Pharmacol. 2022;13:870660.
Zhao Y, Zhao X, Qin Y. Influence mechanism of dynamic evolution of Chinese entrepreneurs’ entrepreneurial motivation on performance-the role of turning points and empathy. Front Psychol. 2020;11:474044.
Ming-yao K, A study on the expression of multidrug resistance-associated protein genes in lung cancer tissues and other malignant tumor. Pract J Cancer, 2010.
Saliba AE, et al. Single-cell RNA-seq: advances and future challenges. Nucleic Acids Res. 2014;42(14):8845–60.
Zheng GX, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049.
Macosko EZ, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161(5):1202–14.
Sun Y, et al. Single-cell landscape of the ecosystem in early-relapse hepatocellular carcinoma. Cell. 2021;184(2):404-421e16.
Yang C, et al. Prognosis and personalized treatment prediction in TP53-mutant hepatocellular carcinoma: an in silico strategy towards precision oncology. Brief Bioinform. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bib/bbaa164.
Andrews TS, et al. Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data. Nat Protoc. 2021;16(1):1–9.
Wang HY, Zhao JP, Zheng CH. SUSCC: secondary construction of feature space based on UMAP for rapid and accurate clustering large-scale single cell RNA-seq data. Interdiscip Sci. 2021;13(1):83–90.
Hou W, et al. A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples. Nat Commun. 2023;14(1):7286.
Liu S, et al. Three differential expression analysis methods for RNA sequencing: limma, EdgeR, DESeq2. J Vis Exp. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.3791/62528-v.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
Ternes N, Rotolo F, Michiels S. Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models. Stat Med. 2016;35(15):2561–73.
Harris MA, et al. The gene ontology (GO) database and informatics resource. Nucleic Acids Res. 2004. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gkh036.
Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18(1):220.
Jiang P, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med. 2018;24(10):1550–8.
Yang W, et al. Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gks1111.
Zhengdong A, et al. Identification of fatty acids synthesis and metabolism-related gene signature and prediction of prognostic model in hepatocellular carcinoma. Cancer Cell Int. 2024;24(1):130.
Wu X, et al. Analysis of m(6)A-related lncRNAs for prognostic and immunotherapeutic response in hepatocellular carcinoma. J Cancer. 2024;15(7):2045–65.
Li Y, et al. Combined bulk RNA and single-cell RNA analyses reveal TXNL4A as a new biomarker for hepatocellular carcinoma. Front Oncol. 2023;13:1202732.
Li S, et al. Integrated analysis of single-cell and bulk RNA-sequencing reveals tumor heterogeneity and a signature based on NK cell marker genes for predicting prognosis in hepatocellular carcinoma. Front Pharmacol. 2023;14:1200114.
Tang X, et al. Single-cell RNA-seq and bulk RNA-seq explore the prognostic value of exhausted T cells in hepatocellular carcinoma. IET Syst Biol. 2023;17(4):228–44.
Liu Y, et al. CCT3 acts upstream of YAP and TFCP2 as a potential target and tumour biomarker in liver cancer. Cell Death Dis. 2019;10(9):644.
Liu W, et al. Current understanding on the role of CCT3 in cancer research. Front Oncol. 2022;12:961733.
Cheng L, et al. The role of CRYAB in tumor prognosis and immune infiltration: A Pan-cancer analysis. Front Surg. 2022;9:1117307.
Han J, et al. Glycolysis-related lncRNA TMEM105 upregulates LDHA to facilitate breast cancer liver metastasis via sponging miR-1208. Cell Death Dis. 2023;14(2):80.
Kumar S, et al. Assessments of TP53 and CTNNB1 gene hotspot mutations in circulating tumour DNA of hepatitis B virus-induced hepatocellular carcinoma. Front Genet. 2023;14:1235260.
Calderaro J, et al. Histological subtypes of hepatocellular carcinoma are related to gene mutations and molecular tumour classification. J Hepatol. 2017;67(4):727–38.
Huang A, et al. Construction of a tumor immune infiltration macrophage signature for predicting prognosis and immunotherapy response in liver cancer. Front Mol Biosci. 2022;9:983840.
Lu H, et al. Polysaccharide krestin is a novel TLR2 agonist that mediates inhibition of tumor growth via stimulation of CD8 T cells and NK cells. Clin Cancer Res. 2011;17(1):67–76.
Chen X, et al. Clinical effect of iodine-125 seed implantation in patients with primary liver cancer and its effect on Th1/Th2 cells in peripheral blood. J Oncol. 2021;2021:6199732.
Nicolai CJ, et al. NK cells mediate clearance of CD8(+) T cell-resistant tumors in response to STING agonists. Sci Immunol. 2020. https://doiorg.publicaciones.saludcastillayleon.es/10.1126/sciimmunol.aaz2738.
Fiedler ERC, et al. In vivo RNAi screening identifies Pafah1b3 as a target for combination therapy with TKIs in BCR-ABL1(+) BCP-ALL. Blood Adv. 2018;2(11):1229–42.
Jiang H, et al. The multikinase inhibitor axitinib in the treatment of advanced hepatocellular carcinoma: the current clinical applications and the molecular mechanisms. Front Immunol. 2023;14:1163967.
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Qunfang Zhou, Jingqiang Wu and Jiaxin Bei. The first draft of the manuscript was written by Qunfang and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
40246_2024_698_MOESM1_ESM.pdf
Additional file1: Figure 1. Distribution of risk score based on the model A and Patterns of survival status and survival time of each patient B in the TCGA-LIHC cohort.
40246_2024_698_MOESM2_ESM.pdf
Additional file2: Figure 2. Comparison of MDSC A, CAF B and Exclusion score C between the high and low risk groups in the TCGA-LIHC cohort. *p-value < 0.05, **p-value < 0.01.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhou, Q., Wu, J., Bei, J. et al. Integration of single-cell sequencing and drug sensitivity profiling reveals an 11-gene prognostic model for liver cancer. Hum Genomics 18, 132 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40246-024-00698-2
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40246-024-00698-2