Volver a Noticias Científicas
Investigación Científica

Detalles del Artículo

TIMEL: La integración de aprendizaje profundo y métodos estadísticos revela patrones espaciales del estroma y del sistema inmunitario en el cáncer colorrectal agresivo.

¿Qué significa esto para los pacientes?

AI

Inicia sesión o regístrate para generar explicaciones con IA

Caracterizar el microambiente inmunitario tumoral (TIME) es esencial para comprender las respuestas antitumorales en el cáncer de colon.

Este estudio presenta TIME Landscaper (TIMEL), un marco computacional que utiliza el aprendizaje profundo para identificar estructuras tisulares a nivel microscópico y resume su distribución en imágenes de porta completa (WSI) mediante descriptores estadísticos que reflejan la heterogeneidad intratumoral para el descubrimiento de biomarcadores pronósticos. Se evaluó el rendimiento de seis modelos de clasificación de imágenes de aprendizaje profundo (Inception V3, DenseNet-121, ViT-base, UNI, Prov-GigaPath y Virchow) para segmentar las áreas tumorales, estromales e inmunitarias a nivel microarquitectónico, utilizando casi 50.000 mini-parches. El modelo seleccionado se aplicó a las WSI de las cohortes de descubrimiento (TCGA-COAD; n = 411) y validación (Dartmouth; n = 108), segmentando estos componentes para mapear su distribución espacial. A partir de estos mapas, se calcularon 30 descriptores estadísticos que representan la abundancia, la variación, la forma, la suavidad espacial y la heterogeneidad espacial como características de TIMEL a nivel de porta.

Los análisis de regresión univariante y multivariante identificaron biomarcadores asociados a la supervivencia y relacionados con la metástasis de las cohortes de descubrimiento y validación, respectivamente. Virchow superó a otros modelos en la segmentación de los compartimentos tisulares (AUC: 0,99, 0,98, 0,98 para los componentes tumoral, estromal e inmunitario). La varianza estromal e inmunitaria a nivel de porta se correlacionó con las métricas evaluadas por el patólogo (R = 0,47 y 0,32, ambas p < 0,001). La agrupación espacial inmunitaria predijo una peor supervivencia (HR = 2,73; p < 0,001), mientras que la agrupación tumoral fue protectora (HR = 0,23; p = 0,001).

La suavidad espacial tumoral se asoció con la metástasis en los ganglios linfáticos (OR = 1,65; p = 0,02). TIMEL integra el aprendizaje profundo y la estadística para capturar la heterogeneidad histológica dentro del TIME, mejorando la evaluación pronóstica y apoyando la oncología de precisión a partir de la histología de rutina.

PubMed Central ~7,417 palabras · 38 min de lectura

Colorectal cancer (CRC) ranks among the leading causes of cancer-related mortality globally, with an alarming shift in incidence toward younger demographics,[1] underscoring the need for innovative prognostic strategies.[2] At the heart of current oncology research lies the tumor immune microenvironment (TIME),[3] a complex matrix of interacting immune, stromal, and tumor cells—including T cells, B cells, NK cells, and monocytes—whose spatial distribution and coordinated activity critically shape anti-tumor responses and influence outcomes in colorectal adenocarcinoma. Understanding the TIME is paramount for developing prognostic models and advancing immunotherapeutic strategies to enhance tumor cytotoxicity.

However, the TIME exhibits profound complexity and patient-specific variability, posing significant challenges for effective analysis. It encompasses a heterogeneous mix of non-malignant cell types and sublineages, each with distinct biological roles and engaged in intricate cellular crosstalk. Deciphering these multifaceted interactions remains a formidable task. Although recent advances in biomedical technologies—such as spatial transcriptomics—offer powerful tools to address this heterogeneity, the tumor compartment remains deeply embedded within, and functionally entangled with, the surrounding immune and stromal infiltrates. Furthermore, spatial heterogeneity within TIME adds another layer of complexity,[4] with the microenvironment differing markedly between central and peripheral tumor regions. Therefore, innovative methods to separate and analyze TIME's constituent components and spatial interactions are crucial. Such approaches could significantly enhance characterization efforts when combined with clinical data, imaging, morphological assessments, and multiomics.

Histopathological examination of slides, routinely stained with hematoxylin and eosin (H&E), remains central to tumor diagnosis and prognostication. Recently, there has been a growing focus on computational approaches to glean insights from digitized tissue slides, known as whole-slide images (WSIs), enhancing our comprehension and characterization of the TIME. Artificial intelligence (AI), particularly through machine learning (ML) and deep learning (DL) approaches, offers transformative potential in advancing our understanding of tumor biology. These models excel at synthesizing complex associations from high-dimensional, multimodal biomedical data, including gigapixel WSIs.[5]

The accumulation of millions of digitized tissue slides has facilitated the development of Pathology Foundation models—neural networks with billions of parameters that capture histological variation across vast patient cohorts. Whereas these models have proven effective in tumor characterization and prognostication, their interpretability remains a challenge.[6] Current interpretation techniques often provide pixel-wise or patch-wise attribution post-hoc, pinpointing key cellular aggregates or tissue regions.[7], [8], [9], [10], [11] Whereas algorithms like HoverNet[12] can capture cellular heterogeneity, they do not consider contributions from the surrounding extracellular matrix elements. However, meaningful interpretation of histopathological features requires an integrated understanding that combines domain-specific expertise with computational insights.

Our computational approach addresses these limitations by shifting the focus from isolated, cell-level features to what we term landscape-level metrics/parameters—comprehensive statistical descriptors that summarize the spatial architecture and cellular/matrix composition of the TIME. This broader perspective captures the emergent properties of immune–tumor interactions and forms the basis of what we define as the tumor immune microenvironment landscape (TIMEL). Developing methods to translate histopathological features into clinically interpretable biomarkers of TIME components can potentially complement existing disease prognostication and management. Our previous TumOr And STroma classifier (TOAST) framework introduced a set of eight landscape parameters derived from DL models and statistical analysis, demonstrating a proof-of-concept for such calculations.[13] However, this approach has its own limitations: it does not adequately address the immune component, a critical aspect of TIME, and lacks spatial statistics, relying solely on compositional (i.e., non-spatial) variation of stromal and tumor elements. Furthermore, the accuracy of visual information extraction could be improved by utilizing pretrained foundation models, which offer more robust representations.[14], [15], [16]

Building on our previous work,[13] we introduce a hybrid framework, named TIMEL, designed to capture interpretable histopathological descriptors of the TIMEL. This framework leverages DL to extract cellular/matrix features and employs statistical modeling to represent their distribution and heterogeneity. Expanding upon the TOAST framework,[13] TIMEL introduces 30 landscape parameters that explicitly incorporate the immune compartment alongside tumor and stromal components. These parameters include seven non-spatial/compositional and three spatial statistics per component, quantifying abundance, variation, morphology, extremeness, spatial hotspotting, smoothness, and heterogeneity within TIME. By directly modeling intratumoral tissue-slide level heterogeneity through these interpretable landscape markers—rather than relying on post-hoc interpretation—our approach more accurately reflects the complex spatial organization of TIME and its prognostic significance.

Materials and methods

Method overview

This study employs a comprehensive approach to analyze histopathological WSIs from colon cancer patients using the TIMEL framework, which integrates DL and statistical analysis to assess TIME heterogeneity. The methods have been summarized below:1.Data collection: The research utilizes two cohorts: a discovery cohort from the TCGA-COAD dataset for development of spatial cell-type maps and initial survival analysis of landscape descriptors and an in-house validation cohort from Dartmouth–Hitchcock Medical Center (DHMC) for validation of metastasis-related outcomes. The discovery cohort consists of 411 WSIs after excluding suboptimal samples (due to duplication, poor image quality, or absence of identifiable tumor cells), with 10 WSIs used for training and cross-validation (CV) of cell-level models requiring substantial pathologist annotation. The validation cohort includes 108 patients, with slides stained and imaged at high resolution (40×; Leica Aperio GT450). Manual and automated annotations are performed to identify tissue types and tumor regions, with fine-grained histology annotations covering various cellular/matrix components.2.Cell-level, ROI-level & slide-level approaches: The TIMEL framework's DL approach classifies histological patterns at cellular resolution within WSIs, whereas the downstream statistical approach summarizes these classifications into patient-level landscape parameters that summarize tissue heterogeneity. The DL model is trained using mini-patches and employs pretrained neural networks for feature extraction, followed by CV to ensure robust model performance. A set of 30 landscape parameters were derived to characterize the TIME, capturing both non-spatial/compositional and spatial properties.3.Internal validation against pathologist-annotated spatial TSR/TIL maps: These descriptors were correlated against pathologist-derived metrics, such as local tumor–stromal ratio (TSR) and tumor-infiltrating lymphocyte (TIL) estimates, to ensure practical relevance.4.Patient-level validation: Further analyses explored the interdependencies among landscape parameters, their correlations with clinical variables, and their prognostic significance through univariate, multivariate, and feature selection analyses across both discovery and validation cohorts.### Samples and cohorts

Fig. 1A illustrates the overall study design.

Development/Discovery cohort: We extracted data for 461 patients from the TCGA-COAD dataset. Upon review of 459 diagnostic slides, 48 were excluded due to duplication, poor image quality, or absence of identifiable tumor cells. The remaining 411 WSIs formed the basis of our analysis. Of these, nearly 50,000 mini-patches from within 10 WSIs were randomly selected and annotated for use in cell-level training, CV, and model selection. All 411 WSIs were subsequently utilized for inference tasks. It is important to note that whereas the development cohort includes a small number of cases from the discovery cohort (10 out of 411 WSIs; 2.4%), the inference tasks are methodologically distinct on a different scale (patient-level) from the training phase (cell-level). The H&E WSIs for TCGA-COAD were obtained via the GDC portal (https://portal.gdc.cancer.gov) using various scanners (including Leica GT450/AT2/CS2 and Hamamatsu S60/S360). Clinical data were obtained from cBioPortal (https://www.cbioportal.org) and the GDC portal, and included the following parameters: age, gender, TNM stage, AJCC stage, disease-free survival (DFS), progression-free survival (PFS), and overall survival (OS).

In-house validation cohort: We included 108 CRC patients from DHMC (2016–2019) with IRB approval. Some of this cohort was part of our previous study.[17], [18] Slides were stained with H&E and imaged at 40× resolution using Leica Aperio AT2/GT450. Clinicopathological parameters collected included age, gender, microsatellite instability (MSI) status, and AJCC pTNM stages. Table 1 details the clinicopathological differences between cohorts.

Full individual patient data for both cohorts, including our calculated landscape parameters, were reported in Supplementary Table S1.### Manual annotations, automated tissue detection, and patchification

Two types of annotations were used: regional annotations defining macroscopic tumor regions that set the spatial bounds of analysis, and fine-grained histology annotations nested within these regions to label microscopic cellular and microarchitectural components. Mini-patches were extracted from fine-grained annotations for DL model development, whereas the spatial distribution of segmented microscopic tissue areas was quantified more broadly within the bounds of the regional tumor annotations.

Fine-grained histology annotation: Fine-grained tissue and cellular/matrix annotations (Fig. 1B) were performed by a pathologist, encompassing a total of 49,479 mini-patches. Each mini-patch had a resolution of 22.4 × 22.4 μm2, equivalent to approximately 90 pixels2, and could cover between 1 and 5 cells. The annotation classes included tumor cells (n = 10,000), collagenous stroma (n = 10,000), immune cells (n = 8190), necrosis (n = 6708), blood (n = 3434), fat (n = 6247), and whitespace (n = 4900). These annotations were performed across 10 WSIs randomly selected from the development cohort. To ensure accuracy, the annotations were re-reviewed by two board-certified pathologists.

Cancer regional annotation: To ensure analysis was confined to tumor regions, macroarchitectural regions such as the tumor bed areas (Fig. 1B) were manually annotated using polygonal tools in QuPath across 411 WSIs. These annotations, also performed by a pathologist, included multiple polygons for spatially distinct, non-contiguous tumor regions to ensure accuracy.

Automated tissue detection: To improve the precision of cancer region annotations, we integrated manually drawn polygons with tissue detection polygons automatically generated using the lazyslide package[19] (version 0.7.1). This hybrid approach preserves the cancer-specific focus of manual annotation while minimizing the inclusion of non-tissue (whitespace) areas through automated tissue detection (Fig. 1C).

Automated patchification: Annotated regions were subdivided into patches using the lazyslide algorithm. Each resulting patch measured 224 × 224 μm2. These patches were further partitioned into 100 non-overlapping mini-patches, each sized 22.4 × 22.4 μm2 (Fig. 1C). The resulting mini-patches served as input for the DL core of the TIMEL framework.### Development of the TIMEL framework

The TIMEL framework consists of two sequential components: a DL-based cell-level analysis and a statistical module for patient-level characterization. First, the DL model processes WSIs to identify key histological patterns, producing probability maps for each histological class. These maps are then fed into the statistical module, which aggregates the probabilities to compute WSI-level metrics—referred to as TIMEL landscape parameters. Practically, the framework takes a WSI as input and outputs quantitative parameters that summarize the histological composition and spatial heterogeneity of tumor regions (see Fig. 2A). The TIMEL parameters thus offer a structured, quantitative summary of tissue heterogeneity, supporting objective and reproducible histopathological assessment.

Description of deep learning algorithm

Model architecture: The following DL image classification models were used to classify mini-patches based on their assigned tissue structure (tumor, stroma, immune, necrosis, blood, fat, and whitespace): Inception V3,[20] DenseNet201,[21] ViT-base,[22] UNI[14], Prov-GigaPath,[16] and Virchow.[15] Each model was augmented with a 64-dimensional fully connected hidden layer followed by Batch Normalization and ReLU activation, and a final classification layer. All models were pretrained (i.e., initially trained on other datasets)—this was done to leverage existing feature representations and reduce the need for extensive task-specific training data. Inception V3, DenseNet201, and ViT-base were pretrained on the ImageNet-1K dataset, whereas UNI, Prov-GigaPath, and Virchow are histopathology foundation models pretrained using self-supervised learning on up to 1 million WSIs. These models were selected based on their widespread use and frequent comparison in prior histopathology studies.

Given the expectation that foundation models capture robust and generalizable histological features across institutions, they were hypothesized to outperform ImageNet-pretrained models, which served as baselines. Accordingly, all model parameters except the final classification head were frozen during training. This transfer learning strategy is standard in histopathology applications with limited labeled data, reducing computational cost and implementation challenges while mitigating catastrophic forgetting of general histological features. Preliminary empirical experiments during method development suggested that fine-tuning foundation model backbones degraded performance in the context of our study, motivating the adoption of a frozen-backbone approach. Comprehensive comparisons of architectures and fine-tuning strategies were considered beyond the scope of this study, which focused primarily on downstream statistical characterization and comparison of tissue spatial descriptors.

Feature extraction and model training protocol: Mini-patches were resized to 224 × 224 pixels and processed using pretrained neural networks for feature extraction and classification. Models were trained using the Adam optimizer with cross-entropy loss. A coarse hyperparameter search identified a batch size of 128, a learning rate of 1 × 10−5, and 15 training epochs as optimal. We employed 5-fold CV to train, select, and evaluate our models. In each fold, models were trained on mini-patches from 8 WSIs, totaling about 40,000 mini-patches, whereas the remaining 2 WSIs (approximately 10,000 mini-patches) were used for validation. Importantly, validation data did not influence early stopping. This process was repeated five times, allowing each fold to serve as the validation set once.

Model evaluation and selection: We selected the model with the best performance metrics—such as cross-entropy loss, accuracy, class-specific AUCs, precision, recall, and F1 scores—across all validation folds, with average class-specific AUC as the primary decision criterion. This top-performing model was then retrained on all 10 WSIs to prepare for subsequent statistical modeling tasks.### Development of statistical core

Generating histology probability maps: Within the annotated tumor regions, mini-patches were classified based on predicted cellular and microarchitectural components. We counted the number of mini-patches for each class within a larger ∼224 × 224 micron[2] (∼900 × 900 pixels2 at 40× resolution) regional patch to form a local estimate of the physical area of each component. These estimates were normalized into patch-wise probabilities for tumor, stroma, and immune proportions (Ptumor, Pstroma, and Pimmune) (see Fig. 2B). Ptumor is conceptually related to regional TSR, whereas Pimmune is indirectly associated with TIL.

Statistical descriptors (or landscape parameters): To comprehensively characterize the TIME, we derived 10 statistical descriptors for each of Ptumor, Pstroma, and Pimmune. These include seven non-spatial/compositional features—mean, variance, skewness, kurtosis, Dirichlet's concentration parameter, and the 5th and 95th percentiles as well as three spatial features—Moran's spatial autocorrelation, spatial Gaussian process (GP) length scale, and spatial Gini index. The selection of these descriptors was motivated by their ability to capture both the distributional properties and spatial organization of each cell type/stromal matrix element, providing a multidimensional view of TIME heterogeneity. Supplementary Table S2 details the abbreviations and definitions of these parameters, whereas Supplementary File 1 provides the corresponding motivation, formulas, and estimation algorithms. By adopting these “landscape parameters,” we expanded our analysis to 30 descriptors, representing a substantial increase from the 8 features used in our previous work.[13] The rationale for selecting these parameters is further discussed in the Interpretations of TIMEL Landscape Parameters section.### Association of histology probabilities with standard pathology metrics

To validate and contextualize our computational histology probabilities, we compared them with established pathologist-derived metrics: the TSR and TIL score. For TSR, representative high-stroma regions of interest (ROIs) were selected in accordance with standard pathology practice, typically at the tumor invasive front where stromal content is most prominent and tumor cells are present on all sides of the field.[23] Within these ROIs, TSR was calculated as the proportion of tumor area relative to the combined areas of tumor, stroma, and immune compartments, using the following formula:

For TILs, pathologists evaluate only the lymphocytes and plasma cells located within the stromal compartment of the invasive tumor, explicitly excluding immune infiltrates outside the tumor border, lymphoid aggregates, and lymphocytes within necrotic areas or tumor cell nests. The assessment involves estimating the percentage of the stromal area within the tumor that is occupied by mononuclear inflammatory cells, typically across the entire invasive tumor area rather than focusing on hotspots.[24] This estimation is generally performed at intermediate magnification, such as 200×–400× total magnification, to ensure a representative and reproducible evaluation:

In this study, landscape parameters derived from the probabilities of tumor (Ptumor), stroma (Pstroma), and immune (Pimmune) compartments serve as direct, albeit imperfect, surrogates for the corresponding metrics traditionally assessed by pathologists. For instance, the parameter closely resembles a pathologist-derived TSR estimate, which evaluates TSR in stroma-rich spots, whereas is conceptually similar to the quantification of TIL. This methodology enables a robust and quantitative comparison between digital and manual assessments, thereby supporting the clinical relevance and interpretability of our computational landscape parameters.### Comparative analyses of TSR, TIL, and landscape parameters

In this section, our hypothesis is that there are significant associations between human-evaluating TSR and TIL and the corresponding landscape parameters ( and ). We extracted previously scored TSR and TIL of the TCGA-COAD cohort from previous studies.[25], [26] TSR was scored, double-checked, and discussed for consensus by three human evaluators,[25] one of whom is a board-certified pathologist. TIL was computed, using the fraction of TIL-positive patches within all patches, using a DL model in a previous study. The —stromal percentage (which is practically 100% – TSR) and —TIL Spearman's correlation analyses were used to evaluate these postulated relationships.

To test our hypotheses, we performed Spearman's correlation analyses between: (i) and human-evaluated stromal percentage and (ii) and TIL score. These analyses were designed to evaluate the strength and direction of the relationships between human-assessed and computationally derived landscape parameters.### Interpretations of TIMEL landscape parameters

Interpreting landscape parameters is crucial for understanding the TIME. These statistical descriptors provide insights into the distribution and characteristics of various components within the tumor, thereby contextualizing the complex phenomena observed in TIME. For further specific details about the motivation, calculation process, abbreviations, and examples of each TIMEL landscape parameter, see Supplementary File 1 and Supplementary Table 2.

The spatial dynamics of each TIME component (tumor, stroma, and immune) can be characterized by both non-spatial/compositional and spatial properties. Non-spatial/compositional properties encompass abundance, variation, shape, and extremeness. Abundance, represented by mean (μ), reflects the overall prevalence of a given TIME component across the WSI. Variation, quantified by variance (σ2), describes the degree of fluctuation or heterogeneity of the feature throughout the WSI. The shape property, which includes skewness (γ) and concentration parameters (αtumor, αstroma, and αimmune), provides information about the distribution of intensity values, indicating where the majority of the TIME intensity map is concentrated. Extremeness, characterized by kurtosis (κ), 5th quantile (q5), and 95th quantile (q95), measures the presence and magnitude of outlier values in the TIME feature intensity, highlighting rare but potentially significant regions.

In addition to these non-spatial/compositional properties, spatial characteristics are also crucial. In this study, we assess spatial organization through hotspotting, smoothness, and heterogeneity properties, quantified by spatial autocorrelation Moran's I, spatial GP length scale , and spatial Gini index , respectively. Spatial hotspotting measures the degree to which similar values of a TIME feature cluster together spatially, thereby identifying regions of high (hot spots) and low (cold spots) concentration across the WSI. Spatial smoothness captures how gradually or abruptly a TIME feature changes across the tissue, quantified by the length scale , where a larger length scale indicates smoother, more gradual transitions. Spatial heterogeneity quantifies the inequality in the spatial distribution of a TIME feature using the spatial Gini index , with higher values indicating greater disparity and unevenness across the WSI.

Together, these landscape parameters provide a comprehensive and reproducible framework for summarizing both the composition and spatial architecture of the TIME, thereby supporting objective and nuanced histopathological assessment.### Clinicopathological analyses

We also performed clinical correlation analyses between landscape parameters and other variables, including age, gender, T, N, M, AJCC stages, and MSI status. Spearman's rank correlation was used for both continuous variables, like age, and ordinal variables such as T, N, and AJCC stages due to its ability to handle nonlinear relationships and rank-based data effectively. For categorical variables, the Wilcoxon rank-sum test was applied to gender and M status, as it is suitable for comparing two independent groups. The Kruskal–Wallis test was used for MSI status, allowing for the comparison of more than two groups when the data are non-normally distributed.### Prognostic landscape parameters

To investigate prognostic landscape parameters, we conducted an in-depth analysis of data from 411 patients in the TCGA-COAD dataset. The cohort was randomly divided into training and testing sets in a 70:30 ratio, ensuring a robust framework for model evaluation. We employed Elastic Net Cox proportion hazards regression and Elastic Net logistic regression models to pinpoint key prognostic parameters and clinical variables, including age, gender, and TNM stages. CV was utilized to fine-tune the balance between Lasso/L1 and Ridge/L2 penalties, exploring ratios from 0.0 to 1.0 in increments of 0.1.

Risk scores were computed for the test set, with model performance assessed using accuracy for ordinal outcomes (N and M stages) and the concordance index (C-index) for censored outcomes (PFS and OS). To ascertain feature importance, we extracted variables with non-zero coefficients and incorporated them into conventional multivariable Cox or logistic regression models, facilitating the evaluation of their collective impact on clinical outcomes. Tumor stage was consolidated into two categories—early (T1–T2) and advanced (T3–T4) T-stage—to account for limited data availability in the T1 subgroup.

We conducted complementary univariate analyses using Kaplan–Meier and receiver operating characteristic (ROC) analyses to evaluate the prognostic significance of landscape parameters. Kaplan–Meier analyses were performed to assess OS and PFS based on each landscape parameter, whereas ROC analyses were applied to evaluate their predictive power for N and M stages. To determine optimal cutpoints for landscape parameters, we utilized the survcutpoint() function from the survminer_ R package, which employs maximally selected rank statistics. In our analyses, we used the default standardized log-rank test statistic as the ranking metric. Kaplan–Meier analyses were subsequently conducted to compare groups with low and high values of each landscape parameter, as defined by these cutpoints.### Validation of prognostic landscape parameters in in-house validation cohort

Given the matched case–control design of our study cohort, which concentrated on metastasis outcomes and had limited mortality follow-up data, we prioritized N stage and M stage as primary outcomes. To minimize the false-discovery rate (FDR), our analysis centered on landscape parameters previously demonstrated to be significant in relation to N and M stages within the Prognostic Landscape Parameters section. We conducted both univariate and multivariate analyses, adjusting for clinical covariates such as age, gender, T stage, and MSI status. This comprehensive approach allowed us to evaluate the independent and combined effects of landscape parameters and clinical factors, providing a nuanced understanding of their roles in predicting metastasis outcomes.### Analysis platforms

The model training process and model prediction were performed in Python version 3.8.19 (Python software foundation) using PyTorch version 2.4.0, Torchvision version 0.19.0, and CUDA12.1 libraries (Meta Platforms, Inc.), run on cluster Graphics Processing Units Tesla V100 with a 32GB memory. Other analyses including regression, survival, and bioinformatics were conducted in R version 4.3.1 (The R Foundation, Vienna, Austria).

Method overview

This study employs a comprehensive approach to analyze histopathological WSIs from colon cancer patients using the TIMEL framework, which integrates DL and statistical analysis to assess TIME heterogeneity. The methods have been summarized below:1.Data collection: The research utilizes two cohorts: a discovery cohort from the TCGA-COAD dataset for development of spatial cell-type maps and initial survival analysis of landscape descriptors and an in-house validation cohort from Dartmouth–Hitchcock Medical Center (DHMC) for validation of metastasis-related outcomes. The discovery cohort consists of 411 WSIs after excluding suboptimal samples (due to duplication, poor image quality, or absence of identifiable tumor cells), with 10 WSIs used for training and cross-validation (CV) of cell-level models requiring substantial pathologist annotation. The validation cohort includes 108 patients, with slides stained and imaged at high resolution (40×; Leica Aperio GT450). Manual and automated annotations are performed to identify tissue types and tumor regions, with fine-grained histology annotations covering various cellular/matrix components.2.Cell-level, ROI-level & slide-level approaches: The TIMEL framework's DL approach classifies histological patterns at cellular resolution within WSIs, whereas the downstream statistical approach summarizes these classifications into patient-level landscape parameters that summarize tissue heterogeneity. The DL model is trained using mini-patches and employs pretrained neural networks for feature extraction, followed by CV to ensure robust model performance. A set of 30 landscape parameters were derived to characterize the TIME, capturing both non-spatial/compositional and spatial properties.3.Internal validation against pathologist-annotated spatial TSR/TIL maps: These descriptors were correlated against pathologist-derived metrics, such as local tumor–stromal ratio (TSR) and tumor-infiltrating lymphocyte (TIL) estimates, to ensure practical relevance.4.Patient-level validation: Further analyses explored the interdependencies among landscape parameters, their correlations with clinical variables, and their prognostic significance through univariate, multivariate, and feature selection analyses across both discovery and validation cohorts.

Samples and cohorts

Fig. 1A illustrates the overall study design.

Development/Discovery cohort: We extracted data for 461 patients from the TCGA-COAD dataset. Upon review of 459 diagnostic slides, 48 were excluded due to duplication, poor image quality, or absence of identifiable tumor cells. The remaining 411 WSIs formed the basis of our analysis. Of these, nearly 50,000 mini-patches from within 10 WSIs were randomly selected and annotated for use in cell-level training, CV, and model selection. All 411 WSIs were subsequently utilized for inference tasks. It is important to note that whereas the development cohort includes a small number of cases from the discovery cohort (10 out of 411 WSIs; 2.4%), the inference tasks are methodologically distinct on a different scale (patient-level) from the training phase (cell-level). The H&E WSIs for TCGA-COAD were obtained via the GDC portal (https://portal.gdc.cancer.gov) using various scanners (including Leica GT450/AT2/CS2 and Hamamatsu S60/S360). Clinical data were obtained from cBioPortal (https://www.cbioportal.org) and the GDC portal, and included the following parameters: age, gender, TNM stage, AJCC stage, disease-free survival (DFS), progression-free survival (PFS), and overall survival (OS).

In-house validation cohort: We included 108 CRC patients from DHMC (2016–2019) with IRB approval. Some of this cohort was part of our previous study.[17], [18] Slides were stained with H&E and imaged at 40× resolution using Leica Aperio AT2/GT450. Clinicopathological parameters collected included age, gender, microsatellite instability (MSI) status, and AJCC pTNM stages. Table 1 details the clinicopathological differences between cohorts.

Full individual patient data for both cohorts, including our calculated landscape parameters, were reported in Supplementary Table S1.

Manual annotations, automated tissue detection, and patchification

Two types of annotations were used: regional annotations defining macroscopic tumor regions that set the spatial bounds of analysis, and fine-grained histology annotations nested within these regions to label microscopic cellular and microarchitectural components. Mini-patches were extracted from fine-grained annotations for DL model development, whereas the spatial distribution of segmented microscopic tissue areas was quantified more broadly within the bounds of the regional tumor annotations.

Fine-grained histology annotation: Fine-grained tissue and cellular/matrix annotations (Fig. 1B) were performed by a pathologist, encompassing a total of 49,479 mini-patches. Each mini-patch had a resolution of 22.4 × 22.4 μm2, equivalent to approximately 90 pixels2, and could cover between 1 and 5 cells. The annotation classes included tumor cells (n = 10,000), collagenous stroma (n = 10,000), immune cells (n = 8190), necrosis (n = 6708), blood (n = 3434), fat (n = 6247), and whitespace (n = 4900). These annotations were performed across 10 WSIs randomly selected from the development cohort. To ensure accuracy, the annotations were re-reviewed by two board-certified pathologists.

Cancer regional annotation: To ensure analysis was confined to tumor regions, macroarchitectural regions such as the tumor bed areas (Fig. 1B) were manually annotated using polygonal tools in QuPath across 411 WSIs. These annotations, also performed by a pathologist, included multiple polygons for spatially distinct, non-contiguous tumor regions to ensure accuracy.

Automated tissue detection: To improve the precision of cancer region annotations, we integrated manually drawn polygons with tissue detection polygons automatically generated using the lazyslide package[19] (version 0.7.1). This hybrid approach preserves the cancer-specific focus of manual annotation while minimizing the inclusion of non-tissue (whitespace) areas through automated tissue detection (Fig. 1C).

Automated patchification: Annotated regions were subdivided into patches using the lazyslide algorithm. Each resulting patch measured 224 × 224 μm2. These patches were further partitioned into 100 non-overlapping mini-patches, each sized 22.4 × 22.4 μm2 (Fig. 1C). The resulting mini-patches served as input for the DL core of the TIMEL framework.

Development of the TIMEL framework

The TIMEL framework consists of two sequential components: a DL-based cell-level analysis and a statistical module for patient-level characterization. First, the DL model processes WSIs to identify key histological patterns, producing probability maps for each histological class. These maps are then fed into the statistical module, which aggregates the probabilities to compute WSI-level metrics—referred to as TIMEL landscape parameters. Practically, the framework takes a WSI as input and outputs quantitative parameters that summarize the histological composition and spatial heterogeneity of tumor regions (see Fig. 2A). The TIMEL parameters thus offer a structured, quantitative summary of tissue heterogeneity, supporting objective and reproducible histopathological assessment.

Description of deep learning algorithm

Model architecture: The following DL image classification models were used to classify mini-patches based on their assigned tissue structure (tumor, stroma, immune, necrosis, blood, fat, and whitespace): Inception V3,[20] DenseNet201,[21] ViT-base,[22] UNI[14], Prov-GigaPath,[16] and Virchow.[15] Each model was augmented with a 64-dimensional fully connected hidden layer followed by Batch Normalization and ReLU activation, and a final classification layer. All models were pretrained (i.e., initially trained on other datasets)—this was done to leverage existing feature representations and reduce the need for extensive task-specific training data. Inception V3, DenseNet201, and ViT-base were pretrained on the ImageNet-1K dataset, whereas UNI, Prov-GigaPath, and Virchow are histopathology foundation models pretrained using self-supervised learning on up to 1 million WSIs. These models were selected based on their widespread use and frequent comparison in prior histopathology studies.

Given the expectation that foundation models capture robust and generalizable histological features across institutions, they were hypothesized to outperform ImageNet-pretrained models, which served as baselines. Accordingly, all model parameters except the final classification head were frozen during training. This transfer learning strategy is standard in histopathology applications with limited labeled data, reducing computational cost and implementation challenges while mitigating catastrophic forgetting of general histological features. Preliminary empirical experiments during method development suggested that fine-tuning foundation model backbones degraded performance in the context of our study, motivating the adoption of a frozen-backbone approach. Comprehensive comparisons of architectures and fine-tuning strategies were considered beyond the scope of this study, which focused primarily on downstream statistical characterization and comparison of tissue spatial descriptors.

Feature extraction and model training protocol: Mini-patches were resized to 224 × 224 pixels and processed using pretrained neural networks for feature extraction and classification. Models were trained using the Adam optimizer with cross-entropy loss. A coarse hyperparameter search identified a batch size of 128, a learning rate of 1 × 10−5, and 15 training epochs as optimal. We employed 5-fold CV to train, select, and evaluate our models. In each fold, models were trained on mini-patches from 8 WSIs, totaling about 40,000 mini-patches, whereas the remaining 2 WSIs (approximately 10,000 mini-patches) were used for validation. Importantly, validation data did not influence early stopping. This process was repeated five times, allowing each fold to serve as the validation set once.

Model evaluation and selection: We selected the model with the best performance metrics—such as cross-entropy loss, accuracy, class-specific AUCs, precision, recall, and F1 scores—across all validation folds, with average class-specific AUC as the primary decision criterion. This top-performing model was then retrained on all 10 WSIs to prepare for subsequent statistical modeling tasks.### Development of statistical core

Generating histology probability maps: Within the annotated tumor regions, mini-patches were classified based on predicted cellular and microarchitectural components. We counted the number of mini-patches for each class within a larger ∼224 × 224 micron[2] (∼900 × 900 pixels2 at 40× resolution) regional patch to form a local estimate of the physical area of each component. These estimates were normalized into patch-wise probabilities for tumor, stroma, and immune proportions (Ptumor, Pstroma, and Pimmune) (see Fig. 2B). Ptumor is conceptually related to regional TSR, whereas Pimmune is indirectly associated with TIL.

Statistical descriptors (or landscape parameters): To comprehensively characterize the TIME, we derived 10 statistical descriptors for each of Ptumor, Pstroma, and Pimmune. These include seven non-spatial/compositional features—mean, variance, skewness, kurtosis, Dirichlet's concentration parameter, and the 5th and 95th percentiles as well as three spatial features—Moran's spatial autocorrelation, spatial Gaussian process (GP) length scale, and spatial Gini index. The selection of these descriptors was motivated by their ability to capture both the distributional properties and spatial organization of each cell type/stromal matrix element, providing a multidimensional view of TIME heterogeneity. Supplementary Table S2 details the abbreviations and definitions of these parameters, whereas Supplementary File 1 provides the corresponding motivation, formulas, and estimation algorithms. By adopting these “landscape parameters,” we expanded our analysis to 30 descriptors, representing a substantial increase from the 8 features used in our previous work.[13] The rationale for selecting these parameters is further discussed in the Interpretations of TIMEL Landscape Parameters section.

Description of deep learning algorithm

Model architecture: The following DL image classification models were used to classify mini-patches based on their assigned tissue structure (tumor, stroma, immune, necrosis, blood, fat, and whitespace): Inception V3,[20] DenseNet201,[21] ViT-base,[22] UNI[14], Prov-GigaPath,[16] and Virchow.[15] Each model was augmented with a 64-dimensional fully connected hidden layer followed by Batch Normalization and ReLU activation, and a final classification layer. All models were pretrained (i.e., initially trained on other datasets)—this was done to leverage existing feature representations and reduce the need for extensive task-specific training data. Inception V3, DenseNet201, and ViT-base were pretrained on the ImageNet-1K dataset, whereas UNI, Prov-GigaPath, and Virchow are histopathology foundation models pretrained using self-supervised learning on up to 1 million WSIs. These models were selected based on their widespread use and frequent comparison in prior histopathology studies.

Given the expectation that foundation models capture robust and generalizable histological features across institutions, they were hypothesized to outperform ImageNet-pretrained models, which served as baselines. Accordingly, all model parameters except the final classification head were frozen during training. This transfer learning strategy is standard in histopathology applications with limited labeled data, reducing computational cost and implementation challenges while mitigating catastrophic forgetting of general histological features. Preliminary empirical experiments during method development suggested that fine-tuning foundation model backbones degraded performance in the context of our study, motivating the adoption of a frozen-backbone approach. Comprehensive comparisons of architectures and fine-tuning strategies were considered beyond the scope of this study, which focused primarily on downstream statistical characterization and comparison of tissue spatial descriptors.

Feature extraction and model training protocol: Mini-patches were resized to 224 × 224 pixels and processed using pretrained neural networks for feature extraction and classification. Models were trained using the Adam optimizer with cross-entropy loss. A coarse hyperparameter search identified a batch size of 128, a learning rate of 1 × 10−5, and 15 training epochs as optimal. We employed 5-fold CV to train, select, and evaluate our models. In each fold, models were trained on mini-patches from 8 WSIs, totaling about 40,000 mini-patches, whereas the remaining 2 WSIs (approximately 10,000 mini-patches) were used for validation. Importantly, validation data did not influence early stopping. This process was repeated five times, allowing each fold to serve as the validation set once.

Model evaluation and selection: We selected the model with the best performance metrics—such as cross-entropy loss, accuracy, class-specific AUCs, precision, recall, and F1 scores—across all validation folds, with average class-specific AUC as the primary decision criterion. This top-performing model was then retrained on all 10 WSIs to prepare for subsequent statistical modeling tasks.

Development of statistical core

Generating histology probability maps: Within the annotated tumor regions, mini-patches were classified based on predicted cellular and microarchitectural components. We counted the number of mini-patches for each class within a larger ∼224 × 224 micron[2] (∼900 × 900 pixels2 at 40× resolution) regional patch to form a local estimate of the physical area of each component. These estimates were normalized into patch-wise probabilities for tumor, stroma, and immune proportions (Ptumor, Pstroma, and Pimmune) (see Fig. 2B). Ptumor is conceptually related to regional TSR, whereas Pimmune is indirectly associated with TIL.

Statistical descriptors (or landscape parameters): To comprehensively characterize the TIME, we derived 10 statistical descriptors for each of Ptumor, Pstroma, and Pimmune. These include seven non-spatial/compositional features—mean, variance, skewness, kurtosis, Dirichlet's concentration parameter, and the 5th and 95th percentiles as well as three spatial features—Moran's spatial autocorrelation, spatial Gaussian process (GP) length scale, and spatial Gini index. The selection of these descriptors was motivated by their ability to capture both the distributional properties and spatial organization of each cell type/stromal matrix element, providing a multidimensional view of TIME heterogeneity. Supplementary Table S2 details the abbreviations and definitions of these parameters, whereas Supplementary File 1 provides the corresponding motivation, formulas, and estimation algorithms. By adopting these “landscape parameters,” we expanded our analysis to 30 descriptors, representing a substantial increase from the 8 features used in our previous work.[13] The rationale for selecting these parameters is further discussed in the Interpretations of TIMEL Landscape Parameters section.

Association of histology probabilities with standard pathology metrics

To validate and contextualize our computational histology probabilities, we compared them with established pathologist-derived metrics: the TSR and TIL score. For TSR, representative high-stroma regions of interest (ROIs) were selected in accordance with standard pathology practice, typically at the tumor invasive front where stromal content is most prominent and tumor cells are present on all sides of the field.[23] Within these ROIs, TSR was calculated as the proportion of tumor area relative to the combined areas of tumor, stroma, and immune compartments, using the following formula:

For TILs, pathologists evaluate only the lymphocytes and plasma cells located within the stromal compartment of the invasive tumor, explicitly excluding immune infiltrates outside the tumor border, lymphoid aggregates, and lymphocytes within necrotic areas or tumor cell nests. The assessment involves estimating the percentage of the stromal area within the tumor that is occupied by mononuclear inflammatory cells, typically across the entire invasive tumor area rather than focusing on hotspots.[24] This estimation is generally performed at intermediate magnification, such as 200×–400× total magnification, to ensure a representative and reproducible evaluation:

In this study, landscape parameters derived from the probabilities of tumor (Ptumor), stroma (Pstroma), and immune (Pimmune) compartments serve as direct, albeit imperfect, surrogates for the corresponding metrics traditionally assessed by pathologists. For instance, the parameter closely resembles a pathologist-derived TSR estimate, which evaluates TSR in stroma-rich spots, whereas is conceptually similar to the quantification of TIL. This methodology enables a robust and quantitative comparison between digital and manual assessments, thereby supporting the clinical relevance and interpretability of our computational landscape parameters.

Comparative analyses of TSR, TIL, and landscape parameters

In this section, our hypothesis is that there are significant associations between human-evaluating TSR and TIL and the corresponding landscape parameters ( and ). We extracted previously scored TSR and TIL of the TCGA-COAD cohort from previous studies.[25], [26] TSR was scored, double-checked, and discussed for consensus by three human evaluators,[25] one of whom is a board-certified pathologist. TIL was computed, using the fraction of TIL-positive patches within all patches, using a DL model in a previous study. The —stromal percentage (which is practically 100% – TSR) and —TIL Spearman's correlation analyses were used to evaluate these postulated relationships.

To test our hypotheses, we performed Spearman's correlation analyses between: (i) and human-evaluated stromal percentage and (ii) and TIL score. These analyses were designed to evaluate the strength and direction of the relationships between human-assessed and computationally derived landscape parameters.

Interpretations of TIMEL landscape parameters

Interpreting landscape parameters is crucial for understanding the TIME. These statistical descriptors provide insights into the distribution and characteristics of various components within the tumor, thereby contextualizing the complex phenomena observed in TIME. For further specific details about the motivation, calculation process, abbreviations, and examples of each TIMEL landscape parameter, see Supplementary File 1 and Supplementary Table 2.

The spatial dynamics of each TIME component (tumor, stroma, and immune) can be characterized by both non-spatial/compositional and spatial properties. Non-spatial/compositional properties encompass abundance, variation, shape, and extremeness. Abundance, represented by mean (μ), reflects the overall prevalence of a given TIME component across the WSI. Variation, quantified by variance (σ2), describes the degree of fluctuation or heterogeneity of the feature throughout the WSI. The shape property, which includes skewness (γ) and concentration parameters (αtumor, αstroma, and αimmune), provides information about the distribution of intensity values, indicating where the majority of the TIME intensity map is concentrated. Extremeness, characterized by kurtosis (κ), 5th quantile (q5), and 95th quantile (q95), measures the presence and magnitude of outlier values in the TIME feature intensity, highlighting rare but potentially significant regions.

In addition to these non-spatial/compositional properties, spatial characteristics are also crucial. In this study, we assess spatial organization through hotspotting, smoothness, and heterogeneity properties, quantified by spatial autocorrelation Moran's I, spatial GP length scale , and spatial Gini index , respectively. Spatial hotspotting measures the degree to which similar values of a TIME feature cluster together spatially, thereby identifying regions of high (hot spots) and low (cold spots) concentration across the WSI. Spatial smoothness captures how gradually or abruptly a TIME feature changes across the tissue, quantified by the length scale , where a larger length scale indicates smoother, more gradual transitions. Spatial heterogeneity quantifies the inequality in the spatial distribution of a TIME feature using the spatial Gini index , with higher values indicating greater disparity and unevenness across the WSI.

Together, these landscape parameters provide a comprehensive and reproducible framework for summarizing both the composition and spatial architecture of the TIME, thereby supporting objective and nuanced histopathological assessment.

Clinicopathological analyses

We also performed clinical correlation analyses between landscape parameters and other variables, including age, gender, T, N, M, AJCC stages, and MSI status. Spearman's rank correlation was used for both continuous variables, like age, and ordinal variables such as T, N, and AJCC stages due to its ability to handle nonlinear relationships and rank-based data effectively. For categorical variables, the Wilcoxon rank-sum test was applied to gender and M status, as it is suitable for comparing two independent groups. The Kruskal–Wallis test was used for MSI status, allowing for the comparison of more than two groups when the data are non-normally distributed.

Prognostic landscape parameters

To investigate prognostic landscape parameters, we conducted an in-depth analysis of data from 411 patients in the TCGA-COAD dataset. The cohort was randomly divided into training and testing sets in a 70:30 ratio, ensuring a robust framework for model evaluation. We employed Elastic Net Cox proportion hazards regression and Elastic Net logistic regression models to pinpoint key prognostic parameters and clinical variables, including age, gender, and TNM stages. CV was utilized to fine-tune the balance between Lasso/L1 and Ridge/L2 penalties, exploring ratios from 0.0 to 1.0 in increments of 0.1.

Risk scores were computed for the test set, with model performance assessed using accuracy for ordinal outcomes (N and M stages) and the concordance index (C-index) for censored outcomes (PFS and OS). To ascertain feature importance, we extracted variables with non-zero coefficients and incorporated them into conventional multivariable Cox or logistic regression models, facilitating the evaluation of their collective impact on clinical outcomes. Tumor stage was consolidated into two categories—early (T1–T2) and advanced (T3–T4) T-stage—to account for limited data availability in the T1 subgroup.

We conducted complementary univariate analyses using Kaplan–Meier and receiver operating characteristic (ROC) analyses to evaluate the prognostic significance of landscape parameters. Kaplan–Meier analyses were performed to assess OS and PFS based on each landscape parameter, whereas ROC analyses were applied to evaluate their predictive power for N and M stages. To determine optimal cutpoints for landscape parameters, we utilized the survcutpoint() function from the survminer_ R package, which employs maximally selected rank statistics. In our analyses, we used the default standardized log-rank test statistic as the ranking metric. Kaplan–Meier analyses were subsequently conducted to compare groups with low and high values of each landscape parameter, as defined by these cutpoints.

Validation of prognostic landscape parameters in in-house validation cohort

Given the matched case–control design of our study cohort, which concentrated on metastasis outcomes and had limited mortality follow-up data, we prioritized N stage and M stage as primary outcomes. To minimize the false-discovery rate (FDR), our analysis centered on landscape parameters previously demonstrated to be significant in relation to N and M stages within the Prognostic Landscape Parameters section. We conducted both univariate and multivariate analyses, adjusting for clinical covariates such as age, gender, T stage, and MSI status. This comprehensive approach allowed us to evaluate the independent and combined effects of landscape parameters and clinical factors, providing a nuanced understanding of their roles in predicting metastasis outcomes.

Analysis platforms

The model training process and model prediction were performed in Python version 3.8.19 (Python software foundation) using PyTorch version 2.4.0, Torchvision version 0.19.0, and CUDA12.1 libraries (Meta Platforms, Inc.), run on cluster Graphics Processing Units Tesla V100 with a 32GB memory. Other analyses including regression, survival, and bioinformatics were conducted in R version 4.3.1 (The R Foundation, Vienna, Austria).

Results

Virchow accurately characterizes TIME features at micro-resolution

Supplementary File 2 provides detailed CV results for model selection. Virchow achieved the highest performance on mini-patch cell-level classification, with an overall accuracy of 0.906 (±0.020) and a loss of 0.386 (±0.063). Although ImageNet1K-pretrained models like ViT-H showed similar performance, with an accuracy of 0.905 (±0.029) and a loss of 0.431 (±0.067), Virchow excelled in distinguishing tumor, stroma, and immune cells. The ROC AUCs for Virchow in predicting various components were impressive: tumor (0.99), stroma (0.98), immune cells (0.98), necrosis (0.99), blood (0.97), space (1.00), and fat (0.99). The confusion matrix revealed minor misclassification between stroma and immune cells, likely due to the model's need to make definitive class decisions. The fine-tuned Virchow model was subsequently used to estimate statistical descriptors.### Comparative analyses of larger patch/region-based TSR, TIL estimates, and patient-level landscape parameters

Table 2 presents the results of correlation analyses between stromal percentage (calculated as 100% – TSR) and stromal landscape parameters, as well as between TIL scores and immune landscape parameters. Notably, the variances of the stromal () and immune () compartments were the most strongly associated features with stromal percentage (R = 0.47; adjusted p

Se abre en una nueva pestaña en la publicación original

Compartir y Discutir

Comentarios

¡Aún no hay comentarios. Sé el primero en comentar!

Enviar a mi oncólogo

Artículo: TIMEL: Deep learning-statistical integration reveals spatial stromal and immune signatures of aggressive colon cancer.

Autores: Le MK, Liu X, Vaickus L, Yao K, Levy J
Publicado: 2026-06-09
PMID: 42255201

Enlace: https://crcwarriors.org/article-detail.php?id=2352 | https://pubmed.ncbi.nlm.nih.gov/42255201/

¡Regístrate para usar esta función!

Crea una cuenta gratuita para enviar artículos científicos directamente a tu oncólogo y acceder a muchas más funcionalidades personalizadas.

Regístrate gratis