Toward next-generation machine learning and deep learning for spatial omics

Yanis Zirem, Isabelle Fournier, Michel Salzet


Brief Bioinform
https://doi.org/10.1093/bib/bbag131

Abstract

Spatial omics technologies generate high-dimensional, spatially resolved molecular data across transcripts, proteins, metabolites and lipids, requiring computational models that account for tissue topology, multi-scale organization, and experimental noise. Although machine-learning (ML) and deep-learning (DL) methods have rapidly proliferated to meet these demands, the field still lacks clear methodological guidance for selecting models adapted to specific spatial constraints and biological questions. Here, we provide a critical and comparative synthesis of ML/DL approaches across core spatial omics tasks, including batch-effect correction, resolution enhancement, tissue and cell segmentation, spatial domain discovery, cell-type deconvolution, and model interpretability. Classical ML methods, such as clustering, random forests, and other ensemble classifiers, offer interpretable baselines but are limited in their capacity to model non-linear spatial dependencies. Modern DL architectures, including convolutional and graph neural networks, transformers and generative models, capture complex spatial patterns and support multi-omics integration, yet face persistent challenges related to data scarcity, annotation burden, computational cost, and uncertainty estimation. Emerging strategies such as optimal transport, cross-modal attention, graph-linked embeddings, and foundation models enhance cross-modality alignment but require rigorous evaluation of their assumptions and operational constraints. We further discuss practical solutions, including self-supervised pretraining, federated learning and the adoption of standardized spatial data formats, to enhance scalability, reproducibility, and clinical readiness. Finally, we propose a decision framework that highlights when specific ML/DL families are most suitable according to data modality, spatial resolution, tissue architecture, and intended clinical application.
By integrating methodological critique with actionable recommendations, this review offers a roadmap for the reproducible, interpretable, and clinically translatable deployment of ML and DL models in spatial omics.