Research Overview

Biological systems are often observed as collections of probability distributions that vary across space, time, or condition. I develop geometric and statistical methods to model such distributional data and recover latent structure and dynamical mechanisms. My work connects optimal transport, structure learning, and mechanistic modeling, with applications in single-cell and spatial transcriptomics as well as biological shape analysis. My research program builds a unified geometric framework for learning structure and dynamics from distributional biological data.

Causal inference in single-cell and spatial transcriptomics

Gene regulatory network inference is a challenging problem in computational biology. In single-cell RNA-seq and spatial transcriptomics, gene expression is observed as distributions of cellular states that vary across time or spatial location. The goal is to recover latent dynamical and causal relationships among genes from such distributional data, often without explicit time-series measurements. My work develops geometric and mechanistic approaches that model these evolving distributions and infer regulatory dynamics consistent with observed invariant or spatial structure.

References

  1. W Zhao, E Fertig, and G Stein-O'Brien.
    CycleGRN: Inferring Gene Regulatory Networks from Cyclic Flow Dynamics in Single-Cell RNA-seq (preprint)
    RECOMB (2026).

  2. W Zhao, A Plaza-Rodriguez, P Luanpaisanon, EX Wang, L Gyllingberg, R Singh, E Fertig, and G Stein-O'Brien.
    Inferring the regulation dynamics of oscillatory networks from scRNA-seq data (preprint)

  3. W Zhao, E Larschan, B Sandstede, and R Singh.
    Optimal Transport Reveals Dynamic Gene Regulatory Networks via Gene Velocity Estimation (preprint) (publication)
    PLOS Computational Biology (2025).

Shape analysis

Recent advances in technology have enabled high-throughput imaging of biological shapes, ranging from contour curves of cancer cells to 3D protein structures measured by cryo-EM. Given such complex imaging data, the goal is to extract key information from these shapes and model their variation across spatial and temporal scales. Computational tools have been developed for applications including 2D cell shape contours, 3D images, and ribosome shapes. These methods provide statistical tools for comparing and testing structural differences in geometric biological data represented as distributions.

References

  1. W Zhao, DJ Sutherland, and K Dao Duc.
    Fast and Interpretable Quantification of Biological Shape Heterogeneity via Stratified Wasserstein Kernel (preprint)

  2. S Yu, A Kushner, E Teasell, W Zhao, S Srebnik, and K Dao Duc.
    Advanced Coarse Grained Model for Fast Simulation of Nascent Polypeptide Chain within the Ribosome (preprint) (publication)
    Biophysical Journal (2025).

  3. W Zhao, S Maffa, and B Sandstede.
    Data-driven Continuation of Patterns and their Bifurcations (poster) (preprint) (publication)
    SIAM Journal on Applied Dynamical Systems (2025).

Modeling covariate-dependent probability distributions

Given observational data consisting of outcome-covariate pairs, a central problem is to estimate, and simulate from, the conditional density of the outcome at an arbitrary covariate value. To improve sampling efficiency, we proposed a two-step procedure that exploits all available data: first, all samples are pooled through a Wasserstein barycenter, so that the transported outcomes are independent of their covariates; second, the pooled samples are mapped back through the inverse transport map evaluated at the target covariate value. The framework has been generalized to account for confounding and missing covariate values. Applications include meteorological time series, image processing, and treatment effect estimation, among others.
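
As a rough illustration of the pooling idea, the sketch below works through a deliberately simplified toy case. It assumes a one-dimensional outcome, a categorical covariate, and a location-scale (affine) transport map, so the barycenter has a closed form. The helper names (`fit_group_stats`, `to_barycenter`, `simulate_conditional`) are hypothetical and are not taken from the papers listed below, which treat far more general maps and covariates.

```python
import numpy as np

def fit_group_stats(x, z):
    """Sample mean and standard deviation of the outcome at each covariate level."""
    levels = np.unique(z)
    mu = {k: x[z == k].mean() for k in levels}
    sd = {k: x[z == k].std() for k in levels}
    return mu, sd

def to_barycenter(x, z, mu, sd):
    """Push every sample to the barycenter with a per-level affine map.

    For 1D location-scale families, the W2 barycenter has mean and standard
    deviation equal to the weighted averages of the per-level values.
    """
    w = {k: np.mean(z == k) for k in mu}          # empirical weights of each level
    m_bar = sum(w[k] * mu[k] for k in mu)
    s_bar = sum(w[k] * sd[k] for k in mu)
    mu_z = np.array([mu[k] for k in z])
    sd_z = np.array([sd[k] for k in z])
    y = m_bar + s_bar * (x - mu_z) / sd_z         # covariate dependence removed
    return y, m_bar, s_bar

def simulate_conditional(y, m_bar, s_bar, mu, sd, z_star):
    """Invert the affine map at the target level to simulate outcomes given z_star."""
    return mu[z_star] + sd[z_star] * (y - m_bar) / s_bar

# Toy data (assumed for illustration): three covariate levels with different
# means and spreads.
rng = np.random.default_rng(0)
z = rng.integers(0, 3, size=3000)
true_mu = np.array([0.0, 2.0, 5.0])
true_sd = np.array([1.0, 0.5, 2.0])
x = true_mu[z] + true_sd[z] * rng.standard_normal(3000)

mu, sd = fit_group_stats(x, z)
y, m_bar, s_bar = to_barycenter(x, z, mu, sd)
x_star = simulate_conditional(y, m_bar, s_bar, mu, sd, z_star=2)
```

In this toy setting, all 3000 pooled samples contribute to the simulated outcomes at level 2, rather than only the samples originally observed at that level, which is the sense in which pooling can improve sampling efficiency.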

References

  1. AD Lipnick, EG Tabak, G Trigila, Y Wang, X Ye, and W Zhao.
    The Monge optimal transport barycenter problem (preprint)

  2. EG Tabak, G Trigila, and W Zhao.
    The Hierarchical Barycenter: Conditional Probability Simulation with Structured and Unobserved Covariates (preprint)

  3. EG Tabak, G Trigila, and W Zhao.
    The Conditional Barycenter Problem, its Data-Driven Formulation and its Solution through Normalizing Flows (preprint) (publication)
    Communications in Mathematical Sciences (2024).

  4. EG Tabak, G Trigila, and W Zhao.
    Distributional Barycenter Problem through Data-Driven Flows (preprint) (publication)
    Pattern Recognition (2022).

  5. EG Tabak, G Trigila, and W Zhao.
    Conditional Density Estimation and Simulation through Optimal Transport (preprint) (publication)
    Machine Learning (2020).

  6. EG Tabak, G Trigila, and W Zhao.
    Data Driven Conditional Optimal Transport (poster) (preprint) (publication)
    33rd Conference on Neural Information Processing Systems (NeurIPS) OTML Workshop (2019). (Extended version in Machine Learning (2021))