Research Overview
Conditional sampling through transport maps
Given observational data consisting of pairs of outcomes and covariates, a central problem is to estimate, and/or simulate from, the conditional density of the outcome for arbitrary values of the covariates. To use all available data efficiently, we proposed a two-step procedure. First, all data are pooled through a Wasserstein barycenter, which transforms the outcomes so that they become independent of their covariates; second, conditional samples for any given covariate value are recovered by mapping barycenter samples back through the corresponding inverse map. The framework has been generalized to account for confounding and for missing values in the covariates. Applications include meteorological time series, image processing, and treatment effect estimation, among others.
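As a rough illustration of the two-step idea, here is a minimal sketch for the simplest possible setting: a scalar outcome and a discrete covariate. It relies on the closed form of the 1D Wasserstein barycenter (its quantile function is the weighted average of the conditional quantile functions), not on the data-driven or normalizing-flow formulations developed in the papers listed below. All function names (`quantile_fn`, `cdf_fn`, `fit_barycenter`, `sample_conditional`) are hypothetical. Step 1 maps each outcome to the barycenter with z = Q_bar(F_k(y)); step 2 resamples the pooled z values and maps them back with Q_k(F_bar(z)) for a chosen group k.

```python
import numpy as np

# Hypothetical helpers: a minimal 1D illustration, not the authors' algorithm.

def quantile_fn(samples):
    """Empirical quantile function Q(u), linearly interpolated on u = 1/n, ..., 1."""
    s = np.sort(np.asarray(samples, dtype=float))
    u = np.arange(1, len(s) + 1) / len(s)
    return lambda q: np.interp(q, u, s)

def cdf_fn(samples):
    """Right-continuous empirical CDF F(v) = #{s_i <= v} / n."""
    s = np.sort(np.asarray(samples, dtype=float))
    return lambda v: np.searchsorted(s, v, side="right") / len(s)

def fit_barycenter(y, x):
    """Step 1: map each outcome y_i to z_i = Q_bar(F_{x_i}(y_i)).

    In 1D the Wasserstein barycenter has quantile function
    Q_bar(u) = sum_k w_k Q_k(u), and Q_bar o F_k is the optimal
    (monotone) map from the k-th conditional distribution to it.
    """
    groups = np.unique(x)
    w = {g: np.mean(x == g) for g in groups}
    Q = {g: quantile_fn(y[x == g]) for g in groups}
    F = {g: cdf_fn(y[x == g]) for g in groups}

    def Q_bar(u):
        return sum(w[g] * Q[g](u) for g in groups)

    z = np.empty(len(y), dtype=float)
    for g in groups:
        mask = x == g
        z[mask] = Q_bar(F[g](y[mask]))
    return z, Q

def sample_conditional(z, Q, g, n, rng):
    """Step 2: resample the pooled barycenter values and map them back to group g."""
    F_bar = cdf_fn(z)
    z_new = rng.choice(z, size=n)
    return Q[g](F_bar(z_new))

# Usage: two groups whose outcome distributions differ only by a shift.
rng = np.random.default_rng(0)
x = np.repeat([0, 1], 2000)
y = np.where(x == 0, rng.normal(0.0, 1.0, x.size), rng.normal(5.0, 1.0, x.size))
z, Q = fit_barycenter(y, x)                  # pooled values; both groups now share one distribution
y1 = sample_conditional(z, Q, 1, 5000, rng)  # approximate draws from y | x = 1
```

After step 1 the transformed outcomes of both groups follow the same distribution (centered near 2.5 here), which is the sense in which they no longer depend on the covariate; step 2 then recovers draws whose distribution matches the chosen group. With continuous or high-dimensional covariates this closed form is no longer available, which is where the optimal transport and flow-based formulations in the references come in.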
References
AD Lipnick, EG Tabak, G Trigila, Y Wang, X Ye, and W Zhao.
The Monge optimal transport barycenter problem (preprint)
(submitted) Information and Inference: A Journal of the IMA
EG Tabak, G Trigila, and W Zhao.
The Hierarchical Barycenter: Conditional Probability Simulation with Structured and Unobserved Covariates (preprint)
(submitted) Machine Learning
EG Tabak, G Trigila, and W Zhao.
The Conditional Barycenter Problem, its Data-Driven Formulation and its Solution through Normalizing Flows (preprint) (publication)
Communications in Mathematical Sciences (2024).
EG Tabak, G Trigila, and W Zhao.
Distributional Barycenter Problem through Data-Driven Flows (preprint) (publication)
Pattern Recognition (2022).
EG Tabak, G Trigila, and W Zhao.
Conditional Density Estimation and Simulation through Optimal Transport (preprint) (publication)
Machine Learning (2020).
EG Tabak, G Trigila, and W Zhao.
Data Driven Conditional Optimal Transport (poster) (preprint) (publication)
33rd Conference on Neural Information Processing Systems (NeurIPS) OTML Workshop (2019).
(Extended version in Machine Learning (2021))
Causal inference in single-cell transcriptomics
Gene regulatory network inference is a notoriously difficult problem in computational biology. Given single-cell gene expression levels measured across different time points and/or developmental ages, the goal is to infer the causal relationships between genes.
References
W Zhao, E Larschan, B Sandstede, and R Singh.
Optimal Transport Reveals Dynamic Gene Regulatory Networks via Gene Velocity Estimation (preprint) (publication)
PLOS Computational Biology (2025).
Shape analysis
Recent technological advances have enabled high-throughput imaging of biological shapes, ranging from contour curves of cancer cells to 3D protein structures measured by cryo-EM. Given such complex imaging data, the goal is to extract crucial information from these shapes and to model their variation across different spatial and temporal scales. Computational tools have been developed for applications including cell shapes (as 2D contours and 3D images), ribosome shapes, and beyond.
References
S Yu, A Kushner, E Teasell, W Zhao, S Srebnik, and K Dao Duc.
Advance Coarse Grained Model for Fast Simulation of Nascent Polypeptide Chain within the Ribosome (preprint)
(submitted) Biophysical Journal
W Zhao, S Maffa, and B Sandstede.
Data-driven Continuation of Patterns and their Bifurcations (poster) (preprint) (publication)
SIAM Journal on Applied Dynamical Systems (2025).