|
Research Overview

Biological systems are often observed as collections of probability distributions that vary across space, time, or condition. I develop geometric and statistical methods to model such distributional data and to recover latent structure and dynamical mechanisms. My work connects optimal transport, structure learning, and mechanistic modeling, with applications in single-cell and spatial transcriptomics as well as biological shape analysis. Together, these directions build a unified geometric framework for learning structure and dynamics from distributional biological data.

Causal inference in single-cell and spatial transcriptomics

Gene regulatory network inference is a challenging problem in computational biology. In single-cell RNA-seq and spatial transcriptomics, gene expression is observed as distributions of cellular states that vary across time or spatial location. The goal is to recover latent dynamical and causal relationships among genes from such distributional data, often without explicit time-series measurements. My work develops geometric and mechanistic approaches that model these evolving distributions and infer regulatory dynamics consistent with the observed invariant or spatial structure.

Shape analysis

Recent advances in imaging technology have enabled high-throughput acquisition of biological shape data, ranging from contour curves of cancer cells to 3D protein structures measured by cryo-EM. Given such complex imaging data, the goal is to extract the informative features of those shapes and model their variation across spatial and temporal scales. Computational tools have been developed for applications including 2D cell shape contours, 3D images, ribosome shapes, and beyond. These methods provide statistical tools for comparing and testing structural differences in geometric biological data represented as distributions.

Modeling covariate-dependent probability distributions

Given observational data as pairs of outcomes and covariates, a central problem is to estimate, and/or simulate from, the conditional density of the outcome at arbitrary covariate values. To enhance sampling efficiency, we proposed a two-step procedure that fully exploits the available data; its first step pools all data together through a Wasserstein barycenter, so that the transported outcomes are independent of their covariates. The framework has been generalized to account for confounding and missing covariate values. Applications include meteorological time series, image processing, treatment effect estimation, and beyond.
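
As a rough illustration of the pooling idea, the sketch below assumes a one-dimensional outcome and a discrete covariate (a group label). In that special case the Wasserstein-2 barycenter has a closed form: its quantile function is the weighted average of the group quantile functions. The function names, the synthetic Gaussian data, and the inverse map used for simulation are all assumptions made for illustration; this is not the implementation used in the work described above.

```python
import numpy as np

def barycenter_quantiles(groups, weights, probs):
    """Quantiles of the 1D Wasserstein-2 barycenter on a grid of probability
    levels: the weighted average of the group quantile functions."""
    q = np.stack([np.quantile(g, probs) for g in groups])  # (n_groups, n_probs)
    return weights @ q

def to_barycenter(x, group, probs, bary_q):
    """Push samples x of one group onto the barycenter via the monotone map
    Q_bar(F_group(x)), where F_group is the group's empirical CDF."""
    ranks = np.searchsorted(np.sort(group), x, side="right") / len(group)
    return np.interp(ranks, probs, bary_q)

def from_barycenter(y, group, probs, bary_q):
    """Hypothetical inverse step: map barycenter samples back to one group's
    conditional distribution via Q_group(F_bar(y))."""
    levels = np.interp(y, bary_q, probs)
    return np.quantile(group, levels)

# Synthetic data (assumed): three covariate groups with different outcome laws.
rng = np.random.default_rng(0)
params = [(-2.0, 0.5), (0.0, 1.0), (3.0, 2.0)]  # (mean, std) per group
groups = [rng.normal(loc=m, scale=s, size=2000) for m, s in params]

weights = np.array([len(g) for g in groups], dtype=float)
weights /= weights.sum()
probs = np.linspace(0.005, 0.995, 199)
bary_q = barycenter_quantiles(groups, weights, probs)

# Pooling: after transport, every group follows (approximately) the same law,
# so the transported outcome no longer depends on the group label.
pooled = [to_barycenter(g, g, probs, bary_q) for g in groups]
all_pooled = np.concatenate(pooled)

# Simulation: reuse ALL pooled samples to draw from the third group's law.
simulated = from_barycenter(all_pooled, groups[2], probs, bary_q)
```

The design point is that `all_pooled` contains three times as many samples as any single group, and each of them can be mapped back to any group of interest. Continuous covariates and multivariate outcomes need a learned transport map instead of empirical quantiles, which this sketch does not attempt.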
|