My original academic training focused on mathematics and statistics. However, my research experience exposed me to various topics in statistical mechanics, biophysics, and systems biology, and I developed a solid understanding of these fields through coursework and reading. This trajectory has shaped my interest in interdisciplinary research that combines probability and statistics with these areas of natural science. Rather than applying conventional statistical methods to obtain ad hoc answers, my goal is to understand the physical and biological mechanisms behind the observed data by developing methodologies with solid physical interpretations.
My Research Projects:
Modeling and Inference of Intrinsic Noise in Gene Regulatory Networks. Biological processes within cells are driven by stochastic interactions between molecules. At the same time, sophisticated nonlinear and dynamical mechanisms play a key role in forming complex systems such as gene regulatory systems. In recent years, advances in measurement technology have allowed scientists to obtain direct quantitative evidence about the nature of biological processes at the cellular and molecular levels, notably the cell-to-cell stochastic variation in gene expression known as intrinsic noise. Still, conventional modeling and data-analysis approaches tend to assume that the relationships between different factors are linear and static, and they often overlook the physical laws that govern the biological processes. Starting as a postdoctoral scholar in Prof. Wing H. Wong’s Lab at Stanford University, I have investigated how to build a physical-model-oriented inference framework that incorporates the discrete, stochastic, nonlinear, and dynamical nature of cellular systems. In my work, the regulatory system is modeled as a multivariate birth-and-death process whose evolution is described by a master equation. I demonstrate that this approach can represent a highly detailed stochastic dynamical system and gives a more realistic description of the physical system than other schemes, including deterministic and stochastic differential equation approximations. Furthermore, I show that the full empirical distribution of gene expression measured at different perturbed steady states can be used to infer the unknown parameters and regulatory relations by the method of moments. This ongoing work sheds new light on single-cell gene expression data analysis and on the design of artificial genetic modules in synthetic biology.
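To make the birth-and-death description concrete, the following is a minimal sketch of the simplest possible case, not the multivariate framework itself. It assumes a single gene product with constant production rate k_birth and per-molecule degradation rate k_death (both hypothetical values chosen for illustration), whose master equation is dP(n)/dt = k_birth P(n-1) + k_death (n+1) P(n+1) - (k_birth + k_death n) P(n). The stationary distribution of this toy model is Poisson with mean k_birth / k_death, so the steady-state mean is a method-of-moments estimate of that ratio. The path is sampled exactly with the Gillespie algorithm.

```python
import random


def gillespie_birth_death(k_birth, k_death, n0, t_end, rng):
    """Sample one exact path of a one-species birth-death process.

    Birth propensity is k_birth (constant); death propensity is k_death * n.
    Returns a list of (copy_number, holding_time) pairs covering [0, t_end].
    """
    t, n = 0.0, n0
    path = []
    while True:
        a_birth = k_birth
        a_death = k_death * n
        a_total = a_birth + a_death
        dt = rng.expovariate(a_total)  # waiting time to the next reaction
        if t + dt >= t_end:
            path.append((n, t_end - t))  # truncate the last holding time
            return path
        path.append((n, dt))
        t += dt
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1


def time_averaged_moments(path):
    """Time-weighted mean and variance of the copy number along a path."""
    total = sum(dt for _, dt in path)
    mean = sum(n * dt for n, dt in path) / total
    second = sum(n * n * dt for n, dt in path) / total
    return mean, second - mean * mean


if __name__ == "__main__":
    rng = random.Random(0)
    path = gillespie_birth_death(20.0, 1.0, 20, 2000.0, rng)
    mean, var = time_averaged_moments(path)
    # For this toy model both moments should be close to k_birth / k_death.
    print("estimated k_birth / k_death:", mean, "variance:", var)
```

In this toy case the mean alone identifies only the ratio of the two rates; the point of using several perturbed steady states and higher moments of the full distribution is to constrain more parameters than a single mean can.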
Stepwise Signal Analysis. Modern biologists use fluorescence spectroscopy to study microscopic systems. The fluorescent signals are often roughly stepwise, and shifts in average intensity indicate transitions of the system state. Determining the locations of these shifts is thus key to scientific inquiry. Similar problems exist in other disciplines and are known as “change-point problems” in the statistical literature. I formulated a maximum marginal likelihood estimator to detect the change-points in a signal and carried out an extensive investigation of the effect and choice of the prior distribution. I proposed treating every possible set of change-points as equally likely a priori and adopted an empirical Bayes approach to specify the prior distribution of the segment parameters. The estimator is asymptotically consistent and, in finite-sample studies and real data examples (DNA array CGH data and single-molecule enzyme reaction data), has been shown to be competitive with several existing methods.
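The idea of maximizing a marginal likelihood over change-point sets can be sketched as follows. This is an illustrative simplification under stated assumptions, not the estimator described above: each segment is taken to be Gaussian with a known noise variance sigma2, the segment mean has a Gaussian prior whose mean and variance are set crudely from the whole signal (a stand-in for the empirical Bayes step), and the function names are hypothetical. Because all change-point sets receive equal prior weight, the best set is the one maximizing the sum of segment log marginal likelihoods, which a dynamic program finds in O(N^2) time.

```python
import math


def segment_log_marginal(n, sum_d, sum_d2, sigma2, tau2):
    """Log marginal likelihood of one segment with its mean integrated out.

    The n points are y_i = mu + noise, noise ~ N(0, sigma2), mu ~ N(m0, tau2).
    sum_d and sum_d2 are the sums of (y_i - m0) and (y_i - m0)**2.
    """
    log_det = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * tau2)
    quad = (sum_d2 - tau2 / (sigma2 + n * tau2) * sum_d * sum_d) / sigma2
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def detect_change_points(y, sigma2):
    """Return the sorted start indices of every segment after the first."""
    n = len(y)
    # Crude empirical Bayes choice (an assumption of this sketch):
    # prior mean and variance taken from the pooled signal.
    m0 = sum(y) / n
    tau2 = sum((v - m0) ** 2 for v in y) / n
    # Prefix sums give each segment's sufficient statistics in O(1).
    c1, c2 = [0.0], [0.0]
    for v in y:
        d = v - m0
        c1.append(c1[-1] + d)
        c2.append(c2[-1] + d * d)
    best = [0.0] + [-math.inf] * n  # best[j]: best score for y[:j]
    prev = [0] * (n + 1)            # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            score = best[i] + segment_log_marginal(
                j - i, c1[j] - c1[i], c2[j] - c2[i], sigma2, tau2)
            if score > best[j]:
                best[j], prev[j] = score, i
    # Backtrack through the stored segment starts.
    change_points = []
    j = n
    while j > 0:
        j = prev[j]
        if j > 0:
            change_points.append(j)
    return sorted(change_points)


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    signal = [rng.gauss(0.0, 1.0) for _ in range(50)]
    signal += [rng.gauss(4.0, 1.0) for _ in range(50)]
    print(detect_change_points(signal, sigma2=1.0))
```

No separate penalty on the number of segments is needed in this sketch: integrating out each segment mean already makes an unnecessary split cost likelihood, although with noisy data short spurious segments can still appear, which is one reason the choice of prior matters.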
Modeling and Inference of Single-Molecule Enzymatic Reactions. In classical theory, the mechanism of an enzymatic reaction is modeled as a three-state continuous-time Markov chain, which implies that successive enzymatic events are independent. However, recent observations of single enzyme molecules exhibit strong serial correlation. I have studied a multi-state stochastic network model that is capable of explaining the observed phenomena. The new model consists of three groups of states instead of the original three states. It reflects the physical reality that the molecular structure is constantly altered by random forces, and that the structural change in turn modifies molecular function. Furthermore, because the observed data contain only incomplete information about the system trajectories, I simplified the model by treating different types of transition rates either as constants or as gamma-distributed random variables, based on physical knowledge and my theoretical inquiry. This strategy significantly reduces the number of parameters, and fitting the reduced model with an optimization algorithm produces satisfactory results.
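The contrast between independent and serially correlated turnover events can be illustrated with a deliberately reduced toy model. It is not the multi-state network described above: it assumes only two conformations (slow and fast) with hypothetical catalytic rates k_slow and k_fast, an exponential waiting time per turnover, and a probability p_switch of changing conformation after each turnover. When the two rates are equal the waiting times are independent; when they differ and switching is rare, consecutive waiting times become positively correlated.

```python
import random


def simulate_turnover_times(k_slow, k_fast, p_switch, n_events, rng):
    """Waiting times between turnovers for a two-conformation toy enzyme."""
    times = []
    fast = rng.random() < 0.5  # start in either conformation
    for _ in range(n_events):
        rate = k_fast if fast else k_slow
        times.append(rng.expovariate(rate))
        # Occasional conformational change between turnovers.
        if rng.random() < p_switch:
            fast = not fast
    return times


def lag1_autocorrelation(x):
    """Sample correlation between consecutive entries of x."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var


if __name__ == "__main__":
    rng = random.Random(1)
    fluctuating = simulate_turnover_times(1.0, 10.0, 0.05, 20000, rng)
    constant = simulate_turnover_times(2.0, 2.0, 0.05, 20000, rng)
    print("fluctuating rate:", lag1_autocorrelation(fluctuating))
    print("constant rate:   ", lag1_autocorrelation(constant))
```

The memory in this sketch comes entirely from the slowly changing conformation, which is the qualitative mechanism the fuller model captures with groups of states and randomly distributed transition rates.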
Optional Pólya Tree. The analysis of high-dimensional data is another defining challenge in modern biology, as well as in other disciplines involving “big data.” The optional Pólya tree, a nonparametric Bayesian density estimation method, has shown great potential in this area. Together with collaborators in Wong’s Lab at Stanford University, I proposed a new algorithm for computing the optional Pólya tree estimator. Both theoretical analysis and simulation studies confirm that it reduces the computation time dramatically.