Research Summary.

Our research focuses on developing bioinformatics and statistical approaches to the challenges of biological inference from high-throughput proteomics data, and on applying them to biomedical problems. To this end, we develop algorithms for peak detection and quantification, identification of structure in multivariate data, stochastic time-course modeling to extract dynamical features, construction of protein networks, and error control in the resulting inferences. In collaboration with our experimentalist colleagues, we apply these techniques in various systems for systematic studies of post-translational modifications, proteome dynamics, signal transduction, and mass informatics. Our goal is to promote the identification of functional dysregulation associated with changes in the state of a biological system. A central unit of analysis in our research is the mass spectrum.

Outlined below are examples of our research work.

1. Protein homeostasis (proteostasis) is achieved via continuous synthesis and degradation of cellular proteins. Proteostasis is important for proper protein function, and it is dysregulated in many diseases. Metabolic labeling followed by LC-MS is a powerful technique for studying proteostasis on a large scale (thousands of proteins). We develop bioinformatics techniques for studying proteostasis in order to elucidate biological processes that are associated with diseases. Our techniques include signal processing for feature detection and quantification, time-course modeling, pattern recognition, and pathway analysis. The protein interactions figure below shows the proteins of the Glycolysis/Gluconeogenesis (blue) and Hypertrophic cardiomyopathy (red) pathways in the matrix of all quantified proteins (white) in the mouse heart proteome. Bioinformatic processing of dynamic proteome data indicates that the correlation between the pathways is altered during the transition to heart hypertrophy (Borzou et al., Bioinformatics, 2019;35(22):4748-4753).
2. Proteome Dynamics using heavy water metabolic labeling and LC-MS

3. Timepoint selection in heavy water metabolic labeling, Sadygov et al.

We developed a formula-based stochastic simulation strategy for timepoint selection (TPS) in in vivo studies with heavy water metabolic labeling and LC-MS. We model the rate constant (lognormal), measurement error (Laplace), peptide length (gamma), relative abundance (RA) of the monoisotopic peak (beta regression), and the number of exchangeable hydrogens (gamma regression). The parameters of the distributions are determined from the corresponding empirical probability density functions of a large-scale murine heart proteome dataset. The models are used to simulate estimation of the rate constant, and labeling timepoints are selected to minimize the root-mean-squared error of the estimates.
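
The simulation idea can be illustrated with a minimal Python sketch. It is not the published implementation, and it is heavily simplified: only the rate constant (lognormal) and the measurement error (Laplace) are sampled, while the starting and asymptotic RA of the monoisotopic peak are fixed constants rather than drawn from peptide-level regression models. The single-exponential labeling curve, the grid-search estimator, all numeric parameter values, and the names simulate_rmse and select_timepoints are assumptions made for illustration.

```python
import numpy as np

def simulate_rmse(timepoints, n_peptides=500, seed=0,
                  log_k_mean=np.log(0.1), log_k_sd=0.8,
                  noise_scale=0.01, ra0=0.5, ra_inf=0.3):
    """RMSE of estimated rate constants for one candidate sampling design.

    All default values are illustrative assumptions, not fitted parameters.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(timepoints, dtype=float)

    # True rate constants (1/day) drawn from a lognormal distribution.
    k_true = rng.lognormal(mean=log_k_mean, sigma=log_k_sd, size=n_peptides)

    # Assumed labeling model: RA decays exponentially from ra0 to ra_inf.
    clean = ra_inf + (ra0 - ra_inf) * np.exp(-np.outer(k_true, t))

    # Laplace-distributed measurement error on each RA value.
    noisy = clean + rng.laplace(0.0, noise_scale, size=clean.shape)

    # Least-squares estimate of k by grid search (keeps the sketch NumPy-only).
    k_grid = np.logspace(-3, 1, 400)
    model = ra_inf + (ra0 - ra_inf) * np.exp(-np.outer(k_grid, t))
    sse = ((noisy[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
    k_hat = k_grid[np.argmin(sse, axis=1)]

    return float(np.sqrt(np.mean((k_hat - k_true) ** 2)))

def select_timepoints(candidates, **kwargs):
    """Return the name of the design with the lowest simulated RMSE, plus all scores."""
    scores = {name: simulate_rmse(tp, **kwargs) for name, tp in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical candidate designs (labeling durations in days).
candidates = {
    "early": [0, 1, 2, 3, 4, 5],
    "spread": [0, 1, 3, 7, 14, 28],
}
best, scores = select_timepoints(candidates)
print(best, scores)
```

Each candidate design is scored against the same simulated peptide population because the seed is fixed, so differences in RMSE reflect the timepoints rather than sampling noise. A fuller version would also sample peptide length and the number of exchangeable hydrogens to set the RA values for each peptide.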