Our research focuses on the design and development of bioinformatics and statistical approaches to the challenges of drawing biological inferences from large-scale, high-throughput proteomics data, and on their application to biological problems. To this end, we develop algorithms for peak detection and quantification, identification of structure in multivariate data, stochastic time-course modeling to extract dynamic features, construction of protein networks, and error control in the resulting inferences. In collaboration with our experimentalist colleagues, we apply these techniques to a variety of systems for systematic studies of post-translational modifications, proteome dynamics, signal transduction, and mass informatics. Our goal is to facilitate the identification of functional dysregulation associated with changes in the state of a biological system. Examples of these applications are described below.

We applied stochastic Gaussian Process (GP) modeling to infer changes in proteome dynamics. With our collaborators, we are studying changes in mitochondrial proteins caused by non-alcoholic fatty liver disease (NAFLD) and by induced heart failure in rats. The experiments combine heavy water labeling with liquid chromatography-mass spectrometry (LC-MS). The animals are metabolically labeled with deuterium by providing heavy water in their diet and are sacrificed at predetermined time points; their organs are harvested and mitochondria are isolated. We approximate the rate of protein turnover by the rate of deuterium incorporation. The time course of the relative isotope fractions is used as input to the GP model we developed to extract protein turnover rates. Compared with traditional exponential curve fitting, GP modeling doubles the number of proteins whose turnover rates can be measured.
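The GP model itself is not specified here, so the following is only a generic sketch in Python with NumPy of the two ideas the paragraph contrasts. It assumes the traditional first-order incorporation curve y = A(1 - exp(-kt)) for the relative isotope fraction (k standing in for the turnover rate, A for the plateau), and a textbook GP regression with a squared-exponential (RBF) kernel, a constant prior mean, and hand-set hyperparameters. The function names, the sampling days, and the synthetic data are hypothetical; this is not the published model.

```python
import numpy as np


def fit_first_order_rate(t, y, k_grid=None):
    """Least-squares fit of the first-order curve y = A * (1 - exp(-k t)).

    For a fixed rate k the best plateau A has a closed form, so the rate is
    found by scanning a log-spaced grid of candidate k values."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if k_grid is None:
        k_grid = np.logspace(-3, 1, 4001)              # candidate rates (1 / time unit)
    G = 1.0 - np.exp(-np.outer(k_grid, t))             # one basis curve per candidate k
    A = (G @ y) / np.sum(G ** 2, axis=1)               # closed-form plateau for each k
    sse = np.sum((y - A[:, None] * G) ** 2, axis=1)    # residual sum of squares per k
    i = int(np.argmin(sse))
    return k_grid[i], A[i]


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between two sets of time points."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(t, y, t_new, length_scale=5.0, signal_var=1.0, noise_var=1e-3):
    """Posterior mean and latent-function variance of GP regression at t_new."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    mu = y.mean()                                      # constant prior mean
    K = rbf_kernel(t, t, length_scale, signal_var) + noise_var * np.eye(t.size)
    L = np.linalg.cholesky(K)                          # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mu))
    Ks = rbf_kernel(t_new, t, length_scale, signal_var)
    mean = mu + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = signal_var - np.sum(v ** 2, axis=0)          # prior variance minus explained part
    return mean, var


if __name__ == "__main__":
    days = np.array([0, 1, 2, 4, 7, 10, 14, 21, 28], dtype=float)  # hypothetical sampling days
    rng = np.random.default_rng(1)
    frac = 0.5 * (1 - np.exp(-0.15 * days)) + rng.normal(0, 0.01, days.size)  # synthetic data
    print("exponential fit (k, A):", fit_first_order_rate(days, frac))
    dense = np.linspace(0, 28, 57)
    mean, var = gp_posterior(days, frac, dense, noise_var=1e-4)
    print("first-order fit to GP mean (k, A):", fit_first_order_rate(dense, mean))
```

Fitting the first-order curve to the GP posterior mean, as in the usage lines, is just one hypothetical way to turn a smoothed time course into a rate; the GP also returns a posterior variance that a fuller treatment could propagate into the rate estimate.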

We studied changes in signal transduction pathways that accompany the epithelial-mesenchymal transition (EMT) of human small airway epithelial cells. Although the mechanisms of the transition itself have been studied extensively, few studies have investigated the systems-level effects of EMT on signaling networks. We used mixed-effects modeling to build a computational model of phosphoprotein signaling data comparing human small airway epithelial cells (hSAECs) with their EMT-transformed counterparts across a series of perturbations with 8 ligands and 5 inhibitors, revealing previously uncharacterized changes in signaling in the EMT state. Network topology maps constructed for the two cellular states differed significantly, including a link between GSK-3α and SMAD2. The model also predicted a loss of p38 mitogen-activated protein kinase (MAPK)-independent HSP27 signaling, which we validated experimentally. We further characterized the relationship between HSP27 and STAT3 signaling and determined that the loss of HSP27 following EMT is only partially responsible for the downregulation of STAT3. These rewired connections are potential therapeutic targets for reversing EMT and restoring a normal phenotype to the respiratory mucosa.
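The model specification is not given here. As a rough illustration of what a mixed-effects comparison of two cell states can look like, the following Python/NumPy sketch assumes a random-intercept model for a single phosphoprotein readout: a fixed effect for cell state (EMT vs. parental hSAEC) plus a random intercept shared by all measurements taken under the same perturbation. The grouping, the maximum-likelihood fit obtained by profiling the variance ratio over a grid, and the name fit_random_intercept are assumptions for illustration, not the published model.

```python
import numpy as np


def fit_random_intercept(y, emt, group, lam_grid=None):
    """Maximum-likelihood fit of  y = b0 + b1*emt + u[group] + e,
    with u ~ N(0, tau2) per perturbation group and e ~ N(0, sigma2).

    The variance ratio lam = tau2 / sigma2 is profiled over a grid; for each
    lam the fixed effects (b0, b1) are the generalized least squares solution."""
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.asarray(emt, dtype=float)])  # intercept, cell state
    _, g = np.unique(group, return_inverse=True)
    Z = np.zeros((n, g.max() + 1))
    Z[np.arange(n), g] = 1.0                         # group-membership indicators
    ZZt = Z @ Z.T
    if lam_grid is None:
        lam_grid = np.logspace(-3, 3, 121)
    best = None
    for lam in lam_grid:
        V = np.eye(n) + lam * ZZt                    # Cov(y) / sigma2
        W = np.linalg.solve(V, np.column_stack([X, y]))
        Vi_X, Vi_y = W[:, :2], W[:, 2]
        beta = np.linalg.solve(X.T @ Vi_X, X.T @ Vi_y)
        r = y - X @ beta
        sigma2 = r @ (Vi_y - Vi_X @ beta) / n        # r' V^{-1} r / n
        _, logdet = np.linalg.slogdet(V)
        loglik = -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + n)
        if best is None or loglik > best["loglik"]:
            best = {"loglik": loglik, "beta": beta,
                    "sigma2": sigma2, "tau2": lam * sigma2}
    return best
```

Here beta[1] estimates the average shift between cell states after accounting for perturbation-to-perturbation variation. A model aimed at detecting rewiring would also need ligand and inhibitor effects and their interactions with cell state; in practice a dedicated package such as statsmodels (MixedLM) would usually replace the hand-rolled grid search.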

We developed a method to detect post-translational modifications in high-mass-accuracy MS spectra. We exploited the discreteness of amino acid masses to probe the entire mass axis in an unbiased manner and identify the regions that are densely populated by unmodified peptides. Although it has long been known that not all mass regions are populated by peptides, actually mapping the peptide mass distribution has not been feasible, because the number of possible peptide sequences grows exponentially with peptide length (as 20^n for a peptide of n residues). We developed a recursive algorithm that bypasses sequence generation and generates amino acid compositions directly. As a result, we have been able to map the "peptide mass axis" up to 3.5 kDa, the upper mass limit often used in proteomics. We located the peaks and valleys (forbidden or quiet zones) in the mass distributions and showed that post-translational modifications, such as phosphorylation and glycosylation, produce distributions separate from those of unmodified peptides. We used this property to predict the amount of phosphoproteins in a sample from the precursor peptide masses alone, without relying on peptide fragmentation or database searches. This provides an alternative approach to evaluating sample preparation. In another study, we established that data-dependent acquisition can be modeled as sampling from a single, well-defined peak. To obtain this distribution, we introduced a new quantity, which we termed peak deviation. We showed that, unlike the traditionally used mass defect, peak deviations form a unimodal distribution whose characteristics are related to the properties of the peptides in the sample.
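The recursive composition algorithm is not detailed here; the following Python sketch only illustrates the general idea of enumerating compositions instead of sequences. It assumes unmodified residues with standard monoisotopic masses and a plain depth-first recursion that assigns a count to each residue type in turn, pruning as soon as the mass limit is exceeded. Each composition stands for n!/(c_1! ... c_20!) sequences of identical mass, which is why working with compositions is so much cheaper. The function names and bin width are hypothetical, and because this version lists every composition, it is only practical for small mass limits.

```python
from collections import Counter
from math import factorial, prod

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide (termini)


def enumerate_compositions(max_mass, residues=RESIDUE_MASS):
    """Recursively list every amino acid composition (multiset of residues)
    whose neutral peptide mass is <= max_mass, without generating sequences."""
    names = list(residues)
    masses = [residues[a] for a in names]
    counts = [0] * len(names)
    found = []

    def extend(i, mass):
        if i == len(names):                # every residue type has a count
            if any(counts):                # skip the empty composition
                comp = {a: c for a, c in zip(names, counts) if c}
                found.append((comp, mass + WATER))
            return
        c = 0
        while mass + c * masses[i] + WATER <= max_mass:   # prune over-limit branches
            counts[i] = c
            extend(i + 1, mass + c * masses[i])
            c += 1
        counts[i] = 0

    extend(0, 0.0)
    return found


def sequence_count(composition):
    """Number of distinct sequences sharing one composition: n! / prod(c_i!)."""
    n = sum(composition.values())
    return factorial(n) // prod(factorial(c) for c in composition.values())


def mass_histogram(compositions, bin_width=0.01):
    """Count compositions per mass bin; empty bins mark candidate quiet zones."""
    return Counter(int(m // bin_width) for _, m in compositions)


if __name__ == "__main__":
    comps = enumerate_compositions(400.0)
    n_seq = sum(sequence_count(c) for c, _ in comps)
    print(len(comps), "compositions represent", n_seq, "sequences")
```

Weighting each histogram bin by sequence_count instead of by 1 gives the sequence-level mass distribution without ever constructing a sequence.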