Publications
Differential requirements for myeloid leukemia IFN-gamma conditioning determine graft-versus-leukemia resistance and sensitivity.
J Clin Invest. 2017 Jun 30;127(7):2765-2776. doi: 10.1172/JCI85736. Epub 2017 Jun
Abstract: The graft-versus-leukemia (GVL) effect in allogeneic hematopoietic stem cell transplantation (alloSCT) is potent against chronic phase chronic myelogenous leukemia (CP-CML), but blast crisis CML (BC-CML) and acute myeloid leukemias (AML) are GVL resistant. To understand GVL resistance, we studied GVL against mouse models of CP-CML, BC-CML, and AML generated by the transduction of mouse BM with fusion cDNAs derived from human leukemias. Prior work has shown that CD4+ T cell-mediated GVL against CP-CML and BC-CML required intact leukemia MHCII; however, stem cells from both leukemias were MHCII negative. Here, we show that CP-CML, BC-CML, and AML stem cells upregulate MHCII in alloSCT recipients. Using gene-deficient leukemias, we determined that BC-CML and AML MHC upregulation required IFN-gamma stimulation, whereas CP-CML MHC upregulation was independent of both the IFN-gamma receptor (IFN-gammaR) and the IFN-alpha/beta receptor IFNAR1. Importantly, IFN-gammaR-deficient BC-CML and AML were completely resistant to CD4- and CD8-mediated GVL, whereas IFN-gammaR/IFNAR1 double-deficient CP-CML was fully GVL sensitive. Mouse AML and BC-CML stem cells were MHCI+ without IFN-gamma stimulation, suggesting that IFN-gamma sensitizes these leukemias to T cell killing by mechanisms other than MHC upregulation. Our studies identify the requirement of IFN-gamma stimulation as a mechanism for BC-CML and AML GVL resistance, whereas independence from IFN-gamma renders CP-CML more GVL sensitive, even with a lower-level alloimmune response.
Interferon-gamma Drives Treg Fragility to Promote Anti-tumor Immunity.
Cell. 2017 Jun 1;169(6):1130-1141.e11. doi: 10.1016/j.cell.2017.05.005. Epub 2017
Abstract: Regulatory T cells (Tregs) are a barrier to anti-tumor immunity. Neuropilin-1 (Nrp1) is required to maintain intratumoral Treg stability and function but is dispensable for peripheral immune tolerance. Treg-restricted Nrp1 deletion results in profound tumor resistance due to Treg functional fragility. Thus, identifying the basis for Nrp1 dependency and the key drivers of Treg fragility could help to improve immunotherapy for human cancer. We show that a high percentage of intratumoral NRP1(+) Tregs correlates with poor prognosis in melanoma and head and neck squamous cell carcinoma. Using a mouse model of melanoma where Nrp1-deficient (Nrp1(-/-)) and wild-type (Nrp1(+/+)) Tregs can be assessed in a competitive environment, we find that a high proportion of intratumoral Nrp1(-/-) Tregs produce interferon-gamma (IFNgamma), which drives the fragility of surrounding wild-type Tregs, boosts anti-tumor immunity, and facilitates tumor clearance. We also show that IFNgamma-induced Treg fragility is required for response to anti-PD1, suggesting that cancer therapies promoting Treg fragility may be efficacious.
LAG3 limits regulatory T cell proliferation and function in autoimmune diabetes.
Sci Immunol. 2017 Mar 31;2(9). pii: 2/9/eaah4569. doi:
Abstract: Inhibitory receptors (IRs) are pivotal in controlling T cell homeostasis because of their intrinsic regulation of conventional effector T (Tconv) cell proliferation, viability, and function. However, the role of IRs on regulatory T cells (Tregs) remains obscure because they could be required for suppressive activity and/or limit Treg function. We evaluated the role of lymphocyte activation gene 3 (LAG3; CD223) on Tregs by generating mice in which LAG3 is absent on the cell surface of Tregs in a murine model of type 1 diabetes. Unexpectedly, mice that lacked LAG3 expression on Tregs exhibited reduced autoimmune diabetes, consistent with enhanced Treg proliferation and function. Whereas the transcriptional landscape of peripheral wild-type (WT) and Lag3-deficient Tregs was largely comparable, substantial differences between intra-islet Tregs were evident and involved a subset of genes and pathways that promote Treg maintenance and function. Consistent with these observations, Lag3-deficient Tregs outcompeted WT Tregs in the islets but not in the periphery in cotransfer experiments because of enhanced interleukin-2-signal transducer and activator of transcription 5 signaling and increased Eos expression. Our study suggests that LAG3 intrinsically limits Treg proliferation and function at inflammatory sites, promotes autoimmunity in a chronic autoimmune-prone environment, and may contribute to Treg insufficiency in autoimmune disease.
Assessing significance in a Markov chain without mixing.
Proc Natl Acad Sci U S A. 2017 Mar 14;114(11):2860-2864. doi:
Abstract: We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a [Formula: see text] outlier compared with the sampled ranks (its rank is in the bottom [Formula: see text] of sampled ranks), then this observation should correspond to a p value of [Formula: see text]. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an [Formula: see text]-outlier on the walk is significant at [Formula: see text] under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at [Formula: see text] is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.
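The walk-based test described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the lazy walk on a cycle, and the value function are all hypothetical choices made for illustration. The sketch only computes the outlier fraction of the presented state along the walk; the significance level attached to that fraction is the one proved in the paper and is not reproduced here.

```python
import random


def outlier_fraction(start, step, value, num_steps, rng):
    """Walk num_steps from start and return the fraction of visited states
    (the start state included) whose value is <= value(start)."""
    v0 = value(start)
    state = start
    count = 1  # the presented state always counts against itself
    for _ in range(num_steps):
        state = step(state, rng)
        if value(state) <= v0:
            count += 1
    return count / (num_steps + 1)


def lazy_cycle_step(n):
    """Hypothetical example chain: lazy symmetric walk on a cycle of n states.
    Moves are symmetric, so the chain is reversible with a uniform
    stationary distribution."""
    def step(state, rng):
        return (state + rng.choice((-1, 0, 1))) % n
    return step


rng = random.Random(0)
n = 1000
step = lazy_cycle_step(n)


def value(s):
    # Illustrative value function: distance from state 0 around the cycle.
    return min(s, n - s)


# State 0 has the smallest possible value, so it should look like an outlier
# along a walk started from it.
eps = outlier_fraction(0, step, value, 10_000, rng)
print(f"presented state is a {eps:.4f}-outlier on its own walk")
```

A small fraction means that few states visited by the walk score as low as the presented state. Only the step function has to respect reversibility; the test as described in the abstract needs no bound on mixing time.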
Wnt signaling regulates hepatobiliary repair following cholestatic liver injury in mice.
Hepatology. 2016 Nov;64(5):1652-1666. doi: 10.1002/hep.28774. Epub 2016 Sep 26.
Abstract: Hepatic repair is directed chiefly by the proliferation of resident mature epithelial cells. Furthermore, if predominant injury is to cholangiocytes, the hepatocytes can transdifferentiate to cholangiocytes to assist in the repair and vice versa, as shown by various fate-tracing studies. However, the molecular bases of reprogramming remain elusive. Using two models of biliary injury where repair occurs through cholangiocyte proliferation and hepatocyte transdifferentiation to cholangiocytes, we identify an important role of Wnt signaling. First we identify up-regulation of specific Wnt proteins in the cholangiocytes. Next, using conditional knockouts of Wntless and Wnt coreceptors low-density lipoprotein-related protein 5/6, transgenic mice expressing stable beta-catenin, and in vitro studies, we show a role of Wnt signaling through beta-catenin in hepatocyte to biliary transdifferentiation. Last, we show that specific Wnts regulate cholangiocyte proliferation, but in a beta-catenin-independent manner. CONCLUSION: Wnt signaling regulates hepatobiliary repair after cholestatic injury in both beta-catenin-dependent and independent manners. (Hepatology 2016;64:1652-1666).
Modeling a human hepatocellular carcinoma subset in mice through coexpression of met and point-mutant beta-catenin.
Hepatology. 2016 Nov;64(5):1587-1605. doi: 10.1002/hep.28601. Epub 2016 May 28.
Abstract: Hepatocellular cancer (HCC) remains a significant therapeutic challenge due to its poorly understood molecular basis. In the current study, we investigated two independent cohorts of 249 and 194 HCC cases for any combinatorial molecular aberrations. Specifically we assessed for simultaneous HMET expression or hMet activation and catenin beta1 gene (CTNNB1) mutations to address any concomitant Met and Wnt signaling. To investigate cooperation in tumorigenesis, we coexpressed hMet and beta-catenin point mutants (S33Y or S45Y) in hepatocytes using sleeping beauty transposon/transposase and hydrodynamic tail vein injection and characterized tumors for growth, signaling, gene signatures, and similarity to human HCC. Missense mutations in exon 3 of CTNNB1 were identified in subsets of HCC patients. Irrespective of amino acid affected, all exon 3 mutations induced similar changes in gene expression. Concomitant HMET overexpression or hMet activation and CTNNB1 mutations were evident in 9%-12.5% of HCCs. Coexpression of hMet and mutant-beta-catenin led to notable HCC in mice. Tumors showed active Wnt and hMet signaling with evidence of glutamine synthetase and cyclin D1 positivity and mitogen-activated protein kinase/extracellular signal-regulated kinase, AKT/Ras/mammalian target of rapamycin activation. Introduction of dominant-negative T-cell factor 4 prevented tumorigenesis. The gene expression of mouse tumors in hMet-mutant beta-catenin showed high correlation, with subsets of human HCC displaying concomitant hMet activation signature and CTNNB1 mutations. CONCLUSION: We have identified cooperation of hMet and beta-catenin activation in a subset of HCC patients and modeled this human disease in mice with a significant transcriptomic intersection; this model will provide novel insight into the biology of this tumor and allow us to evaluate novel therapies as a step toward precision medicine. (Hepatology 2016;64:1587-1605).
Characterization of Gonadotrope Secretoproteome Identifies Neurosecretory Protein VGF-derived Peptide Suppression of Follicle-stimulating Hormone Gene Expression.
J Biol Chem. 2016 Sep 30;291(40):21322-21334. doi: 10.1074/jbc.M116.740365. Epub
Abstract: Reproductive function is controlled by the pulsatile release of hypothalamic gonadotropin-releasing hormone (GnRH), which regulates the expression of the gonadotropins luteinizing hormone and FSH in pituitary gonadotropes. Paradoxically, Fshb gene expression is maximally induced at lower frequency GnRH pulses, which provide a very low average concentration of GnRH stimulation. We studied the role of secreted factors in modulating gonadotropin gene expression. Inhibition of secretion specifically disrupted gonadotropin subunit gene regulation but left early gene induction intact. We characterized the gonadotrope secretoproteome and global mRNA expression at baseline and after Galphas knockdown, which has been found to increase Fshb gene expression (1). We identified 1077 secreted proteins or peptides, 19 of which showed mRNA regulation by GnRH or/and Galphas knockdown. Among several novel secreted factors implicated in Fshb gene regulation, we focused on the neurosecretory protein VGF. Vgf mRNA, whose gene has been implicated in fertility (2), exhibited high induction by GnRH and depended on Galphas. In contrast with Fshb induction, Vgf induction occurred preferentially at high GnRH pulse frequency. We hypothesized that a VGF-derived peptide might regulate Fshb gene induction. siRNA knockdown or extracellular immunoneutralization of VGF augmented Fshb mRNA induction by GnRH. GnRH stimulated the secretion of the VGF-derived peptide NERP1. NERP1 caused a concentration-dependent decrease in Fshb gene induction. These findings implicate a VGF-derived peptide in selective regulation of the Fshb gene. Our results support the concept that signaling specificity from the cell membrane GnRH receptor to the nuclear Fshb gene involves integration of intracellular signaling and exosignaling regulatory motifs.
Hundreds of Genes Experienced Convergent Shifts in Selective Pressure in Marine Mammals.
Mol Biol Evol. 2016 Sep;33(9):2182-92. doi: 10.1093/molbev/msw112. Epub 2016 Jun
Abstract: Mammal species have made the transition to the marine environment several times, and their lineages represent one of the classical examples of convergent evolution in morphological and physiological traits. Nevertheless, the genetic mechanisms of their phenotypic transition are poorly understood, and investigations into convergence at the molecular level have been inconclusive. While past studies have searched for convergent changes at specific amino acid sites, we propose an alternative strategy to identify those genes that experienced convergent changes in their selective pressures, visible as changes in evolutionary rate specifically in the marine lineages. We present evidence of widespread convergence at the gene level by identifying parallel shifts in evolutionary rate during three independent episodes of mammalian adaptation to the marine environment. Hundreds of genes accelerated their evolutionary rates in all three marine mammal lineages during their transition to aquatic life. These marine-accelerated genes are highly enriched for pathways that control recognized functional adaptations in marine mammals, including muscle physiology, lipid-metabolism, sensory systems, and skin and connective tissue. The accelerations resulted from both adaptive evolution as seen in skin and lung genes, and loss of function as in gustatory and olfactory genes. In regard to sensory systems, this finding provides further evidence that reduced senses of taste and smell are ubiquitous in marine mammals. Our analysis demonstrates the feasibility of identifying genes underlying convergent organism-level characteristics on a genome-wide scale and without prior knowledge of adaptations, and provides a powerful approach for investigating the physiological functions of mammalian genes.
A Temporal Switch in the Germinal Center Determines Differential Output of Memory B and Plasma Cells.
Immunity. 2016 Jan 19;44(1):116-130. doi: 10.1016/j.immuni.2015.12.004. Epub 2016
Abstract: There is little insight into or agreement about the signals that control differentiation of memory B cells (MBCs) and long-lived plasma cells (LLPCs). By performing BrdU pulse-labeling studies, we found that MBC formation preceded the formation of LLPCs in an adoptive transfer immunization system, which allowed for a synchronized Ag-specific response with homogeneous Ag-receptor, yet at natural precursor frequencies. We confirmed these observations in wild-type (WT) mice and extended them with germinal center (GC) disruption experiments and variable region gene sequencing. We thus show that the GC response undergoes a temporal switch in its output as it matures, revealing that the reaction engenders both MBC subsets with different immune effector function and, ultimately, LLPCs at largely separate points in time. These data demonstrate the kinetics of the formation of the cells that provide stable humoral immunity and therefore have implications for autoimmunity, for vaccine development, and for understanding long-term pathogen resistance.
A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes.
Cytometry A. 2016 Jan;89(1):16-21. doi: 10.1002/cyto.a.22732. Epub 2015 Oct 8.
Abstract: The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of computational methods for identifying cell populations in multidimensional flow cytometry data. Here we report the results of FlowCAP-IV where algorithms from seven different research groups predicted the time to progression to AIDS among a cohort of 384 HIV+ subjects, using antigen-stimulated peripheral blood mononuclear cell (PBMC) samples analyzed with a 14-color staining panel. Two approaches (FlowReMi.1 and flowDensity-flowType-RchyOptimyx) provided statistically significant predictive value in the blinded test set. Manual validation of submitted results indicated that unbiased analysis of single cell phenotypes could reveal unexpected cell types that correlated with outcomes of interest in high dimensional flow cytometry datasets.
The center for causal discovery of biomedical knowledge from big data.
J Am Med Inform Assoc. 2015 Nov;22(6):1132-6. doi: 10.1093/jamia/ocv059. Epub
Abstract: The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers.
Human Dendritic Cell Response Signatures Distinguish 1918, Pandemic, and Seasonal H1N1 Influenza Viruses.
J Virol. 2015 Oct;89(20):10190-205. doi: 10.1128/JVI.01523-15. Epub 2015 Jul 29.
Abstract: Influenza viruses continue to present global threats to human health. Antigenic drift and shift, genetic reassortment, and cross-species transmission generate new strains with differences in epidemiology and clinical severity. We compared the temporal transcriptional responses of human dendritic cells (DC) to infection with two pandemic (A/Brevig Mission/1/1918, A/California/4/2009) and two seasonal (A/New Caledonia/20/1999, A/Texas/36/1991) H1N1 influenza viruses. Strain-specific response differences included stronger activation of NF-kappaB following infection with A/New Caledonia/20/1999 and a unique cluster of genes expressed following infection with A/Brevig Mission/1/1918. A common antiviral program showing strain-specific timing was identified in the early DC response and found to correspond with reported transcript changes in blood during symptomatic human influenza virus infection. Comparison of the global responses to the seasonal and pandemic strains showed that a dramatic divergence occurred after 4 h, with only the seasonal strains inducing widespread mRNA loss. IMPORTANCE: Continuously evolving influenza viruses present a global threat to human health; however, these host responses display strain-dependent differences that are incompletely understood. Thus, we conducted a detailed comparative study assessing the immune responses of human DC to infection with two pandemic and two seasonal H1N1 influenza strains. We identified in the immune response to viral infection both common and strain-specific features. Among the strain-specific elements were a time shift of the interferon-stimulated gene response, selective induction of NF-kappaB signaling by one of the seasonal strains, and massive RNA degradation as early as 4 h postinfection by the seasonal, but not the pandemic, viruses. These findings illuminate new aspects of the distinct differences in the immune responses to pandemic and seasonal influenza viruses.
CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations.
Bioinformatics. 2015 May 15;31(10):1584-91. doi: 10.1093/bioinformatics/btv015.
Abstract: MOTIVATION: Identifying alterations in gene expression associated with different clinical states is important for the study of human biology. However, clinical samples used in gene expression studies are often derived from heterogeneous mixtures with variable cell-type composition, complicating statistical analysis. Considerable effort has been devoted to modeling sample heterogeneity, and presently, there are many methods that can estimate cell proportions or pure cell-type expression from mixture data. However, there is no method that comprehensively addresses mixture analysis in the context of differential expression without relying on additional proportion information, which can be inaccurate and is frequently unavailable. RESULTS: In this study, we consider a clinically relevant situation where neither accurate proportion estimates nor pure cell expression is of direct interest, but where we are rather interested in detecting and interpreting relevant differential expression in mixture samples. We develop a method, Cell-type COmputational Differential Estimation (CellCODE), that addresses the specific statistical question directly, without requiring a physical model for mixture components. Our approach is based on latent variable analysis and is computationally transparent; it requires no additional experimental data, yet outperforms existing methods that use independent proportion measurements. CellCODE has few parameters that are robust and easy to interpret. The method can be used to track changes in proportion, improve power to detect differential expression and assign the differentially expressed genes to the correct cell type.
Low-variance RNAs identify Parkinson's disease molecular signature in blood.
Mov Disord. 2015 May;30(6):813-21. doi: 10.1002/mds.26205. Epub 2015 Mar 18.
Abstract: The diagnosis of Parkinson's disease (PD) is usually not established until advanced neurodegeneration leads to clinically detectable symptoms. Previous blood PD transcriptome studies show low concordance, possibly resulting from the use of microarray technology, which has high measurement variation. The Leucine-rich repeat kinase 2 (LRRK2) G2019S mutation predisposes to PD. Using preclinical and clinical studies, we sought to develop a novel statistically motivated transcriptomic-based approach to identify a molecular signature in the blood of Ashkenazi Jewish PD patients, including LRRK2 mutation carriers. Using a digital gene expression platform to quantify 175 messenger RNA (mRNA) markers with low coefficients of variation (CV), we first compared whole-blood transcript levels in mouse models (1) overexpressing wild-type (WT) LRRK2, (2) overexpressing G2019S LRRK2, (3) lacking LRRK2 (knockout), and (4) in WT controls. We then studied an Ashkenazi Jewish cohort of 34 symptomatic PD patients (both WT LRRK2 and G2019S LRRK2) and 32 asymptomatic controls. The expression profiles distinguished the four mouse groups with different genetic background. In patients, we detected significant differences in blood transcript levels both between individuals differing in LRRK2 genotype and between PD patients and controls. Discriminatory PD markers included genes associated with innate and adaptive immunity and inflammatory disease. Notably, gene expression patterns in levodopa-treated PD patients were significantly closer to those of healthy controls in a dose-dependent manner. We identify whole-blood mRNA signatures correlating with LRRK2 genotype and with PD disease state. This approach may provide insight into pathogenesis and a route to early disease detection.
Hybrid Bayesian-rank integration approach improves the predictive power of genomic dataset aggregation.
Bioinformatics. 2015 Jan 15;31(2):209-15. doi: 10.1093/bioinformatics/btu518.
Abstract: MOTIVATION: Modern molecular technologies allow the collection of large amounts of high-throughput data on the functional attributes of genes. Often multiple technologies and study designs are used to address the same biological question such as which genes are overexpressed in a specific disease state. Consequently, there is considerable interest in methods that can integrate across datasets to present a unified set of predictions. RESULTS: An important aspect of data integration is being able to account for the fact that datasets may differ in how accurately they capture the biological signal of interest. While many methods to address this problem exist, they always rely either on dataset internal statistics, which reflect data structure and not necessarily biological relevance, or external gold standards, which may not always be available. We present a new rank aggregation method for data integration that requires neither external standards nor internal statistics but relies on Bayesian reasoning to assess dataset relevance. We demonstrate that our method outperforms established techniques and significantly improves the predictive power of rank-based aggregations. We show that our method, which does not require an external gold standard, provides reliable estimates of dataset relevance and allows the same set of data to be integrated differently depending on the specific signal of interest. AVAILABILITY: The method is implemented in R and is freely available at http://www.pitt.edu/~mchikina/BIRRA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Homer1 alternative splicing is regulated by gonadotropin-releasing hormone and modulates gonadotropin gene expression.
Mol Cell Biol. 2014 May;34(10):1747-56. doi: 10.1128/MCB.01401-13. Epub 2014 Mar
Abstract: Hypothalamic gonadotropin-releasing hormone (GnRH) plays a critical role in reproductive physiology by regulating follicle-stimulating hormone (FSH) and luteinizing hormone (LH) gene expression in the pituitary. Analysis of gonadotrope deep-sequencing data identified a global regulation of pre-mRNA splicing by GnRH. Homer1, a gene encoding a postsynaptic density scaffolding protein, was selected for further study. Homer1 expresses a short splice form, Homer1a, and more-abundant long transcripts Homer1b/c. GnRH induced a modest increase in Homer1b/c expression and a dramatic increase in the Homer1a splice form. G protein knockdown studies suggested that the Homer1 induction, but not the regulated splicing, was Galphaq/11 dependent. Phosphorylation of the splicing regulator SRp20 was found to be induced by GnRH. SRp20 depletion attenuated the GnRH-induced increase in the Homer1a-to-Homer1b/c ratio and modulated the effects of GnRH on FSHbeta and LHbeta expression. Homer1 gene knockdown resulted in increased GnRH-induced FSHbeta and LHbeta transcript levels. Furthermore, splice-form-specific reduction of Homer1b/c increased both FSHbeta and LHbeta mRNA induction, whereas reduction of Homer1a had the opposite effect on FSHbeta induction. These results indicate that the regulation of Homer1 splicing by GnRH contributes to gonadotropin gene control.
Increasing consistency of disease biomarker prediction across datasets.
PLoS One. 2014 Apr 16;9(4):e91272. doi: 10.1371/journal.pone.0091272. eCollection
Abstract: Microarray studies with human subjects often have limited sample sizes which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparations, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern.
Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge.
Bioinformatics. 2013 Nov 15;29(22):2892-9. doi: 10.1093/bioinformatics/btt492.
Abstract: MOTIVATION: After more than a decade since microarrays were used to predict phenotype of biological samples, real-life applications for disease screening and identification of patients who would best benefit from treatment are still emerging. The interest of the scientific community in identifying best approaches to develop such prediction models was reaffirmed in a competition style international collaboration called IMPROVER Diagnostic Signature Challenge whose results we describe herein. RESULTS: Fifty-four teams used public data to develop prediction models in four disease areas including multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease, and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of the quality of predictions, and best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the best overall three performers, as well as an R package that implements the approach of the best overall team. The analyses of model performance data submitted in the challenge as well as additional simulations that we have performed revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection and classifier type) is problem dependent; and (iii) for optimal results datasets and methods have to be carefully matched. Biomedical factors such as the disease severity and confidence in diagnostic were found to be associated with the misclassification rates across the different teams. AVAILABILITY: The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.
Critical assessment of automated flow cytometry data analysis techniques.
Nat Methods. 2013 Mar;10(3):228-38. doi: 10.1038/nmeth.2365. Epub 2013 Feb 10.
Abstract: Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
beta-catenin regulates GnRH-induced FSHbeta gene expression.
Mol Endocrinol. 2013 Feb;27(2):224-37. doi: 10.1210/me.2012-1310. Epub 2012 Dec
Abstract: The regulation of gonadotropin synthesis by GnRH plays an essential role in the neuroendocrine control of reproduction. The known signaling mechanisms involved in gonadotropin synthesis have been expanding. For example, involvement of beta-catenin in LHbeta induction by GnRH has been discovered. We examined the role of beta-catenin in FSHbeta gene expression in LbetaT2 gonadotrope cells. GnRH caused a sustained increase in nuclear beta-catenin levels, which was significantly reduced by c-Jun N-terminal kinase (JNK) inhibition. Small interfering RNA-mediated knockdown of beta-catenin mRNA demonstrated that induction of FSHbeta mRNA by GnRH depended on beta-catenin and that regulation of FSHbeta by beta-catenin occurred independently of the JNK-c-jun pathway. beta-Catenin depletion had no impact on FSHbeta mRNA stability. In LbetaT2 cells transfected with FSHbeta promoter luciferase fusion constructs, GnRH responsiveness was conferred by the proximal promoter (-944/-1) and was markedly decreased by beta-catenin knockdown. However, none of the T-cell factor/lymphoid enhancer factor binding sites in that region were required for promoter activation by GnRH. Chromatin immunoprecipitation further corroborated the absence of direct interaction between beta-catenin and the 1.8-kb FSHbeta promoter. To elucidate the mechanism for the beta-catenin effect, we analyzed approximately 1 billion reads of next-generation RNA sequencing beta-catenin knockdown assays and selected the nuclear cofactor breast cancer metastasis-suppressor 1-like (Brms1L) as one candidate for further study. Subsequent experiments confirmed that Brms1L mRNA expression was decreased by beta-catenin knockdown as well as by JNK inhibition. Furthermore, knockdown of Brms1L significantly attenuated GnRH-induced FSHbeta expression. Thus, our findings indicate that the expression of Brms1L depends on beta-catenin activity and contributes to FSHbeta induction by GnRH.
Involvement of histone demethylase LSD1 in short-time-scale gene expression changes during cell cycle progression in embryonic stem cells.
Mol Cell Biol. 2012 Dec;32(23):4861-76. doi: 10.1128/MCB.00816-12. Epub 2012 Oct
Abstract: The histone demethylase LSD1, a component of the CoREST (corepressor for element 1-silencing transcription factor) corepressor complex, plays an important role in the downregulation of gene expression during development. However, the activities of LSD1 in mediating short-time-scale gene expression changes have not been well understood. To reveal the mechanisms underlying these two distinct functions of LSD1, we performed genome-wide mapping and cellular localization studies of LSD1 and its substrate, dimethylated histone 3 lysine 4 (H3K4me2), in mouse embryonic stem cells (ES cells). Our results showed an extensive overlap between the LSD1 and H3K4me2 genomic regions and a correlation between the genomic levels of LSD1/H3K4me2 and gene expression, including many highly expressed ES cell genes. LSD1 is recruited to the chromatin of cells in the G(1)/S/G(2) phases and is displaced from the chromatin of M-phase cells, suggesting that LSD1 or H3K4me2 alternatively occupies LSD1 genomic regions during cell cycle progression. LSD1 knockdown by RNA interference or its displacement from the chromatin by antineoplastic agents caused an increase in the levels of a subset of LSD1 target genes. Taken together, these results suggest that cell cycle-dependent association and dissociation of LSD1 with chromatin mediates short-time-scale gene expression changes during embryonic stem cell cycle progression.
An effective statistical evaluation of ChIPseq dataset similarity.
Bioinformatics. 2012 Mar 1;28(5):607-13. doi: 10.1093/bioinformatics/bts009. Epub
Abstract: MOTIVATION: ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions, it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation. RESULTS: We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions which conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly. AVAILABILITY: Source code is available for download at http://sonorus.princeton.edu/IntervalStats/IntervalStats.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Misfolded proteins inhibit proliferation and promote stress-induced death in SV40-transformed mammalian cells.
FASEB J. 2012 Feb;26(2):766-77. doi: 10.1096/fj.11-186197. Epub 2011 Nov 2.
Abstract: Protein misfolding is implicated in neurodegenerative diseases and occurs in aging. However, the contribution of the misfolded ensembles to toxicity remains largely unknown. Here we introduce 2 primate cell models of destabilized proteins devoid of specific cellular functions and interactors, as bona fide misfolded proteins, allowing us to isolate the gain-of-function of non-native structures. Both GFP-degron and a mutant chloramphenicol-acetyltransferase fused to GFP (GFP-Delta9CAT) form perinuclear aggregates, are degraded by the proteasome, and colocalize with and induce the chaperone Hsp70 (HSPA1A/B) in COS-7 cells. We find that misfolded proteins neither significantly compromise chaperone-mediated folding capacity nor induce cell death. However, they do induce growth arrest in cells that are unable to degrade them and promote stress-induced death upon proteasome inhibition by MG-132 and heat shock. Finally, we show that overexpression of all heat-shock factor-1 (HSF1) and Hsp70 proteins, as well as wild-type and deacetylase-deficient (H363Y) SIRT1, rescue survival upon stress, implying a noncatalytic action of SIRT1 in response to protein misfolding. Our study establishes a novel model and extends our knowledge on the mechanism of the function-independent proteotoxicity of misfolded proteins in dividing cells.
Accurate quantification of functional analogy among close homologs.
PLoS Comput Biol. 2011 Feb 3;7(2):e1001074. doi: 10.1371/journal.pcbi.1001074.
Abstract: Correctly evaluating functional similarities among homologous proteins is necessary for accurate transfer of experimental knowledge from one organism to another, and is of particular importance for the development of animal models of human disease. While the fact that sequence similarity implies functional similarity is a fundamental paradigm of molecular biology, sequence comparison does not directly assess the extent to which two proteins participate in the same biological processes, and has limited utility for analyzing families with several paralogous members. Nevertheless, we show that it is possible to provide a cross-organism functional similarity measure in an unbiased way through the exclusive use of high-throughput gene-expression data. Our methodology is based on probabilistic cross-species mapping of functionally analogous proteins based on Bayesian integrative analysis of gene expression compendia. We demonstrate that even among closely related genes, our method is able to predict functionally analogous homolog pairs better than relying on sequence comparison alone. We also demonstrate that the landscape of functional similarity is often complex and that definitive "functional orthologs" do not always exist. Even in these cases, our method and the online interface we provide are designed to allow detailed exploration of sources of inferred functional similarity that can be evaluated by the user.
Global prediction of tissue-specific gene expression and context-dependent gene networks in Caenorhabditis elegans.
PLoS Comput Biol. 2009 Jun;5(6):e1000417. doi: 10.1371/journal.pcbi.1000417. Epub
Abstract: Tissue-specific gene expression plays a fundamental role in metazoan biology and is an important aspect of many complex diseases. Nevertheless, an organism-wide map of tissue-specific expression remains elusive due to difficulty in obtaining these data experimentally. Here, we leveraged existing whole-animal Caenorhabditis elegans microarray data representing diverse conditions and developmental stages to generate accurate predictions of tissue-specific gene expression and experimentally validated these predictions. These patterns of tissue-specific expression are more accurate than existing high-throughput experimental studies for nearly all tissues; they also complement existing experiments by addressing tissue-specific expression present at particular developmental stages and in small tissues. We used these predictions to address several experimentally challenging questions, including the identification of tissue-specific transcriptional motifs and the discovery of potential miRNA regulation specific to particular tissues. We also investigate the role of tissue context in gene function through tissue-specific functional interaction networks. To our knowledge, this is the first study producing high-accuracy predictions of tissue-specific expression and interactions for a metazoan organism based on whole-animal data.
The Sleipnir library for computational functional genomics.
Bioinformatics. 2008 Jul 1;24(13):1559-61. doi: 10.1093/bioinformatics/btn237.
Abstract: MOTIVATION: Biological data generation has accelerated to the point where hundreds or thousands of whole-genome datasets of various types are available for many model organisms. This wealth of data can lead to valuable biological insights when analyzed in an integrated manner, but the computational challenge of managing such large data collections is substantial. In order to mine these data efficiently, it is necessary to develop methods that use storage, memory and processing resources carefully. RESULTS: The Sleipnir C++ library implements a variety of machine learning and data manipulation algorithms with a focus on heterogeneous data integration and efficiency for very large biological data collections. Sleipnir allows microarray processing, functional ontology mining, clustering, Bayesian learning and inference and support vector machine tasks to be performed for heterogeneous data on scales not previously practical. In addition to the library, which can easily be integrated into new computational systems, prebuilt tools are provided to perform a variety of common tasks. Many tools are multithreaded for parallelization in desktop or high-throughput computing environments, and most tasks can be performed in minutes for hundreds of datasets using a standard personal computer. AVAILABILITY: Source code (C++) and documentation are available at http://function.princeton.edu/sleipnir and compiled binaries are available from the authors on request.
Role of Domain IV/S4 outermost arginines in gating of T-type calcium channels.
Pflugers Arch. 2005 Nov;451(2):349-61. doi: 10.1007/s00424-005-1407-5. Epub 2005
Abstract: The role of the outermost three charged residues of Domain IV/S4 in controlling gating of Ca(v)3.2 was investigated using single substitutions of each arginine with glutamine, cysteine, histidine, and lysine in a Flp-In-293 cell line, in which expression levels could be compared. Channel density, based on gating charge measurements, was ~125,000 channels/cell (10 fC/pF), except for R2Q and R3C, which expressed at lower levels. Channels substituted at Arg-1715 (R1C, R1Q, R1H) demonstrated such modest changes that a role in voltage sensing could not be determined. Arg-1718 (R2) made a contribution to activation voltage sensing, and the channel was sensitive to the geometry of side-chain substitutions at this position. Arg-1721 (R3) substitutions produced complex kinetic changes that together suggested that geometry made a larger contribution than charge. Current decay at positive potentials (O-->I) exponentially approached a constant value for all mutants except R2K channels, which were biphasically dependent on potential. R2K channels also displayed slowed deactivation with reduced voltage dependence despite near control values for conductance. Voltage-dependent accessibility of R to C mutants, evaluated with intracellularly and extracellularly applied methanethiosulfonate (MTS) reagents, showed that both R2 and R3 were exposed only when cells were depolarized, although it was not necessary for channels to open. Together, the data indicate that Domain IV/S4 is an activation domain and is not involved in inactivation from the open state.