Treg Cells Promote the SREBP1-Dependent Metabolic Fitness of Tumor-Promoting Macrophages via Repression of CD8+ T Cell-Derived Interferon-γ
Abstract: Regulatory T (Treg) cells are crucial for immune homeostasis, but they also contribute to tumor immune evasion by promoting a suppressive tumor microenvironment (TME). Mice with Treg cell-restricted Neuropilin-1 deficiency show tumor resistance while maintaining peripheral immune homeostasis, thereby providing a controlled system to interrogate the impact of intratumoral Treg cells on the TME. Using this and other genetic models, we showed that Treg cells shaped the transcriptional landscape across multiple tumor-infiltrating immune cell types. Treg cells suppressed CD8+ T cell secretion of interferon-γ (IFNγ), which would otherwise block the activation of sterol regulatory element-binding protein 1 (SREBP1)-mediated fatty acid synthesis in immunosuppressive (M2-like) tumor-associated macrophages (TAMs). Thus, Treg cells indirectly but selectively sustained M2-like TAM metabolic fitness, mitochondrial integrity, and survival. SREBP1 inhibition augmented the efficacy of immune checkpoint blockade, suggesting that targeting Treg cells or their modulation of lipid metabolism in M2-like TAMs could improve cancer immunotherapy.
Abstract: Motivation: When different lineages of organisms independently adapt to similar environments, selection often acts repeatedly upon the same genes, leading to signatures of convergent evolutionary rate shifts at these genes. With the increasing availability of genome sequences for organisms displaying a variety of convergent traits, the ability to identify genes with such convergent rate signatures would enable new insights into the molecular basis of these traits. Results: Here we present the R package RERconverge, which tests for association between relative evolutionary rates of genes and the evolution of traits across a phylogeny. RERconverge can perform associations with binary and continuous traits, and it contains tools for visualization and enrichment analyses of association results. Availability: RERconverge source code, documentation, and a detailed usage walk-through are freely available at https://github.com/nclark-lab/RERconverge. Datasets for mammals, Drosophila, and yeast are available at https://bit.ly/2J2QBnj.
Abstract: Identifying genomic elements underlying phenotypic adaptations is an important problem in evolutionary biology. Comparative analyses learning from convergent evolution of traits are gaining momentum in accurately detecting such elements. We previously developed a method for predicting phenotypic associations of genetic elements by contrasting patterns of sequence evolution in species showing a phenotype with those that do not. Using this method, we successfully demonstrated convergent evolutionary rate shifts in genetic elements associated with two phenotypic adaptations, namely the independent subterranean and marine transitions of terrestrial mammalian lineages. Our method calculates gene-specific rates of evolution on branches of phylogenetic trees using linear regression. These rates represent the extent of sequence divergence on a branch after removing the expected divergence on the branch due to background factors. The rates calculated using this regression analysis exhibit an important statistical limitation, namely heteroscedasticity. We observe that the rates on branches that are longer on average show higher variance, and describe how this problem adversely affects the confidence with which we can make inferences about rate shifts. Using a combination of data transformation and weighted regression, we have developed an updated method that corrects this heteroscedasticity in the rates. We additionally illustrate the improved performance offered by the updated method at robust detection of convergent rate shifts in phylogenetic trees of protein-coding genes across mammals, as well as using simulated tree datasets. Overall, we present an important extension to our evolutionary-rates-based method that performs more robustly and consistently at detecting convergent shifts in evolutionary rates.
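The regression step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `weighted_relative_rates`, the square-root transform, and the caller-supplied weights are all assumptions chosen to show the general idea of combining a data transformation with weighted least squares. Relative rates are taken as the weighted residuals of a gene's branch lengths regressed on the genome-wide average branch lengths.

```python
import math


def weighted_relative_rates(gene_branches, avg_branches, weights=None):
    """Return per-branch relative rates as weighted regression residuals.

    gene_branches: branch lengths of one gene tree (hypothetical input).
    avg_branches:  matching average branch lengths across all genes.
    weights:       optional per-branch weights; larger means more trusted.
                   Defaults to equal weights (ordinary least squares).
    """
    # Assumed variance-stabilizing transform: square root of branch lengths.
    x = [math.sqrt(b) for b in avg_branches]
    y = [math.sqrt(b) for b in gene_branches]
    if weights is None:
        weights = [1.0] * len(x)

    # Weighted least squares fit of y = intercept + slope * x.
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw
    my = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Scale each residual by sqrt(weight) so that branches with noisier
    # estimates (lower weight) contribute proportionally smaller rates.
    return [
        math.sqrt(w) * (yi - intercept - slope * xi)
        for w, xi, yi in zip(weights, x, y)
    ]


if __name__ == "__main__":
    avg = [0.1, 0.2, 0.3, 0.4, 0.5]
    gene = [0.1, 0.2, 0.6, 0.4, 0.5]  # third branch evolves faster
    print(weighted_relative_rates(gene, avg))
```

A positive residual marks a branch where the gene diverged more than the background predicts, and a negative residual marks a branch where it diverged less. In practice the weights would be estimated from the observed mean-variance trend rather than supplied by hand.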
Abstract: Genome scale molecular datasets are often highly structured, with many correlated measurements. This general phenomenon can be related to the underlying data generating process. In assays of mixed cell populations, such as blood, variation in cell-type proportion induces a complex correlation structure at the gene-level. Likewise, groups of genes can be co-regulated/co-expressed through shared transcription factors and signaling pathways. Many applications of gene expression analysis rely on their ability to reflect these unobserved biological processes in order to draw mechanistic conclusions. On the other hand, correlated patterns of expression may also reflect nuisance factors, such as batch effects, which interfere with correct biological interpretation. The choice of analysis method is heavily dependent on which of these factors (nuisance or interesting-biological) is believed to account for more variation and the optimal variance analysis strategy remains an open question. In this study we describe a method to infer a biologically grounded data generating model that provides estimates of underlying biological processes, including explicitly identified pathway-level and cell-type proportion effects. Specifically, we formulate a new matrix decomposition framework, PLIER (Pathway-level Information ExtractoR), that explicitly incorporates prior biological knowledge. Using simulations, we demonstrate the superiority of our method in recovering the true data generating model. Using real data, we show that our approach is able to recover interpretable biological variables, reproduce previous findings in a simplified framework, distinguish biological and technical variation, and provide additional biological insight. The PLIER method and auxiliary functions and data are compiled in the PLIER R package available at https://github.com/wgmao/PLIER.
Abstract: The biological origin of life expectancy remains a fundamental and unanswered scientific question with important ramifications for human health, especially as the bulk of burden of human healthcare shifts from infectious to age-related diseases. The striking variability in life-span among animals occupying similar ecological niches and the numerous mutations that have been shown to increase lifespan in model organisms point to a considerable genetic contribution. Using mammalian comparative genomics, we correlate lifespan phenotypes with relative evolutionary rates, a measure of evolutionary selective pressure. Our analysis demonstrates that many genes and pathways are under increased evolutionary constraint in both Long-Lived Large-bodied mammals (3L) and mammals Exceptionally Long-Lived given their size (ELL), suggesting that these genes and pathways contribute to the maintenance of both traits. For 3L species, we find strong evolutionary constraint on multiple pathways involved in controlling carcinogenesis, including cell cycle, apoptosis, and immune pathways. These findings provide additional perspective on the well-known Peto's Paradox that large animals with large numbers of cells do not get cancer at higher rates than smaller animals with fewer cells. For the ELL phenotype, our analysis strongly implicates pathways involved in DNA repair, further supporting the importance of DNA repair processes in aging. Moreover, these correlations with lifespan phenotypes are consistent across the entire mammalian phylogeny, suggesting that additional constraint on these pathways is a universal requirement for long lifespan.
Abstract: RNAseq technology provides unprecedented power for assessing transcript abundance and can be used to perform a variety of downstream tasks such as inference of gene-correlation networks and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in the downstream study. Here we present a simple three-parameter transformation, DataRemix, which can greatly improve the biological utility of gene expression datasets without any specific knowledge of the dataset. As we optimize the transformation with respect to the downstream biological objective, this parametric framework reweighs the contribution of each hidden factor and makes the biological signals visible. We demonstrate that DataRemix can outperform normalization methods which make explicit use of dataset-specific technical factors. We also show that DataRemix can be efficiently optimized via a Thompson Sampling approach, which makes it feasible for computationally expensive objectives such as eQTL analysis. Finally, we reanalyze the Depression Gene Networks (DGN) dataset and highlight new trans-eQTL networks which were not reported in the initial study.
Evolutionary rate covariation analysis of E-cadherin identifies Raskol as a regulator of cell adhesion and actin dynamics in Drosophila.
Abstract: The adherens junction couples the actin cytoskeletons of neighboring cells to provide the foundation for multicellular organization. The core of the adherens junction is the cadherin-catenin complex that arose early in the evolution of multicellularity to link actin to intercellular adhesions. Over time, evolutionary pressures have shaped the signaling and mechanical functions of the adherens junction to meet specific developmental and physiological demands. Evolutionary rate covariation (ERC) identifies proteins with correlated fluctuations in evolutionary rate that can reflect shared selective pressures and functions. Here we use ERC to identify proteins with evolutionary histories similar to the Drosophila E-cadherin (DE-cad) ortholog. Core adherens junction components α-catenin and p120-catenin displayed positive ERC correlations with DE-cad, indicating that they evolved under similar selective pressures during evolution between Drosophila species. Further analysis of the DE-cad ERC profile revealed a collection of proteins not previously associated with DE-cad function or cadherin-mediated adhesion. We then analyzed the function of a subset of ERC-identified candidates by RNAi during border cell (BC) migration and identified novel genes that function to regulate DE-cad. Among these, we found that the gene CG42684, which encodes a putative GTPase activating protein (GAP), regulates BC migration and adhesion. We named CG42684 raskol (“to split” in Russian) and show that it regulates DE-cad levels and actin protrusions in BCs. We propose that Raskol functions with DE-cad to restrict Ras/Rho signaling and help guide BC migration. Our results demonstrate that a coordinated selective pressure has shaped the adherens junction and this can be leveraged to identify novel components of the complexes and signaling pathways that regulate cadherin-mediated adhesion.
Adaptive plasticity of IL-10+ and IL-35+ T reg cells cooperatively promotes tumor T cell exhaustion.
Abstract: Regulatory T cells (Treg cells) maintain host self-tolerance but are a major barrier to effective cancer immunotherapy. Treg cells subvert beneficial anti-tumor immunity by modulating inhibitory receptor expression on tumor-infiltrating lymphocytes (TILs); however, the underlying mediators and mechanisms have remained elusive. Here, we found that the cytokines IL-10 and IL-35 (Ebi3–IL-12α heterodimer) were divergently expressed by Treg cell subpopulations in the tumor microenvironment (TME) and cooperatively promoted intratumoral T cell exhaustion by modulating the expression of several inhibitory receptors and the exhaustion-associated transcriptomic signature of CD8+ TILs. While expression of BLIMP1 (encoded by Prdm1) was a common target, IL-10 and IL-35 differentially affected effector T cell versus memory T cell fates, respectively, highlighting their differential, partially overlapping but non-redundant regulation of anti-tumor immunity. Our results reveal previously unappreciated cooperative roles for Treg cell-derived IL-10 and IL-35 in promoting BLIMP1-dependent exhaustion of CD8+ TILs that limits effective anti-tumor immunity.
Abstract: While T cells are important for the pathogenesis of systemic lupus erythematosus (SLE) and lupus nephritis, little is known about how T cells function after infiltrating the kidney. The current paradigm suggests that kidney-infiltrating T cells (KITs) are activated effector cells contributing to tissue damage and ultimately organ failure. Herein, we demonstrate that the majority of CD4+ and CD8+ KITs in 3 murine lupus models are not effector cells, as hypothesized, but rather express multiple inhibitory receptors and are highly dysfunctional, with reduced cytokine production and proliferative capacity. In other systems, this hypofunctional profile is linked directly to metabolic and specifically mitochondrial dysfunction, which we also observed in KITs. The T cell phenotype was driven by the expression of an “exhausted” transcriptional signature. Our data thus reveal that the tissue parenchyma has the capability of suppressing T cell responses and limiting damage to self. These findings suggest avenues for the treatment of autoimmunity based on selectively exploiting the exhausted phenotype of tissue-infiltrating T cells.
Abstract: Mammals diversified by colonizing drastically different environments, with each transition yielding numerous molecular changes including losses of protein function. While not initially deleterious, these losses could subsequently carry deleterious pleiotropic consequences. Here we use phylogenetic methods to identify convergent functional losses across independent marine mammal lineages. In one extreme case, Paraoxonase 1 (PON1) accrued lesions in all marine lineages, while remaining intact in all terrestrial mammals. These lesions coincide with PON1 enzymatic activity loss in marine species’ blood plasma. This convergent loss is likely explained by parallel shifts in marine ancestors’ lipid metabolism and/or bloodstream oxidative environment affecting PON1’s role in fatty acid oxidation. PON1 loss also eliminates marine mammals’ main defense against neurotoxicity from specific man-made organophosphorus compounds, implying potential risks in modern environments.
Abstract: Background: Gene regulatory sequences play critical roles in ensuring tightly controlled RNA expression patterns that are essential in a large variety of biological processes. Specifically, enhancer sequences drive expression of their target genes, and the availability of genome-wide maps of enhancer-promoter interactions has opened up the possibility to use machine learning approaches to extract and interpret features that define these interactions in different biological contexts. Methods: Inspired by machine translation models we develop an attention-based neural network model, EPIANN, to predict enhancer-promoter interactions based on DNA sequences. Codes and data are available at https://github.com/wgmao/EPIANN. Results: Our approach accurately predicts enhancer-promoter interactions across six cell lines. In addition, our method generates pairwise attention scores at the sequence level, which specify how short regions in the enhancer and promoter pair-up to drive the interaction prediction. This allows us to identify over-represented transcription factors (TF) binding sites and TF-pair interactions in the context of enhancer function.
Differential requirements for myeloid leukemia IFN-gamma conditioning determine graft-versus-leukemia resistance and sensitivity.
Abstract: The graft-versus-leukemia (GVL) effect in allogeneic hematopoietic stem cell transplantation (alloSCT) is potent against chronic phase chronic myelogenous leukemia (CP-CML), but blast crisis CML (BC-CML) and acute myeloid leukemias (AML) are GVL resistant. To understand GVL resistance, we studied GVL against mouse models of CP-CML, BC-CML, and AML generated by the transduction of mouse BM with fusion cDNAs derived from human leukemias. Prior work has shown that CD4+ T cell-mediated GVL against CP-CML and BC-CML required intact leukemia MHCII; however, stem cells from both leukemias were MHCII negative. Here, we show that CP-CML, BC-CML, and AML stem cells upregulate MHCII in alloSCT recipients. Using gene-deficient leukemias, we determined that BC-CML and AML MHC upregulation required IFN-gamma stimulation, whereas CP-CML MHC upregulation was independent of both the IFN-gamma receptor (IFN-gammaR) and the IFN-alpha/beta receptor IFNAR1. Importantly, IFN-gammaR-deficient BC-CML and AML were completely resistant to CD4- and CD8-mediated GVL, whereas IFN-gammaR/IFNAR1 double-deficient CP-CML was fully GVL sensitive. Mouse AML and BC-CML stem cells were MHCI+ without IFN-gamma stimulation, suggesting that IFN-gamma sensitizes these leukemias to T cell killing by mechanisms other than MHC upregulation. Our studies identify the requirement of IFN-gamma stimulation as a mechanism for BC-CML and AML GVL resistance, whereas independence from IFN-gamma renders CP-CML more GVL sensitive, even with a lower-level alloimmune response.
Abstract: Regulatory T cells (Tregs) are a barrier to anti-tumor immunity. Neuropilin-1 (Nrp1) is required to maintain intratumoral Treg stability and function but is dispensable for peripheral immune tolerance. Treg-restricted Nrp1 deletion results in profound tumor resistance due to Treg functional fragility. Thus, identifying the basis for Nrp1 dependency and the key drivers of Treg fragility could help to improve immunotherapy for human cancer. We show that a high percentage of intratumoral NRP1(+) Tregs correlates with poor prognosis in melanoma and head and neck squamous cell carcinoma. Using a mouse model of melanoma where Nrp1-deficient (Nrp1(-/-)) and wild-type (Nrp1(+/+)) Tregs can be assessed in a competitive environment, we find that a high proportion of intratumoral Nrp1(-/-) Tregs produce interferon-gamma (IFNgamma), which drives the fragility of surrounding wild-type Tregs, boosts anti-tumor immunity, and facilitates tumor clearance. We also show that IFNgamma-induced Treg fragility is required for response to anti-PD1, suggesting that cancer therapies promoting Treg fragility may be efficacious.
Abstract: Inhibitory receptors (IRs) are pivotal in controlling T cell homeostasis because of their intrinsic regulation of conventional effector T (Tconv) cell proliferation, viability, and function. However, the role of IRs on regulatory T cells (Tregs) remains obscure because they could be required for suppressive activity and/or limit Treg function. We evaluated the role of lymphocyte activation gene 3 (LAG3; CD223) on Tregs by generating mice in which LAG3 is absent on the cell surface of Tregs in a murine model of type 1 diabetes. Unexpectedly, mice that lacked LAG3 expression on Tregs exhibited reduced autoimmune diabetes, consistent with enhanced Treg proliferation and function. Whereas the transcriptional landscape of peripheral wild-type (WT) and Lag3-deficient Tregs was largely comparable, substantial differences between intra-islet Tregs were evident and involved a subset of genes and pathways that promote Treg maintenance and function. Consistent with these observations, Lag3-deficient Tregs outcompeted WT Tregs in the islets but not in the periphery in cotransfer experiments because of enhanced interleukin-2-signal transducer and activator of transcription 5 signaling and increased Eos expression. Our study suggests that LAG3 intrinsically limits Treg proliferation and function at inflammatory sites, promotes autoimmunity in a chronic autoimmune-prone environment, and may contribute to Treg insufficiency in autoimmune disease.
Abstract: We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a $p$ value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a $p$ value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an $\varepsilon$-outlier on the walk is significant at $p = \sqrt{2\varepsilon}$ under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at $p \approx \sqrt{\varepsilon}$ is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.
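The test described in the abstract above can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the function name `outlier_p_value`, the toy chain, and the value function are hypothetical, and the significance bound p = sqrt(2ε) is assumed from the published statement of the result. The walk starts at the presented state, ε is the fraction of states on the walk (the start included) whose value is no larger than that of the start, and the reported bound is capped at 1.

```python
import math
import random


def outlier_p_value(start, step, value, n_steps, rng):
    """Run a walk from `start` and return (epsilon, p_bound).

    step(state, rng) -> next state of a reversible chain (caller supplied).
    value(state)     -> real-valued label used to rank states.
    epsilon is the fraction of the n_steps + 1 visited states whose value
    is <= the value of the start state; p_bound = min(1, sqrt(2 * epsilon)).
    """
    v0 = value(start)
    state = start
    n_le = 1  # the start state always counts against itself
    for _ in range(n_steps):
        state = step(state, rng)
        if value(state) <= v0:
            n_le += 1
    eps = n_le / (n_steps + 1)
    return eps, min(1.0, math.sqrt(2.0 * eps))


def lazy_cycle_step(n):
    """Lazy symmetric random walk on a cycle of n states (reversible)."""
    def step(state, rng):
        r = rng.random()
        if r < 0.5:
            return state            # stay put half of the time
        if r < 0.75:
            return (state + 1) % n  # move clockwise
        return (state - 1) % n      # move counterclockwise
    return step


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy example: state 0 has the smallest value on a 1000-state cycle.
    eps, p = outlier_p_value(0, lazy_cycle_step(1000), lambda s: s, 100_000, rng)
    print(f"epsilon = {eps:.5f}, p <= {p:.4f}")
```

Because the bound needs no mixing-time estimate, the number of steps can be chosen freely. A longer walk only helps if the presented state really is unusual relative to the states the walk reaches.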
Abstract: Hepatic repair is directed chiefly by the proliferation of resident mature epithelial cells. Furthermore, if predominant injury is to cholangiocytes, the hepatocytes can transdifferentiate to cholangiocytes to assist in the repair and vice versa, as shown by various fate-tracing studies. However, the molecular bases of reprogramming remain elusive. Using two models of biliary injury where repair occurs through cholangiocyte proliferation and hepatocyte transdifferentiation to cholangiocytes, we identify an important role of Wnt signaling. First we identify up-regulation of specific Wnt proteins in the cholangiocytes. Next, using conditional knockouts of Wntless and Wnt coreceptors low-density lipoprotein-related protein 5/6, transgenic mice expressing stable beta-catenin, and in vitro studies, we show a role of Wnt signaling through beta-catenin in hepatocyte to biliary transdifferentiation. Last, we show that specific Wnts regulate cholangiocyte proliferation, but in a beta-catenin-independent manner. CONCLUSION: Wnt signaling regulates hepatobiliary repair after cholestatic injury in both beta-catenin-dependent and independent manners. (Hepatology 2016;64:1652-1666).
Modeling a human hepatocellular carcinoma subset in mice through coexpression of met and point-mutant beta-catenin.
Abstract: Hepatocellular cancer (HCC) remains a significant therapeutic challenge due to its poorly understood molecular basis. In the current study, we investigated two independent cohorts of 249 and 194 HCC cases for any combinatorial molecular aberrations. Specifically we assessed for simultaneous HMET expression or hMet activation and catenin beta1 gene (CTNNB1) mutations to address any concomitant Met and Wnt signaling. To investigate cooperation in tumorigenesis, we coexpressed hMet and beta-catenin point mutants (S33Y or S45Y) in hepatocytes using sleeping beauty transposon/transposase and hydrodynamic tail vein injection and characterized tumors for growth, signaling, gene signatures, and similarity to human HCC. Missense mutations in exon 3 of CTNNB1 were identified in subsets of HCC patients. Irrespective of amino acid affected, all exon 3 mutations induced similar changes in gene expression. Concomitant HMET overexpression or hMet activation and CTNNB1 mutations were evident in 9%-12.5% of HCCs. Coexpression of hMet and mutant-beta-catenin led to notable HCC in mice. Tumors showed active Wnt and hMet signaling with evidence of glutamine synthetase and cyclin D1 positivity and mitogen-activated protein kinase/extracellular signal-regulated kinase, AKT/Ras/mammalian target of rapamycin activation. Introduction of dominant-negative T-cell factor 4 prevented tumorigenesis. The gene expression of mouse tumors in hMet-mutant beta-catenin showed high correlation, with subsets of human HCC displaying concomitant hMet activation signature and CTNNB1 mutations. CONCLUSION: We have identified cooperation of hMet and beta-catenin activation in a subset of HCC patients and modeled this human disease in mice with a significant transcriptomic intersection; this model will provide novel insight into the biology of this tumor and allow us to evaluate novel therapies as a step toward precision medicine. (Hepatology 2016;64:1587-1605).
Characterization of Gonadotrope Secretoproteome Identifies Neurosecretory Protein VGF-derived Peptide Suppression of Follicle-stimulating Hormone Gene Expression.
Abstract: Reproductive function is controlled by the pulsatile release of hypothalamic gonadotropin-releasing hormone (GnRH), which regulates the expression of the gonadotropins luteinizing hormone and FSH in pituitary gonadotropes. Paradoxically, Fshb gene expression is maximally induced at lower frequency GnRH pulses, which provide a very low average concentration of GnRH stimulation. We studied the role of secreted factors in modulating gonadotropin gene expression. Inhibition of secretion specifically disrupted gonadotropin subunit gene regulation but left early gene induction intact. We characterized the gonadotrope secretoproteome and global mRNA expression at baseline and after Galphas knockdown, which has been found to increase Fshb gene expression (1). We identified 1077 secreted proteins or peptides, 19 of which showed mRNA regulation by GnRH and/or Galphas knockdown. Among several novel secreted factors implicated in Fshb gene regulation, we focused on the neurosecretory protein VGF. Vgf mRNA, whose gene has been implicated in fertility (2), exhibited high induction by GnRH and depended on Galphas. In contrast with Fshb induction, Vgf induction occurred preferentially at high GnRH pulse frequency. We hypothesized that a VGF-derived peptide might regulate Fshb gene induction. siRNA knockdown or extracellular immunoneutralization of VGF augmented Fshb mRNA induction by GnRH. GnRH stimulated the secretion of the VGF-derived peptide NERP1. NERP1 caused a concentration-dependent decrease in Fshb gene induction. These findings implicate a VGF-derived peptide in selective regulation of the Fshb gene. Our results support the concept that signaling specificity from the cell membrane GnRH receptor to the nuclear Fshb gene involves integration of intracellular signaling and exosignaling regulatory motifs.
Abstract: Mammal species have made the transition to the marine environment several times, and their lineages represent one of the classical examples of convergent evolution in morphological and physiological traits. Nevertheless, the genetic mechanisms of their phenotypic transition are poorly understood, and investigations into convergence at the molecular level have been inconclusive. While past studies have searched for convergent changes at specific amino acid sites, we propose an alternative strategy to identify those genes that experienced convergent changes in their selective pressures, visible as changes in evolutionary rate specifically in the marine lineages. We present evidence of widespread convergence at the gene level by identifying parallel shifts in evolutionary rate during three independent episodes of mammalian adaptation to the marine environment. Hundreds of genes accelerated their evolutionary rates in all three marine mammal lineages during their transition to aquatic life. These marine-accelerated genes are highly enriched for pathways that control recognized functional adaptations in marine mammals, including muscle physiology, lipid-metabolism, sensory systems, and skin and connective tissue. The accelerations resulted from both adaptive evolution as seen in skin and lung genes, and loss of function as in gustatory and olfactory genes. In regard to sensory systems, this finding provides further evidence that reduced senses of taste and smell are ubiquitous in marine mammals. Our analysis demonstrates the feasibility of identifying genes underlying convergent organism-level characteristics on a genome-wide scale and without prior knowledge of adaptations, and provides a powerful approach for investigating the physiological functions of mammalian genes.
A Temporal Switch in the Germinal Center Determines Differential Output of Memory B and Plasma Cells.
Abstract: There is little insight into or agreement about the signals that control differentiation of memory B cells (MBCs) and long-lived plasma cells (LLPCs). By performing BrdU pulse-labeling studies, we found that MBC formation preceded the formation of LLPCs in an adoptive transfer immunization system, which allowed for a synchronized Ag-specific response with homogeneous Ag-receptor, yet at natural precursor frequencies. We confirmed these observations in wild-type (WT) mice and extended them with germinal center (GC) disruption experiments and variable region gene sequencing. We thus show that the GC response undergoes a temporal switch in its output as it matures, revealing that the reaction engenders both MBC subsets with different immune effector function and, ultimately, LLPCs at largely separate points in time. These data demonstrate the kinetics of the formation of the cells that provide stable humoral immunity and therefore have implications for autoimmunity, for vaccine development, and for understanding long-term pathogen resistance.
A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes.
Abstract: The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of computational methods for identifying cell populations in multidimensional flow cytometry data. Here we report the results of FlowCAP-IV where algorithms from seven different research groups predicted the time to progression to AIDS among a cohort of 384 HIV+ subjects, using antigen-stimulated peripheral blood mononuclear cell (PBMC) samples analyzed with a 14-color staining panel. Two approaches (FlowReMi.1 and flowDensity-flowType-RchyOptimyx) provided statistically significant predictive value in the blinded test set. Manual validation of submitted results indicated that unbiased analysis of single cell phenotypes could reveal unexpected cell types that correlated with outcomes of interest in high dimensional flow cytometry datasets.
Abstract: The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers.
Human Dendritic Cell Response Signatures Distinguish 1918, Pandemic, and Seasonal H1N1 Influenza Viruses.
Abstract: Influenza viruses continue to present global threats to human health. Antigenic drift and shift, genetic reassortment, and cross-species transmission generate new strains with differences in epidemiology and clinical severity. We compared the temporal transcriptional responses of human dendritic cells (DC) to infection with two pandemic (A/Brevig Mission/1/1918, A/California/4/2009) and two seasonal (A/New Caledonia/20/1999, A/Texas/36/1991) H1N1 influenza viruses. Strain-specific response differences included stronger activation of NF-kappaB following infection with A/New Caledonia/20/1999 and a unique cluster of genes expressed following infection with A/Brevig Mission/1/1918. A common antiviral program showing strain-specific timing was identified in the early DC response and found to correspond with reported transcript changes in blood during symptomatic human influenza virus infection. Comparison of the global responses to the seasonal and pandemic strains showed that a dramatic divergence occurred after 4 h, with only the seasonal strains inducing widespread mRNA loss. IMPORTANCE: Continuously evolving influenza viruses present a global threat to human health; however, these host responses display strain-dependent differences that are incompletely understood. Thus, we conducted a detailed comparative study assessing the immune responses of human DC to infection with two pandemic and two seasonal H1N1 influenza strains. We identified in the immune response to viral infection both common and strain-specific features. Among the strain-specific elements were a time shift of the interferon-stimulated gene response, selective induction of NF-kappaB signaling by one of the seasonal strains, and massive RNA degradation as early as 4 h postinfection by the seasonal, but not the pandemic, viruses. These findings illuminate new aspects of the distinct differences in the immune responses to pandemic and seasonal influenza viruses.
CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations.
Abstract: MOTIVATION: Identifying alterations in gene expression associated with different clinical states is important for the study of human biology. However, clinical samples used in gene expression studies are often derived from heterogeneous mixtures with variable cell-type composition, complicating statistical analysis. Considerable effort has been devoted to modeling sample heterogeneity, and presently, there are many methods that can estimate cell proportions or pure cell-type expression from mixture data. However, there is no method that comprehensively addresses mixture analysis in the context of differential expression without relying on additional proportion information, which can be inaccurate and is frequently unavailable. RESULTS: In this study, we consider a clinically relevant situation where neither accurate proportion estimates nor pure cell expression is of direct interest, but where we are rather interested in detecting and interpreting relevant differential expression in mixture samples. We develop a method, Cell-type COmputational Differential Estimation (CellCODE), that addresses the specific statistical question directly, without requiring a physical model for mixture components. Our approach is based on latent variable analysis and is computationally transparent; it requires no additional experimental data, yet outperforms existing methods that use independent proportion measurements. CellCODE has few parameters that are robust and easy to interpret. The method can be used to track changes in proportion, improve power to detect differential expression and assign the differentially expressed genes to the correct cell type.
Abstract: The diagnosis of Parkinson's disease (PD) is usually not established until advanced neurodegeneration leads to clinically detectable symptoms. Previous blood PD transcriptome studies show low concordance, possibly resulting from the use of microarray technology, which has high measurement variation. The Leucine-rich repeat kinase 2 (LRRK2) G2019S mutation predisposes to PD. Using preclinical and clinical studies, we sought to develop a novel statistically motivated transcriptomic-based approach to identify a molecular signature in the blood of Ashkenazi Jewish PD patients, including LRRK2 mutation carriers. Using a digital gene expression platform to quantify 175 messenger RNA (mRNA) markers with low coefficients of variation (CV), we first compared whole-blood transcript levels in mouse models (1) overexpressing wild-type (WT) LRRK2, (2) overexpressing G2019S LRRK2, and (3) lacking LRRK2 (knockout), and (4) in WT controls. We then studied an Ashkenazi Jewish cohort of 34 symptomatic PD patients (both WT LRRK2 and G2019S LRRK2) and 32 asymptomatic controls. The expression profiles distinguished the four mouse groups with different genetic background. In patients, we detected significant differences in blood transcript levels both between individuals differing in LRRK2 genotype and between PD patients and controls. Discriminatory PD markers included genes associated with innate and adaptive immunity and inflammatory disease. Notably, gene expression patterns in levodopa-treated PD patients were significantly closer to those of healthy controls in a dose-dependent manner. We identify whole-blood mRNA signatures correlating with LRRK2 genotype and with PD disease state. This approach may provide insight into pathogenesis and a route to early disease detection.
Hybrid Bayesian-rank integration approach improves the predictive power of genomic dataset aggregation.
Abstract: MOTIVATION: Modern molecular technologies allow the collection of large amounts of high-throughput data on the functional attributes of genes. Often multiple technologies and study designs are used to address the same biological question such as which genes are overexpressed in a specific disease state. Consequently, there is considerable interest in methods that can integrate across datasets to present a unified set of predictions. RESULTS: An important aspect of data integration is being able to account for the fact that datasets may differ in how accurately they capture the biological signal of interest. While many methods to address this problem exist, they always rely either on dataset internal statistics, which reflect data structure and not necessarily biological relevance, or external gold standards, which may not always be available. We present a new rank aggregation method for data integration that requires neither external standards nor internal statistics but relies on Bayesian reasoning to assess dataset relevance. We demonstrate that our method outperforms established techniques and significantly improves the predictive power of rank-based aggregations. We show that our method, which does not require an external gold standard, provides reliable estimates of dataset relevance and allows the same set of data to be integrated differently depending on the specific signal of interest. AVAILABILITY: The method is implemented in R and is freely available at http://www.pitt.edu/~mchikina/BIRRA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Homer1 alternative splicing is regulated by gonadotropin-releasing hormone and modulates gonadotropin gene expression.
Abstract: Hypothalamic gonadotropin-releasing hormone (GnRH) plays a critical role in reproductive physiology by regulating follicle-stimulating hormone (FSH) and luteinizing hormone (LH) gene expression in the pituitary. Analysis of gonadotrope deep-sequencing data identified a global regulation of pre-mRNA splicing by GnRH. Homer1, a gene encoding a postsynaptic density scaffolding protein, was selected for further study. Homer1 expresses a short splice form, Homer1a, and more-abundant long transcripts Homer1b/c. GnRH induced a modest increase in Homer1b/c expression and a dramatic increase in the Homer1a splice form. G protein knockdown studies suggested that the Homer1 induction, but not the regulated splicing, was Galphaq/11 dependent. Phosphorylation of the splicing regulator SRp20 was found to be induced by GnRH. SRp20 depletion attenuated the GnRH-induced increase in the Homer1a-to-Homer1b/c ratio and modulated the effects of GnRH on FSHbeta and LHbeta expression. Homer1 gene knockdown resulted in increased GnRH-induced FSHbeta and LHbeta transcript levels. Furthermore, splice-form-specific reduction of Homer1b/c increased both FSHbeta and LHbeta mRNA induction, whereas reduction of Homer1a had the opposite effect on FSHbeta induction. These results indicate that the regulation of Homer1 splicing by GnRH contributes to gonadotropin gene control.
Abstract: Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparations, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern.
Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge.
Abstract: MOTIVATION: More than a decade after microarrays were first used to predict the phenotype of biological samples, real-life applications for disease screening and identification of patients who would best benefit from treatment are still emerging. The interest of the scientific community in identifying the best approaches to develop such prediction models was reaffirmed in a competition-style international collaboration called the IMPROVER Diagnostic Signature Challenge, whose results we describe herein. RESULTS: Fifty-four teams used public data to develop prediction models in four disease areas including multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease, and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of the quality of predictions, and best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the best overall three performers, as well as an R package that implements the approach of the best overall team. The analyses of model performance data submitted in the challenge as well as additional simulations that we have performed revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection and classifier type) is problem dependent; and (iii) for optimal results datasets and methods have to be carefully matched. Biomedical factors such as the disease severity and confidence in diagnosis were found to be associated with the misclassification rates across the different teams. AVAILABILITY: The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.
Abstract: Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
Abstract: The regulation of gonadotropin synthesis by GnRH plays an essential role in the neuroendocrine control of reproduction. The known signaling mechanisms involved in gonadotropin synthesis have been expanding. For example, involvement of beta-catenin in LHbeta induction by GnRH has been discovered. We examined the role of beta-catenin in FSHbeta gene expression in LbetaT2 gonadotrope cells. GnRH caused a sustained increase in nuclear beta-catenin levels, which was significantly reduced by c-Jun N-terminal kinase (JNK) inhibition. Small interfering RNA-mediated knockdown of beta-catenin mRNA demonstrated that induction of FSHbeta mRNA by GnRH depended on beta-catenin and that regulation of FSHbeta by beta-catenin occurred independently of the JNK-c-jun pathway. beta-Catenin depletion had no impact on FSHbeta mRNA stability. In LbetaT2 cells transfected with FSHbeta promoter luciferase fusion constructs, GnRH responsiveness was conferred by the proximal promoter (-944/-1) and was markedly decreased by beta-catenin knockdown. However, none of the T-cell factor/lymphoid enhancer factor binding sites in that region were required for promoter activation by GnRH. Chromatin immunoprecipitation further corroborated the absence of direct interaction between beta-catenin and the 1.8-kb FSHbeta promoter. To elucidate the mechanism for the beta-catenin effect, we analyzed approximately 1 billion reads of next-generation RNA sequencing beta-catenin knockdown assays and selected the nuclear cofactor breast cancer metastasis-suppressor 1-like (Brms1L) as one candidate for further study. Subsequent experiments confirmed that Brms1L mRNA expression was decreased by beta-catenin knockdown as well as by JNK inhibition. Furthermore, knockdown of Brms1L significantly attenuated GnRH-induced FSHbeta expression. Thus, our findings indicate that the expression of Brms1L depends on beta-catenin activity and contributes to FSHbeta induction by GnRH.
Involvement of histone demethylase LSD1 in short-time-scale gene expression changes during cell cycle progression in embryonic stem cells.
Abstract: The histone demethylase LSD1, a component of the CoREST (corepressor for element 1-silencing transcription factor) corepressor complex, plays an important role in the downregulation of gene expression during development. However, the activities of LSD1 in mediating short-time-scale gene expression changes have not been well understood. To reveal the mechanisms underlying these two distinct functions of LSD1, we performed genome-wide mapping and cellular localization studies of LSD1 and its substrate, dimethylated histone 3 lysine 4 (H3K4me2), in mouse embryonic stem cells (ES cells). Our results showed an extensive overlap between the LSD1 and H3K4me2 genomic regions and a correlation between the genomic levels of LSD1/H3K4me2 and gene expression, including many highly expressed ES cell genes. LSD1 is recruited to the chromatin of cells in the G(1)/S/G(2) phases and is displaced from the chromatin of M-phase cells, suggesting that LSD1 or H3K4me2 alternatively occupies LSD1 genomic regions during cell cycle progression. LSD1 knockdown by RNA interference or its displacement from the chromatin by antineoplastic agents caused an increase in the levels of a subset of LSD1 target genes. Taken together, these results suggest that cell cycle-dependent association and dissociation of LSD1 with chromatin mediates short-time-scale gene expression changes during embryonic stem cell cycle progression.
Abstract: MOTIVATION: ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions, it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation. RESULTS: We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions which conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly. AVAILABILITY: Source code is available for download at http://sonorus.princeton.edu/IntervalStats/IntervalStats.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Misfolded proteins inhibit proliferation and promote stress-induced death in SV40-transformed mammalian cells.
Abstract: Protein misfolding is implicated in neurodegenerative diseases and occurs in aging. However, the contribution of the misfolded ensembles to toxicity remains largely unknown. Here we introduce 2 primate cell models of destabilized proteins devoid of specific cellular functions and interactors, as bona fide misfolded proteins, allowing us to isolate the gain-of-function of non-native structures. Both GFP-degron and a mutant chloramphenicol-acetyltransferase fused to GFP (GFP-Delta9CAT) form perinuclear aggregates, are degraded by the proteasome, and colocalize with and induce the chaperone Hsp70 (HSPA1A/B) in COS-7 cells. We find that misfolded proteins neither significantly compromise chaperone-mediated folding capacity nor induce cell death. However, they do induce growth arrest in cells that are unable to degrade them and promote stress-induced death upon proteasome inhibition by MG-132 and heat shock. Finally, we show that overexpression of all heat-shock factor-1 (HSF1) and Hsp70 proteins, as well as wild-type and deacetylase-deficient (H363Y) SIRT1, rescues survival upon stress, implying a noncatalytic action of SIRT1 in response to protein misfolding. Our study establishes a novel model and extends our knowledge on the mechanism of the function-independent proteotoxicity of misfolded proteins in dividing cells.
Abstract: Correctly evaluating functional similarities among homologous proteins is necessary for accurate transfer of experimental knowledge from one organism to another, and is of particular importance for the development of animal models of human disease. While the fact that sequence similarity implies functional similarity is a fundamental paradigm of molecular biology, sequence comparison does not directly assess the extent to which two proteins participate in the same biological processes, and has limited utility for analyzing families with several paralogous members. Nevertheless, we show that it is possible to provide a cross-organism functional similarity measure in an unbiased way through the exclusive use of high-throughput gene-expression data. Our methodology is based on probabilistic cross-species mapping of functionally analogous proteins based on Bayesian integrative analysis of gene expression compendia. We demonstrate that even among closely related genes, our method is able to predict functionally analogous homolog pairs better than relying on sequence comparison alone. We also demonstrate that the landscape of functional similarity is often complex and that definitive "functional orthologs" do not always exist. Even in these cases, our method and the online interface we provide are designed to allow detailed exploration of sources of inferred functional similarity that can be evaluated by the user.
Global prediction of tissue-specific gene expression and context-dependent gene networks in Caenorhabditis elegans.
Abstract: Tissue-specific gene expression plays a fundamental role in metazoan biology and is an important aspect of many complex diseases. Nevertheless, an organism-wide map of tissue-specific expression remains elusive due to difficulty in obtaining these data experimentally. Here, we leveraged existing whole-animal Caenorhabditis elegans microarray data representing diverse conditions and developmental stages to generate accurate predictions of tissue-specific gene expression and experimentally validated these predictions. These patterns of tissue-specific expression are more accurate than existing high-throughput experimental studies for nearly all tissues; they also complement existing experiments by addressing tissue-specific expression present at particular developmental stages and in small tissues. We used these predictions to address several experimentally challenging questions, including the identification of tissue-specific transcriptional motifs and the discovery of potential miRNA regulation specific to particular tissues. We also investigate the role of tissue context in gene function through tissue-specific functional interaction networks. To our knowledge, this is the first study producing high-accuracy predictions of tissue-specific expression and interactions for a metazoan organism based on whole-animal data.
Abstract: MOTIVATION: Biological data generation has accelerated to the point where hundreds or thousands of whole-genome datasets of various types are available for many model organisms. This wealth of data can lead to valuable biological insights when analyzed in an integrated manner, but the computational challenge of managing such large data collections is substantial. In order to mine these data efficiently, it is necessary to develop methods that use storage, memory and processing resources carefully. RESULTS: The Sleipnir C++ library implements a variety of machine learning and data manipulation algorithms with a focus on heterogeneous data integration and efficiency for very large biological data collections. Sleipnir allows microarray processing, functional ontology mining, clustering, Bayesian learning and inference and support vector machine tasks to be performed for heterogeneous data on scales not previously practical. In addition to the library, which can easily be integrated into new computational systems, prebuilt tools are provided to perform a variety of common tasks. Many tools are multithreaded for parallelization in desktop or high-throughput computing environments, and most tasks can be performed in minutes for hundreds of datasets using a standard personal computer. AVAILABILITY: Source code (C++) and documentation are available at http://function.princeton.edu/sleipnir and compiled binaries are available from the authors on request.
Abstract: The role of the outermost three charged residues of Domain IV/S4 in controlling gating of Ca(v)3.2 was investigated using single substitutions of each arginine with glutamine, cysteine, histidine, and lysine in a Flp-In-293 cell line, in which expression levels could be compared. Channel density, based on gating charge measurements, was ~125,000 channels/cell (10 fC/pF), except for R2Q and R3C, which expressed at lower levels. Channels substituted at Arg-1715 (R1C, R1Q, R1H) demonstrated such modest changes that a role in voltage sensing could not be determined. Arg-1718 (R2) made a contribution to activation voltage sensing, and the channel was sensitive to the geometry of side-chain substitutions at this position. Arg-1721 (R3) substitutions produced complex kinetic changes that together suggested that geometry made a larger contribution than charge. Current decay at positive potentials (O→I) exponentially approached a constant value for all mutants except R2K channels, which were biphasically dependent on potential. R2K channels also displayed slowed deactivation with reduced voltage dependence despite near control values for conductance. Voltage-dependent accessibility of R to C mutants, evaluated with intracellularly and extracellularly applied methanethiosulfonate (MTS) reagents, showed that both R2 and R3 were exposed only when cells were depolarized, although it was not necessary for channels to open. Together, the data indicate that Domain IV/S4 is an activation domain and is not involved in inactivation from the open state.