L0 segmentation

An ultra-fast solution to the L0 segmentation problem, for discovering features in complex epigenetic signals or any other sequential data.

code.

Paper: A unified hypothesis-free feature extraction framework for diverse epigenomic data
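To make the problem concrete, the sketch below fits a piecewise-constant signal by minimizing squared error plus a fixed penalty per segment, which is one common way to state L0 segmentation. It uses the textbook quadratic-time dynamic program (optimal partitioning), not the ultra-fast algorithm from the paper; the function name `l0_segment` and the `penalty` parameter are illustrative assumptions and do not come from the released code.

```python
def l0_segment(y, penalty):
    """Return segment boundaries [(start, end), ...] minimizing
    squared error + penalty * (number of segments).

    Illustrative O(n^2) dynamic program; NOT the paper's algorithm.
    """
    n = len(y)
    # Prefix sums of values and squared values give O(1) segment costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        # Squared error of y[a:b] around its own mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] + [float("inf")] * n  # best[b]: optimal objective for y[:b]
    prev = [0] * (n + 1)               # prev[b]: start of the last segment
    for b in range(1, n + 1):
        for a in range(b):
            c = best[a] + cost(a, b) + penalty
            if c < best[b]:
                best[b] = c
                prev[b] = a

    # Walk back through the stored segment starts.
    bounds = []
    b = n
    while b > 0:
        bounds.append((prev[b], b))
        b = prev[b]
    bounds.reverse()
    return bounds


segments = l0_segment([0, 0, 0, 5, 5, 5], penalty=1.0)
```

A larger `penalty` yields fewer, longer segments; a smaller one lets the fit follow the signal more closely.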

PCnt

PCnt is a hybrid optimization method for causal learning that combines the strengths of PC and NOTEARS and shows superior performance on real biological benchmark data.

code.

Paper: A hybrid constrained continuous optimization approach for optimal causal discovery from biological data

InstaPrism

A fast re-implementation of BayesPrism, a highly performant proportion estimation method.

code.

Paper: InstaPrism: an R package for fast implementation of BayesPrism

Heterogeneous bulk RNAseq simulation

A framework for simulating realistic bulk data from single-cell data, enabling accurate benchmarking of cell-type proportion estimation and deconvolution methods.

code.

Paper: Heterogeneous pseudobulk simulation enables realistic benchmarking of cell-type deconvolution methods
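As a rough illustration of what pseudobulk simulation means, the sketch below builds one bulk sample by drawing cells of each type in chosen proportions and summing their counts. It shows only the basic idea, not the heterogeneous simulation strategy described in the paper; the function name `simulate_pseudobulk` and all of its parameters are hypothetical.

```python
import random


def simulate_pseudobulk(cells, labels, proportions, n_cells, seed=0):
    """Sum the counts of randomly drawn cells into one pseudobulk profile.

    cells:       list of per-cell count vectors (all the same length)
    labels:      cell-type label for each cell
    proportions: dict mapping cell type -> target fraction of n_cells
    Returns (bulk_counts, cells_used_per_type).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group the count vectors by cell type.
    by_type = {}
    for vec, lab in zip(cells, labels):
        by_type.setdefault(lab, []).append(vec)

    n_genes = len(cells[0])
    bulk = [0] * n_genes
    used = {}
    for cell_type, frac in proportions.items():
        k = round(frac * n_cells)
        used[cell_type] = k
        for _ in range(k):
            vec = rng.choice(by_type[cell_type])  # sample with replacement
            for g in range(n_genes):
                bulk[g] += vec[g]
    return bulk, used


bulk, used = simulate_pseudobulk(
    cells=[[10, 0], [0, 10]],
    labels=["A", "B"],
    proportions={"A": 0.75, "B": 0.25},
    n_cells=4,
)
```

Because the true proportions are known by construction, the resulting profile can serve as ground truth when scoring a deconvolution method.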

TISFM: totally interpretable sequence to function model

An intrinsically interpretable neural network architecture for sequence-to-function modeling that replaces convolution towers with entirely interpretable layers and transformations.

code.

Paper: TISFM: totally interpretable sequence to function model

NIFA Non-negative Independent Factor Analysis

A model that generalizes non-negative matrix factorization (NMF) and independent component analysis (ICA) to find disentangled representations of single-cell data.

code.

Paper: Non-negative Independent Factor Analysis disentangles discrete and continuous sources of variation in scRNA-seq data

PLIER Pathway-Level Information Extractor

PLIER is a matrix decomposition method that uses prior information from pathway databases to find an interpretable latent variable representation of gene expression datasets.

code.

Paper: Pathway-Level Information ExtractoR (PLIER): a generative model for gene expression data

RERconverge

A suite of tools to calculate relative evolutionary rates (RERs) and their associations with phenotypes.

code

Application papers:
Hundreds of Genes Experienced Convergent Shifts in Selective Pressure in Marine Mammals
Subterranean mammals show convergent regression in ocular genes and enhancers, along with adaptation to tunneling

CellCODE

An R package that performs multi-layered differential expression analysis to account for tissue composition heterogeneity. It estimates cell proportions, performs correction, and assigns transcriptionally regulated genes to the tissue of origin.

code

Paper:
CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations

DataRemix

An R package to optimize a data-normalization transform for specific biological tasks.

code

IntervalStats

A tool to compute associations between genomic intervals, such as peaks from a ChIPseq or ATACseq dataset, using exact enumeration to compute accurate p-values.

code. Also available as part of the coloc-stats webserver.

Paper: An effective statistical evaluation of ChIPseq dataset similarity

EPIANN

An attention-based deep learning model to predict interacting chromosomal regions.

code
