Chikina Lab

InfluenceNet

A transparent convolutional neural network that models how transcription factor binding motifs interact across DNA sequences. Unlike black-box deep learning models that require expensive post-hoc analysis, InfluenceNet directly encodes motif influence as a position- and motif-specific profile, providing immediate interpretability while matching state-of-the-art performance.

code

Paper: InfluenceNet: Encoding Motif Influence for Interpretable Modeling of Cis-Regulatory Syntax

dLEM

A differentiable framework that fits the loop-extrusion steady state to Hi-C data, enabling compact explanations of observed structures in terms of cohesin translocation rates. It enables seamless integration with other 1D data, supports global perturbation predictions, and dramatically simplifies sequence-to-function models of chromatin conformation.

code | Deep dLEM model: code

Paper: Mechanistic Genome Folding at Scale through the Differentiable Loop Extrusion Model

MotifDiff

A computational tool that rapidly predicts how genetic variants affect transcription factor binding to DNA. MotifDiff uses biophysical models based on position weight matrices and a novel probability-based normalization strategy to score millions of variants in minutes, providing interpretable, mechanistic insights that address limitations of deep learning approaches on common variants.
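As a rough illustration of the position weight matrix scoring this description refers to, the following Python sketch scores a single-nucleotide variant as the change in the best log-odds motif match over windows that overlap the variant. It is not MotifDiff's code or API: the function names, the toy 3-bp motif, and the uniform background are assumptions made for this example, and MotifDiff's probability-based normalization is not reproduced here.

```python
import math

BASES = "ACGT"

def log_odds_pwm(prob_rows, background=0.25):
    """Convert per-position base probabilities (A, C, G, T) to log2-odds scores."""
    return [
        {b: math.log2(p / background) for b, p in zip(BASES, row)}
        for row in prob_rows
    ]

def best_window_score(seq, pwm, first_start, last_start):
    """Best PWM score over windows starting in [first_start, last_start]."""
    width = len(pwm)
    best = -math.inf
    for i in range(max(first_start, 0), min(last_start, len(seq) - width) + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        best = max(best, score)
    return best

def variant_effect(seq, pos, alt, pwm):
    """Alt-minus-ref change in best motif score for a variant at 0-based pos."""
    width = len(pwm)
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    # Only windows that cover the variant position can change score.
    lo, hi = pos - width + 1, pos
    return best_window_score(alt_seq, pwm, lo, hi) - best_window_score(seq, pwm, lo, hi)

# Hypothetical toy motif that prefers "TGA" (rows are positions, columns A, C, G, T).
toy_probs = [
    [0.05, 0.05, 0.05, 0.85],
    [0.05, 0.05, 0.85, 0.05],
    [0.85, 0.05, 0.05, 0.05],
]
pwm = log_odds_pwm(toy_probs)
effect = variant_effect("CCTGACC", 3, "C", pwm)  # G>C inside the motif match
```

A negative `effect` means the alternate allele weakens the best match; a variant that leaves the sequence unchanged gives exactly zero.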

code

Paper: Ultra-fast Variant Effect Prediction Using Biophysical Transcription Factor Binding Models

L0 Segmentation

An ultra-fast solution for approximating sequential signals as piecewise-constant segments, enabling compression and denoising of nucleotide-resolution epigenetic data. Unlike fused lasso methods that dampen the global signal, L0 segmentation preserves key biological features and can be integrated into other machine learning models for downstream analysis.
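To make the idea of an L0-penalized piecewise-constant fit concrete, here is a small Python sketch that minimizes squared error plus a fixed penalty per segment using a textbook quadratic-time dynamic program. This is an assumption-laden illustration, not the lab's implementation: the function name `l0_segment` and the `penalty` parameter are hypothetical, and the actual tool is described as ultra-fast, which this simple version is not.

```python
def l0_segment(signal, penalty):
    """Fit a piecewise-constant signal minimizing SSE + penalty * (number of segments)."""
    n = len(signal)
    # Prefix sums let us compute any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        """Squared error of signal[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] is the optimal cost of signal[:j]; prev[j] is the last segment's start.
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Backtrack from the end, filling each segment with its mean.
    fitted = [0.0] * n
    j = n
    while j > 0:
        i = prev[j]
        mean = (s1[j] - s1[i]) / (j - i)
        fitted[i:j] = [mean] * (j - i)
        j = i
    return fitted

signal = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
fitted = l0_segment(signal, penalty=0.5)
```

Because the penalty counts segments rather than shrinking jump sizes, each fitted level is the plain mean of its segment, which is the contrast with fused lasso drawn in the description above.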

code

Paper: A unified hypothesis-free feature extraction framework for diverse epigenomic data

PCnt

A hybrid method for causal discovery that integrates the PC algorithm's accuracy at estimating causal graph structure with NOTEARS' continuous optimization for quantifying causal effect sizes. PCnt uses PC's output as a constraint within NOTEARS' framework, achieving superior performance on real biological benchmarks including Perturb-seq and eQTL data.

code

Paper: A hybrid constrained continuous optimization approach for optimal causal discovery from biological data

InstaPrism

A fast R package for cell-type deconvolution that replaces BayesPrism's computationally expensive Gibbs sampling with a fixed-point algorithm, producing effectively equivalent results with major speed and memory improvements. Includes precompiled reference datasets for various cancer types to streamline analysis workflows.

code

Paper: InstaPrism: an R package for fast implementation of BayesPrism

Heterogeneous Bulk RNAseq Simulation

HetSim

A framework for benchmarking cell-type deconvolution methods using realistic simulated bulk RNA-seq. Standard simulation pipelines randomly sample single cells regardless of intrinsic differences, producing unrealistic variance patterns. This heterogeneous simulation strategy reveals that deconvolution method classes differ dramatically in robustness to cellular heterogeneity, with BayesPrism and hybrid MuSiC/CIBERSORTx approaches performing best.

code

Paper: Heterogeneous pseudobulk simulation enables realistic benchmarking of cell-type deconvolution methods

TISFM: Totally Interpretable Sequence to Function Model

An intrinsically interpretable neural network for predicting genomic function from DNA sequence. Unlike standard deep learning models where interpretability requires expensive post-hoc analysis, TISFM's internal parameters directly correspond to relevant sequence motifs. Tested on open chromatin data across immune cell types, it outperforms standard CNN models while correctly identifying transcription factors with known roles in cell differentiation.

code

Paper: TISFM: totally interpretable sequence to function model

NIFA: Non-negative Independent Factor Analysis

A probabilistic model that simultaneously captures discrete cell-type identity and continuous pathway activity in single-cell RNA-seq data by combining properties of non-negative matrix factorization and independent component analysis. NIFA outperforms ICA, PCA, NMF, and scCoGAPS on benchmarks, and when applied to immunotherapy data, reproduces and refines previous findings while enabling discovery of new clinically relevant cell states.

code

Paper: Non-negative Independent Factor Analysis disentangles discrete and continuous sources of variation in scRNA-seq data

PLIER: Pathway-Level Information Extractor

PLIER is a matrix decomposition method that uses prior information from pathway databases to find an interpretable latent variable representation of gene expression datasets.

code

Paper: Pathway-Level Information ExtractoR (PLIER): a generative model for gene expression data

RERconverge

CellCODE

An R package for differential expression analysis that accounts for varying cell-type proportions in heterogeneous tissue samples. CellCODE uses latent variable analysis to estimate surrogate proportion variables from marker genes, then incorporates these into the differential expression analysis to improve detection of regulated genes and assign them to their cell type of origin, all without requiring additional experimental data beyond expression measurements.

code

Paper: CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations

DataRemix

An R package that optimizes a singular value decomposition-based data transformation with three tunable parameters to prioritize biological signals over noise, without requiring external dataset-specific knowledge. DataRemix outperforms standard normalization methods and was used to discover what is believed to be the first replicable trans-eQTL effect in human brain tissue.

code

IntervalStats

A tool that computes associations between sets of genomic intervals, such as peaks from ChIP-seq or ATAC-seq datasets, using exact enumeration to obtain accurate p-values.
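One way exact enumeration can yield a p-value for interval proximity is sketched below in Python: every admissible placement of a query interval within a domain is enumerated, and the p-value is the fraction of placements at least as close to a reference interval as the observed one. This is an illustrative assumption about the general idea, not code taken from IntervalStats; the function names, the half-open coordinate convention, and the single-domain setup are all hypothetical.

```python
def distance_to_nearest(start, end, references):
    """Gap between half-open [start, end) and the closest reference (0 if touching or overlapping)."""
    return min(max(r_start - end, start - r_end, 0) for r_start, r_end in references)

def placement_p_value(query, references, domain):
    """Fraction of same-length placements in the domain that are at least as close as the query."""
    q_start, q_end = query
    d_start, d_end = domain
    length = q_end - q_start
    observed = distance_to_nearest(q_start, q_end, references)
    # Enumerate every start position that keeps the interval inside the domain.
    positions = range(d_start, d_end - length + 1)
    hits = sum(
        1 for s in positions
        if distance_to_nearest(s, s + length, references) <= observed
    )
    return hits / len(positions)

references = [(50, 60)]  # toy reference peak
p_near = placement_p_value((45, 50), references, (0, 100))
p_far = placement_p_value((0, 5), references, (0, 100))
```

A query that touches the reference gets a small p-value, while one at the far edge of the domain gets a p-value of 1, since every placement is at least as close.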

code | Also available as part of the coloc-stats webserver

Paper: An effective statistical evaluation of ChIPseq dataset similarity

EPIANN

An attention-based deep learning model to predict interacting chromosomal regions.

code