Current Projects

Transcription factor organization during cellular reprogramming

The ability to reprogram cells from one type to another offers a powerful tool for diverse areas of research and medicine. We study the interplay between genomic sequence, transcription factor binding, and chromatin architecture during cell state change, with the goal of composing simple mechanistic models that explain transcription factor binding dynamics and can be used in reprogramming systems. We characterize this interplay using DNase-seq, ChIP-seq, ChIA-PET, and RNA-seq data, focusing on developmental and stem cell differentiation systems along the pancreatic lineage.

Detecting high-resolution chromatin interactions from high-throughput sequencing data

The primary aim of this project is to better understand the regulation of gene expression by applying novel computational methods to high-throughput sequencing data. In particular, recent work has focused on improving the fidelity and resolution of chromatin interactions learned from ChIA-PET data. We are currently collaborating with experimental biologists to characterize the dynamics of chromatin interactions during cellular differentiation.

High-resolution analysis of regulatory genome grammars: discovery, modeling and testing

The goal of this project is to develop computational methods that discover regulatory elements in the human genome at high spatial resolution from high-throughput sequencing data such as ChIP-seq, DNase-seq and RNA-seq, to learn models of regulatory genome grammars, and to test these grammars experimentally using massively parallel reporter assays (MPRAs), with the results feeding back to improve the grammar models. A deeper understanding of regulatory genome grammars is important for elucidating the mechanisms of gene regulation and for interpreting the functional role of regulatory genetic variants in health and disease.

Computational genetics for model organisms

This project focuses on machine learning and statistical approaches to problems in genetics (model organism and human) and molecular biology. One application is a collaborative project investigating the genetic sources of phenotypic variability in yeast. This involves developing models of genetic complexity and designing and analyzing high-throughput sequencing experiments.

Computational detection of somatic variation

Studies have shown that the somatic cells of an individual do not all share the same genotype. One possible explanation for this somatic mosaicism is that it is caused by genomic changes occurring over the course of development. We are using high-throughput sequencing data to test this hypothesis and to identify particular developmentally programmed variants. In general, we are interested in computational methods for understanding regulatory genomics.

Computational prediction of chromatin controlling factors

We analyze lineage-structured DNase-seq data to study transcription factor binding patterns across a variety of cell types. We discovered a new class of transcription factors that increase chromatin accessibility in a local region, which gives us a way to predict changes in chromatin over time.

Statistical correction for high-throughput sequencing

We are developing methods to take advantage of correlations within and between high-throughput sequencing experiments. This work formalizes the notion that the Poisson distribution, commonly used in sequencing data analysis, does not fit real-world sequencing data well. Rather than proposing a more complicated distribution, we developed a method that preprocesses and re-weights the data so that existing Poisson-based pipelines work correctly.
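
To make the Poisson mismatch concrete, the following minimal sketch computes a dispersion index (sample variance divided by sample mean) for per-bin read counts. A Poisson distribution has variance equal to its mean, so the index should be close to 1; values well above 1 indicate overdispersion. This is only an illustration of the problem, not the re-weighting method described above. The function name and both count lists are hypothetical and invented for the example.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance / sample mean (about 1 for Poisson-like data)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; dispersion index is undefined")
    return variance(counts) / m


# Hypothetical read counts per genomic bin, invented for illustration only.
# Both lists have the same mean (5 reads per bin) but very different spread.
near_poisson = [2, 5, 7, 4, 8, 3, 5, 6, 2, 8]
overdispersed = [0, 1, 0, 2, 30, 1, 0, 12, 3, 1]

print("near-Poisson index: ", dispersion_index(near_poisson))
print("overdispersed index:", dispersion_index(overdispersed))
```

Both toy lists share the same mean, so a Poisson model fitted to either would use the same rate, yet the second list has a far larger variance than that model allows. Real sequencing counts often look more like the second list, which is the kind of mismatch a correction step would need to address before a Poisson-based pipeline is applied.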