Research

Current Projects

Strategies for pan-strain COVID vaccines using n-times coverage for adaptive cellular immunity

We have developed the n-times coverage method for peptide vaccine design, which maximizes the fraction of a population whose HLA molecules are predicted to display at least n immunogenic vaccine peptides, with the goal of inducing T cell immunity against a selected pathogen. We find that maximizing n-times coverage produces a pan-strain COVID-19 vaccine design that is predicted to outperform 29 other published designs in both population coverage and the expected number of peptides displayed by each individual’s HLA molecules. In collaboration with the University of Texas Medical Branch, Boston University, the Ragon Institute, and Acuitas Therapeutics, we are testing the efficacy of our vaccine design, delivered using an mRNA-LNP platform, in transgenic animal models against the B.1.351 SARS-CoV-2 variant of concern.
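To make the objective concrete, the following minimal Python sketch scores and greedily builds a peptide set under n-times coverage. It is an illustration, not the published optimizer: it assumes a toy population given as (set of distinct HLA alleles, genotype frequency) pairs and a hypothetical `binders` table mapping each allele to the peptides it is predicted to display. Each (peptide, distinct allele) pair counts as one hit, and ties in coverage are broken by the expected number of hits, because a single added peptide often cannot push anyone past the n-hit threshold on its own.

```python
# Toy sketch of n-times coverage and a greedy peptide selector (not the published optimizer).

def hits(selected, alleles, binders):
    """Count (peptide, allele) pairs in which one of the individual's distinct alleles displays the peptide."""
    return sum(1 for p in selected for a in alleles if p in binders.get(a, set()))

def coverage_and_mean_hits(selected, population, binders, n):
    """Return (fraction of the population with >= n hits, expected number of hits per individual)."""
    covered, mean_hits = 0.0, 0.0
    for alleles, freq in population:
        h = hits(selected, alleles, binders)
        mean_hits += freq * h
        if h >= n:
            covered += freq
    return covered, mean_hits

def greedy_n_times(candidates, population, binders, n, k):
    """Add k peptides one at a time, each maximizing n-times coverage (ties broken by expected hits)."""
    selected, remaining = [], sorted(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda p: coverage_and_mean_hits(selected + [p], population, binders, n))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical predicted displays and genotype frequencies, for illustration only.
binders = {"A*01": {"P1", "P2"}, "A*02": {"P2", "P3"}, "B*07": {"P3"}}
population = [
    (frozenset({"A*01", "B*07"}), 0.5),
    (frozenset({"A*02", "B*07"}), 0.3),
    (frozenset({"A*01", "A*02"}), 0.2),
]
design = greedy_n_times(["P1", "P2", "P3"], population, binders, n=2, k=2)
print(design, coverage_and_mean_hits(design, population, binders, n=2))
```

On these toy inputs the selector picks `P3` first (alone it already gives 30% of the population two hits) and then `P2`, reaching full two-times coverage. Greedy selection carries no optimality guarantee for this threshold objective; it only shows what is being maximized.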

Synthesis of large and diverse sequence libraries for therapeutic discovery via computational methods

The successful discovery of novel biological therapeutics by selection requires highly diverse and efficient libraries of candidate sequences. Using deep learning techniques, we design strategies for synthesizing such libraries to enable the discovery of novel therapeutics against challenging targets.

Specificity-informed computational T cell receptor design

Immunotherapy and targeted cell-based therapies are promising modalities for the treatment of cancer; however, they have suffered multiple setbacks when translated to the clinic. In particular, off-target binding to healthy cells has been a major limiting factor for T cell receptor (TCR)-based cell therapies. In this work, we aim to better understand this phenomenon using high-throughput immunological binding assays that generate large interaction maps between many TCRs and their peptide-MHC complexes (pMHCs). With this information, we develop machine learning models that predict the specificity profile of a given TCR and use them to design TCRs with the most favorable characteristics.
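As a deliberately simple stand-in for the learned models described above, the Python sketch below implements a nearest-neighbour baseline. It assumes the interaction map is stored as a dictionary from CDR3β sequence to per-pMHC binding calls and predicts a new TCR's specificity profile by copying the profile of its closest measured neighbour by edit distance. The sequences, pMHC identifiers, and function names are hypothetical; a trained model would generalize from the assay data rather than copy a single neighbour.

```python
# Hypothetical nearest-neighbour baseline for TCR specificity prediction (illustrative only).

def edit_distance(a, b):
    """Levenshtein distance between two CDR3 amino-acid sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def predict_profile(query_cdr3, interaction_map):
    """Copy the binding profile (pMHC -> bound?) of the most similar CDR3 in the measured map."""
    nearest = min(sorted(interaction_map), key=lambda cdr3: edit_distance(query_cdr3, cdr3))
    return interaction_map[nearest]

def off_target_hits(profile, self_pmhcs):
    """List the self pMHCs that the predicted profile says the TCR binds."""
    return sorted(p for p, bound in profile.items() if bound and p in self_pmhcs)

# Toy interaction map: CDR3beta sequence -> {pMHC identifier: measured binding}.
interaction_map = {
    "CASSLGQAYEQYF": {"viral_1": True, "viral_2": False, "self_1": True},
    "CASSPDRGYTF": {"viral_1": False, "viral_2": True, "self_1": False},
}
profile = predict_profile("CASSLGQGYEQYF", interaction_map)
print(profile, off_target_hits(profile, {"self_1"}))
```

Flagging predicted binding to self pMHCs, as `off_target_hits` does, is one way a predicted profile could be used to screen candidate designs for off-target risk.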

Understanding the dynamics of the genomic landscape during motor neuron differentiation

We aim to understand the gene-regulatory role of repetitive elements in differentiating cells. Specifically, we address three questions. First, are specific repetitive elements enriched in a given cell type? Second, are expanded repetitive elements bound by cell type-specific transcription factors? Third, are expanded repetitive elements part of canonical chromosomes?
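For the first question, one standard way to test whether a repeat family is over-represented in a cell type is a one-sided hypergeometric test. The Python sketch below assumes a hypothetical setup: N genomic regions in total (for example, all accessible regions pooled across cell types), K of which overlap the repeat family, and n cell type-specific regions, k of which overlap it. The counts are invented for illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: n regions drawn from N, of which K overlap the repeat family."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)  # exact integer arithmetic, converted to float once

def repeat_enrichment(k, N, K, n):
    """Return (fold enrichment, one-sided p-value) of a repeat family in n cell type-specific regions."""
    fold = (k / n) / (K / N)
    return fold, hypergeom_sf(k, N, K, n)

# Hypothetical counts: 40 of 200 motor neuron-specific regions overlap the repeat family,
# versus 100 of 1,000 regions overall.
fold, p_value = repeat_enrichment(k=40, N=1000, K=100, n=200)
print(f"fold enrichment = {fold:.2f}, p = {p_value:.2e}")
```

In practice a test like this would be repeated for every repeat family and cell type, so the resulting p-values would need a multiple-testing correction.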

High-throughput characterization and modeling of cellular development and directed differentiation

Understanding and modeling how cells are driven to transient and terminal states can enable precise manipulation of cell fates in vitro, leading to more accurate disease models and to cell-based therapeutics. We study differentiation using computational methods and multimodal high-throughput time-course data on gene expression and chromatin state, including scRNA-seq, ATAC-seq, and ChIP-seq. We develop machine learning methods for ranking the transcription factors involved in cellular reprogramming and for modeling an underlying potential function (or landscape) of differentiation, enabling in silico simulations of differentiation with and without genetic perturbations.
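A common way to simulate cells moving on a potential landscape is overdamped Langevin dynamics, dx = -V'(x) dt + sqrt(2D) dW, integrated with the Euler-Maruyama scheme. The Python sketch below assumes a one-dimensional toy double-well potential V(x) = x^4/4 - x^2/2 + tilt*x, in which the two wells stand for alternative fates. Modeling a genetic perturbation as a linear tilt of the landscape is purely an assumption for illustration; a learned landscape would replace this hand-written V.

```python
import math
import random

def grad_potential(x, tilt):
    """dV/dx for the toy double-well landscape V(x) = x**4/4 - x**2/2 + tilt*x."""
    return x**3 - x + tilt

def simulate_fates(n_cells=500, tilt=0.0, diffusion=0.05, dt=0.01, n_steps=1000, seed=0):
    """Euler-Maruyama integration of dx = -V'(x) dt + sqrt(2 D) dW; returns the fraction ending at x > 0."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    n_positive = 0
    for _ in range(n_cells):
        x = 0.0  # start at the ridge between the two wells (a bipotent state when tilt = 0)
        for _ in range(n_steps):
            x += -grad_potential(x, tilt) * dt + noise_scale * rng.gauss(0.0, 1.0)
        if x > 0:
            n_positive += 1
    return n_positive / n_cells

baseline = simulate_fates(tilt=0.0)    # unperturbed landscape
perturbed = simulate_fates(tilt=-0.3)  # hypothetical perturbation deepening the x > 0 well
print(f"fraction reaching the x > 0 fate: baseline {baseline:.2f}, perturbed {perturbed:.2f}")
```

With these defaults the symmetric landscape splits cells roughly evenly between the two fates, while a tilt of -0.3 deepens the x > 0 well and sends most cells there.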

Computationally guided vaccine design for Mycobacterium tuberculosis

The goal of this project is to computationally design subunit vaccines for tuberculosis, a disease caused by the bacterium Mycobacterium tuberculosis (Mtb). Systematically determining which antigens to include in a vaccine is an ongoing challenge. To elicit an immune response, vaccines must contain antigens whose peptides are effectively presented by major histocompatibility complex class I and class II molecules (MHC-I and MHC-II). Here, we combine publicly available data with a combinatorial machine learning (ML) approach to identify Mtb-derived peptides that are likely to be presented by MHC-I and MHC-II across the diversity of human leukocyte antigen (HLA) alleles and that provide multiple-times coverage of populations of interest. First, we use ML-based peptide-MHC scoring tools (NetMHC) to predict the binding affinity of all Mtb-derived peptides for all publicly available HLA alleles. Next, we use combinatorial evaluation and design methods, EvalVax and OptiVax-ILP, to compute the population coverage of whole Mtb proteins and to design subunit vaccines with optimal coverage for global and targeted populations.
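The population-coverage step can be sketched for a single HLA locus. This simplified Python example is not EvalVax itself. It assumes Hardy-Weinberg equilibrium for diploid genotypes, treats a peptide as displayed when its predicted IC50 is at or below 50 nM (a commonly used binder cutoff, assumed here), and counts each (peptide, distinct allele) pair as one hit. The allele frequencies and affinities are invented for illustration.

```python
from itertools import combinations_with_replacement

def binder_sets(affinities, threshold_nm=50.0):
    """Map each allele to the peptides predicted to bind it with IC50 <= threshold (in nM)."""
    out = {}
    for (peptide, allele), ic50 in affinities.items():
        if ic50 <= threshold_nm:
            out.setdefault(allele, set()).add(peptide)
    return out

def coverage_profile(peptides, allele_freqs, binders, max_n=5):
    """P(an individual has >= n peptide-HLA hits) for n = 1..max_n, single locus under Hardy-Weinberg."""
    alleles = sorted(allele_freqs)
    dist = {}  # number of hits -> genotype probability mass
    for a, b in combinations_with_replacement(alleles, 2):
        freq = allele_freqs[a] ** 2 if a == b else 2 * allele_freqs[a] * allele_freqs[b]
        # Count each (peptide, distinct allele) pair once; homozygotes are not double-counted.
        hits = sum(1 for p in peptides for allele in {a, b} if p in binders.get(allele, set()))
        dist[hits] = dist.get(hits, 0.0) + freq
    return {n: sum(pr for h, pr in dist.items() if h >= n) for n in range(1, max_n + 1)}

# Hypothetical predicted affinities (nM) and allele frequencies.
affinities = {("PEP1", "A*01"): 20, ("PEP1", "A*02"): 800, ("PEP2", "A*01"): 45, ("PEP2", "A*02"): 35}
allele_freqs = {"A*01": 0.6, "A*02": 0.4}
binders = binder_sets(affinities)
profile = coverage_profile(["PEP1", "PEP2"], allele_freqs, binders, max_n=4)
print(profile)
```

Here every individual in the toy population is covered at least once and 84% at least twice, but only the heterozygotes (48%) at least three times. Extending this to several linked loci would require haplotype frequencies rather than independent allele frequencies.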

Detecting high-resolution chromatin interactions from high-throughput sequencing data

The primary aim of this project is to better understand the regulation of gene expression by applying novel computational methods to high-throughput sequencing data. In particular, our recent work has focused on improving the fidelity and resolution of chromatin interactions inferred from ChIA-PET data. We are currently collaborating with experimental biologists to characterize the dynamics of chromatin interactions during cellular differentiation.
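A simple baseline for separating specific ChIA-PET interactions from random ligation products is to compare each anchor pair's PET count with the count expected if PET ends paired at random. The Python sketch below assumes that the expected count for anchors a and b is depth_a * depth_b / total PETs and scores each pair with a Poisson upper-tail test. It is an illustrative baseline rather than the method developed in this project, and the anchors and counts are hypothetical.

```python
import math

def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The upper tail is large here, so 1 - CDF is accurate enough.
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))
    # For k > lam the terms shrink monotonically, so sum the upper tail directly
    # until they become negligible; this keeps very small p-values accurate.
    total, i = 0.0, k
    while True:
        term = poisson_pmf(i, lam)
        total += term
        if term <= total * 1e-16:
            return total
        i += 1

def score_interactions(pet_counts, anchor_depth, total_pets):
    """Rank anchor pairs by a Poisson p-value under an assumed random-ligation null."""
    scored = []
    for (a, b), observed in pet_counts.items():
        expected = anchor_depth[a] * anchor_depth[b] / total_pets
        scored.append((a, b, observed, expected, poisson_sf(observed, expected)))
    return sorted(scored, key=lambda row: row[4])

anchor_depth = {"A": 50, "B": 40, "C": 30}  # hypothetical PET ends per anchor
pet_counts = {("A", "B"): 8, ("A", "C"): 1, ("B", "C"): 2}
ranked = score_interactions(pet_counts, anchor_depth, total_pets=10_000)
for a, b, observed, expected, p in ranked:
    print(f"{a}-{b}: observed {observed}, expected {expected:.2f}, p = {p:.2e}")
```

Summing the upper tail directly, rather than computing 1 - CDF, matters because the most interesting interactions have p-values far below double-precision rounding error near 1.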

High-resolution analysis of regulatory genome grammars: discovery, modeling and testing

The goal of this project is to develop computational methods that discover human genome regulatory elements at high spatial resolution from high-throughput sequencing data such as ChIP-Seq, DNase-Seq and RNA-Seq, to learn models of regulatory genome grammars, and to test these grammars experimentally using massively parallel reporter assays (MPRAs) to further improve the grammar models. A deeper understanding of regulatory genome grammars is important for elucidating the mechanisms of gene regulation and for interpreting the functional role of regulatory genetic variants in health and disease.
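In an MPRA, each element's regulatory activity is commonly summarized as the log ratio of its barcode counts in RNA to its counts in the input DNA library, after normalizing for sequencing depth. The Python sketch below assumes per-element count dictionaries, a pseudocount of 1, and hypothetical element names. The difference in activity between the alternate and reference versions of an element then gives a simple estimate of a variant's regulatory effect.

```python
import math

def mpra_activity(rna_counts, dna_counts, pseudocount=1.0):
    """log2 of depth-normalized RNA over depth-normalized DNA counts for each element."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)
        rna_frac = (rna + pseudocount) / rna_total
        dna_frac = (dna + pseudocount) / dna_total
        activity[element] = math.log2(rna_frac / dna_frac)
    return activity

def allelic_effect(activity, ref, alt):
    """Change in log2 activity caused by swapping the reference allele for the alternate."""
    return activity[alt] - activity[ref]

# Hypothetical summed barcode counts; "scrambled" plays the role of a negative control.
rna = {"enh1_ref": 399, "enh1_alt": 99, "scrambled": 9}
dna = {"enh1_ref": 199, "enh1_alt": 199, "scrambled": 199}
act = mpra_activity(rna, dna)
print(act, allelic_effect(act, "enh1_ref", "enh1_alt"))
```

With equal DNA representation, the alternate allele's four-fold lower RNA count gives an allelic effect of about -2 in log2 units. Real analyses typically also model barcode-level variation and replicates before calling a variant effect significant.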