DeepAccess provides methods for training and interpreting an ensemble of neural networks for multi-task functional prediction of accessibility or histone modifications from DNA sequence.
We support training a DeepAccess model from either bed files or a fasta file with a matching labels file. Training from bed files also requires a reference genome fasta and a chrom.sizes file, both of which are available from UCSC.

    usage: deepaccess train [-h] -l LABELS [LABELS ...]
                            -out OUT [-ref REFFASTA]
                            [-g GENOME] [-beds BEDFILES [BEDFILES ...]]
                            [-fa FASTA] [-fasta_labels FASTA_LABELS]
                            [-f FRAC_RANDOM] [-nepochs NEPOCHS]
                            [-ho HOLDOUT] [-seed SEED] [-verbose]

    optional arguments:
      -h, --help            show this help message and exit
      -l LABELS [LABELS ...], --labels LABELS [LABELS ...]
      -out OUT, --out OUT
      -ref REFFASTA, --refFasta REFFASTA
      -g GENOME, --genome GENOME
                            genome chrom.sizes file
      -beds BEDFILES [BEDFILES ...], --bedfiles BEDFILES [BEDFILES ...]
      -fa FASTA, --fasta FASTA
      -fasta_labels FASTA_LABELS, --fasta_labels FASTA_LABELS
      -f FRAC_RANDOM, --frac_random FRAC_RANDOM
      -nepochs NEPOCHS, --nepochs NEPOCHS
      -ho HOLDOUT, --holdout HOLDOUT
                            chromosome to holdout
      -seed SEED, --seed SEED
      -verbose, --verbose   Print training progress

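
As an illustration of how these options combine, the following Python sketch assembles a `deepaccess train` argument list and checks the input rules stated in this section: bed input needs a reference fasta and a chrom.sizes file, and fasta input comes with a labels file. The helper `build_train_command` is hypothetical and not part of DeepAccess, and the file names are placeholders. It only builds and prints the command; it does not run it.

```python
import shlex


def build_train_command(labels, out, beds=None, ref_fasta=None, genome=None,
                        fasta=None, fasta_labels=None, nepochs=None,
                        holdout=None, seed=None, verbose=False):
    """Assemble a `deepaccess train` argument list (hypothetical helper)."""
    if not beds and not fasta:
        raise ValueError("one of beds or fasta input is required")
    cmd = ["deepaccess", "train", "-l", *labels, "-out", out]
    if beds:
        # Bed input also needs the reference genome and chrom.sizes file.
        if not (ref_fasta and genome):
            raise ValueError("bed input requires ref_fasta and genome")
        cmd += ["-beds", *beds, "-ref", ref_fasta, "-g", genome]
    if fasta:
        # Assumption: fasta input is always paired with a labels file.
        if not fasta_labels:
            raise ValueError("fasta input requires fasta_labels")
        cmd += ["-fa", fasta, "-fasta_labels", fasta_labels]
    if nepochs is not None:
        cmd += ["-nepochs", str(nepochs)]
    if holdout is not None:
        cmd += ["-ho", holdout]
    if seed is not None:
        cmd += ["-seed", str(seed)]
    if verbose:
        cmd.append("-verbose")
    return cmd


# Placeholder file names; substitute your own data.
cmd = build_train_command(
    labels=["C1", "C2", "C3"],
    out="myoutput",
    beds=["C1.bed", "C2.bed", "C3.bed"],
    ref_fasta="mm10.fa",
    genome="default/mm10.chrom.sizes",
    nepochs=1,
    holdout="chr19",
    seed=2021,
    verbose=True,
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once DeepAccess is installed.
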
| Argument | Description | Example |
|---|---|---|
| `-h`, `--help` | show this help message and exit | NA |
| `-l`, `--labels` | list of labels for each bed file | C1 C2 C3 |
| `-out`, `--out` | output folder name | myoutput |
| `-ref`, `--refFasta` | reference fasta; required with bed input | mm10.fa |
| `-g`, `--genome` | genome chromosome sizes file; required with bed input | default/mm10.chrom.sizes |
| `-beds`, `--bedfiles` | list of bed files; one of beds or fa input required | C1.bed C2.bed C3.bed |
| `-fa`, `--fasta` | fasta file; one of beds or fa input required | C1C2C3.fa |
| `-fasta_labels`, `--fasta_labels` | text file containing tab-delimited labels (0 or 1) for each fasta line, with one column for each class | C1C2C3.txt |
| `-f`, `--frac_random` | for bed file input, fraction of random outgroup regions to add to training | 0.1 |
| `-nepochs`, `--nepochs` | number of training epochs | 1 |
| `-ho`, `--holdout` | chromosome name to hold out (only with bed input) | chr19 |
| `-verbose`, `--verbose` | print training and evaluation progress | NA |
| `-seed`, `--seed` | set tensorflow seed | 2021 |
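
The fasta input pairs each sequence with a row of 0/1 labels, one tab-delimited column per class. The Python sketch below writes a toy fasta file and a matching labels file in that layout. Several details are assumptions not stated above: one label row per fasta record, no header row in the labels file, and the `>seqN` record names. The sequences are short toy strings, and `write_training_files` is a hypothetical helper, not part of DeepAccess.

```python
import os
import tempfile


def write_training_files(examples, classes, fasta_path, labels_path):
    """Write sequences to fasta and a parallel tab-delimited 0/1 labels file."""
    with open(fasta_path, "w") as fasta, open(labels_path, "w") as labels:
        for index, (sequence, active) in enumerate(examples):
            unknown = set(active) - set(classes)
            if unknown:
                raise ValueError(f"unknown classes: {sorted(unknown)}")
            # Assumption: one fasta record per example, named seq0, seq1, ...
            fasta.write(f">seq{index}\n{sequence}\n")
            # One column per class, in the same order as `classes`.
            row = ["1" if name in active else "0" for name in classes]
            labels.write("\t".join(row) + "\n")


classes = ["C1", "C2", "C3"]
examples = [
    ("ACGTACGTAC", {"C1"}),        # positive for C1 only
    ("TTGACCGTAA", {"C2", "C3"}),  # positive for C2 and C3
    ("GGCATCGATC", set()),         # negative for every class
]

with tempfile.TemporaryDirectory() as workdir:
    fasta_path = os.path.join(workdir, "C1C2C3.fa")
    labels_path = os.path.join(workdir, "C1C2C3.txt")
    write_training_files(examples, classes, fasta_path, labels_path)
    with open(labels_path) as handle:
        label_rows = [line.rstrip("\n").split("\t") for line in handle]

print(label_rows)
```

Keeping the column order identical to the order passed to `-l` means each label column lines up with its class name.
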