Command line interface documentation

PRESCIENT is primarily implemented as a command line tool. Access the manual for each command from the command line using the syntax prescient [command] -h. Run the following commands via prescient [command] [params].


process_data

Parameter            Description
data_path            Path to the normalized expression CSV.
out_dir              Directory in which to store the PRESCIENT torch object.
meta_path            Path to the metadata CSV containing timepoint and celltype annotation data.
tp_col               Column name of the timepoint feature in the metadata, provided as a string.
celltype_col         Column name of the celltype feature in the metadata, provided as a string.
num_pcs              Number of PCs to compute for input to training. Default: 50
num_neighbors_umap   Number of neighbors for the UMAP transformation (UMAP is used only for visualization). Default: 10
growth_path          Path to a torch pt file containing pre-computed growth weights. See the vignette notebooks for generating a growth rate vector.
Example Usage:
prescient process_data -d data/Veres2019/Stage_5.Seurat.csv -m data/Veres2019/GSE114412_Stage_5.all.cell_metadata.csv --growth_path data/Veres2019/Veres2019_growth-kegg.pt -o './' --tp_col 'CellWeek' --celltype_col 'Assigned_cluster'
This command takes a normalized expression CSV, metadata CSV, and pre-computed weight torch file as input and produces a PRESCIENT training torch object.
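
To sanity-check the result, the training object can be opened directly with torch. The snippet below is a minimal sketch, assuming the output file is named data.pt and is a pickled dict; the exact contents depend on the PRESCIENT version, so it simply prints whatever keys and shapes are stored.

  import torch

  data = torch.load("data.pt")  # newer torch releases may need weights_only=False here
  if isinstance(data, dict):
      for key, value in data.items():
          shape = getattr(value, "shape", None)
          print(key, type(value).__name__, shape if shape is not None else "")
  else:
      print(type(data))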

train_model

Parameter            Description
data_path            Path to the PRESCIENT data torch object produced by process_data.
weight_name          Descriptive name of the weight vector being used, provided as a string; used in the model filename.
loss                 Designate the distance function for the loss. Default: euclidean
k_dim                Designate the number of hidden units of the fully connected layers in the model. Default: 500
activation           Designate the activation function for the layers of the NN. Default: softplus
layers               Number of layers for the neural network parameterizing the potential function. Default: 2
pretrain_lr          Learning rate for the Adam optimizer during pretraining. Default: 1e-9
pretrain_epochs      Number of epochs for pretraining with contrastive divergence. Default: 500
train_epochs         Number of epochs for training. Default: 2500
train_lr             Learning rate for the Adam optimizer during training. Default: 0.01
train_dt             Timestep for simulations during training. Default: 0.1
train_sd             Standard deviation of the Gaussian noise for simulation steps. Default: 0.5
train_tau            Tau hyperparameter of PRESCIENT. Default: 1e-6
train_batch          Batch size (fraction) for training. Default: 0.1
train_clip           Gradient clipping threshold for training. Default: 0.25
save                 Save the model every n epochs as a torch dict. Default: 100
Example Usage:
prescient train_model -i data.pt --out_dir /experiments/ --weight_name 'kegg-growth' --seed 3 --layers 2 --k_dim 200 --train_tau 1e-06
This command trains a PRESCIENT model using a PRESCIENT training torch object.
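
Checkpoints are written under out_dir as torch dicts every save epochs. The snippet below is a minimal sketch for confirming that a checkpoint loads; the file name is a placeholder, so substitute whichever .pt file train_model wrote for you (the directory name here follows the model_path shown in the simulate_trajectories example).

  import torch

  # Placeholder path: use whichever checkpoint file train_model wrote under --out_dir.
  ckpt_path = "experiments/kegg-growth-softplus_2_200-1e-06/checkpoint.pt"
  ckpt = torch.load(ckpt_path, map_location="cpu")
  if isinstance(ckpt, dict):
      print(list(ckpt.keys()))  # e.g. model weights and training state (contents vary by version)
  else:
      print(type(ckpt))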

simulate_trajectories

Parameter            Description
data_path            Path to the PRESCIENT training file (stored in the out_dir of the process_data command).
model_path           Path to the directory containing the PRESCIENT model for simulation.
out_path             Path to the directory for storing output.
num_sims             Number of simulations (random initializations of n cells) to run. Default: 10
num_cells            Number of cells per simulation. Default: 200
num_steps            Number of steps forward in time. If not provided, the number of steps is calculated from the start and end timepoints and train_dt.
seed                 Seed of the trained model to use for simulations. Default: 1
epoch                Epoch of the chosen model to use for simulations, provided as a str. Default: 002500
gpu                  If available, assign a GPU device number (requires CUDA). Provide as an int.
celltype_subset      Randomly sample initial cells from a particular celltype defined in the metadata. Provide the celltype as a str, as it appears in the metadata.
tp_subset            Randomly sample initial cells from a particular timepoint. Provide the timepoint as an int or as it appears in the metadata.
Example Usage:
prescient simulate_trajectories -i data.pt --model_path /experiments/kegg-growth-softplus_2_200-1e-06/ --num_steps 10 -o experiments/ --seed 2
This command generates simulated trajectories from randomly initialized cells using a PRESCIENT model and training torch object.
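
The simulated trajectories are saved as a torch file in out_path. The snippet below is a minimal sketch for inspecting that output; the file name and key names are assumptions (they can vary with the PRESCIENT version), so print the keys before relying on any particular layout.

  import numpy as np
  import torch

  sims = torch.load("experiments/simulation_output.pt")  # placeholder path
  if isinstance(sims, dict):
      for key, value in sims.items():
          print(key, type(value).__name__)
  # If a key holds the list of simulations (one entry per random initialization),
  # each entry should be an array indexed by step, cell, and feature, e.g.:
  # traj = np.asarray(sims["sims"][0]); print(traj.shape)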

perturbation_analysis

Parameter            Description
perturb_genes        A gene or list of genes to be perturbed, provided as a string (comma-separated, no spaces). Genes must be in the feature set used to train the model.
z_score              Magnitude of the z-score perturbation. Default: 5.0
data_path            Path to the PRESCIENT training file (stored in the out_dir of the process_data command).
model_path           Path to the directory containing the PRESCIENT model for simulation.
out_path             Path to the directory for storing output.
num_sims             Number of simulations (random initializations of n cells) to run. Default: 10
num_cells            Number of cells per simulation. Default: 200
num_steps            Number of steps forward in time. If not provided, the number of steps is calculated from the start and end timepoints and train_dt.
seed                 Seed of the trained model to use for simulations. Default: 1
epoch                Epoch of the chosen model to use for simulations. Default: 1344
gpu                  If available, assign a GPU device number (requires CUDA). Provide as an int.
celltype_subset      Randomly sample initial cells from a particular celltype defined in the metadata. Provide the celltype as a str, as it appears in the metadata.
tp_subset            Randomly sample initial cells from a particular timepoint. Provide the timepoint as an int or as it appears in the metadata.
Example Usage:
prescient perturbation_analysis -i ../Downloads/data.pt -p 'GENE1,GENE2,GENE3' -z 5 --model_path /experiments/kegg-softplus_2_200-1e-06/ --num_steps 10 --seed 2 -o experiments/
This command runs forward simulations of unperturbed cells and cells with perturbations of selected genes.
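
The perturbation output bundles the unperturbed and perturbed simulations so they can be compared downstream. The snippet below is a minimal sketch; the file name and the keys "unperturbed_sim" and "perturbed_sim" are assumptions, so check the printed keys first. The commented lines show one simple comparison, the shift in the mean final-step state across cells.

  import numpy as np
  import torch

  out = torch.load("experiments/perturbation_output.pt")  # placeholder path
  print(list(out.keys()) if isinstance(out, dict) else type(out))

  # Hypothetical comparison, assuming keys "unperturbed_sim" and "perturbed_sim"
  # each hold a list of trajectories indexed by step, cell, and feature:
  # unpert = np.asarray(out["unperturbed_sim"][0])[-1].mean(axis=0)
  # pert = np.asarray(out["perturbed_sim"][0])[-1].mean(axis=0)
  # print(np.linalg.norm(pert - unpert))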

Links to resources for running the CLI with the Google Cloud SDK

If you do not have access to GPUs but want to use them for training and simulations from the command line (alternatively, you can use CPUs), we recommend any cloud computing service that provides NVIDIA GPUs with CUDA support. For an easier approach, we have provided a short demo in the notebooks tab on using free cloud GPUs in a notebook via Google Colab; we recommend this route, as the setup process for the Google Cloud SDK can be intensive. That being said, below is a list of Google Cloud web tutorials for setting up a Google Cloud account, installing the Google Cloud SDK command-line interface, creating a GPU instance, and running an interactive shell:

  1. Setting up an account and billing for buying GPU compute time
  2. Downloading Google Cloud SDK (gcloud command)
  3. Creating a virtual machine (VM) with mounted GPU
  4. Using gcloud interactive shell