Command line interface documentation

PRESCIENT is primarily implemented as a command-line tool. To view the manual for any command, run prescient command -h. Run the commands below as prescient command [params].

process_data

Process a normalized expression dataframe into a PRESCIENT-compatible file format.

Parameter	Description
data_path	Path to normalized expression CSV.
out_dir	Directory to store PRESCIENT torch object.
meta_path	Path to metadata CSV containing timepoint and celltype annotation data.
tp_col	Column name of timepoint feature in metadata provided as string.
celltype_col	Column name of celltype feature in metadata provided as string.
num_pcs	Define number of PCs to compute for input to training. Default: 50
num_neighbors_umap	Define number of neighbors for UMAP transformation (UMAP is used only for visualization). Default: 10
growth_path	Path to torch pt file containing pre-computed growth weights. See vignette notebooks for generating growth rate vector.

Example Usage:
prescient process_data -d data/Veres2019/Stage_5.Seurat.csv -m data/Veres2019/GSE114412_Stage_5.all.cell_metadata.csv --growth_path data/Veres2019/Veres2019_growth-kegg.pt -o './' --tp_col 'CellWeek' --celltype_col 'Assigned_cluster'
This command takes a normalized expression CSV, metadata CSV, and pre-computed weight torch file as input and produces a PRESCIENT training torch object.
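
Because tp_col and celltype_col must name columns that actually exist in the metadata CSV, it can help to check the header before running process_data. The following is a minimal sketch using only the Python standard library; the helper name missing_metadata_columns and the toy metadata rows are hypothetical and not part of PRESCIENT.

```python
import csv
import io


def missing_metadata_columns(meta_file, required):
    """Return the required column names that are absent from the CSV header."""
    # Read only the first row (the header); an empty file yields an empty header.
    header = next(csv.reader(meta_file), [])
    return [name for name in required if name not in header]


# Toy metadata standing in for the file passed via -m (contents are made up).
toy_meta = io.StringIO(
    "cell_id,CellWeek,Assigned_cluster\n"
    "c1,0,prog\n"
    "c2,1,sc_beta\n"
)

# Column names as they would be passed to --tp_col and --celltype_col.
print(missing_metadata_columns(toy_meta, ["CellWeek", "Assigned_cluster"]))
```

For a real file, pass open(meta_path, newline="") instead of the in-memory toy file. An empty list means both columns were found.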

train_model

Train a PRESCIENT model using a PRESCIENT data object as input.

Parameter	Description
data_path	Path to PRESCIENT data torch object produced by process_data.
weight_name	Descriptive name of the weight vector being used, provided as a string and included in the model filename.
loss	Designate distance function for loss. Default: euclidean
k_dim	Designate hidden units of fully connected layers in model. Default: 500
activation	Designate activation function for layers of NN. Default: softplus
layers	Number of layers for neural network parameterizing the potential function. Default: 2
pretrain_lr	Learning rate for Adam optimizer during pretraining. Default: 1e-9
pretrain_epochs	Number of epochs for pretraining with contrastive divergence. Default: 500
train_epochs	Number of epochs for training. Default: 2500
train_lr	Learning rate for Adam optimizer during training. Default: 0.01
train_dt	Timestep for simulations during training. Default: 0.1
train_sd	Standard deviation of Gaussian noise for simulation steps. Default: 0.5
train_tau	Tau hyperparameter of PRESCIENT. Default: 1e-6
train_batch	Batch size (fraction) for training. Default: 0.1
train_clip	Gradient clipping threshold for training. Default: 0.25
save	Save model every n epochs as torch dict. Default: 100

Example Usage:
prescient train_model -i data.pt --out_dir /experiments/ --weight_name 'kegg-growth' --seed 3 --layers 2 --k_dim 200 --train_tau 1e-06
This command trains a PRESCIENT model using a PRESCIENT training torch object.
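
Since simulate_trajectories and perturbation_analysis select a trained model by seed, it is common to train the same configuration under several seeds. The sketch below only composes the command lines, using the flags that appear in the example above; the helper name train_commands and the seed values are illustrative assumptions, and nothing is executed.

```python
import shlex


def train_commands(data_path, out_dir, weight_name, seeds,
                   layers=2, k_dim=200, train_tau="1e-06"):
    """Build one `prescient train_model` argument list per seed."""
    commands = []
    for seed in seeds:
        commands.append([
            "prescient", "train_model",
            "-i", data_path,
            "--out_dir", out_dir,
            "--weight_name", weight_name,
            "--seed", str(seed),
            "--layers", str(layers),
            "--k_dim", str(k_dim),
            "--train_tau", str(train_tau),
        ])
    return commands


# Print shell-ready commands for three seeds (seed values chosen arbitrarily).
for cmd in train_commands("data.pt", "/experiments/", "kegg-growth", seeds=[1, 2, 3]):
    print(shlex.join(cmd))
```

To run the commands rather than print them, each argument list can be passed to subprocess.run(cmd, check=True), assuming prescient is installed and on the PATH.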

simulate_trajectories

Simulate cellular trajectories using a trained PRESCIENT model and a PRESCIENT data object.

Parameter	Description
data_path	Path to PRESCIENT training file (stored in out_dir of process_data command).
model_path	Path to directory containing PRESCIENT model for simulation.
out_path	Path to directory for storing output.
num_sims	Number of simulations (random initializations of n cells) to run. Default: 10
num_cells	Number of cells per simulation. Default: 200
num_steps	Number of steps forward in time. If not provided, steps will be calculated based on start and end point + train dt.
seed	Choose the seed of the trained model to use for simulations. Default: 1
epoch	Choose which epoch of the chosen model to use for simulations. Provide this value as str. Default: 002500
gpu	If available, assign GPU device number (requires CUDA). Provide as int.
celltype_subset	Randomly sample initial cells from a particular celltype defined in metadata. Provide celltype as str as appears in metadata.
tp_subset	Randomly sample initial cells from a particular timepoint. Provide timepoint as int or as appears in metadata.

Example Usage:
prescient simulate_trajectories -i data.pt --model_path /experiments/kegg-growth-softplus_2_200-1e-06/ --num_steps 10 -o experiments/ --seed 2
This command generates simulated trajectories from randomly initialized cells using a PRESCIENT model and training torch object.
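
The model_path in the example appears to combine the training settings: weight name, activation, number of layers, k_dim, and tau. The sketch below reproduces that pattern and the zero-padded epoch string. Note that the naming pattern is inferred only from the example paths on this page, not from PRESCIENT's source, so check the directory that train_model actually creates in out_dir; the helper names are hypothetical.

```python
def model_dir_name(weight_name, activation="softplus", layers=2,
                   k_dim=500, train_tau="1e-06"):
    """Compose a model directory name following the pattern seen in the examples.

    Assumption: <weight_name>-<activation>_<layers>_<k_dim>-<train_tau>,
    inferred from paths such as kegg-growth-softplus_2_200-1e-06.
    """
    return f"{weight_name}-{activation}_{layers}_{k_dim}-{train_tau}"


def epoch_label(epoch, width=6):
    """Format an epoch number as a zero-padded string, e.g. 2500 -> '002500'."""
    return f"{int(epoch):0{width}d}"


# Settings taken from the train_model example above (k_dim overridden to 200).
print(model_dir_name("kegg-growth", k_dim=200))
print(epoch_label(2500))
```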

perturbation_analysis

Run unperturbed and perturbed simulations of cells using a trained PRESCIENT model and a PRESCIENT data object.

Parameter	Description
perturb_genes	Provide a gene or list of genes to be perturbed as a string (commas, no spaces). Must be in the feature set used to train models.
z_score	Set magnitude of z_score perturbation. Default: 5.0
data_path	Path to PRESCIENT training file (stored in out_dir of process_data command).
model_path	Path to directory containing PRESCIENT model for simulation.
out_path	Path to directory for storing output.
num_sims	Number of simulations (random initializations of n cells) to run. Default: 10
num_cells	Number of cells per simulation. Default: 200
num_steps	Number of steps forward in time. If not provided, steps will be calculated based on start and end point + train dt. Default: None
seed	Choose the seed of the trained model to use for simulations. Default: 1
epoch	Choose which epoch of the chosen model to use for simulations. Default: 1344
gpu	If available, assign GPU device number (requires CUDA). Provide as int.
celltype_subset	Randomly sample initial cells from a particular celltype defined in metadata. Provide celltype as str as appears in metadata.
tp_subset	Randomly sample initial cells from a particular timepoint. Provide timepoint as int or as appears in metadata.

Example Usage:
prescient perturbation_analysis -i ../Downloads/data.pt -p 'GENE1,GENE2,GENE3' -z 5 --model_path /experiments/kegg-softplus_2_200-1e-06/ --num_steps 10 --seed 2 -o experiments/
This command runs forward simulations of unperturbed cells and cells with perturbations of selected genes.
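
The perturb_genes value must be a comma-separated string without spaces, and every gene must belong to the feature set used for training. A minimal sketch of building and checking that string is shown below; the helper name perturb_genes_arg and the toy feature set are hypothetical, and how the real feature set is obtained depends on your data.

```python
def perturb_genes_arg(genes, feature_set):
    """Join gene names into the 'GENE1,GENE2' form expected by -p.

    Raises ValueError if any gene is missing from the training feature set.
    """
    cleaned = [gene.strip() for gene in genes]
    unknown = [gene for gene in cleaned if gene not in feature_set]
    if unknown:
        raise ValueError(f"Genes not in training feature set: {unknown}")
    # Commas only, no spaces, as the parameter description requires.
    return ",".join(cleaned)


# Toy feature set standing in for the genes used during training.
features = {"GENE1", "GENE2", "GENE3", "GENE4"}
print(perturb_genes_arg(["GENE1", " GENE2", "GENE3 "], features))
```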

Links to resources for running CLI with Google Cloud SDK

Training and simulations can run on CPUs. If you want to use GPUs from the command line but do not have access to any, we recommend a cloud computing service that provides NVIDIA GPUs with CUDA support. For an easier approach, we have provided a short demo in the notebooks tab for using free cloud GPUs in a notebook via Google Colab. We recommend the Colab approach, as the setup process for the Google Cloud SDK can be intensive. That being said, we provide a list of Google Cloud web tutorials for setting up a Google Cloud account, installing the Google Cloud SDK command-line interface, creating a GPU instance, and running an interactive shell: