Quickstart
Here, we take the path of least resistance, the command-line interface (CLI), to training a PRESCIENT model and running perturbation analyses.
To install PRESCIENT, refer to the homepage.
Create PRESCIENT torch object
First, we recommend reviewing how to prepare inputs for PRESCIENT
and converting your scRNA-seq data into a format PRESCIENT accepts. For estimating growth weights, refer to the notebooks tab.
Run the following to process your data together with the growth weights and create a PRESCIENT training object (a PyTorch file):
prescient process_data -d /path/to/your_data.csv -o /path/for/output/ -m /path/to/metadata.csv --tp_col "timepoint colname" --celltype_col "annotation colname" --growth_path /path/to/growth_weights.pt
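To try the command before preparing real data, the sketch below writes a tiny synthetic expression file and a matching metadata file using only the Python standard library. The layout is an assumption, not the documented format: one row per cell, with a leading cell_id column followed by one column per gene, and a metadata file with one row per cell in the same order. The function name write_toy_inputs and the column names timepoint and annotation are hypothetical; whatever column names you use are the values you pass to --tp_col and --celltype_col. Check the input-preparation page before relying on this layout.

```python
import csv
import random
from pathlib import Path


def write_toy_inputs(out_dir, n_cells=6, genes=("GENE1", "GENE2", "GENE3"),
                     timepoints=(0, 1, 2), seed=0):
    """Write a hypothetical expression CSV and a matching metadata CSV.

    ASSUMED layout (verify against the input-preparation docs):
    expression: one row per cell, a cell_id column, then one column per gene.
    metadata: one row per cell, in the same order as the expression file.
    """
    rng = random.Random(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cell_ids = [f"cell{i}" for i in range(n_cells)]

    data_path = out / "your_data.csv"
    with open(data_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["cell_id", *genes])
        for cid in cell_ids:
            # Random placeholder values standing in for normalized expression.
            writer.writerow([cid, *(round(rng.uniform(0, 5), 3) for _ in genes)])

    meta_path = out / "metadata.csv"
    with open(meta_path, "w", newline="") as f:
        writer = csv.writer(f)
        # These column names are what you would pass to --tp_col and --celltype_col.
        writer.writerow(["cell_id", "timepoint", "annotation"])
        for i, cid in enumerate(cell_ids):
            tp = timepoints[i % len(timepoints)]  # cycle through integer timepoints
            writer.writerow([cid, tp, rng.choice(["typeA", "typeB"])])

    return data_path, meta_path
```

With these files you would pass -d your_data.csv, -m metadata.csv, --tp_col "timepoint", and --celltype_col "annotation"; the sketch does not create a growth weights file, so --growth_path still needs weights estimated as described in the notebooks tab.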
Train PRESCIENT model
To train a PRESCIENT model, GPU acceleration with CUDA support is recommended. PRESCIENT models can also be trained on CPUs, but training takes longer.
For a demo of running PRESCIENT with free GPU resources on Google Colab, refer to the notebooks tab.
Next, train a basic PRESCIENT model with default parameters by passing the data.pt file produced by the process_data command to the following command:
prescient train_model -i /path/to/data.pt --out_dir /experiments/ --weight_name 'kegg-growth'
For more options controlling model architecture and hyperparameters,
refer to the CLI documentation.
Now, with a trained PRESCIENT model and the original PRESCIENT data object, you can simulate cell trajectories from arbitrary initializations.
To do so, run the simulate_trajectories command.
In the following example, the command randomly samples 50 cells at
the first provided timepoint and simulates them forward to the final timepoint:
prescient simulate_trajectories -i /path/to/data.pt --model_path /path/to/trained/model_directory -o /path/to/output_dir --seed 2
This will produce a PRESCIENT simulation object containing the following:
- "sims": generated cells along the simulated trajectory
For more control over cell selection, the number of steps, and other settings, refer to the CLI documentation.
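PRESCIENT models cell dynamics as a drift-diffusion process in which the drift is the negative gradient of a learned potential, and simulation steps cells forward with an Euler-Maruyama scheme. The NumPy sketch below illustrates that idea under stated assumptions: a toy quadratic potential stands in for the trained network, and the function names grad_potential and simulate, the step size, the noise scale, and the data are all hypothetical rather than PRESCIENT's internals or defaults. It samples 50 starting cells with seed 2, mirroring the simulate_trajectories example.

```python
import numpy as np


def grad_potential(x, center):
    """Gradient of a toy quadratic potential 0.5 * ||x - center||^2.

    Hypothetical stand-in for the gradient of PRESCIENT's trained potential network.
    """
    return x - center


def simulate(x0, center, n_steps=100, dt=0.1, sigma=0.1, seed=2):
    """Euler-Maruyama integration of dx = -grad(Phi)(x) dt + sigma dW.

    x0 has shape (n_cells, n_dims); returns every step, shape (n_steps + 1, n_cells, n_dims).
    """
    rng = np.random.default_rng(seed)
    traj = np.empty((n_steps + 1, *x0.shape))
    traj[0] = x0
    x = x0.copy()
    for t in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Drift down the potential plus Gaussian noise scaled by sqrt(dt).
        x = x - grad_potential(x, center) * dt + sigma * np.sqrt(dt) * noise
        traj[t + 1] = x
    return traj


# Hypothetical data: 500 cells at the first timepoint in a 30-dimensional PC space.
rng = np.random.default_rng(2)
first_tp_cells = rng.normal(size=(500, 30))
# Randomly sample 50 starting cells, as in the example command above.
idx = rng.choice(len(first_tp_cells), size=50, replace=False)
sims = simulate(first_tp_cells[idx], center=np.full(30, 3.0))
print(sims.shape)  # (101, 50, 30)
```

Because the toy potential has a single minimum, all simulated cells drift toward the same point; a trained PRESCIENT potential can have a richer landscape, which is what allows branching trajectories.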
One advantage of a trained PRESCIENT model is the ability to simulate trajectories of out-of-sample
or perturbed initial cells. To do this, individual genes or sets of genes are perturbed by setting their values to a chosen z-score in scaled
expression space. The perturbation_analysis command induces these perturbations and simulates trajectories of both unperturbed and perturbed cells
for comparison.
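The z-score perturbation can be sketched with NumPy as follows. This illustrates the idea, not PRESCIENT's implementation: it assumes genes are standardized to z-scores and projected onto principal components fit on the full dataset, and the helper names fit_scaler_pca and perturb, the synthetic data, and the choice of 10 PCs are hypothetical. It draws 10 random samples of 200 cells and sets GENE1, GENE2, and GENE3 to a z-score of 5; in PRESCIENT, the resulting PC coordinates would then be simulated forward by the trained model.

```python
import numpy as np


def fit_scaler_pca(expr, n_pcs):
    """Fit per-gene z-scoring and a PCA basis on the full (n_cells, n_genes) matrix."""
    mean = expr.mean(axis=0)
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes unscaled instead of dividing by zero
    _, _, vt = np.linalg.svd((expr - mean) / std, full_matrices=False)
    return mean, std, vt[:n_pcs].T  # components: (n_genes, n_pcs)


def perturb(cells, genes, targets, z, mean, std, components):
    """Return (unperturbed, perturbed) PC coordinates for a batch of cells.

    The perturbation overwrites each target gene with the value z in scaled space.
    """
    scaled = (cells - mean) / std
    perturbed = scaled.copy()
    perturbed[:, [genes.index(g) for g in targets]] = z
    return scaled @ components, perturbed @ components


# Hypothetical data: 1000 cells x 50 genes named GENE1..GENE50.
rng = np.random.default_rng(2)
genes = [f"GENE{i}" for i in range(1, 51)]
expr = rng.lognormal(size=(1000, 50))
mean, std, components = fit_scaler_pca(expr, n_pcs=10)

# 10 random samples of 200 cells, with GENE1, GENE2, and GENE3 set to a z-score of 5.
samples = []
for _ in range(10):
    idx = rng.choice(len(expr), size=200, replace=False)
    samples.append(perturb(expr[idx], genes, ["GENE1", "GENE2", "GENE3"], 5.0,
                           mean, std, components))
```

Fitting the scaler and PCA once on all cells keeps every sample, perturbed or not, in the same coordinate system, so differences between the two simulations reflect the perturbation rather than a change of basis.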
In the following example, GENE1, GENE2, and GENE3 are set to a z-score of 5 in 10 random samples of 200 cells, which are then simulated forward to the final timepoint with a trained PRESCIENT model:
prescient perturbation_analysis -i /path/to/data.pt -p 'GENE1,GENE2,GENE3' -z 5 --model_path /path/to/trained/model_directory --seed 2 -o /path/to/output_dir
This will produce a PRESCIENT simulation object containing the following:
- "perturbed_genes": list of genes perturbed
- "unperturbed_sim": PC coordinates of unperturbed simulated trajectory
- "perturbed_sim": PC coordinates of perturbed simulated trajectory
For more control over cell selection, the number of steps, and other settings, refer to the CLI documentation.
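Once the "unperturbed_sim" and "perturbed_sim" entries are loaded as arrays, one simple way to compare them is to track the distance between their mean cells at each simulation step. The sketch below assumes, hypothetically, that each entry has shape (n_steps, n_cells, n_pcs) for a single sample; check the actual shapes in your output before using it. The function name perturbation_effect is illustrative and not part of PRESCIENT.

```python
import numpy as np


def perturbation_effect(unperturbed_sim, perturbed_sim):
    """Distance between the mean unperturbed and mean perturbed cell at each step.

    ASSUMES both inputs have shape (n_steps, n_cells, n_pcs); returns shape (n_steps,).
    """
    u = np.asarray(unperturbed_sim)
    p = np.asarray(perturbed_sim)
    if u.shape != p.shape:
        raise ValueError(f"shape mismatch: {u.shape} vs {p.shape}")
    centroid_gap = p.mean(axis=1) - u.mean(axis=1)  # (n_steps, n_pcs)
    return np.linalg.norm(centroid_gap, axis=1)


# Toy arrays: 3 steps, 4 cells, 2 PCs; every perturbed cell sits at (1, 1).
u = np.zeros((3, 4, 2))
p = np.ones((3, 4, 2))
print(perturbation_effect(u, p))  # three values, each sqrt(2) ≈ 1.414
```

A distance that grows over steps would suggest the perturbation pushes cells toward a different region of PC space; with several random samples, you could compute this per sample and compare the spread.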