Fit Transition Path Theory (TPT) on the cellrank transition models

fitTPT fits Transition Path Theory on the transition model defined using fitTransitionModel(), at a 'coarse-grained' level where transitions are considered between *groups* of cells with grouping indicated by the user.

Usage

fitTPT(
  anndata_file,
  CellrankObj,
  group.cells.by,
  source_state,
  target_state,
  conda_env = "scicsr",
  random_n = 100,
  do_pca = TRUE,
  do_neighbors = TRUE
)

Arguments

anndata_file: filename pointing to the AnnData file.
CellrankObj: the cellrank_obj entry of the output list of fitTransitionModel().
group.cells.by: character, name of the metadata column used to group cells.
source_state: character, a value in the group.cells.by column which is taken as the source state for fitting transition path theory. All cells belonging to this group are considered as the source.
target_state: character, a value in the group.cells.by column which is taken as the target state for fitting transition path theory. All cells belonging to this group are considered as the target.
conda_env: character, if not NULL this named conda environment is used to perform TPT analysis. If NULL, no conda environment is used and the program assumes the Python packages scanpy, scvelo and cellrank are installed in the local Python. (Default: "scicsr", as shown in Usage)
random_n: number of times to reshuffle transition matrix columns to derive randomised models (default: 100).
do_pca: Should principal component analysis (PCA) be re-computed on the data? (Default: TRUE)
do_neighbors: Should k-nearest neighbour (kNN) graph be re-computed on the data? (Default: TRUE)
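
As an illustration of how the arguments fit together, here is a minimal call sketch. It assumes an object named tm already holds the list returned by fitTransitionModel(); the file name "my_data.h5ad", the grouping column "isotype" and the state labels "IgM" and "IgA" are hypothetical placeholders, not values taken from this page. The guard simply skips the call when fitTPT or tm is not available in the session.

```r
# Sketch only: 'tm' is assumed to be the output list of fitTransitionModel().
# The file name, grouping column and state labels below are hypothetical.
if (exists("fitTPT") && exists("tm")) {
  tpt <- fitTPT(
    anndata_file   = "my_data.h5ad",   # hypothetical AnnData file
    CellrankObj    = tm$cellrank_obj,  # cellrank_obj entry of the output list
    group.cells.by = "isotype",        # hypothetical metadata column
    source_state   = "IgM",            # hypothetical value in that column
    target_state   = "IgA",            # hypothetical value in that column
    conda_env      = "scicsr",
    random_n       = 100
  )
  # Entries described under 'Value'
  tpt$gross_flux
  tpt$pathways
}
```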

Value

a list with these entries:

gross_flux: an n-by-n matrix (where n is the total number of states) of total fluxes estimated from one state (row) to another state (column).
pathways: a data.frame indicating the possible paths to take from source_state to target_state, and the likelihood (max: 100) to travel through each stated path.
significance: an n-by-n matrix (where n is the total number of states) of one-sided empirical probabilities that the observed gross flux is greater than the flux estimated in the randomised models.
total_gross_flux: sum of all elements of the gross_flux matrix.
total_gross_flux_reshuffled: sum of all elements of the gross_flux matrix, calculated for each randomised model (transition matrix with randomly reshuffled columns).
gross_flux_randomised: the gross_flux matrix, but from the randomised TPT models (transition matrix with randomly reshuffled columns).
mfpt: Mean First Passage Time required to travel from source_state to target_state as estimated by Transition Path Theory.
mfpt_reshuffled: Mean First Passage Time required to travel from source_state to target_state as estimated by Transition Path Theory, calculated for each randomised model (transition matrix with randomly reshuffled columns).
stationary_distribution: Equilibrium probability of each state as estimated by Transition Path Theory.
stationary_distribution_reshuffled: Equilibrium probability of each state as estimated by Transition Path Theory, calculated for each randomised model (transition matrix with randomly reshuffled columns).
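
One possible way to read an observed summary against its randomised counterpart is sketched below. The list tpt_mock is a stand-in filled with simulated numbers, not real fitTPT output, and the sketch assumes that each *_reshuffled entry is a numeric vector with one value per randomised model. The helper frac_below is hypothetical and not part of the package.

```r
# Stand-in for a fitTPT() result: simulated placeholder numbers only.
# Assumption: *_reshuffled entries are numeric vectors of length random_n.
set.seed(42)
tpt_mock <- list(
  mfpt = 12.5,
  mfpt_reshuffled = rnorm(100, mean = 20, sd = 4),
  total_gross_flux = 0.08,
  total_gross_flux_reshuffled = runif(100, min = 0, max = 0.1)
)

# Hypothetical helper: fraction of randomised models whose value
# lies below the observed value.
frac_below <- function(observed, null_values) {
  mean(null_values < observed)
}

mfpt_frac <- frac_below(tpt_mock$mfpt, tpt_mock$mfpt_reshuffled)
flux_frac <- frac_below(tpt_mock$total_gross_flux,
                        tpt_mock$total_gross_flux_reshuffled)
```

A fraction close to 1 for the total gross flux would mean the observed value exceeds nearly all randomised values; a fraction close to 0 for the MFPT would mean the observed passage time is shorter than in nearly all randomised models.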

Details

fitTPT interfaces with the Python deeptime package (reimplementing some of its routines to improve efficiency) to fit transition path theory (TPT) onto the Markov state model defined by fitTransitionModel(), which uses cellrank under the hood. With the parameter group.cells.by, the user specifies a scheme for grouping individual rows/columns of the transition matrix (for example, by cell type or by isotype). Given a likely 'source' and 'target' state indicated by the user, the function then fits TPT onto this grouped ('coarse-grained') transition matrix. The output comprises the estimated information flows ('flux') between states along the way from the source to the target, and the probability of sampling each state at equilibrium ('stationary distribution'). A random 'null background' model is fitted by randomly reshuffling the columns of the transition matrix random_n times (default: 100). These randomised fluxes help determine the significance of an observed flux, by calculating the one-sided empirical probability that the observed flux is larger than that observed in the randomised models.
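
The steps described above can be illustrated on a toy example in base R. This is a sketch under stated assumptions, not the routine used by fitTPT or deeptime: the cell-level matrix P_cell, the grouping and the helper functions coarse_grain, stationary and gross_flux are all hypothetical. It assumes that coarse-graining averages the outgoing probabilities over the cells of each group, that the gross flux follows the standard discrete-time TPT expression f_ij = pi_i * q-_i * P_ij * q+_j, and that the null model permutes the columns of the cell-level matrix before grouping.

```r
set.seed(1)

# Hypothetical cell-level transition matrix for 8 cells (each row sums to 1).
P_cell <- matrix(runif(64), nrow = 8)
P_cell <- P_cell / rowSums(P_cell)
groups <- factor(c("A", "A", "A", "B", "B", "C", "C", "C"))

# Coarse-grain: sum probabilities into each destination group, then
# average over the cells of each origin group (assumed scheme).
coarse_grain <- function(P, g) {
  G <- sapply(levels(g), function(l) as.numeric(g == l))  # cells x groups
  (t(G) %*% P %*% G) / colSums(G)
}

# Stationary distribution: left eigenvector of P for eigenvalue 1.
stationary <- function(P) {
  e <- eigen(t(P))
  v <- Re(e$vectors[, which.max(Re(e$values))])
  v / sum(v)
}

# Gross flux between states for a given source and target state name.
gross_flux <- function(P, source, target) {
  n <- nrow(P)
  src <- match(source, rownames(P))
  tgt <- match(target, rownames(P))
  inter <- setdiff(seq_len(n), c(src, tgt))
  pi_stat <- stationary(P)
  P_rev <- t(P * pi_stat) / pi_stat       # time-reversed transition matrix
  q_plus <- numeric(n);  q_plus[tgt] <- 1   # forward committor
  q_minus <- numeric(n); q_minus[src] <- 1  # backward committor
  if (length(inter) > 0) {
    I_mat <- diag(length(inter))
    q_plus[inter] <- solve(I_mat - P[inter, inter, drop = FALSE],
                           rowSums(P[inter, tgt, drop = FALSE]))
    q_minus[inter] <- solve(I_mat - P_rev[inter, inter, drop = FALSE],
                            rowSums(P_rev[inter, src, drop = FALSE]))
  }
  flux <- outer(pi_stat * q_minus, q_plus) * P
  diag(flux) <- 0
  dimnames(flux) <- dimnames(P)
  flux
}

P_group <- coarse_grain(P_cell, groups)
observed <- gross_flux(P_group, "A", "C")

# Null background: permute columns of the cell-level matrix, refit.
random_n <- 100
null_flux <- replicate(random_n, {
  P_shuffled <- P_cell[, sample(ncol(P_cell))]
  gross_flux(coarse_grain(P_shuffled, groups), "A", "C")
})

# One-sided empirical probability that the observed flux is larger
# than the flux in the randomised models, per pair of states.
significance <- rowMeans(null_flux < as.vector(observed), dims = 2)
dimnames(significance) <- dimnames(observed)
```

Permuting columns leaves every row sum at 1, so each shuffled matrix is still a valid transition matrix and can be passed through the same coarse-graining and flux steps as the observed one.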