Score cells by their class switch recombination (CSR) status

getCSRpotential scores each cell by its status in terms of class switch recombination (CSR), by considering the mapped sterile and productive IgH transcripts.

Usage

getCSRpotential(
  SeuratObj,
  ighc_count_assay_name = "IGHC",
  ighc_slot = "scale.data",
  knn_graph = TRUE,
  reference_based = NULL,
  vars.to.regress = c("nCount_RNA"),
  mode = "furthest",
  c_gene_anno_name = NULL,
  isotype_column_to_add = "isotype"
)

Arguments

SeuratObj: Seurat Object
ighc_count_assay_name: name of assay in SeuratObj which holds the IgH productive/sterile transcript count data. (Default: "IGHC")
ighc_slot: the slot in slot(SeuratObj, "assays")[[ighc_count_assay_name]] to be used to access productive/sterile transcript counts (Default: "scale.data")
knn_graph: should the k-nearest neighbour (kNN) graph calculated on the gene expression assay be used to impute the annotation of productive transcripts for cells where no such transcripts are found across all isotypes? If TRUE, majority voting on the direct neighbours of the cell in the kNN graph will be used to impute. If FALSE, the cell will be assumed to express the IgM productive transcript. Expects TRUE or FALSE, or an igraph object containing a kNN graph (in which case this graph will be used for majority voting imputation). (Default: TRUE)
reference_based: the species to use. The function will use a naive isotype signature (sterile/productive gene counts) trained on a reference B cell atlas for the given species. For now either 'human' or 'mouse' is accepted. If NULL, the function calculates CSR potential by taking the Euclidean norm of (representative_p, total_s) (see Details). (Default: NULL)
vars.to.regress: list of variables to be regressed out when scaling the sterile/productive count matrix, if ighc_slot is given as scale.data but it has not been populated. (Default: "nCount_RNA", i.e. per-cell library size)
mode: (Only applicable if c_gene_anno_name is NULL.) Interpretation of the isotype expressed by the cell. Either "furthest" (i.e. the isotype furthest along the IGH locus with non-zero expression of productive transcript will be taken as the isotype representative of the cell) or "highest" (the isotype with highest expression). (Default: "furthest")
c_gene_anno_name: If not NULL, this column from the Seurat object meta.data will be used to indicate representative_p in calculating the CSR potential score, in lieu of the productive transcript counts in the IGHC assay (Default: NULL)
isotype_column_to_add: name of column to be added to the SeuratObj meta.data to indicate the isotype of the cell. Used for subsequent grouping of cells in calculating transitions.
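
The two values of mode and the majority-vote imputation described for knn_graph can be illustrated with a minimal base R sketch. This is not the package's implementation: pick_isotype and impute_by_vote are hypothetical helpers, the counts are invented, and the isotype order beyond IgM and IgG3 is an assumption (human IGH locus order with IgD omitted). Tie-breaking here simply takes the first maximum, which may differ from what getCSRpotential does.

```r
# Assumed human isotype order along the IGH locus (IgD omitted). Only
# 0 = IgM and 1 = IgG3 are stated in this documentation; the rest is assumed.
isotype_order <- c("M", "G3", "G1", "A1", "G2", "G4", "E", "A2")

# Hypothetical helper: choose the representative isotype of one cell from
# its named vector of productive transcript counts.
pick_isotype <- function(productive, mode = c("furthest", "highest")) {
  mode <- match.arg(mode)
  expressed <- which(productive > 0)
  if (length(expressed) == 0) return(NA_character_)  # no productive transcript
  idx <- if (mode == "furthest") max(expressed) else which.max(productive)
  names(productive)[idx]
}

# Hypothetical helper: majority vote over the isotypes of a cell's direct
# neighbours. table() drops NA neighbours; ties go to the first maximum.
impute_by_vote <- function(neighbour_isotypes) {
  votes <- table(neighbour_isotypes)
  if (length(votes) == 0) return(NA_character_)
  names(votes)[which.max(votes)]
}

# Toy cell: most productive reads map to IgM, a few to IgG1.
productive <- setNames(c(5, 0, 2, 0, 0, 0, 0, 0), isotype_order)
furthest_call <- pick_isotype(productive, "furthest")  # "G1"
highest_call  <- pick_isotype(productive, "highest")   # "M"

# Toy neighbourhood of a cell with no productive transcripts detected.
imputed_call <- impute_by_vote(c("M", "G1", "G1", NA, "M", "G1"))  # "G1"
```

In this toy cell, "furthest" reports G1 because it is the most downstream isotype with non-zero productive counts, whereas "highest" reports M because it has the most reads.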

Value

Seurat object with the following columns added to the meta.data slot:

representative_p: an integer indicating the productive isotype for each cell (in human: 0 = IgM, 1 = IgG3 ... )
total_s: amount of sterile IgH molecules for each cell, calculated from the given ighc_slot of the IGHC assay.
csr_pot: CSR potential. Depending on the argument reference_based the method of calculation will be different (see Details).
isotype_column_to_add: isotype labelled as M, G3, etc. (added only when c_gene_anno_name is NULL and the ighc_count_assay_name assay is used to calculate CSR potential).

Details

getCSRpotential calculates a "CSR potential" score which ranks the cells in the given Seurat object by their status in the CSR process. When reference_based is set to a species, the score is calculated by estimating the contribution (weight) of a 'naive' isotype signature for each cell, given its sterile/productive expression profile; the CSR potential is then 1 - (naive signature weight). This method is available for human and mouse, for which isotype signatures were trained on reference B cell atlas data. Alternatively, CSR potential can be calculated empirically (by setting reference_based = NULL) as the Euclidean norm of (representative_p, total_s), i.e. \( \sqrt{\text{representative\_p}^2 + \text{total\_s}^2} \), where

representative_p: the productive isotype for each cell (in human this will be 0 = IgM, 1 = IgG3 ... ), and
total_s: amount of sterile IgH molecules for each cell.

For total_s, the default is to use the scale.data slot, which already normalises the IGHC counts by library size. If this slot has not been populated, the function will calculate it while regressing out the library size. For representative_p, users can either use a specified column in the Seurat object meta.data which indicates the isotype of the cell or, if none is provided, use the productive reads counted using the productive/sterile quantification workflow implemented in this package (see the argument mode of this function).
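
As a worked illustration of the two scoring rules described above, the following base R sketch computes both on toy numbers. All values are invented for illustration, and naive_weight stands in for whatever weight the reference-based method would estimate; how that weight is fitted is not shown here.

```r
# Toy per-cell values, invented for illustration only.
cells <- data.frame(
  representative_p = c(0, 1, 3),       # 0 = IgM, 1 = IgG3, ...
  total_s          = c(0.5, 2.0, 4.0)  # sterile transcript level per cell
)

# Empirical score (reference_based = NULL):
# Euclidean norm of (representative_p, total_s).
cells$csr_pot_empirical <- sqrt(cells$representative_p^2 + cells$total_s^2)

# Reference-based score: 1 - (naive signature weight).
# naive_weight is a hypothetical, already-estimated weight per cell.
naive_weight <- c(0.9, 0.4, 0.05)
cells$csr_pot_reference <- 1 - naive_weight

print(cells)
```

With these numbers the empirical scores are 0.5, about 2.236 and 5, and the reference-based scores are 0.1, 0.6 and 0.95; in both cases a higher value marks a cell further along the CSR process.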