score cells by their Class Switch Recombination (CSR) status
getCSRpotential.Rd
getCSRpotential
Scores each cell by its status in terms of class switch recombination (CSR), by considering the mapped sterile and productive IgH transcripts.
Usage
getCSRpotential(
  SeuratObj,
  ighc_count_assay_name = "IGHC",
  ighc_slot = "scale.data",
  knn_graph = TRUE,
  reference_based = NULL,
  vars.to.regress = c("nCount_RNA"),
  mode = "furthest",
  c_gene_anno_name = NULL,
  isotype_column_to_add = "isotype"
)
Arguments
- SeuratObj
Seurat Object
- ighc_count_assay_name
name of the assay in SeuratObj which holds the IgH productive/sterile transcript count data. (Default: "IGHC")
- ighc_slot
the slot in slot(SeuratObj, "assays")[[ighc_count_assay_name]] to be used to access productive/sterile transcript counts. (Default: "scale.data")
- knn_graph
should the k-nearest neighbour (kNN) graph calculated on the gene expression assay be used to impute the annotation of productive transcripts for cells where no such transcripts are found across all isotypes? If TRUE, majority voting on the direct neighbours of the cell in the kNN graph will be used to impute. Otherwise, the cell will be assumed to express the IgM productive transcript. Expects TRUE or FALSE, or an igraph object containing a kNN graph (in which case this graph will be used for majority-voting imputation). (Default: TRUE)
- reference_based
indicates the species. The function will use a naive isotype signature (sterile/productive gene counts) trained on a reference B cell atlas for the given species. For now, either 'human' or 'mouse' is accepted. If NULL, the function calculates CSR potential by taking the Euclidean norm of (representative_p, total_s) (see Details). (Default: NULL)
- vars.to.regress
list of variables to be regressed out when scaling the sterile/productive count matrix, if ighc_slot is given as scale.data but it has not been populated. (Default: "nCount_RNA", i.e. per-cell library size)
- mode
(Only applicable if c_gene_anno_name is NULL.) Interpretation of the isotype expressed by the cell. Either "furthest" (i.e. the isotype furthest along the IGH locus with non-zero expression of productive transcript will be taken as the isotype representative of the cell) or "highest" (the isotype with the highest expression of productive transcript). (Default: "furthest")
- c_gene_anno_name
If not NULL, this column from the Seurat object meta.data will be used to indicate representative_p in calculating the CSR potential score, in lieu of the productive transcript counts in the IGHC assay. (Default: NULL)
- isotype_column_to_add
name of column to be added to the SeuratObj meta.data to indicate the isotype of the cell. Used for subsequent grouping of cells in calculating transitions.
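The two settings of mode can be illustrated with a small base R sketch. It is not the package's implementation: the count matrix, the isotype order (M, G3, G1, A1) and the helper names pick_furthest and pick_highest are hypothetical, chosen only to show how the two interpretations can disagree for the same cell.

```r
# Hypothetical productive-transcript counts for three cells (rows) across
# isotypes (columns). Columns are assumed to be ordered by position along
# the IGH locus; both the order and the counts are illustrative only.
isotypes <- c("M", "G3", "G1", "A1")
prod_counts <- matrix(
  c(5, 0, 2, 0,   # cell1
    3, 0, 0, 0,   # cell2
    1, 4, 0, 2),  # cell3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell1", "cell2", "cell3"), isotypes)
)

# "furthest": the last isotype along the locus with a non-zero count
pick_furthest <- function(x) max(which(x > 0))
# "highest": the isotype with the largest count
pick_highest <- function(x) which.max(x)

furthest <- isotypes[apply(prod_counts, 1, pick_furthest)]
highest  <- isotypes[apply(prod_counts, 1, pick_highest)]

print(data.frame(cell = rownames(prod_counts), furthest, highest))
```

In this toy input, cell1 and cell3 are assigned different isotypes under the two modes, while cell2 (IgM only) is assigned M under both.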
Value
Seurat object with the following columns added to the meta.data slot:
- representative_p: an integer indicating the productive isotype for each cell (in human: 0 = IgM, 1 = IgG3 ... )
- total_s: amount of sterile IgH molecules for each cell, calculated from the given ighc_slot of the IGHC assay.
- csr_pot: CSR potential. The method of calculation depends on the argument reference_based (see Details).
- isotype_column_to_add: isotype labelled as M, G3, etc. (added only when c_gene_anno_name is NULL and the ighc_count_assay_name assay is used to calculate CSR potential).
Details
getCSRpotential calculates a "CSR potential" score which ranks the cells in the given Seurat object by their status in the CSR process. One approach (selected by setting reference_based to a species) is to estimate the contribution (weight) of a 'naive' isotype signature for each cell, given its sterile/productive expression profile.
The CSR potential will then be 1 - (naive signature weight). This method is available for human or mouse, for which isotype signatures were trained on reference B cell atlas data. Alternatively, CSR potential can be calculated empirically (by setting reference_based = NULL), given by the Euclidean norm of (representative_p, total_s), i.e. \( \sqrt{ \text{representative_p}^2 + \text{total_s}^2 } \), where
- representative_p: the productive isotype for each cell (in human this will be 0 = IgM, 1 = IgG3 ... ), and
- total_s: amount of sterile IgH molecules for each cell.
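The empirical calculation described above can be sketched in base R as follows. This is a minimal illustration of the Euclidean norm only, with made-up values standing in for the representative_p and total_s meta.data columns. Any additional scaling or normalisation the function may apply is not shown here.

```r
# Hypothetical per-cell values; in practice these would come from the
# representative_p and total_s columns of the Seurat object meta.data.
meta <- data.frame(
  representative_p = c(0, 1, 3),  # 0 = IgM, 1 = IgG3, ... (human)
  total_s          = c(4, 0, 4),  # illustrative sterile transcript levels
  row.names = c("cell1", "cell2", "cell3")
)

# Euclidean norm of (representative_p, total_s) for each cell
meta$csr_pot <- sqrt(meta$representative_p^2 + meta$total_s^2)

print(meta)
```

Under this formula, a cell scores higher either by expressing a productive isotype further along the locus or by carrying more sterile transcripts.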
For total_s, the default is to use the scale.data slot, which already normalises the IGHC counts by library size. If this doesn't exist, the function will calculate it while regressing out the library size.
For representative_p, users can either use a specified column in the Seurat object meta.data which indicates the isotype of the cell or, if this is not provided, use the productive reads counted using the productive/sterile quantification workflow implemented in this package (see the argument mode of this function).