collapseIntoMetagenes defines metagenes in a single-cell gene expression count matrix. Each metagene holds the summed counts of all genes in a group.

Usage

collapseIntoMetagenes(
  countmat,
  metagenes_definitions = c(RIBO = "^RP[LS]|^MRP[LS]", `HLA-Imaj` = "^HLA-[ABC]$",
    `HLA-Imin` = "^HLA-[EFG]$", `HLA-II` = "^HLA-D", VDJ = "^IG[HKL][VDJ][0-9]")
)

Arguments

countmat

sparse matrix of single-cell gene expression counts with gene names as row names, e.g. the output of Seurat::Read10X or equivalent.

metagenes_definitions

a named vector of regular expressions, each matched against the gene names in the row names of countmat. For each regular expression, the matched genes are summarised into one metagene named after the corresponding vector element (see Details). (Default: individual metagenes for ribosomal, HLA I-major, HLA I-minor, HLA II and Ig VDJ transcripts.)

Value

a sparse matrix of count data in which all matched genes are collapsed into metagenes. Each metagene is named after the corresponding element of metagenes_definitions.

Details

collapseIntoMetagenes groups transcript counts into metagenes to remove the effect of, e.g., individual variation that leads to preferential expression of specific genes. One example is the immunoglobulin VDJ genes, whose expression is specific to each B cell and indicates clonotype rather than cell state. Summing all individual VDJ genes into one metagene prevents those genes from influencing downstream dimensionality reduction and clustering results.
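The summing described above can be illustrated on a toy matrix. The following is a minimal sketch only, not the package's implementation: collapse_sketch is a hypothetical helper, the toy counts are made up for illustration, and the choices to drop the matched rows and append each metagene as the last row are assumptions. It relies only on the Matrix package.

```r
library(Matrix)

# Hypothetical helper (an assumption, not the package's code): for each named
# regular expression, sum the matching gene rows per cell, drop those rows,
# and append the sum as a single metagene row.
collapse_sketch <- function(countmat, definitions) {
  for (name in names(definitions)) {
    hits <- grepl(definitions[[name]], rownames(countmat))
    if (!any(hits)) next  # no gene matched this pattern
    # Per-cell sum over all matched genes
    summed <- Matrix::colSums(countmat[hits, , drop = FALSE])
    metagene <- Matrix::Matrix(
      unname(summed), nrow = 1, sparse = TRUE,
      dimnames = list(name, colnames(countmat))
    )
    countmat <- rbind(countmat[!hits, , drop = FALSE], metagene)
  }
  countmat
}

# Toy counts: 4 genes (rows) x 3 cells (columns), values invented for the example
toy <- Matrix::Matrix(
  c(1, 0, 2,
    0, 3, 1,
    4, 0, 0,
    5, 1, 0),
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(
    c("IGHV1-2", "IGKV3-20", "CD19", "RPL13"),
    c("cell1", "cell2", "cell3")
  )
)

collapsed <- collapse_sketch(
  toy,
  c(RIBO = "^RP[LS]|^MRP[LS]", VDJ = "^IG[HKL][VDJ][0-9]")
)
collapsed
```

In this sketch the two immunoglobulin genes end up in a single VDJ row and RPL13 in a RIBO row, while CD19 is left untouched. The total count per cell is unchanged, since counts are only moved between rows.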