Normalisation and dimensionality reduction of scRNAseq gene counts

normalise_dimreduce is a wrapper function around the basic Seurat data processing workflow. It normalises gene counts, performs dimensionality reduction, and derives cell clusters.

Usage

normalise_dimreduce(
  obj,
  var_explained_lim = 0.015,
  run.harmony = FALSE,
  harmony_vars = NULL,
  SCT = FALSE,
  mt.pattern = "^MT-",
  mt.percent = 10,
  features_exclude = c("^IGH[MDE]", "^IGHG[1-4]", "^IGHA[1-2]", "^IG[HKL][VDJ]",
    "^IGKC", "^IGLC[1-7]", "^TR[ABGD][CV]", "^AC233755.1", "^IGLL", "^JCHAIN"),
  ...
)

Arguments

obj: Seurat object containing unnormalised gene counts.
var_explained_lim: numeric, the minimum proportion of variance that a principal component must explain to be included in the dimensionality reduction and clustering steps (Default: 0.015, i.e. 1.5%)
run.harmony: logical, should the package Harmony be used on the data for batch correction? (Default: FALSE)
harmony_vars: vector of covariates to be included in the regression step in Harmony. Variation specific to these covariates will be removed during the Harmony run. Only used if run.harmony is TRUE. (Default: NULL)
SCT: logical, should the SCTransform pipeline be used? If not, the standard Seurat normalisation workflow is followed (NormalizeData, FindVariableFeatures, ScaleData). (Default: FALSE)
mt.pattern: the regular expression used to identify mitochondrial transcripts (Default: "^MT-", i.e. all gene names beginning with "MT-")
mt.percent: the cutoff for mitochondrial transcript percentage, above which cells will be removed from the Seurat object as part of quality control (Default: 10, i.e. cells with more than 10% of counts mapped to mitochondrial transcripts will be removed)
features_exclude: a vector of regular expressions to select genes to be IGNORED during dimensionality reduction and clustering. By default the following features are included in this list: immunoglobulin heavy and light chain (IGH, IGK, IGL) V/D/J and constant-region genes, T-cell receptor (TRA/TRB/TRG/TRD) V and C genes, AC233755.1 (which encodes a V-gene-like product), IGLL and JCHAIN.
...: Arguments to be passed to various Seurat functions (SCTransform, NormalizeData, FindVariableFeatures, ScaleData, RunPCA, RunUMAP, FindNeighbors, FindClusters)
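
To make the roles of mt.pattern, mt.percent and features_exclude more concrete, the following is a minimal base R sketch on invented data. It only mimics the arithmetic and pattern matching; it is not the package's implementation, which is described as relying on Seurat's PercentageFeatureSet. The count matrix, the gene list and the helper variable names (counts, pct_mt, keep_cells, excluded) are hypothetical, and treating a cell at exactly the cutoff as kept is an assumption.

```r
# Toy count matrix: genes in rows, cells in columns (all values invented).
counts <- matrix(
  c(50,  5, 20,   # MT-CO1
    30, 40, 25,   # CD79A
    10, 30,  5,   # IGHM
    10, 25, 50),  # IGKV1-5
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "CD79A", "IGHM", "IGKV1-5"),
                  c("cell1", "cell2", "cell3"))
)

mt.pattern <- "^MT-"
mt.percent <- 10

# Percentage of each cell's counts that come from genes matching mt.pattern.
is_mt  <- grepl(mt.pattern, rownames(counts))
pct_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Assumption: cells strictly above the cutoff are removed, the rest are kept.
keep_cells <- names(pct_mt)[pct_mt <= mt.percent]

# Default exclusion patterns, copied from the Usage section.
features_exclude <- c("^IGH[MDE]", "^IGHG[1-4]", "^IGHA[1-2]", "^IG[HKL][VDJ]",
  "^IGKC", "^IGLC[1-7]", "^TR[ABGD][CV]", "^AC233755.1", "^IGLL", "^JCHAIN")

# Hypothetical list of candidate variable features.
genes <- c("IGHM", "IGHG1", "IGKV1-5", "IGLC2", "TRBV7-2", "TRAC",
           "JCHAIN", "CD79A", "MS4A1", "TRAF1")

# A gene is excluded if it matches any one of the patterns.
is_excluded <- grepl(paste(features_exclude, collapse = "|"), genes)
excluded <- genes[is_excluded]
retained <- genes[!is_excluded]
```

In this toy example only cell2 passes the mitochondrial filter, and CD79A, MS4A1 and TRAF1 are retained. TRAF1 is a useful check: it starts with "TRA" but is not followed by C or V, so the T-cell receptor pattern does not match it.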

Value

A Seurat object containing the normalised gene count data, the dimensionality reductions and the cell cluster assignments.

Details

normalise_dimreduce is a wrapper function around the basic Seurat data processing workflow and performs the following steps, in order:

Calculation of the percentage of mitochondrial transcripts per cell (PercentageFeatureSet) and subsetting to remove cells above the cutoff given by mt.percent.
Gene count normalisation, using either SCTransform or NormalizeData
Pruning of variably expressed features. All genes whose names match any of the regular expressions given in the argument features_exclude are removed from the list of variable features, so that they do not influence the downstream dimensionality reduction and clustering steps. This is particularly relevant for avoiding clusters of B cells grouped by their isotype or VDJ gene expression.
Principal component analysis (PCA) (Seurat::RunPCA function)
Batch correction using Harmony: variation attributable to the covariates given in harmony_vars will be removed in the Harmony regression step. (Optional, only if run.harmony == TRUE)
UMAP dimensionality reduction (Seurat::RunUMAP), retaining the top principal components, each of which explains at least var_explained_lim of the variance.
k-nearest-neighbour (kNN) graph construction (Seurat::FindNeighbors)
Definition of cell clusters based on the kNN graph (Seurat::FindClusters)
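
The selection of principal components by var_explained_lim can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual code: the standard deviations are invented, the variable names (pc_sdev, var_explained, dims_use) are hypothetical, and the proportion of variance is assumed to be computed relative to the total variance of the computed components only.

```r
# Hypothetical standard deviations of the principal components, sorted in
# decreasing order as a PCA would report them (values invented).
pc_sdev <- c(6, 4, 3, 2, 1, 1, 0.5, 0.5)
var_explained_lim <- 0.015

# Proportion of variance explained by each component.
# Assumption: the denominator is the summed variance of the computed PCs.
var_explained <- pc_sdev^2 / sum(pc_sdev^2)

# Indices of the components that each explain at least var_explained_lim.
dims_use <- which(var_explained >= var_explained_lim)

# dims_use is the kind of vector that could then be supplied as the `dims`
# argument of Seurat::RunUMAP and Seurat::FindNeighbors.
```

Here the first four components are retained: the fifth explains 1 / 67.5 of the variance (about 1.48%), which falls just below the default 1.5% cutoff.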