Normalisation and dimensionality reduction of scRNAseq gene counts
normalise_dimreduce is a wrapper around the basic Seurat data processing workflow. It normalises gene counts, performs dimensionality reduction, and derives cell clusters.
Usage
normalise_dimreduce(
  obj,
  var_explained_lim = 0.015,
  run.harmony = FALSE,
  harmony_vars = NULL,
  SCT = FALSE,
  mt.pattern = "^MT-",
  mt.percent = 10,
  features_exclude = c("^IGH[MDE]", "^IGHG[1-4]", "^IGHA[1-2]", "^IG[HKL][VDJ]", "^IGKC",
    "^IGLC[1-7]", "^TR[ABGD][CV]", "^AC233755.1", "^IGLL", "^JCHAIN"),
  ...
)
Arguments
- obj
Seurat object containing unnormalised gene counts.
- var_explained_lim
numeric, the minimum proportion of variance a principal component must explain to be included in the dimensionality reduction and clustering steps (Default: 0.015, i.e. 1.5%)
- run.harmony
should the package Harmony be used on the data? (Default: FALSE)
- harmony_vars
vector of covariates to be included in the regression step in Harmony. Variation specific to these covariates will be removed during the Harmony run. (Default: NULL)
- SCT
should the SCTransform pipeline be used? If not, the standard Seurat normalisation workflow (NormalizeData, FindVariableFeatures, ScaleData) is followed. (Default: FALSE)
- mt.pattern
the regular expression used to identify mitochondrial transcripts (Default: "^MT-", i.e. all gene names beginning with "MT-")
- mt.percent
the cutoff for mitochondrial transcript percentage, above which cells will be removed from the Seurat object as part of quality control (Default: 10, i.e. cells with more than 10% of counts mapped to mitochondrial transcripts will be removed)
- features_exclude
a vector of regular expressions selecting genes to be IGNORED during dimensionality reduction and clustering. By default the list covers: immunoglobulin heavy, kappa and lambda chain V/D/J and constant-region genes, T-cell receptor (TRA/TRB/TRG/TRD) V and C genes, AC233755.1 (which encodes a V-gene-like product), IGLL, and JCHAIN.
- ...
Arguments to be passed to various Seurat functions (SCTransform, NormalizeData, FindVariableFeatures, ScaleData, RunPCA, RunUMAP, FindNeighbors, FindClusters)
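To make the mt.pattern and features_exclude arguments more concrete, the following minimal sketch shows which gene names the default patterns would match. It uses base R only. The gene names are hypothetical examples chosen for illustration, and combining the patterns into a single alternation is an assumption about how the matching could be done, not necessarily how normalise_dimreduce implements it.

```r
# Hypothetical gene names, chosen only to illustrate the patterns
genes <- c("MT-CO1", "MT-ND1", "IGHM", "IGHG1", "IGHA2", "IGKV1-5",
           "IGKC", "IGLC2", "TRAV1-1", "TRBC1", "JCHAIN",
           "CD19", "MS4A1", "GAPDH")

# Defaults copied from the Usage section
mt.pattern <- "^MT-"
features_exclude <- c("^IGH[MDE]", "^IGHG[1-4]", "^IGHA[1-2]", "^IG[HKL][VDJ]",
                      "^IGKC", "^IGLC[1-7]", "^TR[ABGD][CV]", "^AC233755.1",
                      "^IGLL", "^JCHAIN")

# Genes that would be counted as mitochondrial transcripts
mito_genes <- grep(mt.pattern, genes, value = TRUE)

# Assumption: a gene is excluded if it matches ANY of the patterns,
# expressed here by joining them into one alternation
exclude_regex <- paste(features_exclude, collapse = "|")
excluded <- grepl(exclude_regex, genes)

# Genes that would remain eligible as variable features
kept <- genes[!excluded]

print(mito_genes)
print(kept)
```

Note that the mitochondrial genes are not matched by features_exclude: mt.pattern is used for the quality-control cutoff, whereas features_exclude only affects which genes feed into dimensionality reduction and clustering.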
Details
normalise_dimreduce is a wrapper around the basic Seurat data processing workflow and performs the following steps:
- Calculation of % mitochondrial transcripts (PercentageFeatureSet) and subsetting to remove cells above the cutoff given by mt.percent.
- Gene count normalisation, using either SCTransform or NormalizeData.
- Pruning of variably expressed features. All genes with names matching any of the regular expressions given in the argument features_exclude are removed from the variable feature list so that they do not influence the downstream dimensionality reduction and clustering steps. This is particularly relevant for avoiding clusters of B cells grouped by their isotypes/VDJ expression.
- Principal component analysis (PCA) (Seurat::RunPCA).
- Batch correction using Harmony (optional, if run.harmony == TRUE): covariates given in harmony_vars are removed in the Harmony regression step.
- UMAP dimensionality reduction (Seurat::RunUMAP), retaining the top principal components, each of which explains at least var_explained_lim of the variance.
- k-nearest-neighbour (kNN) graph construction (Seurat::FindNeighbors).
- Definition of cell clusters based on the kNN graph (Seurat::FindClusters).
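The role of var_explained_lim in selecting principal components can be sketched as follows. This is a minimal base R illustration under stated assumptions: the standard deviations are made-up values (in a Seurat workflow they would typically be read from the PCA reduction, e.g. with Stdev(obj, reduction = "pca")), and the proportion of variance is assumed to be computed relative to the total variance of the computed components. The function's actual calculation may differ.

```r
# Hypothetical per-component standard deviations, in decreasing order
pc_stdev <- c(10, 6, 4, 3, 2, 1, 1, 0.5, 0.5, 0.5)

# Assumption: proportion of variance explained is each component's
# variance divided by the summed variance of all computed components
var_explained <- pc_stdev^2 / sum(pc_stdev^2)

# Default cutoff from the Usage section (1.5%)
var_explained_lim <- 0.015

# Indices of the components that meet the cutoff; these would be
# passed on as the dims for UMAP, neighbour finding and clustering
dims_use <- which(var_explained >= var_explained_lim)

print(round(var_explained, 4))
print(dims_use)
```

With these illustrative values the first five components each explain at least 1.5% of the variance, so only those five would be retained. Raising var_explained_lim keeps fewer components; lowering it keeps more.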