Parse the substring of a cell identifier that corresponds to the nucleotide barcode

guessBarcodes parses a given cell identifier to identify the substring that corresponds to the nucleotide barcode included in the experiment.

Usage

guessBarcodes(cell_name, min_barcode_length = 6L)

Arguments

cell_name: character, a cell identifier, typically with a prefix and/or suffix (e.g. "ACTGATGCAT-1", "SampleA_ATGAACCTATGG")
min_barcode_length: integer, minimum length of the nucleotide barcode (Default: 6)

Value

A vector with the input cell_name decomposed into these three entries:

prefix: the prefix found in the input cell_name (NA if there is none)
cell_name: the actual nucleotide barcode
suffix: the suffix found in the input cell_name (NA if there is none)

Details

Numeric or string prefixes and suffixes are typically added to cell identifiers to avoid incorrect mapping across samples. However, they often create issues when merging data from the *same* sample that was annotated using different workflows, because the identifiers no longer match. This function attempts to resolve such issues by extracting the nucleotide barcode actually introduced in the experiment.
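
The extraction described above can be illustrated with a minimal sketch in base R. This is not the implementation of guessBarcodes; the helper name guess_barcode_sketch and the matching rule (take the first run of at least min_barcode_length characters drawn from A, C, G, T, N) are assumptions made for illustration only.

```r
# Hypothetical helper for illustration; not the package's implementation.
guess_barcode_sketch <- function(cell_name, min_barcode_length = 6L) {
  stopifnot(is.character(cell_name), length(cell_name) == 1L)

  # Assumed rule: the barcode is the first run of at least
  # min_barcode_length nucleotide characters (A, C, G, T, N).
  pattern <- sprintf("[ACGTN]{%d,}", as.integer(min_barcode_length))
  hit <- regexpr(pattern, cell_name)

  # No qualifying run: every entry is NA.
  if (hit == -1L) {
    return(c(prefix = NA_character_,
             cell_name = NA_character_,
             suffix = NA_character_))
  }

  start <- as.integer(hit)
  end <- start + attr(hit, "match.length") - 1L

  # Everything before and after the matched run, separators included.
  prefix <- substr(cell_name, 1L, start - 1L)
  suffix <- substr(cell_name, end + 1L, nchar(cell_name))

  c(prefix = if (nzchar(prefix)) prefix else NA_character_,
    cell_name = substr(cell_name, start, end),
    suffix = if (nzchar(suffix)) suffix else NA_character_)
}

guess_barcode_sketch("ACTGATGCAT-1")
# prefix: NA, cell_name: "ACTGATGCAT", suffix: "-1"

guess_barcode_sketch("SampleA_ATGAACCTATGG")
# prefix: "SampleA_", cell_name: "ATGAACCTATGG", suffix: NA
```

In this sketch the separator characters ("_" or "-") stay attached to the prefix or suffix, and a prefix that itself looks like a long nucleotide run would be mistaken for the barcode. Once the barcode entry is extracted for both data sets, it can serve as a common key when merging annotations of the same sample.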