Parse the substring of a given cell identifier that corresponds to the nucleotide barcode
guessBarcodes.Rd
guessBarcodes
Parses given cell identifiers to identify the substring which corresponds to the nucleotide barcode included in the experiment.
Arguments
- cell_name
character, a cell identifier, typically with a prefix and/or suffix (e.g. "ACTGATGCAT-1", "SampleA_ATGAACCTATGG")
- min_barcode_length
integer, minimum length of the nucleotide barcode (Default: 6)
Value
a vector with the input cell_name decomposed into these three entries:
- prefix
prefix which exists in the input cell_name (NA if none exists in cell_name)
- cell_name
the actual nucleotide barcode
- suffix
suffix which exists in the input cell_name (NA if none exists in cell_name)
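The decomposition into the three entries listed under Value can be illustrated with a minimal sketch. This is not the implementation of guessBarcodes; the helper name split_cell_name is hypothetical, and the sketch assumes the barcode is the first run of A/C/G/T/N characters that is at least min_barcode_length long.

```r
# Hypothetical sketch, NOT the package's implementation of guessBarcodes().
# Assumption: the barcode is the first run of A/C/G/T/N characters that is
# at least min_barcode_length long; everything before it is the prefix and
# everything after it is the suffix.
split_cell_name <- function(cell_name, min_barcode_length = 6L) {
  pattern <- sprintf("[ACGTN]{%d,}", as.integer(min_barcode_length))
  hit <- regexpr(pattern, cell_name)

  # No nucleotide run of sufficient length: nothing to decompose.
  if (hit == -1L) {
    return(c(prefix = NA_character_,
             cell_name = NA_character_,
             suffix = NA_character_))
  }

  start <- as.integer(hit)
  end <- start + attr(hit, "match.length") - 1L
  prefix <- substr(cell_name, 1L, start - 1L)
  suffix <- substr(cell_name, end + 1L, nchar(cell_name))

  # Empty prefix/suffix are reported as NA, mirroring the Value section.
  c(prefix = if (nzchar(prefix)) prefix else NA_character_,
    cell_name = substr(cell_name, start, end),
    suffix = if (nzchar(suffix)) suffix else NA_character_)
}

split_cell_name("ACTGATGCAT-1")
split_cell_name("SampleA_ATGAACCTATGG")
```

Under this assumed rule, a sample prefix that itself consists only of nucleotide letters (e.g. "CATTAG_") would be mistaken for the barcode, so a real implementation would likely need additional heuristics.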
Details
Numeric or string prefixes/suffixes are typically added to cell identifiers to avoid wrong mapping across samples; however, they often create issues when trying to merge data on the *same* sample that was annotated using different workflows. This function attempts to resolve such issues by extracting the nucleotide barcodes actually introduced in the experiment.
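As an illustration of the merging problem described above, the following sketch matches identifiers of the same cells as named by two workflows. The identifiers and the helper extract_barcode are hypothetical, and the extraction rule (first run of at least 6 nucleotide letters) is an assumption rather than the behaviour of guessBarcodes.

```r
# Hypothetical identifiers for the same two cells, named by two workflows.
ids_a <- c("ACTGATGCAT-1", "TTGACCGATA-1")
ids_b <- c("SampleA_TTGACCGATA", "SampleA_ACTGATGCAT")

# Assumed rule (not the package's implementation): the barcode is the
# first run of at least 6 nucleotide letters in each identifier.
extract_barcode <- function(x) {
  hit <- regexpr("[ACGTN]{6,}", x)
  out <- rep(NA_character_, length(x))
  out[hit != -1L] <- regmatches(x, hit)  # fill only identifiers with a match
  out
}

# Align workflow B to workflow A via the shared barcodes.
idx <- match(extract_barcode(ids_a), extract_barcode(ids_b))
data.frame(workflow_a = ids_a, workflow_b = ids_b[idx])
```

Matching on the extracted barcode rather than on the full identifier is what makes the two naming schemes comparable; it is only safe within a single sample, since the same barcode can recur across samples.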