1.4 Preparing a Mapping File
Mapping File (tab-delimited .txt): The mapping file is created by the user (e.g. My_Project.txt) and contains all of the sample information needed for the data analysis.
At a minimum, the mapping file must follow these rules:
- The first column header must be “#SampleID”.
- The second column header must be “BarcodeSequence”. Cells may be left empty if barcodes are not available.
- The third column header must be “LinkerPrimerSequence”.
- All subsequent column headers (except the last one) are metadata headers. For example, a “Smoker” column would contain either “Yes” or “No”. The data in each column are assumed to be categorical unless specified otherwise, and each categorical column must contain at least 2 unique values. For missing data, write “NA”; do not leave cells blank.
- The last column of the mapping file must be named “Description”. This column holds information unique to each sample, such as the medications taken by the patient or any other descriptive information.
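Before running the official validator, the rules above can be checked with a small script. The following is a minimal sketch, not part of QIIME: `check_mapping` is a hypothetical helper that reads tab-delimited text and reports header-order problems, rows with the wrong number of fields, blank metadata cells, and metadata columns with fewer than 2 unique values. It assumes every metadata column is categorical and, as a further assumption, does not count “NA” toward the 2-value minimum.

```python
import csv
import io

# Header positions fixed by the mapping-file rules.
LEADING_HEADERS = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
LAST_HEADER = "Description"


def check_mapping(text):
    """Return a list of rule violations in tab-delimited mapping-file text.

    Hypothetical helper, not part of QIIME; it checks only the listed rules.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or not rows[0]:
        return ["missing header line"]
    header = rows[0]
    problems = []

    if header[:3] != LEADING_HEADERS:
        problems.append("first three headers must be " + ", ".join(LEADING_HEADERS))
    if header[-1] != LAST_HEADER:
        problems.append("last header must be Description")

    # Metadata columns lie between LinkerPrimerSequence and Description.
    metadata_cols = range(3, len(header) - 1)
    complete_rows = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        complete_rows.append(row)
        for i in metadata_cols:
            if not row[i].strip():
                problems.append(f"line {line_no}: blank {header[i]} value (write NA)")

    # Every metadata column is treated as categorical here.
    # Assumption: NA is not counted as one of the required unique values.
    for i in metadata_cols:
        values = {row[i].strip() for row in complete_rows} - {"", "NA"}
        if len(values) < 2:
            problems.append(f"column {header[i]}: fewer than 2 unique values")
    return problems
```

An empty list means none of these rules were violated; it does not replace validate_mapping_file.py, which also checks barcodes and primers.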
Example 1:
#SampleID | BarcodeSequence | LinkerPrimerSequence | ReversePrimer | region | Visit | Patient | Description |
---|---|---|---|---|---|---|---|
101V2 | TGATACGTCT | agagtttgatcmtggctcag | gcwgcctcccgtaggagt | V1V2 | V2 | 101V2 | No_treatment |
Example 2:
#SampleID | BarcodeSequence | LinkerPrimerSequence | InputFileName | Description |
---|---|---|---|---|
EB10 |  |  | EB10.fasta | Horse10 |
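The examples are displayed with “|” separators for readability, but in the file itself the columns are separated by tab characters. As a minimal sketch, the Example 1 header and row can be written to disk with Python's standard csv module; `write_mapping` is a hypothetical helper name used only for illustration.

```python
import csv

# Header and row copied from Example 1; the "|" separators shown in the
# table become tab characters in the actual file.
HEADER = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "ReversePrimer",
          "region", "Visit", "Patient", "Description"]
ROWS = [["101V2", "TGATACGTCT", "agagtttgatcmtggctcag", "gcwgcctcccgtaggagt",
         "V1V2", "V2", "101V2", "No_treatment"]]


def write_mapping(path, header, rows):
    """Write a tab-delimited mapping file: header line, then one sample per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


write_mapping("My_Project.txt", HEADER, ROWS)
```

For a file without barcodes or primers, pass empty strings for those cells; the headers themselves must stay in place.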
Check the mapping file for errors with validate_mapping_file.py. The command writes an interactive HTML file that displays any errors found. Mapping files without barcodes and/or primers can also be validated by adding the -b and/or -p options, as in the command below. The validator will still report a warning for such a file: without barcodes there is no way to tell the sequences apart, so the mapping file cannot be used for demultiplexing. This warning can be ignored if the mapping file is used only for steps downstream of demultiplexing.
validate_mapping_file.py -m map.txt -o validate_map -p -b
-m, --mapping_fp: Metadata mapping file path.
-o, --output_dir: Required output directory for the log file, corrected mapping file, and HTML file.
-b, --not_barcoded: Use -b if barcodes are not present. The BarcodeSequence header is still required. [default: False]
-p, --disable_primer_check: Use -p to disable the primer checks. The LinkerPrimerSequence header is still required. [default: False]
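When several mapping files need validating, the command can be assembled from a script. This is a minimal sketch under stated assumptions: `validation_command` and `run_validation` are hypothetical helpers that use only the -m, -o, -b and -p options documented above, and running the validator requires a QIIME 1 installation with validate_mapping_file.py on the PATH.

```python
import shlex
import subprocess


def validation_command(mapping_fp, output_dir, has_barcodes=True, has_primers=True):
    """Build the validate_mapping_file.py argument list (hypothetical helper)."""
    cmd = ["validate_mapping_file.py", "-m", mapping_fp, "-o", output_dir]
    if not has_primers:
        cmd.append("-p")  # --disable_primer_check; LinkerPrimerSequence header still required
    if not has_barcodes:
        cmd.append("-b")  # --not_barcoded; BarcodeSequence header still required
    return cmd


def run_validation(mapping_fp, output_dir, **options):
    """Run the validator; assumes QIIME 1 is installed and on the PATH."""
    cmd = validation_command(mapping_fp, output_dir, **options)
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(cmd, check=True)
```

For a file with neither barcodes nor primers, `validation_command("map.txt", "validate_map", has_barcodes=False, has_primers=False)` reproduces the command shown above.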