Common Workflow Language (CWL) execution engines accept input files in one of two formats, JSON or YAML. Use one of these formats when generating the input file. For more information, see: http://www.commonwl.org/user_guide/yaml/
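
As an illustration of generating such an input file, the following minimal Python sketch builds a small set of inputs and serializes them as JSON. The argument names are taken from this section; every value (sample name, paths, date) is a hypothetical placeholder, and the `class: File` form for `reference_sequence` assumes the workflow declares that input as a CWL File.

```python
import json
from datetime import date


def build_inputs():
    """Return a dict of workflow inputs; all values are made-up placeholders."""
    return {
        "sample": "sample_01",
        "library": "library_01",
        "platform": "ILLUMINA",
        "platform-unit": "run_barcode_01",
        # run-date is described as an ISO 8601 date.
        "run-date": date(2020, 1, 15).isoformat(),
        "validation_stringency": "LENIENT",
        # Assumption: reference_sequence is declared as a CWL File input.
        "reference_sequence": {"class": "File", "path": "/path/to/reference.fasta"},
    }


def write_inputs(inputs, path):
    """Write the inputs as JSON, one of the two formats CWL engines accept."""
    with open(path, "w") as handle:
        json.dump(inputs, handle, indent=2)
```

Calling `write_inputs(build_inputs(), "inputs.json")` would produce a file that can be passed to the CWL runner alongside the workflow. A complete input file needs the remaining arguments described in this section as well.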

| Argument Name | Summary | Default Value |
| --- | --- | --- |
|  | The sequencing center from which the data originated. |  |
| sample | The name of the sequenced sample. |  |
| run-date | Date the run was produced, to insert into the read group header (Iso8601Date). |  |
| read-group-id | Read group ID to use in the file header. |  |
| platform-unit | Read group platform unit (e.g. run barcode). |  |
| platform-model | Platform model to insert into the group header (e.g. miseq, hiseq2500, hiseqX). |  |
| platform | Read group platform (e.g. ILLUMINA, SOLID). |  |
| library | The name/ID of the sequenced library. |  |
| description | Description of the read group. |  |
| comment | Comments to include in the output file’s header. |  |
| validation_stringency | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values: STRICT, LENIENT, or SILENT. |  |
| sort_order | GATK: The order in which the reads should be output. |  |
| create_bam_index | GATK: Generate a BAM index file when possible. |  |
| reference_sequence | Reference sequence file. Include ".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", and ".ann" as secondary files if they are not present in the same location as the ".fasta" file. |  |
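
The secondary-file suffixes listed for `reference_sequence` follow the CWL `secondaryFiles` pattern convention: a plain suffix is appended to the primary path, and each leading caret (`^`) first removes one extension. The sketch below resolves those patterns and reports which companion files are missing. The function names are hypothetical helpers for illustration, not part of the workflow.

```python
import os

# Secondary-file patterns listed for reference_sequence.
SECONDARY_PATTERNS = [".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", ".ann"]


def secondary_path(primary, pattern):
    """Apply a CWL secondaryFiles pattern to a primary file path."""
    path = primary
    # Each leading caret strips one file extension before the suffix is added.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        directory, name = os.path.split(path)
        stem, extension = os.path.splitext(name)
        if extension:
            path = os.path.join(directory, stem)
    return path + pattern


def missing_secondary_files(primary):
    """Return the expected secondary files that do not exist on disk."""
    expected = [secondary_path(primary, p) for p in SECONDARY_PATTERNS]
    return [path for path in expected if not os.path.exists(path)]
```

For a reference at `/ref/genome.fasta`, the pattern ".fai" resolves to `/ref/genome.fasta.fai`, while "^.dict" resolves to `/ref/genome.dict`.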

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_fastq_to_bam_umi-tag | Tag in which to store molecular barcodes/UMIs. |  |
| fgbio_fastq_to_bam_sort | If true, query-name sort the BAM file; otherwise, preserve input order. |  |
| fgbio_fastq_to_bam_input | FASTQ files corresponding to each sequencing read (e.g. R1, I1, etc.). Please refer to the template file to get this correct. |  |
|  | Read structures, one for each of the FASTQs. Refer to the tool for more details. |  |
| fgbio_fastq_to_bam_predicted-insert-size | Predicted median insert size, to insert into the read group header. |  |
| fgbio_fastq_to_bam_output_file_name | The output SAM or BAM file to be written. |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_merge_sam_files_output_file_name | SAM or BAM file to write the merged result to. |  |
| merge_sam_files_sort_order | Sort order of the output file. |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| unpaired_fastq_file | Unpaired FASTQ output file name. |  |
| R1_output_fastq | Read1 fastq.gz output file name. |  |
| R2_output_fastq | Read2 fastq.gz output file name. |  |
| gatk_sam_to_fastq_include_non_primary_alignments | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. |  |
| gatk_sam_to_fastq_include_non_pf_reads | Include non-PF reads from the SAM file in the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. See the GATK Dictionary for more information. |  |
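
To make the two include options concrete, the sketch below shows how they could map onto SAM flag bits. The bit values come from the SAM specification. Treating "non-primary" as secondary or supplementary, the `False` defaults, and the function name are assumptions made for illustration; this is not the tool's actual implementation.

```python
# SAM flag bits defined in the SAM specification.
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_NOT_PF = 0x200         # not passing quality controls
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def keep_record(flag, include_non_pf=False, include_non_primary=False):
    """Decide whether a record with this SAM flag would be written out."""
    is_non_pf = bool(flag & FLAG_NOT_PF)
    # Assumption: non-primary means secondary or supplementary.
    is_non_primary = bool(flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY))
    if is_non_pf and not include_non_pf:
        return False
    if is_non_primary and not include_non_primary:
        return False
    return True
```

With both options left off, a record flagged 0x200 or 0x100 is skipped, while an ordinary primary, PF record is kept.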

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fastp_unpaired1_output_file_name | For PE input, if read1 passed QC but read2 did not, read1 will be written to unpaired1. The default is to discard it. |  |
| fastp_unpaired2_output_file_name | For PE input, if read2 passed QC but read1 did not, read2 will be written to unpaired2. If --unpaired2 is the same as --unpaired1 (default mode), both unpaired reads will be written to this same file. |  |
| fastp_read1_adapter_sequence | The adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not to overlap. |  |
| fastp_read2_adapter_sequence | The adapter for read2 (PE data only). This is used if R1/R2 are found not to overlap. If not specified, it will be the same as the read1 adapter sequence. | AGATCGGAAGAGC |
| fastp_read1_output_file_name | Read1 output file name. | 1 |
| fastp_read2_output_file_name | Read2 output file name. |  |
| fastp_minimum_read_length | Reads shorter than length_required will be discarded. | 15 |
| fastp_json_output_file_name | The JSON format report file name. |  |
| fastp_html_output_file_name | The HTML format report file name. |  |
| fastp_failed_reads_output_file_name | The file in which to store reads that cannot pass the filters. |  |
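
The routing of paired-end reads described for the unpaired and failed-read outputs can be sketched as follows. This is a simplified illustration under stated assumptions: QC is reduced to a pass/fail boolean per read (fastp applies more filters than the length check shown), the `outputs` dictionary and both function names are hypothetical, and `None` stands for "discard".

```python
def passes_length(sequence, minimum_read_length=15):
    """Reads shorter than the minimum length are discarded."""
    return len(sequence) >= minimum_read_length


def route_pair(read1_passed, read2_passed, outputs):
    """Return the destinations of (read1, read2); None means discard.

    outputs maps 'read1' and 'read2' to file names and may optionally
    contain 'unpaired1', 'unpaired2' and 'failed'.
    """
    if read1_passed and read2_passed:
        return outputs["read1"], outputs["read2"]
    # A read that passed alone goes to its unpaired file, if one was given.
    # A read that failed goes to the failed-reads file, if one was given.
    dest1 = outputs.get("unpaired1") if read1_passed else outputs.get("failed")
    dest2 = outputs.get("unpaired2") if read2_passed else outputs.get("failed")
    return dest1, dest2
```

If `unpaired1` and `unpaired2` name the same file, both kinds of unpaired reads end up in that one file, matching the behaviour described for the default mode.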

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bwa_mem_Y | Force soft-clipping rather than the default hard-clipping of supplementary alignments. |  |
| bwa_mem_T | Don’t output alignments with a score lower than INT. This option only affects output. |  |
| bwa_mem_P | In paired-end mode, perform SW to rescue missing hits only, but do not try to find hits that fit a proper pair. |  |
| bwa_mem_output | Output SAM file name. |  |
| bwa_mem_M | Mark shorter split hits as secondary. |  |
| bwa_mem_K | Used to achieve deterministic alignment results (note: this is a hidden option). |  |
| bwa_number_of_threads | Number of threads. |  |
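
As a rough guide to what these inputs control, the sketch below assembles a `bwa mem` argument list from them. It assumes each `bwa_mem_*` input maps to the `bwa mem` flag of the same letter and that `bwa_number_of_threads` maps to `-t`. The function name, file names, and example values are hypothetical, and the workflow itself builds the real command.

```python
def build_bwa_mem_command(reference, fastq1, fastq2, threads=1,
                          Y=False, M=False, P=False, T=None, K=None):
    """Assemble a bwa mem argument list; the SAM output goes to stdout."""
    command = ["bwa", "mem", "-t", str(threads)]
    if Y:
        command.append("-Y")  # soft-clip supplementary alignments
    if M:
        command.append("-M")  # mark shorter split hits as secondary
    if P:
        command.append("-P")  # rescue missing hits only
    if T is not None:
        command += ["-T", str(T)]  # minimum alignment score to output
    if K is not None:
        command += ["-K", str(K)]  # hidden option for deterministic results
    command += [reference, fastq1, fastq2]
    return command
```

The returned list could be handed to `subprocess.run` with stdout redirected to the file named by `bwa_mem_output`.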

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| picard_addRG_output_file_name | Output BAM file name. |  |
| picard_addRG_sort_order | Sort order for the BAM file. |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_merge_bam_alignment_output_file_name | Output BAM file name. |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| optical_duplicate_pixel_distance | The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform. For the patterned flowcell models, 2500 is more appropriate. For other platforms and models, users should experiment to find what works best. |  |
| read_name_regex | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate, and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim. Note that without optical duplicate counts, library size estimation will be inaccurate. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done; instead, the read name is split on the colon character. For 5-element names, the 3rd, 4th, and 5th elements are assumed to be the tile, x, and y values. For 7-element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be the tile, x, and y values. |  |
| duplicate_scoring_strategy | The scoring strategy for choosing the non-duplicate among candidates. |  |
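
The colon-splitting behaviour described for the default `read_name_regex` can be sketched as below. The function name and the example read names are hypothetical, and returning `None` for other element counts is a choice made for this illustration rather than documented tool behaviour.

```python
def parse_read_name(read_name):
    """Extract (tile, x, y) from a colon-separated read name, as strings."""
    fields = read_name.split(":")
    if len(fields) == 5:
        # 5-element names: 3rd, 4th and 5th elements.
        tile, x, y = fields[2:5]
    elif len(fields) == 7:
        # 7-element names (CASAVA 1.8): 5th, 6th and 7th elements.
        tile, x, y = fields[4:7]
    else:
        return None
    return tile, x, y
```

For example, both `parse_read_name("INST:1:1101:1000:2000")` and `parse_read_name("INST:42:FLOWCELL:1:1101:1000:2000")` yield the tile `1101` with coordinates `1000` and `2000`.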