Inputs Description

Input files and parameters required to run workflow

Common Workflow Language (CWL) execution engines accept inputs files in either JSON or YAML format; use one of these when generating the inputs file. For more information, refer to: http://www.commonwl.org/user_guide/yaml/
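As a minimal sketch of what such an inputs file looks like, the fragment below maps one scalar argument and one file argument to values. The sample name and the path are hypothetical placeholders, not values shipped with the workflow.

```yaml
# Scalar parameter: the argument name mapped to a plain value.
sample: sample_01                     # hypothetical sample name

# File parameter: a CWL File object with a class and a path.
reference_sequence:
  class: File
  path: /path/to/reference.fasta      # hypothetical path

# The same content written as JSON would be:
# {"sample": "sample_01",
#  "reference_sequence": {"class": "File", "path": "/path/to/reference.fasta"}}
```

Either form carries the same information; YAML is usually easier to edit by hand, while JSON is convenient when the file is generated by a script.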

Parameters Used by Tools

Common Parameters Across Tools

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| sequencing-center | The sequencing center from which the data originated. | MSKCC |
| sample | The name of the sequenced sample. (Required) |  |
| run-date | Date the run was produced, to insert into the read group header (Iso8601Date). |  |
| read-group-id | Read group ID to use in the file header. (Required) |  |
| platform-unit | Read group platform unit (e.g. run barcode). (Required) |  |
| platform-model | Platform model to insert into the group header (e.g. miseq, hiseq2500, hiseqX). | novaseq |
| platform | Read group platform (e.g. ILLUMINA, SOLID). | ILLUMINA |
| library | The name/ID of the sequenced library. (Required) |  |
| description | Description of the read group. |  |
| comment | Comments to include in the output file’s header. |  |
| validation_stringency | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values: STRICT, LENIENT, or SILENT. | LENIENT |
| sort_order | GATK: The order in which the reads should be output. |  |
| create_bam_index | GATK: Generate a BAM index file when possible. |  |
| reference_sequence | Reference sequence file. Please include ".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", ".ann" as secondary files if they are not present in the same location as the ".fasta" file. |  |
| temporary_directory | Temporary directory to be used for all steps. |  |
| fgbio_async_io | Fgbio asynchronous execution. |  |
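A hedged sketch of how the read group arguments above might be filled in. Only MSKCC, novaseq, ILLUMINA and LENIENT come from the documented defaults; every other value is a hypothetical placeholder.

```yaml
sequencing-center: MSKCC            # documented default
sample: sample_01                   # hypothetical (required)
run-date: "2021-03-15"              # hypothetical ISO 8601 date, quoted to keep it a string
read-group-id: sample_01_rg1        # hypothetical (required)
platform-unit: FLOWCELL01.1         # hypothetical run barcode (required)
platform-model: novaseq             # documented default
platform: ILLUMINA                  # documented default
library: sample_01_lib1             # hypothetical (required)
validation_stringency: LENIENT      # documented default
```

The date is quoted because some YAML parsers convert an unquoted value of that shape into a date object rather than a string.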

Uncollapsed BAM Generation

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_fastq_to_bam_umi-tag | Tag in which to store molecular barcodes/UMIs. |  |
| fgbio_fastq_to_bam_sort | If true, query-name sort the BAM file, otherwise preserve input order. |  |
| fgbio_fastq_to_bam_input | FASTQ files corresponding to each sequencing read (e.g. R1, I1, etc.). Please refer to the template file to get this correct. |  |
| read-structures | Read structures, one for each of the FASTQs. Refer to the tool for more details. |  |
| fgbio_fastq_to_bam_predicted-insert-size | Predicted median insert size, to insert into the read group header. |  |
| fgbio_fastq_to_bam_output_file_name | The output SAM or BAM file to be written. |  |
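The FASTQ and read structure inputs are the easiest to get wrong, so here is an illustrative sketch. The nesting of fgbio_fastq_to_bam_input (one inner list per R1/R2 pair), the paths, the read structure 3M2S+T, the RX tag value and the output name are all assumptions for illustration; check them against the workflow's CWL definition and your library design.

```yaml
# Assumed layout: an outer list of FASTQ sets, each holding R1 then R2.
fgbio_fastq_to_bam_input:
  - - class: File
      path: /path/to/sample_01_R1.fastq.gz    # hypothetical path
    - class: File
      path: /path/to/sample_01_R2.fastq.gz    # hypothetical path

# One read structure per FASTQ, in fgbio notation:
# 3M = 3 UMI bases, 2S = 2 skipped bases, +T = remaining bases are template.
read-structures:
  - 3M2S+T
  - 3M2S+T

fgbio_fastq_to_bam_umi-tag: RX                # assumed tag for the raw UMI
fgbio_fastq_to_bam_sort: true                 # query-name sort the output
fgbio_fastq_to_bam_output_file_name: sample_01_unmapped.bam   # hypothetical
```

The number of read structures should match the number of FASTQ files in each set, since each structure describes one read.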

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_merge_sam_files_output_file_name | SAM or BAM file to write the merged result to. (Required) |  |
| merge_sam_files_sort_order | Sort order of output file. | queryname |

Picard SamToFastq

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| unpaired_fastq_file | Unpaired FASTQ output file name. |  |
| UBG_picard_SamToFastq_R1_output_fastq | Read1 fastq.gz output file name for uncollapsed BAM generation. (Required) |  |
| UBG_picard_SamToFastq_R2_output_fastq | Read2 fastq.gz output file name for uncollapsed BAM generation. (Required) |  |
| BC_gatk_sam_to_fastq_output_name_R1 | Read1 fastq.gz output file name for BAM collapsing. (Required) |  |
| BC_gatk_sam_to_fastq_output_name_R2 | Read2 fastq.gz output file name for BAM collapsing. (Required) |  |
| gatk_sam_to_fastq_include_non_primary_alignments | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. |  |
| gatk_sam_to_fastq_include_non_pf_reads | Include non-PF reads from the SAM file in the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. See the GATK Dictionary for more info. |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fastp_unpaired1_output_file_name | For PE input, if read1 passed QC but read2 did not, it will be written to unpaired1. Default is to discard it. |  |
| fastp_unpaired2_output_file_name | For PE input, if read2 passed QC but read1 did not, it will be written to unpaired2. If --unpaired2 is the same as --unpaired1 (default mode), both unpaired reads will be written to this same file. |  |
| fastp_read1_adapter_sequence | The adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not overlapped. | GATCGGAAGAGC |
| fastp_read2_adapter_sequence | The adapter for read2 (PE data only). This is used if R1/R2 are found not overlapped. If not specified, it will be the same as the read1 adapter sequence. | AGATCGGAAGAGC |
| fastp_read1_output_file_name | Read1 output file name. (Required) |  |
| fastp_read2_output_file_name | Read2 output file name. (Required) |  |
| fastp_minimum_read_length | Reads shorter than this length will be discarded. | 25 |
| fastp_json_output_file_name | The JSON format report file name. (Required) |  |
| fastp_html_output_file_name | The HTML format report file name. (Required) |  |
| disable_trim_poly_g | Disable poly-G trimming. | True |
| disable_quality_filtering | Disable base quality filtering. | True |
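For orientation, a sketch of the fastp-related entries in an inputs file. The adapter sequences, minimum length and the two disable flags repeat the documented defaults; the file names are hypothetical.

```yaml
fastp_read1_adapter_sequence: GATCGGAAGAGC      # documented default
fastp_read2_adapter_sequence: AGATCGGAAGAGC     # documented default
fastp_minimum_read_length: 25                   # documented default
disable_trim_poly_g: true                       # documented default
disable_quality_filtering: true                 # documented default

# Required output names (hypothetical values).
fastp_read1_output_file_name: sample_01_R1_trimmed.fastq.gz
fastp_read2_output_file_name: sample_01_R2_trimmed.fastq.gz
fastp_json_output_file_name: sample_01_fastp.json
fastp_html_output_file_name: sample_01_fastp.html
```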

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bwa_mem_Y | Force soft-clipping rather than default hard-clipping of supplementary alignments. | True |
| bwa_mem_T | Don’t output alignments with a score lower than INT. This option only affects output. | 30 |
| bwa_mem_P | In paired-end mode, perform SW to rescue missing hits only, but do not try to find hits that fit a proper pair. |  |
| UBG_bwa_mem_output | Output SAM file name for uncollapsed BAM generation. (Required) |  |
| BC_bwa_mem_output | Output SAM file name for BAM collapsing. (Required) |  |
| bwa_mem_M | Mark shorter split hits as secondary. |  |
| bwa_mem_K | Used to achieve deterministic alignment results. (Note: this is a hidden option.) | 1000000 |
| bwa_number_of_threads | Number of threads. |  |
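A short sketch of the alignment entries, using the documented defaults for the three flags. The thread count and output names are hypothetical and should be adapted to your environment.

```yaml
bwa_mem_Y: true                 # documented default: soft-clip supplementary alignments
bwa_mem_T: 30                   # documented default: minimum alignment score to output
bwa_mem_K: 1000000              # documented default: fixed value for deterministic results
bwa_number_of_threads: 4        # hypothetical; match your available cores

# Required output names (hypothetical values).
UBG_bwa_mem_output: sample_01_uncollapsed.sam
BC_bwa_mem_output: sample_01_collapsed.sam
```

Keeping bwa_mem_K fixed across runs is what makes results reproducible when the thread count changes.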

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| UBG_picard_addRG_output_file_name | Output BAM file name for uncollapsed BAM generation. (Required) |  |
| BC_picard_addRG_output_file_name | Output BAM file name for BAM collapsing. (Required) |  |
| picard_addRG_sort_order | Sort order for the BAM file. | queryname |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| UBG_gatk_merge_bam_alignment_output_file_name | Output BAM file name for uncollapsed BAM generation. (Required) |  |
| BC_gatk_merge_bam_alignment_output_file_name | Output BAM file name for BAM collapsing. (Required) |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| optical_duplicate_pixel_distance | The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform. For the patterned flowcell models, 2500 is more appropriate. For other platforms and models, users should experiment to find what works best. | 2500 |
| read_name_regex | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim. Note that without optical duplicate counts, library size estimation will be inaccurate. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done, but instead the read name is split on the colon character. For 5 element names, the 3rd, 4th and 5th elements are assumed to be tile, x and y values. For 7 element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be tile, x and y values. |  |
| duplicate_scoring_strategy | The scoring strategy for choosing the non-duplicate among candidates. |  |
| gatk_mark_duplicates_output_file_name | The output file to write marked records to. (Required) |  |
| gatk_mark_duplicates_duplication_metrics_file_name | File to write duplication metrics to. (Required) |  |
| gatk_mark_duplicates_assume_sort_order | If not null, assume that the input file has this order even if the header says otherwise. |  |

bedtools genomecov

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_genomecov_option_bedgraph | Flag that selects the output file format; -bg refers to bedGraph format. | True |

bedtools merge

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_merge_distance_between_features | Maximum distance between features allowed for features to be merged. | 10 |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| abra2_window_size | Processing window size and overlap (size,overlap). | "400,200" |
| abra2_soft_clip_contig | Soft clip contig args [maxcontigs,min_base_qual,frac high_qual_bases,min_soft_clip_len]. | "16,13,80,15" |
| abra2_scoring_gap_alignments | Scoring used for contig alignments (match,mismatch_penalty,gap_open_penalty,gap_extend_penalty). | "8,32,48,1" |
| abra2_no_sort | Do not attempt to sort final output. | True |
| abra2_no_edge_complex_indel | Prevent output of complex indels at read start or read end. | True |
| abra2_maximum_mixmatch_rate | Max allowed mismatch rate when mapping reads back to contigs. | 0.1 |
| abra2_maximum_average_depth | Regions with average depth exceeding this value will be down-sampled. | 1000 |
| abra2_contig_anchor | Contig anchor [M_bases_at_contig_edge,max_mismatches_near_edge]. | "10,2" |
| abra2_consensus_sequence | Use positional consensus sequence when aligning high quality soft clipping. |  |
| BC_abra2_output_bams | The output BAM file to write to. (Required) |  |
| UBG_abra2_output_bams | The output BAM file to write to. (Required) |  |
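The comma-separated ABRA2 arguments are documented as quoted strings, so a sketch of the corresponding inputs entries keeps the quotes. The values below are the documented defaults; the two output names are hypothetical.

```yaml
abra2_window_size: "400,200"              # size,overlap
abra2_soft_clip_contig: "16,13,80,15"     # maxcontigs,min_base_qual,frac high_qual_bases,min_soft_clip_len
abra2_scoring_gap_alignments: "8,32,48,1" # match,mismatch_penalty,gap_open_penalty,gap_extend_penalty
abra2_contig_anchor: "10,2"               # M_bases_at_contig_edge,max_mismatches_near_edge
abra2_no_sort: true
abra2_no_edge_complex_indel: true
abra2_maximum_mixmatch_rate: 0.1
abra2_maximum_average_depth: 1000

# Required output names (hypothetical values).
UBG_abra2_output_bams: sample_01_uncollapsed_realigned.bam
BC_abra2_output_bams: sample_01_collapsed_realigned.bam
```

Quoting guarantees that each multi-value argument reaches the tool as a single string rather than being reinterpreted by the YAML parser.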

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| UBG_picard_fixmateinformation_output_file_name | The output BAM file to write to for uncollapsed BAM generation. (Required) |  |
| BC_picard_fixmate_information_output_file_name | The output BAM file to write to for BAM collapsing. (Required) |  |

Base Quality Score Recalibration

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_base_recalibrator_known_sites | One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis. (Required) |  |
| gatk_bqsr_read_filter | Read filters to be applied before analysis. |  |
| base_recalibrator_output_file_name | The output recalibration table file to create. (Required) |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| apply_bqsr_output_file_name | The output BAM file. (Required) |  |
| gatk_bqsr_disable_read_filter | Read filters to be disabled before analysis. |  |

Collapsed BAM Generation

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_group_reads_by_umi_input | The input BAM file. |  |
| fgbio_group_reads_by_umi_strategy | The UMI assignment strategy (identity, edit, adjacency, paired). | paired |
| fgbio_group_reads_by_umi_raw_tag | The tag containing the raw UMI. | RX |
| fgbio_group_reads_by_umi_output_file_name | The output BAM file name. (Required) |  |
| fgbio_group_reads_by_umi_min_umi_length | The minimum UMI length. If not specified then all UMIs must have the same length, otherwise, discard reads with UMIs shorter than this length and allow for differing UMI lengths. |  |
| fgbio_group_reads_by_umi_include_non_pf_reads | Include non-PF reads. | False |
| fgbio_group_reads_by_umi_family_size_histogram | Optional output of tag family size counts. (Required) | Give a file name. ex: samplename.hist |
| fgbio_group_reads_by_umi_edits | The allowable number of edits between UMIs. | 1 |
| fgbio_group_reads_by_umi_assign_tag | The output tag for UMI grouping. | MI |
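A sketch of the UMI grouping entries, assuming the documented defaults are kept. The output and histogram file names are hypothetical; the histogram name follows the samplename.hist hint above.

```yaml
fgbio_group_reads_by_umi_strategy: paired             # documented default
fgbio_group_reads_by_umi_raw_tag: RX                  # documented default
fgbio_group_reads_by_umi_assign_tag: MI               # documented default
fgbio_group_reads_by_umi_edits: 1                     # documented default
fgbio_group_reads_by_umi_include_non_pf_reads: false  # documented default

# Hypothetical file names.
fgbio_group_reads_by_umi_output_file_name: sample_01_grouped.bam
fgbio_group_reads_by_umi_family_size_histogram: sample_01.hist
```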

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_collect_duplex_seq_metrics_intervals | Optional set of intervals over which to restrict analysis. |  |
| fgbio_collect_duplex_seq_metrics_output_prefix | Prefix of output files to write. |  |
| fgbio_collect_duplex_seq_metrics_min_ba_reads | Minimum BA reads to call a tag family a ‘duplex’. |  |
| fgbio_collect_duplex_seq_metrics_min_ab_reads | Minimum AB reads to call a tag family a ‘duplex’. |  |
| fgbio_collect_duplex_seq_metrics_mi_tag | The output tag for UMI grouping. | MI |
| fgbio_collect_duplex_seq_metrics_duplex_umi_counts | If true, produce the .duplex_umi_counts.txt file with counts of duplex UMI observations. | True |
| fgbio_collect_duplex_seq_metrics_description | Description of data set used to label plots. Defaults to sample/library. |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_call_duplex_consensus_reads_trim | If true, quality trim input reads in addition to masking low Q bases. |  |
| fgbio_call_duplex_consensus_reads_sort_order | The sort order of the output, if :none: then the same as the input. |  |
| fgbio_call_duplex_consensus_reads_read_name_prefix | The prefix for all consensus read names. |  |
| fgbio_call_duplex_consensus_reads_read_group_id | The new read group ID for all the consensus reads. |  |
| fgbio_call_duplex_consensus_reads_output_file_name | Output SAM or BAM file to write consensus reads. |  |
| fgbio_call_duplex_consensus_reads_min_reads | The minimum number of input reads to a consensus read. | 1 1 0 |
| fgbio_call_duplex_consensus_reads_min_input_base_quality | Ignore bases in raw reads that have Q below this value. |  |
| fgbio_call_duplex_consensus_reads_max_reads_per_strand | The maximum number of reads to use when building a single-strand consensus. If more than this many reads are present in a tag family, the family is randomly downsampled to exactly max-reads reads. |  |
| fgbio_call_duplex_consensus_reads_error_rate_pre_umi | The Phred-scaled error rate for an error prior to the UMIs being integrated. |  |
| fgbio_call_duplex_consensus_reads_error_rate_post_umi | The Phred-scaled error rate for an error after the UMIs have been integrated. |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_filter_consensus_read_reverse_per_base_tags_simplex_duplex | Reverse [complement] per base tags on reverse strand reads. - Simplex + Duplex |  |
| fgbio_filter_consensus_read_reverse_per_base_tags_duplex | Reverse [complement] per base tags on reverse strand reads. - Duplex |  |
| fgbio_filter_consensus_read_require_single_strand_agreement_simplex_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex-sequencing only). |  |
| fgbio_filter_consensus_read_require_single_strand_agreement_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex-sequencing only). |  |
| fgbio_filter_consensus_read_max_base_error_rate_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Duplex |  |
| fgbio_filter_consensus_read_max_base_error_rate_simplex_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Simplex + Duplex |  |
| fgbio_filter_consensus_read_max_no_call_fraction_duplex | Maximum fraction of no-calls in the read after filtering. - Duplex |  |
| fgbio_filter_consensus_read_max_read_error_rate_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Duplex |  |
| fgbio_filter_consensus_read_max_no_call_fraction_simplex_duplex | Maximum fraction of no-calls in the read after filtering. - Simplex + Duplex |  |
| fgbio_filter_consensus_read_max_read_error_rate_simplex_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Simplex + Duplex |  |
| fgbio_filter_consensus_read_min_base_quality_duplex | Mask (make N) consensus bases with quality less than this threshold. - Duplex |  |
| fgbio_filter_consensus_read_min_base_quality_simplex_duplex | Mask (make N) consensus bases with quality less than this threshold. - Simplex + Duplex |  |
| fgbio_filter_consensus_read_min_mean_base_quality_duplex | The minimum mean base quality across the consensus read. - Duplex |  |
| fgbio_filter_consensus_read_min_mean_base_quality_simplex_duplex | The minimum mean base quality across the consensus read. - Simplex + Duplex |  |
| fgbio_filter_consensus_read_min_reads_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Duplex | 2, 1, 1 |
| fgbio_filter_consensus_read_min_reads_simplex_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex | 3, 3, 0 |
| fgbio_filter_consensus_read_output_file_name_simplex_duplex | Output BAM file name - Simplex + Duplex. (Required) |  |
| fgbio_filter_consensus_read_output_file_name_duplex_aln_metrics | Output file name for Duplex alignment metrics. |  |
| fgbio_filter_consensus_read_output_file_name_simplex_aln_metrics | Output file name for Simplex alignment metrics. |  |
| fgbio_filter_consensus_read_output_file_name_duplex | Output BAM file name - Duplex. (Required) |  |
| fgbio_filter_consensus_read_min_simplex_reads | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex |  |
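The three-value min-reads arguments are easier to read in context. The sketch below assumes these inputs are declared as integer lists in the workflow, which this page does not confirm, so check the CWL definition before copying it. The values are the documented defaults and the output names are hypothetical. As far as we understand the fgbio documentation, when three values are given the first applies to the final duplex consensus and the other two to the two single-strand consensuses.

```yaml
# Assumption: list-valued inputs are written as YAML sequences of integers.
fgbio_filter_consensus_read_min_reads_duplex: [2, 1, 1]          # documented default
fgbio_filter_consensus_read_min_reads_simplex_duplex: [3, 3, 0]  # documented default

# Required output names (hypothetical values).
fgbio_filter_consensus_read_output_file_name_duplex: sample_01_duplex.bam
fgbio_filter_consensus_read_output_file_name_simplex_duplex: sample_01_simplex_duplex.bam
```

With 3, 3, 0 the second strand may contribute no reads at all, which is why that setting admits simplex as well as duplex families, while 2, 1, 1 requires at least one read from each strand.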

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_postprocessing_output_file_name_simplex | Output BAM file name - Simplex. (Required) |  |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_collect_alignment_summary_metrics_output_file_name | Output file name for metrics on collapsed BAM (Duplex + Simplex + Singletons). |  |

Template Inputs File

inputs.yaml
BC_abra2_output_bams: null
BC_bwa_mem_output: null
BC_gatk_merge_bam_alignment_output_file_name: null
BC_gatk_sam_to_fastq_output_name_R1: null
BC_gatk_sam_to_fastq_output_name_R2: null
BC_picard_addRG_output_file_name: null
BC_picard_fixmate_information_output_file_name: null
UBG_abra2_output_bams: null
UBG_bwa_mem_output: null
UBG_gatk_merge_bam_alignment_output_file_name: null
UBG_picard_SamToFastq_R1_output_fastq: null
UBG_picard_SamToFastq_R2_output_fastq: null
UBG_picard_addRG_output_file_name: null
UBG_picard_fixmateinformation_output_file_name: null
abra2_bam_index: null
abra2_consensus_sequence: null
abra2_contig_anchor: null
abra2_maximum_average_depth: null
abra2_maximum_mixmatch_rate: null
abra2_no_edge_complex_indel: null
abra2_scoring_gap_alignments: null
abra2_soft_clip_contig: null
abra2_window_size: null
apply_bqsr_output_file_name: null
base_recalibrator_output_file_name: null
bedtools_genomecov_option_bedgraph: null
bedtools_merge_distance_between_features: null
bwa_mem_K: null
bwa_mem_T: null
bwa_mem_Y: null
create_bam_index: null
fastp_html_output_file_name: null
fastp_json_output_file_name: null
fastp_minimum_read_length: null
fastp_read1_adapter_sequence: null
fastp_read1_output_file_name: null
fastp_read2_adapter_sequence: null
fastp_read2_output_file_name: null
fgbio_async_io: null
fgbio_call_duplex_consensus_reads_min_reads: null
fgbio_call_duplex_consensus_reads_output_file_name: null
fgbio_collect_duplex_seq_metrics_duplex_umi_counts: null
fgbio_collect_duplex_seq_metrics_intervals: null
fgbio_collect_duplex_seq_metrics_output_prefix: null
fgbio_fastq_to_bam_input: null
fgbio_filter_consensus_read_min_base_quality_duplex: null
fgbio_filter_consensus_read_min_base_quality_simplex_duplex: null
fgbio_filter_consensus_read_min_reads_duplex: null
fgbio_filter_consensus_read_min_reads_simplex_duplex: null
fgbio_filter_consensus_read_output_file_name_duplex: null
fgbio_filter_consensus_read_output_file_name_duplex_aln_metrics: null
fgbio_filter_consensus_read_output_file_name_simplex_aln_metrics: null
fgbio_filter_consensus_read_output_file_name_simplex_duplex: null
fgbio_filter_consensus_read_reverse_per_base_tags_simplex_duplex: null
fgbio_group_reads_by_umi_family_size_histogram: null
fgbio_group_reads_by_umi_output_file_name: null
fgbio_group_reads_by_umi_strategy: null
fgbio_postprocessing_output_file_name_simplex: null
gatk_base_recalibrator_add_output_sam_program_record: null
gatk_base_recalibrator_known_sites:
  - class: File
    metadata: {}
    path: >-
      /Users/shahr2/Documents/test_reference/test_fastq_to_bam/known_sites/dbsnp_137_14_16.b37.vcf
    secondaryFiles:
      - class: File
        path: >-
          /Users/shahr2/Documents/test_reference/test_nucleo/known_sites/dbsnp_137_14_16.b37.vcf.idx
  - class: File
    metadata: {}
    path: >-
      /Users/shahr2/Documents/test_reference/test_fastq_to_bam/known_sites/Mills_and_1000G_gold_standard-14_16.indels.b37.vcf
    secondaryFiles:
      - class: File
        path: >-
          /Users/shahr2/Documents/test_reference/test_fastq_to_bam/known_sites/Mills_and_1000G_gold_standard-14_16.indels.b37.vcf.idx
gatk_collect_alignment_summary_metrics_output_file_name: null
gatk_mark_duplicates_duplication_metrics_file_name: null
gatk_mark_duplicates_output_file_name: null
gatk_merge_sam_files_output_file_name: null
library: null
merge_sam_files_sort_order: null
optical_duplicate_pixel_distance: null
picard_addRG_sort_order: null
platform: null
platform-model: null
platform-unit: null
read-group-id: null
read-structures: null
reference_sequence:
  class: File
  metadata: {}
  path: /Users/shahr2/Documents/test_reference/fasta/chr14_chr16.fasta
  secondaryFiles:
    - class: File
      path: ../../test_reference/fasta/chr14_chr16.fasta.amb
    - class: File
      path: ../../test_reference/fasta/chr14_chr16.fasta.ann
run-date: null
sample: null
sequencing-center: null
sort_order: null
temporary_directory: null
validation_stringency: null
