Inputs Description
Input files and parameters required to run the workflow
Common Workflow Language (CWL) execution engines accept inputs in two formats, JSON or YAML; use one of these when generating the input file. For more information, refer to: http://www.commonwl.org/user_guide/yaml/
Parameters Used by Tools
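A minimal sketch of generating such an input file with Python's standard json module follows. File inputs are written as CWL File objects (a "class": "File" entry plus a "path"); the parameter values and paths are hypothetical placeholders, and only a few of the parameters documented on this page are included.

```python
import json

def cwl_file(path):
    """Wrap a path in a CWL File object, the form CWL uses for file inputs."""
    return {"class": "File", "path": path}

# Hypothetical placeholder values; replace them with the values for your run.
inputs = {
    "sample": "sample_01",
    "library": "library_01",
    "read-group-id": "rg_01",
    "platform-unit": "RUN_BARCODE",
    "sequencing-center": "MSKCC",
    "reference_sequence": cwl_file("/path/to/reference.fasta"),
}

# Serialize to JSON; the same mapping could equally be written out as YAML.
text = json.dumps(inputs, indent=2)
with open("inputs.json", "w") as handle:
    handle.write(text + "\n")
```

A CWL runner such as cwltool can then be invoked as `cwltool workflow.cwl inputs.json`, where workflow.cwl is a placeholder for the actual workflow file.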
Common Parameters Across Tools
Argument Name | Summary | Default Value |
sequencing-center | The sequencing center from which the data originated | MSKCC |
sample | The name of the sequenced sample. (Required) | |
run-date | Date the run was produced, to insert into the read group header (ISO 8601 date) | |
read-group-id | Read group ID to use in the file header (Required) | |
platform-unit | Read group platform unit (e.g., run barcode) (Required) | |
platform-model | Platform model to insert into the read group header (e.g., miseq, hiseq2500, hiseqX) | novaseq |
platform | Read-Group platform (e.g. ILLUMINA, SOLID). | ILLUMINA |
library | The name/ID of the sequenced library. (Required) | |
description | Description of the read group. | |
comment | Comments to include in the output file’s header. | |
validation_stringency | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (reads, qualities, tags) do not otherwise need to be decoded. The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency) that takes one of the following values: STRICT, LENIENT, or SILENT. | LENIENT |
sort_order | GATK: The order in which the reads should be output. | |
create_bam_index | GATK: Generate a BAM index file when possible | |
reference_sequence | Reference sequence file. Include ".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", and ".ann" as secondary files if they are not present in the same location as the ".fasta" file | |
temporary_directory | Temporary directory to be used for all steps | |
fgbio_async_io | Use asynchronous I/O in fgbio tools | |
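The secondary-file patterns listed for reference_sequence follow the CWL secondaryFiles convention: a plain suffix is appended to the FASTA path, while each leading ^ first strips one extension (so ^.dict turns reference.fasta into reference.dict). The sketch below expands those patterns into explicit File entries; the helper names and the optional index_dir argument (for index files stored in another directory) are illustrative assumptions, not part of the workflow.

```python
import os

# Patterns from the reference_sequence row above.
REFERENCE_PATTERNS = [".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", ".ann"]

def apply_secondary_pattern(primary, pattern):
    """Resolve a CWL secondaryFiles pattern against a primary file path.

    Each leading '^' removes one extension from the path (if any remains)
    before the rest of the pattern is appended.
    """
    path = primary
    while pattern.startswith("^"):
        root, ext = os.path.splitext(path)
        if ext:
            path = root
        pattern = pattern[1:]
    return path + pattern

def reference_input(fasta_path, index_dir=None):
    """Build a CWL File object for the FASTA with explicit secondary files.

    index_dir (hypothetical) relocates the derived files to another directory.
    """
    entries = []
    for pattern in REFERENCE_PATTERNS:
        derived = apply_secondary_pattern(fasta_path, pattern)
        if index_dir is not None:
            derived = os.path.join(index_dir, os.path.basename(derived))
        entries.append({"class": "File", "path": derived})
    return {"class": "File", "path": fasta_path, "secondaryFiles": entries}
```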
Uncollapsed BAM Generation
Fgbio FastqToBam
Argument Name | Summary | Default Value |
fgbio_fastq_to_bam_umi-tag | Tag in which to store molecular barcodes/UMIs. | |
fgbio_fastq_to_bam_sort | If true, query-name sort the BAM file, otherwise preserve input order. | |
fgbio_fastq_to_bam_input | FASTQ files corresponding to each sequencing read (e.g., R1, I1). Refer to the template inputs file for the expected format. | |
read-structures | Read structures, one for each of the FASTQs. Refer to the fgbio FastqToBam documentation for details. | |
fgbio_fastq_to_bam_predicted-insert-size | Predicted median insert size, to insert into the read group header | |
fgbio_fastq_to_bam_output_file_name | The output SAM or BAM file to be written. | |
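fgbio read structures describe each read as a series of segments, each a length (or + for the remainder of the read) followed by a type: T for template, B for sample barcode, M for molecular barcode (UMI), and S for skipped bases. The sketch below parses a structure such as 3M2S+T and slices a read accordingly. The example structure is hypothetical rather than the one this workflow uses, and the parser only accepts + in the final segment, a simplification of fgbio's own rules.

```python
import re

# One segment: a length (digits, or '+' for "the rest of the read") and a type.
# T = template, B = sample barcode, M = molecular barcode (UMI), S = skip.
SEGMENT = re.compile(r"(\d+|\+)([TBMS])")

def parse_read_structure(structure):
    """Parse e.g. '3M2S+T' into [(3, 'M'), (2, 'S'), (None, 'T')]."""
    segments = []
    position = 0
    for match in SEGMENT.finditer(structure):
        if match.start() != position:  # unparsable characters were skipped
            raise ValueError(f"invalid read structure: {structure!r}")
        length = None if match.group(1) == "+" else int(match.group(1))
        segments.append((length, match.group(2)))
        position = match.end()
    if not segments or position != len(structure):
        raise ValueError(f"invalid read structure: {structure!r}")
    # Simplification: only allow a '+' segment in the last position.
    if any(length is None for length, _ in segments[:-1]):
        raise ValueError("this sketch only supports '+' as the final segment")
    return segments

def split_read(structure, sequence):
    """Slice a read sequence into its segments, grouped by segment type."""
    parts = {}
    offset = 0
    for length, kind in parse_read_structure(structure):
        end = len(sequence) if length is None else offset + length
        parts.setdefault(kind, []).append(sequence[offset:end])
        offset = end
    return parts
```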
Picard MergeSamFiles
Argument Name | Summary | Default Value |
gatk_merge_sam_files_output_file_name | SAM or BAM file to write the merged result to (Required) | |
merge_sam_files_sort_order | Sort order of output file | queryname |
Picard SamToFastq
Argument Name | Summary | Default Value |
unpaired_fastq_file | Unpaired FASTQ output file name | |
UBG_picard_SamToFastq_R1_output_fastq | Read1 fastq.gz output file name for uncollapsed bam generation (Required) | |
UBG_picard_SamToFastq_R2_output_fastq | Read2 fastq.gz output file name for uncollapsed bam generation (Required) | |
BC_gatk_sam_to_fastq_output_name_R1 | Read1 fastq.gz output file name for bam collapsing (Required) | |
BC_gatk_sam_to_fastq_output_name_R2 | Read2 fastq.gz output file name for bam collapsing (Required) | |
gatk_sam_to_fastq_include_non_primary_alignments | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. | |
gatk_sam_to_fastq_include_non_pf_reads | Include non-PF reads from the SAM file in the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. See the GATK Dictionary for more information. | |
Fastp
Argument Name | Summary | Default Value |
fastp_unpaired1_output_file_name | For PE input, if read1 passes QC but read2 does not, read1 is written to unpaired1. By default, it is discarded. | |
fastp_unpaired2_output_file_name | For PE input, if read2 passes QC but read1 does not, read2 is written to unpaired2. If --unpaired2 is the same as --unpaired1 (default mode), both unpaired reads are written to the same file. | |
fastp_read1_adapter_sequence | The adapter for read1. For SE data, if not specified, the adapter is auto-detected. For PE data, this is used if R1 and R2 are found not to overlap. | GATCGGAAGAGC |
fastp_read2_adapter_sequence | The adapter for read2 (PE data only). This is used if R1 and R2 are found not to overlap. If not specified, it is the same as the read1 adapter sequence. | AGATCGGAAGAGC |
fastp_read1_output_file_name | Read1 output file name (Required) | |
fastp_read2_output_file_name | Read2 output file name (Required) | |
fastp_minimum_read_length | Reads shorter than this length are discarded | 25 |
fastp_json_output_file_name | JSON-format report file name (Required) | |
fastp_html_output_file_name | HTML-format report file name (Required) | |
disable_trim_poly_g | Disable Poly-G trimming. | True |
disable_quality_filtering | Disable base quality filtering. | True |
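The routing rules described for the unpaired outputs and fastp_minimum_read_length can be sketched as follows. This illustration checks read length only (fastp applies other filters as well), reads the "default mode" note as meaning that unpaired2 falls back to unpaired1 when it is not set, and uses the hypothetical labels out1 and out2 for the paired outputs.

```python
def route_read_pair(read1_length, read2_length, min_length=25,
                    unpaired1=None, unpaired2=None):
    """Decide where each read of a pair goes after a length-only QC check.

    Returns a dict mapping 'read1'/'read2' to a destination; a read missing
    from the dict is discarded. 'out1'/'out2' stand for the paired outputs.
    """
    if unpaired2 is None:
        unpaired2 = unpaired1  # default mode: both unpaired reads share a file
    read1_ok = read1_length >= min_length  # shorter reads are discarded
    read2_ok = read2_length >= min_length
    if read1_ok and read2_ok:
        return {"read1": "out1", "read2": "out2"}
    routes = {}
    if read1_ok and unpaired1 is not None:
        routes["read1"] = unpaired1
    if read2_ok and unpaired2 is not None:
        routes["read2"] = unpaired2
    return routes
```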
BWA MEM
Argument Name | Summary | Default Value |
bwa_mem_Y | Force soft-clipping rather than default hard-clipping of supplementary alignments | True |
bwa_mem_T | Do not output alignments with a score lower than this value. This option only affects output. | 30 |
bwa_mem_P | In paired-end mode, perform Smith-Waterman (SW) alignment only to rescue missing hits, but do not try to find hits that fit a proper pair. | |
UBG_bwa_mem_output | Output SAM file name for uncollapsed bam generation (Required) | |
BC_bwa_mem_output | Output SAM file name for bam collapsing (Required) | |
bwa_mem_M | Mark shorter split hits as secondary | |
bwa_mem_K | Process this many input bases in each batch regardless of the number of threads, to achieve deterministic alignment results (Note: this is a hidden option) | 1000000 |
bwa_number_of_threads | Number of threads | |
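As a rough illustration of how these parameters correspond to bwa mem command-line flags (-Y, -T, -P, -M, -K, and -t for threads), the sketch below assembles an argument list. The function and argument names are hypothetical, the flags without a default in the table are assumed to be off, and the workflow's CWL tool wrapper may order or pass options differently.

```python
def bwa_mem_command(reference, read1, read2, threads=1,
                    soft_clip_supplementary=True,  # bwa_mem_Y
                    min_score=30,                  # bwa_mem_T
                    batch_bases=1_000_000,         # bwa_mem_K
                    mark_secondary=False,          # bwa_mem_M
                    rescue_only=False):            # bwa_mem_P
    """Build a `bwa mem` argument list from the parameters in the table above."""
    command = ["bwa", "mem", "-t", str(threads)]
    if soft_clip_supplementary:
        command.append("-Y")
    if mark_secondary:
        command.append("-M")
    if rescue_only:
        command.append("-P")
    command += ["-T", str(min_score), "-K", str(batch_bases)]
    command += [reference, read1, read2]
    return command
```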
Picard AddOrReplaceReadGroups
Argument Name | Summary | Default Value |
UBG_picard_addRG_output_file_name | Output BAM file name for uncollapsed bam generation (Required) | |
BC_picard_addRG_output_file_name | Output BAM file name for bam collapsing (Required) | |
picard_addRG_sort_order | Sort order for the BAM file | queryname |
GATK MergeBamAlignment
Argument Name | Summary | Default Value |
UBG_gatk_merge_bam_alignment_output_file_name | Output BAM file name for uncollapsed bam generation (Required) | |
BC_gatk_merge_bam_alignment_output_file_name | Output BAM file name for bam collapsing (Required) | |
Picard MarkDuplicates
Argument Name | Summary | Default Value |
optical_duplicate_pixel_distance | The maximum offset between two duplicate clusters in order to consider them optical duplicates. Picard's own default (100) is appropriate for unpatterned versions of the Illumina platform; for patterned flowcell models, 2500 is more appropriate, and that is the value used here. For other platforms and models, users should experiment to find what works best. | 2500 |
read_name_regex | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate, and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim. Note that without optical duplicate counts, library size estimation will be inaccurate. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done; instead, the read name is split on the colon character. For 5-element names, the 3rd, 4th, and 5th elements are assumed to be tile, x, and y values. For 7-element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be tile, x, and y values. | |
duplicate_scoring_strategy | The scoring strategy for choosing the non-duplicate among candidates. | |
gatk_mark_duplicates_output_file_name | The output file to write marked records to (Required) | |
gatk_mark_duplicates_duplication_metrics_file_name | File to write duplication metrics to (Required) | |
gatk_mark_duplicates_assume_sort_order | If not null, assume that the input file has this order even if the header says otherwise. | |
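The default read-name parsing described for read_name_regex, together with optical_duplicate_pixel_distance, can be sketched as below. This is a simplification: two duplicates are treated as optical when they share a tile and differ by at most the pixel distance in both x and y, whereas Picard applies the comparison within duplicate sets and handles further details. The function names and read names are made up for illustration.

```python
def parse_read_name(name):
    """Return (tile, x, y) using the colon-split rule, or None if unparsable."""
    fields = name.split(":")
    if len(fields) == 5:
        tile, x, y = fields[2:5]  # 3rd, 4th, and 5th elements
    elif len(fields) == 7:
        tile, x, y = fields[4:7]  # 5th, 6th, and 7th elements (CASAVA 1.8)
    else:
        return None
    return int(tile), int(x), int(y)

def is_optical_pair(name_a, name_b, pixel_distance=2500):
    """Simplified check: same tile and within pixel_distance on both axes."""
    a, b = parse_read_name(name_a), parse_read_name(name_b)
    if a is None or b is None:
        return False
    same_tile = a[0] == b[0]
    return (same_tile and abs(a[1] - b[1]) <= pixel_distance
            and abs(a[2] - b[2]) <= pixel_distance)
```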
bedtools genomecov
Argument Name | Summary | Default Value |
bedtools_genomecov_option_bedgraph | Report coverage in BedGraph format (the -bg option) | True |
bedtools merge
Argument Name | Summary | Default Value |
bedtools_merge_distance_between_features | Maximum distance between features allowed for features to be merged. | 10 |
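The summary above matches bedtools merge's -d option, under which features whose gap is no larger than the given distance become one interval, so with the default of 10 features up to 10 bp apart are merged. A small sketch of that rule on (chrom, start, end) tuples in BED's half-open coordinates follows; the function name and intervals are illustrative, and unlike bedtools (which expects pre-sorted input) the sketch sorts its input itself.

```python
def merge_intervals(intervals, max_distance=10):
    """Merge (chrom, start, end) intervals whose gap is <= max_distance.

    With max_distance=0, overlapping and book-ended intervals are merged.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_distance:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```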
ABRA2
Argument Name | Summary | Default Value |
abra2_window_size | Processing window size and overlap (size,overlap) | "400,200" |
abra2_soft_clip_contig | Soft clip contig args [maxcontigs,min_base_qual,frac high_qual_bases,min_soft_clip_len] | "16,13,80,15" |
abra2_scoring_gap_alignments | Scoring used for contig alignments (match, mismatch_penalty, gap_open_penalty, gap_extend_penalty) | "8,32,48,1" |
abra2_no_sort | Do not attempt to sort final output | True |
abra2_no_edge_complex_indel | Prevent output of complex indels at read start or read end | True |
abra2_maximum_mixmatch_rate | Max allowed mismatch rate when mapping reads back to contigs | 0.1 |
abra2_maximum_average_depth | Regions with average depth exceeding this value will be down-sampled | 1000 |
abra2_contig_anchor | Contig anchor [M_bases_at_contig_edge,max_mismatches_near_edge] | "10,2" |
abra2_consensus_sequence | Use positional consensus sequence when aligning high quality soft clipping | |
BC_abra2_output_bams | The output BAM file to write to for bam collapsing (Required) | |
UBG_abra2_output_bams | The output BAM file to write to for uncollapsed bam generation (Required) | |
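Several ABRA2 parameters are passed as a single comma-separated string whose positions have fixed meanings, which is why their defaults are quoted. The sketch below, with hypothetical helper and field names taken from the bracketed descriptions, splits such strings into named integer fields and checks the field count, which can catch a missing or extra value before the workflow is launched.

```python
# Field names follow the bracketed descriptions in the table above (hypothetical keys).
ABRA2_FIELDS = {
    "abra2_window_size": ("size", "overlap"),
    "abra2_soft_clip_contig": ("max_contigs", "min_base_qual",
                               "frac_high_qual_bases", "min_soft_clip_len"),
    "abra2_scoring_gap_alignments": ("match", "mismatch_penalty",
                                     "gap_open_penalty", "gap_extend_penalty"),
    "abra2_contig_anchor": ("bases_at_contig_edge", "max_mismatches_near_edge"),
}

def parse_abra2_value(name, value):
    """Split a comma-separated ABRA2 value into named integer fields."""
    fields = ABRA2_FIELDS[name]
    parts = value.split(",")
    if len(parts) != len(fields):
        raise ValueError(f"{name} expects {len(fields)} values, got {len(parts)}")
    return dict(zip(fields, (int(part) for part in parts)))

def format_abra2_value(name, **values):
    """Join named fields back into the comma-separated string ABRA2 expects."""
    return ",".join(str(values[field]) for field in ABRA2_FIELDS[name])
```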
Picard FixMateInformation
Argument Name | Summary | Default Value |
UBG_picard_fixmateinformation_output_file_name | The output BAM file to write to for uncollapsed bam generation (Required) | |
BC_picard_fixmate_information_output_file_name | The output BAM file to write to for bam collapsing (Required) | |
Base Quality Score Recalibration
GATK BaseRecalibrator
Argument Name | Summary | Default Value |
gatk_base_recalibrator_known_sites | One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis (Required) | |
gatk_bqsr_read_filter | Read filters to be applied before analysis | |
base_recalibrator_output_file_name | The output recalibration table file to create (Required) | |
GATK ApplyBQSR
Argument Name | Summary | Default Value |
apply_bqsr_output_file_name | The output BAM file (Required) | |
gatk_bqsr_disable_read_filter | Read filters to be disabled before analysis | |
Collapsed BAM Generation
Fgbio GroupReadsByUmi
Argument Name | Summary | Default Value |
fgbio_group_reads_by_umi_input | The input BAM file | |
fgbio_group_reads_by_umi_strategy | The UMI assignment strategy. (identity, edit, adjacency, paired) | paired |
fgbio_group_reads_by_umi_raw_tag | The tag containing the raw UMI. | RX |
fgbio_group_reads_by_umi_output_file_name | The output BAM file name (Required) | |
fgbio_group_reads_by_umi_min_umi_length | The minimum UMI length. If not specified then all UMIs must have the same length, otherwise, discard reads with UMIs shorter than this length and allow for differing UMI lengths. | |
fgbio_group_reads_by_umi_include_non_pf_reads | Include non-PF reads. | False |
fgbio_group_reads_by_umi_family_size_histogram | Output file for tag family size counts (Required). Provide a file name, e.g. samplename.hist | |
fgbio_group_reads_by_umi_edits | The allowable number of edits between UMIs. | 1 |
fgbio_group_reads_by_umi_assign_tag | The output tag for UMI grouping. | MI |
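To illustrate fgbio_group_reads_by_umi_edits, the sketch below counts mismatches between equal-length UMIs and, for paired (duplex) UMIs written as two halves joined by a hyphen, also compares against the swapped order, since a read with A-B is related to a read with B-A. This is a simplified pairwise check with hypothetical function names, not fgbio's grouping algorithm, which assigns reads to families across all UMIs observed together; the UMIs are made up.

```python
def umi_mismatches(a, b):
    """Count mismatched positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length UMIs only")
    return sum(1 for x, y in zip(a, b) if x != y)

def within_edits(a, b, edits=1):
    """True if two single UMIs differ by at most `edits` mismatches."""
    return umi_mismatches(a, b) <= edits

def paired_within_edits(a, b, edits=1):
    """Compare paired UMIs ('AAAA-CCCC') in both the same and swapped order."""
    a_first, a_second = a.split("-")
    b_first, b_second = b.split("-")
    same_order = umi_mismatches(a_first + a_second, b_first + b_second)
    swapped = umi_mismatches(a_first + a_second, b_second + b_first)
    return min(same_order, swapped) <= edits
```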
Fgbio CollectDuplexSeqMetrics
Argument Name | Summary | Default Value |
fgbio_collect_duplex_seq_metrics_intervals | Optional set of intervals over which to restrict analysis. | |
fgbio_collect_duplex_seq_metrics_output_prefix | Prefix of output files to write. | |
fgbio_collect_duplex_seq_metrics_min_ba_reads | Minimum BA reads to call a tag family a ‘duplex’. | |
fgbio_collect_duplex_seq_metrics_min_ab_reads | Minimum AB reads to call a tag family a ‘duplex’. | |
fgbio_collect_duplex_seq_metrics_mi_tag | The output tag for UMI grouping. | MI |
fgbio_collect_duplex_seq_metrics_duplex_umi_counts | If true, produce the .duplex_umi_counts.txt file with counts of duplex UMI observations. | True |
fgbio_collect_duplex_seq_metrics_description | Description of data set used to label plots. Defaults to sample/library. | |
Fgbio CallDuplexConsensusReads
Argument Name | Summary | Default Value |
fgbio_call_duplex_consensus_reads_trim | If true, quality trim input reads in addition to masking low Q bases. | |
fgbio_call_duplex_consensus_reads_sort_order | The sort order of the output. If :none:, the output has the same order as the input. | |
fgbio_call_duplex_consensus_reads_read_name_prefix | The prefix for all consensus read names | |
fgbio_call_duplex_consensus_reads_read_group_id | The new read group ID for all the consensus reads. | |
fgbio_call_duplex_consensus_reads_output_file_name | Output SAM or BAM file to write consensus reads. | |
fgbio_call_duplex_consensus_reads_min_reads | The minimum number of input reads to a consensus read. | 1 1 0 |
fgbio_call_duplex_consensus_reads_min_input_base_quality | Ignore bases in raw reads that have Q below this value. | |
fgbio_call_duplex_consensus_reads_max_reads_per_strand | The maximum number of reads to use when building a single-strand consensus. If more than this many reads are present in a tag family, the family is randomly downsampled to exactly max-reads reads. | |
fgbio_call_duplex_consensus_reads_error_rate_pre_umi | The Phred-scaled error rate for an error prior to the UMIs being integrated. | |
fgbio_call_duplex_consensus_reads_error_rate_post_umi | The Phred-scaled error rate for an error after the UMIs have been integrated. | |
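The min_reads parameters here and in FilterConsensusReads accept up to three values. As described in the fgbio documentation, the first value applies to the final duplex consensus and the second and third to the two single-strand consensuses; if fewer than three values are given, the last one is repeated, and the more stringent single-strand value must come first. The sketch below, using a hypothetical helper name, expands a value list under that reading; the default 1 1 0 thus requires at least one read in one strand's consensus and allows zero in the other.

```python
def expand_min_reads(values):
    """Expand 1-3 min-reads values to (duplex, first_strand, second_strand)."""
    if not 1 <= len(values) <= 3:
        raise ValueError("min-reads takes one to three values")
    # Repeat the last value until three values are present.
    expanded = list(values) + [values[-1]] * (3 - len(values))
    duplex, first_strand, second_strand = expanded
    if second_strand > first_strand:
        raise ValueError("the more stringent single-strand value must come first")
    return duplex, first_strand, second_strand
```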
Fgbio FilterConsensusReads
Argument Name | Summary | Default Value |
fgbio_filter_consensus_read_reverse_per_base_tags_simplex_duplex | Reverse [complement] per-base tags on reverse-strand reads - Simplex + Duplex | |
fgbio_filter_consensus_read_reverse_per_base_tags_duplex | Reverse [complement] per-base tags on reverse-strand reads - Duplex | |
fgbio_filter_consensus_read_require_single_strand_agreement_simplex_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex sequencing only) - Simplex + Duplex | |
fgbio_filter_consensus_read_require_single_strand_agreement_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex sequencing only) - Duplex | |
fgbio_filter_consensus_read_max_base_error_rate_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Duplex | |
fgbio_filter_consensus_read_max_base_error_rate_simplex_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Simplex + Duplex | |
fgbio_filter_consensus_read_max_no_call_fraction_duplex | Maximum fraction of no-calls in the read after filtering - Duplex | |
fgbio_filter_consensus_read_max_read_error_rate_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Duplex | |
fgbio_filter_consensus_read_max_no_call_fraction_simplex_duplex | Maximum fraction of no-calls in the read after filtering - Simplex + Duplex | |
fgbio_filter_consensus_read_max_read_error_rate_simplex_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Simplex + Duplex | |
fgbio_filter_consensus_read_min_base_quality_duplex | Mask (make N) consensus bases with quality less than this threshold - Duplex | |
fgbio_filter_consensus_read_min_base_quality_simplex_duplex | Mask (make N) consensus bases with quality less than this threshold - Simplex + Duplex | |
fgbio_filter_consensus_read_min_mean_base_quality_duplex | The minimum mean base quality across the consensus read - Duplex | |
fgbio_filter_consensus_read_min_mean_base_quality_simplex_duplex | The minimum mean base quality across the consensus read - Simplex + Duplex | |
fgbio_filter_consensus_read_min_reads_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Duplex | 2, 1, 1 |
fgbio_filter_consensus_read_min_reads_simplex_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex | 3, 3, 0 |
fgbio_filter_consensus_read_output_file_name_simplex_duplex | Output BAM file name - Simplex + Duplex (Required) | |
fgbio_filter_consensus_read_output_file_name_duplex_aln_metrics | Output file name for Duplex alignment metrics | |
fgbio_filter_consensus_read_output_file_name_simplex_aln_metrics | Output file name for Simplex alignment metrics | |
fgbio_filter_consensus_read_output_file_name_duplex | Output BAM file name - Duplex (Required) | |
fgbio_filter_consensus_read_min_simplex_reads | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex | |
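The per-base and per-read filters above interact: bases below the min_base_quality threshold are masked to N, and the max_no_call_fraction limit is then checked on the read after that filtering. A simplified sketch of those two steps for a single read follows, assuming base qualities are already decoded to integers; the function names and thresholds in the usage example are arbitrary, and fgbio applies further filters (error rates, mean quality, minimum reads) not shown here.

```python
def mask_low_quality(bases, qualities, min_base_quality):
    """Replace bases whose quality is below the threshold with N (a no-call)."""
    return "".join("N" if q < min_base_quality else b
                   for b, q in zip(bases, qualities))

def no_call_fraction(bases):
    """Fraction of positions that are no-calls; an empty read counts as 1.0."""
    return bases.count("N") / len(bases) if bases else 1.0

def filter_read(bases, qualities, min_base_quality, max_no_call_fraction):
    """Mask low-quality bases, then test the no-call fraction of the result."""
    masked = mask_low_quality(bases, qualities, min_base_quality)
    return masked, no_call_fraction(masked) <= max_no_call_fraction
```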
Fgbio Postprocessing
Argument Name | Summary | Default Value |
fgbio_postprocessing_output_file_name_simplex | Output BAM file name - Simplex (Required) | |
GATK CollectAlignmentSummaryMetrics
Argument Name | Summary | Default Value |
gatk_collect_alignment_summary_metrics_output_file_name | Output file name for metrics on the collapsed BAM (Duplex + Simplex + Singletons) | |
Template Inputs File