Installation and Usage

How to install and run the workflow

Step 1: Create a virtual environment

cwltool and toil should be installed under Python 3.6. Either virtualenv or conda can be used to create the environment; the commands below use virtualenv.

pip3 install virtualenv
python3 -m venv my_project
source my_project/bin/activate

After running the commands above, your bash prompt should look something like this:

(my_project)[server]$
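
To confirm that the environment is active before installing anything, a quick check along these lines can help. This is a minimal sketch: `venv_status` is a hypothetical helper name, and it relies on the `VIRTUAL_ENV` variable that the activate script exports.

```shell
# Hypothetical helper: report whether a virtual environment is active.
# The activate script exports VIRTUAL_ENV; it is unset otherwise.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: ${VIRTUAL_ENV}"
    else
        echo "inactive"
    fi
}

venv_status
# Show which python3 the shell will pick up, if any.
command -v python3 || echo "python3 not found on PATH"
```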

Step 2: Clone the repository

git-clone-with-submodule
git clone --recursive https://github.com/msk-access/uncollapsed_bam_generation.git
cd uncollapsed_bam_generation
git submodule update --recursive --remote
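
After cloning, it may be worth checking that the submodules were actually checked out. The sketch below assumes `git` is on the PATH; `submodules_ready` is a hypothetical helper name. It relies on `git submodule status` prefixing uninitialized submodules with `-`.

```shell
# Hypothetical helper: succeed only if every submodule is initialized.
# $1: path to the cloned repository (defaults to the current directory)
submodules_ready() {
    repo_dir=${1:-.}
    # Capture the status first so a git failure is not mistaken for success.
    sub_status=$(git -C "$repo_dir" submodule status --recursive) || {
        echo "could not read submodule status in: $repo_dir" >&2
        return 2
    }
    # A leading "-" marks a submodule that has not been initialized.
    if printf '%s\n' "$sub_status" | grep -q '^-'; then
        echo "some submodules are not initialized"
        return 1
    fi
    echo "all submodules initialized"
}

# Example (run from the directory that contains the clone):
#   submodules_ready uncollapsed_bam_generation
```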

Step 3: Install requirements using pip

The versions of cwltool and the other packages are already pinned in the requirements.txt file. Please use it to install them.

python-package-installation-using-pip
#python2
pip install -r requirements.txt
#python3
pip3 install -r requirements.txt
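
Once the install finishes, a quick way to confirm that both runners ended up on the PATH is sketched below. `check_runners` is a hypothetical helper name; it only tests that the commands can be found, not that their versions match requirements.txt.

```shell
# Hypothetical helper: check that each named command is on PATH.
# Returns non-zero if at least one command is missing.
check_runners() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

check_runners cwltool toil-cwl-runner \
    || echo "install requirements.txt inside the virtual environment first"
```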

To see help for the workflow inputs with cwltool, run: cwltool uncollapsed_bam_generation.cwl --help

To see help for the workflow inputs with toil, run: toil-cwl-runner uncollapsed_bam_generation.cwl --help

Once the requirements are installed, you can run the workflow with cwltool or toil, provided you have a valid input file in either JSON or YAML format. See Inputs Description for more details.
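
If you do not have an input file yet, cwltool can validate the workflow and emit a skeleton input object to fill in by hand. The sketch below uses cwltool's `--validate` and `--make-template` options; `make_inputs_template` and the output file name `inputs_template.yaml` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: validate a workflow and write a skeleton input file.
# $1: path to the CWL workflow, $2: path of the template file to write
make_inputs_template() {
    workflow=$1
    template=$2
    if ! command -v cwltool >/dev/null 2>&1; then
        echo "cwltool not found on PATH" >&2
        return 1
    fi
    if [ ! -f "$workflow" ]; then
        echo "workflow not found: $workflow" >&2
        return 1
    fi
    # Stop early if the document is not valid CWL.
    cwltool --validate "$workflow" || return 1
    # Write the template; every value still has to be filled in by hand.
    cwltool --make-template "$workflow" > "$template"
}

# Example (run from the repository root):
#   make_inputs_template uncollapsed_bam_generation.cwl inputs_template.yaml
```

The generated template only lists the input names and placeholder values; the meaning of each input is covered in Inputs Description.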

Here we show how to use cwltool to run the workflow on a single machine.

Step 4: Run the workflow with a given set of inputs using cwltool on a single machine

cwltool-execution
cwltool uncollapsed_bam_generation.cwl inputs.yaml

Here we show how to run the workflow with toil-cwl-runner on a single machine.

Step 4: Run the workflow with a given set of inputs using toil on a single machine

toil-local-execution
toil-cwl-runner uncollapsed_bam_generation.cwl inputs.yaml

Here we show how to run the workflow with toil-cwl-runner on JUNO, the MSKCC internal compute cluster, which uses IBM LSF as its scheduler.

Step 4: Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster)

export TMPDIR=$PWD
export TOIL_LSF_ARGS='-W 3600'
toil-cwl-runner \
       --singularity \
       --logFile /path/to/toil_log/cwltoil.log  \
       --jobStore /path/to/jobStore \
       --batchSystem lsf \
       --workDir /path/to/toil_log \
       --outdir $PWD \
       --writeLogs /path/to/toil_log \
       --logLevel DEBUG \
       --stats \
       --retryCount 2 \
       --disableCaching \
       --disableChaining \
       --maxLogFileSize 20000000000 \
       --cleanWorkDir onSuccess \
       --preserve-environment TOIL_LSF_ARGS TMPDIR \
       /path/to/uncollapsed_bam_generation.cwl \
       /path/to/inputs.yaml \
       > toil.stdout \
       2> toil.stderr &

You should now be running the workflow on the specified batch system.
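
Before launching a command like the one above, it can help to prepare the directories it refers to. The sketch below is written under two assumptions: that the log and work directories need to exist beforehand, and that toil will not start a fresh run on top of an existing job store (an existing one is normally resumed with `--restart` instead). `prepare_toil_dirs` is a hypothetical helper name.

```shell
# Hypothetical helper: prepare directories for a new toil-cwl-runner run.
# $1: directory used for --logFile, --workDir and --writeLogs
# $2: path given to --jobStore
prepare_toil_dirs() {
    log_dir=$1
    job_store=$2
    # Assumption: a new run needs a job store path that does not exist yet.
    if [ -e "$job_store" ]; then
        echo "job store already exists: $job_store" >&2
        return 1
    fi
    # Create the log/work directory (and any missing parents).
    mkdir -p "$log_dir" && echo "ready: $log_dir"
}

# Example:
#   prepare_toil_dirs /path/to/toil_log /path/to/jobStore
```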
