Introduction

Basics on the usage of biometrics

Biometrics is a Python package that computes metrics for detecting sample contamination and sample swaps and for validating sample sex. The package is composed of six tools (see below). All of the tools except sex mismatch require a VCF file of SNPs from which to compute the metrics. The sex mismatch tool instead requires a BED file containing the Y chromosome regions of interest.

Extract

This step must be run before any of the other tools. It extracts the pileup and coverage information from your BAM file(s) and stores the result in a file. That file can then be reused, not only for your initial analysis but for every subsequent analysis that involves the sample, which makes the downstream biometrics tools significantly faster to run.

Click here to read more about this tool.

Genotype

Compares each sample against every other sample to verify expected matches and to identify any unexpected matches or mismatches. The comparison relies on computing a discordance score between each pair of samples.
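To make the idea of a pairwise discordance score concrete, here is a minimal sketch. It is not the formula biometrics uses: it assumes, purely for illustration, that discordance is the fraction of shared SNP sites at which two samples have different genotype calls. The function names, the genotype encoding, and the sample data are all hypothetical.

```python
from itertools import combinations


def discordance(genotypes_a, genotypes_b):
    """Fraction of shared sites where two samples have different calls.

    Assumption for illustration only; the real tool may define the
    score differently (for example, using only homozygous sites).
    """
    shared = [site for site in genotypes_a if site in genotypes_b]
    if not shared:
        return None  # no overlapping sites, score is undefined
    mismatches = sum(genotypes_a[site] != genotypes_b[site] for site in shared)
    return mismatches / len(shared)


def pairwise_discordance(samples):
    """Score every unordered pair of samples exactly once."""
    return {
        (a, b): discordance(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical genotype calls keyed by SNP position.
samples = {
    "S1": {"chr1:100": "AA", "chr1:200": "AB", "chr2:50": "BB"},
    "S2": {"chr1:100": "AA", "chr1:200": "AB", "chr2:50": "BB"},
    "S3": {"chr1:100": "BB", "chr1:200": "AB", "chr2:50": "AA"},
}
scores = pairwise_discordance(samples)
```

In this toy data, S1 and S2 agree at every site and so score 0.0, while S3 differs from both at two of three sites. A low score suggests two samples come from the same individual; a high score suggests they do not.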

Click here to read more about this tool.

Cluster

Takes the output of the genotype comparison tool and clusters the samples into groups. Clustering works by binarizing each pairwise discordance score into 0 or 1 and then grouping the samples that are connected to one another.
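The binarize-then-connect approach described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the threshold value of 0.05, the function name `cluster_samples`, and the input format (a dictionary keyed by sample pairs) are all hypothetical.

```python
def cluster_samples(scores, threshold=0.05):
    """Group samples whose pairwise discordance falls below a threshold.

    The threshold is a hypothetical value chosen for this example.
    """
    # Binarize: a pair is linked (1) if its score is below the threshold,
    # otherwise it is not linked (0).
    neighbours = {}
    for (a, b), score in scores.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if score < threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Find connected components with a depth-first search.
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters


# Hypothetical discordance scores for four samples.
example_scores = {
    ("S1", "S2"): 0.0,
    ("S1", "S3"): 0.6,
    ("S2", "S3"): 0.7,
    ("S3", "S4"): 0.01,
}
clusters = cluster_samples(example_scores)
```

Here S1 and S2 end up in one group and S3 and S4 in another. Because grouping follows connectivity, two samples can share a cluster without matching each other directly, as long as a chain of matching samples links them.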

Click here to read more about this tool.

Minor contamination

The minor contamination check determines whether a patient’s sample is contaminated with a small amount of DNA from unrelated individuals.

Click here to read more about this tool.

Major contamination

The major contamination check determines whether a patient’s sample is contaminated with DNA from unrelated individuals.

Click here to read more about this tool.

Sex mismatch

Determines whether the predicted sex of a given sample differs from its expected sex.

Click here to read more about this tool.
