Sketching and sequence distances

Ali Osman Berk Şapcı edited this page Nov 25, 2025 · 1 revision

In addition to indexing multiple references together and performing one-to-many queries, krepp can also create a sketch from a single FASTA/Q file for practical one-to-one analysis. The file can be a complete or draft assembly, a genome skim, or even a sample. To create a sketch, run

krepp sketch -i /path/to/input.fasta -o /path/to/sketch --num-threads 16

where -i is the URL or file path of the reference sequences from which k-mers will be extracted, and -o is the path where the resulting sketch is saved as a single binary file.

A sketch cannot be used for phylogenetic placement, but you can efficiently seek query sequences in a sketch to obtain distance estimates by running

krepp seek -i /path/to/sketch -q /path/to/query.fastq --num-threads 16
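If you want to script the two steps, the following is a minimal sketch that only assembles the argument lists for the sketch and seek commands shown on this page. The helper names `sketch_command` and `seek_command` are hypothetical and not part of krepp; the flags are exactly those used above, and the paths are placeholders.

```python
import shlex
from typing import List


def sketch_command(fasta: str, sketch: str, threads: int = 16) -> List[str]:
    """Argument list for `krepp sketch` (hypothetical helper, flags as on this page)."""
    return ["krepp", "sketch", "-i", fasta, "-o", sketch,
            "--num-threads", str(threads)]


def seek_command(sketch: str, query: str, threads: int = 16) -> List[str]:
    """Argument list for `krepp seek` (hypothetical helper, flags as on this page)."""
    return ["krepp", "seek", "-i", sketch, "-q", query,
            "--num-threads", str(threads)]


# Placeholder paths; replace them with your own files.
commands = [
    sketch_command("/path/to/input.fasta", "/path/to/sketch"),
    seek_command("/path/to/sketch", "/path/to/query.fastq"),
]

for cmd in commands:
    # Print the shell-quoted command line without executing anything.
    print(shlex.join(cmd))
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)`, assuming krepp is installed and on your PATH.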

The output is in a tab-separated format with two columns: the sequence ID and the distance estimate.
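As an illustration of working with this two-column output, here is a minimal parsing sketch. It assumes the output has been captured as text, that there is no header row, and that the distance column parses as a floating-point number; lines that do not fit are skipped. The function name `read_distances` and the sample records are made up for the example and are not krepp output.

```python
import io
from typing import List, TextIO, Tuple


def read_distances(handle: TextIO) -> List[Tuple[str, float]]:
    """Read (sequence ID, distance) pairs from two-column tab-separated text."""
    records: List[Tuple[str, float]] = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        seq_id, value = fields
        try:
            distance = float(value)
        except ValueError:
            continue  # assumption: non-numeric values are skipped, not errors
        records.append((seq_id, distance))
    return records


# Made-up sample in the documented two-column layout (not real krepp output).
sample = "read_1\t0.012\nread_2\t0.087\n\nread_3\t0.034\n"
records = read_distances(io.StringIO(sample))

# Order the query sequences from smallest to largest distance estimate.
by_distance = sorted(records, key=lambda rec: rec[1])
for seq_id, distance in by_distance:
    print(f"{seq_id}\t{distance}")
```

For a saved output file, the same function can be called on an open file handle instead of the `io.StringIO` wrapper.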
