Extracting Sequences

Nanalogue can extract and display read sequences from BAM files with highlighting for insertions, deletions, and modifications. This is useful for inspecting alignment quality and understanding modification patterns at the sequence level.

Quick Reference

Flag                    Effect
--region <REGION>       Only include reads passing through the given region
--full-region           Only include reads that pass through the given region in full
--seq-region <REGION>   Display sequences from a specific genomic region
--seq-full              Display the entire basecalled sequence
--show-ins-lowercase    Show insertions as lowercase letters
--show-mod-z            Show modified bases as Z (or z for modified insertions)
--show-base-qual        Show basecalling quality scores

Regions are written in the common genomics notation contig:start-end, e.g. chr1:50-100. Coordinates are 0-based and half-open: the start is included and the end is excluded, so chr1:50-100 covers 50 bases, from the 51st base of chr1 through the 100th.
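
The coordinate convention can be checked with a few lines of Python. This is a minimal sketch, not Nanalogue's own parser: the names Region, parse_region and to_one_based_inclusive are hypothetical, and only the contig:start-end form described above is handled.

```python
from typing import NamedTuple


class Region(NamedTuple):
    contig: str
    start: int  # 0-based, included
    end: int    # 0-based, excluded


def parse_region(text: str) -> Region:
    """Parse 'contig:start-end' into a 0-based, half-open Region."""
    # Split on the last colon so contig names containing ':' still work.
    contig, _, span = text.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)
    if not contig or start < 0 or end <= start:
        raise ValueError(f"not a valid region: {text!r}")
    return Region(contig, start, end)


def to_one_based_inclusive(region: Region) -> tuple[int, int]:
    """Return the first and last base covered, counting from 1."""
    return region.start + 1, region.end


region = parse_region("chr1:50-100")
print(region.end - region.start)        # number of bases covered: 50
print(to_one_based_inclusive(region))   # (51, 100)
```

The length of a half-open region is simply end minus start, which is why this convention is convenient for arithmetic.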

Display conventions:

  • Insertions: lowercase letters (with --show-ins-lowercase)
  • Deletions: shown as periods (.)
  • Modifications: shown as Z or z (with --show-mod-z)
  • Quality at deleted positions: 255
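
These conventions make a displayed sequence easy to summarise programmatically. The sketch below is an illustration only: summarise_display is a hypothetical helper, and it assumes the sequence was produced with both --show-ins-lowercase and --show-mod-z (without the first flag, insertions are uppercase and cannot be told apart from aligned bases). The input string is one of the sequences from the combined example later in this guide.

```python
def summarise_display(sequence: str) -> dict[str, int]:
    """Count the display conventions present in one sequence string."""
    return {
        # Lowercase letters mark inserted bases (needs --show-ins-lowercase).
        "inserted": sum(ch.islower() for ch in sequence),
        # Each period is one reference position missing from the read.
        "deleted": sequence.count("."),
        # Z and z both mark modified bases (needs --show-mod-z).
        "modified": sequence.count("Z") + sequence.count("z"),
        # Everything except insertions occupies one reference position.
        "reference_positions": sum(not ch.islower() for ch in sequence),
    }


summary = summarise_display("AGAAGACGCZ..........aaaaGGTGTAZTZZGTZAGAAGGC")
print(summary)
# {'inserted': 4, 'deleted': 10, 'modified': 5, 'reference_positions': 40}
```

The 40 reference positions match the length of the region contig_00001:80-120 used in the examples.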

Prerequisites

You will need:

  • A BAM file with modification tags (MM and ML tags)
  • Nanalogue installed
  • For indel examples: a BAM file with insertions and/or deletions

Note: The contig names contig_00001, etc. are example names used throughout this guide. In real BAM files aligned to a reference genome, you will see names like chr1, chr2, NC_000001.11, or similar depending on your reference.

Basic Sequence Extraction

To extract sequences from a specific region, use --seq-region:

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.8f19ba27-a631-40e9-8b41-54e28ef35dea	360	360	secondary_reverse	m:58	TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.4fb99956-bc82-47b6-8301-4296cccd9568	357	357	secondary_forward	m:47	TACGATAAAACTTTGCATATAC
0.95246772-dc18-4ff0-a7e9-c10528634799	441	441	primary_reverse	m:70	TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
...

The sequences above vary in length because not every read passes through the whole region. To keep only reads that span the region in full, and so get sequences of uniform extent, add --full-region as shown below.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --full-region input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.af9c194f-b7da-40df-8d26-e955c5e1e344	189	189	supplementary_forward	m:30	TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.8f19ba27-a631-40e9-8b41-54e28ef35dea	360	360	secondary_reverse	m:58	TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.3d850c88-f86e-4522-a8aa-52fe097af716	450	450	primary_reverse	m:74	TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
...
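
If you have already saved output produced without --full-region, partially covering reads can also be spotted afterwards. The following sketch assumes the output is tab-separated with the header shown above; read_table and covers_region_in_full are hypothetical helper names, and the rows are copied from the first example in this section.

```python
import csv

SAMPLE_OUTPUT = (
    "# mod-unmod threshold is 0.5\n"
    "read_id\talign_length\tsequence_length_template\talignment_type\tmod_count\tsequence\n"
    "0.8f19ba27-a631-40e9-8b41-54e28ef35dea\t360\t360\tsecondary_reverse\tm:58\t"
    "TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC\n"
    "0.4fb99956-bc82-47b6-8301-4296cccd9568\t357\t357\tsecondary_forward\tm:47\t"
    "TACGATAAAACTTTGCATATAC\n"
    "0.95246772-dc18-4ff0-a7e9-c10528634799\t441\t441\tprimary_reverse\tm:70\t"
    "TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC\n"
)


def read_table(text: str) -> list[dict[str, str]]:
    """Parse tab-separated output, skipping '#' comment lines and blanks."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def covers_region_in_full(sequence: str, region_length: int) -> bool:
    """True if the displayed sequence accounts for every reference position."""
    # Lowercase letters are insertions and occupy no reference position.
    reference_positions = sum(not ch.islower() for ch in sequence)
    return reference_positions == region_length


REGION_LENGTH = 120 - 80  # contig_00001:80-120 is half-open, so 40 bases
for row in read_table(SAMPLE_OUTPUT):
    print(row["read_id"], covers_region_in_full(row["sequence"], REGION_LENGTH))
```

The second read, which shows only 22 bases, is the one reported as not covering the region. The check is only reliable for output produced with --show-ins-lowercase or for reads without insertions, because uppercase insertions would be counted as reference positions.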

Inspecting Alignment Quality

Viewing Insertions

Insertions relative to the reference can be highlighted as lowercase letters with --show-ins-lowercase. The example below uses a BAM file that contains indels.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-ins-lowercase input_indels.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.eb9d7b4d-e9f6-4e08-9eb2-303c9277db0a	200	194	primary_forward	m:25	AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e	200	194	primary_reverse	m:33	AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.e7621073-aeb3-4c26-9034-b284542dca1c	200	194	primary_reverse	m:33	AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
...

In the output, lowercase letters indicate bases that are insertions (present in the read but not in the reference).
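
Because insertions are the only lowercase characters, they can be located with a regular expression. This is a minimal sketch with a hypothetical helper, find_insertions; it reports each insertion together with the number of reference positions displayed before it.

```python
import re


def find_insertions(sequence: str) -> list[tuple[int, str]]:
    """Return (reference_offset, inserted_bases) for each lowercase run.

    reference_offset counts the reference positions displayed before the
    insertion, i.e. every preceding character that is not lowercase.
    """
    insertions = []
    for match in re.finditer(r"[a-z]+", sequence):
        before = sequence[: match.start()]
        reference_offset = sum(not ch.islower() for ch in before)
        insertions.append((reference_offset, match.group()))
    return insertions


print(find_insertions("AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC"))
# [(20, 'aaaa')]
```

Adding the offset to the start of --seq-region gives a reference coordinate only when the read covers the start of the region, which is an assumption you would need to verify (for example by using --full-region).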

Viewing Deletions

Deletions are automatically shown as periods (.) when displaying sequences from a region; no extra flag is needed. The example below uses a BAM file that contains indels.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 input_indels.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.20ce3c77-28a5-450d-b93d-8e72cd9fb679	200	194	supplementary_forward	m:25	AGAAGACGCC..........AAAAGGTGTACTCCGTCAGAAGGC
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e	200	194	primary_reverse	m:33	AGAAGACGCC..........AAAAGGTGTACTCCGTCAGAAGGC
0.26bcab2f-ba87-40eb-b79c-e8b7dd68bc34	200	194	primary_reverse	m:33	AGAAGACGCC..........AAAAGGTGTACTCCGTCAGAAGGC
...

Each period represents a position where the reference has a base but the read does not.

Viewing Modification Patterns

To mark modified bases in the sequence, use --show-mod-z:

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-mod-z input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.95246772-dc18-4ff0-a7e9-c10528634799	441	441	primary_reverse	m:70	TATZCTATCCACTTACACTACZATAAAACTTTZCATATAC
0.3d850c88-f86e-4522-a8aa-52fe097af716	450	450	primary_reverse	m:74	TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.af9c194f-b7da-40df-8d26-e955c5e1e344	189	189	supplementary_forward	m:30	TATGZTATZCACTTACAZTAZGATAAAAZTTTGZATATAZ
...

Modified bases are displayed as:

  • Z for modified bases that align to the reference
  • z for modified bases within an insertion (when combined with --show-ins-lowercase)
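
Since Z replaces the base letter, the marked output alone does not show which base was modified. One way to recover it is to compare the marked sequence with the unmarked sequence of the same read over the same region. The sketch below does this for read 0.95246772-dc18-4ff0-a7e9-c10528634799 using the strings from the examples in this guide; modified_bases is a hypothetical helper name.

```python
def modified_bases(plain: str, marked: str) -> list[tuple[int, str]]:
    """Pair each Z/z in `marked` with the base shown in `plain`.

    Both strings must be the same read displayed over the same region,
    once without and once with --show-mod-z.
    """
    if len(plain) != len(marked):
        raise ValueError("sequences must have the same length")
    return [
        (offset, base)
        for offset, (base, mark) in enumerate(zip(plain, marked))
        if mark in "Zz"
    ]


plain = "TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC"
marked = "TATZCTATCCACTTACACTACZATAAAACTTTZCATATAC"
print(modified_bases(plain, marked))
# [(3, 'G'), (21, 'G'), (32, 'G')]
```

Offsets are 0-based positions within the displayed string. In this read, every marked position shows G in the unmarked output.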

As with basic extraction, sequences can vary in length when some reads do not pass through the region in full. To keep only reads that span the whole region, add --full-region as shown below.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-mod-z --full-region input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.af9c194f-b7da-40df-8d26-e955c5e1e344	189	189	supplementary_forward	m:30	TATGZTATZCACTTACAZTAZGATAAAAZTTTGZATATAZ
0.95246772-dc18-4ff0-a7e9-c10528634799	441	441	primary_reverse	m:70	TATZCTATCCACTTACACTACZATAAAACTTTZCATATAC
0.8f19ba27-a631-40e9-8b41-54e28ef35dea	360	360	secondary_reverse	m:58	TATZCTATCCACTTACACTACZATAAAACTTTZCATATAC
...

Combining Display Options

You can combine multiple flags to see insertions, deletions, modifications, and quality scores all at once. The example below uses a BAM file that contains indels.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 \
    --show-ins-lowercase --show-mod-z --show-base-qual \
    input_indels.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence	qualities
0.20ce3c77-28a5-450d-b93d-8e72cd9fb679	200	194	supplementary_forward	m:25	AGAAGACGCZ..........aaaaGGTGTAZTZZGTZAGAAGGC	27.25.21.33.31.29.21.37.25.38.255.255.255.255.255.255.255.255.255.255.34.24.34.35.40.30.27.32.26.28.31.40.24.26.38.38.35.29.35.31.31.37.35.28
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e	200	194	primary_reverse	m:33	AZAAZACGCC..........aaaaGGTZTACTCCZTCAZAAZZC	23.21.37.25.31.31.31.21.25.28.255.255.255.255.255.255.255.255.255.255.23.26.31.28.38.24.33.20.29.36.40.35.29.25.34.25.25.29.37.20.26.24.20.38
0.eb9d7b4d-e9f6-4e08-9eb2-303c9277db0a	200	194	primary_forward	m:25	AGAAGACGCZ..........aaaaGGTGTAZTZZGTZAGAAGGC	23.35.34.20.31.27.31.25.33.35.255.255.255.255.255.255.255.255.255.255.22.25.33.31.20.35.22.28.32.30.35.27.22.23.29.20.27.20.26.39.33.39.40.40
...

This produces output with:

  • Lowercase letters for insertions
  • Periods for deletions
  • Z/z for modifications
  • Quality scores as period-separated integers (with 255 for deleted positions)
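
The qualities column lines up one-to-one with the characters of the sequence column, so the two can be zipped together. The following is a sketch under that assumption, using the first row of the example output above; pair_with_qualities and mean_base_quality are hypothetical helper names.

```python
DELETED_QUALITY = 255  # value shown at deleted positions


def pair_with_qualities(sequence: str, qualities: str) -> list[tuple[str, int]]:
    """Pair each displayed character with its quality score."""
    scores = [int(q) for q in qualities.split(".")]
    if len(scores) != len(sequence):
        raise ValueError("expected one quality score per displayed character")
    return list(zip(sequence, scores))


def mean_base_quality(pairs: list[tuple[str, int]]) -> float:
    """Arithmetic mean of the scores, skipping deleted positions."""
    kept = [score for char, score in pairs if char != "."]
    return sum(kept) / len(kept)


sequence = "AGAAGACGCZ..........aaaaGGTGTAZTZZGTZAGAAGGC"
qualities = (
    "27.25.21.33.31.29.21.37.25.38."
    "255.255.255.255.255.255.255.255.255.255."
    "34.24.34.35.40.30.27.32.26.28.31.40.24.26.38.38.35.29.35.31.31.37.35.28"
)
pairs = pair_with_qualities(sequence, qualities)

# Every period in the sequence should carry the placeholder score.
assert all(score == DELETED_QUALITY for char, score in pairs if char == ".")
print(len(pairs), round(mean_base_quality(pairs), 2))
```

Skipping the deleted positions matters: leaving the 255 placeholders in would inflate any average. The mean here is a plain average of the scores, not an average of the underlying error probabilities.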

When to Use read-table-hide-mods

The read-table-hide-mods command is a simpler alternative when you don't need modification information. It supports the same sequence display options (--seq-region, --seq-full, --show-ins-lowercase, --show-base-qual) but does not include --show-mod-z or modification-related filters.

Use read-table-hide-mods when:

  • Your BAM file doesn't have modification data
  • You only care about alignment quality (insertions/deletions)
  • You want slightly faster processing by skipping modification parsing

The example below uses a BAM file that contains indels.

nanalogue read-table-hide-mods --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-ins-lowercase input_indels.bam

Example output:

read_id	align_length	sequence_length_template	alignment_type	sequence
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e	200	194	primary_reverse	AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.e7621073-aeb3-4c26-9034-b284542dca1c	200	194	primary_reverse	AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.26bcab2f-ba87-40eb-b79c-e8b7dd68bc34	200	194	primary_reverse	AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.eb9d7b4d-e9f6-4e08-9eb2-303c9277db0a	200	194	primary_forward	AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
...

Creating Test Data

To create your own BAM files with insertions, deletions, and modifications for testing, see Test data with indels.

Next Steps

See Also