Extracting Sequences
Nanalogue can extract and display read sequences from BAM files with highlighting for insertions, deletions, and modifications. This is useful for inspecting alignment quality and understanding modification patterns at the sequence level.
Quick Reference
| Flag | Effect |
|---|---|
| --region <REGION> | Only include reads passing through the given region |
| --full-region | Only include reads that pass through the given region in full |
| --seq-region <REGION> | Display sequences from a specific genomic region |
| --seq-full | Display the entire basecalled sequence |
| --show-ins-lowercase | Show insertions as lowercase letters |
| --show-mod-z | Show modified bases as Z (or z for modified insertions) |
| --show-base-qual | Show basecalling quality scores |
Regions are written in the common genomics notation contig:start-end, e.g. chr1:50-100.
Coordinates are 0-based and half-open: the start is included and the end is excluded.
The example above therefore covers 0-based positions 50 through 99, i.e. the 51st through the 100th base of chr1.
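The coordinate convention can be checked with a few lines of Python. This is a minimal sketch, not part of Nanalogue: the helper name parse_region is hypothetical, and it assumes the plain contig:start-end form described above.

```python
def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'contig:start-end' into (contig, start, end), 0-based and half-open.

    Hypothetical helper for illustration; it is not a Nanalogue function.
    """
    # Split on the last colon so that contig names containing ':' still work.
    contig, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)
    if not contig or start < 0 or end <= start:
        raise ValueError(f"not a valid region: {region!r}")
    return contig, start, end


contig, start, end = parse_region("chr1:50-100")
# Half-open interval: the number of bases is end - start, and the
# 1-based positions run from start + 1 to end inclusive.
print(f"{contig}: {end - start} bases, 1-based positions {start + 1} to {end}")
# prints "chr1: 50 bases, 1-based positions 51 to 100"
```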
Display conventions:
- Insertions: lowercase letters (with --show-ins-lowercase)
- Deletions: shown as periods (.)
- Modifications: shown as Z or z (with --show-mod-z)
- Quality at deleted positions: 255
Prerequisites
You will need:
- A BAM file with modification tags (MM and ML tags)
- Nanalogue installed
- For indel examples: a BAM file with insertions and/or deletions
Note: The contig names contig_00001, etc. are example names used throughout this guide. In real BAM files aligned to a reference genome, you will see names like chr1, chr2, NC_000001.11, or similar depending on your reference.
Basic Sequence Extraction
To extract sequences from a specific region, use --seq-region:
nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
--seq-region contig_00001:80-120 input.bam
Example output:
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence
0.8f19ba27-a631-40e9-8b41-54e28ef35dea 360 360 secondary_reverse m:58 TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.4fb99956-bc82-47b6-8301-4296cccd9568 357 357 secondary_forward m:47 TACGATAAAACTTTGCATATAC
0.95246772-dc18-4ff0-a7e9-c10528634799 441 441 primary_reverse m:70 TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
...
In the above example, the sequences vary in length.
This is because some reads overlap the region only partly, so only the overlapping part of their sequence is shown.
To keep only reads that span the entire region, and so get sequences of uniform length, add --full-region as shown below.
nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
--seq-region contig_00001:80-120 --full-region input.bam
Example output:
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence
0.af9c194f-b7da-40df-8d26-e955c5e1e344 189 189 supplementary_forward m:30 TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.8f19ba27-a631-40e9-8b41-54e28ef35dea 360 360 secondary_reverse m:58 TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.3d850c88-f86e-4522-a8aa-52fe097af716 450 450 primary_reverse m:74 TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
...
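If you want to check coverage of the window after the fact, the table can be post-processed. The sketch below is an illustration rather than part of Nanalogue: parse_table is a hypothetical helper, and it assumes that fields are separated by whitespace and contain no embedded whitespace, as in the example output above.

```python
def parse_table(text: str) -> list[dict[str, str]]:
    """Turn the table into one dict per read (hypothetical helper)."""
    rows: list[dict[str, str]] = []
    header: list[str] | None = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as the threshold note.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if header is None:
            header = fields  # the first non-comment line names the columns
        else:
            rows.append(dict(zip(header, fields)))
    return rows


# Two rows copied from the first example output in this section.
example = """\
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence
0.8f19ba27-a631-40e9-8b41-54e28ef35dea 360 360 secondary_reverse m:58 TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.4fb99956-bc82-47b6-8301-4296cccd9568 357 357 secondary_forward m:47 TACGATAAAACTTTGCATATAC
"""

rows = parse_table(example)
for row in rows:
    print(row["read_id"], len(row["sequence"]))
```

For reads without indels in the window, a sequence shorter than the region width (40 here) indicates a read that does not pass through the region in full.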
Inspecting Alignment Quality
Viewing Insertions
Insertions relative to the reference can be highlighted as lowercase letters using --show-ins-lowercase.
The example below uses a BAM file that contains indels.
nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
--seq-region contig_00001:80-120 --show-ins-lowercase input_indels.bam
Example output:
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence
0.eb9d7b4d-e9f6-4e08-9eb2-303c9277db0a 200 194 primary_forward m:25 AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e 200 194 primary_reverse m:33 AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.e7621073-aeb3-4c26-9034-b284542dca1c 200 194 primary_reverse m:33 AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
...
In the output, lowercase letters indicate bases that are insertions (present in the read but not in the reference).
Viewing Deletions
Deletions are automatically shown as periods (.) when displaying sequences from a region.
The example below uses a BAM file that contains indels.
nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
--seq-region contig_00001:80-120 input_indels.bam
Example output:
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence
0.20ce3c77-28a5-450d-b93d-8e72cd9fb679 200 194 supplementary_forward m:25 AGAAGACGCC..........AAAAGGTGTACTCCGTCAGAAGGC
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e 200 194 primary_reverse m:33 AGAAGACGCC..........AAAAGGTGTACTCCGTCAGAAGGC
0.26bcab2f-ba87-40eb-b79c-e8b7dd68bc34 200 194 primary_reverse m:33 AGAAGACGCC..........AAAAGGTGTACTCCGTCAGAAGGC
...
Each period represents a position where the reference has a base but the read does not.
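The two conventions make it straightforward to tally indels from a displayed sequence. The following is a minimal sketch under the assumption that the sequence was produced with --show-ins-lowercase, so that every lowercase character is an inserted base; summarize_indels is a hypothetical helper, not a Nanalogue function.

```python
def summarize_indels(sequence: str) -> dict[str, int]:
    """Count inserted and deleted bases in a displayed sequence (hypothetical helper)."""
    # Lowercase characters are insertions: present in the read, absent from the reference.
    inserted = sum(1 for ch in sequence if ch.islower())
    # Periods are deletions: present in the reference, absent from the read.
    deleted = sequence.count(".")
    # Every character that is not an insertion occupies one reference position.
    return {
        "inserted": inserted,
        "deleted": deleted,
        "reference_span": len(sequence) - inserted,
    }


# Sequence copied from the insertion example output above.
print(summarize_indels("AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC"))
# prints {'inserted': 4, 'deleted': 10, 'reference_span': 40}
```

The reference span of 40 matches the width of the requested region contig_00001:80-120.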
Viewing Modification Patterns
To mark modified bases in the sequence, use --show-mod-z:
nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
--seq-region contig_00001:80-120 --show-mod-z input.bam
Example output:
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence
0.95246772-dc18-4ff0-a7e9-c10528634799 441 441 primary_reverse m:70 TATZCTATCCACTTACACTACZATAAAACTTTZCATATAC
0.3d850c88-f86e-4522-a8aa-52fe097af716 450 450 primary_reverse m:74 TATGCTATCCACTTACACTACGATAAAACTTTGCATATAC
0.af9c194f-b7da-40df-8d26-e955c5e1e344 189 189 supplementary_forward m:30 TATGZTATZCACTTACAZTAZGATAAAAZTTTGZATATAZ
...
Modified bases are displayed as:
- Z for modified bases on the reference
- z for modified bases within an insertion (when combined with --show-ins-lowercase)
As with basic extraction, reads that overlap the region only partly produce shorter sequences.
To keep only reads that span the entire region, add --full-region as shown below.
nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
--seq-region contig_00001:80-120 --show-mod-z --full-region input.bam
Example output:
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence
0.af9c194f-b7da-40df-8d26-e955c5e1e344 189 189 supplementary_forward m:30 TATGZTATZCACTTACAZTAZGATAAAAZTTTGZATATAZ
0.95246772-dc18-4ff0-a7e9-c10528634799 441 441 primary_reverse m:70 TATZCTATCCACTTACACTACZATAAAACTTTZCATATAC
0.8f19ba27-a631-40e9-8b41-54e28ef35dea 360 360 secondary_reverse m:58 TATZCTATCCACTTACACTACZATAAAACTTTZCATATAC
...
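To turn a marked sequence into coordinates, you can walk along it and count reference positions. This is a sketch under two assumptions that are not stated by the tool output itself: the displayed sequence starts at the start of the requested region (reasonable for reads selected with --full-region), and lowercase characters are insertions that do not occupy a reference position. The helper name modified_reference_positions is hypothetical.

```python
def modified_reference_positions(sequence: str, region_start: int) -> list[int]:
    """Return 0-based reference positions of bases shown as Z (hypothetical helper)."""
    positions = []
    ref_pos = region_start
    for ch in sequence:
        if ch.islower():
            # Insertion (including a modified insertion shown as z):
            # no reference position is consumed, so nothing is recorded.
            continue
        if ch == "Z":
            positions.append(ref_pos)
        # Uppercase bases, Z, and deletions (.) each occupy one reference position.
        ref_pos += 1
    return positions


# Sequence copied from the --full-region example above, region contig_00001:80-120.
seq = "TATGZTATZCACTTACAZTAZGATAAAAZTTTGZATATAZ"
print(modified_reference_positions(seq, 80))
# prints [84, 88, 97, 100, 108, 113, 119]
```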
Combining Display Options
You can combine multiple flags to see insertions, deletions, modifications, and quality scores all at once. The example below uses a BAM file that contains indels.
nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
--seq-region contig_00001:80-120 \
--show-ins-lowercase --show-mod-z --show-base-qual \
input_indels.bam
Example output:
# mod-unmod threshold is 0.5
read_id align_length sequence_length_template alignment_type mod_count sequence qualities
0.20ce3c77-28a5-450d-b93d-8e72cd9fb679 200 194 supplementary_forward m:25 AGAAGACGCZ..........aaaaGGTGTAZTZZGTZAGAAGGC 27.25.21.33.31.29.21.37.25.38.255.255.255.255.255.255.255.255.255.255.34.24.34.35.40.30.27.32.26.28.31.40.24.26.38.38.35.29.35.31.31.37.35.28
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e 200 194 primary_reverse m:33 AZAAZACGCC..........aaaaGGTZTACTCCZTCAZAAZZC 23.21.37.25.31.31.31.21.25.28.255.255.255.255.255.255.255.255.255.255.23.26.31.28.38.24.33.20.29.36.40.35.29.25.34.25.25.29.37.20.26.24.20.38
0.eb9d7b4d-e9f6-4e08-9eb2-303c9277db0a 200 194 primary_forward m:25 AGAAGACGCZ..........aaaaGGTGTAZTZZGTZAGAAGGC 23.35.34.20.31.27.31.25.33.35.255.255.255.255.255.255.255.255.255.255.22.25.33.31.20.35.22.28.32.30.35.27.22.23.29.20.27.20.26.39.33.39.40.40
...
This produces output with:
- Lowercase letters for insertions
- Periods for deletions
- Z/z for modifications
- Quality scores as period-separated integers (with 255 for deleted positions)
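When summarizing the qualities column, the 255 placeholders should be left out, since they mark deleted positions rather than real scores. The sketch below shows one way to do this; mean_base_quality is a hypothetical helper, and the short sequence and scores are made up for illustration. It relies on there being one score per displayed character, as in the example output above.

```python
def mean_base_quality(sequence: str, qualities: str, deleted_marker: int = 255) -> float:
    """Average the quality scores, ignoring deleted positions (hypothetical helper)."""
    scores = [int(q) for q in qualities.split(".")]
    if len(scores) != len(sequence):
        raise ValueError("expected one quality score per displayed character")
    # Drop the placeholder used at deleted positions before averaging.
    kept = [q for q in scores if q != deleted_marker]
    if not kept:
        raise ValueError("no real quality scores to average")
    return sum(kept) / len(kept)


# Made-up example: two bases, two deleted positions, one inserted base, one base.
print(mean_base_quality("AC..gT", "30.20.255.255.10.40"))
# prints 25.0
```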
When to Use read-table-hide-mods
The read-table-hide-mods command is a simpler alternative when you don't need modification information.
It supports the same sequence display options (--seq-region, --seq-full, --show-ins-lowercase, --show-base-qual)
but does not include --show-mod-z or modification-related filters.
Use read-table-hide-mods when:
- Your BAM file doesn't have modification data
- You only care about alignment quality (insertions/deletions)
- You want slightly faster processing by skipping modification parsing
The example below uses a BAM file that contains indels.
nanalogue read-table-hide-mods --region contig_00001:80-120 \
--seq-region contig_00001:80-120 --show-ins-lowercase input_indels.bam
Example output:
read_id align_length sequence_length_template alignment_type sequence
0.bcc18a9c-f1db-4ada-b05b-f7bbf55df25e 200 194 primary_reverse AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.e7621073-aeb3-4c26-9034-b284542dca1c 200 194 primary_reverse AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.26bcab2f-ba87-40eb-b79c-e8b7dd68bc34 200 194 primary_reverse AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
0.eb9d7b4d-e9f6-4e08-9eb2-303c9277db0a 200 194 primary_forward AGAAGACGCC..........aaaaGGTGTACTCCGTCAGAAGGC
...
Creating Test Data
To create your own BAM files with insertions, deletions, and modifications for testing, see Test data with indels.
Next Steps
- Quality control of mod data — Assess modification call quality
- Extract raw mod calls — Get detailed modification data
- Finding highly modified reads — Filter reads by modification level
See Also
- Quick look at your data — Initial data inspection
- CLI Reference — Full documentation of all nanalogue commands
- Recipes — Quick copy-paste snippets