Extracting Sequences

Nanalogue can extract and display read sequences from BAM files with highlighting for insertions, deletions, and modifications. This is useful for inspecting alignment quality and understanding modification patterns at the sequence level.

Quick Reference

Flag                    Effect
--region <REGION>       Only include reads passing through the given region
--full-region           Only include reads that pass through the given region in full
--seq-region <REGION>   Display sequences from a specific genomic region
--seq-full              Display the entire basecalled sequence
--show-ins-lowercase    Show insertions as lowercase letters
--show-mod-z            Show modified bases as Z (or z for modified insertions)
--show-base-qual        Show basecalling quality scores

Regions are written in the common genomics notation contig:start-end, e.g. chr1:50-100. Coordinates are 0-based and half-open, so chr1:50-100 covers 50 bases: from the 51st base of chr1 through the 100th base.
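
If you post-process regions in your own scripts, the coordinate convention can be sketched in a few lines of Python. This is an illustration only, not part of Nanalogue: the names Region and parse_region are hypothetical, and the sketch handles only the contig:start-end form described above.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A region in 0-based, half-open coordinates (hypothetical helper type)."""
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def length(self) -> int:
        # Number of bases covered by a half-open interval
        return self.end - self.start

    def one_based_inclusive(self) -> tuple:
        # The same interval in 1-based, fully closed coordinates
        return (self.start + 1, self.end)


def parse_region(text: str) -> Region:
    """Parse 'contig:start-end'; the last colon separates contig from coordinates."""
    match = re.fullmatch(r"(.+):(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"not in contig:start-end form: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start >= end:
        raise ValueError(f"empty or reversed region: {text!r}")
    return Region(match.group(1), start, end)
```

For chr1:50-100 this gives a length of 50 and the 1-based pair (51, 100), matching the description above.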

Display conventions:

  • Insertions: lowercase letters (with --show-ins-lowercase)
  • Deletions: shown as periods (.)
  • Modifications: shown as Z or z (with --show-mod-z)
  • Quality at deleted positions: 255
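
The conventions above make a displayed sequence easy to summarise in a script. The following Python sketch counts each kind of character; summarize_display is a hypothetical helper name, and it assumes the sequence was produced with both --show-ins-lowercase and --show-mod-z so that every convention applies.

```python
def summarize_display(sequence: str) -> dict:
    """Count aligned, inserted, deleted, and modified positions in a displayed sequence."""
    counts = {"aligned": 0, "inserted": 0, "deleted": 0, "modified": 0}
    for ch in sequence:
        if ch == ".":
            counts["deleted"] += 1      # period: reference base missing from the read
        elif ch.islower():
            counts["inserted"] += 1     # lowercase: base inside an insertion
        else:
            counts["aligned"] += 1      # uppercase: base aligned to the reference
        if ch in "Zz":
            counts["modified"] += 1     # Z or z: modified base (z is also an insertion)
    return counts
```

Note that a z is counted both as inserted and as modified, since it is a modified base within an insertion.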

Prerequisites

You will need:

  • A BAM file with modification tags (MM and ML tags)
  • Nanalogue installed
  • For indel examples: a BAM file with insertions and/or deletions

Note: The contig names contig_00001, etc. are example names used throughout this guide. In real BAM files aligned to a reference genome, you will see names like chr1, chr2, NC_000001.11, or similar depending on your reference.

Basic Sequence Extraction

To extract sequences from a specific region, use --seq-region:

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.f80f6351-96b0-4ca2-9414-75b71961b833	246	246	secondary_reverse	m:45	TGGGGAGCCCACTGGCGGAGGATTCA
0.1915bd81-2515-457b-9208-0bf516bd51be	236	236	secondary_forward	m:35	TCCTGGGGAGCCCACTGGCGGAGGATTCA
0.63611c92-f1a7-44a8-99d1-fdadb1cb228c	404	404	supplementary_forward	m:56	AGCCCACTGGCGGAGGATTCA
...
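
If you save this table to a file, it can be loaded with Python's standard csv module. The sketch below assumes the layout shown in the example output: tab-separated columns, a header row, and comment lines starting with #. The helper names read_table and split_mod_count are hypothetical, and split_mod_count handles only the single tag:count form seen in the examples.

```python
import csv


def read_table(text: str) -> list:
    """Parse tab-separated table text into a list of dicts keyed by column name."""
    # Drop blank lines and '#' comment lines such as the threshold note
    rows = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return list(csv.DictReader(rows, delimiter="\t"))


def split_mod_count(field: str) -> tuple:
    """Split a mod_count entry like 'm:45' into its tag and integer count."""
    tag, _, count = field.partition(":")
    return tag, int(count)
```

All values come back as strings, so convert columns such as align_length with int() before doing arithmetic on them.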

The sequences above vary in length because not every read passes through the given region in full. To keep only reads that span the entire region, and thus get sequences of more uniform length, add --full-region as shown below.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --full-region input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.819110e0-91fd-4c29-9732-f66b6d201d23	384	384	supplementary_forward	m:54	GTGCCATCTCCTCCTGGGGAGCCCACTGGCGGAGGATTCA

Inspecting Alignment Quality

Viewing Insertions

Insertions relative to the reference can be highlighted as lowercase letters using --show-ins-lowercase. The example below uses a BAM file that contains indels.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-ins-lowercase input_indels.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.4e6fe0ad-fac4-476d-a1ac-f595e7aec8d1	200	194	primary_reverse	m:30	AGAGCCGTCG..........aaaaTCCTGAGCATCATCCTAGTT
0.03cdcf7a-97c9-40f9-872f-4e4e710e9065	200	194	secondary_forward	m:33	AGAGCCGTCG..........aaaaTCCTGAGCATCATCCTAGTT
0.85487837-e722-4cbc-bff4-9d48a65a1871	200	194	secondary_reverse	m:30	AGAGCCGTCG..........aaaaTCCTGAGCATCATCCTAGTT
...

In the output, lowercase letters indicate bases that are insertions (present in the read but not in the reference).
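
To turn a displayed sequence into a list of indel events, you can walk it character by character while tracking the reference position. The Python sketch below is an illustration, not Nanalogue functionality: indel_events is a hypothetical name, and it assumes that the first displayed character corresponds to the start of --seq-region, which only holds for reads that span the region in full. Periods are treated as deletions, following the display conventions.

```python
from itertools import groupby


def _kind(ch: str) -> str:
    # Classify one displayed character by the display conventions
    if ch == ".":
        return "deletion"
    if ch.islower():
        return "insertion"
    return "aligned"


def indel_events(display: str, region_start: int) -> list:
    """Return (kind, reference_position, length) for each indel run.

    Positions are 0-based. An insertion is reported at the reference
    position of the base that follows it.
    """
    events = []
    ref_pos = region_start
    for kind, run in groupby(display, key=_kind):
        length = len(list(run))
        if kind == "insertion":
            # Inserted bases do not consume reference positions
            events.append((kind, ref_pos, length))
            continue
        if kind == "deletion":
            events.append((kind, ref_pos, length))
        ref_pos += length
    return events
```

Applied to the sequences in the example output with a region start of 80, this reports a 10-base deletion starting at position 90 and a 4-base insertion before position 100.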

Viewing Deletions

Deletions are automatically shown as periods (.) when displaying sequences from a region. The example below uses a BAM file that contains indels.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 input_indels.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.85487837-e722-4cbc-bff4-9d48a65a1871	200	194	secondary_reverse	m:30	AGAGCCGTCG..........AAAATCCTGAGCATCATCCTAGTT
0.f8b42ff2-25df-44c7-9869-a15fbfb59049	200	194	primary_forward	m:33	AGAGCCGTCG..........AAAATCCTGAGCATCATCCTAGTT
0.4e6fe0ad-fac4-476d-a1ac-f595e7aec8d1	200	194	primary_reverse	m:30	AGAGCCGTCG..........AAAATCCTGAGCATCATCCTAGTT
...

Each period represents a position where the reference has a base but the read does not.

Viewing Modification Patterns

To mark modified bases in the sequence, use --show-mod-z:

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-mod-z input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.1915bd81-2515-457b-9208-0bf516bd51be	236	236	secondary_forward	m:35	TZZTGGGGAGZZZACTGGCGGAGGATTCA
0.63611c92-f1a7-44a8-99d1-fdadb1cb228c	404	404	supplementary_forward	m:56	AGZZZAZTGGZGGAGGATTCA
0.f80f6351-96b0-4ca2-9414-75b71961b833	246	246	secondary_reverse	m:45	TGGGZAZCCCACTZZCZGAGGATTCA
...

Modified bases are displayed as:

  • Z for modified bases on the reference
  • z for modified bases within an insertion (when combined with --show-ins-lowercase)

The sequences above vary in length because not every read passes through the given region in full. To keep only reads that span the entire region, and thus get sequences of more uniform length, add --full-region as shown below.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-mod-z --full-region input.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence
0.819110e0-91fd-4c29-9732-f66b6d201d23	384	384	supplementary_forward	m:54	GTGCCATZTZZTZZTGGGGAGCCCAZTGGZGGAGGATTZA
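
Because Z replaces the base letter, the marked output alone does not say which base was modified. One way to recover it is to compare the plain and --show-mod-z sequences for the same read. The Python sketch below does this; modified_base_counts is a hypothetical helper, and it assumes both sequences come from the same read and the same region.

```python
from collections import Counter


def modified_base_counts(plain: str, marked: str) -> Counter:
    """Count which displayed bases were replaced by Z/z in the marked sequence."""
    if len(plain) != len(marked):
        raise ValueError("sequences differ in length; are they the same read and region?")
    counts = Counter()
    for base, mark in zip(plain, marked):
        if mark in "Zz":
            counts[base.upper()] += 1
        elif base.upper() != mark.upper():
            # Any other difference means the two strings do not correspond
            raise ValueError(f"unexpected mismatch: {base!r} vs {mark!r}")
    return counts
```

For read 0.819110e0-91fd-4c29-9732-f66b6d201d23, comparing the two --full-region outputs in this guide shows that every Z replaces a C. For the reverse-strand read 0.f80f6351-96b0-4ca2-9414-75b71961b833, every Z replaces a G in the displayed sequence.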

Combining Display Options

You can combine multiple flags to see insertions, deletions, modifications, and quality scores all at once. The example below uses a BAM file that contains indels.

nanalogue read-table-show-mods --tag m --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 \
    --show-ins-lowercase --show-mod-z --show-base-qual \
    input_indels.bam

Example output:

# mod-unmod threshold is 0.5
read_id	align_length	sequence_length_template	alignment_type	mod_count	sequence	qualities
0.3fc2167f-4e32-4b53-8368-6fbbabb19875	200	194	supplementary_forward	m:33	AGAGZZGTZG..........aaaaTCCTGAGCATZATZZTAGTT	36.38.32.22.39.28.29.26.35.21.255.255.255.255.255.255.255.255.255.255.27.24.39.31.26.38.21.24.34.40.28.31.29.38.29.27.32.34.23.28.29.21.25.21
0.37519c57-da71-4c26-9d63-2f3c153bb481	200	194	primary_reverse	m:30	AZAZCCZTCZ..........aaaaTCCTGAGCATCATCCTAGTT	39.22.33.24.30.37.30.27.29.21.255.255.255.255.255.255.255.255.255.255.37.30.32.20.26.26.30.29.22.28.24.29.28.26.31.20.31.34.30.30.20.30.34.31
0.03cdcf7a-97c9-40f9-872f-4e4e710e9065	200	194	secondary_forward	m:33	AGAGZZGTZG..........aaaaTCCTGAGCATZATZZTAGTT	28.39.22.31.29.36.25.25.32.31.255.255.255.255.255.255.255.255.255.255.29.29.23.27.28.28.38.38.20.30.31.38.28.40.27.32.40.23.35.40.35.32.20.36
...

This produces output with:

  • Lowercase letters for insertions
  • Periods for deletions
  • Z/z for modifications
  • Quality scores as period-separated integers (with 255 for deleted positions)
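
The qualities column can be paired with the sequence column in a script. The Python sketch below assumes one quality value per displayed character, which is consistent with the example output above but is inferred from it rather than documented. The helper names are hypothetical.

```python
DELETED_QUALITY = 255  # value shown at deleted positions, per the display conventions


def pair_bases_with_qualities(sequence: str, qualities: str) -> list:
    """Pair each displayed character with its period-separated quality value."""
    values = [int(v) for v in qualities.split(".")]
    if len(values) != len(sequence):
        raise ValueError(
            f"{len(sequence)} characters but {len(values)} quality values"
        )
    return list(zip(sequence, values))


def mean_quality(sequence: str, qualities: str):
    """Mean quality over positions that are not deletions; None if there are none."""
    kept = [
        q for _, q in pair_bases_with_qualities(sequence, qualities)
        if q != DELETED_QUALITY
    ]
    return sum(kept) / len(kept) if kept else None
```

Excluding the 255 placeholders matters: leaving them in would inflate any average computed over a region that contains deletions.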

When to Use read-table-hide-mods

The read-table-hide-mods command is a simpler alternative when you don't need modification information. It supports the same sequence display options (--seq-region, --seq-full, --show-ins-lowercase, --show-base-qual) but does not include --show-mod-z or modification-related filters.

Use read-table-hide-mods when:

  • Your BAM file doesn't have modification data
  • You only care about alignment quality (insertions/deletions)
  • You want slightly faster processing by skipping modification parsing

The example below uses a BAM file that contains indels.

nanalogue read-table-hide-mods --region contig_00001:80-120 \
    --seq-region contig_00001:80-120 --show-ins-lowercase input_indels.bam

Example output:

read_id	align_length	sequence_length_template	alignment_type	sequence
0.8cf7919d-ef1f-43b7-884a-e91a721e4b62	200	194	secondary_reverse	AGAGCCGTCG..........aaaaTCCTGAGCATCATCCTAGTT
0.3fc2167f-4e32-4b53-8368-6fbbabb19875	200	194	supplementary_forward	AGAGCCGTCG..........aaaaTCCTGAGCATCATCCTAGTT
0.e24f8b86-72be-465b-af68-f64b94537c98	200	194	secondary_reverse	AGAGCCGTCG..........aaaaTCCTGAGCATCATCCTAGTT
0.37519c57-da71-4c26-9d63-2f3c153bb481	200	194	primary_reverse	AGAGCCGTCG..........aaaaTCCTGAGCATCATCCTAGTT
...

Creating Test Data

To create your own BAM files with insertions, deletions, and modifications for testing, see Test data with indels.
