Quick Look at Your Data

After basecalling with a tool like Dorado, your first question is usually: "What's in my BAM file?" Nanalogue provides two commands for quickly inspecting your data before diving into detailed analysis.

Prerequisites

You will need:

  • A BAM file with modification tags (MM and ML tags), typically produced by a basecaller like Dorado
  • Nanalogue installed

The peek Command

The peek command gives you a quick overview of your BAM file by examining the header and first 100 records:

nanalogue peek input.bam

Example output:

contigs_and_lengths:
contig_00000	623
contig_00001	604
contig_00002	693

modifications:
C+m

Note: The contig names contig_00000, contig_00001, etc. are example names used here. In real BAM files aligned to a reference genome, you will see names like chr1, chr2, NC_000001.11, or similar depending on your reference.

This tells you:

  • Contigs: The reference sequences in your BAM and their lengths
  • Modifications: The modification types detected (e.g., C+m means 5-methylcytosine on cytosine bases, with + indicating that the modification is reported on the same strand as the read sequence)
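
If you want to use the detected modification codes in a script, the peek output can be filtered with standard tools. The following is a minimal sketch, assuming the output layout shown in the example above (a "modifications:" heading followed by one code per line). The function name peek_mods is hypothetical and is not part of Nanalogue.

```shell
# Print the modification codes listed under the "modifications:" heading
# of peek output read from stdin. Assumes the layout shown in the example
# output: a heading line, then one code per line, ended by a blank line
# or by the end of the input.
peek_mods() {
    awk '
        /^modifications:/  { in_mods = 1; next }  # start of the section
        in_mods && NF == 0 { in_mods = 0 }        # a blank line ends it
        in_mods            { print $1 }           # one code per line
    '
}

# Usage (input.bam is a placeholder file name):
#   nanalogue peek input.bam | peek_mods
```

For the example output above, this would print the single line C+m.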

Common Modification Codes

Code        Modification
C+m         5-methylcytosine (5mC)
C+h         5-hydroxymethylcytosine (5hmC)
A+a         N6-methyladenine (6mA)
T+472552    BrdU (5-bromodeoxyuridine). Older BAM files may use T+T (generic thymidine modification) or T+B

The read-stats Command

For a more detailed summary of your reads, use read-stats:

nanalogue read-stats input.bam

Example output:

key	value
n_primary_alignments	12
n_secondary_alignments	10
n_supplementary_alignments	5
n_unmapped_reads	3
n_reversed_reads	12
align_len_mean	337
align_len_max	515
align_len_min	189
align_len_median	342
align_len_n50	411
seq_len_mean	344
seq_len_max	551
seq_len_min	189
seq_len_median	356
seq_len_n50	411

This provides:

  • Alignment counts: Primary, secondary, supplementary, and unmapped reads
  • Length statistics: Mean, median, min, max, and N50 for both alignment and sequence lengths
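
Because the output is a simple two-column key/value listing, individual values are easy to pull out in a script. The sketch below assumes the layout shown in the example output (one key and one value per line, separated by whitespace). The function names stat_value and unmapped_fraction are hypothetical helpers, not Nanalogue commands, and treating primary plus unmapped as the total read count is an assumption made for illustration.

```shell
# Look up a single value by key in read-stats output read from stdin.
stat_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Report unmapped reads as a fraction of (primary + unmapped).
# Assumption: each read contributes either one primary alignment or
# one unmapped record.
unmapped_fraction() {
    awk '
        $1 == "n_primary_alignments" { primary  = $2 }
        $1 == "n_unmapped_reads"     { unmapped = $2 }
        END {
            total = primary + unmapped
            if (total > 0) printf "%.2f\n", unmapped / total
        }
    '
}

# Usage (input.bam is a placeholder file name):
#   nanalogue read-stats input.bam | stat_value align_len_n50
#   nanalogue read-stats input.bam | unmapped_fraction
```

With the example numbers above (12 primary alignments, 3 unmapped reads), unmapped_fraction would print 0.20.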

Troubleshooting: No Modifications Detected

If peek shows no modifications, check:

  1. Basecaller model: Did you use a modification-aware model? For Dorado, models with 5mC or 6mA in the name produce modification calls.

  2. MM/ML tags present: Check if your BAM has the required tags:

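    samtools view input.bam | head -1 | grep -o "MM:Z:[^[:space:]]*"

    SAM fields are tab-separated, so the pattern stops at the next whitespace character. If this returns nothing, the first record has no MM tag. Check a few more records before concluding that your BAM lacks modification data.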

  3. Correct reference: For aligned BAMs, ensure reads actually align to the reference. A high unmapped count in read-stats suggests alignment issues.
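
Checking a single record can mislead, since an individual read may simply carry no modification calls. A slightly more robust check counts tagged records across a sample of reads. This is a minimal sketch: count_mod_records is a hypothetical helper that reads SAM text from stdin, and it also accepts the older Mm spelling of the tag.

```shell
# Count how many SAM records on stdin carry a modification tag.
# Matches MM:Z: (current spelling) or Mm:Z: (older spelling) at the
# start of a tab-delimited optional field.
count_mod_records() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines, if any
        {
            total++
            for (i = 12; i <= NF; i++)      # optional tags start at field 12
                if ($i ~ /^M[Mm]:Z:/) { tagged++; break }
        }
        END { printf "%d of %d records have MM tags\n", tagged, total }
    '
}

# Usage (input.bam is a placeholder file name):
#   samtools view input.bam | head -n 100 | count_mod_records
```

A result of 0 out of 100 records is a much stronger sign of missing modification data than a single untagged read.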

Quick Sanity Checks

Before detailed analysis, verify your data looks reasonable:

# Check you have enough reads
nanalogue read-stats input.bam | grep n_primary_alignments

# Verify modification types match your experiment
# (prints the "modifications:" heading and the codes listed after it)
nanalogue peek input.bam | sed -n '/^modifications:/,$p'

Next Steps

Once you've confirmed your data looks correct:

See Also

  • CLI Reference — Full documentation of all nanalogue commands
  • Recipes — Quick copy-paste snippets for common tasks