Quick Look at Your Data

After basecalling with a tool like Dorado, your first question is usually: "What's in my BAM file?" Nanalogue provides two commands for quickly inspecting your data before diving into detailed analysis.

Prerequisites

You will need:

A BAM file with modification tags (MM and ML tags), typically produced by a basecaller like Dorado
Nanalogue installed

The `peek` Command

The peek command gives you a quick overview of your BAM file by examining the header and first 100 records:

nanalogue peek input.bam

Example output:

contigs_and_lengths:
contig_00000	775
contig_00001	601
contig_00002	666

modifications:
C+m

Note: The contig names contig_00000, contig_00001, etc. are example names used here. In real BAM files aligned to a reference genome, you will see names like chr1, chr2, NC_000001.11, or similar depending on your reference.

This tells you:

Contigs: The reference sequences in your BAM and their lengths
Modifications: The modification types detected (e.g., C+m means 5-methylcytosine on the + strand)
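
If you want only the modification codes in a script, one option is to filter the text with awk. The sketch below assumes the layout shown in the example output above: a `modifications:` header, then one code per line, ending at a blank line or the end of the output. `peek_mods` is a hypothetical helper name, not a Nanalogue command, and the sample text stands in for real `nanalogue peek` output.

```shell
# Hypothetical helper: print the codes listed under "modifications:".
# Assumes the section layout shown in the example output above.
peek_mods() {
    awk '
        /^modifications:/ { in_mods = 1; next }  # start of the section
        /^[[:space:]]*$/  { in_mods = 0; next }  # a blank line ends any section
        in_mods           { print }              # one code per line
    '
}

# Sample text copied from the example output; with real data you would run:
#   nanalogue peek input.bam | peek_mods
printf 'contigs_and_lengths:\ncontig_00000\t775\n\nmodifications:\nC+m\n' | peek_mods
```

For the sample text this prints `C+m`.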

Common Modification Codes

Code	Modification
`C+m`	5-methylcytosine (5mC)
`C+h`	5-hydroxymethylcytosine (5hmC)
`A+a`	N6-methyladenine (6mA)
`T+472552`	BrdU (5-bromodeoxyuridine). Older BAM files may use `T+T` (generic thymidine modification) or `T+B`
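
Each code in the table has three parts: the unmodified base, the strand, and the modification code itself (a letter, or a numeric identifier such as 472552). As a rough illustration, the sketch below splits a code into those parts using plain shell parameter expansion. `split_mod_code` is a hypothetical helper and not part of Nanalogue.

```shell
# Hypothetical helper: split a code such as "C+m" into base, strand, and modification.
# Note: plain POSIX sh has no local variables, so these names are global.
split_mod_code() {
    code=$1
    case $code in
        ?[+-]?*) ;;  # one base character, a strand sign, then at least one more character
        *) printf 'unrecognised code: %s\n' "$code" >&2; return 1 ;;
    esac
    base=${code%"${code#?}"}     # first character: the unmodified base
    rest=${code#?}               # everything after the base
    strand=${rest%"${rest#?}"}   # next character: + or -
    mod=${rest#?}                # remainder: letter code or numeric identifier
    printf 'base=%s strand=%s mod=%s\n' "$base" "$strand" "$mod"
}

split_mod_code 'C+m'
split_mod_code 'T+472552'
```

The two calls print `base=C strand=+ mod=m` and `base=T strand=+ mod=472552`.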

The `read-stats` Command

For a more detailed summary of your reads, use read-stats:

nanalogue read-stats input.bam

Example output:

key	value
n_primary_alignments	8
n_secondary_alignments	8
n_supplementary_alignments	11
n_unmapped_reads	3
n_reversed_reads	13
align_len_mean	389
align_len_max	605
align_len_min	192
align_len_median	387
align_len_n50	431
seq_len_mean	390
seq_len_max	605
seq_len_min	192
seq_len_median	390
seq_len_n50	431

This provides:

Alignment counts: Primary, secondary, and supplementary alignments, plus unmapped and reversed reads
Length statistics: Mean, median, min, max, and N50 for both alignment and sequence lengths
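
Because the output is a simple key/value table, it is easy to derive your own summary numbers from it. The sketch below computes the percentage of unmapped reads. It assumes the columns are tab-separated, as in the example output, and it assumes that `n_primary_alignments + n_unmapped_reads` is a sensible denominator; check how your reads are counted before relying on it. `unmapped_percent` is a hypothetical helper name.

```shell
# Hypothetical helper: percentage of unmapped reads, computed as
#   n_unmapped_reads / (n_primary_alignments + n_unmapped_reads)
# The choice of denominator is an assumption, not something Nanalogue defines.
unmapped_percent() {
    awk -F '\t' '
        $1 == "n_primary_alignments" { primary = $2 }
        $1 == "n_unmapped_reads"     { unmapped = $2 }
        END {
            total = primary + unmapped
            if (total == 0) { print "no reads counted"; exit 1 }
            printf "%.1f%%\n", 100 * unmapped / total
        }
    '
}

# Values copied from the example output; with real data you would run:
#   nanalogue read-stats input.bam | unmapped_percent
printf 'key\tvalue\nn_primary_alignments\t8\nn_unmapped_reads\t3\n' | unmapped_percent
```

With the example values (8 primary, 3 unmapped) this prints `27.3%`.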

Troubleshooting: No Modifications Detected

If peek shows no modifications, check:

Basecaller model: Did you use a modification-aware model? For Dorado, models with 5mC or 6mA in the name produce modification calls.
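MM/ML tags present: Check whether the first record in your BAM carries an MM tag:
```
samtools view input.bam | head -n 1 | grep -o "MM:Z:[^[:space:]]*"
```
If this returns nothing, the first record has no MM tag. A single record is not representative of the whole file, so check a few more records before concluding that your BAM lacks modification data.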
Correct reference: For aligned BAMs, ensure reads actually align to the reference. A high unmapped count in read-stats suggests alignment issues.
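
A check based on a single record can mislead, so it may help to sample more of the file. The sketch below counts how many SAM records on standard input carry both an MM and an ML tag. `count_mod_tagged` is a hypothetical helper; accepting the older lowercase spellings `Mm` and `Ml` is an assumption about what older files contain, and the two sample records are synthetic.

```shell
# Hypothetical helper: count SAM records on stdin that carry both MM and ML tags.
count_mod_tagged() {
    awk -F '\t' '
        {
            has_mm = has_ml = 0
            # Optional tags start at field 12 of a SAM record.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^(MM|Mm):Z:/)  has_mm = 1   # assumption: older files may use Mm
                if ($i ~ /^(ML|Ml):B:C/) has_ml = 1   # assumption: older files may use Ml
            }
            total++
            if (has_mm && has_ml) tagged++
        }
        END { printf "%d of %d records have MM and ML tags\n", tagged, total }
    '
}

# Two synthetic records (one tagged, one not); with real data you would run:
#   samtools view input.bam | head -n 1000 | count_mod_tagged
{
    printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\tMM:Z:C+m,0;\tML:B:C,200\n'
    printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n'
} | count_mod_tagged
```

For the synthetic records this prints `1 of 2 records have MM and ML tags`. A count of zero over a large sample is a much stronger sign that modification data is missing than an empty result for one read.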

Quick Sanity Checks

Before detailed analysis, verify your data looks reasonable:

# Check you have enough reads
nanalogue read-stats input.bam | grep n_primary_alignments

# Verify modification types match your experiment
# (-A 5 also prints the lines after the "modifications:" header, where the codes are listed)
nanalogue peek input.bam | grep -A 5 modifications
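
In a pipeline you may prefer a check that fails loudly over one you read by eye. The sketch below turns the read-count check into an exit status. `has_min_primary` is a hypothetical helper, the threshold of 5 is arbitrary, and the input is assumed to be tab-separated as in the `read-stats` example output.

```shell
# Hypothetical helper: succeed only if n_primary_alignments >= the given minimum.
# Exit status: 0 = enough, 1 = too few, 2 = key not found in the input.
has_min_primary() {
    min=$1
    awk -F '\t' -v min="$min" '
        $1 == "n_primary_alignments" { n = $2; found = 1 }
        END {
            if (!found) { print "n_primary_alignments not found" > "/dev/stderr"; exit 2 }
            exit ((n + 0 >= min + 0) ? 0 : 1)
        }
    '
}

# Value copied from the example output; with real data you would run:
#   nanalogue read-stats input.bam | has_min_primary 5
if printf 'n_primary_alignments\t8\n' | has_min_primary 5; then
    echo "enough primary alignments"
else
    echo "too few primary alignments"
fi
```

Pick a minimum that reflects your own experiment; the right number depends on coverage and on what you plan to analyse.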

Next Steps

Once you've confirmed your data looks correct:

Quality control of mod data — Assess modification call quality
Find highly modified reads — Filter reads by modification level
Explore a specific region — Focus on genes or features of interest

Nanalogue cookbook