How Rast Analyses a Song
You hand Rast an audio file or a YouTube link. A few minutes later, you get back a synced chord chart, a key label that names a Greek dromos (one of the eleven traditional scales), a beat grid that scrolls in time with the music, and two stems — vocals on one side, everything else on the other. Nothing leaves your computer: every model runs on-device, in Rust, against ONNX files (an open neural-network format) that ship inside the app.
This page is a tour of how that happens. Later pages dig into the interesting bits.
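Before the tour, one concrete detail: "on-device, in Rust, against ONNX files" means loading a bundled model into an inference session and reusing it for every song. A minimal sketch of that idea, assuming the ort crate (a common Rust binding to ONNX Runtime) — the crate choice, builder API, and model path here are assumptions for illustration, not a reading of Rast's source:

use anyhow::Result;
use ort::session::Session;

/// Sketch only, not Rast's actual code: load a bundled ONNX model once
/// and keep the session around for repeated inference. The builder API
/// differs between `ort` versions; this follows the 2.x style.
fn load_chord_model() -> Result<Session> {
    let session = Session::builder()?
        .commit_from_file("models/chords.onnx")?; // illustrative path
    Ok(session)
}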
What goes in, what comes out
audio file ──┐                              ┌── synced chord chart
             │                              │── key & dromos label
YouTube URL ─┼──> Rast analysis pipeline ──>┤── beat grid + tempo (BPM)
             │                              │── vocals.flac
ingested ────┘                              └── instrumental.flac

Behind that arrow is a pipeline of stages, some of which run side by side because they don't need each other's output. The ordering, names, and weights below match what you'd see in the progress bar inside the app.
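If you like types, the right-hand column maps onto a single result record. A hypothetical shape, for illustration only (the real definition lives in rast-core and may differ):

use std::path::PathBuf;

// Hypothetical shapes, for illustration; not the structs in rust/rast-core.
struct ChordSegment { start: f64, end: f64, label: String }   // one chart entry
struct KeyResult { tonic: String, dromos: String, candidates: Vec<String> }

struct AnalysisOutput {
    chords: Vec<ChordSegment>,   // synced chord chart
    key: KeyResult,              // key & dromos label, plus runner-up candidates
    beat_times: Vec<f64>,        // beat grid, in seconds
    bpm: f64,                    // tempo estimate
    vocals: PathBuf,             // vocals.flac
    instrumental: PathBuf,       // instrumental.flac
}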
The stages
┌─────────────────────────────────────────────────────┐
│ IMPORT                                               │
│ ffmpeg → original.flac → SHA-256 hash → library DB   │
└────────────────────────┬─────────────────────────────┘
                         │
                         v
┌────────────────────────────────────────────────────┐
│ SEPARATION (Spleeter or Demucs, ONNX)              │
│ full mix → vocals + instrumental                   │
└─────┬────────────────────────────────────────┬─────┘
      │ runs in parallel with...               │
      v                                        v
┌──────────────────────────┐    ┌─────────────────────────────┐
│ BEATS (beat_this +       │    │ CHORDS (CREMA or BTC,       │
│ Dynamic Bayesian Net)    │    │ on the instrumental stem)   │
│ → beat_times, BPM        │    │ → raw chord segments        │
└────────────┬─────────────┘    └────────────────┬────────────┘
             │                                   │
             │ (NOTE TRANSCRIPTION runs here too,│
             │  feeding into key detection)      │
             │                                   │
             └─────────────────┬─────────────────┘
                               v
          ┌────────────────────────────────────────┐
          │ KEY DETECTION (chord-aware over        │
          │ the 11 dromoi; chroma reinforces)      │
          │ → primary key, candidates, relatives   │
          └────────────────────┬───────────────────┘
                               v
          ┌────────────────────────────────────────┐
          │ CHORD SNAP (align chord boundaries     │
          │ to the beat grid; merge runs)          │
          └────────────────────┬───────────────────┘
                               v
          ┌────────────────────────────────────────┐
          │ BEAT-CHROMA MATRIX (one chroma         │
          │ vector per beat; powers "find          │
          │ similar sections")                     │
          └────────────────────┬───────────────────┘
                               v
                        AnalysisOutput

A chroma vector, mentioned a couple of times above, is just a 12-number summary of which pitch classes (C, C#, D, …, B) are loudest in a window of audio. It is the lingua franca of chord and key analysis.
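To make that concrete, here is the textbook construction of one chroma vector: fold the magnitude spectrum of an analysis window onto the 12 pitch classes. This is the standard recipe, not necessarily Rast's exact windowing or weighting:

// Textbook construction, for illustration; `magnitudes` is one FFT frame.
fn chroma_from_spectrum(magnitudes: &[f32], sample_rate: f32, fft_size: usize) -> [f32; 12] {
    let mut chroma = [0.0f32; 12];
    for (bin, &mag) in magnitudes.iter().enumerate().skip(1) {
        let freq = bin as f32 * sample_rate / fft_size as f32;
        if !(27.5..=4200.0).contains(&freq) {
            continue; // keep only the musically useful range
        }
        // MIDI note number: 69 = A4 = 440 Hz, 12 semitones per octave.
        let midi = 69.0 + 12.0 * (freq / 440.0).log2();
        let pitch_class = (midi.round() as i32).rem_euclid(12) as usize; // 0 = C .. 11 = B
        chroma[pitch_class] += mag * mag; // accumulate energy per pitch class
    }
    // Normalise so the loudest pitch class reads 1.0.
    let max = chroma.iter().copied().fold(0.0f32, f32::max);
    if max > 0.0 {
        for c in &mut chroma {
            *c /= max;
        }
    }
    chroma
}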
The pipeline is implemented in rust/rast-core/src/pipeline.rs. Five named stages report progress to the UI: separation, beats, chords, key, and chroma similarity. Note transcription runs alongside chords on the same instrumental stem and folds into key detection — it is not a separately reported stage.
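As a sketch of what "named stages with weights" might mean in code — the stage names come from the list above, but the weight values here are invented for illustration:

// Illustrative only: the five stages the UI reports, each weighted so the
// progress bar advances proportionally. Real values live in pipeline.rs.
enum Stage { Separation, Beats, Chords, Key, ChromaSimilarity }

fn weight(stage: &Stage) -> f32 {
    match stage {
        Stage::Separation => 0.50, // guess: neural stems dominate wall-clock time
        Stage::Beats => 0.15,
        Stage::Chords => 0.25,
        Stage::Key => 0.05,
        Stage::ChromaSimilarity => 0.05,
    }
}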
Why this order, and why parallel
Separation, beats, and chords share a thread scope. Separation produces the instrumental stem; chord detection wants that stem (vocals interfere with chord recognition, see Separation); beat detection only needs the full mix, so it can race ahead while separation is still running. The end-of-pipeline stages — key, chord-snap, and the beat-chroma matrix — are quick and run sequentially.
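In Rust terms, that parallel stretch is the shape of a scoped-thread block. A minimal sketch, assuming hypothetical stage functions (the real orchestration is in rust/rast-core/src/pipeline.rs):

use std::thread;

// Hypothetical stage signatures, stubbed for illustration.
struct Stems { vocals: Vec<f32>, instrumental: Vec<f32> }
fn separate(mix: &[f32]) -> Stems { unimplemented!() }
fn detect_beats(mix: &[f32]) -> (Vec<f64>, f64) { unimplemented!() } // (beat_times, bpm)
fn detect_chords(stem: &[f32]) -> Vec<(f64, f64, String)> { unimplemented!() }

fn run_parallel_stretch(full_mix: &[f32]) {
    thread::scope(|s| {
        // Beats need only the full mix, so this thread races ahead
        // while separation is still crunching.
        let beats = s.spawn(|| detect_beats(full_mix));

        // Chords want the instrumental stem, so separation and chord
        // detection run back to back on the other thread. (Note
        // transcription would ride along here too.)
        let chords = s.spawn(|| {
            let stems = separate(full_mix);
            detect_chords(&stems.instrumental)
        });

        let _grid = beats.join().unwrap();
        let _segments = chords.join().unwrap();
        // Key detection, chord snap, and the beat-chroma matrix follow
        // sequentially from here.
    });
}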
Key detection comes after chords because the algorithm is chord-aware: it disambiguates between dromoi that share the same notes (Hidjaz vs. Harmonic Minor, for example) by reading the qualities of your I and V chords. Chroma alone cannot do this. See Key & dromos detection.
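As a cartoon of that disambiguation step, with hypothetical types (the real detector scores all eleven dromoi, and its evidence is richer than this):

// Illustrative only. Two dromoi can share a pitch-class set; what separates
// them is the quality of the chords the song builds on the tonic (I) and
// fifth (V) degrees. For example, a Hijaz tonic chord is major, while the
// note-equivalent harmonic minor has a minor chord on its own tonic.
#[derive(PartialEq)]
enum Quality { Major, Minor }

struct Candidate {
    tonic_pc: u8,         // tonic pitch class, 0 = C .. 11 = B
    expected_i: Quality,  // chord quality this dromos predicts on degree I
    expected_v: Quality,  // ...and on degree V
}

/// Score one candidate against observed (root pitch class, quality) pairs.
fn score(candidate: &Candidate, observed: &[(u8, Quality)]) -> u32 {
    observed
        .iter()
        .map(|(root, q)| {
            let degree = (*root + 12 - candidate.tonic_pc) % 12;
            match degree {
                0 if *q == candidate.expected_i => 2, // tonic chord, right quality
                7 if *q == candidate.expected_v => 2, // fifth-degree chord, right quality
                _ => 0,
            }
        })
        .sum()
}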
Chord snapping is the last polishing pass. The neural model emits chord segments with millisecond-precision boundaries that don't line up with beats; snapping rewrites those boundaries onto the beat grid so the chord lane in the timeline doesn't look ragged and so chord edits land on whole-beat targets. See Chord detection.
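A minimal sketch of that pass, assuming hypothetical types: snap each boundary to the nearest beat, then merge adjacent segments that end up with the same label.

// Illustrative types; not Rast's actual segment representation.
struct Chord { start: f64, end: f64, label: String }

/// Move every chord boundary onto the nearest beat, then merge runs of
/// identical labels so the chart shows one segment per chord.
fn snap_to_beats(chords: &[Chord], beats: &[f64]) -> Vec<Chord> {
    let nearest = |t: f64| -> f64 {
        beats
            .iter()
            .copied()
            .min_by(|a, b| (a - t).abs().partial_cmp(&(b - t).abs()).unwrap())
            .unwrap_or(t) // no beat grid: leave the boundary where it was
    };

    let mut out: Vec<Chord> = Vec::new();
    for c in chords {
        let (start, end) = (nearest(c.start), nearest(c.end));
        if start >= end {
            continue; // segment collapsed onto a single beat; drop it
        }
        match out.last_mut() {
            // Merge runs: extend the previous segment if the label repeats.
            Some(prev) if prev.label == c.label && start <= prev.end => prev.end = end,
            _ => out.push(Chord { start, end, label: c.label.clone() }),
        }
    }
    out
}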
What it doesn't do (yet, or by design)
Rast doesn't pitch-shift or time-stretch audio during analysis — those happen live during playback through a SoundTouch worklet inside the browser engine. Source separation introduces audible artefacts (smeared transients, "underwater" bass), and chord detection makes mistakes. The app is built around the assumption that you, a musician with ears, will fix things; the chord chart is editable and Rast remembers your edits.
The next four pages walk through each non-trivial stage in turn — what model is running, why it works, and where it tends to struggle.