Lyrics transcription for music professionals.

Vocal isolation, dialect-aware processing, and structured output — purpose-built for catalogs in Spanish, English, and code-switched production.

Start Free Trial

Read the methodology

In production with labels and publishers across Latin America and the U.S.

Music Technology Venture Studio · San Juan, Puerto Rico.

musavox.io / dashboard

Good afternoon, Alex

Wed, May 6

Demo preview

Tracks

Avg accuracy

97%

Dialect

PR · MX

Pipeline

Active

Recent transcriptions

Bad Bunny — DTMF
PR · Reggaetón · 47s
0.0%
Peso Pluma — Lady Gaga
MX · Corrido Tumbado · 52s
0.0%
Karol G — Si Antes Te Hubiera Conocido
CO · Reggaetón Pop · 49s
0.0%
Rauw Alejandro — Touching The Sky
PR · Trap Latino · 51s
0.0%
ELENA ROSE — Me Lo Merezco
US Latin · Pop · 44s
0.0%

Active pipeline

Vocal isolation

Transcription

Post-processing

Done

Smart Review

82%

“Que se joda el que corille”

Capabilities

Six capabilities that compound across the workflow.

VOCAL ISOLATION

Source separation, before the transcriber sees a single bar.

Source separation removes the instrumental before transcription. The transcriber receives clean vocals, not the full mix.

ADAPTIVE DIALECT ENGINE

Region-aware Spanish processing.

Spanish vocabulary differs sharply across territories. Musavox detects the regional dialect from the audio and applies a curated lexicon during post-processing. Current coverage spans 8 regions: Puerto Rico, México, Colombia, República Dominicana, Venezuela, U.S. Latin, Argentina, and Chile. Coverage expands with catalog demand.

CONFIDENCE HIGHLIGHTING

Every line carries a score. Reviewers see only what matters.

Each transcribed line carries a confidence score. Lines below threshold are flagged in the editor with smart-review navigation, so reviewers focus on the lines that need a second pass instead of reading every line.
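As a rough illustration of threshold-based flagging, the sketch below filters lines by confidence and orders the review queue from least to most confident. The `Line` type, the 0.0 to 1.0 score scale, the 0.85 default threshold, and the lowest-first ordering are all assumptions made for this example, not documented Musavox behavior.

```python
from dataclasses import dataclass


@dataclass
class Line:
    index: int         # position of the line in the transcript
    text: str          # transcribed lyric line
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def lines_to_review(lines: list[Line], threshold: float = 0.85) -> list[Line]:
    """Return the lines scoring below the threshold, lowest confidence first."""
    flagged = [ln for ln in lines if ln.confidence < threshold]
    return sorted(flagged, key=lambda ln: ln.confidence)


# Placeholder transcript; the text and scores are made up for the example.
transcript = [
    Line(0, "first line of the verse", 0.97),
    Line(1, "second line of the verse", 0.82),
    Line(2, "third line of the verse", 0.64),
]

review_queue = lines_to_review(transcript)
print([ln.index for ln in review_queue])  # [2, 1]
```

A reviewer working from this queue would open line 2 first, then line 1, and never need to touch line 0.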

CODE-SWITCH DETECTION

Bilingual tracks are first-class citizens.

Mid-verse switches between Spanish and English are detected and tagged in the output. The post-processor handles bilingual production without requiring manual segmentation.

INDUSTRY-STANDARD EXPORTS

TXT, LRC, SRT, JSON. Plus the integrations that publishing actually needs.

Word-level timestamps in LRC. Plain text with section markers in TXT. Subtitle-format SRT for music video workflows. Structured JSON for downstream metadata pipelines.
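To make the timing formats concrete, here is a minimal sketch that renders timed lyric lines as LRC and SRT. It uses line-level LRC tags (`[mm:ss.xx]`) rather than word-level ones to stay short, and the `TimedLine` type and function names are hypothetical; this is not the Musavox exporter.

```python
from dataclasses import dataclass


@dataclass
class TimedLine:
    start: float  # seconds from the start of the track
    end: float    # seconds from the start of the track
    text: str


def lrc_timestamp(seconds: float) -> str:
    """Format seconds as an LRC tag, [mm:ss.xx], with hundredths of a second."""
    total_cs = round(seconds * 100)
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT time, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_lrc(lines: list[TimedLine]) -> str:
    """One LRC line per lyric line, tagged with its start time."""
    return "\n".join(f"{lrc_timestamp(ln.start)}{ln.text}" for ln in lines)


def to_srt(lines: list[TimedLine]) -> str:
    """Numbered SRT cues separated by blank lines."""
    cues = [
        f"{n}\n{srt_timestamp(ln.start)} --> {srt_timestamp(ln.end)}\n{ln.text}"
        for n, ln in enumerate(lines, start=1)
    ]
    return "\n\n".join(cues) + "\n"


# Placeholder lyric lines for the example.
sample = [TimedLine(72.5, 75.0, "first line"), TimedLine(75.0, 78.25, "second line")]
print(to_lrc(sample))
print(to_srt(sample))
```

Both formats carry the same timing data; LRC keeps only a start tag per line, while SRT needs an explicit start and end for each cue.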

CATALOG-LEVEL CONSISTENCY

Editorial corrections you make on one track propagate as suggestions within your own catalog only. Your editorial layer stays inside your account; nothing leaves your tenant boundary. Use of anonymized corrections to improve global model accuracy is opt-in per organization and disabled by default.

ADAPTIVE DIALECT ENGINE

Region-aware processing across 8 dialect modules.

Spanish vocabulary, idiom, and pronunciation differ sharply across territories. Reggaetón from Puerto Rico, corridos tumbados from México, and dembow from República Dominicana each carry distinct lexicons that generic models smooth into a textbook neutral. The result reads wrong to native speakers and creates downstream metadata problems.

Eight fully curated regional modules cover Puerto Rico, México, República Dominicana, Colombia, Argentina, Chile, Venezuela, and U.S. Latin, with 302+ curated terms across all regions. Lexicons expand as catalogs in new territories enter the platform.

Read the full methodology

Detected dialect: PR · Puerto Rico

Puerto Rico

82%

matched: jevo, corillo, bellaco, perreo

México

31%

Colombia

18%

How Musavox compares

Capability	Moises	VEED	Songscription	Musavox
Lyrics transcription
Vocal isolation	Basic	—	—	Production-grade
Latin music focus	—	—	—	Native, regional-aware
Regional dialect detection	—	—	—	Auto-detect, 8 regions
Ad-lib + producer tag detection	—	—	—
Code-switch (Spanish/English)	Basic	—	—	First-class citizen
Confidence scoring per line	—	—	—
Smart Review flagging	—	—	—
Industry-standard exports	TXT	TXT, SRT	TXT	TXT, LRC, SRT, JSON
Correction-capture training loop	—	—	—	Catalog-specific learning
Built for label/publisher catalogs	—	—	—

Built for

Built for catalog operators.

A&R Executives

Evaluate prospective signings with lyrics transcribed in the dialect of the artist. Identify thematic patterns and lyrical consistency across a roster before contract.

Label Managers and Artist Relations

Transcribe lyrics at the pace of release week. Submit final metadata to distribution with verified lyrics, not approximated ones. An audit trail for every editorial decision lives inside the platform.

Publishers

Lyrics ready for copyright registration with word-level timestamps. Confidence scores flag lines that need a second pass before submission.

Music Attorneys

Lyric documentation with confidence scoring, version history, and editorial provenance per track.

Why Musavox

Musavox was built by operators who needed a transcription tool that understood the music they were releasing.

Generic tools smooth regional vocabulary into neutral Spanish.

Editorial teams ship with errors. Metadata gets rejected.

We built the tool we wanted to use. Then we opened it to other catalogs.

— The Musavox team

Simple, Transparent Pricing

Start free. Scale as you grow.

free

Try it out

$0/mo

5 transcriptions/month
English + Spanish
TXT export
Confidence scoring

Get Started

studio

Independent A&Rs, managers, songwriters

$19/mo

30 transcriptions/month
English + Spanish
TXT + LRC export
Confidence scoring
Email support

Choose Studio

Popular

pro

Growing teams, production houses

$49/mo

100 transcriptions/month
English + Spanish
All export formats
Batch upload
Priority support

Choose Pro

label

Established labels, publisher catalogs

$149/mo

400 transcriptions/month
All languages
All export formats
Batch upload
Multi-seat (up to 5 users)
Custom dialect tuning per territory
Priority pipeline lane
Dedicated support

Choose Label

enterprise

Major labels, publisher catalogs, multi-territory operations

Custom

Everything in Label
Unlimited transcriptions
Unlimited team seats
Multi-office support

Talk to sales

Frequently Asked Questions

Which audio formats are supported?

Musavox supports MP3, WAV, FLAC, OGG, and M4A files up to 20 MB.

How accurate is the transcription?

The pipeline isolates vocals from instrumentals before transcription, then applies a region-aware language model. Each transcribed line carries a confidence score, so reviewers know exactly which lines need a second pass. Accuracy varies by genre, production density, and audio quality: heavily processed vocals (autotune, layered ad-libs) and live recordings tend to score lower than clean studio vocals.

Which languages are supported?

English and Spanish are fully supported, including code-switching detection for bilingual tracks. More languages are coming soon.

How long does processing take?

Most tracks are processed in 30 to 90 seconds. You can watch the pipeline progress in real time.

Can I edit a transcription?

Yes. Every transcription can be reviewed and edited. Your edits are saved separately, so you can always revert to the AI version.

Do unused credits roll over?

No. Credits reset monthly and don't carry over. This keeps pricing simple and predictable.

The Musavox Brief

A weekly brief for music professionals.

Notable releases, transcription patterns observed across the platform, and editorial guidance for catalog teams. Free, weekly.

No spam. One email per week. Unsubscribe anytime. Read the privacy policy.