
SuperTonic 2 TTS for Spanish: How to Generate Natural “Relámpago” (Lightning) Pronunciation, Fast

By Geethu

If you’re building Spanish voice features (apps, narration tools, accessibility readers, chat agents), you’ll quickly find that “Spanish TTS” isn’t just about picking a model and hitting “synthesize.” The difference between “understandable” and “native-sounding” often comes down to stress, accents, and text normalization—especially on tricky words.

A surprisingly good “canary in the coal mine” test word is:

Relámpago = lightning in Spanish (note the accent on á)

It’s short, common, and it tests exactly the kinds of things that make Spanish TTS sound wrong when mis-handled.

SuperTonic 2 is positioned as a lightning-fast, on-device TTS option with multilingual support including Spanish (es).

This guide shows how to get a natural “relámpago” pronunciation fast, and how to systematically diagnose issues when you don’t.

“Lightning” in Spanish, and why “relámpago” is a great test word

Relámpago tests multiple Spanish TTS pain points at once:

  • Accent mark (á)
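    Spanish accents aren’t decorative: the written accent marks the stressed syllable (re-LÁM-pa-go). Without it, Spanish spelling rules would put the stress on the next-to-last syllable (re-lam-PA-go). If your pipeline drops diacritics, you often get incorrect emphasis or a robotic cadence.
  • Syllable timing and stress
    Spanish is generally described as syllable-timed, while English is stress-timed. A TTS engine that drifts into English-like stress patterns can make the word sound “off” even when the individual sounds are correct.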
  • Context sensitivity
    “Relámpago” appears in normal sentences (“Un relámpago iluminó el cielo”), so it’s useful for both single-word tests and full-sentence tests.

Bottom line: if your system can say “relámpago” naturally, it’s usually in good shape for broader Spanish output.
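To make the stress point concrete, here is a simplified Python sketch of the Spanish written-accent rule: a written accent marks the stressed syllable; otherwise, words ending in a vowel, “n”, or “s” stress the next-to-last syllable, and all other words stress the last one. The function names are hypothetical helpers, not part of SuperTonic, and the syllable splitting is deliberately rough (it treats “y” as a consonant and skips other edge cases). It models only the spelling rule, not how any particular TTS model behaves.

```python
import unicodedata

STRONG = set("aeoáéó")        # strong vowels, with or without a written accent
ACCENTED = set("áéíóú")       # vowels carrying a written accent
VOWELS = set("aeiouáéíóúü")


def _same_syllable(prev: str, ch: str) -> bool:
    """True if two adjacent vowels form a diphthong (one syllable)."""
    if prev in STRONG and ch in STRONG:
        return False          # two strong vowels ("ea") are in hiatus
    if prev in "íú" or ch in "íú":
        return False          # an accented í/ú breaks the diphthong ("día")
    return True


def vowel_nuclei(word: str) -> list:
    """Split a lowercase word into vowel groups, roughly one per syllable."""
    groups = []
    current = ""
    for ch in word:
        if ch in VOWELS:
            if current and not _same_syllable(current[-1], ch):
                groups.append(current)
                current = ""
            current += ch
        elif current:
            groups.append(current)
            current = ""
    if current:
        groups.append(current)
    return groups


def stressed_syllable_from_end(word: str) -> int:
    """1 = last syllable, 2 = next-to-last, 3 = third-from-last, ..."""
    w = unicodedata.normalize("NFC", word.lower())
    w = "".join(ch for ch in w if ch.isalpha())   # drop punctuation
    groups = vowel_nuclei(w)
    for i, group in enumerate(groups):
        if any(ch in ACCENTED for ch in group):
            return len(groups) - i                # a written accent always wins
    # No written accent: words ending in a vowel, "n", or "s" stress the
    # next-to-last syllable; all other words stress the last one.
    if len(groups) >= 2 and (w[-1] in VOWELS or w[-1] in "ns"):
        return 2
    return 1


for word in ["Relámpago.", "Relampago.", "cielo", "iluminó"]:
    print(word, stressed_syllable_from_end(word))
# Relámpago. 3
# Relampago. 2
# cielo 2
# iluminó 1
```

Stripping the accent moves the predicted stress from the third-to-last syllable to the next-to-last one, which is exactly the mispronunciation you hear when diacritics get lost.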

Common Spanish TTS pitfalls (and what they sound like)

Before changing settings, it helps to recognize the failure mode:

1) Dropped accents → wrong stress

Symptom: “relampago” sounds flatter, or stress shifts unnaturally.
Cause: your input text lost diacritics somewhere (UI normalization, database, JSON encoding, markdown sanitization, etc.).
Fix: keep text in UTF-8 end to end so the accents reach the model, and test both spellings explicitly (we’ll do this below).
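Two of the most common ways accents disappear are easy to reproduce: ASCII folding (decompose, then drop combining marks, as slug or search-normalization code often does) and mojibake (UTF-8 bytes decoded as Latin-1 somewhere along the way). A minimal sketch using only Python’s standard library; diagnose() is a hypothetical helper you could call wherever text crosses a boundary.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Accent stripping as slug/search code often does it: NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def mojibake(text: str) -> str:
    """Simulate a layer that decodes UTF-8 bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def diagnose(sent: str, received: str) -> str:
    """Compare the text you sent with what arrived at the TTS step."""
    s = unicodedata.normalize("NFC", sent)
    r = unicodedata.normalize("NFC", received)
    if s == r:
        return "ok"
    if ascii_fold(s) == r:
        return "accents stripped"
    if mojibake(s) == r:
        return "mojibake (UTF-8 read as Latin-1)"
    return "changed in some other way"


print(ascii_fold("Relámpago."))               # Relampago.
print(mojibake("Relámpago."))                 # RelÃ¡mpago.
print(diagnose("Relámpago.", "Relampago."))   # accents stripped
```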

2) Wrong language setting → English-ish phonetics

Symptom: vowels drift toward English (“eh/ay” quality), consonants sound “too hard,” and the rhythm becomes stress-timed.
Cause: the language tag is still set to English, or the text is never marked as Spanish.
Fix: select Spanish (es) explicitly. SuperTonic’s ONNX example takes it as --lang es.

3) Punctuation and casing artifacts

Symptom: odd pauses, clipped word endings, or strange emphasis.
Cause: aggressive sanitization, emoji removal, or punctuation replacement.
Fix: sanitize conservatively. SuperTonic’s examples mention improvements to text preprocessing/normalization and punctuation handling; these can help, but you still need clean input.
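One conservative option is a whitelist sanitizer: keep letters (accented ones included), digits, whitespace, and the punctuation that shapes Spanish prosody, and drop everything else (emoji, stray symbols). The allowed set below is an assumption to tune for your content, and sanitize_for_tts is a hypothetical helper, not SuperTonic’s own preprocessing.

```python
# Assumed whitelist: punctuation that carries prosody, plus a few common symbols.
ALLOWED_PUNCT = set(".,;:¿?¡!-'\"()…«»%°")


def sanitize_for_tts(text: str) -> str:
    """Drop emoji and stray symbols; keep accented letters, digits, and prosodic punctuation."""
    kept = "".join(
        ch for ch in text
        if ch.isalnum() or ch.isspace() or ch in ALLOWED_PUNCT
    )
    # Collapse the whitespace left behind by removed characters (this also merges newlines).
    return " ".join(kept.split())


print(sanitize_for_tts("¡Hola! 😀 ¿Viste el relámpago?"))  # ¡Hola! ¿Viste el relámpago?
```

A whitelist fails safe for Spanish: accented letters pass the isalnum() check, so the sanitizer can never strip the á in “relámpago” the way a blanket ASCII filter would.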

4) Over-optimizing for speed

Symptom: intelligible but “thin” or slightly buzzy output.
Cause: too few denoising steps.
Fix: increase step count (trade speed for quality). SuperTonic’s example script exposes --total-step for this.

SuperTonic 2 setup essentials (model, language tag, quality-speed knobs)

SuperTonic 2 is available as a model on Hugging Face, with multilingual support including Spanish (es). The project’s ONNX inference examples show a straightforward CLI workflow with key parameters like language and denoising steps.

Option A: Use the ONNX example runner (fast to validate your pipeline)

From the SuperTonic repo’s Python ONNX examples:

  • Supported languages include en, ko, es, pt, and fr
  • The default is 5 denoising steps (--total-step); increase it for higher fidelity
  • Speaking rate is set with --speed (default 1.05; recommended range 0.9–1.5)

Spanish “relámpago” quick test:

uv run example_onnx.py \
  --voice-style assets/voice_styles/M1.json \
  --lang es \
  --text "Relámpago."
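If you drive the example runner from your own code, building the argument list in one place makes the language tag hard to forget. A sketch in Python that uses only the flags shown in this guide (--voice-style, --lang, --text, --total-step, --speed); build_command is a hypothetical helper, and the supported-language set and 0.9–1.5 speed range are copied from the example docs summarized above.

```python
from __future__ import annotations

import shlex
import warnings

# Values copied from the ONNX example docs summarized above.
SUPPORTED_LANGS = {"en", "ko", "es", "pt", "fr"}
RECOMMENDED_SPEED = (0.9, 1.5)


def build_command(text: str, *, lang: str,
                  voice_style: str = "assets/voice_styles/M1.json",
                  total_step: int | None = None,
                  speed: float | None = None) -> list[str]:
    """Return argv for example_onnx.py; `lang` has no default on purpose."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language tag: {lang!r}")
    cmd = ["uv", "run", "example_onnx.py",
           "--voice-style", voice_style,
           "--lang", lang,
           "--text", text]
    if total_step is not None:
        cmd += ["--total-step", str(total_step)]
    if speed is not None:
        low, high = RECOMMENDED_SPEED
        if not low <= speed <= high:
            warnings.warn(f"--speed {speed} is outside the recommended {low}-{high} range")
        cmd += ["--speed", str(speed)]
    return cmd


cmd = build_command("Relámpago.", lang="es", total_step=10, speed=1.0)
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run(cmd, check=True)
```

Making lang a keyword-only argument with no default means every caller has to state the language, which is the “don’t rely on a default language” check from the troubleshooting list, enforced in code.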

Quality vs speed (the two knobs that matter most)

1) --total-step (quality/fidelity)

  • Higher = better quality, slower generation
  • Try:
uv run example_onnx.py --lang es --text "Relámpago." --total-step 10

2) --speed (speech rate, not compute speed)

  • Higher = faster speaking pace, lower = slower
  • Try:
uv run example_onnx.py --lang es --text "Relámpago." --speed 1.1

Practical recommendation:

First get pronunciation and rhythm right with --total-step 10 and --speed 1.0–1.1.

Then reduce steps only if you truly need lower latency.
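“Reduce steps only if you need lower latency” can be made concrete: time a few step counts on your target hardware, then keep the highest one that fits your latency budget. A minimal Python sketch; synthesize is a hypothetical callable wrapping whatever synthesis call you use (for example, a subprocess run of the CLI), and the timings in the usage line are placeholders, not SuperTonic benchmarks.

```python
from __future__ import annotations

import time
from typing import Callable


def measure(synthesize: Callable[[int], object], steps: list[int],
            repeats: int = 3) -> dict[int, float]:
    """Average wall-clock seconds per call of synthesize(step) for each step count."""
    timings: dict[int, float] = {}
    for step in steps:
        start = time.perf_counter()
        for _ in range(repeats):
            synthesize(step)
        timings[step] = (time.perf_counter() - start) / repeats
    return timings


def choose_total_step(timings: dict[int, float], budget_s: float) -> int:
    """Highest step count within the latency budget; else the fastest one measured."""
    fitting = [step for step, secs in timings.items() if secs <= budget_s]
    return max(fitting) if fitting else min(timings, key=timings.get)


# Placeholder timings for illustration; measure on your own hardware instead.
print(choose_total_step({5: 0.2, 10: 0.35, 20: 0.7}, budget_s=0.5))  # 10
```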

Prompt / text formatting tips for Spanish (the stuff that quietly breaks output)

Even great TTS models can be sabotaged by your text pipeline. Use these habits:

1) Preserve diacritics end-to-end (UTF-8 everywhere)

Make sure every layer (UI → API → DB → worker → TTS) treats text as UTF-8 and doesn’t strip accents. “Relámpago” must stay relámpago, not relampago.
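Accents can also survive visually but change underneath: “á” can be stored as one code point (NFC) or as “a” plus a combining accent (NFD), and some file systems and input methods produce the decomposed form. Normalizing to NFC right before synthesis is a cheap safeguard. This guide doesn’t verify whether SuperTonic’s own preprocessing already does this, so treat it as a defensive assumption. A Python sketch; prepare_tts_text is a hypothetical helper.

```python
import json
import unicodedata

composed = "Relámpago"                                # á is one code point, U+00E1
decomposed = unicodedata.normalize("NFD", composed)   # "a" + combining acute, U+0301

print(composed == decomposed)            # False: identical on screen, different code points
print(len(composed), len(decomposed))    # 9 10


def prepare_tts_text(text: str) -> str:
    """Normalize to NFC at the last step before handing text to TTS."""
    return unicodedata.normalize("NFC", text)


assert prepare_tts_text(decomposed) == composed

# JSON round trip: ensure_ascii=True (the default) writes \u00e1, which decodes back to á.
payload = json.dumps({"text": composed})
assert json.loads(payload)["text"] == composed

# Files: always pass the encoding explicitly instead of relying on the OS default, e.g.
# open("lines.txt", "w", encoding="utf-8")
```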

2) Add minimal punctuation for natural prosody

Spanish TTS often improves with punctuation cues:

  • Use commas for phrase breaks
  • Use periods for finality
  • Use question marks and exclamation marks properly, including the opening inverted marks (¿ ¡) that Spanish requires

Example:

✅ “Un relámpago iluminó el cielo, y luego todo quedó en silencio.”

❌ “Un relampago ilumino el cielo y luego todo quedo en silencio”
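If your source text comes from English-centric tools, the opening ¿ and ¡ are often missing. A rough Python heuristic: split on whitespace after sentence-ending punctuation and prepend the matching opening mark when a question or exclamation has none. add_inverted_marks is a hypothetical helper; it always places the mark at the start of the sentence, which is wrong when only the last clause is a question (“Oye, viste?” should become “Oye, ¿viste?”), so review its output.

```python
import re

OPENERS = {"?": "¿", "!": "¡"}


def add_inverted_marks(text: str) -> str:
    """Prepend ¿/¡ to sentences that end in ?/! but contain no opening mark."""
    # Split only where whitespace follows . ? or !, so decimals like 3.5 stay intact.
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    fixed = []
    for sentence in sentences:
        if not sentence:
            continue
        opener = OPENERS.get(sentence[-1])
        if opener and opener not in sentence:
            sentence = opener + sentence
        fixed.append(sentence)
    return " ".join(fixed)


print(add_inverted_marks("Viste el relámpago? Qué relámpago tan brillante!"))
# ¿Viste el relámpago? ¡Qué relámpago tan brillante!
```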

3) Avoid ALL CAPS for Spanish narration

Many speech systems treat ALL CAPS as emphasis, and some may read all-caps words letter by letter as acronyms. Prefer normal sentence casing.

4) Keep numbers consistent

If your content has numbers:

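  • Decide whether to spell numbers out before synthesis, e.g. write “2026” as “dos mil veintiséis” (often better for narration), instead of leaving the reading to the model
  • Normalize units (for example, expand “km” to “kilómetros” and “°C” to “grados Celsius”) so the model doesn’t improvise awkwardly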
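One way to spell numbers out yourself is a small converter. A minimal Python sketch for whole numbers 0–9999 (enough for years like 2026); the function names are hypothetical. It uses cardinal forms, so it reads “1” as “uno” rather than the “un” used before nouns, it does not handle decimals, and unit expansion is left to you.

```python
import re

UNITS = [
    "cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve",
    "diez", "once", "doce", "trece", "catorce", "quince", "dieciséis", "diecisiete",
    "dieciocho", "diecinueve", "veinte", "veintiuno", "veintidós", "veintitrés",
    "veinticuatro", "veinticinco", "veintiséis", "veintisiete", "veintiocho", "veintinueve",
]
TENS = {3: "treinta", 4: "cuarenta", 5: "cincuenta", 6: "sesenta",
        7: "setenta", 8: "ochenta", 9: "noventa"}
HUNDREDS = {1: "ciento", 2: "doscientos", 3: "trescientos", 4: "cuatrocientos",
            5: "quinientos", 6: "seiscientos", 7: "setecientos",
            8: "ochocientos", 9: "novecientos"}


def _below_100(n: int) -> str:
    if n < 30:
        return UNITS[n]              # 0-29 are single words in Spanish
    tens, unit = divmod(n, 10)
    return TENS[tens] if unit == 0 else f"{TENS[tens]} y {UNITS[unit]}"


def _below_1000(n: int) -> str:
    if n < 100:
        return _below_100(n)
    if n == 100:
        return "cien"                # exactly 100 is "cien"; 101+ uses "ciento"
    hundreds, rest = divmod(n, 100)
    head = HUNDREDS[hundreds]
    return head if rest == 0 else f"{head} {_below_100(rest)}"


def number_to_spanish(n: int) -> str:
    """Cardinal Spanish words for 0-9999 (e.g. years)."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0-9999")
    if n < 1000:
        return _below_1000(n)
    thousands, rest = divmod(n, 1000)
    head = "mil" if thousands == 1 else f"{UNITS[thousands]} mil"
    return head if rest == 0 else f"{head} {_below_1000(rest)}"


def spell_numbers(text: str) -> str:
    """Replace standalone 1-4 digit numbers with Spanish words."""
    return re.sub(r"\b\d{1,4}\b", lambda m: number_to_spanish(int(m.group())), text)


print(spell_numbers("En 2026 hubo 3 tormentas."))
# En dos mil veintiséis hubo tres tormentas.
```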

A/B test: relampago vs relámpago (and how to score it)

Run this test early, before you ship, because it tells you whether your stack preserves accents and whether Spanish prosody is working.

Batch A/B run (two samples)

uv run example_onnx.py \
  --voice-style assets/voice_styles/M1.json assets/voice_styles/M1.json \
  --lang es es \
  --text "Relampago." "Relámpago." \
  --batch

Batch mode runs multiple samples in one go (note: auto-chunking is disabled in batch mode).

What to listen for (simple rubric)

Score each clip 1–5:

  • Stress placement: is the stress clearly on the accented syllable (re-LÁM-pa-go)?
  • Vowel quality: Spanish vowels should be clean and consistent, not diphthong-heavy like English.
  • Rhythm: syllables should feel evenly timed, with no single syllable overly “punched.”
  • Consistency across sentences: test it inside a full sentence too:
    • “Un relámpago cayó cerca.”
    • “Vi un relámpago en el horizonte.”

If the accented version isn’t clearly better, it usually means:

  • the accent was stripped somewhere, or
  • you’re not actually in Spanish mode, or
  • the voice/style you chose isn’t tuned well for Spanish.
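If several people score clips, recording the rubric as data keeps the A/B decision consistent. A Python sketch: each clip gets 1–5 scores for the four criteria above, and the accented version counts as “clearly better” when its mean is at least margin points higher. The criterion keys, the helper names, and the 0.5 default margin are assumptions, not a standard.

```python
from __future__ import annotations

from statistics import mean

CRITERIA = ("stress", "vowels", "rhythm", "consistency")


def mean_score(scores: dict[str, int]) -> float:
    """Average the 1-5 rubric scores for one clip, rejecting incomplete sheets."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    if any(not 1 <= scores[c] <= 5 for c in CRITERIA):
        raise ValueError("scores must be between 1 and 5")
    return mean(scores[c] for c in CRITERIA)


def accent_clearly_better(unaccented: dict[str, int], accented: dict[str, int],
                          margin: float = 0.5) -> bool:
    """True if the accented clip beats the unaccented one by at least `margin`."""
    return mean_score(accented) - mean_score(unaccented) >= margin


# Placeholder scores for illustration, not real measurements.
relampago_plain = {"stress": 2, "vowels": 4, "rhythm": 3, "consistency": 3}
relampago_accented = {"stress": 5, "vowels": 5, "rhythm": 4, "consistency": 5}

if accent_clearly_better(relampago_plain, relampago_accented):
    print("Accents survive and Spanish prosody is working.")
else:
    print("Check for stripped accents, a missing --lang es, or a poorly suited voice style.")
```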

Troubleshooting checklist (fast fixes in the right order)

1) Confirm Spanish is explicitly selected

  • Use --lang es in the ONNX example runner.
  • Confirm your app’s code passes Spanish, not a default language.

2) Verify the accent survives your pipeline

Log the text at the exact point where it is handed to the TTS call:

  • Do you see relámpago (with á) in logs?
  • Or did it become relampago?

If it’s stripped, fix the text pipeline first: no TTS setting can reliably “re-infer” missing accents.
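A sketch of that logging check in Python. Logging the ascii() form alongside the text makes an accent unambiguous (á shows up as \xe1) even in log viewers with broken fonts or encodings. log_tts_input is a hypothetical helper, and the “Ã” check is only a heuristic for UTF-8 text that was decoded as Latin-1.

```python
import logging
import unicodedata

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tts")


def log_tts_input(text: str) -> dict:
    """Log exactly what is about to be sent to TTS, plus a few accent diagnostics."""
    info = {
        "text": text,
        # ascii() escapes non-ASCII characters, so á always appears as \xe1.
        "ascii_escaped": ascii(text),
        "is_nfc": unicodedata.is_normalized("NFC", text),
        # Any combining mark after decomposition means some diacritic survived.
        "has_diacritics": any(
            unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", text)
        ),
        # Heuristic: "Ã" in Spanish text usually means UTF-8 was decoded as Latin-1.
        "possible_mojibake": "Ã" in text,
    }
    log.info("TTS input: %r", info)
    return info


log_tts_input("Un relámpago iluminó el cielo.")
```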

3) Increase --total-step for fidelity

If pronunciation is okay but it still sounds thin/robotic:

  • try --total-step 10 or higher (slower, better)

4) Adjust speaking rate with --speed

If it sounds rushed or unnatural:

  • try --speed 1.0 or even 0.95

SuperTonic’s docs recommend staying in ~0.9–1.5 for natural results.

5) Add punctuation to guide prosody

If the word sounds fine alone but weird in sentences:

  • add commas and periods
  • avoid run-on text blocks without punctuation
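To catch run-on blocks before they reach the model, you can flag long stretches of words with no punctuation between them. A Python sketch; find_run_ons is a hypothetical helper and the 20-word threshold is an arbitrary assumption to tune by ear.

```python
from __future__ import annotations

import re


def find_run_ons(text: str, max_words: int = 20) -> list[str]:
    """Return stretches with more than max_words words and no punctuation break."""
    # Treat any of these marks as a prosodic break; everything between is one stretch.
    stretches = re.split(r"[.,;:?!¿¡…]", text)
    return [s.strip() for s in stretches if len(s.split()) > max_words]


print(find_run_ons("Un relámpago iluminó el cielo, y luego todo quedó en silencio."))  # []
```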

6) Try a different voice style

The ONNX examples use voice style JSON files (e.g., assets/voice_styles/M1.json, F1.json, etc.) and you can switch them quickly. Some voices handle Spanish cadence better than others.
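Since batch mode takes one voice style per sample, a single run can render the same sentence in every voice you have. A Python sketch that builds that batch command from a directory of style JSON files, mirroring the batch syntax from the A/B test; the helper name is hypothetical, and because auto-chunking is off in batch mode, keep the sentence short.

```python
from __future__ import annotations

from pathlib import Path


def batch_command_for_styles(style_dir: str, text: str) -> list[str]:
    """One batch run that renders `text` in Spanish once per voice style JSON file."""
    styles = [str(p) for p in sorted(Path(style_dir).glob("*.json"))]
    if not styles:
        raise FileNotFoundError(f"no voice style JSON files found in {style_dir}")
    n = len(styles)
    return [
        "uv", "run", "example_onnx.py",
        "--voice-style", *styles,
        "--lang", *(["es"] * n),       # one language tag per sample
        "--text", *([text] * n),       # same sentence for every voice
        "--batch",
    ]


# Example (run from the directory that contains example_onnx.py):
# import subprocess
# subprocess.run(batch_command_for_styles("assets/voice_styles", "Un relámpago cayó cerca."), check=True)
```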

7) If you’re demoing in-browser, confirm it’s not a browser limitation

There’s a Hugging Face Space demo that runs in the browser (useful for quick checks). If browser output differs from local ONNX output, it can be due to runtime/provider differences rather than your text.

A practical “Relámpago” test suite you can reuse

Use these in order (short → long):

Single word

“Relámpago.”

Minimal sentence

“Un relámpago.”

Common sentence

“Un relámpago iluminó el cielo.”

Prosody stress test

“¡Qué relámpago tan brillante!”

“¿Viste el relámpago?”

Narration

“Un relámpago iluminó el cielo, y luego todo quedó en silencio. Nadie dijo nada.”

If these sound right, you’re usually safe to move on to real content.
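To reuse the suite across voices and releases, it helps to keep it as data. A Python sketch that produces a blank CSV scoring sheet with one row per test sentence and columns for the rubric from the A/B section; the layout, column names, and helper name are assumptions.

```python
from __future__ import annotations

import csv
import io

# The suite above, short to long, as (level, text) pairs.
RELAMPAGO_SUITE = [
    ("Single word", "Relámpago."),
    ("Minimal sentence", "Un relámpago."),
    ("Common sentence", "Un relámpago iluminó el cielo."),
    ("Prosody stress test", "¡Qué relámpago tan brillante!"),
    ("Prosody stress test", "¿Viste el relámpago?"),
    ("Narration", "Un relámpago iluminó el cielo, y luego todo quedó en silencio. Nadie dijo nada."),
]


def scoring_sheet(suite: list[tuple[str, str]], voice: str) -> str:
    """CSV text with one row per sentence and empty cells to fill in by ear."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["voice", "level", "text", "stress", "vowels", "rhythm", "notes"])
    for level, text in suite:
        writer.writerow([voice, level, text, "", "", "", ""])
    return buf.getvalue()


sheet = scoring_sheet(RELAMPAGO_SUITE, voice="M1")
print(sheet)
# To keep it: open("relampago_scores.csv", "w", encoding="utf-8", newline="").write(sheet)
```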

Wrap-up

“Relámpago” is a perfect micro-benchmark for Spanish TTS because it forces your system to handle accents, stress, and rhythm correctly. SuperTonic 2’s workflow makes it easy to:

  • set Spanish explicitly via --lang es,
  • tune quality with --total-step, and
  • tune speaking rate with --speed,

all while keeping a fast, on-device TTS approach.

Geethu

Geethu is an educator with a passion for exploring the ever-evolving world of technology, artificial intelligence, and IT. In her free time, she delves into research and writes insightful articles, breaking down complex topics into simple, engaging, and informative content. Through her work, she aims to share her knowledge and empower readers with a deeper understanding of the latest trends and innovations.
