How to Fix GLM-5 Text Compression on NVIDIA NIM

If you’ve noticed that the GLM-5 model suddenly stops using paragraph breaks and spits out a giant “wall of text” in SillyTavern, you know how frustrating it can be. This issue often appears during “swipes” (regenerating responses) or when the API is under heavy load, turning perfectly good roleplay dialogue into an unreadable block. The good news is that this is a known connection issue between the API and SillyTavern, and it is almost always fixable with a few setting tweaks.
Quick Explainer: Why Does This Happen?
- Network Congestion: When the NVIDIA NIM API is overloaded, streamed data “packets” can arrive merged or truncated. Newline characters (the invisible code for a paragraph break) are usually what gets lost in the crush.
- Streaming Errors: When “Text Streaming” is on, the API sends text in chunks. If a chunk ends right where a paragraph break should be, the next chunk might glue itself directly to the previous one, skipping the break.
- Aggressive Cleanup: SillyTavern has a feature called “Trim Space” that tries to clean up messy text. When the API glitches slightly, this feature sometimes overreacts and deletes the few remaining spaces you actually wanted.
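The streaming failure mode described above can be sketched in a few lines. This is a hypothetical client, not SillyTavern’s actual code: a parser that trims each incoming chunk before concatenating will silently eat a paragraph break that lands exactly on a chunk boundary.

```javascript
// Streamed text arrives in arbitrary chunks; a "\n\n" can land on a boundary.
const chunks = ['First paragraph.', '\n\n', 'Second paragraph.'];

// Correct reassembly keeps the break:
const correct = chunks.join('');

// A buggy client that trims every chunk before joining loses it:
const buggy = chunks.map(c => c.trim()).join('');

console.log(JSON.stringify(correct)); // "First paragraph.\n\nSecond paragraph."
console.log(JSON.stringify(buggy));   // "First paragraph.Second paragraph."
```

The text content survives either way; it is only the invisible whitespace that disappears, which is why the output reads as a wall of text rather than gibberish.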
Immediate Fixes You Can Try
Start with the easiest solutions first. These settings are found directly inside your SillyTavern interface.
1. Disable “Trim Space” (Easiest Fix)
The most common culprit is a feature designed to help you, which backfires when the connection is spotty.
- Go to User Settings (the person icon).
- Look for the Advanced or Input/Output section.
- Uncheck or toggle off Trim Space.
Why this works: When the API drops a packet, it can leave behind stray or partial whitespace. “Trim Space” sees that mess and deletes it entirely. Turning it off forces SillyTavern to keep whatever spacing arrives, even if it’s imperfect.
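As a rough illustration of how an over-aggressive cleanup pass destroys formatting (a stand-in regex for demonstration, not SillyTavern’s actual Trim Space implementation):

```javascript
// What the API delivered: a paragraph break wrapped in stray spaces.
const received = 'He nodded. \n\n "Fine," she said.';

// An aggressive cleanup collapses ALL whitespace runs to a single space,
// taking the paragraph break down with the stray spaces.
const cleaned = received.replace(/\s+/g, ' ');

console.log(JSON.stringify(cleaned)); // "He nodded. \"Fine,\" she said."
```

The cleanup is doing exactly what it was told: it cannot tell an accidental extra space from an intentional paragraph break, so both get flattened.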
2. Adjust Your Prompting Strategy
Sometimes you have to “bully” the model into formatting correctly by making the instructions impossible to ignore.
- Open your Advanced Formatting settings.
- In the Post-History Instructions or Main Prompt, add a strict rule like:
Ensure you use double newline characters between every paragraph. Never produce dense blocks of text.
- Pro Tip: Set strict word count limits (e.g., “Write under 200 words”). Shorter responses are less likely to suffer from network “chunking” errors.
3. Use a Regex Script (The “Nuclear” Option)
If settings don’t work, you can force SillyTavern to insert breaks automatically using a Regular Expression (Regex) script. This acts like an automatic proofreader that looks for missing paragraph breaks and restores them before you even see the message.
- Go to the Extensions menu (the puzzle piece icon) → Regex.
- Create a new script and set Placement to “Global” and Source to “AI Output”.
- Use these values to force breaks after dialogue:
  - Regex Pattern: /" /g
  - Replacement String: "\n\n
- Note: This pattern looks for a closing quote followed by a space and replaces it with the quote plus a double paragraph break.
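You can preview what the script will do by running the same replacement outside SillyTavern. Plain JavaScript works here because SillyTavern’s Regex scripts use JavaScript-style patterns:

```javascript
// The same pattern and replacement as the Regex script above.
const pattern = /" /g;        // closing quote followed by a space
const replacement = '"\n\n';  // keep the quote, force a double break

const wall = '"Hello," she said. "Stay close," he replied. The hallway was dark.';
const fixed = wall.replace(pattern, replacement);

console.log(fixed);
// "Hello,"
//
// she said. "Stay close,"
//
// he replied. The hallway was dark.
```

Note the trade-off: this blunt pattern also breaks before dialogue tags like “she said,” not just between paragraphs. For most users that’s far more readable than a wall of text, but you can tighten the pattern later if the extra breaks bother you.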
Advanced Troubleshooting
If the basics above haven’t solved it, the issue might be strictly related to how the data is being sent over the network.
Turn Off Streaming
The error almost exclusively happens because of “Server-Sent Events” (SSE), the technology that lets text type out letter by letter.
- Go to your API Settings (the plug icon).
- Uncheck Streaming.
- Warning: You will have to wait for the entire message to generate before seeing anything. Additionally, some users report getting “Bad Request” errors on the NVIDIA NIM free tier when streaming is off. If that happens, you’ll need to turn streaming back on and rely on the Regex fix above.
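Under the hood, the Streaming checkbox just flips one flag on the request body. A minimal sketch of a non-streaming call shape against NIM’s OpenAI-compatible endpoint (the model ID here is an assumption; check your NIM catalog entry for the real one):

```javascript
// Sketch of a non-streaming chat completion request body.
// Model ID and endpoint path are assumptions based on NIM's
// OpenAI-compatible API; verify against your NIM catalog entry.
const body = {
  model: 'zai/glm-5',                                  // assumed model ID
  messages: [{ role: 'user', content: 'Hello there' }],
  stream: false,                                       // the setting this section toggles
};

// fetch('https://integrate.api.nvidia.com/v1/chat/completions', {
//   method: 'POST',
//   headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
//   body: JSON.stringify(body),
// });

console.log(body.stream); // false
```

With `stream: false`, the full message arrives in a single JSON response, so there are no chunk boundaries for a newline to fall into.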
Check Your Context Templates
Using the wrong “Instruct Mode” template can confuse the model, making it more likely to glitch out.
- Go to Advanced Formatting (the ‘A’ icon).
- Ensure your Context Template matches the model family. For GLM-5, standard templates like ChatML or Llama 3 Instruct often provide better stability than the default presets.
What NOT to Do
- Don’t crank up the “Temperature”: Raising the temperature (randomness) generally makes formatting worse, not better. Keep samplers neutral (Temperature around 0.8, Min-P around 0.05).
- Don’t assume the model is “dumb”: GLM-5 is a massive 744B parameter model. It knows how to write paragraphs. Re-rolling the same generation without changing settings usually won’t fix it because the issue is the connection, not the model’s intelligence.
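The “neutral” sampler values above can be kept together as a preset. The parameter names follow the common OpenAI-style convention; min_p is a SillyTavern/backend extension, not part of the official OpenAI API, so confirm your endpoint accepts it:

```javascript
// Neutral sampler preset matching the guidance above.
// Names follow OpenAI-style request conventions; min_p support varies by backend.
const samplers = {
  temperature: 0.8, // moderate randomness; raising this worsens formatting
  min_p: 0.05,      // light tail trimming without distorting the distribution
};

console.log(samplers.temperature, samplers.min_p); // 0.8 0.05
```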
Conclusion
The “wall of text” bug on GLM-5 is annoying, but it’s almost always a transport error rather than a broken model. The problem is usually that the NVIDIA NIM API is dropping the “newline” code during busy times. To get back to writing, disable “Trim Space” first. If that fails, set up the Regex script to force spacing on dialogue. These two steps resolve the vast majority of formatting collapse issues.



