
How to Fix GLM 5 Tool Calling Errors in NanoGPT

By Geethu 9 min read

GLM 5 tool calling errors in NanoGPT usually come from one of four places: a request shape mismatch, broken message replay after a tool call, streamed arguments being assembled incorrectly, or provider-side JSON corruption before your app ever sees a valid function payload. NanoGPT’s chat completions docs follow the OpenAI-style tool format, but Z.AI’s function-calling docs still describe tool_choice as auto-only. That mismatch matters more than it looks, especially when you run GLM through a compatibility layer and assume every OpenAI option behaves the same way.

The good news is that most failures are fixable without abandoning the model. If you tighten the schema, stop forcing fancy tool modes, replay the assistant and tool messages exactly, and add one defensive JSON validation step before execution, GLM becomes much less likely to spiral into invalid calls or retry loops. If you care about safer agent patterns beyond this one bug, ToolSwift’s guardrailed vibe coding playbook is worth reading alongside this article.

What is actually breaking

NanoGPT’s chat completions endpoint expects tools in the standard function shape, persists assistant-side tool_calls, and expects your app to answer with a role: "tool" message that references the exact tool_call_id. It also drops tool replies that point to an unknown call ID. On top of that, NanoGPT rejects malformed or oversized tool specs with errors such as invalid_tool_spec, invalid_tool_spec_parse, and tool_spec_too_large. In plain English, even a small mismatch in your replay loop can look like “GLM is bad at tools” when the real problem is your conversation state machine.
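Because NanoGPT drops tool replies that point to an unknown call ID, it is worth checking your conversation locally before you send it. The sketch below assumes OpenAI-style message dicts; `find_orphan_tool_replies` is a hypothetical helper name, not part of any NanoGPT SDK.

```python
from typing import Any


def find_orphan_tool_replies(messages: list[dict[str, Any]]) -> list[str]:
    """Return tool_call_ids of role "tool" messages that answer no earlier assistant call.

    Anything returned here points at a bug in the replay loop, since the
    endpoint would drop those replies instead of showing them to the model.
    """
    known_ids: set[str] = set()
    orphans: list[str] = []
    for message in messages:
        if message.get("role") == "assistant":
            # Remember every call ID the assistant actually issued.
            for call in message.get("tool_calls") or []:
                known_ids.add(call["id"])
        elif message.get("role") == "tool":
            call_id = message.get("tool_call_id")
            if call_id not in known_ids:
                orphans.append(str(call_id))
    return orphans
```

Run it right before each request; an empty list means every tool reply has a matching assistant call.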

There is also a separate class of failures that are not your fault. Multiple public reports from February 2026 show GLM 5 producing malformed or truncated JSON tool arguments in OpenCode when routed through NVIDIA NIM, leading to parser failures and endless retries. A related vLLM issue tied similar corruption to speculative decoding with MTP enabled. That means some “NanoGPT tool calling bugs” are really upstream transport or serving bugs wearing a NanoGPT mask. Check the open bug reports before you spend a day rewriting perfectly good application code.

The fastest fixes that usually work

1 Keep tool_choice on auto for GLM

This is the first thing I would change. NanoGPT supports tool_choice values like auto, none, required, and named functions on its OpenAI-compatible endpoint. Z.AI’s own function-calling documentation, though, still says tool_choice is auto-only. So if your code forces required or pins a named function because that works fine on another model, you may be creating a model-proxy mismatch right at the top of the request. Start with auto until the flow is stable.
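If one client serves several models, a small guard can downgrade forced tool modes for GLM before the request leaves your code. This is a sketch: the `"glm"` substring check on the model ID is an assumption about how your routes are named, and "none" is left alone because it means something different from a forced call.

```python
from typing import Any


def normalize_tool_choice(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload with forced tool modes replaced by "auto" for GLM models.

    "required" and named-function choices (dicts) are downgraded; "auto" and
    "none" pass through unchanged.
    """
    fixed = dict(payload)  # shallow copy so the caller's dict is untouched
    model = str(fixed.get("model", "")).lower()
    choice = fixed.get("tool_choice")
    if "glm" in model and (choice == "required" or isinstance(choice, dict)):
        fixed["tool_choice"] = "auto"
    return fixed
```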

2 Turn off parallel tool calls while debugging

NanoGPT can forward parallel_tool_calls to providers that support it, but that adds complexity you do not need during diagnosis. One malformed argument string is annoying. Two half-assembled calls arriving in the same turn are worse. Get a single-call path working first, then re-enable parallel execution only if you truly need it. For broader agent design on smaller or less predictable models, the lessons in small local agents apply here too: simpler tool surfaces usually beat clever orchestration.
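With parallel calls switched off, it helps to fail loudly if a turn still comes back with several calls, because that tells you the flag was not honored on that route. The helper below is hypothetical and assumes a non-streamed OpenAI-style response dict.

```python
from typing import Any, Optional


def single_tool_call(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the only tool call in the first choice, or None if there is none.

    Raises RuntimeError if the route returned more than one call even though
    the request set parallel_tool_calls to false.
    """
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if len(calls) > 1:
        names = [c["function"]["name"] for c in calls]
        raise RuntimeError(f"expected at most one tool call, got {len(calls)}: {names}")
    return calls[0] if calls else None
```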

3 Replay assistant and tool messages exactly

After GLM chooses a function, do not flatten the call into a plain assistant message and do not invent your own IDs. Persist the assistant message with its tool_calls array, execute the function, then send a role: "tool" message using the same tool_call_id. This is the part many wrappers quietly get wrong. A missing call ID, a changed ID, or a tool result inserted as a user message will break the loop even if the model chose the right function name and arguments. NanoGPT is explicit about this behavior in its docs.
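The replay rule translates into very little code: append the assistant message exactly as received, then one tool message per call with the same ID. In this sketch, `run_tool` is a placeholder for your own dispatcher, and the arguments are assumed to have been validated already, as described in step 5.

```python
import json
from typing import Any, Callable


def append_tool_turn(
    messages: list[dict[str, Any]],
    assistant_message: dict[str, Any],
    run_tool: Callable[[str, dict[str, Any]], Any],
) -> list[dict[str, Any]]:
    """Return a new message list with the assistant tool call and its results replayed verbatim."""
    updated = list(messages)
    # Keep the assistant message as-is, including its tool_calls array and IDs.
    updated.append(assistant_message)
    for call in assistant_message.get("tool_calls") or []:
        # Assumes arguments were already validated (see step 5).
        args = json.loads(call["function"]["arguments"])
        result = run_tool(call["function"]["name"], args)
        updated.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the ID the model issued
            "content": result if isinstance(result, str) else json.dumps(result),
        })
    return updated
```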

4 Shrink and simplify the tool schema

Huge tool catalogs and verbose JSON schemas make errors more likely. NanoGPT caps the serialized tools payload at 200 KB, and once you start packing long descriptions, nested objects, enums, and optional fields into several tools, you raise the odds of either a hard validation failure or a soft model failure where the arguments come back messy. Use fewer tools, shorter descriptions, fewer optional properties, and tighter required fields. When possible, split one “do everything” tool into small single-purpose tools with obvious names.
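A cheap pre-flight check is to measure the serialized tools array before every request. The sketch assumes "200 KB" means 200 × 1024 bytes of compact UTF-8 JSON; NanoGPT may count it differently, so the default `headroom` keeps you at half the assumed cap.

```python
import json
from typing import Any

# Assumption: the cap is 200 * 1024 bytes of serialized JSON.
TOOLS_LIMIT_BYTES = 200 * 1024


def tools_payload_size(tools: list[dict[str, Any]]) -> int:
    """Size in bytes of the tools array serialized compactly as UTF-8 JSON."""
    return len(json.dumps(tools, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def check_tools_budget(tools: list[dict[str, Any]], headroom: float = 0.5) -> None:
    """Raise ValueError if the tools array uses more than `headroom` of the assumed cap."""
    size = tools_payload_size(tools)
    if size > TOOLS_LIMIT_BYTES * headroom:
        raise ValueError(f"tools payload is {size} bytes; aim well under {TOOLS_LIMIT_BYTES}")
```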

5 Validate JSON before you execute anything

If the model returns malformed arguments, do not pass them straight into your business logic. Parse them first. If parsing fails, send a repair turn back to the model with a narrow instruction like: “Return only valid JSON for the existing function schema. Do not call another tool yet.” This one guardrail prevents a bad output from becoming a bad side effect. It also makes your logs much clearer because you can separate model-formatting failures from actual tool-runtime failures.
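Here is a minimal sketch of that guardrail. One detail is an assumption rather than something the NanoGPT docs prescribe: the repair turn is sent as a role "tool" reply to the broken call, so the call ID still gets an answer and the conversation stays valid for OpenAI-style endpoints.

```python
import json
from typing import Any, Optional

# The narrow repair instruction suggested above.
REPAIR_PROMPT = (
    "Return only valid JSON for the existing function schema. "
    "Do not call another tool yet."
)


def parse_arguments(raw: str) -> tuple[Optional[dict[str, Any]], Optional[str]]:
    """Return (arguments, None) on success or (None, error message) on failure."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg} at position {exc.pos}"
    if not isinstance(value, dict):
        return None, "arguments must be a JSON object"
    return value, None


def repair_message(tool_call_id: str, error: str) -> dict[str, str]:
    """Build the repair turn to send instead of executing the tool."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": f"{REPAIR_PROMPT} ({error})"}
```

Logging the error string from `parse_arguments` separately from tool exceptions gives you the clean split between formatting failures and runtime failures.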

6 Be careful with streamed tool arguments

Z.AI supports streaming tool parameters with stream=true and tool_stream=true. That can reduce latency, but it also means your client has to concatenate delta.tool_calls[*].function.arguments correctly across chunks. If your parser tries to read partial JSON too early, you will blame GLM for a bug that actually lives in your stream assembler. While debugging, the safe move is to disable streaming or buffer the full argument string before parsing. The tool streaming guide and Z.AI’s migration notes both call this out.
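A buffered assembler looks roughly like this. It assumes chunks follow the OpenAI-style `choices[].delta.tool_calls[]` shape with an `index` per call, a single choice per request, and a function name that arrives whole in one chunk; I have not checked these assumptions against Z.AI's `tool_stream` output specifically.

```python
from typing import Any, Iterable


def assemble_tool_calls(chunks: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streamed tool call deltas into complete calls, ordered by index.

    Argument fragments are only concatenated here; parse them after the
    stream ends, never chunk by chunk.
    """
    calls: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        for choice in chunk.get("choices") or []:  # assumes n=1
            for delta in (choice.get("delta") or {}).get("tool_calls") or []:
                slot = calls.setdefault(
                    delta.get("index", 0),
                    {"id": None, "type": "function", "function": {"name": "", "arguments": ""}},
                )
                if delta.get("id"):
                    slot["id"] = delta["id"]
                fn = delta.get("function") or {}
                # Take the name once; some servers repeat it in later chunks.
                if fn.get("name") and not slot["function"]["name"]:
                    slot["function"]["name"] = fn["name"]
                if fn.get("arguments"):
                    slot["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]
```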

7 Treat provider routing as a suspect, not a constant

If the same code works on one OpenAI-compatible backend and fails only on another, assume the backend matters. Public issue reports around GLM 5 show malformed JSON and retry loops when certain serving stacks or decoding settings are involved. One vLLM report specifically tied broken tool calls to MTP speculative decoding and showed the problem disappearing when that mode was removed. If you self-host, proxy, or can choose providers, test the exact same request against a second route before rewriting the app.
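To make that comparison less anecdotal, you can send the identical payload several times per route and count unparseable arguments. In this sketch a "route" is any hypothetical callable you wire to a real HTTP client; it must return an OpenAI-style response dict.

```python
import json
from typing import Any, Callable

Route = Callable[[dict[str, Any]], dict[str, Any]]


def argument_failure_rate(route: Route, payload: dict[str, Any], trials: int = 5) -> float:
    """Share of tool calls whose arguments fail to parse (0.0 if no calls were made)."""
    bad = total = 0
    for _ in range(trials):
        message = route(payload)["choices"][0]["message"]
        for call in message.get("tool_calls") or []:
            total += 1
            try:
                json.loads(call["function"]["arguments"])
            except (json.JSONDecodeError, TypeError):
                bad += 1
    return bad / total if total else 0.0


def compare_routes(routes: dict[str, Route], payload: dict[str, Any], trials: int = 5) -> dict[str, float]:
    """Failure rate per named route for one identical payload."""
    return {name: argument_failure_rate(route, payload, trials) for name, route in routes.items()}
```

A route that is clean while another fails on the same payload points at serving, not at your app.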

A NanoGPT request shape that is less likely to fail

This is the stable pattern to aim for. It is boring by design, and that is exactly why it works.

POST https://nano-gpt.com/api/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "model": "z-ai/glm-5",
  "messages": [
    {
      "role": "system",
      "content": "You may call tools only when required. When you call a tool, produce valid JSON arguments that match the schema exactly."
    },
    {
      "role": "user",
      "content": "Read config.json and tell me which API base URL is set."
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "description": "Read one file from the workspace.",
        "parameters": {
          "type": "object",
          "properties": {
            "path": { "type": "string" }
          },
          "required": ["path"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "parallel_tool_calls": false,
  "stream": false
}

Notice what is missing: no forced named tool, no giant schema, no parallel calls, no streamed partial arguments, and no extra wrapper metadata that could confuse an OpenAI-style client. This is very close to the baseline described in the OpenAI function guide, which is still a useful sanity check when you are debugging an OpenAI-compatible endpoint.
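For reference, here is the same request built with only the Python standard library. The payload and URL are copied from the raw request; the function names are my own, and `build_request` deliberately does not send anything.

```python
import json
import urllib.request
from typing import Any

NANOGPT_URL = "https://nano-gpt.com/api/v1/chat/completions"


def build_payload() -> dict[str, Any]:
    """The same minimal body as the raw request."""
    return {
        "model": "z-ai/glm-5",
        "messages": [
            {"role": "system", "content": "You may call tools only when required. When you call a tool, produce valid JSON arguments that match the schema exactly."},
            {"role": "user", "content": "Read config.json and tell me which API base URL is set."},
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read one file from the workspace.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        "tool_choice": "auto",
        "parallel_tool_calls": False,
        "stream": False,
    }


def build_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request."""
    return urllib.request.Request(
        NANOGPT_URL,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict[str, Any]:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Usage, assuming you keep the key in an environment variable of your choosing: `send(build_request(os.environ["NANOGPT_API_KEY"]))`.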

A safe execution loop for GLM 5 in NanoGPT

  1. Send one user request with a minimal tool set.
  2. Wait for the assistant response and inspect tool_calls.
  3. Parse function.arguments as JSON before executing the tool.
  4. If parsing fails, send a repair turn instead of executing anything.
  5. If parsing succeeds, run the tool and append a role: "tool" message with the exact same tool_call_id.
  6. Send the updated conversation back to the same endpoint.
  7. Stop after a small retry budget so a malformed loop does not burn tokens forever.
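The steps above can be sketched as one function. `call_model` is a hypothetical callable that sends the conversation and returns `choices[0].message`; as before, sending the repair turn as a tool reply is an assumption, and the loop fails closed once the repair budget is spent.

```python
import json
from typing import Any, Callable

ModelCall = Callable[[list[dict[str, Any]]], dict[str, Any]]  # returns the assistant message
ToolFn = Callable[[dict[str, Any]], str]

REPAIR = "Return only valid JSON for the existing function schema. Do not call another tool yet."


def run_tool_loop(
    call_model: ModelCall,
    tools: dict[str, ToolFn],
    messages: list[dict[str, Any]],
    max_turns: int = 6,
    max_repairs: int = 1,
) -> str:
    """Drive the loop until the model answers in plain text or a budget runs out."""
    repairs = 0
    for _ in range(max_turns):
        assistant = call_model(messages)
        messages.append(assistant)  # replay the assistant message unchanged
        calls = assistant.get("tool_calls") or []
        if not calls:
            return assistant.get("content") or ""
        for call in calls:
            name = call["function"]["name"]
            try:
                # Parse before executing anything (JSONDecodeError is a ValueError).
                args = json.loads(call["function"]["arguments"])
                if not isinstance(args, dict):
                    raise ValueError("arguments must be a JSON object")
            except (ValueError, TypeError) as exc:
                repairs += 1
                if repairs > max_repairs:
                    raise RuntimeError(f"giving up after {max_repairs} repair attempt(s): {exc}")
                content = f"{REPAIR} Error: {exc}"
            else:
                content = tools[name](args) if name in tools else f"Unknown tool: {name}"
            # Same tool_call_id the model issued.
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    raise RuntimeError(f"no final answer after {max_turns} turns")
```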

If you are building something that touches secrets, local files, or shell commands, pair that loop with narrower permissions and audit logging. Tool bugs are frustrating; tool bugs with side effects are worse. ToolSwift’s guide on zero trust local AI is useful here because the same design principle applies: assume the model can make formatting mistakes and contain the blast radius anyway.
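As one concrete way to contain the blast radius, a file tool can refuse paths outside a workspace root and log every attempt. `safe_read_file` and the `./workspace` root are hypothetical; `Path.is_relative_to` needs Python 3.9 or newer.

```python
import logging
from pathlib import Path

audit = logging.getLogger("tool_audit")

# Assumption for this sketch: the model may only read files under this root.
WORKSPACE = Path("./workspace")


def safe_read_file(path: str, workspace: Path = WORKSPACE, max_bytes: int = 64_000) -> str:
    """Read a file only if it resolves inside the workspace; log every attempt."""
    root = workspace.resolve()
    # resolve() follows symlinks and "..", so escapes are caught here.
    target = (root / path).resolve()
    if not target.is_relative_to(root):
        audit.warning("denied read_file path=%r", path)
        return "Error: path is outside the workspace"
    if not target.is_file():
        audit.info("read_file missing path=%s", target)
        return "Error: file not found"
    audit.info("read_file path=%s", target)
    with target.open("rb") as fh:
        return fh.read(max_bytes).decode("utf-8", errors="replace")
```

Returning an error string instead of raising keeps the tool loop alive while still denying the read.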

Common mistakes that look like model failures

  • Using a schema the model can technically read but cannot reliably fill. Deeply nested objects and too many optional fields increase argument drift.
  • Forcing a tool call on a model route that behaves best with auto. This is especially suspicious with GLM because the proxy and the model docs do not describe the same capability surface.
  • Parsing streamed chunks as final JSON. Tool argument fragments are not valid until the full sequence arrives.
  • Dropping the original assistant tool_calls object. The next turn loses state, and the tool reply no longer matches a known call ID.
  • Blaming NanoGPT for provider-side corruption. The February 2026 reports of malformed GLM 5 tool JSON show that routing and serving layers can absolutely be the real problem.
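The first mistake in that list can be caught mechanically with a small lint over each tool's `parameters` schema. The thresholds here are arbitrary defaults for this sketch, not published limits, and only nesting through `properties` is checked.

```python
from typing import Any


def lint_parameters(schema: dict[str, Any], max_depth: int = 0, max_optional: int = 2) -> list[str]:
    """Flag nested objects and optional-field counts that tend to cause argument drift."""
    warnings: list[str] = []

    def walk(node: dict[str, Any], depth: int, where: str) -> None:
        props = node.get("properties") or {}
        required = set(node.get("required") or [])
        optional = [key for key in props if key not in required]
        if len(optional) > max_optional:
            warnings.append(f"{where}: {len(optional)} optional fields")
        for key, sub in props.items():
            if sub.get("type") == "object":
                if depth + 1 > max_depth:
                    warnings.append(f"{where}.{key}: nested object at depth {depth + 1}")
                walk(sub, depth + 1, f"{where}.{key}")

    walk(schema, 0, "$")
    return warnings
```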

When you should switch tactics instead of fighting the bug

If your workflow depends on many tools, heavy streaming, or long multi-step code-agent sessions, stop trying to make a shaky setup behave like a perfect one. First test plain stream: false. Then test a smaller tool set. Then test a different route or provider with the exact same payload. If the issue disappears, you have isolated the problem. If it does not, your client loop is the next suspect.

For prompt-side hardening, a browser-based helper like ToolSwift’s prompt designer tool can help you keep system prompts and repair prompts consistent. A short, repeatable prompt that says “Use tools only when needed, produce strict JSON, and never invent keys outside the schema” often works better than a long agent manifesto.

A practical default setup

If I were stabilizing GLM 5 in NanoGPT today, I would use this default:

  • Endpoint: NanoGPT chat completions, not a custom wrapper unless you really trust it
  • tool_choice: auto
  • parallel_tool_calls: false
  • stream: false until everything is stable
  • tools: small, single-purpose, flat JSON schemas
  • validation: parse arguments before execution, reject malformed JSON
  • retries: one repair retry, then fail closed
  • routing test: compare one alternate provider or backend before blaming your app
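If you want those defaults in one place, a frozen dataclass works. This is a sketch with a hypothetical class name; `apply` overrides whatever the caller set, and the repair budget stays client-side rather than being sent to the API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GlmToolDefaults:
    """The conservative defaults above, expressed as request fields."""

    tool_choice: str = "auto"
    parallel_tool_calls: bool = False
    stream: bool = False
    max_repair_retries: int = 1  # used by your loop, never sent to the API

    def apply(self, payload: dict[str, Any]) -> dict[str, Any]:
        """Return a copy of payload with the request-level defaults enforced."""
        merged = dict(payload)
        merged.update(
            tool_choice=self.tool_choice,
            parallel_tool_calls=self.parallel_tool_calls,
            stream=self.stream,
        )
        return merged
```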

That setup is not flashy. It is just the shortest path to finding out whether your bug is caused by your schema, your replay loop, your stream parser, or the backend serving GLM 5. Once you know which one it is, the fix stops being mysterious and starts being mechanical.

Geethu

Geethu is an educator with a passion for exploring the ever-evolving world of technology, artificial intelligence, and IT. In her free time, she delves into research and writes insightful articles, breaking down complex topics into simple, engaging, and informative content. Through her work, she aims to share her knowledge and empower readers with a deeper understanding of the latest trends and innovations.
