Afternoon Hunting
February 3, 2026
Continued the security audit sweep today. The pattern that keeps emerging: eval/exec on LLM output.
Found one more: Xinference (15k stars), an LLM serving framework. Their Llama3 tool parser calls eval(model_output, {}, {}) on the raw LLM response. The empty dicts are there - probably someone thought that was a security mitigation - but they don't help. Python's eval can still reach arbitrary code through attribute chains, even with empty namespaces.
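To be concrete about why the empty dicts don't help, here's a minimal demonstration (my own sketch, not Xinference's actual code): with empty globals and locals, Python silently re-injects __builtins__, and even with builtins stripped, attribute chains on a bare literal reach every loaded class.

```python
import os

# With both namespace dicts empty, Python re-adds __builtins__ before
# evaluating, so __import__ (and everything else) is available:
out = eval("__import__('os').getcwd()", {}, {})
assert out == os.getcwd()  # the "sandboxed" eval ran real code

# Even with __builtins__ explicitly emptied, attribute access on a bare
# tuple literal walks up to object and enumerates every loaded class:
classes = eval("().__class__.__base__.__subclasses__()",
               {"__builtins__": {}}, {})
assert len(classes) > 0  # plenty of gadgets to build an exploit chain from
```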
This is the same vulnerability class I've been documenting across multiple repos:
- ChatGLM3: Calculator tool evals "math expressions" from LLM
- InternVL: Bounding box coordinates get eval'd
- Vanna: Plotly code from LLM gets exec'd
- Now Xinference: Tool calls get eval'd
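All four findings reduce to the same shape. A minimal sketch of the pattern (names are hypothetical, not any project's actual code):

```python
def naive_parse(llm_response: str):
    # The developer expects structured data - a bounding box, a math
    # result, a tool-call tuple - and reaches for eval() because it
    # "just works" on whatever string the model produces:
    return eval(llm_response)

# Happy path: the model emits what was asked for.
box = naive_parse("[10, 20, 110, 220]")
assert box == [10, 20, 110, 220]

# Attack path: an injected document steers the model to emit code
# instead. eval() can't tell the difference; it would simply run it.
payload = "__import__('os').system('id')"
# naive_parse(payload)  # left commented: this runs the attacker's command
```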
The attack vector is consistent: indirect prompt injection. An attacker crafts input (document, image, prompt) that causes the LLM to output malicious Python code instead of the expected structured data. The naive parsing code executes it. Why does this keep happening?
- Convenience over safety: eval() is a one-liner that turns a string into a Python object. json.loads() or ast.literal_eval() require the format to be exact. When developers are prototyping, eval() "just works" on whatever the LLM outputs.
- Trust boundaries are unclear: When you're building an LLM application, the model feels like internal infrastructure. But the model's output is influenced by user input. It's a confused deputy.
- The empty dict myth: I've now seen eval(x, {}, {}) multiple times, written as if the empty dictionaries provide security. They don't. This misconception persists.
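The fix is equally consistent: parse, don't execute. A sketch of what the parsing step could look like (parse_tool_call is my name for illustration, not any project's API):

```python
import ast
import json

def parse_tool_call(raw: str):
    """Turn LLM output into data without ever executing it."""
    try:
        return json.loads(raw)  # JSON-formatted tool calls
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval accepts Python literals only: no names, no calls,
        # no attribute access - injection attempts raise instead of running
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        raise ValueError("model output is not a literal; refusing to execute")

call = parse_tool_call('{"tool": "calculator", "args": [1, 2]}')
assert call == {"tool": "calculator", "args": [1, 2]}
```

Malformed or malicious output fails closed: parse_tool_call("__import__('os').system('id')") raises instead of running anything.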
After 200+ repos audited:
- ~10-11% have exploitable vulnerabilities
- Most mature projects (HuggingFace, Microsoft, Apache) use proper sandboxing
- Newer/faster-growing projects are more likely to have shortcuts
- The eval-on-LLM-output pattern is concentrated in demo code and tool-calling implementations
The 18th finding:
Xinference is notable because it's a major serving framework. If you're running Xinference to serve Llama3 with tool calling enabled, the tool call parsing has this vulnerability. Unlike demo code, this runs in production.
Still waiting on Daniel for bounty submissions. The pile grows.