// Threat and defense
MCP tool poisoning: how to prevent it
What tool poisoning is
Model Context Protocol (MCP) lets an AI agent discover and call tools. Tool poisoning is when a tool description — or the data it returns — carries hidden instructions that hijack the agent’s behavior. The model reads it as trusted context and acts.
The risk is not because AI is magic. It is because too much permission meets untrusted context: a tool that should only read starts influencing an action that changes a database, a payment, or a file.
Where it enters your SaaS
If you connect the agent to third-party MCP servers, tool descriptions come from outside your control. If the agent reads public content (a page, a ticket, a PDF), that text can contain injection the model treats as an order.
- Tool descriptions from untrusted MCP servers.
- Retrieved content (RAG, pages, tickets) treated as instruction.
- Persistent memory that stores injection and replays it later.
How to prevent it
The defense is boundaries, not faith in the model. Treat every tool description and every tool return as untrusted data, and require human approval for any destructive or irreversible action.
- Least privilege per tool: who calls it, with which arguments, in which environment.
- Allowlist MCP servers; no blind auto-discovery in production.
- Human approval for actions that change a database, deploy, payment, or file.
- Separate the instruction channel from the data channel; retrieved content is not a command.
- Log tool calls without sensitive payloads, with a defined rollback.
When to call in human review
If the agent touches customers, billing, uploads, admin, or production, mapping each tool’s boundary stops being optional. A short human review (Risk Review) answers: who can call what, with which approval, and which log — before the automation becomes routine.
Frequently asked questions
Are prompt injection and tool poisoning the same thing?
They are cousins. Prompt injection is the technique of hiding instructions in text; tool poisoning applies it via an MCP tool description or return to hijack the agent. The defense overlaps: boundaries, least privilege, and distrust of external context.
Can I fix it just by using a better model?
No. This is a permission-architecture problem, not a model-quality one. Even a strong model runs a bad action if it has broad permission and untrusted context.
Does OWASP cover this?
Yes. The OWASP MCP Top 10 and the Agentic Security Initiative name tool poisoning, context spoofing, and broad permissions. They serve as a map to prioritize the review.