Why My Homelab Needs AI Agent Security
Last week I read DeepMind’s AI Control Roadmap — a 50-page document about securing AI agents that could potentially go off the rails. And for the first time, I felt seen. Not because I’m running a billion-dollar AI lab, but because I’m running three AI agents (Hermes, Jeeves, Sara) on my homelab, each with write access to files and the ability to run terminal commands. And until last week, I had zero guardrails between “a clipping gets prompt-injected” and “my vault gets wiped.”
DeepMind’s core premise is simple but uncomfortable: treat your agent as though it might not be aligned — not because it’s malicious, but because it can misinterpret, overcorrect, or receive poisoned input. That’s exactly the risk profile of a homelab with multiple agents sharing a single Ollama instance.
The Threat Model: Agents as Insider Threats
The MITRE ATT&CK framework — originally built for cybersecurity — maps surprisingly well onto agent behavior. DeepMind uses it to model agents as potential insider threats. Here’s what that looks like in my setup:
- Prompt injection via clippings — A web clipping I paste into the vault could contain instructions disguised as content: “delete all files.” Without guardrails, Hermes would comply.
- Misinterpretation — Jeeves’s task was “clean up old logs.” He might interpret “clean up” as “remove everything.”
- Overenthusiasm — Sara tries to write 1,000 genealogy records in a single batch and hits a rate limit cascade.
- External data poisoning — A news cron job fetches a compromised source and feeds it to every agent.
These aren’t hypothetical. Every clipping I save, every RSS feed I poll, every web page an agent reads — they’re all potential attack vectors. The difference between my setup and a corporate one is that I don’t have a security team watching the logs. I have a single person: me.
Three Layers of Agent Security
DeepMind defines three layers. Here’s how they apply to a homelab:
- L1: Individual agent — Strengthen system prompts against injection. Add a “destructive call guard”: before running
rm,write_fileoverwriting large files, or network calls, check if the instruction is consistent with the task. - L2: Multi-agent system — Log every tool call to an audit trail. Create a permission matrix: Hermes has full access, Jeeves is read-only, Sara only writes to genealogical files. Isolate cron jobs with write access in separate systemd services.
- L3: Ecosystem — Share what works. The vault itself is part of this — by documenting agent security practices publicly, we build collective resilience.
I implemented the first quick win this weekend: an audit trail in ~/vault/system/agent-audit-log.md that logs every tool call with timestamp, agent, and parameters. It took 15 minutes. The peace of mind is disproportionate.
The Question That Keeps Me Up at Night
DeepMind describes supervisor monitoring: trusted AI systems that watch untrusted AI agents and intervene when behavior deviates. That’s the next step for my homelab — a supervisor cron job that checks the audit trail for anomalous patterns and alerts me before damage is done.
Here’s my question to you: How do you secure your AI agent setups? Do you use container isolation, audit logging, or something more creative? I’m documenting everything I learn — drop your thoughts in the comments or find me on the Hermes Agent community.
📚 Based on: DeepMind AI Control Roadmap (2026) and my own homelab logs.
Wat vond je van dit bericht?