Castle Wall
Operating-system-level egress filtering. nftables, cgroup v2, and NFQUEUE on Linux. Network Extension on macOS. Plain DNS, DoH, DoT, and raw-socket bypass coverage all verified end-to-end against real kernel bindings. Phase 1 shipped.
Sanctuary is a firewall for AI agents that runs on your hardware. It blocks prompt-injected
egress at the operating system, gives compliant agents an encrypted sovereignty surface,
and ships receipts and reputation across vendors. One MCP server, runnable with
npx. Drop it under any harness (Claude Code, OpenClaw, CrewAI,
LangChain) without touching your agent code.
$ npx @sanctuary-framework/mcp-server
Sanctuary runs on your hardware with your keys. Nothing leaves the machine unless you approve it. Point any MCP-compatible agent at it and the whole substrate is live.
The Castle Wall blocks unauthorized outbound calls at the operating system. A prompt-injected agent cannot route around it because the kernel itself enforces. DNS, DoH, DoT, raw sockets, all covered. Phase 1 live on Linux and macOS.
Ed25519 identity generated locally. AES-256-GCM on every write. Keys never leave the box and never appear in any MCP response, log, or error.
Dark-theme web UI on 127.0.0.1:3501. Real-time SSE,
approve and deny buttons, audit viewer. Optional TLS and webhook channels for
headless setups.
Five layers, each with a distinct contract. Real enforcement at the perimeter, observation inside, a sovereignty surface for compliant agents, receipts and reputation across vendors, and an install-time substrate that binds it all to the operator. The layers compose. None of them substitute for another.
Operating-system-level egress filtering. nftables, cgroup v2, and NFQUEUE on Linux. Network Extension on macOS. Plain DNS, DoH, DoT, and raw-socket bypass coverage all verified end-to-end against real kernel bindings. Phase 1 shipped.
Internal behavioral observation. eBPF on Linux. Cross-platform auditd-tail. Seven sentinels watching for prompt-injection signatures, anomalous tool sequences, and policy drift. The sentinels surface; the operator decides.
Encrypted state, signed audit, mandate primitives, canonical policy slots, and substrate selector. Compliant agents that voluntarily route through Sanctuary get the full surface. Non-compliant agents still hit the Castle Wall and the Sentinels.
Cryptographic receipts on cross-castle transactions, portable reputation that survives vendor churn. Concordia structures negotiation and commitments; Verascore prices the reputation. Operators carry the trust record across vendors.
Install-time substrate-binding to the operator. The Mantle gives the rest of the castle a verifiable anchor on the operator's machine before the agent runs. Phases 1 and 2 shipped.
Sanctuary speaks standard MCP. No forks, no adapters, no special case for your stack. Point your harness at the server and every tool in the sovereignty layer becomes available to your agent immediately.
// .mcp.json
"mcpServers": {
"sanctuary": {
"command": "npx",
"args": ["@sanctuary-framework/mcp-server"]
}
}
# ~/.openclaw/mcp.yaml
servers:
sanctuary:
type: stdio
command: sanctuary-mcp
args: []
# crewai native mcps field
agent = Agent(
role="analyst",
mcps=["sanctuary"],
tools=[],
)
# langchain-mcp-adapters
from langchain_mcp import MCPTool
sanctuary = MCPTool.from_stdio(
"npx", ["@sanctuary-framework/mcp-server"],
)
Three commands. A local vault, a self-custodied identity, and a running approval dashboard before your coffee cools.
One npx command. No global install, no build step, no config file required.
$ npx @sanctuary-framework/mcp-server
Add Sanctuary as an MCP server in your harness config. No agent code changes.
$ claude mcp add sanctuary npx @sanctuary-framework/mcp-server
Approve sensitive operations in real time at the local dashboard URL.
$ open http://127.0.0.1:3501
Design decisions, security write-ups, and the reasoning behind the architecture.
Why I left a startup four years ago, why I waited, and why one morning in February I leapt out of bed and started coding. The Castle Architecture is the iteration that finally got the framing right.
Read the postCooperative gates do not stop a prompt-injected agent. Kernel-level enforcement does. Castle Wall Phase 1 shipped on Linux: 203 Rust tests against a real kernel binding, with plain-DNS, DoH, and DoT bypass coverage verified end-to-end.
Read the postOpen source. Runs locally. No platform dependency. No blockchain. No telemetry.