Self-Guarding Security for AI Agents

LLM-native skill package that teaches agents to protect themselves. Three-stage protection — preflight, runtime, output — driven by the agent's own intelligence.

33+ Detection Rules · 7 Risk Predictors · 0 External Deps

How SentrySkills Works

Five skills fire in sequence. Each stage must pass before the next begins; a block at any stage terminates the pipeline immediately.

Entry Point
using-sentryskills
Auto-triggered via AGENTS.md before every response. Prepares the input context — user prompt, planned actions, candidate response — then hands off to the orchestrator.
Orchestration
sentryskills-orchestrator
Coordinates the full pipeline. Aggregates results from all stages, applies the active policy profile, and produces the final decision with a shared trace ID.
Decisions: allow · downgrade · block
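The actual pipeline is defined by the package's SKILL.md files, but the aggregation logic described above (shared trace ID, escalating decisions, hard stop on block) can be sketched in plain Python. Function and field names here are illustrative assumptions, not the package's real API:

```python
# Sketch of the orchestrator's aggregation step (hypothetical API;
# the real pipeline is driven by SKILL.md / AGENTS.md).
import uuid

DECISIONS = ("allow", "downgrade", "block")  # ordered by severity

def run_pipeline(context, stages):
    """Run each stage in order; any 'block' terminates immediately."""
    trace_id = uuid.uuid4().hex  # shared by every stage's result
    results, final = [], "allow"
    for stage in stages:
        verdict = stage(context)
        results.append({"trace_id": trace_id,
                        "stage": stage.__name__,
                        "decision": verdict})
        # Keep the most severe decision seen so far.
        if DECISIONS.index(verdict) > DECISIONS.index(final):
            final = verdict
        if verdict == "block":
            break  # hard stop: later stages never run
    return {"trace_id": trace_id, "decision": final, "stages": results}
```

A stage here is any callable that inspects the context and returns one of the three decisions; the shared `trace_id` is what makes later log correlation possible.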
Preflight Check
sentryskills-preflight
Runs before any action. Analyzes user intent and planned operations against 33+ detection rules. Detects prompt injection, malicious commands, and suspicious patterns.
pre-execution · intent analysis · 33+ rules
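The shipped rule set covers 33+ rules; as a minimal sketch of the pre-execution idea, a rule can be a named pattern matched against the user prompt. These three rules and their patterns are simplified examples, not the package's actual rules:

```python
# Illustrative preflight rules (hypothetical; the real set is larger
# and covers more attack categories).
import re

PREFLIGHT_RULES = [
    ("prompt_injection",   re.compile(r"ignore (all )?previous instructions", re.I)),
    ("malicious_command",  re.compile(r"\brm\s+-rf\s+/", re.I)),
    ("system_prompt_leak", re.compile(r"reveal your system prompt", re.I)),
]

def preflight(user_prompt: str):
    """Return the IDs of every rule the prompt triggers."""
    return [rule_id for rule_id, pat in PREFLIGHT_RULES
            if pat.search(user_prompt)]
```

The key property is that this runs on intent, before any tool call executes; a non-empty result feeds the orchestrator's allow/downgrade/block decision.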
Runtime Monitor
sentryskills-runtime
Watches during execution. Tracks tool calls, file operations, network requests. Flags behavioral anomalies and unexpected scope expansion in real time.
behavioral analysis · scope detection
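One concrete form of "unexpected scope expansion" is a file operation that escapes the workspace the task was granted. A minimal sketch, assuming a hypothetical tool-call record with a `path` field (not the package's real schema):

```python
# Sketch of scope-expansion detection: flag file operations whose
# resolved path escapes the allowed workspace (hypothetical schema).
from pathlib import Path

def out_of_scope(tool_calls, workspace):
    """Return every file operation that escapes the workspace."""
    root = Path(workspace).resolve()
    flagged = []
    for call in tool_calls:
        target = Path(call["path"]).resolve()
        # Path.is_relative_to requires Python 3.9+.
        if not target.is_relative_to(root):
            flagged.append(call)
    return flagged
```

Resolving paths first is what catches `../` traversal and symlink tricks that a plain string-prefix check would miss.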
Output Guard
sentryskills-output
Scans the response before it's sent. Automatically redacts secrets, credentials, API keys, and private data. Prevents sensitive information from leaking through the agent's output.
redaction · leak prevention
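Redaction of this kind is typically pattern-based. The patterns below are simplified stand-ins for well-known credential formats, not the package's actual rule set:

```python
# Illustrative output redaction (simplified patterns; the shipped
# rules cover more credential and secret formats).
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub personal token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # SSH/PEM key header
]

def redact(text: str) -> str:
    """Replace anything matching a secret pattern with [REDACTED]."""
    for pat in SECRET_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text
```

Because this stage runs on the candidate response, a secret that slipped through earlier stages still never reaches the user.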

Built for Production

🧠
LLM-Native Design
Security is encoded as agent behavior, not bolted on externally. The agent understands and enforces its own safety rules through SKILL.md and AGENTS.md.
📦
Zero External Dependencies
100% Python standard library. No pip install, no Docker, no Redis. Copy the files and it runs — perfect for air-gapped or restricted environments.
🔮
Predictive Analysis
7 risk predictors warn about threats before they materialize — resource exhaustion, privilege escalation, data exfiltration paths, multi-turn grooming, and more.
🗂️
Full Traceability
Every decision is logged to ./sentry_skill_log/ as JSONL. Each stage shares a trace ID, making post-incident analysis and compliance auditing straightforward.
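A JSONL log with a shared trace ID is straightforward to produce with the standard library alone. The file name and field names below are assumptions; inspect ./sentry_skill_log/ for the actual schema:

```python
# Sketch of the JSONL decision log (illustrative field names).
import json
import time
import uuid
from pathlib import Path

LOG_DIR = Path("./sentry_skill_log")

def log_decision(trace_id, stage, decision, **details):
    """Append one decision record as a single JSON line."""
    LOG_DIR.mkdir(exist_ok=True)
    record = {"ts": time.time(), "trace_id": trace_id,
              "stage": stage, "decision": decision, **details}
    with open(LOG_DIR / "decisions.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```

One JSON object per line means the log can be tailed, grepped by trace ID, or loaded line-by-line for audits without parsing the whole file.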
🎛️
Flexible Policies
Three built-in profiles: Balanced for production, Strict for high-sensitivity data, Permissive for development. Custom rules via JSON, hot-reloadable.
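Hot-reloading a JSON policy needs no framework: re-read the file whenever its modification time changes. The file name and policy schema here are assumptions for illustration:

```python
# Sketch of a hot-reloadable JSON policy: reload on mtime change
# (file name and schema are illustrative assumptions).
import json
import os

class PolicyLoader:
    def __init__(self, path="policy.json"):
        self.path = path
        self._mtime = None
        self._policy = None

    def get(self):
        """Return the current policy, reloading if the file changed."""
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:
            with open(self.path) as f:
                self._policy = json.load(f)
            self._mtime = mtime
        return self._policy
```

Callers fetch the policy through `get()` on every check, so edits to the JSON file take effect on the next decision without a restart.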

What Gets Detected

33+ rules across four domains, plus predictive analysis for threats that haven't happened yet.

AI / LLM Attacks
  • Prompt injection
  • Jailbreak attempts
  • System prompt leakage
  • Refusal suppression
  • Multi-turn grooming
Web Security
  • SQL injection
  • XSS attacks
  • Command injection
  • SSTI
  • Path traversal
Data Leaks
  • SSH private keys
  • AWS credentials
  • GitHub tokens
  • API keys
  • Database connection strings
Code Security
  • Hardcoded secrets
  • Weak cryptography
  • Unsafe eval / exec
  • Debug statements
Predictive (7 predictors)
  • Resource exhaustion
  • Privilege escalation
  • Data exfiltration paths
  • Scope creep
  • Dependency confusion
  • Ambiguous destructive intent
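Since custom rules are supplied as JSON (see Flexible Policies above), each detection category presumably maps to records of this general shape. The exact schema ships with the package; the field names and matcher below are a guess for illustration:

```python
# Hypothetical shape for one custom detection rule, plus a tiny
# matcher; the package defines the real JSON schema.
import re

rule = {
    "id": "custom-001",
    "category": "data_leak",
    "pattern": r"postgres://\S+:\S+@\S+",  # DB connection string with creds
    "severity": "high",
    "action": "block",
}

def matches(rule, text):
    """True if the rule's pattern appears anywhere in the text."""
    return re.search(rule["pattern"], text) is not None
```

Keeping rules as data rather than code is what makes the hot-reload and per-profile tuning described earlier possible.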

Get Started in Minutes

Option A (OpenClaw, via the ClawHub CLI):

```bash
# 1. Install ClawHub CLI
npm i -g clawhub

# 2. Install SentrySkills
clawhub install sentryskills

# 3. Enable auto-protection
cat > ~/.codex/AGENTS.md << 'EOF'
# Security: SentrySkills runs automatically
Before EVERY response, run:
python ./skills/sentry-skills/shared/scripts/self_guard_runtime_hook_template.py \
  input.json --policy-profile balanced --out result.json
EOF

# 4. Restart OpenClaw — protected!
```
Option B (Codex, manual install):

```bash
# 1. Clone and symlink
git clone https://github.com/AI45Lab/SentrySkills.git \
  ~/.codex/sentryskills
mkdir -p ~/.agents/skills
ln -s ~/.codex/sentryskills ~/.agents/skills/sentryskills

# 2. Enable auto-protection
cat > ~/.codex/AGENTS.md << 'EOF'
# Security: SentrySkills runs automatically
Before EVERY response, run:
python ~/.codex/sentryskills/shared/scripts/self_guard_runtime_hook_template.py \
  input.json --policy-profile balanced --out result.json
EOF

# 3. Restart Codex — protected!
```
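The hook script above reads an input.json describing the turn. Its exact schema is defined by the script itself; based on the entry-point description earlier (user prompt, planned actions, candidate response), a plausible shape might look like this. Every field name is an assumption:

```json
{
  "user_prompt": "Summarize ./reports/q3.txt",
  "planned_actions": [
    {"tool": "read_file", "path": "./reports/q3.txt"}
  ],
  "candidate_response": "Q3 revenue grew 12% quarter over quarter.",
  "policy_profile": "balanced"
}
```

The `--out result.json` flag then receives the pipeline's decision, which the agent consults before responding.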

Policy Profiles

Balanced (balanced)
Standard security checks with a low false-positive rate. Blocks obvious threats, downgrades suspicious ones.
→ Production environments

Strict (strict)
Maximum security: any suspicious activity is blocked unless explicitly authorized. Higher false-positive rate.
→ Finance · Healthcare · High-sensitivity data

Permissive (permissive)
Minimal interference, warnings only. Useful for observing agent behavior without blocking anything.
→ Local dev · Debugging · Testing