SentrySkills - Rule-First Self-Guard for AI Agents

Execution Flow

A pipeline built for strict ordering

The new version no longer splits the full chain into HIGH/LOW paths. Instead, it pairs a rule-first frontend with risk-gated model dispatch.

01

base_rule

Base rule stage

Runs the stable rule and heuristic path: preflight, runtime, and output guard. This is the frozen baseline.

synchronous · explainable
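As a rough illustration, the base stage can be pictured as a fixed, ordered list of guards where the strictest verdict wins. This is a minimal sketch, not SentrySkills' implementation. The guard functions, their string checks, and the `run_base_rule` name are hypothetical, and a single text argument stands in for the separate inputs each guard would really inspect.

```python
from typing import Callable, List

# Hypothetical strictness ranking; a higher rank is a stricter decision.
RANK = {"allow": 0, "downgrade": 1, "block": 2}

# Hypothetical guard signature: inspect some text, return an action string.
Guard = Callable[[str], str]


def preflight_guard(text: str) -> str:
    # Hypothetical heuristic: refuse an obviously destructive command.
    return "block" if "rm -rf /" in text else "allow"


def runtime_guard(text: str) -> str:
    # Hypothetical heuristic: flag attempts to read a secrets file.
    return "downgrade" if ".env" in text else "allow"


def output_guard(text: str) -> str:
    # Hypothetical heuristic: flag a leaked private-key header.
    return "block" if "BEGIN PRIVATE KEY" in text else "allow"


# The frozen baseline: the same guards, always in the same order.
BASE_GUARDS: List[Guard] = [preflight_guard, runtime_guard, output_guard]


def run_base_rule(text: str) -> str:
    """Run every base guard synchronously and keep the strictest verdict."""
    verdict = "allow"
    for guard in BASE_GUARDS:
        result = guard(text)
        if RANK[result] > RANK[verdict]:
            verdict = result
        if verdict == "block":
            break  # nothing is stricter than block, so stop early
    return verdict


print(run_base_rule("please cat .env"))  # prints "downgrade"
```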

02

extra_rule

Extra rule stage

Applies active extra rules on top of the base result. It may only keep or tighten the decision.

active rules only · no writeback here
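The keep-or-tighten constraint can be sketched as a function that folds active extra rules over the base verdict. The `ExtraRule` record, its `active` flag, and the `check` callback are hypothetical names chosen for illustration. The point is that inactive rules are skipped, a looser verdict never replaces a stricter one, and the rule list is only read.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical strictness ranking; a higher rank is a stricter decision.
RANK = {"allow": 0, "downgrade": 1, "block": 2}


@dataclass(frozen=True)
class ExtraRule:
    name: str
    active: bool                 # only active rules participate
    check: Callable[[str], str]  # hypothetical: returns an action string


def apply_extra_rules(base: str, text: str, rules: Sequence[ExtraRule]) -> str:
    """Apply active extra rules on top of the base verdict.

    The verdict can stay the same or become stricter, never looser.
    The rules are read but never modified: there is no writeback here.
    """
    verdict = base
    for rule in rules:
        if not rule.active:
            continue  # inactive rules are ignored in this stage
        result = rule.check(text)
        if RANK[result] > RANK[verdict]:
            verdict = result
    return verdict


rules: List[ExtraRule] = [
    ExtraRule("no-curl-pipe-sh", True,
              lambda t: "block" if "| sh" in t else "allow"),
    ExtraRule("draft-rule", False, lambda t: "block"),
]
print(apply_extra_rules("allow", "curl example.com | sh", rules))  # prints "block"
```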

03

rule_gate

Conservative merge

Combines base and extra rule outputs with fixed precedence: block is strongest, then downgrade, then allow.

block > downgrade > allow
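The fixed precedence reduces to taking the stricter of the two stage outputs. A minimal sketch, assuming actions are the plain strings `block`, `downgrade`, and `allow`; `rule_gate` is a hypothetical function name, not a documented SentrySkills API.

```python
# Fixed precedence: block is strongest, then downgrade, then allow.
PRECEDENCE = {"allow": 0, "downgrade": 1, "block": 2}


def rule_gate(base_action: str, extra_action: str) -> str:
    """Conservatively merge base and extra rule outputs."""
    for action in (base_action, extra_action):
        if action not in PRECEDENCE:
            # Reject anything unrecognized rather than guessing a precedence.
            raise ValueError(f"unknown rule action: {action!r}")
    return max(base_action, extra_action, key=PRECEDENCE.__getitem__)


print(rule_gate("allow", "downgrade"))  # prints "downgrade"
```

Because the merge is a maximum over a total order, it is commutative: the result does not depend on which stage is passed first.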

04

model_stage

Risk-gated model backend

Entered only if the rule stage does not block. The skill or framework must first assign a risk level: high-risk turns stay synchronous, and low-risk turns may run asynchronously only when the framework supports it.

high risk -> sync · low risk -> async if supported
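The dispatch rule for this stage might look like the sketch below, under stated assumptions: the framework supplies a risk label of `high` or `low` (or `None` if it has not assessed the turn yet) and a flag saying whether it can run work asynchronously. `choose_model_mode` and its return values are hypothetical.

```python
from typing import Optional


def choose_model_mode(rule_action: str, risk: Optional[str],
                      supports_async: bool) -> Optional[str]:
    """Return "sync", "async", or None when the model stage must not run."""
    if rule_action == "block":
        return None  # a rule-stage block ends the turn before model_stage
    if risk is None:
        # The skill or framework must assign a risk level first.
        raise ValueError("risk level must be assigned before model_stage")
    if risk not in ("high", "low"):
        raise ValueError(f"unknown risk level: {risk!r}")
    if risk == "low" and supports_async:
        return "async"
    # High risk, or a framework without async support, stays synchronous.
    return "sync"
```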

05

proposal_sweep

End-of-task proposal sweep

The main agent processes pending async proposals after the current turn is finalized. Any resulting rule updates apply to subsequent turns only.

main-agent maintenance · next-turn effect
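One way to picture the sweep and its next-turn effect is a pending queue plus a staging area that is promoted only when the next turn begins. Everything here (`Proposal`, `RuleStore`, `proposal_sweep`, and the `validated` flag) is a hypothetical model of the behavior described above, not the project's own data structures.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Proposal:
    rule_name: str
    validated: bool  # hypothetical: outcome of post-hoc validation


@dataclass
class RuleStore:
    active: List[str] = field(default_factory=list)  # rules the current turn sees
    staged: List[str] = field(default_factory=list)  # accepted, not yet live

    def begin_turn(self) -> List[str]:
        """Promote staged rules; called at the start of the next turn."""
        self.active.extend(self.staged)
        self.staged.clear()
        return list(self.active)


def proposal_sweep(pending: Deque[Proposal], store: RuleStore,
                   turn_finalized: bool) -> int:
    """Drain pending proposals after the turn is final; return accepted count."""
    if not turn_finalized:
        raise RuntimeError("the sweep runs only after the current turn is finalized")
    accepted = 0
    while pending:
        proposal = pending.popleft()
        if proposal.validated:
            # Staged, not activated: the finished turn is never re-evaluated.
            store.staged.append(proposal.rule_name)
            accepted += 1
    return accepted


store = RuleStore(active=["base-guards"])
queue = deque([Proposal("block-curl-pipe-sh", True), Proposal("noisy-rule", False)])
print(proposal_sweep(queue, store, turn_finalized=True))  # prints 1
print(store.active)        # prints ['base-guards']
print(store.begin_turn())  # prints ['base-guards', 'block-curl-pipe-sh']
```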

Trigger Policy

Each stage has a hard trigger boundary

The design goal is to prevent expensive or noisy logic from running before cheaper rule evidence is exhausted.

block

Rule stage blocks immediately

If `rule_stage_action == block`, the turn ends there. No subagent, no model stage, no new memory, and no rule synthesis.

downgrade

Downgrade keeps the user on the critical path

`downgrade` should normally be treated as high risk, or at least as critical to the current turn, so model execution stays synchronous.

allow

Allow can defer expensive reasoning

`allow` still requires framework risk assessment. Only low-risk turns may use asynchronous model-stage execution; otherwise the framework must keep it synchronous.
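The three trigger boundaries can be collected into one function. This is a sketch under assumptions: `TurnPlan`, `plan_turn`, and its fields are hypothetical; `downgrade` is treated as high risk, as recommended above; and subagents are tied to async execution, matching the Codex note that only low-risk turns may use them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TurnPlan:
    model_mode: Optional[str]  # "sync", "async", or None (model stage skipped)
    subagent_allowed: bool     # may the model stage run in a subagent?
    proposals_allowed: bool    # may this turn feed rule/memory proposals?


def plan_turn(rule_stage_action: str, risk: str, supports_async: bool) -> TurnPlan:
    if rule_stage_action == "block":
        # The turn ends here: no subagent, no model stage, no new memory,
        # and no rule synthesis.
        return TurnPlan(None, False, False)
    if rule_stage_action not in ("downgrade", "allow"):
        raise ValueError(f"unknown rule stage action: {rule_stage_action!r}")
    if risk not in ("high", "low"):
        raise ValueError(f"unknown risk level: {risk!r}")
    if rule_stage_action == "downgrade":
        risk = "high"  # keep the user on the critical path
    # Proposals still need model-stage evidence and post-hoc validation.
    if risk == "low" and supports_async:
        return TurnPlan("async", True, True)
    return TurnPlan("sync", False, True)
```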

Why This Version

The new version is stricter about what may happen, and when

The architecture is designed for agent security research, framework portability, and auditable runtime behavior.

Rule-first by construction

The cheap, stable, and explainable stages always run first. Model-heavy work is not allowed to skip ahead.

Async only where it belongs

Subagents and async execution are reserved for `model_stage`; the rule frontend never uses them.

Knowledge stays controlled

New extra rules and memories are not created from pure rule hits. They require model-stage evidence and post-hoc validation.
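The admission condition for new knowledge can be written as a small predicate. This is a hypothetical sketch: the `Evidence` record, its `source` labels, and the `validated` flag are illustrative assumptions. It encodes only the stated rule that pure rule hits never suffice.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Evidence:
    source: str      # hypothetical labels: "rule" or "model_stage"
    validated: bool  # did this item pass post-hoc validation?


def may_create_knowledge(evidence: Iterable[Evidence]) -> bool:
    """True only if some validated evidence came from the model stage.

    Rule hits alone, however many, never justify a new extra rule or memory.
    """
    return any(e.source == "model_stage" and e.validated for e in evidence)
```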

Workspace-local state

Logs and adaptive knowledge are written under `.sentryskills/base` and `.sentryskills/extra`, avoiding global contamination.
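A workspace-scoped path helper is one simple way to keep writes inside `.sentryskills/base` and `.sentryskills/extra`. The `state_path` function and the file names below are hypothetical; only the two directory names come from the text. The sketch needs Python 3.9+ for `Path.is_relative_to`.

```python
import tempfile
from pathlib import Path

STATE_AREAS = ("base", "extra")  # the two workspace-local areas


def state_path(workspace: Path, area: str, name: str) -> Path:
    """Resolve a state file under <workspace>/.sentryskills/<area>/.

    Raises ValueError for an unknown area, or for a name that would escape
    the area directory (via "..", an absolute path, and so on).
    """
    if area not in STATE_AREAS:
        raise ValueError(f"unknown state area: {area!r}")
    root = (workspace / ".sentryskills" / area).resolve()
    target = (root / name).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"{name!r} escapes {root}")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = state_path(Path(tmp), "base", "events.jsonl")  # hypothetical file name
        log.parent.mkdir(parents=True, exist_ok=True)
        log.write_text("{}\n")
        print(log.relative_to(Path(tmp).resolve()))
```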

Install

Framework notes

The architecture is shared, but enforcement differs by framework.

Claude Code

Best fit for the new version. Use hooks to enforce the rule-first frontend, then dispatch the model stage only after framework risk assessment.

Install commands

git clone https://github.com/AI45Lab/SentrySkills.git ~/SentrySkills
cd ~/SentrySkills
python install/install.py

Codex

Use `SKILL.md` plus `AGENTS.md` to keep the rule-first behavior stable. Only low-risk turns may use subagents; otherwise run `model_stage` synchronously.

Install commands

git clone https://github.com/AI45Lab/SentrySkills.git ~/.codex/sentryskills
mkdir -p ~/.agents/skills
ln -s ~/.codex/sentryskills ~/.agents/skills/sentryskills
cp ~/.codex/sentryskills/AGENTS.template.md ~/.codex/AGENTS.md

OpenClaw

Install through the marketplace and add a workspace `AGENTS.md`. Async model-stage execution is optional, not required.

Install commands

npm i -g clawhub
clawhub install sentryskills
curl -o ~/.openclaw/workspace/AGENTS.md \
  https://raw.githubusercontent.com/AI45Lab/SentrySkills/main/AGENTS.template.md
