Rule-First Frontend + Conditional Model Backend

Self-Guarding Security for AI Agents

SentrySkills keeps the decision path cheap, explainable, and auditable: rules always run first; model reasoning and adaptive memory only appear when they are justified.

33+ detection rules
2 runtime layers
0 external models

A pipeline built for strict ordering

The new version replaces the full-chain HIGH/LOW split with a rule-first frontend plus risk-gated model dispatch.

01 base_rule

Base rule stage

Runs the stable rule and heuristic path: preflight, runtime, and output guard. This is the frozen baseline.

synchronous | explainable
02 extra_rule

Extra rule stage

Applies active extra rules on top of the base result. It may only keep or tighten the decision.

active rules only | no writeback here
03 rule_gate

Conservative merge

Combines base and extra rule outputs with fixed precedence: block is strongest, then downgrade, then allow.

block > downgrade > allow
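
The fixed precedence can be sketched as a maximum over an ordered enum. This is a minimal illustration, not SentrySkills' actual code: the names `Action` and `rule_gate` and the integer values are assumptions chosen for this example.

```python
from enum import IntEnum


class Action(IntEnum):
    """Rule decisions ordered by strictness (hypothetical encoding)."""
    ALLOW = 0
    DOWNGRADE = 1
    BLOCK = 2


def rule_gate(base: Action, extra: Action) -> Action:
    """Conservative merge: the stricter of the two decisions wins."""
    return max(base, extra)
```

Because the merge takes the stricter value, an extra rule can keep or tighten the base decision but can never loosen it. For instance, `rule_gate(Action.ALLOW, Action.DOWNGRADE)` yields `Action.DOWNGRADE`, while `rule_gate(Action.BLOCK, Action.ALLOW)` stays `Action.BLOCK`.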
04 model_stage

Risk-gated model backend

Entered only if the rule stage does not block. The skill or framework must first assign a risk level: high-risk turns stay synchronous, and only low-risk turns may run asynchronously, when the framework supports it.

high risk -> sync | low risk -> async if supported
05 proposal_sweep

End-of-task proposal sweep

The main agent processes pending async proposals after the current turn is finalized. Any resulting rule updates apply to subsequent turns only.

main-agent maintenance | next-turn effect
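
One way to picture the next-turn effect is to evaluate each turn against a frozen snapshot of the active rules and promote proposals only after the turn is finalized. The sketch below is illustrative only: `TurnState`, `run_turn`, and `proposal_sweep` are hypothetical names, and rules are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Hypothetical per-workspace state: active rules plus queued proposals."""
    active_rules: list[str] = field(default_factory=list)
    pending_proposals: list[str] = field(default_factory=list)


def run_turn(state: TurnState, request: str) -> tuple[str, ...]:
    """Evaluate one turn against a frozen snapshot of the active rules."""
    snapshot = tuple(state.active_rules)
    # The rule and model stages would run here, reading `snapshot` only,
    # so proposals queued during this turn cannot change its outcome.
    return snapshot


def proposal_sweep(state: TurnState) -> None:
    """After the turn is finalized, promote pending proposals to active rules."""
    state.active_rules.extend(state.pending_proposals)
    state.pending_proposals.clear()
```

In this sketch, a proposal queued before or during a turn is invisible to that turn's `run_turn` snapshot and shows up only in the snapshot of the turn that follows the sweep.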

Each stage has a hard trigger boundary

The design goal is to prevent expensive or noisy logic from running before cheaper rule evidence is exhausted.

block

Rule stage blocks immediately

If `rule_stage_action == block`, the turn ends there. No subagent, no model stage, no new memory, and no rule synthesis.

downgrade

Downgrade keeps the user on the critical path

`downgrade` should normally be treated as high risk or at least current-turn critical, so model execution stays synchronous.

allow

Allow can defer expensive reasoning

`allow` still requires framework risk assessment. Only low-risk turns may use asynchronous model-stage execution; otherwise the framework must keep it synchronous.
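
The three trigger boundaries above can be summarized as a small decision function. This is a sketch under stated assumptions: the enum names, the `plan_model_stage` function, and its string return values are hypothetical, and `downgrade` is simplified to always run synchronously, whereas the text says it should normally do so.

```python
from enum import Enum


class RuleAction(Enum):
    ALLOW = "allow"
    DOWNGRADE = "downgrade"
    BLOCK = "block"


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


def plan_model_stage(action: RuleAction, risk: Risk, async_supported: bool) -> str:
    """Return 'skip', 'sync', or 'async' for the model stage."""
    if action is RuleAction.BLOCK:
        return "skip"  # the turn ends at the rule stage
    if action is RuleAction.DOWNGRADE:
        return "sync"  # simplification: treated as current-turn critical
    if risk is Risk.LOW and async_supported:
        return "async"  # only low-risk allowed turns may be deferred
    return "sync"
```

Note that asynchronous execution is the exception here: it requires an `allow` decision, a low risk level, and framework support all at once, and every other combination falls back to `sync` or `skip`.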

The new version is stricter about what is allowed to happen when

The architecture is designed for agent security research, framework portability, and auditable runtime behavior.

Rule-first by construction

The cheap, stable, and explainable stages always run first. Model-heavy work is not allowed to skip ahead.

Async only where it belongs

Subagents or async execution are only for `model_stage`, not for the rule frontend.

Knowledge stays controlled

New extra rules and memories are not created from pure rule hits. They require model-stage evidence and post-hoc validation.
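
As a rough illustration of that constraint, a proposal could carry two flags and be promoted only when both hold. The `Proposal` fields and the `may_promote` helper are assumptions made for this sketch, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical record for a candidate extra rule or memory."""
    rule_text: str
    has_model_evidence: bool  # produced by model_stage, not by a pure rule hit
    validated: bool           # passed post-hoc validation


def may_promote(proposal: Proposal) -> bool:
    """A proposal is eligible only with model-stage evidence AND validation."""
    return proposal.has_model_evidence and proposal.validated
```

Under this sketch, a proposal derived from a rule hit alone (`has_model_evidence=False`) is never eligible, regardless of validation.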

Workspace-local state

Logs and adaptive knowledge are written under `.sentryskills/base` and `.sentryskills/extra`, avoiding global contamination.
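
A minimal sketch of workspace-local state, assuming a JSON Lines log: only the two directory names come from the text above. The file name `decisions.jsonl`, the choice to log under `base`, and both helper functions are hypothetical.

```python
import json
from pathlib import Path


def state_dirs(workspace: Path) -> tuple[Path, Path]:
    """Create and return the two workspace-local state directories."""
    root = workspace / ".sentryskills"
    base, extra = root / "base", root / "extra"
    base.mkdir(parents=True, exist_ok=True)
    extra.mkdir(parents=True, exist_ok=True)
    return base, extra


def append_log(workspace: Path, record: dict) -> Path:
    """Append one JSON record per line to a log inside the workspace."""
    base, _ = state_dirs(workspace)
    log_path = base / "decisions.jsonl"  # hypothetical file name
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return log_path
```

Because every path is derived from the workspace root, two projects on the same machine never share logs or learned rules in this sketch.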

Framework notes

The architecture is shared, but enforcement differs by framework.

Claude Code

Best fit for the new version. Use hooks to enforce the rule-first frontend, then dispatch the model stage only after framework risk assessment.

Install command
git clone https://github.com/AI45Lab/SentrySkills.git ~/SentrySkills
cd ~/SentrySkills
python install/install.py

Codex

Use `SKILL.md` plus `AGENTS.md` to keep the rule-first behavior stable. Only low-risk turns may use subagents; otherwise run `model_stage` synchronously.

Install command
git clone https://github.com/AI45Lab/SentrySkills.git ~/.codex/sentryskills
mkdir -p ~/.agents/skills
ln -s ~/.codex/sentryskills ~/.agents/skills/sentryskills
cp ~/.codex/sentryskills/AGENTS.template.md ~/.codex/AGENTS.md

OpenClaw

Use marketplace installation plus workspace `AGENTS.md`. Async model-stage behavior is optional, not required.

Install command
npm i -g clawhub
clawhub install sentryskills
curl -o ~/.openclaw/workspace/AGENTS.md \
  https://raw.githubusercontent.com/AI45Lab/SentrySkills/main/AGENTS.template.md