SentrySkills keeps the decision path cheap, explainable, and auditable: rules always run first; model reasoning and adaptive memory only appear when they are justified.
The new version is no longer a full-chain HIGH/LOW split. It is a rule-first frontend plus risk-gated model dispatch.
01
base_rule
Base rule stage
Runs the stable rule and heuristic path: preflight, runtime, and output guard. This is the frozen baseline.
synchronousexplainable
02
extra_rule
Extra rule stage
Applies active extra rules on top of the base result. It may only keep or tighten the decision.
active rules onlyno writeback here
03
rule_gate
Conservative merge
Combines base and extra rule outputs with fixed precedence: block is strongest, then downgrade, then allow.
blockdowngradeallow
04
model_stage
Risk-gated model backend
Only entered if the rule stage does not block. The skill or framework must first assign a risk level: high risk stays sync; only low-risk turns may use async when the framework supports it.
high risk -> synclow risk -> async if supported
05
proposal_sweep
End-of-task proposal sweep
The main agent processes pending async proposals after the current turn is finalized. Any resulting rule updates apply to subsequent turns only.
main-agent maintenancenext-turn effect
Trigger Policy
Each stage has a hard trigger boundary
The design goal is to prevent expensive or noisy logic from running before cheaper rule evidence is exhausted.
block
Rule stage blocks immediately
If `rule_stage_action == block`, the turn ends there. No subagent, no model stage, no new memory, and no rule synthesis.
downgrade
Downgrade keeps the user on the critical path
`downgrade` should normally be treated as high risk or at least current-turn critical, so model execution stays synchronous.
allow
Allow can defer expensive reasoning
`allow` still requires framework risk assessment. Only low-risk turns may use asynchronous model-stage execution; otherwise the framework must keep it synchronous.
Why This Version
The new version is stricter about what is allowed to happen when
The architecture is designed for agent security research, framework portability, and auditable runtime behavior.
Rule-first by construction
The cheap, stable, and explainable stages always run first. Model-heavy work is not allowed to skip ahead.
Async only where it belongs
Subagents or async execution are only for `model_stage`, not for the rule frontend.
Knowledge stays controlled
New extra rules and memories are not created from pure rule hits. They require model-stage evidence and post-hoc validation.
Workspace-local state
Logs and adaptive knowledge are written under `.sentryskills/base` and `.sentryskills/extra`, avoiding global contamination.
Install
Framework notes
The architecture is shared, but enforcement differs by framework.
Claude Code
Best fit for the new version. Use hooks to enforce the rule-first frontend, then dispatch the model stage only after framework risk assessment.
Install command
git clone https://github.com/AI45Lab/SentrySkills.git ~/SentrySkills
cd ~/SentrySkills
python install/install.py
Codex
Use `SKILL.md` plus `AGENTS.md` to keep the rule-first behavior stable. Only low-risk turns may use subagents; otherwise run `model_stage` synchronously.