Research
How do humans make permission decisions for AI agents under time pressure?
Pilot Experiment: Permission Event Classification
I built Clide, a containerized execution environment where 17 AI operators coordinate in real time — writing code, managing infrastructure, and dispatching tasks. Every action requires a permission decision: allow or deny.
Over 6 days, the system generated 15,459 permission events. Key findings:
- 96% accuracy with a 2-feature decision tree (destructive operations + command length)
- 100% deny recall — every dangerous command was caught
- 75.8% of requests time out, with an average wait of 6.7 minutes
- 3 distinct operator archetypes emerged: coordinators, builders, and infrastructure operators
- A regex classifier deployed in production matches 99.3% of 678 real decisions
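The two-feature rule can be sketched as a small rule-based classifier. The destructive-command patterns and the length threshold below are illustrative assumptions, not the deployed regex or the tree's learned cutoffs:

```python
import re

# Hypothetical destructive-operation patterns -- examples only,
# not the production regex classifier described above.
DESTRUCTIVE = re.compile(
    r"\brm\s+-rf\b|\bmkfs\b|\bdd\s+if=|\bshutdown\b|\bDROP\s+TABLE\b",
    re.IGNORECASE,
)
MAX_LEN = 200  # assumed command-length cutoff, for illustration

def classify(command: str) -> str:
    """Two-feature rule: destructive pattern OR unusual length => deny."""
    if DESTRUCTIVE.search(command):
        return "deny"
    if len(command) > MAX_LEN:
        return "deny"  # long commands are harder to audit, so fail closed
    return "allow"
```

Failing closed on both features is what makes 100% deny recall achievable: the classifier only has to avoid missing dangerous commands, at the cost of some false denies.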
The week I ran that experiment, Anthropic shipped auto mode — an LLM-based classifier solving the same problem. That wasn't a setback. It was validation. The question I'm pursuing: when does a lightweight, interpretable classifier suffice for safe automation, and when do you need the full power of an LLM-based approach?
Research Direction
The human bottleneck is the core problem. People approve commands they don't fully understand because the timeout is approaching. Operator behavior clusters into distinct archetypes, each with different risk profiles. This means one-size-fits-all auto-approval is wrong — classifiers should be role-aware.
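A role-aware policy might look like the sketch below. The archetype names come from the findings above; the per-role thresholds, the infra command list, and the escalate action are all illustrative assumptions:

```python
# Hypothetical per-archetype policy table -- the numbers are examples,
# not measured risk thresholds.
ROLE_POLICY = {
    "coordinator":    {"max_len": 120, "allow_infra": False},
    "builder":        {"max_len": 300, "allow_infra": False},
    "infrastructure": {"max_len": 300, "allow_infra": True},
}

INFRA_CMDS = ("systemctl", "docker", "iptables")  # example infra-only prefixes

def decide(role: str, command: str) -> str:
    """Route by operator archetype; unknown roles get the strictest policy."""
    policy = ROLE_POLICY.get(role, {"max_len": 100, "allow_infra": False})
    is_infra = any(command.startswith(c) for c in INFRA_CMDS)
    if is_infra and not policy["allow_infra"]:
        return "escalate"  # send to a human rather than blanket allow/deny
    if len(command) > policy["max_len"]:
        return "escalate"
    return "allow"
```

The key design choice is a third outcome, escalate, so that role mismatches consume human attention only where the archetype's risk profile warrants it, instead of every request hitting the timeout queue.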
This sits at the intersection of AI control, scalable oversight, and security. It's empirical, measurable, and I have a working system to study it.
Publications
- MODSIM World 2020 — First-author research on data de-identification and synthetic data security, including analysis of re-identification attacks.
Code
- Clide — Containerized execution environment for studying AI coding-agent behavior under configurable constraint policies.