Agents are
dynamical systems
We study how AI agents behave under uncertainty and build the systems that make that behavior calibrated, verifiable, and trainable.
Research
OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence
Four frontier models achieve 94% alert detection yet contain false positives at rates of 45-82%. The calibration gap is not in detection; it is in restraint.
arXiv:2601.21083 · Jan 2026
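A minimal sketch of the two headline rates, assuming per-alert ground truth and a binary containment action; the field names are illustrative, not drawn from the benchmark.

```python
def calibration_rates(alerts: list[dict]) -> tuple[float, float]:
    """Detection rate on real incidents vs. containment rate on benign alerts."""
    real = [a for a in alerts if a["is_real_incident"]]
    benign = [a for a in alerts if not a["is_real_incident"]]
    # Detection: share of real incidents the agent flagged.
    detection = sum(a["detected"] for a in real) / len(real)
    # False-positive containment: share of benign alerts the agent contained
    # anyway. This is the restraint failure described above.
    fp_containment = sum(a["contained"] for a in benign) / len(benign)
    return detection, fp_containment
```

An agent can score well on the first number while scoring badly on the second, which is exactly the 94% vs. 45-82% split above.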
Surprisal-Guided Selection: Compute-Optimal Test-Time Strategies for Execution-Grounded Code Generation
Test-time training degrades to below random sampling (equivalent K < 1). Selecting by surprisal (favoring the model's least confident correct solutions) matches oracle performance using existing logprobs.
Preprint · Feb 2026
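A sketch of the selection rule under stated assumptions: "correct" means passing the available execution checks, and each candidate keeps the token logprobs already returned at sampling time. `Candidate` and `select_by_surprisal` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    code: str
    token_logprobs: list[float]  # logprobs already available from sampling
    passes_tests: bool           # execution-grounded correctness signal

def surprisal(c: Candidate) -> float:
    # Total surprisal is the negative sum of token logprobs; higher values
    # mean the model was less confident in the solution it produced.
    return -sum(c.token_logprobs)

def select_by_surprisal(candidates: list[Candidate]) -> Candidate | None:
    # Among execution-passing candidates, keep the one the model was least
    # confident about, i.e. the highest-surprisal correct solution.
    passing = [c for c in candidates if c.passes_tests]
    return max(passing, key=surprisal) if passing else None
```

No extra forward passes are needed: the ranking uses only logprobs the sampler already emitted.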