Unblocking Agents
Tooling and infrastructure for autonomous software engineering
At Rubric, our goal is to build a process that allows us to write a spec at night, and wake up to working software. The agents are good enough to build real applications, but lack the permissions and resources to finish them.
"I've written the migration. Can you run it? I don't have permission to execute against the database."
"I've set up the OAuth handlers, but I can't create the GCP project needed for credentials. Can you create a project in the console and add the OAuth redirect URI? I'll need the client ID and secret to continue."
"I've built the signup flow. Can you test it manually? I'll need a real email and phone number to get through verification."
The agent wrote the code, but it can't provision what it needs, deploy what it built, or verify what it produced.
To unlock truly autonomous agents, we believe that agents must be free to work however they see fit, which requires guardrails that encourage that freedom without sacrificing safety. This essay explores the steps we've taken to achieve that goal.
What Blocks Agents
Agents working on real applications hit three types of walls.
Access is the most basic layer. Agents need permission to act on your machine to do things like writing files, running shell commands, installing packages, and making network requests. An unrestricted agent can do real damage, so frameworks gate actions behind approval prompts. Even when the agent is doing exactly what you asked, the permission model assumes it might not be, and you're left sitting in front of a screen clicking "yes" repeatedly.
Beyond local permissions, the agent needs credentials for the services the code depends on. These might be API keys, cloud credentials, deployment tokens, or database connection strings. Most teams keep these tightly scoped or out of the agent's reach entirely. In practice, this means every missing credential either halts the agent entirely, or gets quietly worked around, with the agent marking tasks complete that weren't verified.
Infrastructure is what the application runs on. When a developer joins a team, they get a seat on the org. They can provision what they need, from spinning up a database to configuring a deployment. Agents don't get seats; they get environment variables, so they can connect to existing services but can't create new ones. The agent gets stuck the moment a task requires provisioning.
Identity is what software uses to verify there's a real person on the other side. Applications send confirmation emails, redirect to consent screens, validate credit cards against billing addresses, and deliver SMS codes to phone numbers. Testing these flows requires a real human identity on the other side, with an email inbox that receives mail and a phone number that works for 2FA. The agent has no way to test any of this.
Every one of these blockers ends the same way: either the agent stops and asks you for help, or it skips what it can't do and silently ships broken code. Both put a developer back in the loop.
Give Them Their Own Org
The instinct is to carefully expand what agents can do inside your existing environment, but this is the wrong direction. The more access the agent has to your infrastructure, the more you need to supervise it, which defeats the point.
The better answer is to give agents access to their own environment.
We provisioned an entirely separate organization — a complete fork of our company's project structure under completely different accounts. This includes a separate GitHub org, separate AWS and GCP accounts, and separate billing, with no production data or customer access. Projects get cloned in and agents get full control.
Inside this org, the agent doesn't have to request access. Every resource maps to one of the three blockers, and every blocker has a concrete solution.
Full permissions for access. In their own org, agents can "dangerously skip permissions" without thinking twice.
--dangerously-skip-permissions
The agent gets root VM access with a full filesystem, unrestricted network, and a shell that doesn't ask questions. API keys, OAuth client secrets, and service account credentials go directly into the agent's context. The system prompt tells the agent what it has and how to use it.
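As a rough illustration of a prompt that tells the agent what it has, the sketch below renders a resource manifest into a system prompt section. Everything in it is an assumption made for illustration: the `Resource` type, the `build_resource_prompt` helper, the manifest entries, and the environment variable names are hypothetical, and the sketch references credentials by variable name rather than pasting raw values into the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str      # human-readable service name
    env_var: str   # hypothetical variable holding the credential
    usage: str     # one-line hint on what the agent may do with it


def build_resource_prompt(resources: list[Resource]) -> str:
    """Render a prompt section listing what the agent has and how to use it."""
    lines = ["You are working in a sandbox org. You have full access to:"]
    for r in resources:
        lines.append(f"- {r.name}: credential in ${r.env_var}. {r.usage}")
    lines.append("Do not ask for permission; provision what you need.")
    return "\n".join(lines)


# Hypothetical manifest; the names are illustrative only.
MANIFEST = [
    Resource("Neon", "NEON_API_KEY", "Create Postgres databases as needed."),
    Resource("Vercel", "VERCEL_TOKEN", "Deploy preview branches."),
]

prompt = build_resource_prompt(MANIFEST)
```

Keeping the manifest as data means the same list can drive both the prompt text and the provisioning of the sandbox itself, so the agent is never told about a resource it does not actually have.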
Real seats for infrastructure. The agent gets admin accounts on every platform it needs, not scoped credentials. It can create cloud projects, provision databases, deploy applications, and configure DNS, the way a new developer would on their first week.
Concretely, our agents get:
- GitHub with full push access and CI
- Sandbox AWS and GCP accounts where the agent creates IAM policies, configures OAuth credentials, and spins up whatever it needs
- Its own Vercel workspace to deploy preview branches, set environment variables, and attach domains
- Admin access to Neon for Postgres databases and Upstash for Redis instances
- Ngrok to expose local services for webhook testing and OAuth callbacks
- A real domain with subdomains, DNS records, and HTTPS
Real identities for testing. The agent gets an email address that receives mail, for confirmation links and verification codes. It also gets a phone number that passes carrier verification, a Brex card that clears AVS checks (with a hard spending cap), and Playwright sessions for navigating consent screens, submitting forms, and verifying UI flows.
For scarce resources, we maintain pools of pre-provisioned identities that agents can draw from and return.
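A pool like this can be sketched as a small lease mechanism: an agent checks an identity out, uses it, and hands it back even if the task fails. The `Identity` and `IdentityPool` names, the fields, and the example addresses below are hypothetical, and a real pool would likely live in a shared service rather than in process memory.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    email: str
    phone: str


class IdentityPool:
    """Thread-safe pool of pre-provisioned identities (in-memory sketch)."""

    def __init__(self, identities):
        self._free = list(identities)
        self._lock = threading.Lock()

    def acquire(self) -> Identity:
        with self._lock:
            if not self._free:
                raise RuntimeError("identity pool exhausted")
            return self._free.pop()

    def release(self, identity: Identity) -> None:
        with self._lock:
            self._free.append(identity)

    @contextmanager
    def lease(self):
        """Check out an identity and always return it, even on error."""
        identity = self.acquire()
        try:
            yield identity
        finally:
            self.release(identity)

    def available(self) -> int:
        with self._lock:
            return len(self._free)


# Placeholder identities; real entries would be provisioned ahead of time.
pool = IdentityPool([
    Identity("agent-1@example.com", "+15555550100"),
    Identity("agent-2@example.com", "+15555550101"),
])

with pool.lease() as ident:
    in_use = pool.available()   # one identity is checked out here
after = pool.available()        # the identity is back once the block exits
```

The context manager is the design choice that matters: returning the identity in a `finally` block keeps a crashed or looping agent from draining the pool.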
With this setup, the agent can cook on its own without asking for help.
The Isolation Boundary
An agent with admin access to cloud accounts, a credit card, and the ability to run arbitrary code is secure because none of these resources are connected to production.
There is no IAM role, no shared credential, and no network path between the agent's org and ours. The production org and the agent org are completely separate entities with completely separate billing.
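One way to keep that claim honest over time is a periodic check that the two orgs share no identifiers. The sketch below compares two inventories and reports any overlap. The inventory shape, the `shared_identifiers` helper, and all account and fingerprint values are hypothetical, and the check covers only what is listed in the inventories: it says nothing about network paths, which would need separate verification.

```python
def shared_identifiers(
    agent_org: dict[str, set[str]],
    prod_org: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return identifiers that appear in both orgs, grouped by kind."""
    overlap: dict[str, set[str]] = {}
    for kind in agent_org.keys() & prod_org.keys():
        common = agent_org[kind] & prod_org[kind]
        if common:
            overlap[kind] = common
    return overlap


# Hypothetical inventories; real ones would be pulled from each provider.
AGENT_ORG = {
    "cloud_accounts": {"sandbox-aws-001", "sandbox-gcp-001"},
    "credential_fingerprints": {"fp-aaa", "fp-bbb"},
}
PROD_ORG = {
    "cloud_accounts": {"prod-aws-001"},
    "credential_fingerprints": {"fp-zzz"},
}

overlap = shared_identifiers(AGENT_ORG, PROD_ORG)
isolated = not overlap  # True only when nothing is shared
```

Any non-empty result is a reason to stop agent workloads until the shared account or credential is removed.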
Code flows in one direction. When the agent finishes a task, it makes a pull request into our production org. We decide whether we merge or we don't. That's the only bridge between the two worlds.
We run agent workloads on dedicated Mac Minis in isolated VMs. If an agent provisions something expensive or gets stuck in a loop, the blast radius is a capped card and a sandbox account, not our infrastructure.
The safety model is not about restricting what the agent can do. It's about ensuring that nothing the agent does can reach production.
Next Steps
This setup gives agents what they need to build real software end to end.
But "unblocked" doesn't mean "correct." We've had agents build an app, deploy it, verify their own work, and report success, only to find the software was still wrong. Usually, this is because the agent didn't know what "correct" meant, or it knew originally but started to drift.
That's a specification problem, and with today's models, we believe we're much closer to solving it as well. We'll write about how we're approaching this next week, in the next essay (Contract Engineering).
Rubric is an applied AI lab helping teams design and ship intelligent products.


