Unblocking Agents

Tooling and infrastructure for autonomous software engineering

March 31, 2026 · Dexter Storey · Essay

What Blocks Agents
Give Them Their Own Org
The Isolation Boundary
Next Steps

At Rubric, our goal is to build a process that lets us write a spec at night and wake up to working software. The agents are good enough to build real applications, but they lack the permissions and resources to finish them.

"I've written the migration. Can you run it? I don't have permission to execute against the database."

"I've set up the OAuth handlers, but I can't create the GCP project needed for credentials. Can you create a project in the console and add the OAuth redirect URI? I'll need the client ID and secret to continue."

"I've built the signup flow. Can you test it manually? I'll need a real email and phone number to get through verification."

The agent wrote the code, but it can't provision what it needs, deploy what it built, or verify what it produced.

We believe truly autonomous agents must be free to work however they see fit, which requires guardrails that preserve that freedom without sacrificing safety. This essay describes the steps we've taken toward that goal.

What Blocks Agents

Agents working on real applications hit three types of walls.

Access is the most basic layer. Agents need permission to act on your machine to do things like writing files, running shell commands, installing packages, and making network requests. An unrestricted agent can do real damage, so frameworks gate actions behind approval prompts. Even when the agent is doing exactly what you asked, the permission model assumes it might not be, and you're left sitting in front of a screen clicking "yes" repeatedly.

claude — ~/project

❯ Deploy the OAuth login flow and verify it works end to end.

I'll scaffold the auth routes and configure Google OAuth.

→ Write src/auth/google.ts
✓ OAuth config with PKCE flow

→ Write src/auth/callback.ts
✓ Callback handler with session management

→ Bash gcloud projects create oauth-test-47
PERMISSION_DENIED: caller does not have permission

I built the OAuth login flow, but I can't create the GCP project needed for credentials. Could you create it in the console and share the client ID and secret so I can continue?

The agent writes the code, then yields. It can't create the GCP project it needs to continue.

Beyond local permissions, the agent needs credentials for the services the code depends on. These might be API keys, cloud credentials, deployment tokens, or database connection strings. Most teams keep these tightly scoped or out of the agent's reach altogether. In practice, every missing credential either halts the agent or gets quietly worked around, with the agent marking tasks complete that were never verified.

Infrastructure is what the application runs on. When a developer joins a team, they get a seat on the org. They can provision what they need, from spinning up a database to configuring a deployment. Agents don't get seats. They get environment variables, so they can connect to existing services but can't create new ones. The agent gets stuck the moment a task requires provisioning.

Identity is what software uses to verify there's a real person on the other side. Software sends confirmation emails, redirects to consent screens, validates credit cards against billing addresses, and delivers SMS codes to phone numbers. Testing these flows requires a real human identity, with an email inbox that receives mail and a phone number that works for 2FA. The agent has no way to test any of this.

Every one of these blockers ends the same way: either the agent stops and asks you for help, or it skips what it can't do and silently ships broken code. Both put a developer back in the loop.

Give Them Their Own Org

The instinct is to carefully expand what agents can do inside your existing environment, but this is the wrong direction. The more access the agent has to your infrastructure, the more you need to supervise it, which defeats the point.

The better answer is to give agents access to their own environment.

Unblocked:
Write webhook handler
Provision database
Expose endpoint via ngrok
Deploy to staging
Send test event
Verify receipt end-to-end
Open PR

Blocked:
Write webhook handler
Need database access
Waiting for human
Human shares credentials
Need deploy token
Waiting for human

Same agent, same task. The difference is the environment.

We provisioned an entirely separate organization: a fork of our company's project structure under completely different accounts. This includes a separate GitHub org, separate AWS and GCP accounts, and separate billing, with no production data or customer access. Projects get cloned in, and agents get full control.

Inside this org, the agent doesn't have to request access. Every resource maps to one of the three blockers, and every blocker has a concrete solution.

Full permissions for access. In their own org, agents can "dangerously skip permissions" without thinking twice.

--dangerously-skip-permissions

Bypasses tool approval prompts in Claude Code
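As a rough sketch of how a harness might launch such a run, the helper below builds the argument list for a non-interactive Claude Code invocation. The `-p` (print mode) and `--dangerously-skip-permissions` flags belong to the Claude Code CLI; `AgentRun`, `buildAgentCommand`, and the `sandboxed` switch are hypothetical names introduced here for illustration, not part of our actual tooling.

```typescript
// Hypothetical description of one agent run.
interface AgentRun {
  prompt: string;     // the task, e.g. the text of a spec
  sandboxed: boolean; // true only when running inside the agent's own org
}

// Builds the argv for a non-interactive Claude Code run.
function buildAgentCommand(run: AgentRun): string[] {
  const args = ["claude", "-p", run.prompt];
  // Skip approval prompts only when the run is confined to the sandbox org.
  if (run.sandboxed) {
    args.push("--dangerously-skip-permissions");
  }
  return args;
}

const command = buildAgentCommand({
  prompt: "Deploy the OAuth login flow and verify it works end to end.",
  sandboxed: true,
});

// Logged for inspection only; a real harness would pass the array to a
// process spawner rather than joining it into a shell string.
console.log(command.join(" "));
```

Tying the flag to an explicit `sandboxed` switch keeps the bypass from being used by accident outside the agent org.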

The agent gets root VM access with a full filesystem, unrestricted network, and a shell that doesn't ask questions. API keys, OAuth client secrets, and service account credentials go directly into the agent's context. The system prompt tells the agent what it has and how to use it:

// SYSTEM PROMPT

You are set up for high-agency, human-out-of-the-loop software development. Do not ask for permission before using any of the resources below. Provision, deploy, test, and iterate freely. Test everything end to end, don't assume anything works without proof.

When you're done, your work should be deployed and smoke tested end to end in an actual staging env.

You are operating on a dedicated Mac Mini with full file access, unrestricted networking and Chrome installed. You are logged in via CLI and browser to GCP, AWS, Github, Vercel, Neon, Upstash, Resend, Ngrok and Stripe. You have your own dedicated orgs and admin permissions on everything.

You have exclusive use of the domain "rubrot12.xyz" (managed on Vercel) and can point DNS wherever you need, or buy additional domains via CLI.

Identity:
Email: twelve@rubric.bot (resend)
Phone: +1 (415) 830-7142 (openphone)
CC: 5142 0034 8271 6953, 03/28, 219 ($250 cap)
Billing address: 1209 N Orange St, #2212, Wilmington, DE 19801

API keys:
OPENAI_API_KEY=sk-proj-a8Kj2mNx9pL4vR7wQ3bYtF6uH0dS1eA5iO8nM2kP4xR7
ANTHROPIC_API_KEY=sk-ant-api03-7kN2vXq8mP1rT5wL9bY4uH0dS6eA3iO8nM2kF7pJ
QUO_API_KEY=pk_live_8mN2kP4xR9qL1vT5wY7bU3eA0dS6iO...

Example system prompt — identities and keys are illustrative

Real seats for infrastructure. The agent gets admin accounts on every platform it needs, not scoped credentials. It can create cloud projects, provision databases, deploy applications, and configure DNS, just as a new developer would in their first week.

Concretely, our agents get:

GitHub with full push access and CI
Sandbox AWS and GCP accounts where the agent creates IAM policies, configures OAuth credentials, and spins up whatever it needs
Its own Vercel workspace to deploy preview branches, set environment variables, and attach domains
Admin access to Neon for Postgres databases and Upstash for Redis instances
Ngrok to expose local services for webhook testing and OAuth callbacks
A real domain with subdomains, DNS records, and HTTPS

Real identities for testing. The agent gets an email address that receives mail, for confirmation links and verification codes. It also gets a phone number that passes carrier verification, a Brex card that clears AVS checks (with a hard spending cap), and Playwright sessions for navigating consent screens, submitting forms, and verifying UI flows.

For scarce resources, we maintain pools of pre-provisioned identities that agents can draw from and return.
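A minimal sketch of what such a pool could look like, assuming an in-memory store and one identity per agent at a time. The `Identity` shape, the `IdentityPool` class, and the sample values are all hypothetical; a real pool would presumably need shared, persistent storage and some cleanup of inboxes between uses.

```typescript
// Hypothetical shape of a pre-provisioned test identity.
interface Identity {
  email: string;
  phone: string;
}

// Agents check an identity out, use it, and return it when done.
class IdentityPool {
  private available: Identity[];
  private leased: { [agentId: string]: Identity } = {};

  constructor(identities: Identity[]) {
    this.available = identities.slice();
  }

  // Returns an identity for exclusive use, or undefined if the pool is
  // empty or this agent already holds one.
  checkout(agentId: string): Identity | undefined {
    if (this.leased[agentId] !== undefined) return undefined;
    const identity = this.available.shift();
    if (identity === undefined) return undefined;
    this.leased[agentId] = identity;
    return identity;
  }

  // Puts the agent's identity back so another agent can use it.
  release(agentId: string): void {
    const identity = this.leased[agentId];
    if (identity === undefined) return;
    delete this.leased[agentId];
    this.available.push(identity);
  }
}

// Placeholder identity, not a real account.
const pool = new IdentityPool([
  { email: "agent-one@example.com", phone: "+1 555 0100" },
]);

const first = pool.checkout("agent-a");  // receives the only identity
const second = pool.checkout("agent-b"); // undefined: the pool is exhausted
pool.release("agent-a");
const third = pool.checkout("agent-b");  // receives the returned identity
```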

With this setup, the agent can cook on its own without asking for help.

The Isolation Boundary

An agent with admin access to cloud accounts, a credit card, and the ability to run arbitrary code sounds dangerous. It is safe here because none of these resources are connected to production.

There is no IAM role, no shared credential, and no network path between the agent's org and ours. The production org and the agent org are completely separate entities with completely separate billing.
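One coarse way to audit that separation is to compare inventories of the two orgs and fail if any identifier appears in both. The sketch below assumes such inventories can be exported; the `OrgInventory` shape and every identifier in it are hypothetical, and a matching-identifier check cannot by itself prove the absence of a network path.

```typescript
// Hypothetical inventory of what an org can authenticate as or reach.
interface OrgInventory {
  accountIds: string[];    // cloud account or project identifiers
  credentialIds: string[]; // key or token identifiers, never the secrets
  networkIds: string[];    // VPC or network identifiers
}

// Items present in both lists.
function overlap(xs: string[], ys: string[]): string[] {
  return xs.filter((x) => ys.indexOf(x) !== -1);
}

// Every identifier that appears in both inventories.
function sharedIdentifiers(a: OrgInventory, b: OrgInventory): string[] {
  return overlap(a.accountIds, b.accountIds)
    .concat(overlap(a.credentialIds, b.credentialIds))
    .concat(overlap(a.networkIds, b.networkIds));
}

// Placeholder inventories for illustration.
const production: OrgInventory = {
  accountIds: ["prod-aws", "prod-gcp"],
  credentialIds: ["prod-deploy-key"],
  networkIds: ["prod-vpc"],
};
const agentOrg: OrgInventory = {
  accountIds: ["sandbox-aws", "sandbox-gcp"],
  credentialIds: ["sandbox-deploy-key"],
  networkIds: ["sandbox-vpc"],
};

const shared = sharedIdentifiers(production, agentOrg);
if (shared.length > 0) {
  throw new Error("Isolation violated: " + shared.join(", "));
}
console.log("No shared identifiers between the two orgs");
```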

rubric

handler.ts

async function send(draft) {
  const msg = await gmail.send(draft)
  return msg.id
}

rubrot

A project forks into the sandbox, gets modified and verified, then returns as a pull request. Nothing else crosses the boundary.

Code flows in one direction. When the agent finishes a task, it opens a pull request against our production org, and we decide whether to merge it. That's the only bridge between the two worlds.
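How the pull request gets opened is not spelled out here. As one possibility, assuming the sandbox copy is a GitHub fork of the production repository, the GitHub CLI's `gh pr create` can target another repository with `--repo` and name the fork's branch with `--head owner:branch`. The `PullRequestSpec` type, the `buildPrCommand` helper, and the repository and branch names below are hypothetical.

```typescript
// Hypothetical description of a pull request crossing the boundary.
interface PullRequestSpec {
  productionRepo: string; // "owner/repo" in the production org
  agentOwner: string;     // owner of the sandbox fork
  branch: string;         // branch on the fork holding the agent's commits
  title: string;
  body: string;
}

// Builds the argv for `gh pr create` against the production repository.
function buildPrCommand(spec: PullRequestSpec): string[] {
  return [
    "gh", "pr", "create",
    "--repo", spec.productionRepo,
    "--head", `${spec.agentOwner}:${spec.branch}`,
    "--title", spec.title,
    "--body", spec.body,
  ];
}

// Placeholder names, not real repositories.
const prCommand = buildPrCommand({
  productionRepo: "example-prod/app",
  agentOwner: "example-sandbox",
  branch: "agent/oauth-login",
  title: "Add OAuth login flow",
  body: "Deployed and smoke tested in the sandbox staging environment.",
});
console.log(prCommand.join(" "));
```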

We run agent workloads on dedicated Mac Minis in isolated VMs. If an agent provisions something expensive or gets stuck in a loop, the blast radius is a capped card and a sandbox account, not our infrastructure.

The safety model is not about restricting what the agent can do. It's about ensuring that nothing the agent does can reach production.

Next Steps

This setup gives agents what they need to build real software end to end.

But "unblocked" doesn't mean "correct." We've had agents build an app, deploy it, verify their own work, and report success, only to find the software was still wrong. Usually, this is because the agent didn't know what "correct" meant, or it knew originally but started to drift.

That's a specification problem, and with today's models, we believe we're much closer to solving it as well. We'll write about how we're approaching this next week. Subscribe to get the next essay (Contract Engineering) in your inbox!

If this perspective matches what you're seeing, let's talk.

Rubric is an applied AI lab helping teams build and ship intelligent products.
