How We Built AI Safety Controls That Actually Work

The Trust Problem

Here is the fundamental tension of AI agents: the more capable they are, the more damage they can do.

An AI that can only answer questions is safe but limited. An AI that can create Jira issues, post to Slack channels, update Salesforce records, and fill out web forms is genuinely useful. But it can also create garbage tickets, spam your exec channel, corrupt your CRM data, and submit incorrect forms to government portals.

When we started building Quickly, we talked to a lot of operations teams about what would make them comfortable letting an AI take actions. The answers were remarkably consistent:

"I want it to do the routine stuff automatically, but I want to approve anything important."

"I need to know what it did and be able to undo it."

"Different people on my team should have different levels of trust."

"I do not want to configure 50 rules. Just give me sensible defaults."

These conversations shaped our safety architecture. We did not want a binary "on/off" switch for AI autonomy. We wanted a graduated system that teams could tune to their comfort level.

Three Autonomy Modes

Every Quickly workspace has an autonomy mode that controls the default behavior for all actions:

Conservative. Quickly asks for explicit approval before any action that writes, updates, or deletes data. Read-only operations (searching, querying, looking up information) proceed automatically. This is the default for new workspaces.

In practice, Conservative mode means you see a confirmation message in Slack for every Jira issue creation, every Confluence page draft, every Salesforce update. You tap Approve or Reject. It feels like having a very diligent assistant who checks with you before doing anything.

Balanced. Quickly asks for approval on high-risk actions (deleting, bulk updates, posting to public channels) but handles low-risk writes automatically. Creating a Jira issue from a thread? Goes through. Deleting a Confluence page? Asks first.

Most teams that have used Quickly for a few weeks switch to Balanced. They have seen enough correct actions to trust the AI for routine tasks, but they want a gate on anything destructive.

Autonomous. Quickly handles everything independently and reports results after the fact. The only exception is DESTRUCTIVE actions (deleting resources, revoking access), which still require confirmation.

This mode is for teams that have high confidence in their playbook configurations and want maximum speed. We find it works well for well-defined playbooks that have been tested and refined over time.
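One way to picture the three modes is as a single threshold on an ordered risk scale. The sketch below is only an illustration of that idea, not Quickly's actual implementation; the `Risk` levels, the `Mode` enum, and `needs_approval` are hypothetical names chosen for this example.

```python
from enum import Enum, IntEnum


class Risk(IntEnum):
    """Hypothetical risk levels, ordered from least to most dangerous."""
    READ = 0         # searching, querying, looking up information
    WRITE = 1        # low-risk writes, e.g. creating a Jira issue
    HIGH_RISK = 2    # bulk updates, posting to public channels
    DESTRUCTIVE = 3  # deleting resources, revoking access


class Mode(Enum):
    CONSERVATIVE = "Conservative"
    BALANCED = "Balanced"
    AUTONOMOUS = "Autonomous"


# Lowest risk level at which each mode stops and asks for approval.
APPROVAL_THRESHOLD = {
    Mode.CONSERVATIVE: Risk.WRITE,
    Mode.BALANCED: Risk.HIGH_RISK,
    Mode.AUTONOMOUS: Risk.DESTRUCTIVE,
}


def needs_approval(mode: Mode, risk: Risk) -> bool:
    """True when an action of this risk must wait for a human in this mode."""
    return risk >= APPROVAL_THRESHOLD[mode]


# Creating an issue goes through in Balanced; deleting a page asks first.
print(needs_approval(Mode.BALANCED, Risk.WRITE))        # False
print(needs_approval(Mode.BALANCED, Risk.DESTRUCTIVE))  # True
```

Modeling the modes as thresholds keeps the behavior monotonic: moving to a more permissive mode can only remove approval prompts, never add them.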

Per-Tool Overrides

Autonomy modes are a good starting point, but teams quickly want more granularity. A property manager might trust Quickly completely for creating maintenance tickets (it has done hundreds correctly) but want a confirmation gate before it sends emails to tenants.

Per-tool overrides let you set each tool individually to one of three settings:

AUTO - Run without asking, regardless of the workspace autonomy mode.

CONFIRM - Always ask for approval, regardless of the workspace autonomy mode.

BLOCK - Never run this tool, regardless of what anyone asks.

These overrides stack on top of the workspace mode. So in Balanced mode with delete_jira_issue set to BLOCK, Quickly will never delete a Jira issue even if someone explicitly asks it to. In Conservative mode with create_jira_issue set to AUTO, Quickly will create issues automatically but still ask before any other write.
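The stacking rule can be sketched as a lookup that runs after the workspace mode has made its call. This is a minimal illustration under assumptions: `Decision`, `resolve`, and the shape of the `overrides` dictionary are invented for the example, and only the tool names and the AUTO, CONFIRM, and BLOCK settings come from the description above.

```python
from enum import Enum


class Decision(Enum):
    RUN = "run"        # proceed without asking
    ASK = "ask"        # show a confirmation first
    REFUSE = "refuse"  # never run


# Hypothetical mapping from an override setting to the decision it forces.
OVERRIDE_DECISION = {
    "AUTO": Decision.RUN,
    "CONFIRM": Decision.ASK,
    "BLOCK": Decision.REFUSE,
}


def resolve(tool: str, mode_decision: Decision, overrides: dict) -> Decision:
    """Apply a per-tool override on top of what the workspace mode decided."""
    setting = overrides.get(tool)
    if setting is None:
        return mode_decision  # no override: the workspace mode stands
    return OVERRIDE_DECISION[setting]


# Balanced mode would ask before a delete, but BLOCK wins.
print(resolve("delete_jira_issue", Decision.ASK, {"delete_jira_issue": "BLOCK"}))

# Conservative mode would ask before a create, but AUTO wins.
print(resolve("create_jira_issue", Decision.ASK, {"create_jira_issue": "AUTO"}))
```

In this sketch an override always wins over the mode, which matches the "regardless of the workspace autonomy mode" wording of all three settings.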

We also separate overrides by role. Admins and Members can have different tool policies. A common pattern is: Admins get Balanced with broad AUTO overrides, Members get Conservative with no overrides. This means your power users can move fast while new team members have guardrails.
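As a rough picture of the common pattern, role policies can be expressed as plain data. Everything here is a hypothetical sketch: `RolePolicy`, `POLICIES`, `policy_for`, and the `update_salesforce_record` tool name are assumptions made for illustration, and the fallback for unknown roles is a design choice of this example rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RolePolicy:
    mode: str                                      # autonomy mode for this role
    overrides: dict = field(default_factory=dict)  # tool name -> AUTO/CONFIRM/BLOCK


# The pattern described above: broad AUTO overrides for Admins, none for Members.
POLICIES = {
    "admin": RolePolicy(
        mode="Balanced",
        overrides={
            "create_jira_issue": "AUTO",
            "update_salesforce_record": "AUTO",  # hypothetical tool name
        },
    ),
    "member": RolePolicy(mode="Conservative"),
}


def policy_for(role: str) -> RolePolicy:
    """Look up a role's policy; unknown roles get the Member guardrails (assumption)."""
    return POLICIES.get(role, POLICIES["member"])


print(policy_for("admin").mode)        # Balanced
print(policy_for("member").overrides)  # {}
```

Falling back to the most restrictive policy means a misconfigured or missing role never grants more autonomy than intended.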

Human-in-the-Loop Gates

When Quickly needs approval, it presents a structured confirmation in Slack (or Teams) with:

A clear description of what it is about to do ("Create a P2 bug in PROJECT-X: Login button unresponsive on mobile")

The key parameters (project, priority, assignee, summary)

Approve and Reject buttons

If you tap Approve, the action proceeds and the message updates to show the result. If you tap Reject, Quickly cancels the action and asks if you want to modify the request. If you do not respond within 5 minutes, the action times out and Quickly lets you know it did not proceed.
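The approve, reject, and timeout paths amount to a small state check. The sketch below is an assumption-laden illustration: `PendingApproval`, its field names, and the outcome strings are invented here, and treating a late click as timed out is a choice made for this example. Only the 5-minute window comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

APPROVAL_TIMEOUT_SECONDS = 5 * 60  # the 5-minute window described above


@dataclass
class PendingApproval:
    description: str
    requested_at: float  # seconds on any consistent clock

    def outcome(self, response: Optional[str], now: float) -> str:
        """Return 'proceed', 'cancelled', 'timed_out', or 'waiting'."""
        # Assumption: a click that arrives after the window is ignored.
        if now - self.requested_at > APPROVAL_TIMEOUT_SECONDS:
            return "timed_out"
        if response == "approve":
            return "proceed"
        if response == "reject":
            return "cancelled"
        return "waiting"


request = PendingApproval("Create a P2 bug in PROJECT-X", requested_at=0.0)
print(request.outcome("approve", now=42.0))  # proceed
print(request.outcome(None, now=301.0))      # timed_out
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the timeout logic easy to test.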

This gate is the same mechanism used in playbooks. When a conversational playbook reaches the REVIEW phase, it presents the confirmation. When a visual workflow hits a Human Input node, it presents the confirmation. The safety mechanism is consistent regardless of how the action was triggered.

One design decision we debated internally: should Quickly explain why it is asking for approval? We decided yes. The confirmation message includes a tag like "Requires approval: WRITE action in Conservative mode" or "Requires approval: DESTRUCTIVE tool override." This helps users understand the safety system and know what to change if they want to adjust the behavior.
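To make the shape of such a message concrete, here is a plain-text rendering of a confirmation with its reason tag. It is only a sketch: `render_confirmation` is a hypothetical helper, and a real Slack or Teams message would use interactive buttons rather than bracketed text.

```python
def render_confirmation(description: str, params: dict, reason: str) -> str:
    """Build a plain-text confirmation: what, with which parameters, and why."""
    lines = [description]
    lines += [f"  {key}: {value}" for key, value in params.items()]
    lines.append(f"Requires approval: {reason}")
    lines.append("[Approve] [Reject]")  # stand-in for real interactive buttons
    return "\n".join(lines)


message = render_confirmation(
    "Create a P2 bug in PROJECT-X: Login button unresponsive on mobile",
    {"project": "PROJECT-X", "priority": "P2"},
    "WRITE action in Conservative mode",
)
print(message)
```

Keeping the reason as its own line means the policy that triggered the gate is always visible next to the parameters it applies to.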

Real Scenarios

Here are some situations where our safety controls made a real difference:

The accidental bulk update. A user typed "@Quickly mark all open bugs as resolved," intending to close out a sprint. In Balanced mode, Quickly flagged this as a high-risk bulk operation and showed the list of 47 issues it was about to update. The user realized they only meant the 6 bugs in the current sprint, not all 47 open bugs across the workspace. They rejected, refined the request, and resolved just the right ones.

The new team member. A Member with Conservative mode asked Quickly to "delete the old postmortem page." Because delete_confluence_page was set to BLOCK for Members, Quickly responded: "I am not able to delete pages with your current permissions. An admin can remove the page or adjust your tool policy." The member asked their admin, who handled it intentionally.

The autonomous playbook. A property management team runs a maintenance triage playbook in Autonomous mode. It processes 30-40 requests per week without any manual intervention. But when a request came in mentioning "gas leak," the playbook's urgency detection bumped it to P0, and even in Autonomous mode, P0 issues trigger a confirmation gate. The property manager got an urgent notification and dispatched emergency services. The system was autonomous for routine work but escalated when it mattered.

What We Learned

Building these controls taught us a few things:

Defaults matter more than options. We spent a lot of time tuning Conservative mode to feel helpful rather than annoying. If the defaults are too restrictive, people turn off the safety system entirely. If they are too loose, people do not trust the tool. Requiring approval for every write while handling reads automatically turned out to be the right balance for new teams.

Transparency builds trust faster than control. Users who can see exactly what Quickly is about to do (and why it is asking) become comfortable with higher autonomy faster than users who just see "Approve this action?" Showing the parameters, the tool name, and the policy that triggered the gate makes the system feel predictable rather than opaque.

Per-tool granularity is essential. We almost shipped with just the three modes and no overrides. In testing, every single team wanted to customize at least one tool. The most common request was "let it create but not delete." Per-tool overrides were not a nice-to-have. They were table stakes.

Role-based policies prevent a lot of headaches. When everyone on the team has the same autonomy level, either the power users feel slowed down or the new hires have too much freedom. Separating Admin and Member policies solved both problems. Admins configure the system and get broader access. Members get sensible guardrails that protect them (and the team's data) while they learn the tool.

The goal was never to build a safety system that prevents all mistakes. That would mean the AI never does anything. The goal was to build a safety system where the approval required is proportional to the risk of each action. Routine, low-risk tasks flow automatically. High-risk tasks get a human check. Truly dangerous operations are blocked entirely. Teams can tune the thresholds as their trust grows.

That is the system we shipped. It is not perfect, but it is the system that teams actually use, and they use it because it stays out of their way until it matters.
