
Why Browser Automation Changes Everything for AI Assistants

Tanveet Gill

March 15, 2026

The API-Only Wall

When we started building Quickly, we focused on the tools with great APIs: Jira, Confluence, Salesforce, Google Calendar. These integrations are clean. You authenticate via OAuth, call well-documented endpoints, and get structured JSON back. Everything is predictable and testable.

But we kept running into the same conversation with prospective customers:

"Can Quickly fill out forms in [some industry-specific portal]?"

The answer was always no. These portals do not have APIs. They were built ten or fifteen years ago, they require a browser login, and the only way to interact with them is to open a web page and click through a form. Property management systems, government filing portals, MLS listing platforms, insurance quoting tools, legacy CRMs that predate REST. The long tail of business software lives behind login screens, not API endpoints.

We estimated that for our target customers, APIs covered about 30% of the tools they use daily. The other 70% were browser-only. If we wanted to be a complete automation platform, we had to figure out the browser.

Browser Automation as the Answer

Browser automation is not a new idea. Selenium has been around since 2004. Puppeteer since 2017. Playwright since 2020. QA teams have been writing browser tests for decades.

What is new is combining browser automation with an AI agent that can understand context, make decisions, and handle the unexpected. A traditional Selenium script breaks the moment a button moves three pixels to the right. An AI-powered browser agent can adapt.

But here is what we learned the hard way: you do not actually want the AI making decisions about which button to click most of the time. Not because the AI is bad at it, but because it is slow and expensive. Every time you ask a large language model to look at a screenshot and decide what to click, you are spending tokens, adding latency, and introducing non-determinism into a process that should be predictable.

Our Approach: Playwright + Stagehand

We ended up with a two-tier architecture.

Tier 1: Deterministic form filling. For portals we know well, we pre-map the form selectors. We know exactly where the "property address" field is, where the "submit" button lives, and what the success confirmation looks like. Playwright drives the browser with surgical precision. No AI needed for the navigation itself. This is fast (seconds, not minutes), cheap (no LLM calls for clicking), and 100% reliable as long as the portal has not changed its layout.

Tier 2: AI-assisted navigation. For portals we have not pre-mapped, or for steps that require judgment (like choosing the right dropdown option from a list of similar entries), we use Stagehand. Stagehand sits on top of Playwright and uses vision models to understand the page. It can identify form fields, read labels, and figure out which element corresponds to which piece of data. This is slower and costs tokens, but it works on sites we have never seen before.
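To make the routing between the two tiers concrete, here is a minimal sketch in Python. The selector map, portal names, and the `ai_fill` fallback action are all illustrative assumptions, not Quickly's actual implementation:

```python
# Sketch of the two-tier routing described above. All names
# (portals, selectors, actions) are illustrative assumptions.

# Tier 1: pre-mapped selector tables for portals we know well.
SELECTOR_MAPS = {
    "example-mls-portal": {
        "property_address": "#addr-input",
        "list_price": "input[name='price']",
        "submit": "button#submit-listing",
    },
}

def plan_fill(portal: str, data: dict) -> list[tuple[str, str, str]]:
    """Return a list of (action, selector, value) steps.

    Known portals get a deterministic plan (Tier 1); unknown
    portals fall back to AI-assisted navigation (Tier 2).
    """
    selectors = SELECTOR_MAPS.get(portal)
    if selectors is None:
        # Tier 2: hand the whole page to the vision model.
        return [("ai_fill", "<page>", repr(data))]
    steps = [("fill", selectors[field], value)
             for field, value in data.items() if field in selectors]
    steps.append(("click", selectors["submit"], ""))
    return steps
```

A real version would execute each `fill`/`click` step through Playwright, but the routing decision itself is just a dictionary lookup, which is why Tier 1 costs no tokens.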

The key insight is that most real workflows are a mix. You might use deterministic filling for the first 90% of a form and then hand off to the AI for one ambiguous dropdown. Or you might use the AI for initial exploration of a new portal, then codify the selectors once you know the layout.

In practice, our production workflows are about 85% deterministic and 15% AI-assisted. That ratio matters a lot for cost and reliability.

Real-World Examples

Here are three workflows we have built using browser automation:

Real estate listing submission. A real estate team needs to submit property listings to their local MLS portal. The portal has a 40-field form with dropdowns, checkboxes, and file uploads. Our playbook collects the listing data through a Slack conversation, then Playwright fills the form field by field, uploads photos, and submits. What used to take an agent 25 minutes takes 90 seconds.

Maintenance request routing. A property management company receives maintenance requests through a legacy portal that has no API and no webhook support. We set up a browser automation that logs into the portal every 15 minutes, checks for new requests, extracts the details, and creates Jira tickets. The property manager sees new requests in Slack within 15 minutes of submission.

Government form filing. A compliance team needs to submit quarterly reports through a government portal. The portal requires multi-page form entry with calculated fields. Our playbook reads the source data from a Google Sheet, calculates the required values, and fills the form across all pages. The team reviews a summary before we click submit.

When to Use It

Browser automation is powerful but it is not always the right tool. Here is our decision framework:

Use an API integration when: The target tool has a documented API, you need real-time responses, or you are building something that runs hundreds of times a day. APIs are faster, cheaper, and more reliable.

Use browser automation when: The target tool has no API, the API does not cover the feature you need, or you are dealing with a legacy system that will never get an API. Browser automation is also the right choice when you need to interact with a portal exactly as a human would, for compliance or audit trail reasons.

Do not use browser automation when: The target website changes its layout frequently (weekly or more), you need sub-second latency, or you are trying to automate something that violates the site's terms of service.
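The framework above can be written down as a small routing function. The parameter names are illustrative, and a real policy would weigh more factors, but the ordering matters: prefer the API whenever it exists and covers the feature, and rule out the browser before falling back to it.

```python
# Hedged sketch of the decision framework above; the criteria
# and names are illustrative, not an exhaustive policy engine.

def choose_integration(
    has_api: bool,
    api_covers_feature: bool = True,
    layout_changes_weekly: bool = False,
    needs_subsecond_latency: bool = False,
    violates_tos: bool = False,
) -> str:
    """Return 'api', 'browser', or 'none' per the framework above."""
    if has_api and api_covers_feature:
        return "api"      # faster, cheaper, more reliable
    if layout_changes_weekly or needs_subsecond_latency or violates_tos:
        return "none"     # browser automation is the wrong tool here
    return "browser"      # no API, or an API gap: drive the browser
```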

The Engineering Challenges

Building reliable browser automation at production quality is harder than it looks. Some of the challenges we wrestled with:

Session management. Browser sessions need to handle login, session expiry, CAPTCHA, and multi-factor authentication. We store credentials encrypted with AES-256-GCM and inject them at runtime. For MFA, we pause the playbook and ask the user to complete the challenge manually, then resume.

Error recovery. When a form submission fails, you need to figure out why. Did the page time out? Did a validation error appear? Did the portal show an unexpected dialog? We capture screenshots at every step so the user can see exactly what happened. The playbook retries transient failures and surfaces persistent ones for human review.

Isolation. Every browser automation task runs in its own isolated browser context. No shared cookies, no shared state. This is important for multi-tenant safety. One workspace's credentials and session data never leak to another.

Cost control. Browser sessions are expensive compared to API calls. A Playwright session running for 60 seconds costs more than hundreds of API calls. We are aggressive about closing sessions when they are no longer needed, and we cache session state when it is safe to reuse.
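The retry-and-surface pattern in the error recovery point above looks roughly like this. The error classification and retry counts are assumptions for illustration; the point is that transient failures (timeouts, dropped connections) get retried, while everything else is surfaced immediately for human review:

```python
# Sketch of retrying transient failures and surfacing persistent
# ones, as described above. Policy details are assumptions.
import time

TRANSIENT = (TimeoutError, ConnectionError)

def run_step(step, *, retries: int = 2, delay: float = 0.0):
    """Run one browser step; retry transient errors, surface the rest."""
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return step()
        except TRANSIENT as exc:
            last_error = exc   # e.g. the page timed out: worth retrying
            time.sleep(delay)
        except Exception:
            raise              # validation errors, unexpected dialogs:
                               # surface immediately for human review
    raise RuntimeError(
        f"step failed after {retries + 1} attempts"
    ) from last_error
```

In production you would also capture a screenshot in each `except` branch, so the review queue shows exactly what the page looked like when the step failed.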

What's Next

We are working on a few improvements to our browser automation stack:

Selector learning. When the AI-assisted tier identifies the right elements on a page, we save those selectors for next time. Over time, Tier 2 sites graduate to Tier 1 automatically as we accumulate reliable selector maps.

Parallel execution. Right now, each browser task runs sequentially. We are building support for running multiple browser sessions in parallel, which will matter for batch operations like submitting 50 listings at once.

Visual debugging. We are adding a step-by-step replay viewer in the dashboard so admins can watch exactly what the browser did during a playbook run. Think of it as a screen recording with annotations showing which element was clicked and what data was entered.

Browser automation is not glamorous. It is not the kind of feature that makes headlines. But for the teams that need it, it is the difference between "nice AI toy" and "tool that actually does my job." And that is the gap we are trying to close.
