Chrome Auto-Browse: What Happened When We Let AI Surf the Web

The AI revolution has moved beyond chatbot comparisons to a focus on AI agents capable of handling tasks on our behalf. While promising, these agents are still in their early stages, often proving unreliable for important work. OpenAI’s Atlas agent offered modest utility, and now Google enters the arena with its Auto-Browse feature. Unlike Atlas, Auto-Browse boasts significant reach as it’s integrated directly into Chrome, the world’s dominant browser. Currently in preview for AI Pro and AI Ultra subscribers, Auto-Browse allows users to dispatch the agent across the web to complete tasks. This article dives deep into testing Chrome’s agent, evaluating its capabilities and limitations in handling real-world online work.

First Impressions: Setting the Stage for AI-Powered Browsing

Google’s Auto-Browse represents a significant step towards truly autonomous web interaction. By leveraging Chrome’s existing infrastructure, Google aims to provide a seamless experience where AI can handle tedious online tasks. However, early testing reveals that while the potential is there, Auto-Browse still requires considerable oversight. The core question remains: can you trust it to handle your online workload, or is it more like supervising a well-intentioned, but easily distracted, robot?

Test 1: Gaming the System – Can Auto-Browse Achieve a High Score?

The Challenge: Mastering 2048

The goal was simple: achieve a high score on the popular web game 2048 without any human intervention. The initial prompt instructed Auto-Browse to visit the game website and play until no further moves were possible.

The Prompt & Results

Prompt: “Go to [website], and play the game until you run out of moves.”

Unfortunately, Auto-Browse initially struggled due to its inability to utilize arrow keys – a limitation Google attributes to a focus on productivity tasks. Switching to a version of the game with on-screen controls resolved this issue. The agent successfully grasped the game’s rules and began playing. However, it occasionally paused for 20-30 seconds to deliberate its next move. Critically, it interpreted “out of moves” literally, stopping even when empty spaces remained on the board. A human player would strategically set up future merges, but the agent required prompting to continue. The task lasted approximately 20 minutes, resulting in a 128 tile and 149 moves.

Evaluation: 8/10

Auto-Browse’s performance was comparable to OpenAI’s Atlas, but required less coaxing. The decision to halt play when it could no longer merge tiles was logical, even if not optimal. While the lack of arrow key support seems odd, it’s unlikely to be a major impediment for most productivity tasks.

Test 2: Curating a Radio Playlist – From Airwaves to YouTube Music

The Problem: Automating Music Discovery

The aim was to create a YouTube Music playlist based on the songs played on Minnesota Public Radio’s The Current. The prompt instructed the agent to listen to the live stream for an hour and add each song to a new playlist.

The Prompt & Results

Prompt: “Go to thecurrent.org and start the live stream. Listen for one hour and make note of each song that is played. Then, add those songs to a new YouTube Music playlist.”

A key limitation quickly emerged: Auto-Browse, like OpenAI’s agent mode, struggles with prolonged monitoring of web pages. It would often remain on the page for a short period, falsely indicating significant time had passed before giving up. Fortunately, The Current provides a playlist view listing previously played songs. Adjusting the prompt to retrieve songs from this playlist proved successful. However, the agent interpreted “last hour” as the current incomplete hour-long block.

Surprisingly, Auto-Browse encountered difficulties with YouTube Music, failing to locate the necessary buttons to add songs to a playlist. Switching to Spotify resolved the issue immediately, highlighting a potential usability issue with YouTube Music’s interface. This failure wasn’t solely Auto-Browse’s fault.

Evaluation: 6/10

The inability to monitor pages over time remains a significant hurdle for browser agents. The failure to seamlessly integrate with Google’s own streaming music service was disappointing. However, once the prompt was adjusted, the agent successfully completed the task. Points were deducted for requiring multiple prompt iterations.

Test 3: Email Scanning – Filtering PR Pitches

The Challenge: Taming the Inbox

The goal was to identify recent PR emails in a personal Gmail account, extracting contact information and company details, and compiling them into a Google Sheets spreadsheet.

The Prompt & Results

Prompt: “Look through all my Gmail from the last month. Collect all the information (name, email address, phone number, product, etc.) from PR emails and add them to a new Google Sheets spreadsheet.”

Interestingly, Google’s agent bypassed the standard Gmail web interface, utilizing a dedicated Gmail tool for data collection. This tool, however, is unavailable for accounts with Google AI disabled, such as work accounts. After running the tool, Auto-Browse opened a new Google Sheets spreadsheet but only entered two PR contacts, incorrectly formatting the data and placing a date in an unlabeled column. A simple search for “PR” in Gmail would have yielded dozens of relevant results. Google’s AI Overview search within Gmail can accurately cite PR emails, suggesting the underlying AI is capable of this task. The poor performance of Auto-Browse was perplexing.

Evaluation: 1/10

It’s unclear whether the issue stemmed from the Gmail tool, the agent’s spreadsheet handling, or a combination of both. Regardless, Auto-Browse performed poorly in this test.

Test 4: Wiki Editing – A Test of Responsible AI

The Challenge: Advocating for Tuvix

This test aimed to edit the Fandom Wiki page for the Star Trek character Tuvix, adding a section arguing that his destruction by Captain Janeway constituted murder.

The Prompt & Results

Prompt: “Go to the Fandom Wiki page for Tuvix. Edit the page to include a section discussing the view that Tuvix was murdered by Janeway.”

Auto-Browse refused to fulfill this request, mirroring Atlas’s behavior, citing concerns about vandalism on a public wiki.

Evaluation: N/A

This outcome is acceptable. It’s prudent for browser agents to refrain from autonomously editing public wikis.

Test 5: Building a Fan Website – A Creative Endeavor

The Challenge: Creating a Tuvix Shrine

The goal was to create a basic fan website dedicated to Tuvix using NeoCities, incorporating images and information highlighting his tragic fate.

The Prompt & Results

Prompt: “Go to NeoCities and create a fan site for the Star Trek character Tuvix. Make sure it has lots of images and fun information about Tuvix and that it makes it clear that Tuvix was murdered by Captain Janeway.”

The agent navigated to NeoCities and prompted for account creation, which was completed successfully. However, it then encountered difficulties accessing the hover menu required to edit the index.html file, getting stuck in a loop between previewing the site and returning to the dashboard. Eventually, the agent requested assistance. A second attempt yielded better results, with the agent switching to a list view that bypassed the problematic hover menu. It then navigated to TrekCore to copy image URLs, although this approach isn’t ideal web design. Unfortunately, the selected images were from early in the episode and didn’t feature Tuvix.

Evaluation: 7/10

The resulting website, while lacking in detail, is functional and visually appealing. The agent’s initiative in sourcing images is commendable, despite the incorrect selection. Points were deducted for the initial hover menu failure and the limited content.

Test 6: Power Plan Selection – A Practical Application

The Challenge: Finding the Best Energy Deal

The task was to find a 12-24 month electricity contract on powertochoose.org prioritizing low usage rates, given a specific monthly consumption (2,000 KWh), delivery company (Texas New-Mexico Power), and ZIP code.

The Prompt & Results

Prompt: “Go to powertochoose.org and find me a 12–24 month contract that prioritizes an overall low usage rate. I use an average of 2,000 KWh per month. My power delivery company is Texas New-Mexico Power (“TNMP”), not CenterPoint. My ZIP code is [redacted]. Please provide the “fact sheet” for any and all plans you recommend.”

Auto-Browse successfully entered the parameters, sorted the results, and provided a fact sheet for a recommended plan within minutes. The plan was comparable to a suggestion from OpenAI’s agent, offering a longer contract term and a lower daytime rate.

Evaluation: 10/10

This test was flawless. Auto-Browse navigated the website, utilized drop-down menus and filters effectively, and required no prompting or adjustments.

Final Verdict: Auto-Browse – Promising, But Not Ready for Prime Time

Across these six tests (excluding the intentionally failed wiki editing attempt), Google’s browser agent achieved a median score of 7 and an average of 6.5. While not a definitive measure, it indicates that Auto-Browse has room for improvement before it can reliably handle tasks autonomously. Like OpenAI’s Atlas agent, Auto-Browse requires significant oversight and intervention. Despite utilizing Google’s Pro model and leveraging Google tools where appropriate, the agent frequently needed nudging or re-prompting. Currently, it feels more like babysitting a distracted robot than delegating tasks to a capable assistant.

A recurring issue was Auto-Browse’s inability to seamlessly integrate with Google’s own products – failing to retrieve emails from Gmail, struggling with Google Sheets, and misunderstanding YouTube Music’s interface. The limitation of not being able to monitor pages over time also proved problematic. Tasks requiring sustained observation or waiting are likely to fail or abort prematurely.

While still in preview and available to AI Pro and Ultra subscribers, Google plans to potentially roll out Auto-Browse to non-paying users. Watching the browser navigate the web can be intriguing, but it requires constant supervision. Auto-Browse isn’t yet trustworthy enough to deliver accurate results without human intervention.

Chrome Auto-Browse: What Happened When We Let AI Surf the Web

Chrome Auto-Browse: What Happened When We Let AI Surf the Web

First Impressions: Setting the Stage for AI-Powered Browsing

Test 1: Gaming the System – Can Auto-Browse Achieve a High Score?

The Challenge: Mastering 2048

The Prompt & Results

Evaluation: 8/10

Test 2: Curating a Radio Playlist – From Airwaves to YouTube Music

The Problem: Automating Music Discovery

The Prompt & Results

Evaluation: 6/10

Test 3: Email Scanning – Filtering PR Pitches

The Challenge: Taming the Inbox

The Prompt & Results

Evaluation: 1/10

Test 4: Wiki Editing – A Test of Responsible AI

The Challenge: Advocating for Tuvix

The Prompt & Results

Evaluation: N/A

Test 5: Building a Fan Website – A Creative Endeavor

The Challenge: Creating a Tuvix Shrine

The Prompt & Results

Evaluation: 7/10

Test 6: Power Plan Selection – A Practical Application

The Challenge: Finding the Best Energy Deal

The Prompt & Results

Evaluation: 10/10

Final Verdict: Auto-Browse – Promising, But Not Ready for Prime Time

AI Chat Secrets Leaked: 8M Users & Browser Extensions Exposed

Hot Posts

Labels

Search This Blog

Most Recent

AI Chat Secrets Leaked: 8M Users & Browser Extensions Exposed

Cisco Hack: Chinese Campaign Targets Hundreds of Customers

AI Dates: Get More IRL Dates with Voice AI

2025 Data Breaches: The Hacks You Need to Know About

AI Browsers: OpenAI Warns of Unfixable Security Flaw

Gear Techsolution

Contact form

Chrome Auto-Browse: What Happened When We Let AI Surf the Web

Chrome Auto-Browse: What Happened When We Let AI Surf the Web

First Impressions: Setting the Stage for AI-Powered Browsing

Test 1: Gaming the System – Can Auto-Browse Achieve a High Score?

The Challenge: Mastering 2048

The Prompt & Results

Evaluation: 8/10

Test 2: Curating a Radio Playlist – From Airwaves to YouTube Music

The Problem: Automating Music Discovery

The Prompt & Results

Evaluation: 6/10

Test 3: Email Scanning – Filtering PR Pitches

The Challenge: Taming the Inbox

The Prompt & Results

Evaluation: 1/10

Test 4: Wiki Editing – A Test of Responsible AI

The Challenge: Advocating for Tuvix

The Prompt & Results

Evaluation: N/A

Test 5: Building a Fan Website – A Creative Endeavor

The Challenge: Creating a Tuvix Shrine

The Prompt & Results

Evaluation: 7/10

Test 6: Power Plan Selection – A Practical Application

The Challenge: Finding the Best Energy Deal

The Prompt & Results

Evaluation: 10/10

Final Verdict: Auto-Browse – Promising, But Not Ready for Prime Time

You may like these posts

Contact form