Google Unveils Gemini 2.5 Model With Advanced Browser Interaction Capabilities

Google's Gemini 2.5 Computer Use model can navigate and act within web browsers much as a human user would, handling interactive tasks without API access and reportedly outperforming competitors on web and mobile benchmarks.

Key details

  • Gemini 2.5 enables human-like browser interaction using visual recognition and logical reasoning.
  • It supports 13 fundamental actions, including clicking, scrolling, and typing, but does not control the desktop operating system.
  • The model performs web tasks such as form filling and UI testing in environments where no API is available.
  • According to Google's previews, Gemini 2.5 outperforms competitors on multiple web and mobile benchmarks.
  • Developers can access it via Google AI Studio and Vertex AI; a public demo runs on Browserbase.

Google has announced its latest AI model, Gemini 2.5 Computer Use, designed to interact with web browsers the way a human user does. The model uses visual recognition and logical reasoning to operate user interfaces directly rather than through APIs, so it can fill out forms, scroll, click, type, and navigate websites autonomously. That makes it particularly useful where API access is restricted, for tasks such as website navigation, form submission, and UI testing. Gemini 2.5 currently supports 13 core actions, including opening tabs, dragging items, and interacting with browser elements, but unlike some competitors it does not extend its control to the operating-system level.
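The observe-then-act cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the action names, the `FakeBrowser` class, and the scripted stand-in for the model are hypothetical and are not Google's API or its actual list of 13 actions. The sketch only shows the general shape of a loop in which a model looks at the current screen, proposes one UI action, and a client carries it out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Action:
    """One UI action proposed by the model (names are hypothetical)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; it only records what the agent did."""
    scroll_y: int = 0
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real client would return image bytes; a text summary is enough here.
        return f"scroll_y={self.scroll_y} typed={self.typed!r}"


def click(browser: FakeBrowser, x: int, y: int) -> None:
    browser.log.append(f"click({x},{y})")


def scroll(browser: FakeBrowser, dy: int) -> None:
    browser.scroll_y += dy
    browser.log.append(f"scroll({dy})")


def type_text(browser: FakeBrowser, text: str) -> None:
    browser.typed += text
    browser.log.append(f"type({text!r})")


# Hypothetical subset of the supported actions, keyed by name.
HANDLERS: Dict[str, Callable[..., None]] = {
    "click": click,
    "scroll": scroll,
    "type": type_text,
}

Proposer = Callable[[str], Optional[Action]]


def run_agent(propose: Proposer, browser: FakeBrowser, max_steps: int = 10) -> int:
    """Observe the screen, ask for an action, execute it; return steps taken."""
    for step in range(max_steps):
        action = propose(browser.screenshot())
        if action is None:  # the model signals that the task is finished
            return step
        handler = HANDLERS.get(action.name)
        if handler is None:
            raise ValueError(f"unsupported action: {action.name}")
        handler(browser, **action.args)
    return max_steps


def scripted_model(actions: Iterable[Action]) -> Proposer:
    """Replays a fixed list of actions in place of a real model call."""
    remaining = iter(actions)

    def propose(observation: str) -> Optional[Action]:
        return next(remaining, None)

    return propose


if __name__ == "__main__":
    browser = FakeBrowser()
    plan = [
        Action("click", {"x": 10, "y": 20}),
        Action("type", {"text": "hello"}),
        Action("scroll", {"dy": 300}),
    ]
    steps = run_agent(scripted_model(plan), browser)
    print(steps, browser.log)
```

In a real deployment the scripted stand-in would be replaced by a call to the model, and the handlers by calls into a browser automation layer; the fixed action table is what keeps the agent confined to the browser rather than the operating system.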

Google has demonstrated the model in demo videos that show it completing tasks such as playing the game 2048 and searching discussions on Hacker News. According to the previews, Gemini 2.5 outperforms rivals such as OpenAI’s ChatGPT apps and Anthropic’s Claude model across various web and mobile benchmarks, with better speed and accuracy. Its focus on in-browser interaction sets it apart in a field where only some competitors offer full OS-level control.

Developers can access Gemini 2.5 through Google AI Studio and Vertex AI, and a public demo is available on Browserbase. The announcement closely follows OpenAI’s introduction of app integration for ChatGPT at its annual Dev Day. It also builds on Google’s earlier Gemini releases and on Project Mariner, which uses AI agents to carry out tasks on the web autonomously.

The release underscores Google’s strategic emphasis on AI models that work through ordinary user interfaces, and it positions Gemini 2.5 as a significant step forward for browser-based AI agents. By combining visual understanding with logical reasoning, the model can operate websites that expose no API, a notable advance for AI-driven automation and user assistance.