The AI Thinker
The AI Thinker Podcast

📰 Industry memo: the last two weeks in AI and why the AI agent is now the new OS

Deconstructing the last two weeks of release notes from major AI players to reveal the strategic pivot to autonomous execution.

So, I finally came up for air after a busy couple of weeks and spent the morning going through the firehose of AI release notes. To be honest, the landscape felt relatively calm: there were no shocking foundational model drops that would force you to rethink everything overnight.

But one move really stood out to me, and that was Google’s release of their Gemini CLI.

For this edition, I’ve gathered everything from the last two weeks, major and minor, and grouped it all together to try to make sense of the underlying currents. We’ll cover all the updates, but pay close attention to how these developer tools and new economic models are becoming the real strategic arena.


1. The proactive shift: AI graduates from assistant to autonomous engine

It’s official: your AI just got a promotion. It’s no longer just a helpful assistant that finds things for you. The latest wave of updates has turned it into an autonomous engine that gets real work done. We’re now seeing toolsets that let AI perform deep research, handle financial analysis, and build software, often with little direct human input.

This isn’t just another feature update; it’s a fundamental shift in what we’re building. The goal is no longer to create a smart “tool” for users, but to build an independent “digital teammate” that can execute complex work on its own.

OpenAI releases Agents SDK to automate high-stakes research

OpenAI has introduced a powerful suite of tools, including a Deep Research API and an Agents SDK, engineered for the development of sophisticated research applications. This move empowers developers to construct multi-agent systems that can autonomously deconstruct high-level questions, conduct rigorous analysis, and synthesize findings into structured, citation-backed reports. In essence, OpenAI is no longer just providing answers; it is now shipping the automated analyst. The inclusion of the Model Context Protocol (MCP) provides a secure gateway for these agents to access private knowledge bases, transforming them into formidable, context-aware research partners for the enterprise.
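
To make that concrete, here is a minimal sketch of what a two-agent research pipeline might look like on the openly available Agents SDK (the openai-agents Python package). The agent names, instructions, handoff wiring, and example question are my own illustrative assumptions, not anything taken from OpenAI’s announcement.

```python
# pip install openai-agents   (expects OPENAI_API_KEY in the environment)
from agents import Agent, Runner, WebSearchTool

# A worker agent that answers one focused sub-question using web search.
# Names and instructions are illustrative assumptions, not OpenAI's code.
researcher = Agent(
    name="Researcher",
    instructions=(
        "Answer the sub-question you are handed using web search, and cite "
        "every source you rely on."
    ),
    tools=[WebSearchTool()],
)

# A planner agent that decomposes the high-level question, delegates the
# sub-questions, and synthesizes a structured, citation-backed report.
planner = Agent(
    name="Planner",
    instructions=(
        "Break the user's question into focused sub-questions, delegate each "
        "one to the Researcher, then combine the findings into a structured, "
        "citation-backed report."
    ),
    handoffs=[researcher],
)

if __name__ == "__main__":
    result = Runner.run_sync(
        planner,
        "What were the most consequential AI developer-tool releases this month?",
    )
    print(result.final_output)
```

The Model Context Protocol piece fits the same pattern: instead of (or alongside) a web-search tool, an agent can be pointed at an MCP server that fronts a private knowledge base, which is what turns this into the context-aware enterprise analyst described above.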

Google embeds Gemini AI agent directly into the command line

Google has launched the Gemini CLI, an open-source AI agent that integrates the Gemini 2.5 Pro model and its one-million-token context window directly into the developer terminal. This represents a strategic push to embed powerful AI into the native environment where software is built, streamlining tasks like coding, debugging, and research. Google is not asking developers to come to its AI; it is delivering its AI directly to the developer’s most essential workspace. By offering a generous free usage tier and building in extensibility via the Model Context Protocol (MCP), Google aims to make its agent an indispensable and deeply customizable component of the daily developer workflow.

Microsoft Copilot gains vision, research, and task automation capabilities

Microsoft has executed a significant upgrade of its Copilot assistant, embedding it more deeply into user workflows with a focus on visual comprehension and task execution. The new Copilot Vision allows the AI to perceive a user’s screen or camera to provide real-time, voice-guided assistance, fundamentally altering the user interaction model. For its paid subscribers, the integration of Deep Research and the expansion of Copilot Actions transforms the assistant from a passive information source into an active digital partner. Copilot is learning to see, research, and act, a trifecta of capabilities that pushes it squarely into the realm of a true agent.

Anthropic’s Claude becomes an application development platform

Anthropic has updated its Claude application, enabling any user to build and share interactive AI-powered apps, termed “artifacts,” using only natural language. This is a landmark maneuver to democratize AI development, positioning Claude not merely as a chatbot but as an accessible, integrated development environment. The strategic brilliance is embedded in its economic model: API usage is charged to the end-user’s subscription, not the creator’s. This single decision removes the primary barrier to innovation, creating a powerful incentive for the viral propagation of an entirely new tool ecosystem.

Anysphere enhances its Cursor editor for complex coding tasks

Anysphere has released Cursor v1.2, upgrading its AI code editor with features that enhance its agent’s capacity for long-horizon, complex programming challenges. The introduction of “Agent To-dos” provides a structured planning and tracking system, rendering the AI’s process more transparent and reliable. This is a deliberate strategy to mature the AI from a simple autocomplete function into a resilient and predictable software development partner. Combined with codebase “Memories” and the ability to resolve merge conflicts, the focus is squarely on improving the agent’s ability to manage complexity over time.

Windsurf AI evolves from code editor to collaborative “thought partner”

Windsurf’s latest platform updates strategically expand its AI’s function beyond the editor and into the complete development lifecycle. The introduction of “planning mode” enables the AI agent to co-author and adapt a project plan with the user, while the new “Windsurf Browser” extends the AI’s reasoning to a user’s web activity. This evolution is designed to create a deeply integrated environment where context is shared across long-term planning, research, and coding. The goal is no longer just about writing code together; it is about thinking together.

ElevenLabs launches 11.ai, a voice-first assistant designed for action

ElevenLabs has entered the AI assistant market with the alpha launch of 11.ai, a voice-first tool strategically designed to execute tasks, not merely answer questions. By leveraging the Model Context Protocol (MCP) to integrate with external applications like Perplexity, Slack, and Notion, 11.ai aims to overcome the functional limitations of traditional voice assistants. The launch is a direct challenge to the incumbent model, proposing that the most natural human interface, voice, should lead directly to meaningful digital outcomes.


2. The platform offensive: competing for a foundational role

The competition in AI has shifted. It’s no longer just about who has the smartest model; it’s a full-on land grab for the loyalty of developers.

The playbook is simple: become the essential “building blocks” for everyone else. We’re seeing the major players aggressively push this strategy by open-sourcing powerful models, releasing specialized toolkits, and offering high-speed infrastructure.

They aren’t just being generous; they’re racing to become the foundational platform, the “AWS” of the AI world. The goal is to get the next generation of AI applications built on their turf, because once you’re built into their ecosystem, it’s very hard to leave.

Google and Hugging Face open-source the highly efficient Gemma 3n models

In a significant open-source maneuver, Google, in partnership with Hugging Face, has released its Gemma 3n family of models. These models are specifically engineered for on-device, multimodal applications, offering the performance of large parameter models within a radically smaller memory footprint. This efficiency breakthrough, achieved through a novel architecture, makes advanced AI accessible for local hardware deployment. Google is strategically decentralizing AI power, seeding a broad developer community to build real-time experiences far from the cloud.

Google simplifies access to massive datasets with new data commons library

Google has launched a new V2 Python client library for Data Commons, substantially improving programmatic access to its extensive open-source knowledge graph. A key strategic feature of the update is robust support for custom Data Commons instances, allowing organizations to integrate their private datasets with the public graph. By lowering the barrier to entry, Google is positioning Data Commons not just as a repository, but as the connective tissue for public and private data.
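
To give a feel for what that programmatic access means, the sketch below queries the public graph for a single statistic from Python. I have kept it deliberately rough: the class, method, and parameter names are my assumptions for illustration, so check the V2 library’s documentation for the real interface before copying.

```python
# pip install datacommons-client
# Rough sketch only: the client class, endpoint, and argument names below are
# assumptions for illustration; consult the V2 docs for the actual API.
from datacommons_client.client import DataCommonsClient

# Point the client at the public knowledge graph. A custom Data Commons
# instance (the private-plus-public setup described above) would be targeted
# the same way, via the client's instance/URL configuration.
client = DataCommonsClient(api_key="YOUR_API_KEY")

# Fetch the latest population count for one country.
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["country/USA"],
    date="latest",
)
print(response)
```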

SGLang and Hugging Face merge high-performance inference with development flexibility

The SGLang project has announced a critical backend integration with the Hugging Face transformers library, bridging the gap between cutting-edge model development and high-performance production deployment. The update allows the high-speed SGLang inference engine to run any model from the vast transformers library out-of-the-box. This move eradicates a significant workflow bottleneck, effectively creating an express lane from experimentation to production for AI developers.
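
In practice, the promise is that a single engine call now covers any transformers model, with SGLang handling the serving. Here is a rough sketch of what that looks like; the backend selector and output shape follow my reading of the integration, so treat the exact parameter names as assumptions rather than gospel.

```python
# pip install "sglang[all]" transformers
import sglang as sgl

# Spin up the SGLang engine on a Hugging Face transformers model, routing
# execution through the transformers backend. The impl argument is my
# assumption about how that backend is selected; check the SGLang docs.
llm = sgl.Engine(
    model_path="meta-llama/Llama-3.2-1B-Instruct",  # any transformers model ID
    impl="transformers",
)

prompts = ["The main benefit of sharing one inference backend is"]
outputs = llm.generate(prompts, {"temperature": 0.7, "max_new_tokens": 64})
print(outputs[0]["text"])
```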


3. The geopolitical imperative: AI as national and economic strategy

So, AI is no longer just on the CEO’s agenda; it’s officially a topic for presidents and prime ministers. It’s clear that nations now see AI as a fundamental part of their economic and geopolitical strength, right up there with trade policy and military power. Governments are actively creating their own national AI strategies, trying to figure out how to boost their country’s productivity while keeping their data secure and under their own control.

And, as always, this creates a huge new opportunity. A whole market is opening up for AI companies that can provide customized, nationally focused solutions. It’s a sign that the next big AI customer might not be a corporation, but an entire country.

OpenAI presents A$115 billion economic blueprint for Australia

OpenAI, in collaboration with Mandala Partners, has published a detailed report urging Australia to shift from cautious observation to active AI investment. The blueprint forecasts that a focused national strategy could inject A$115 billion annually into the Australian economy by 2030. OpenAI is not just selling technology; it is selling a national economic vision with itself positioned as a primary architect. By providing a 10-point action plan, the company is aiming to become a strategic partner in shaping sovereign policy.

Mistral AI launches “AI for citizens” to empower sovereign nations

Mistral AI has introduced its “AI for Citizens” initiative, a direct strategic challenge to the dominance of monolithic Big Tech offerings. The program is designed to equip governments with the tools to build their own sovereign AI capabilities, addressing critical concerns around vendor lock-in and data privacy. Mistral is positioning itself as the arsenal for technological independence. By providing technology for self-hosted deployments and co-training models on local data, it aims to be the key enabler for nations seeking control over their digital destiny.


4. Vertical ascent: specializing AI for high-value industries

The era of the generalist AI is winding down. The new trend is to send AI to grad school to become a specialist. Instead of one model that can do a bit of everything, companies are now building highly specialized versions designed to become experts in one specific, high-stakes field: think life sciences, robotics, or finance.

They’re doing this by fine-tuning these models on niche, industry-specific data and workflows. This unlocks a whole new level of performance, creating an AI that can solve the kind of complex problems a general model could never touch.

Google’s Gemini 2.5 gains spatial awareness for advanced robotics

Google has updated its Gemini 2.5 models with sophisticated capabilities tailored for robotics and embodied intelligence. The models now possess spatial and semantic scene understanding, allowing a robot to interpret complex visual commands and generate control code on the fly to execute physical tasks. Google is providing the foundational brain and nervous system for the next generation of intelligent machines. A new “Live API” enables real-time, voice-driven interaction, giving developers the core toolset to build truly interactive robots.

Google DeepMind unveils AlphaGenome to decode non-coding DNA

Google DeepMind has introduced AlphaGenome, a powerful AI model designed to predict the function of the 98% of the human genome that does not code for proteins. By analyzing vast DNA sequences at single-letter resolution, the model achieves state-of-the-art performance in predicting molecular properties. AlphaGenome represents a unified platform designed to accelerate the very pace of biological discovery. Available to non-commercial researchers, it aims to speed investigation into the genetic basis of disease.

Perplexity integrates real-time financial data to power market analysis

Perplexity has significantly upgraded its research platform by integrating real-time financial data, including stock prices and financial statements. This strategic move grants all users direct access to sophisticated financial information and analysis capabilities that were previously gated behind expensive, specialized tools. Perplexity is effectively democratizing the analyst’s toolkit, positioning itself as an indispensable knowledge engine for the high-value finance sector.


5. The vanishing point: AI’s radical redesign of the user interface

It looks like the user interface as we know it is starting to melt away. For the last few decades, we’ve been trained to click on static buttons and navigate menus that someone else designed. That entire way of interacting with computers is being rewritten by AI.

Now, instead of clicking through forms, we’re getting interfaces generated on the fly. Instead of searching for stock photos, we’re creating hyper-realistic images from a single line of text. And instead of menus, we’re starting to have conversations with lifelike digital avatars. It all points to a future where technology finally adapts to us, becoming more natural, personalized, and intuitive.

Google prototypes a “generative OS” that creates interfaces in real-time

Google has revealed a research prototype for a generative operating system that creates its user interface dynamically in response to user interaction. This groundbreaking system uses a low-latency Gemini model to generate each screen on the fly, moving beyond the constraints of a pre-built UI. This research signals a potential paradigm shift where software is no longer a static construct, but a fluid medium that adapts perfectly to a user’s immediate intent.

Meta’s new AI models generate realistic non-verbal cues for avatars

Meta’s AI research labs have announced a new family of models capable of generating realistic, two-person conversational behaviors for digital avatars. Underpinned by the release of a massive 4,000-hour dataset of human interactions, these models can create lifelike facial expressions, gestures, and turn-taking cues. Meta is systematically deconstructing and replicating human interaction to solve the “uncanny valley” problem that has long plagued virtual social experiences.

Google releases Imagen 4, its most powerful text-to-image model

Google has launched Imagen 4, its latest and most advanced text-to-image model, raising the industry standard for quality and prompt accuracy. The release strategically includes two tiers: the flagship Imagen 4 for high-quality generation and Imagen 4 Ultra for high-precision alignment with complex prompts. By providing these powerful new tools via API, Google is aiming to fuel the next wave of creative applications while setting the terms for responsible use.
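
For developers, the call shape follows the image-generation path already exposed in the google-genai Python SDK; here is a minimal sketch. The model identifier below is my assumption rather than a confirmed ID, so verify it in the API documentation before using it.

```python
# pip install google-genai pillow
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()  # picks up the API key from the environment

result = client.models.generate_images(
    model="imagen-4.0-generate-preview",  # assumed model ID; verify in the docs
    prompt="A watercolor skyline of Sydney at dawn, soft light, high detail",
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Decode and save the first generated image.
image = Image.open(BytesIO(result.generated_images[0].image.image_bytes))
image.save("skyline.png")
```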


6. The unifying core: AI as the central nervous system for data

The next big move in AI isn’t just about making the model smarter; it’s about what it can connect to. The goal is to turn AI into the central hub for your entire digital life.

The strategy we’re seeing now is to plug the AI directly into all the separate places you store your stuff: your Google Drive, your work OneDrive, your Dropbox. By getting a secure key to these services, the AI can suddenly search, reference, and synthesize information across all of your private files.

This transforms it from a standalone tool you visit into a deeply personalized “second brain” that understands the full context of your work and life. It’s the first real step toward a single, unified interface for all your information.

OpenAI connects ChatGPT Pro directly to users’ cloud storage

OpenAI has deployed “chat search connectors” for ChatGPT Pro subscribers, enabling direct, secure integration with major cloud storage platforms like Dropbox, Google Drive, and Microsoft OneDrive. This feature allows the AI to search and reference a user’s private files, making it a more personalized and contextually aware work tool. This is a direct strategic play to make ChatGPT the single, indispensable hub for a user’s entire universe of information.

Perplexity launches “Max” tier to monetize power users and gate early access

Perplexity has introduced Perplexity Max, a premium subscription tier designed to serve its most demanding users while creating a new revenue stream. The plan’s key strategic incentive is offering early access to new products, beginning with “Comet,” Perplexity’s own browser engineered to be a “powerful thought partner.” This is a classic SaaS maneuver to segment and monetize its user base, using exclusive product access as a powerful lever to deepen its competitive moat.


So, what’s the real takeaway from all these updates?

It’s pretty clear that AI has stopped being a simple “feature” we bolt onto our products. It’s now in a race to become the new operating system for how work actually gets done. This is happening on three fronts at once: AI is learning to do complex tasks for us, the big players are fighting over who owns the developer platforms, and the models themselves are becoming hyper-specialized for specific industries and even entire countries.

My final take is this: just “using AI” isn’t a strategy anymore. The real advantage will come from building your products around an AI-native core. We’re officially moving from the business of selling tools to selling autonomous outcomes.

This leaves us with one big question for the next roadmap meeting:

Are we still building products that just use AI, or are we building the essential AI agents that will define the future of our industry?
