May 2026: What Actually Changed in AI Tools
May is the month where spring conference season meets reality. Every AI company held a keynote, dev day, or "special event" between mid-April and late May. The demos were impressive. The shipping was not. Here's what actually survived the demo-to-delivery gap, what didn't, and what moved while everyone was watching the stage.
What Actually Shipped
OpenAI Codex landed as a real product. After months as a research preview and then a quiet rebrand, OpenAI's cloud-based coding agent shipped as a generally available product inside ChatGPT's interface. You describe a task, it spins up a sandboxed environment, writes the code, runs the tests, and hands you back a result. The execution is genuinely good for self-contained projects — scripts, utilities, data transformations. It falls apart on anything that needs deep context about an existing codebase, because it can't see your codebase. That's a fundamental architectural constraint, not a bug they'll fix next sprint. Codex is a good tool for a specific use case. The marketing implies it's a general-purpose coding agent. It is not.
Google shipped Gemini in Android Studio. Google's I/O keynote promised Gemini integration across every product. The one that actually shipped in May and actually works is Gemini in Android Studio [VERIFY]. Code completion, inline explanation, and — the genuinely useful part — automated migration assistance for API version bumps. If you're an Android developer, this immediately saves time on the drudge work of staying current with SDK changes. If you're not an Android developer, the interesting signal is that Google is competing on vertical integration (their model, their IDE, their platform) rather than trying to build a general-purpose coding tool. That's the smart play for a company that owns the platform.
Anthropic released MCP 1.1. The Model Context Protocol got its first minor version bump, and the changes are more interesting than a point release usually warrants [VERIFY]. Streaming support for long-running tools, better error propagation, and — the one that matters — a standardized authentication flow for MCP servers. The auth story was the biggest blocker for enterprise adoption. You can't ask a Fortune 500 company to pipe their data through a protocol that handles auth through "whatever the server implementer felt like." MCP 1.1 doesn't solve every enterprise concern, but it removes the excuse. Adoption should accelerate through summer.
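To make "standardized auth" concrete: MCP messages ride on JSON-RPC 2.0, and an authenticated tool call is ultimately just a small envelope with a credential attached. The sketch below is illustrative only — the field names (`auth`, `scheme`, `token`) and the tool name are my assumptions, not the MCP 1.1 spec, which handles credentials at the transport layer.

```python
import json

def build_mcp_request(method: str, params: dict, token: str) -> dict:
    """Build a JSON-RPC 2.0 envelope with a bearer credential attached.

    NOTE: the 'auth' field and its shape are illustrative assumptions,
    not the actual MCP 1.1 wire format -- the real spec standardizes
    auth at the transport layer, not inside the JSON-RPC body.
    """
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": method,
        "params": params,
        "auth": {"scheme": "bearer", "token": token},
    }

req = build_mcp_request(
    "tools/call",
    {"name": "search", "arguments": {"q": "q2 revenue"}},
    "sk-demo-token",
)
print(json.dumps(req, indent=2))
```

The point of standardizing this is exactly the enterprise objection above: once every server expects the same credential shape, an IT department can audit one flow instead of one per vendor.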
Runway shipped Gen-4 Turbo. Runway's Gen-4 Turbo dropped the generation time for 10-second clips from roughly two minutes to under thirty seconds. Quality is marginally worse than full Gen-4 — softer details, occasional temporal artifacts. But the speed change is not incremental. At two minutes per generation, you plan your shots. At thirty seconds, you iterate. The workflow shift from "careful prompting" to "fast iteration" changes who uses the tool and what they use it for. Video generation just became a sketch tool instead of a rendering pipeline.
What Didn't Survive the Demo-to-Delivery Gap
Google's "Project Astra" real-time assistant. Google showed a stunning live demo of a multimodal assistant that could see your screen, understand spatial context, and respond conversationally in real time. May ended without a shipping date, an API, or a beta signup. The demo was real — there's no reason to doubt the underlying capability exists in a lab. But "exists in a lab" and "ships to users" are separated by a gap that has eaten more Google products than any competitor has.
Microsoft's Copilot Vision. Announced at Build with a live demo showing Copilot understanding and interacting with any webpage you're viewing. The waitlist opened. The waitlist stayed. People who got access reported it worked on some sites, crashed on others, and couldn't handle anything behind authentication. The demo showed it reading a bank statement. The reality is it struggles with a complex GitHub PR. This is the classic demo-to-delivery pattern: show the best case, ship the average case, hope nobody compares the two.
Stability AI's "Stable Virtual Studio." Stability demoed a complete creative suite — image, video, 3D, audio — unified under one interface. What shipped was a landing page with a waitlist and an announcement that the video component would launch "later this year" [VERIFY]. Stability has an increasingly severe credibility gap between what it announces and what it delivers. At some point, the market stops watching the demos.
What Went Dark
Inflection AI's Pi. Remember Pi, the empathetic AI chatbot that was going to revolutionize personal AI? After Microsoft hired most of Inflection's team in March 2024 [VERIFY], Pi continued operating but updates slowed to a trickle. In May 2026, the app hasn't had a meaningful update in months. It still works. It still responds. It's just not going anywhere. Pi is the textbook case of a product that lost its team and entered hospice care without anyone making the announcement.
Hugging Face's Assistants API. Hugging Face's experiment with an Assistants-style API — meant to compete with OpenAI's offering — quietly disappeared from the documentation [VERIFY]. The endpoints still respond but the page linking to them doesn't exist in the nav anymore. This is the standard graceful death for API products: remove the marketing, leave the endpoints up, let usage decay naturally. Nobody complains because nobody files a ticket for a product they found in the docs three weeks ago.
What Got Leapfrogged
Suno by Udio (again, and for real this time). The AI music generation race has been Suno vs. Udio for a year, with Suno generally ahead on polish and Udio ahead on raw quality. In May, Udio shipped a full song arrangement feature — not just "generate a clip" but "arrange an intro, verse, chorus, bridge, outro with coherent structure" [VERIFY]. The output isn't professional-grade, but it's the first AI music tool that understands song structure rather than just generating two minutes of plausible audio. Suno is reportedly working on something similar. But "reportedly working on" is exactly the gap this column exists to measure.
Notion AI by Coda AI. Notion's AI features have been "summarize this page" and "help me write" for over a year. Coda shipped an AI layer in May that does something Notion doesn't: it operates on structured data, not just text [VERIFY]. You can ask Coda's AI to "find all tasks assigned to engineers that are overdue and draft a status update for each one" and it actually queries the underlying database, filters, and generates contextual output. Notion's AI writes prose about your data. Coda's AI queries your data and then writes. The difference sounds subtle. It's not.
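The difference is easy to show with a toy pipeline. The sketch below separates the two steps — query the structured rows first, then generate prose from what the query returned. The field names and data are mine, not Coda's schema; this is a minimal illustration of the pattern, not their implementation.

```python
from datetime import date

# Toy task table -- field names are illustrative, not Coda's actual schema.
tasks = [
    {"title": "Fix auth bug", "assignee": "dana", "role": "engineer", "due": date(2026, 5, 1)},
    {"title": "Write launch post", "assignee": "sam", "role": "marketing", "due": date(2026, 5, 3)},
    {"title": "Migrate CI runners", "assignee": "lee", "role": "engineer", "due": date(2026, 5, 28)},
]

def overdue_engineer_tasks(rows, today):
    """Query step: filter structured data *before* any text generation."""
    return [t for t in rows if t["role"] == "engineer" and t["due"] < today]

def draft_status(task):
    """Write step: generate prose from the filtered row, not about the whole table."""
    return (f"Status update: '{task['title']}' ({task['assignee']}) "
            f"is overdue since {task['due']:%b %d}.")

today = date(2026, 5, 20)
for t in overdue_engineer_tasks(tasks, today):
    print(draft_status(t))
# → Status update: 'Fix auth bug' (dana) is overdue since May 01.
```

A "prose about your data" assistant skips the first function entirely and summarizes whatever text it can see; a "query then write" assistant gets the filtering right by construction.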
What AI Was Confidently Lying About
The post-conference season produces a special flavor of AI misinformation: models trained on demo coverage that describe announced features as if they're shipping products. Ask Claude or GPT about Google's Project Astra and you'll get a detailed description of its capabilities, presented in present tense, as though you can go use it right now. You cannot. The model isn't lying in the way humans lie — it doesn't know the difference between "Google announced this" and "Google shipped this." But the effect is the same. Users reading the output assume the feature exists.
This is the specific risk of using AI to research AI. The models are trained on the same hype cycle they're being asked to evaluate. They can't discount conference demos the way an experienced user can because their training data treats demos and launches with equal weight. Until models get better at temporal reasoning — knowing that a May 2026 announcement isn't the same as a May 2026 product — treat any AI-generated claim about "current" tool capabilities with the same skepticism you'd apply to the press release.
Sleeper Pick of the Month
Zed's AI integration. Zed, the Rust-based code editor that's been quietly building a user base on raw speed, shipped a significant AI upgrade in May [VERIFY]. The integration lets you pipe any model (Claude, GPT, local models via Ollama) through the editor with codebase-aware context — similar to what Cursor does, but in an editor that opens instantly and doesn't lag on large files. Zed isn't trying to be an "AI editor." It's trying to be the fastest editor, and it added AI as a feature rather than an identity. For developers who left VS Code for speed and left Cursor for bloat, Zed is suddenly the option that doesn't make you choose.
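For what "pipe a local model through" means at the protocol level: Ollama exposes a local HTTP API, and a request to it is just a small JSON body sent to `POST http://localhost:11434/api/generate`. The helper below only constructs that body (no running server required); the model name is a placeholder for whatever you've pulled locally.

```python
import json

def ollama_generate_payload(model: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for Ollama's local /api/generate endpoint
    (POST http://localhost:11434/api/generate). An editor integration
    would send this over HTTP; here we only construct the payload."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

# "codellama" is a placeholder -- substitute any model pulled via `ollama pull`.
body = ollama_generate_payload(
    "codellama",
    "Explain this function:\ndef double(x): return x * 2",
)
print(body)
```

That the whole interface is this small is why "pipe any model through the editor" is a feature Zed could add without becoming an AI company.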
Month-Over-Month Trend Check
May was noisier than April but less productive. The conference season inflated the announcement count while the actual shipping count stayed roughly flat. The pattern: companies that were already shipping (Anthropic, Cursor, Runway) continued to ship. Companies that were announcing (Google, Microsoft, Stability) continued to announce. The gap between these two groups widened in May.
The consolidation signal from April intensified. More features moving into base platforms, more standalone wrappers losing their reason to exist. The AI tool market is not growing anymore — it's compressing. The number of viable, independent AI tools will be smaller in December than it is now. That's not a doomer take. That's what markets do when the underlying technology gets absorbed into platforms.
The space is not accelerating. It's maturing. Those look similar from the outside and feel very different from the inside.
This is part of CustomClanker's Monthly Drops — what actually changed in AI tools this month.