Andrew Mudd

Google's AI Swallowed 38% of Organic Clicks. Here's Where They Went.

Andrew Mudd — Wed, 27 May 2026 15:10:59 GMT

You probably saw the headline yesterday: DuckDuckGo app installs jumped 30% in a week, peaking at nearly 70% growth on iOS in a single day.

The user backlash to Google's I/O 2026 overhaul has been fast and loud. DuckDuckGo CEO Gabriel Weinberg put it plainly: "Google is force-feeding AI with no way to opt out. Their results are getting worse, not better."

That quote is interesting. But the number that actually matters to operators is 38.

A randomized field study published by Search Engine Journal found that Google's AI Overviews reduce outbound organic clicks by 38% on the queries where they appear. Researchers randomly assigned users to see or hide AI Overviews during real searches and measured actual clicks. Remove the AI Overview and outbound clicks rise by roughly 60% (a 38% cut leaves 62% of the baseline, and 1 / 0.62 is about 1.61).

Zero-click searches now account for 58.5% of all US Google queries according to SparkToro and Datos. More than half of all searches end without a single outbound click to any website. And the Google I/O 2026 announcements added AI information agents that run 24/7 in the background on users' behalf, before the user even types a query.

This is the context your content strategy was not built for.

The model that used to work

For the last three years I've watched operators treat Google like a meritocracy. Build good content, earn a ranking, get traffic, convert customers. That model worked. A lot of businesses I see built real customer acquisition engines around it, and they were right to.

What I keep seeing now is a version of that model running quietly on fumes. Search Console shows impressions holding flat or even rising. Clicks and CTR are falling. Teams are chasing ranking positions that deliver a fraction of what they used to, because the traffic they were winning is being absorbed by an AI layer sitting on top of the results they worked to earn.

Search volume is enormous and growing. Google AI Mode crossed one billion monthly users in its first year, with queries doubling every quarter. The issue is that the rules governing what earns traffic inside all that volume just changed significantly.

What Google I/O 2026 actually changed

On May 19, Google redesigned the search box for the first time in 25 years. The traditional list of ten blue links is being replaced, query by query, with AI agents that synthesize answers, execute tasks, and now monitor topics in the background continuously on users' behalf. Personal Intelligence, which connects AI Mode to Gmail, Google Photos, and soon Google Calendar, is now live in 200 countries and 98 languages. Two people searching the same phrase from the same city can receive meaningfully different results.

Here is what that means for traffic mechanics, specifically:

AI Overviews appear above the traditional blue links, which still show below them. If you rank well, you still appear, just further down the page. But position-one organic CTR has dropped from 27% to as low as 11% on queries where AI features appear, according to SISTRIX's March 2026 data. A position-one ranking that used to drive a meaningful click share now drives less than half of what it did.

AI Mode replaces the traditional results page entirely. You are cited in the response, or you receive no visibility at all. There are no blue links below the fold as a fallback.

AI Overviews now appear on 48% of all Google queries as of March 2026, up from 34.5% in December 2025. Gartner projects that 25% of organic search traffic will shift to AI chatbots and voice assistants before the end of the year.
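To put those percentages into click terms, here is a rough back-of-envelope sketch. It assumes a hypothetical page holding position one across 10,000 monthly impressions, and it assumes the 48% coverage figure and the 27% and 11% CTR figures can be combined even though they come from different sources. Treat the output as an illustration, not a forecast.

```python
def blended_ctr(aio_share: float, ctr_with_aio: float, ctr_without_aio: float) -> float:
    """Weighted average CTR across queries with and without an AI Overview."""
    if not 0.0 <= aio_share <= 1.0:
        raise ValueError("aio_share must be between 0 and 1")
    return aio_share * ctr_with_aio + (1.0 - aio_share) * ctr_without_aio


def estimated_clicks(impressions: int, ctr: float) -> float:
    """Clicks expected from a given number of impressions at a given CTR."""
    return impressions * ctr


# Hypothetical volume; the rates are the figures quoted above.
IMPRESSIONS = 10_000
before = estimated_clicks(IMPRESSIONS, 0.27)
after = estimated_clicks(IMPRESSIONS, blended_ctr(0.48, 0.11, 0.27))

print(f"before: {before:.0f} clicks, after: {after:.0f} clicks")
```

Under those assumptions the same ranking goes from about 2,700 clicks to about 1,930. The 11% figure is the low end of the reported range, so this is closer to a worst case than an average.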

The part most operators are missing

What gets me about this data is not the traffic losses. It is what happens on the other side.

Brands cited inside AI Overviews earn 35% more organic clicks and 91% more paid clicks compared to non-cited competitors on the same queries, according to research published by Digital Applied in March 2026. Ryan Law, Director of Content Marketing at Ahrefs, observed the same pattern on LinkedIn, where his post on AI Overviews generated over 8,200 reactions from marketers who recognized exactly what they were seeing in their own Search Console data.

Citation is the new position one for queries where AI Overviews trigger. The operators who understand this are not trying to recover what they had. They are chasing a different outcome, and it is more commercially valuable than the ranking they used to optimize for.

The gap that makes this complicated: in mid-2025, roughly 76% of AI Overview citations came from pages ranked in the top ten. By early 2026, that overlap had dropped to somewhere between 17% and 54%, depending on the study, according to research aggregated from Profound, SE Ranking, and Ahrefs. Google's AI pulls from a much wider pool than the visible top ten. Reddit threads, niche authority sites, and structured data sources that may never appear in a standard top-ten result are being cited regularly.

Ranking optimization and citation optimization are now separate disciplines. Both need to be tracked. Most operators are only tracking one of them.

How to check where you stand right now

Open Google Search Console. Filter your top 20 queries by impressions. Sort for queries where impressions are rising or flat but clicks and CTR are falling. That gap is where AI Overviews are absorbing your traffic right now.
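If you would rather run that check on exported data than eyeball it, the filter can be sketched in a few lines. This assumes two CSV exports, one per period, with columns named query, clicks, and impressions. Those column names and the sample rows are assumptions for illustration, so rename them to match your own export.

```python
import csv
import io


def load_queries(csv_text: str) -> dict[str, dict[str, float]]:
    """Parse an export with columns: query, clicks, impressions."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows[row["query"]] = {
            "clicks": float(row["clicks"]),
            "impressions": float(row["impressions"]),
        }
    return rows


def ctr(stats: dict[str, float]) -> float:
    """Click-through rate, guarding against zero impressions."""
    return stats["clicks"] / stats["impressions"] if stats["impressions"] else 0.0


def exposure_list(previous, current, top_n=20):
    """Top queries by impressions whose impressions held or rose while CTR fell."""
    top = sorted(current, key=lambda q: current[q]["impressions"], reverse=True)[:top_n]
    flagged = []
    for query in top:
        if query not in previous:
            continue  # no earlier period to compare against
        impressions_held = current[query]["impressions"] >= previous[query]["impressions"]
        ctr_fell = ctr(current[query]) < ctr(previous[query])
        if impressions_held and ctr_fell:
            flagged.append((query, ctr(previous[query]), ctr(current[query])))
    # Largest CTR drop first.
    flagged.sort(key=lambda item: item[1] - item[2], reverse=True)
    return flagged


# Hypothetical sample data standing in for two exports.
previous_csv = """query,clicks,impressions
how to price consulting,120,1000
best crm for agencies,80,900
local seo checklist,50,500
"""
current_csv = """query,clicks,impressions
how to price consulting,70,1200
best crm for agencies,85,850
local seo checklist,45,500
"""

flagged = exposure_list(load_queries(previous_csv), load_queries(current_csv))
for query, old_ctr, new_ctr in flagged:
    print(f"{query}: CTR {old_ctr:.1%} -> {new_ctr:.1%}")
```

A falling CTR on steady impressions is consistent with an AI Overview absorbing clicks, but it does not prove one is showing. Confirm by running the flagged queries yourself.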

For each of those pages, check one thing: does the page open with a direct, clear answer to what someone searching that query actually wants to know, within the first 100 words? If not, that page is not structured to earn a citation.

Tanner Medina, Co-Founder and Chief Growth Officer at Launchcodex, described what this looks like in practice: "The teams adjusting fastest right now are not panicking about traffic declines. They are pulling the impression data, finding which queries trigger AI Overviews, and restructuring those pages to lead with a direct answer. That is where the citation opportunity sits."

Three concrete things to do with this

1. Run the Search Console audit this week. Pull your top 20 queries by impressions, then filter for rising or flat impressions with falling CTR. That is your exposure list. If you are not yet seeing the gap, you will see it grow over the next two quarters as AI Mode expands to more query types.

2. Restructure your highest-exposure pages to lead with the answer. The AI system reads your page and decides whether to cite you. Pages that bury key information inside a long narrative introduction are less likely to earn a citation. Write the conclusion first. Add structured data where it applies: FAQ schema, HowTo markup, Organization and Product markup. These are citation signals as much as they are ranking signals, and they serve both purposes now.

3. Add AI citation tracking alongside your rank tracking. Semrush, Ahrefs, and BrightEdge all include AI Overview appearance data now. You want to know which queries your brand is being named in, separately from which queries your pages rank for. Those are different data sets, and both matter for understanding your real search presence.
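For the structured data mentioned in step 2, here is a minimal sketch of what FAQ markup looks like when generated from question-and-answer pairs. The FAQPage, Question, and Answer types are standard schema.org vocabulary. The sample question and answer are placeholders, and the helper names are made up for this example.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script element that goes in the page HTML."""
    # "</" inside the JSON could close the script element early, so escape it.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder content; use the real question and the answer your page leads with.
markup = faq_jsonld([
    ("What is an AI Overview?", "A generated summary shown above the organic results."),
])
print(as_script_tag(markup))
```

Keep the answer text in the markup identical to the answer visible on the page, and validate the output with a structured data testing tool before you ship it.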

Where this gets complicated

Traffic will not recover to pre-AI baselines on queries where AI Overviews dominate. That is a known outcome to plan for, not a temporary penalty to recover from. The Gartner 25% organic traffic shift is a floor estimate.

The DuckDuckGo backlash is real, but DuckDuckGo holds about 2% of the US search market. A 30% install surge off a 2% base is notable. It is not a counterforce to what Google has built across the 90-plus percent share it holds.

Personal Intelligence adds another layer of complexity: the same query no longer produces the same results for everyone in your audience. Ranking reports based on a single tracked position become less meaningful as personalization scales.

Citation optimization and ranking optimization also require different investments. Citation comes from content that only your team can produce: original research, named case studies, practitioner-led analysis, things the AI cannot synthesize from generic sources elsewhere. Generic content optimized primarily for keyword density is rapidly becoming invisible across both surfaces.

What I tell operators who ask about this

The businesses doing well in this environment have two things working for them. First, topical authority that is earned, content reflecting genuine expertise that an AI cannot reconstruct from an average across a hundred other sites. Second, accurate, machine-readable pages: current information, clean page structure, no JavaScript-locked key content that an AI agent cannot access.
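The machine-readable point can be roughly tested by looking at a page the way a reader that does not run JavaScript would see it. The sketch below assumes you already have the raw HTML as a string (fetching is left out), and checks whether a key phrase appears in the first 100 words of text. The two sample pages and the phrase are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect words from raw HTML, ignoring script, style, template and title contents."""

    SKIP = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def opening_words(raw_html, limit=100):
    """First `limit` words of text present in the HTML before any JavaScript runs."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return parser.words[:limit]


def answer_in_opening(raw_html, phrase, limit=100):
    """True if the phrase appears within the opening words, ignoring case and spacing."""
    opening = " ".join(opening_words(raw_html, limit)).lower()
    return " ".join(phrase.lower().split()) in opening


# Hypothetical pages: one server-rendered, one that builds its content in JavaScript.
static_page = (
    "<html><body><h1>What is invoice chasing?</h1>"
    "<p>Invoice chasing is the process of following up on unpaid invoices.</p>"
    "</body></html>"
)
js_page = (
    '<html><body><div id="root"></div>'
    '<script>render("Invoice chasing is the process of following up")</script>'
    "</body></html>"
)

print(answer_in_opening(static_page, "invoice chasing is the process"))
print(answer_in_opening(js_page, "invoice chasing is the process"))
```

This is a coarse check. Some crawlers do render JavaScript, so a failure here means the content is at risk of being missed, not that it is guaranteed invisible.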

This is exactly the kind of shift where mapping your specific situation matters more than a general framework. If you want to understand where your search presence actually stands against these changes, that is the kind of work we do in an AI Clarity Call: muddventures.com/book.

Andrew

Google's Persistent AI Agent Just Dropped to $100. Here's What Operators Need to Know.

Andrew Mudd — Tue, 26 May 2026 15:21:39 GMT

This week’s Google I/O 2026 had roughly 100 announcements. I want to save you from reading all of them.

The three that actually matter for operators are: Gemini Spark (a persistent, cloud-based AI agent that works in the background of your digital life), Docs Live (voice-to-structured-document inside Google Docs and Gmail), and Google Pics (AI image generation and editing built directly into Drive, Docs, and Slides). We will cover all three. But Spark is the one worth spending real time on, because the architecture is fundamentally different from anything that has shipped before it.

What Gemini Spark Actually Is

A standard Gemini session ends when you close the tab. Spark runs on dedicated Google Cloud virtual machines around the clock, continuing to work when your laptop is shut and your phone is locked. CEO Sundar Pichai described it at the I/O keynote as “your personal AI agent that helps you navigate your digital life, taking action on your behalf and under your direction.”

The underlying infrastructure is Google’s Antigravity 2.0 agent harness, running on Gemini 3.5 Flash. Spark operates closer to a background service than a chat window: it holds standing access to your data sources and executes work on schedules you define, without you needing to be present for each step.

That last part is the meaningful change. Every AI assistant since 2023 has required a human to prompt it. Spark is designed to prompt itself, based on conditions and schedules you set once.

Three Interaction Modes That Matter for Daily Operations

Google built three ways to work with Spark, and understanding the distinction between them matters more than any headline.

Tasks are multi-step assignments you delegate once. “Find and track interior design internships in New Orleans for this summer.” Spark breaks that down, executes the steps, and surfaces results over time without you re-prompting it.

Schedules are recurring automations you fire and forget. Tell Spark to scan your inbox every Monday morning, generate a prioritized list from it, and block focus time on your calendar. For operators who currently do that sequence manually on Monday at 8 AM, this is a straightforward trade.

Skills are the interaction mode I keep coming back to. A Skill is a reusable instruction set you build once that Spark invokes automatically going forward. The example Google published: ask Spark to read your last 50 outgoing emails, derive a personal writing style guide from them, and apply that guide automatically whenever Spark drafts something new for you.

The operators I work with who are pulling ahead in 2026 have one thing in common: they treat AI style calibration as infrastructure, not a one-time exercise. For years, getting an AI to write one good client email did not carry over: the context was lost and the next draft came out generic. The Skill model is the architectural fix for that problem. The style guide lives in Spark and activates on every future draft, without re-prompting.

The Data Moat Argument

Karan Girotra, a professor of operations, technology and innovation at Cornell University, told CBS News: “It knows more about you than many others because it connects to Gmail and other apps, so personal intelligence will come through in the agent.”

The implication for operators is specific. Claude Cowork is a desktop agent. ChatGPT Agent operates through a browser. Microsoft Copilot is grounded in Office 365 data. Spark is the only persistent AI agent that reads natively from Gmail, Calendar, Drive, Docs, Sheets, and Slides, without simulating user actions. It accesses that data at the source.

If your business runs primarily on Google Workspace, that is a real, non-replicable advantage. If it does not, the data moat argument does not apply to you, and the decision to try Spark looks different.

Beyond Google’s own apps, Spark launches with Model Context Protocol connections to Canva, OpenTable, and Instacart. Adobe, GitHub, Notion, and Slack are confirmed for summer 2026. Because MCP is the open standard Anthropic introduced in November 2024 and that every major AI platform has now adopted, any developer can make their product Spark-compatible without writing Google-specific integration code. The integration surface will grow faster than proprietary frameworks would allow.

Two Other Things from Google I/O That Are Worth Your Attention

Docs Live is a voice-to-document feature rolling out this summer to Google AI Pro and Ultra subscribers, with preview access for Google Workspace business customers. You speak a rough brain dump, Docs Live turns it into structured prose. For operators who use Loom videos, voice memos, or verbal briefs to capture ideas before writing anything down, this closes a step in the workflow: the transcription-to-structure gap.

Google Pics is an AI image generation and editing tool built directly into Drive, Docs, and Slides. It handles object segmentation, background replacement, text editing, and new asset creation from scratch. It is launching first to a limited group, with rollout to Pro and Ultra subscribers this summer. For operators who currently pay a separate tool or a designer to produce basic marketing images, presentation graphics, or social visuals, Google Pics will be worth watching when it hits general availability. The fact that it lives inside Drive means no export, no import, no context switching.

The Honest Tradeoffs

Spark is in beta, and Google’s own product page says so directly: “check responses, supervise closely, interrupt when needed.” That language from the company itself is a useful signal about where the product is in its maturity curve.

Before the I/O keynote, a pre-release version of the Gemini app surfaced an onboarding screen disclosing that Spark “may do things like share your info or make purchases without asking.” Google softened that language before the final launch, but as of the keynote date, no Spark-specific privacy policy existed. The EU AI Act’s Article 50 transparency requirements take effect August 2, 2026, meaning that gap is on a regulatory clock.

There is also a pending class-action lawsuit, Thele v. Google LLC, filed in federal court in November 2025, alleging that Google secretly enabled Gemini across Gmail, Chat, and Meet accounts without user consent in October 2025. The case has not been resolved.

Spark is US-only at launch. EU and UK access is pending regulatory review, with a Q3 2026 timeline projected by analysts.

For operators in healthcare, legal, or financial services who handle privileged client data, the absence of a Spark-specific privacy policy is a stop for now.

Clarence Lee, a tech entrepreneur and visiting lecturer at Cornell’s SC Johnson College of Business, gave CBS News the framing I would give any operator evaluating this today: “The first time you onboard an assistant, you don’t know how good they are, so you try them out a little bit before you hand over your credit card. You might have them draft emails or create a grocery list, so I recommend that users start that way.”

Start with tasks where the worst-case outcome is a bad draft, not a sent message or a charged card.

The $100 Question

Google AI Ultra dropped from $250 to $100 per month at I/O 2026. The plan includes Spark in beta, five times the usage limits of the $20 AI Pro tier, 20 terabytes of Google One storage, and YouTube Premium. A $200 tier exists for higher usage limits.

At $100, Google AI Ultra sits in the same bracket as Claude Max and ChatGPT Pro. It is the only plan in that range that includes a persistent cloud agent with native access to your Google data.

The math is straightforward for Google Workspace users: if you have regular workflows around inbox triage, document creation, and client communication, the Skills and Schedules features alone make $100 worth testing for one month. If you are not in the Google ecosystem, the data advantage does not apply and the right move is to wait for the integration surface to expand.

For the last three years I have worked with operators who use AI well and operators who still run their business with the same stack they had in 2023. The dividing line is almost always whether they have built AI into the workflow at the infrastructure level, or whether they are still prompting one question at a time. Spark, if it delivers on the architecture Google described, closes that gap at the platform level for anyone already in Google Workspace.

That is not hype. It is a logical consequence of having a persistent agent with standing access to the data sources your business already runs on. The beta status and the privacy gap are real, which is why the move is to test it deliberately, not trust it blindly.

What to Do With This

First, if you are a Google Workspace user and you want to try this, activate a Google AI Ultra trial and build your first Skill before you do anything else. Ask Spark to read your last 50 outgoing emails and generate a personal writing style guide. Apply that style guide to one draft. The output will tell you within a single use whether this is ready for your workflow.

Second, set one Schedule on day one. Start with something low-stakes: a Monday morning inbox summary that flags anything with a deadline or action item. Watch how it handles your actual inbox for two weeks before you expand access to anything external.

Third, hold off on connecting Spark to anything with financial transactions, sensitive client data, or external-facing communications until the beta period matures and Google publishes a Spark-specific privacy policy. The architecture is correct. The timing is early. Those are two different things and both are true at the same time.

When you are ready to map out how tools like Spark fit into a broader AI system for your business, muddventures.com/book is where that conversation starts.

Andrew

Anthropic Embedded 15 AI Workflows Into the Software You Already Pay For

Andrew Mudd — Fri, 22 May 2026 16:21:29 GMT

On May 13, Anthropic launched Claude for Small Business: 15 ready-to-run agentic workflows and 8 connectors built directly into QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365, and Slack. The package adds $0 to your AI bill beyond an existing Claude subscription.

Those are accurate numbers. What they don’t capture is why this product is different from every other AI tool most small business owners have been quietly ignoring for two years.

For the last three years, the most common question I’ve heard from small business owners trying to get AI into their operations is some version of: where do I even start? That question reflects a real configuration problem. Every use case seemed custom. Every integration required a developer or a consultant. And the blank-page problem was real enough to stop most operators before they started.

Claude for Small Business is the first product from a major AI lab that answers that question without making you figure it out yourself. It doesn’t give you a smarter chat window. It gives you 15 specific jobs with defined inputs, approval gates, and outputs. You pick the job. Claude does the work. You approve before anything sends, posts, or pays.

That is a meaningfully different product from what’s existed before. And it matters a lot for how operators should be thinking about AI adoption right now.

What actually shipped

The package runs through Claude Cowork, Anthropic’s agentic desktop workspace, and connects to eight named partner tools at launch: Intuit QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365, and Slack. It also ships with 15 reusable skills (repeatable task components that Claude can call without rebuilding the logic from scratch), in addition to the 15 full workflows.

The workflows break into three categories. Finance and operations includes invoice chasing, month-end close, cash-flow view, payroll planning, margin analyzer, and tax-season organizer, all drawing from QuickBooks and PayPal. Sales and marketing covers lead triage, campaign analysis, Canva asset generation, content strategist, and customer pulse, all routed through HubSpot and Canva. Contracts and admin handles contract reviewer, DocuSign follow-through, business pulse, and weekly commitments, pulling from DocuSign, Google Workspace, and Microsoft 365.

The ones worth understanding first:

Invoice chasing. Claude reads open invoices in QuickBooks, cross-references payment status against PayPal settlements, drafts personalized follow-up messages, and queues them for your approval before anything goes out. For an owner managing 20+ open invoices, the realistic time return is 2-4 hours per week.

Month-end close. Claude pulls categorized expenses from QuickBooks, flags anything unusual, and drafts the summary your accountant is going to ask for anyway. Lina Ochman, who leads SMB product at Anthropic, described this workflow category as addressing the most time-intensive recurring burden for small business owners. Realistic time savings are 3-6 hours per close cycle.

Margin analyzer. Claude reads your revenue and cost data and surfaces margin by product or service line. Most small business owners make pricing and product-mix decisions on intuition rather than data. This workflow changes the information available for those calls without requiring a finance hire or a new reporting tool.

Lead triage. Claude reads inbound HubSpot data, scores leads against your actual historical close patterns, and drafts initial outreach for your review. Useful for businesses receiving 15+ leads per week where manual sorting falls behind.

A detail worth flagging about the Canva partnership: Canva exceeded $500M in B2B revenue and built this integration with Anthropic specifically for the SMB bundle. The Canva asset generation workflow takes a campaign brief and produces fully editable, on-brand creative assets. For a solo operator with no design budget and no creative team, that workflow may deliver more per dollar than anything else in the package.

What it actually costs

The “$0 incremental” framing is accurate and worth unpacking.

Claude for Small Business adds nothing to your AI bill beyond your existing Claude subscription. Claude Pro at $20/month (or $17/month on annual billing) is the practical floor and covers most SMB workloads. Claude Max at $100/month only matters if you’re running intensive Claude Code work or multiple background agents concurrently. For pure workflow automation through the SMB package, Pro is sufficient.

The real stack cost, per a May 2026 review by Danar Mustafa at The Tech Society, runs $1,500-5,940 per year depending on team size and which connector tools you already pay for. For a solo owner already running QuickBooks ($30/month), HubSpot Starter ($45/month), Canva Pro ($15/month), and DocuSign Standard ($15/month), that’s $105/month in existing software spend that Claude can now work inside, or roughly $125/month ($1,500 per year) once Claude Pro is included. No new line items.

This is the business model detail that matters for operators: Anthropic is not trying to add to your software bill. The bet is that Claude becomes the intelligence layer inside the tools you already pay for. That’s structurally different from any AI product asking you to cancel something else and replace it.

Where the gaps are

This product fits a specific profile of small business and is a pass for others. Worth naming clearly.

Shopify is absent at launch, which excludes most independent retailers and e-commerce operators. Amazon Seller Central is missing. NetSuite, Xero, and QuickBooks Desktop users are second-class citizens. Zoho and Pipedrive users are out entirely. There are no industry-specific connectors at launch: no Toast for restaurants, no ServiceTitan for trades, no practice management software for legal or accounting professionals.

Availability is U.S.-focused. The 10-city workshop tour covers Chicago, Tulsa, Dallas, Baton Rouge, Salt Lake City, Baltimore, San Jose, Indianapolis, Birmingham, and a New Jersey location. International connector availability and workflow compatibility are uneven by region.

Karo Zieminski, an AI product manager and writer who published a detailed implementation breakdown of the launch, flagged the most important caution for operators considering adoption: “If your data is messy, don’t connect QuickBooks yet. Messy data doesn’t become clean data when you add AI access. It becomes messy data with an audience.”

That’s worth keeping front of mind. The margin analyzer is only useful if the cost and revenue data it reads is accurate. The invoice chaser is only trustworthy if your QuickBooks records are current. The workflows amplify whatever is already in the source system, including the errors.

The operator-level story

The pattern I keep watching in this market is the compression of agency-level work into software-layer products. A year ago, the jobs these 15 workflows cover, specifically month-end close prep, invoice management, campaign asset creation, lead scoring, and contract first-pass review, required either a configured custom AI pipeline that took weeks to build, a specialized SaaS tool for each function, or a service provider billing hourly for work that felt expensive relative to the outcome.

What gets me about this particular launch is how specific it is. Anthropic didn’t ship a vague “AI for small business” positioning statement. They shipped 15 defined jobs with named data sources and approval gates. That specificity is what separates a product operators will actually use from one that sits open in a browser tab.

Brian Ludviksen, COO of Purity Coffee, was one of the early partners Anthropic featured, and he described the outcome clearly: “Not only could it problem-solve for me, it also showed me problems I didn’t know I had.”

That second clause is the part worth sitting with. The margin analyzer doesn’t just surface information faster. It surfaces information that wasn’t being surfaced at all. Most small businesses aren’t analyzing margin by product line monthly. They’re running on feel. That workflow changes the decision information available without adding a new hire or a new piece of software.

The operators I see pulling ahead in 2026 are consistently the ones who have connected AI to their actual business data. Using it as a faster way to draft emails is the baseline. What separates the operators gaining ground is getting AI working inside the systems where the business actually lives. Claude for Small Business is the most accessible on-ramp to that habit I’ve seen for owners who haven’t built it yet.

Mike Beckham, CEO of Simple Modern, described what operational adoption actually looks and feels like: “What we used to think were the constraints are just not constraints anymore. Hours of looking at stuff that doesn’t matter are gone. I want an entire organization where everybody is using these tools daily.”

That’s not a marketing quote. That’s what happens when AI stops being a chat window and starts being a data layer.

Anthropic is also running a free on-demand course, AI Fluency for Small Business, co-developed with PayPal. It walks through the basics of safe AI delegation for owners new to this. Worth an hour if you’re just getting started.

What to do with this

If you’re already running QuickBooks, HubSpot, PayPal, Canva, or DocuSign, here’s the sequencing that makes sense.

The first move is getting Claude Cowork installed if you don’t have it. The SMB workflows run through the desktop app, not the web interface. Claude Pro at $20/month covers most operators here. You don’t need Max to start.

The right first workflow is a read-only one: Business Pulse or Cash-Flow View. Both pull live data from connected sources without any write risk, and both give you calibration on how Claude interprets your actual business data before you turn on anything that drafts or sends. That calibration period matters.

Invoice chasing is the highest-return early workflow for most operators. The time savings become obvious within the first week, the approval flow is simple, and the worst-case error is a mildly awkward follow-up email you catch before it goes out.

If you want to work through which of the 15 workflows actually fits your business model, or figure out how to integrate this into what you’re already running, that’s what the AI Clarity Call is for: muddventures.com/book.

Andrew

Notion Just Made Zapier Optional. Here's the Build.

Andrew Mudd — Thu, 21 May 2026 15:33:18 GMT

One pattern I keep seeing with operators who are actually pulling ahead in 2026: they have fewer software subscriptions, not more.

Over the past three years, I’ve watched categories of tools get compressed into platforms operators already pay for. Design tools got absorbed into Canva and then into every AI image generator on the market. Copywriting tools got pulled into any halfway decent chat window. Transcription tools showed up as built-in features across video platforms, meeting tools, and phones. Each category went from “you need a subscription for this” to “it comes with the thing you already have.”

Automation middleware has been the holdout. If you run Notion as your central workspace and you want it to talk to anything else (your CRM, your support platform, your payment processor), you’ve needed Zapier or Make or n8n sitting in the middle.

Notion just changed that equation.

On May 13, Notion shipped its Developer Platform: Workers, database sync, external agent support, and a CLI. It’s the most significant thing Notion has built in years, and a lot of the operators I talk to haven’t looked at it yet. Here’s what actually shipped and what to do with it.

What Notion Workers Actually Are

Workers are small pieces of code that run on Notion’s own servers. They can connect to any external API, trigger on events in your workspace, sync data on a schedule, and execute logic that Notion’s native automations have never been able to handle.

Think of the lightning bolt automations you already use in Notion databases: when a status changes, do this. Workers do that, with no ceiling. They can call Salesforce. They can pull from Zendesk. They can hit Stripe. They can check for duplicates across thousands of records. Any API that exists is now reachable from inside your workspace.

Three ways a Worker gets triggered:

As a tool for a Custom Agent: your Notion AI teammate calls the Worker when it needs data from an external system.

As a webhook: something changes in your Notion database, the Worker fires automatically, no AI tokens involved at all.

On a sync schedule: pull data from an external system every 15 minutes, every hour, or once a day, into a Notion database that stays live.

You deploy Workers through the Notion CLI. Run npm install -g ntn in your terminal. Authenticate once, write your Worker code (or have Claude Code write it from a plain-English description), and deploy it to Notion’s sandbox. It runs on their infrastructure, not yours.

Ivan Zhao, Notion’s co-founder and CEO, described the vision during the launch: “Any data, any tool, any agent” in one workspace.

Three Automations Worth Building First

The use cases that Notion and early practitioners are demoing are practical enough that I’d build them this week.

Company enrichment for your CRM. When you add a new company and fill in the domain, a Worker fires and auto-fetches the company logo, industry, size, and latest news from the web, filling in those properties automatically. Zero manual data entry at scale. This is the kind of thing operators were either doing by hand, paying a separate enrichment tool for, or running through a Zapier workflow that costs you a task every time it runs.

Status-triggered workflow automation. A project status changes to “offer signed” in your database. A webhook-triggered Worker fires: creates the onboarding document from a template, pings the relevant Slack channel, and creates the first task set in your project database. Fully automated, no AI tokens burned, pure code executing in under a second. This is the Zapier workflow that used to cost you 5-10 tasks per run, now running inside your existing Notion plan.

Live external data sync. Zendesk tickets, Stripe payment statuses, Shopify order updates: all pulled into dedicated Notion databases on a schedule and kept current. Synced properties lock automatically so nobody overwrites live data. You can add your own Notion-native columns on top of them. This is the integration that used to require a $40-80 per month middleware subscription.

The Cost Math

This is the part worth sitting with before your next billing cycle.

Matthias Frank, who runs one of the larger Notion consulting operations in Europe and has been building with Workers since the private beta, published his cost calculations after early testing:

“Polling Salesforce tickets every 15 minutes: 86 cents per month. Pulling from Jira once per day: 1 cent per month. Heavy usage at 9,800 runs per month: roughly $13 per month.” (matthiasfrank.de)

Compare that to what most operators currently pay for Zapier: $19 to $69 per month depending on volume. Make runs $9 to $29. Over a year, that’s $228 to $828 for middleware that Notion Workers could now replace for a meaningful chunk of standard use cases.
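If you want to run the comparison on your own numbers, here is a minimal Python sketch. The prices are the ones quoted above (Frank's Workers estimates and the Zapier and Make ranges), not figures pulled from any official price list, and the 30-day month is an assumption.

```python
# Monthly prices in USD as quoted in the text above (not official price lists).
MIDDLEWARE_MONTHLY_USD = {
    "Zapier": (19, 69),
    "Make": (9, 29),
}
WORKERS_MONTHLY_USD = {
    "Salesforce poll every 15 min": 0.86,
    "Jira pull once per day": 0.01,
    "Heavy usage (9,800 runs)": 13.00,
}


def annual(monthly_usd: float) -> float:
    """Convert a monthly price to a yearly one."""
    return round(monthly_usd * 12, 2)


def runs_per_month(interval_minutes: int, days: int = 30) -> int:
    """Scheduled runs in a month of `days` days (30 is an assumption)."""
    return (24 * 60 // interval_minutes) * days


for tool, (low, high) in MIDDLEWARE_MONTHLY_USD.items():
    print(f"{tool}: ${annual(low):,.0f} to ${annual(high):,.0f} per year")
for job, monthly in WORKERS_MONTHLY_USD.items():
    print(f"Workers, {job}: ${annual(monthly):,.2f} per year")
print("15-minute polling:", runs_per_month(15), "runs per month")
```

Swap in your own plan price and polling intervals. The point is to see the yearly gap, not to treat these figures as a quote.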

Workers are free to use through August 11, 2026, then shift to Notion’s credit system. Three months of free runtime to build and test at no additional cost.

Frank’s own verdict: “Internally, we’ve already stopped building no-code automations for our own workflows... Will we still be building with Make or N8N in six months? Honestly, probably not for most use cases.”

What’s Still Rough

Workers require Notion’s Business plan or above to deploy. That’s $16 per user per month billed annually. If you’re on Free or Plus, this isn’t available to you yet.

You have to use the CLI (ntn). There’s no UI-only path to deploying Workers. That’s a real barrier for operators who have never opened a terminal. The practical workaround: describe what you want in plain English to Claude Code or Cursor, let it write the TypeScript, then deploy it through the CLI. Early practitioners report doing this with no prior coding experience.

Webhooks only trigger on property changes in Notion databases, not on content edits inside pages. Automations based on someone typing inside a doc aren’t there yet.

External Agents (which would let Claude Code and Cursor operate natively inside your Notion workspace as first-class participants) are still on a waitlist, currently in private beta. Ramp, the fintech company, has already built over 300 automations on the platform. But for most operators, the best part of the announcement is still gated.

Workers also have runtime caps, and complex multi-step workflows that chain many operations together still have limitations. n8n and Make have more raw capability for sophisticated automation infrastructure. Workers are the right replacement for medium-complexity automations you’re currently running through middleware, not a full rebuild of a Zapier setup you’ve spent five years constructing.

The credit pricing structure after August is also not fully spelled out yet. Notion says it will follow the same system as Custom Agents, and early cost examples look favorable. But building critical automations on a credit system where pricing isn’t fully locked in carries some risk.

What This Means for Operators Who Are Paying Attention

For the last three years, I’ve watched the tools that used to require a separate vendor get pulled, one by one, into the platforms operators already pay for. The pattern is consistent. It starts with “this is a specialized tool, you need a subscription.” Then someone builds a version of it into the platform you’re already on. Then the specialized tool has to justify its existence with features you’ll never use.

The operators I work with who are consolidating their stacks the fastest share one trait: they’re building new automations on their existing platforms before they reach for a new tool. They evaluate what they already pay for first.

Notion Workers is the first credible signal that automation middleware has a compression timeline. It won’t complete in six months. But if you’re a Notion Business user and you’re currently running ten Zapier workflows that trigger on database property changes, a meaningful portion of those could probably live as Workers today, at a fraction of the cost, inside a platform you’re already paying for.

What to Do With This

If you’re on Notion Business or above, three steps:

Open your terminal and run npm install -g ntn. That installs the Notion CLI. Takes two minutes.

Go to Settings, then Features, then Workers, and activate it. Free through August.

Pick one Zapier or Make workflow you run every week. Write a plain-English description of what it does. Open Claude Code, paste the description, and ask it to write you a Notion Worker. Deploy it through the CLI and watch it run inside your existing workspace.

If you’re on a lower Notion plan, the useful move right now is an audit. List every automation you’re currently running through middleware, what triggers each one, and what it does. Most SMB operators who do this audit discover that 40-60% of their active Zaps are database event triggers. That’s exactly what Workers cover. You’ll be ready to consolidate when you move up.
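The audit itself can live in a spreadsheet, but a few lines of Python make the tally explicit. This is a sketch with made-up rows. The automation names and the trigger labels ("database_event", "schedule", "external_webhook") are a convention chosen for the example, not anything Zapier or Notion exports.

```python
from collections import Counter

# Hypothetical audit rows: (automation name, trigger type, what it does).
# Replace these with your own list; the trigger labels are illustrative only.
AUDIT = [
    ("New deal to Slack", "database_event", "Post to #sales when status changes"),
    ("Invoice paid to CRM", "external_webhook", "Update CRM when Stripe fires"),
    ("Weekly report", "schedule", "Email a summary every Monday"),
    ("Onboarding kickoff", "database_event", "Create tasks when offer is signed"),
]


def trigger_share(rows, trigger="database_event"):
    """Return the fraction of automations that use the given trigger type."""
    if not rows:
        return 0.0
    counts = Counter(trigger_type for _, trigger_type, _ in rows)
    return counts[trigger] / len(rows)


share = trigger_share(AUDIT)
print(f"{share:.0%} of audited automations are database event triggers")
```

Whatever share you land on, the rows tagged as database events are the first candidates to move.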

If you want to think through where automation fits in your current stack, and what’s worth building versus buying, that conversation is something I have with clients regularly: muddventures.com/book.

And if you want to be in the room where operators are sharing actual builds like this in real time, that’s what the Abra AI community is for.

Andrew

Your Show Rate Moves on a Decay Curve. You've Never Watched It Move.

Andrew Mudd — Wed, 20 May 2026 13:45:29 GMT

Most operators look at show rate as a single number. Your team has a 67% show rate. Last campaign hit 71%. Your competitor claims 80%. The number lives in a spreadsheet, gets reported in a weekly stand-up, and moves up or down a couple of points every month for reasons nobody on the team can isolate.

The reason nobody can isolate it is that show rate moves on a decay curve, and the curve has a shape that’s both real and measurable, sitting right there in your CRM data, waiting for somebody to actually look at it.

The math underneath

The longer a lead sits between booking the call and the call itself, the lower the chance they actually show up. The decay is real, it’s measurable, and most operators have never plotted it.

If a lead books today and the call is within the next hour, you’re at roughly a 90% chance they show, about as high as the math will let you get. The emotional commitment that drove the booking is still hot, and nothing has happened in their week to crowd it out yet.

Move the call out to 24 hours and the chance drops into the low 80s. At 48 hours, low 70s. At 72 hours, low 60s. After three days, the curve goes vertical and show rate cuts in half or worse, because by then the emotional commitment that drove the booking has dissolved. The reasons they booked are now competing with twenty other reasons that came up in their week.

So when you tell me your show rate is 67%, what you’re really telling me is the average gap between your bookings and your calls. The number is a downstream effect of the timing.
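To make the curve concrete, here is a rough Python sketch that interpolates between the points described above. Pinning "low 80s / 70s / 60s" to 82 / 72 / 62 is my assumption for illustration, as is the straight-line interpolation between points. Your own data will give you the real shape.

```python
# Anchor points read off the text: (hours between booking and call, show rate in %).
# "Low 80s / 70s / 60s" are pinned to 82 / 72 / 62 here, which is an assumption.
CURVE = [(1, 90.0), (24, 82.0), (48, 72.0), (72, 62.0)]


def estimated_show_rate(gap_hours: float) -> float:
    """Linearly interpolate a show rate (%) for gaps between 1 and 72 hours."""
    if gap_hours <= CURVE[0][0]:
        return CURVE[0][1]
    if gap_hours > CURVE[-1][0]:
        raise ValueError("the text gives no precise figure past 72 hours")
    for (h0, r0), (h1, r1) in zip(CURVE, CURVE[1:]):
        if gap_hours <= h1:
            return r0 + (r1 - r0) * (gap_hours - h0) / (h1 - h0)
    raise AssertionError("unreachable")


for hours in (1, 24, 36, 48, 72):
    print(hours, "h ->", round(estimated_show_rate(hours), 1), "%")
```

Plug in your average gap and you get a rough expectation to compare against the show rate you actually report.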

Why most operators have the wrong mental model

Two businesses with identical sales pages and identical closers can have a 30-point show rate gap just based on how fast they’re getting leads onto a call. Same offer, same close rate, same ad spend, completely different show rate, because one of them books leads into a call inside 24 hours and the other books into next week.

Your overall show rate is mostly a measure of how fast you’re getting leads onto a call. The sales page matters, the closer matters, the offer matters, but the timing variable dominates everything else when you actually plot the data.

A 24-to-72-hour drift in your average booking-to-call gap costs you about 15 points of show rate. On 100 booked calls a month, that’s the difference between 80 calls that actually happen and 65 calls that actually happen. Multiply that fifteen-call gap by your average ticket times your close rate. That’s the number every operator should put on the wall.
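Here is that multiplication as a small Python sketch. The 100 booked calls and the 80% versus 65% show rates come from the paragraph above. The 25% close rate and the $3,000 ticket are placeholders I made up, so replace them with your own.

```python
def monthly_revenue_at_risk(booked_calls, show_rate_fast, show_rate_slow,
                            close_rate, average_ticket):
    """Revenue lost per month when show rate slips from the fast-gap rate to the slow-gap rate."""
    lost_calls = booked_calls * (show_rate_fast - show_rate_slow)
    return lost_calls * close_rate * average_ticket


# 100 booked calls and the 80% vs 65% show rates come from the text.
# The 25% close rate and $3,000 ticket are made-up placeholders.
at_risk = monthly_revenue_at_risk(100, 0.80, 0.65, close_rate=0.25, average_ticket=3000)
print(f"${at_risk:,.0f} per month")
```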

Why the decay happens

Three things are working against you in those 72 hours.

Emotional cooldown. The booking happened in a specific emotional state. They were on your sales page, the pitch was fresh, the offer was right in front of them. The further the call sits from that moment, the further they are from the state that drove the booking. By Friday afternoon they barely remember what made them book on Tuesday morning.

Competing context. Real life floods the gap. A work crisis pops up on Wednesday, the kid has something on Thursday, and by the time your closer dials Friday morning, your call is competing with whatever else became the loudest thing in their week.

Doubt accumulation. The longer they have, the more time they have to second-guess. They read another sales page, they talk to their spouse, they Google your reviews, and every hour of available second-guessing time works against you.

None of these are fixable through coaching the closer. They’re structural problems sitting upstream of the call. The fix lives in the calendar settings, the booking flow, and the post-booking sequence, all the parts of the funnel that happen before the closer ever picks up the phone.

Where the math gets messier

A few places this gets less clean.

Lead source matters. The decay curve is steeper on paid traffic than on referral traffic. Paid leads booked the call from a single ad click in a hot moment. Referral leads have more baked-in trust and a more grounded reason to attend, so they decay slower. If your funnel is mostly referral, the slope is gentler. If you’re cold paid, the math hits harder.

Ticket size matters. A $497 offer that someone booked impulsively decays the same way as anything else. A $5,000 offer they booked deliberately has more cognitive investment behind it, so the lead can survive a longer gap. But it’s still decaying, just on a flatter curve.

Some leads need the gap. A small percentage of high-intent leads actually convert better with a 24-to-48-hour window because they use the time to do their own due diligence, and the people who run that process and still show usually become your best-fit closes. So squeezing every booking into a same-day slot isn’t the goal. The goal is reducing the AVERAGE gap, not the gap on every individual booking.

You can’t just shorten calendar availability. If you only offer same-day slots, you lose qualified leads who genuinely can’t make it. The fix is denser availability (more daily slots, more closer coverage), plus a post-booking sequence that keeps the lead warm through the gap when same-day isn’t possible.

Three things to do with this this week

One: pull your last 30 days and plot it. Export your booked calls and show outcomes from your CRM. Put booking timestamp in one column, call timestamp in another, hours between them in a third, show or no-show in a fourth. Sort by hours. Look at the show rate at every gap tier. The curve will be obvious. If it doesn’t look like a curve, your sample is too small. Run it on 90 days instead.

Two: calculate your average gap. Add up the hours between booking and call for every booked call in the last month, divide by the number of calls. That number is the lever you’re moving. If it’s over 48 hours, you have meaningful show-rate upside available without changing anything else about your funnel.

Three: pick the single biggest fix and ship it. Calendar density (more daily slots), closer coverage (more closers available across the week), confirmation page (does it offer a faster reschedule if same-day works), pre-call sequence (does anything actually fire in the 72-hour window). Pick the one with the lowest cost to ship this week, ship it, measure for two weeks, then pick the next one.
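Steps one and two can be done in a spreadsheet, but if you'd rather script them, a minimal Python sketch might look like this. The rows, the timestamp format, and the tier boundaries are all assumptions for illustration. Adapt them to whatever your CRM actually exports.

```python
from datetime import datetime

# Hypothetical export rows: (booked_at, call_at, showed). The timestamp format
# and the sample data are assumptions; adapt them to your CRM export.
ROWS = [
    ("2026-05-01 09:00", "2026-05-01 15:00", True),
    ("2026-05-01 10:00", "2026-05-02 16:00", True),
    ("2026-05-02 11:00", "2026-05-04 10:00", False),
    ("2026-05-03 14:00", "2026-05-06 14:00", False),
    ("2026-05-04 08:00", "2026-05-04 12:00", True),
]
FMT = "%Y-%m-%d %H:%M"
TIERS = [(24, "0-24h"), (48, "24-48h"), (72, "48-72h"), (float("inf"), "72h+")]


def gap_hours(booked_at, call_at):
    """Hours between the booking timestamp and the call timestamp."""
    delta = datetime.strptime(call_at, FMT) - datetime.strptime(booked_at, FMT)
    return delta.total_seconds() / 3600


def tier_for(hours):
    """Return the label of the first tier whose upper bound covers the gap."""
    for upper, label in TIERS:
        if hours <= upper:
            return label


def show_rate_by_tier(rows):
    """Step one: show rate for each gap tier."""
    totals = {}
    for booked_at, call_at, showed in rows:
        label = tier_for(gap_hours(booked_at, call_at))
        shows, count = totals.get(label, (0, 0))
        totals[label] = (shows + int(showed), count + 1)
    return {label: shows / count for label, (shows, count) in totals.items()}


def average_gap(rows):
    """Step two: mean hours between booking and call."""
    gaps = [gap_hours(booked, call) for booked, call, _ in rows]
    return sum(gaps) / len(gaps)


print(show_rate_by_tier(ROWS))
print(f"average gap: {average_gap(ROWS):.1f} hours")
```

Five rows is obviously too small to show a curve. The structure is what matters: one gap per booking, one show rate per tier, one average to track.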

If you want to look at your specific funnel and figure out where the biggest show-rate lift is sitting in your math, muddventures.com/book is the fastest way to map it.

And if you want a place where operators are actively running these post-booking plays with AI in the background every week, the Abra AI community is where that conversation lives. Join at whop.com/abra-ai.

Andrew

Google Rewired Its Productivity Suite. Most Businesses Haven't Turned It On.

Andrew Mudd — Mon, 18 May 2026 17:50:24 GMT

Google I/O kicks off tomorrow. Tech media will spend the next 48 hours covering model benchmarks, demo videos, and developer keynotes.

Most of it won’t matter to you if you run a real business.

But here’s what does: the most significant change Google made to its productivity suite in years already shipped three weeks ago, buried inside a Google Cloud conference announcement. Almost no one in the SMB world noticed.

On April 22, Google introduced Workspace Intelligence. It’s a new AI layer that connects every app in your Google Workspace account into a shared context system. Your Gmail knows about your Drive files now. Your Chat knows about your Docs. Gemini can search and reason across all of it at once.

If your team lives in Google Workspace, most of what I’m about to describe is already available to you, inside the plan you’re already paying for.

Why this is different from the AI in Workspace you’ve tried before

For years, Gemini in Google Workspace operated app by app. Gemini in Gmail knew about your inbox. Gemini in Drive knew about your files. But they didn’t talk to each other. If you asked Gemini in Gmail “what did we decide about the Harrison project?”, it could only search your email. It had no idea what was in the Drive folder or the Chat thread.

Workspace Intelligence changes the underlying architecture. It creates what Google is calling a “semantic layer” that maps emails, chats, files, calendar events, and collaborators into one shared knowledge graph. Every Gemini interaction inside Workspace now has access to the full picture.

Yulie Kwon Kim, VP of Product for Google Workspace, described it this way in the April 22 announcement: “Workspace Intelligence delivers unified, real-time understanding to power agentic work. It is a secure, dynamic system that inherently understands complex semantic relationships within your Workspace apps.”

The operational translation: Gemini can now answer complex questions about your business by pulling context from multiple apps simultaneously, without you specifying where to look.

For the last three years, the operators I work with have been asking the same question: “How do I get AI that actually knows my business?” The answer has usually involved building custom integrations, training workflows, or stacking multiple tools. Workspace Intelligence is the first time Google has shipped something that actually moves toward that answer for the people who already live in their ecosystem.

The four features worth turning on now

Ask Gemini in Chat

This is the most immediately useful thing to explore. Google Chat now has an “Ask Gemini” command interface that functions as a unified launchpad for work across your entire Workspace.

From Chat, you can ask for a daily briefing that surfaces your urgent emails, unread threads, and priority action items. You can ask it to find any file in Drive using a plain English description. You can ask it to schedule a meeting for everyone on a thread, or generate a doc or slide deck directly from the conversation.

The feature also connects to Asana, Jira, and Salesforce. If your team uses any of those tools, Gemini can now pull external project context into a conversation alongside your Workspace content.

The pattern I keep seeing with operators who are actually saving meaningful time with AI tools: they pick one command interface and build the habit of going there first before opening six tabs. Ask Gemini in Chat is the strongest version of that habit Google has ever shipped.

Drive Projects

Drive Projects is a new organizational layer that lets you bundle related files and emails into a shared context space. When you create a Drive Project, everything inside it becomes a unified context for Gemini.

The practical value is that Gemini stops giving you surface-level summaries based on whichever document you happen to have open, and starts synthesizing everything related to a project.

Design firm WATG used this kind of setup to cut proposal generation from days down to minutes, according to Google’s case study coverage from Cloud Next. That’s not a company with 10,000 employees. That’s an architecture and design firm that decided to build AI into an existing workflow.

The move is to create one Drive Project this week for your most active client or internal project, add the relevant files and emails, then ask Gemini a specific operational question about it. That single test will tell you whether the context-sharing is working for your setup.

AI Inbox and AI Overviews in Gmail

AI Inbox gives you a filtered view of what actually matters in your inbox, based on your work patterns and active projects. AI Overviews in Gmail search synthesizes multiple email threads into one summary answer, rather than surfacing individual messages.

If you’ve been spending time hunting through old Gmail threads for decisions and context, AI Overviews turns search from “find the email” to “tell me what was concluded.” For operators managing high email volume across multiple clients or projects, this is a real hours-per-week change.

Gemini in Sheets, Docs, and Slides

Sheets now builds or edits entire spreadsheets from natural language descriptions, including data imports from HubSpot and Salesforce. Docs generates infographics pulled from your business data, handles comment triage, and can edit a document based on feedback left in comment threads. Slides will soon generate full editable decks in one pass using your company templates.

None of this is magic. But for operators who spend significant blocks of time in these three apps every week, these are real reductions in setup and formatting time.

What operators who are using this are saying

Sharon Prosser, VP of SMB sales at Google, spoke at Cloud Next about where things actually stand for small businesses: “We’ve got double-digit millions of SMBs using Google Workspace, with AI at their fingertips. Customers can use Gemini Enterprise as an AI agent platform and they can get it up and running in minutes.”

She also said something that maps directly to what I watch happen with real operators: “The light bulb has to go off quickly for an SMB. Every minute counts. Being open to helping us understand their pain points where workflows are maybe a little bit more arduous or paper intensive, and with Gemini Enterprise, it’s a perfect way to kick off the AI experience.”

The operators I’ve talked with who’ve started using Workspace Intelligence describe the same experience: the first week, they’re skeptical because the answers aren’t that different. The second week, after they’ve organized their Drive Projects and connected their key tools, the answers start pulling from the right context. The third week, they stop thinking about it as a feature and start thinking about it as part of how work flows.

Tirol, a 52-year-old Brazilian dairy company, used Gemini Enterprise to build an interactive knowledge bank that democratized access to supply chain data for workers across the organization. That’s not a tech startup. That’s a half-century-old food business using a Google Workspace add-on to solve a real operational problem.

The honest tradeoffs

Workspace Intelligence is useful in practice, but it comes with real limitations.

The quality of Gemini’s answers is directly proportional to how organized your files actually are. If your Drive has thousands of loosely named documents with no folder structure, Gemini will pull messy context and return messy answers. This is less of a product limitation and more of a forcing function: Workspace Intelligence makes the disorganized-Drive problem expensive in a new way.

Some features, including Slides deck generation and parts of the Sheets integration, are still rolling out and availability varies by plan. The rollout has been uneven, which means checking your specific plan’s feature list before building workflows around something is the right move.

The Asana, Jira, and Salesforce connectors are new. They’re useful for surface-level questions but won’t replace a well-configured standalone query against those tools. Treat them as context supplementation, not full replacement.

And if your team operates on Microsoft 365 rather than Google Workspace, the honest comparison is this: Microsoft Copilot is doing similar things at the Office layer. I covered the Office integration in Issue #018. If your team splits between both, a decision about which ecosystem to standardize on is more urgent than it was six months ago. Hybrid isn’t a good long-term position as these tools get more capable, because context doesn’t transfer well across ecosystems.

What to do before Google I/O kicks off tomorrow

Three concrete steps for this week:

One: Go into Google Chat today and find the Ask Gemini button. Ask it for a daily briefing, then ask it to find a specific file you know exists somewhere in Drive. This is the fastest test of whether the context layer is actually working for your setup.

Two: Create one Drive Project for your most active current project or client account. Pull in the relevant files and recent email threads. Then ask Gemini a specific question about the project status. Compare the answer you get to what you would have gotten from Gemini in Gmail alone. That comparison tells you what the unified context is buying you.

Three: Watch the Google I/O keynote tomorrow at 10 AM PT. Expected announcements include new Gemini model capabilities and additional agentic features for Workspace. If you’ve already started using Workspace Intelligence, the new features will layer in cleanly. If you haven’t started yet, I/O will add more reasons to.

If you’re not sure where to start with what’s already in your Google Workspace account, that’s the kind of clarity session I run with operators. Book a time at muddventures.com/book and we can map out exactly what to turn on for your specific setup.

Andrew

Your AI Gets Dumber the Longer You Use It in a Single Session

Andrew Mudd — Wed, 13 May 2026 16:34:56 GMT

Something happens in most AI work sessions around the 30 to 40 minute mark. The output quality slips. Responses get vaguer. The model starts missing things that were stated clearly earlier, or contradicts something it said 10 messages ago. Most people hit reprompt a few times and eventually open a new chat.

The cause has a name: context rot.

Chroma's research team spent 2025 running the most systematic test of long-context AI performance published to date. They tested 18 frontier models, including GPT-4.1, Claude Opus 4, Gemini 2.5, and Qwen3. Their finding: “models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.” Every model. Every length increment tested.

For operators running AI-assisted workflows daily, this is the mechanism behind quality problems that usually get blamed on prompting, model choice, or user error. The model is working as designed. What is breaking is how sessions are structured.

Why the quality drops

When you send text to a language model, it does not read linearly. It computes relationships between every token and every other token simultaneously. At 10,000 tokens, that is 100 million pairwise computations. At 100,000 tokens, where a 40-minute AI session often lands, it is 10 billion.
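The arithmetic behind those two numbers is just squaring the token count. A tiny Python sketch, using the simplified "every token against every token" picture from the paragraph above:

```python
def pairwise_interactions(tokens: int) -> int:
    """Every token is compared with every token, so the count grows with the square."""
    return tokens * tokens


for n in (10_000, 100_000):
    print(f"{n:,} tokens -> {pairwise_interactions(n):,} pairwise computations")
```

Ten times the tokens means a hundred times the pairwise work, which is why long sessions get expensive faster than they feel like they should.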

As context grows, attention dilutes. Each relevant piece of input receives proportionally less computational weight against all the other tokens in the window. The model still processes everything. It just cannot attend to any of it as precisely.

Researchers at Stanford documented a second mechanism in a 2024 paper: the “lost-in-the-middle” effect. Models follow a U-shaped attention pattern across long inputs. They are reliable at the start, reliable at the end, and measurably weaker in the middle. In multi-document tests with 20 documents, accuracy dropped by more than 30 percent when relevant information was placed in middle positions versus the beginning or end of the input. Same model. Same information. Different position. Thirty-plus percent accuracy gap.

The practical implication: anything you add mid-session is processed at a fraction of the quality it would receive at position 1. Your system prompt at the top gets solid attention. A key piece of context you paste at message 15 competes with everything that came before it.

Chroma also documented a third mechanism called distractor interference. Semantically similar but irrelevant content degrades model performance beyond what context length alone explains. Every time you paste in background material that is related to the topic but not the specific task, you are adding noise the model has to compete against. The Chroma team found this effect compounds: four semantically close distractors hurt more than one, and the damage is not predictable in a straight line.

The spec sheet number is the wrong number

Every frontier model advertises a context window. That number is the technical ceiling: the maximum the API accepts without error. Almost no lab publishes the effective context length, the length up to which the model still reasons reliably over what it received rather than merely accepting tokens without error.

A May 2026 benchmark analysis tested multi-needle retrieval tasks (the kind that resembles real business work, finding and integrating multiple pieces of information across a long document) and found: “the gap between what a model accepts and what a model can reliably use is enormous, and almost nobody in marketing pages is being honest about it.”

The numbers from those benchmarks: Claude Opus 4.6 at 128K tokens with 8 retrieval targets scores 93 percent. At 1 million tokens, it drops to 76 percent. That is the category leader. Most other models sit in the twenties and thirties at 1 million tokens.

Llama 4 Scout launched in April 2025 with a 10 million token context window as its headline feature. On long-context reasoning benchmarks, it scores 15.6 percent. The model with the largest advertised context window posted the worst long-context reasoning numbers.

The honest translation of most context window spec sheets: the effective working range is roughly 50 to 70 percent of the advertised number, with continuous quality degradation starting from the first token, well before you near the ceiling.

For three years I have watched operators select models based primarily on advertised context window size, treating it as a proxy for capability. It is a marketing specification. The effective context is what you are actually working with, and almost no one publishes it in any useful way.

The 35-minute wall

Research on long-running AI agents identified a consistent threshold: agent success rates drop after approximately 35 minutes of continuous operation. The failure relationship is non-linear. Double the session length and the failure rate quadruples, because context rot is self-reinforcing.

The loop works like this: a longer session means more accumulated context. More accumulated context means worse output quality. Worse output quality means corrections and re-prompting. Corrections add more context. The cycle accelerates. Sessions tend not to fail gradually. They hold up reasonably well for a while, then drop.

Cognition measured that in long-running AI agent tasks, over 60 percent of the first turn is spent just retrieving context, not reasoning, not producing output. Retrieval. Every search result, every file read, every exploration that turns out to be a dead end stays in the context window for the rest of the session, accumulating like sediment.

For operators using AI for proposals, research, long-form content, or multi-step client deliverables, the 35-minute threshold is a real operational constraint. The sessions that feel like they went off the rails usually hit this point. The model held up as long as it could given what was in the context. The session structure created the conditions for failure.

The assumption that makes this worse

The pattern I keep seeing with operators who use AI every day is a belief that more context makes for a smarter session. Feed the model the full background document, prior conversation history, everything potentially relevant, and the model has more to work with. That feels right.

What actually happens: every piece of context you add lowers the signal-to-noise ratio on the input that matters. A crisp system prompt at the start of a session competes with 15,000 tokens of accumulated exchange for the model’s attention by message 25. The noise floor rises. The model is getting more distracted as the session continues, not smarter.

Anthropic's engineering team has published on what they call “context engineering,” specifically the discipline of curating the optimal token set during inference rather than maximizing what gets loaded in. Their research shows 80 percent of performance variance in long-context tasks comes down to how well the context is managed, not which model is being used.

That reframing is worth sitting with. The capability gap operators try to solve by switching models or paying for larger context windows is often a context management problem. The model has the capability. The session design is limiting what the model can do with it.

Where this legitimately breaks down

Context rot is architectural in current transformer models. Chroma ran their tests specifically to find exceptions and found none across 18 models. This will likely improve as AI architecture evolves, but there is no model on the market today that is immune to it.

RAG (retrieval-augmented generation, where you pull only the relevant documents rather than loading everything into context at once) helps substantially. A well-built RAG setup means the model gets a small, precise slice of relevant information rather than a haystack it has to search through. The catch: most SMB operators are not running RAG pipelines. That requires developer setup and ongoing maintenance, and the gap between what the vendor marketing implies and what it actually takes to implement is real. It is not the quick fix it is often described as.

Starting fresh sessions resets the rot but creates a continuity cost. The model loses the prior conversation. Sometimes that means re-establishing context that took time to build. Short sessions solve one problem and introduce another. There is a real trade-off and no workaround that fully eliminates it in today’s tools.

There is also no single number from a model card you can trust for your specific use case. Effective context length varies by task type, input structure, and what is actually in the context window. Until labs publish multi-needle retrieval benchmark scores at multiple context lengths as a standard part of their model documentation (most do not), you are working with incomplete information on a variable that directly affects your output quality.

Three adjustments worth making

Work in shorter sessions. Thirty minutes of focused AI work in a clean session, carrying only the output into the next session rather than the full conversation, consistently produces better results than one long session where context accumulates. This is the highest-leverage adjustment most operators can make without changing tools, prompting strategy, or workflow.

Front-load what matters. Critical constraints, the specific task, key requirements: those go at the top of the prompt every time. The lost-in-the-middle research is precise on this. Whatever sits in the middle of a long context gets a fraction of the attention it would get at the very start. If something matters to the quality of the output, it belongs at the top, not buried after background context.

Narrow the context deliberately before each session. Before you start, ask what the model actually needs to know for this specific task. Paste that in. Leave out the background material that only might be useful. Every additional document or prior message you add is competing with the signal that actually matters for the task at hand.
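The front-loading and narrowing adjustments can be sketched as one small prompt builder. This is an illustration, not a prescribed template: the `build_prompt` function, the section labels, and the character budget (a rough proxy for tokens) are all assumptions of mine.

```python
def build_prompt(task: str, constraints: list[str], context_snippets: list[str],
                 max_context_chars: int = 2000) -> str:
    """Assemble a prompt with the task and constraints first, context last.

    Context snippets are added in the order given until the character
    budget is spent; anything that does not fit is left out.
    """
    lines = ["TASK: " + task, "CONSTRAINTS:"]
    lines += ["- " + c for c in constraints]

    kept, used = [], 0
    for snippet in context_snippets:
        if used + len(snippet) > max_context_chars:
            break  # budget spent: drop this and all later snippets
        kept.append(snippet)
        used += len(snippet)

    if kept:
        lines.append("CONTEXT:")
        lines += kept
    return "\n".join(lines)


# Hypothetical inputs for illustration.
prompt = build_prompt(
    task="Draft a two-paragraph renewal email.",
    constraints=["Plain language", "No pricing promises"],
    context_snippets=["Client renewed twice before.", "x" * 5000],
)
```

The oversized second snippet is dropped instead of being allowed to crowd the task. The builder expects snippets ordered most important first, so you still have to decide what matters before you paste.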

If you are building anything on AI APIs (customer-facing agents, internal tools, automated outbound) and you are not currently measuring output quality as session length or context size increases, that is the first metric worth adding. Build a simple test with your actual output type and run it at 10 minutes, 20 minutes, and 35 minutes. See where your specific workflow starts to slip.
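A test like that can be very small. The sketch below assumes you measure by context size, not by minutes on the clock: it prepends growing amounts of filler to the same task and scores each output. `run_model` and `score` are placeholders you would swap for your real API call and your real quality check. The fake model here exists only so the harness runs on its own, and the filler must be a non-empty string.

```python
from typing import Callable


def quality_by_context_size(
    run_model: Callable[[str], str],
    score: Callable[[str], float],
    task_prompt: str,
    filler: str,
    sizes: list[int],
) -> dict[int, float]:
    """Score the same task with increasing amounts of prior context prepended.

    sizes are character counts of filler placed before the task prompt,
    a rough proxy for a session that has been running longer.
    """
    results = {}
    for size in sizes:
        # Repeat the filler enough times to cover the size, then trim to it.
        padding = (filler * (size // len(filler) + 1))[:size]
        output = run_model(padding + "\n" + task_prompt)
        results[size] = score(output)
    return results


# Stand-in model for demonstration only: it "forgets" the required
# sign-off once the input grows past 1,000 characters.
def fake_model(prompt: str) -> str:
    return "Draft email. Regards, Sam" if len(prompt) < 1000 else "Draft email."


def has_signoff(output: str) -> float:
    return 1.0 if "Regards" in output else 0.0


report = quality_by_context_size(
    fake_model, has_signoff, "Write the email.", "earlier chat turn. ", [0, 500, 5000]
)
```

With a real model you would run each size several times and average the scores, since a single run is noisy. The size at which the score starts dropping is your workflow's practical limit.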

If any of this maps onto how you are using AI in your business and you want to work through your specific workflows, that is what I do in an AI Clarity Call at muddventures.com/book. And if you want to go deeper with a community of operators working through the same problems every day, whop.com/abra-ai is where that conversation lives.

Andrew

Claude just patched the biggest hole in every agent workflow ever built

Andrew Mudd — Tue, 12 May 2026 13:58:52 GMT

Your agent runs great for a week.

Then it starts failing. Not catastrophically, just quietly. It forgets a file format quirk it handled fine on day three. It stops following a tone preference that worked in session twelve. You go back in, reteach it, it runs clean for a few days, then it forgets again.

You are not fixing a bug. You are refilling a leaky bucket. For the last three years, this has been the hidden tax on building with AI agents. The tools get smarter. The models get better. But the maintenance load stays the same, because nothing carries forward between sessions except what you manually wrote into the system prompt when you set the thing up.

That changed on May 6.

Anthropic ran Code with Claude in San Francisco. Most of the coverage went straight to the model news. The part worth paying attention to as an operator was buried under it. Three features shipped the same day for Claude Managed Agents. Dreaming, outcomes, and multiagent orchestration. Two are in public beta. One is in research preview. Together they cut the maintenance cost on running an agent that actually keeps working.

What “Dreaming” Actually Does

The name sounds like marketing. The mechanic is actually straightforward.

Dreaming runs in the background between sessions. It reads what the agent did in recent jobs and scans for three kinds of patterns. Recurring mistakes the agent keeps making. Workflows it has converged on across different jobs. Preferences that show up across your team of agents. Then it rewrites the agent’s memory store based on what it finds. Old notes get condensed. Important ones get promoted. The next session starts with a curated set of notes from the agent’s own past instead of a blank slate.

A few things worth flagging.

Model weights don’t change. Dreaming is structured note-taking applied to an agent’s persistent memory, not training. The agent gets a text summary of what worked and what failed. You can let dreaming update memory automatically, or you can require human review before any change lands. On anything high-stakes, the latter is the right default.

Harvey, the legal AI startup, ran dreaming in pilot before the public launch. According to Anthropic’s announcement, Harvey saw task completion rates rise roughly 6x in internal testing. The root cause was mundane. Their agents kept forgetting filetype quirks and tool-specific workarounds between sessions, so the same legal-drafting jobs failed in the same way over and over. Dreaming made the workarounds stick.

That’s one data point and Anthropic didn’t ship an independent benchmark next to it. Harvey’s workflow is long-form legal drafting, which is the exact shape of problem where persistent memory pays off most. On simpler stateless tasks, the lift is going to be smaller.

Outcomes: The Feature With the Most Immediate Use

Dreaming is a research preview. Access is by request and it’s not production-ready for most operators yet. Outcomes is in public beta and worth using right now.

Here’s how it works. You write a rubric in plain language describing what a good output actually looks like. The agent does its work. A separate grader runs in its own context window, scoring the output against your rubric without picking up the agent’s reasoning along the way. When something falls short, the grader tells the agent exactly what to fix and sends it back for another pass. You can wire this to a webhook too, so the agent runs, the grader signs off, and you get notified only when the output meets your criteria.
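The loop described above can be sketched in a few lines. This is not Anthropic's API or implementation. It is a generic grade-and-retry loop with toy stand-ins, and every name in it (`run_with_rubric`, `toy_agent`, `toy_grader`) is hypothetical.

```python
from typing import Callable

# A grader takes (rubric, output) and returns the list of failed criteria.
Grader = Callable[[list[str], str], list[str]]
# An agent takes (task, feedback from the last pass) and returns a new output.
Agent = Callable[[str, list[str]], str]


def run_with_rubric(agent: Agent, grader: Grader, task: str,
                    rubric: list[str], max_passes: int = 3) -> tuple[str, bool]:
    """Loop: agent produces output, grader checks it, failures go back as feedback.

    The grader only ever sees the rubric and the output, never the agent's
    reasoning. Returns the last output and whether it passed.
    """
    feedback: list[str] = []
    output = ""
    for _ in range(max_passes):
        output = agent(task, feedback)
        feedback = grader(rubric, output)
        if not feedback:
            return output, True
    return output, False


# Toy stand-ins so the loop can run without any API.
def toy_agent(task: str, feedback: list[str]) -> str:
    draft = "Summary of the contract."
    if "mentions renewal date" in feedback:
        draft += " Renewal date: 1 March."
    return draft


def toy_grader(rubric: list[str], output: str) -> list[str]:
    checks = {"mentions renewal date": "renewal date" in output.lower()}
    return [criterion for criterion in rubric if not checks.get(criterion, True)]


final, passed = run_with_rubric(
    toy_agent, toy_grader, "Summarize the contract", ["mentions renewal date"]
)
```

The first draft fails the one criterion, the feedback goes back, and the second pass clears it. The `max_passes` cap is a design choice of this sketch so a rubric the agent cannot satisfy ends in a flagged failure, not an endless loop.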

Anthropic’s internal benchmarks show outcomes improved task success by up to 10 points over a standard prompting loop. File generation quality rose 8.4% on .docx outputs and 10.1% on .pptx outputs. Those are Anthropic’s own numbers so apply the appropriate discount, but the direction is consistent with what practitioners are reporting externally.

Wisedocs, a document-review startup, built a quality check agent using outcomes to grade each review against their internal guidelines. They reported reviews now run 50% faster while staying aligned with their team’s standards. The speed gain comes from cutting the back-and-forth. The agent self-corrects before a human reviewer ever sees the output.

What gets me about outcomes is how much of the current agent-quality problem it addresses without technical complexity. Most operators are still manually checking agent output, and not because they distrust Claude. They do it because there has been no way to tell Claude what “good” looks like in a way that persists across runs. A rubric is that mechanism. Writing it forces you to get specific about your standard, which surfaces assumptions you probably hadn’t made explicit.

Multiagent Orchestration: When a Single Agent Hits Its Ceiling

Multiagent orchestration is for workflows that have grown beyond what one agent can handle well.

Here’s the setup. A lead agent breaks a complex job into pieces and delegates each piece to a specialist subagent with its own model, prompt, and tools. Up to 20 specialists run in parallel on a shared filesystem. The lead agent can check back in with subagents mid-workflow. Every agent’s activity is individually traceable in the Claude Console.
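The fan-out pattern itself is not exotic. Here is a rough local analogy using Python's standard thread pool, with a toy function standing in for a specialist subagent. It does not call Managed Agents. The `fan_out` and `scan_log` names are hypothetical, and the cap of 20 is simply the figure quoted above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_PARALLEL = 20  # the parallel specialist cap cited above


def fan_out(pieces: list[str], specialist: Callable[[str], str],
            max_parallel: int = MAX_PARALLEL) -> list[str]:
    """Run one specialist call per piece in parallel and return results in order."""
    if not pieces:
        return []
    workers = min(max_parallel, len(pieces))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input pieces.
        return list(pool.map(specialist, pieces))


# Toy specialist: flags a log that contains an error line.
def scan_log(log: str) -> str:
    return "needs attention" if "ERROR" in log else "clean"


logs = ["build ok", "ERROR: missing dependency", "build ok"]
findings = fan_out(logs, scan_log)
# The lead step keeps only what is worth acting on.
flagged = [i for i, finding in enumerate(findings) if finding == "needs attention"]
```

The lead step splits the job, the specialists run side by side, and only the flagged items come back up.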

Netflix’s platform team is already using it. They built an analysis agent that processes build logs from hundreds of source repositories. Multiagent orchestration lets the lead agent fan the batch out to subagents scanning in parallel, reporting back only the patterns worth acting on. What would have been a serialized scan across a huge codebase runs in a fraction of the time now.

Spiral, a writing tool built by the publication Every, runs the same structure differently. A Haiku-based lead agent handles incoming requests and delegates drafting to Opus-based subagents. When a user asks for multiple drafts, the Opus subagents run side by side. Drafts only reach the user if they clear an outcomes rubric scored against Every’s editorial principles.

That combination shows how these features layer. Multiagent orchestration handles the work distribution. Outcomes handles the quality check. You get parallel speed without dropping the bar.

For most SMB operators, multiagent orchestration is a month or two away from being the right next step. It makes sense once you have a workflow that’s consistently producing clean output but taking too long because it’s processing things one at a time. If your current agent is still producing inconsistent output, fix that first with outcomes before adding parallel complexity.

The Honest View on What Is Still Rough

A few things to hold in mind before going deep on any of these.

Dreaming is a research preview. You can request access but it’s not production-ready, and Anthropic has flagged the security risk clearly. Giving agents structured persistent memory expands the attack surface for prompt-injection. If a malicious input convinces an agent that a wrong instruction is correct, dreaming can consolidate that wrong instruction into long-term memory where it applies to future sessions. Human review on memory updates is the right default for any workflow where that risk matters.

Harvey’s 6x number is compelling and should be treated skeptically. It’s Anthropic’s customer publishing a result on Anthropic’s platform at Anthropic’s developer conference. The result might hold up broadly. It might also be specific to the legal-drafting workflow where Harvey had an unusually clear before-and-after. Real-world results will vary by workflow type.

Outcomes and multiagent orchestration are in public beta. Things will break. Edge cases will surface. The docs are solid but still evolving.

Managed Agents pricing runs at $0.08 per session-hour plus token costs. For light workloads that’s basically nothing. For heavy parallel orchestration across a fleet of agents, it adds up fast. Run the math before you scale that.
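Running that math takes one multiplication. The sketch below covers only the session-hour charge at the $0.08 rate quoted above. Token costs come on top and are not modeled. It also assumes each parallel agent is billed as its own session, which is my assumption and worth confirming against the pricing docs. The workload numbers are hypothetical.

```python
SESSION_HOUR_RATE = 0.08  # dollars per session-hour, the figure quoted above


def session_hour_cost(agents: int, hours_per_day: float, days: int) -> float:
    """Session-hour charge only; token costs are billed on top and not modeled here."""
    return round(agents * hours_per_day * days * SESSION_HOUR_RATE, 2)


# Hypothetical workloads for illustration.
light = session_hour_cost(agents=1, hours_per_day=2, days=30)    # 60 session-hours
heavy = session_hour_cost(agents=20, hours_per_day=8, days=30)   # 4,800 session-hours
```

Under those assumptions the light workload comes to $4.80 a month and the heavy one to $384 a month, before tokens. For a fleet running long parallel jobs, tokens are likely to be the larger line item, so estimate both.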

What To Do With This

First: Set up an outcomes rubric for one existing agent workflow this week. Pick a workflow where you’re currently reviewing output manually before using it. Write 3 to 5 plain-language criteria describing what “good” actually looks like. Attach it to your Managed Agent. The rubric forces you to articulate your standard, which is useful even before the grading loop shows results.

Second: Request dreaming access if you have an agent that runs the same workflow repeatedly. The access form is in the Claude Managed Agents docs. If your agent keeps failing in familiar ways between sessions, dreaming is built exactly for that problem. Use human review on memory updates until you have enough confidence in what patterns it’s surfacing.

Third: Skip multiagent orchestration for now unless you already have a clean, high-volume workflow. The complexity overhead isn’t worth it until your single-agent setup is producing consistent output. Build the foundation with outcomes first.

The pattern with operators who pull ahead isn’t that they chase every new feature. It’s that they find the specific friction point in their current workflow and fix that one thing. Right now, for most people running agents, that friction point is output quality variability and session-to-session memory loss. Outcomes and dreaming are the most direct tools that have ever shipped for both of those.

If you want to think through whether Managed Agents are the right architecture for what you’re building, that’s a good conversation to have before you start wiring things together. muddventures.com/book

Andrew

The Sales Layer Almost No One Is Using AI to Build

Andrew Mudd — Mon, 11 May 2026 14:48:30 GMT

Most operators I talk to are using AI in their sales process the same way.

They use it to write emails. They use it to score leads. They use it to summarize call transcripts. They use it to draft proposals.

All of that is sales execution. AI writing the outputs that humans used to write.

That is a useful use case. But it is not the biggest one.

The biggest one is sales development. And almost no one is using AI for it yet.

What sales development actually means

Sales development is the layer between getting a lead in your funnel and closing them.

For a booked call funnel, sales development is everything that happens after someone fills out an application but before the actual call.

It is the confirmation page they see when they book.

It is the email and SMS sequence that fires across the 72 hours before the call.

It is the brief your closer reads the morning of the call.

It is the recovery flow that fires if they cancel or no-show.

And it is the weekly review that tells you which piece of the system is leaking.

That layer is the most under-built layer in most call funnels. Operators obsess over the offer, the page, the closer, and the ads. The development layer in between gets duct tape and one-off prompts.

Walk through the gaps

Picture the moment after a high-intent prospect clicks “book.”

They have watched your ad three times. They have read your application carefully. They are at peak interest.

Then they land on a confirmation page that says “thanks, see you soon.”

The hottest moment in your funnel and they hit a brick wall.

For the next 72 hours they sit in silence. Their motivation cools. A competitor’s offer shows up in their feed on Wednesday. A life event eats their Thursday. By Friday morning they barely remember booking the call.

Friday morning the call is at 10am. Your closer opens the calendar event. Name and email. Maybe a few questions from the application. They hop on Zoom and wing the rest.

If the prospect doesn’t show, the deal gets dropped into the no-show pipeline stage and sits there for 60 days. Your CRM might fire a basic rebooking email. The prospect who cost you $100 to $200 in ad spend to book becomes a dead deal.

Then Monday morning you check show rate. It moved 5 to 10 points compared to last week. You can’t tell why. Was it the page, the closer, the cadence, a holiday Tuesday, a Wednesday spike in cancels? You don’t have the signal to isolate the cause.

That is the sales development layer. And almost every operator I know is leaving it for last.

How most people use AI in this layer today

If they use AI here at all, they use it to write outputs.

ChatGPT drafts the confirmation page copy. A custom prompt writes the pre-call email sequence. Claude summarizes the application notes into a one-line brief.

Each of those is fine in isolation. But it is still sales execution. AI writing pieces of the system. The operator still has to glue the pieces together, run the workflow, and iterate when something breaks.

The bigger unlock is using AI to build the system itself. Not the output, but the scaffolding around the output.

AI as build partner

This is the shift.

You stop using AI to write a confirmation page. You start using AI to architect the entire confirmation page rebuild process, plug it into Webflow or GHL, and stay there as a collaborator while you iterate.

You stop using AI to draft a single brief. You start using AI to build the briefing system that runs on every booked call and lands in your closer’s Slack the morning of.

You stop using AI to write one email. You start using AI to scaffold the 72-hour pre-call sequence as an asset that lives in your existing email and SMS platforms and runs forward on every new booking.

You stop using AI to handle one no-show. You start using AI to build the recovery sequence that fires automatically the second any prospect cancels or no-shows.

You stop using AI to summarize one application. You start using AI to scaffold the weekly review process that ranks every leak in your funnel and recommends the fix.

The output stops being a one-time artifact. It becomes infrastructure that lives in your stack.

That is the difference between AI doing the work and AI building the system that does the work.

What this can look like in practice

For the last few weeks I have been refining a five-part build that does exactly this.

The first part is a confirmation page rebuild. AI architects the warming surface. Hero video, breakout videos mapped to your top call questions, full copy in your voice. Then loads it into whatever stack you already use.

The second part is the 72-hour pre-call sequence. AI builds the email and SMS cadence into your existing platforms so the silent window stops being silent.

The third part is the morning-of closer brief. AI builds the briefing system that lands a structured note in Slack the morning of every call. Identity, application answers, objection map, opening frame, confidence score.

The fourth part is the no-show recovery flow. AI builds the trigger from your CRM, the reschedule path, and the coordinated email and SMS so a cancelled booking either comes back warm or routes into a long-term nurture.

The fifth part is the weekly leak review. AI builds the ranked report that lands in your inbox Monday morning telling you what moved show rate week over week and what to focus on.
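To give a sense of what sits underneath a report like the weekly leak review, here is a minimal sketch of the ranking step. It is not the Showtime build. The booking format, the segment labels (day of the call here, though it could be closer, ad source, or cadence), and the function names are all hypothetical.

```python
from collections import defaultdict

# Each booking: (week label, segment label, whether the prospect showed).
Booking = tuple[str, str, bool]


def show_rates(bookings: list[Booking], week: str) -> dict[str, float]:
    """Show rate per segment for one week, as a fraction between 0 and 1."""
    shown, total = defaultdict(int), defaultdict(int)
    for booking_week, segment, showed in bookings:
        if booking_week == week:
            total[segment] += 1
            shown[segment] += int(showed)
    return {segment: shown[segment] / total[segment] for segment in total}


def rank_leaks(bookings: list[Booking], last_week: str, this_week: str) -> list[tuple[str, float]]:
    """Rank segments by change in show rate (percentage points), biggest drop first."""
    before, after = show_rates(bookings, last_week), show_rates(bookings, this_week)
    changes = [
        (segment, round((after[segment] - before[segment]) * 100, 1))
        for segment in before if segment in after
    ]
    return sorted(changes, key=lambda change: change[1])


# Hypothetical data for illustration.
bookings = [
    ("w1", "tuesday", True), ("w1", "tuesday", True),
    ("w1", "friday", True), ("w1", "friday", False),
    ("w2", "tuesday", True), ("w2", "tuesday", False),
    ("w2", "friday", True), ("w2", "friday", False),
]
report = rank_leaks(bookings, "w1", "w2")
```

In this toy data the Tuesday calls dropped 50 points while Friday held flat, so Tuesday tops the list. With real volume you would want a minimum sample size per segment before trusting a swing.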

All of it is built in collaboration with you, across a series of sessions inside Claude Cowork or Codex. Not a one-shot prompt. Not a single email draft. A long-term integration that scaffolds your sales development layer and stays there.

The economics

Building a layer like this manually takes about a quarter of focused dev time. Hiring an agency to do it starts around $5,000 a month and runs 3 to 6 months to deploy, which puts the floor at $15,000 to $30,000 before the system is live.

Using AI as your build partner to scaffold it as a long-term integration in your own stack costs an order of magnitude less and stays available the next time you want to iterate.

That is the leverage shift. The system gets built in days instead of months. It gets iterated in real time instead of via change request. And the operator stays in control of every decision the system makes.

The bigger frame for offer owners and business leaders

Sales execution AI is everywhere. Sales development AI is wide open.

If you are an offer owner or a business leader running a sales operation, the question to ask yourself is not “how can AI write my emails faster.”

It is “how can AI build the system that scaffolds my entire sales development layer.”

Different question. Different leverage.

The operators who figure this out first are the ones who compound their show rate gains, recover the no-shows their peers are losing, and walk into every call with a closer who is briefed and ready before the prospect even pulls up Zoom.

The operators who keep using AI for one-off outputs will keep getting one-off outputs.

What to do this week

If you want to talk through where AI could plug into your sales development layer, book an AI Clarity Call at muddventures.com/book.

And if you want the actual five-part build I have been describing throughout this issue, it is live at Showtime.muddventures.com for the launch window.

It is called the Showtime Skill Pack. It plugs into Claude Cowork or Codex, scaffolds the post-booking layer across a series of build sessions, and stays in your stack as a long-term integration.

$27 once. Updates included when future versions come out.

Claude Now Runs Inside Excel, Word, and PowerPoint. Here's the Workflow Worth Setting Up First.

Andrew Mudd — Thu, 07 May 2026 16:19:47 GMT

Tuesday got a lot of AI coverage because of the GPT-5.5 Instant swap. Buried in the same news cycle, Anthropic confirmed something that matters more for most operators: Claude now covers the full Microsoft Office suite.

Excel. Word. PowerPoint. All three have native Claude add-ins. All three share context with each other in a single conversation. You analyze a spreadsheet in Excel, switch to PowerPoint, and Claude already knows the data. Switch to Word, and it knows both. No copy-pasting. No re-explaining. One continuous session across three applications.

Outlook is next.

Most operators who use Microsoft 365 every day have no idea this exists. This is worth three minutes of your time.

What each add-in actually does

Claude for Excel shipped in beta in October 2025 and became available to all Pro subscribers on January 24, 2026. The key capability: Claude reads your spreadsheet at the workbook level, not cell by cell. It understands multi-tab workbooks, explains calculations with cell-level citations, and writes formulas from plain English. You describe what you want to calculate. It writes the formula. You keep your formula dependencies intact.

That last part matters more than it sounds. Most AI tools that touch spreadsheets either break formulas or work in isolation from the workbook’s structure. Claude reads the model as a whole.

Claude for PowerPoint launched March 11, 2026, with something most slide tools get completely wrong: it reads your entire template, including the slide master, the layouts, the fonts, and the colors. When you ask it to build slides, the output follows your brand guidelines and uses your actual layout structure. The visuals it produces are native editable PowerPoint objects, not static images.

Claude for Word landed April 13, 2026. The standout: edits appear as tracked changes. If you’re using Word for contracts, proposals, or any document where version control matters, this is how it should work. Claude marks what it changed. You decide what to accept.

The part most operators are sleeping on

The three add-ins share context.

That sounds like a minor technical detail. In practice it changes the workflow entirely.

Here’s what I mean. For three years, the pattern with AI and Office work looked like this: you open Claude in a browser tab, paste in your data, get output, copy the output into your document, repeat. Every step breaks the thread. Every switch between the AI window and your document loses context. You spend half your time re-explaining what you already told the model.

The cross-app context closes that loop. When you’ve been analyzing a financial model in Excel and you switch to PowerPoint and say “build a five-slide summary of this analysis,” Claude already knows the model. It understands which numbers matter and why. When you then open Word and ask for the accompanying written brief, it knows both the data and the slides.

The workflow I keep seeing operators get the most leverage from: data review in Excel, executive summary deck in PowerPoint, client-facing report in Word. Previously, this was a three-session task with a lot of copy-paste in between. Now it’s one session.

The Anthropic team describes the cross-app behavior this way on their support page: “If you’ve been building a financial model in Excel and ask Claude to create a summary deck or draft an investment memo, Claude already understands the model’s structure and key outputs, so you don’t need to re-explain.”

That’s the right framing. The add-ins don’t just put Claude inside your tools. They let Claude hold the full context of your work across multiple tools at once.

There are also Skills

The update that came alongside the March cross-app release introduced reusable Skills: saved prompt workflows that apply automatically when Claude is working in any of the apps.

If you’ve built a prompt that enforces your team’s financial modeling conventions in Excel, and a separate prompt that matches your pitch deck template structure in PowerPoint, you can save those as Skills. When you run a cross-app session, Claude uses the right Skill in the right app as it moves through the workflow.

For teams doing the same type of work repeatedly, this is where the real time savings are. The first time you run an analysis-to-deck workflow, you set it up. The fifth time, it runs with almost no instruction overhead.

What’s honest about the current state

These add-ins are still maturing. There are things worth knowing before you commit.

The Excel add-in handles most standard workbooks well, but very large or complex multi-tab models with unusual structure can still trip it up. If you’re working with thousands of rows or non-standard table structures, test before you depend on it for client deliverables.

The PowerPoint template reading is genuinely good for standard corporate templates. Custom or highly complex slide masters may produce outputs that need adjustment. The output is always editable, so this is a fixable problem, but not a zero-effort one.

The Word add-in’s tracked changes feature is useful, but if you’re used to a specific review workflow with legal or compliance teams, verify how the tracked changes export and display before building a process around it.

And the cross-app context is still in beta. Anthropic labels it as such on the support page. For most business documents it works exactly as advertised. For unusually large files or very long cross-app sessions, it can occasionally lose thread. Worth testing with your actual file types before using it for high-stakes work.

Availability is on Pro ($20/month), Teams ($25/month per seat), and Enterprise plans. Free accounts don’t get the add-ins.

How to set it up in five minutes

Go to Microsoft AppSource and search for “Claude by Anthropic for Excel.” Install it. Do the same for PowerPoint and Word. Activate each from your add-ins menu. Sign in with your Claude credentials. That’s the full setup.

For Teams or Enterprise, an admin deploys the add-ins via the Microsoft 365 Admin Center under Settings, then Integrated Apps, then Add-ins. Individual users can activate after that without any additional steps.

To use cross-app context: just work in sequence. Analyze in Excel, then open PowerPoint with Claude active, and it picks up the session. No settings toggle required.

Three workflows to run first

One: open a spreadsheet you regularly pull numbers from for reporting, and ask Claude to explain the key trends in plain language. This tells you how well it reads your specific file structure before you try anything more complex.

Two: run the full Excel-to-PowerPoint flow on something low-stakes. A weekly report, an internal summary, anything where a rough first draft is still useful. This gives you a real read on how much editing the output needs before you trust it for client work.

Three: if your team does contract reviews or proposal editing in Word, run one document through the Claude for Word add-in with tracked changes. See how the output lands with your review process. The tracked changes behavior is the most useful part of the Word add-in for business teams.

The pattern I keep seeing with operators who are getting the most out of AI right now: they’re not adding new tools. They’re installing AI into the tools they’re already inside every day. The Microsoft 365 suite is where most business operators spend the majority of their working hours. Putting Claude there, with cross-app context, is a different kind of leverage than spinning up a separate AI workflow.

If you want to think through which of your recurring document workflows make the most sense to route through the Office add-ins, that’s the kind of thing we work through in an AI Clarity Call. You can book one at muddventures.com/book.

Andrew

Full archive at muddventures.substack.com.

Building something with AI and want to be in a room with operators doing the same: whop.com/abra-ai.

ChatGPT's New Default Model Just Shipped. Here Is What Changes for Operators.

Andrew Mudd — Wed, 06 May 2026 15:18:37 GMT

I have been using ChatGPT every day for a little over three years. Long enough that the cracks in the default model are part of the rhythm of the work. You ask it for a stat, you double-check the stat. You ask it to summarize a doc, you skim the doc. You ask it about a contract clause, you do not actually trust the answer until you read the contract. The friction is constant. It is the tax you pay for using a tool that is fast but not always honest.

Yesterday OpenAI shipped a new default model for ChatGPT called GPT-5.5 Instant. It is rolling out to everyone now. And the headline number is the one operators should care about.

52.5 percent fewer hallucinated claims on high-stakes prompts. Medicine, law, finance.

The exact areas where the old model would confidently invent a citation, a statute, or a number, and where you would have to slow down to verify everything before you could act on it.

That is a real change in the daily reality of using ChatGPT for operator work.

What just shipped

Three pieces are worth knowing.

The model itself. GPT-5.5 Instant replaced GPT-5.3 Instant as the default model in ChatGPT starting May 5. It is rolling out to Free, Plus, Pro, Go, Business, and Enterprise accounts. You do not need to do anything to get it. Open the app. You are using it.

Accuracy gains. OpenAI says 52.5 percent fewer hallucinations on high-stakes prompts and 37.3 percent fewer inaccurate claims on the conversations users had previously flagged for factual errors. Independent benchmark coverage put AIME 2025 math at 81.2, up from 65.4 on the prior model.

Memory sources. This is the one most people will miss. When ChatGPT gives you a personalized response, you can now click in and see the exact context it pulled from. Saved memories. Past chats. Connected Gmail. You can mark each piece as relevant or not, edit it, or delete it. So if you told ChatGPT six months ago that your business was in the home services niche and you have since pivoted to SaaS, you can find that stale context, kill it, and stop having every response pulled toward the wrong industry.

The model is also tighter on output. 30 percent fewer words and lines on average. Less filler. Less of the over-formatted bullet-everything style that made the old responses feel like a corporate memo.

Why operators should pay attention

The accuracy gain matters most where you are using ChatGPT to lean on a fact and act on it. Three real cases I see operators run every day:

Reading a contract or proposal and asking ChatGPT to flag risky clauses.
Pulling competitor pricing or product details from a public page.
Drafting a client email that references regulations, deadlines, or numbers.

The old model would get one of those wrong often enough that you had to verify everything. Practically, it meant the AI was an outline tool, not an answer tool. The new model is not perfect, and you should still verify anything you would put your name on. But the verification tax went down. You can move faster on the same workflows you were already running.

The memory sources change is the bigger operational shift. If you use ChatGPT as a working partner on multiple businesses, multiple clients, or multiple projects, the prior memory system was a black box. It would pull “context” you could not see and steer the response based on it. Now you can audit it. That is a small UI change with a real effect on output quality, especially for anyone running more than one thing.

Independent take

The New Stack covered the launch and put it bluntly: the new default model is “tighter and more to-the-point without losing substance.” Decrypt’s coverage focused on the personalization side, noting that the model is faster at searching uploaded files, connected Gmail, and prior ChatGPT conversations to ground responses in your own context.

In other words, OpenAI is pushing the default model in two directions at once. More factual on what it tells you. More grounded in your stuff when it is being personalized. Both directions favor operators who use ChatGPT as part of a daily workflow, not as a curiosity.

What to do this week

Three concrete moves. None of them require a new tool. They all use the ChatGPT account you already have.

1. Audit your memory sources. Open ChatGPT. Find a recent personalized response. Click into the memory sources view. Read every entry that ChatGPT thinks is “you.” Delete anything stale. If you have been using ChatGPT for more than a year, you almost certainly have outdated context shaping your responses right now. Cleaning that up is a 10-minute job that pays back every prompt for the next year.

2. Move one verification-heavy task into ChatGPT and time the difference. Pick something you currently do in a slower tool. Reading a 12-page contract for risky clauses. Pulling pricing details off five competitor pages. Drafting a client memo that references specific regulations. Run it through GPT-5.5 Instant. Verify the output the way you used to. Note how much of the verification was actually unnecessary. That is your new operating speed on that task.

3. Update your “house prompt” or system prompt template. If you have a saved instruction that tells ChatGPT how to act for your business, make sure it still applies. The new model follows instructions more cleanly and uses fewer words by default, so old prompts that said “give me a detailed bullet-pointed answer” might now over-format unnecessarily. Strip the formatting commands and let the new model handle output style. Keep the role and the rules.

The compression continues

I have been writing about this thread for a while. Every quarter, more of the agency stack of 2024 gets absorbed into the AI layer of the tools operators already pay for. The dedicated fact-checker becomes a feature inside the default model. The dedicated context-management tool becomes a feature inside ChatGPT itself. The dedicated personalization service becomes a memory sources tab.

You are still going to need humans for judgment, taste, relationships, and decisions. None of that is going away. What is going away is the cost of the rough draft, the rough research, and the rough first pass. Those just got cheaper and more accurate at the same time. Operators who notice this in May 2026 are going to spend the next quarter putting those compounding savings into customer work, into selling, into building, instead of into “let me chase down whether ChatGPT made up that source.”

That is the actual unlock. Less verification time. More time on the work that moves the business.

Closing

If you want a hand auditing your AI workflow and figuring out where the new model and the memory sources feature can save you the most time in the next two weeks, book an AI Clarity Call. 30 minutes, no pitch, free, you walk out with a one-page audit of where to put the savings.

And if you want more breakdowns like this (what just shipped, what it means for operators, and the templates to actually deploy it), the Abra AI community is where I drop these patterns first, alongside the skills and the group of operators running them.

Andrew

Mudd Ventures

Stop Chaining Your AI Workflows. Build a Board Instead.

Andrew Mudd — Tue, 05 May 2026 13:32:03 GMT

Just spent the last week building a 30-day post-purchase sequence in GoHighLevel for a new product launch.

24 touchpoints over 30 days. Welcome email, install help, skill-by-skill education, soft pitch into the next tier, all of it. Standard work. Took a couple of long sessions to get right.

Then I had to build three more workflows.

Not because the sequence wasn’t working. Because the platform has no clean way to PAUSE a contact in the middle of a sequence and resume them later. So when a buyer upgrades to the next tier, books a clarity call, or hits a refund window, I needed a side workflow to catch the tag, remove them from the main flow, and in one case re-add them seven days later.

Four workflows to do what should be one.

That is the daily reality of building automations on operator-grade tools right now. The chain is the only primitive. You go from step one, to step two, to step three. If anything happens off the rails, you handle it as a separate workflow firing on a separate trigger.

Then yesterday, Nous Research shipped something that breaks that pattern wide open.

It is called Hermes Agent v0.12.0, and the headline feature is a Kanban board for AI agents. Tasks go on the board. Agents claim tasks. Agents work on tasks in parallel. Agents hand off to other agents when blocked. You can drop a comment on any task at any time and the next worker picks it up. Everything is logged to a database file on disk, so a crash does not lose state.

The operator version of that sentence is short: imagine if your nurture sequence were a board, not a chain.

What just shipped

The release is called Hermes Agent v0.12.0, “the Curator Release.” Here is what is in it:

A SQLite-backed task board you can run on your laptop or your server.

A live dashboard showing every task and its status (todo, ready, running, blocked, done).

Multiple AI agents running as separate processes, claiming tasks off the board, working in parallel.

A comment thread per task that humans and agents both write into.

Heartbeat monitoring so workers can signal “still alive” during long jobs.

Crash recovery: if a worker dies, the dispatcher reclaims the task and another worker picks it up.

Slash commands from inside any chat surface (Telegram, Discord, Slack, your CLI) to read, write, or unblock tasks.

That last one is the part operators should sit up for. You can be on your phone, away from your laptop, see a notification that a task is blocked because the AI hit something it does not have context for, and unblock it with a one-line text message. The next time a worker spawns on that task, it reads your comment as part of its context.
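The heartbeat and crash-recovery items in the list above are easier to picture with a few lines of code. This is a minimal sketch, not Hermes's actual implementation: the task fields (`status`, `owner`, `heartbeat`) and the `reclaim_stale` function are names I made up, and the 60-second timeout is an arbitrary assumption.

```python
def reclaim_stale(tasks, now, timeout=60.0):
    """Put running tasks back to 'ready' when their last heartbeat is too old.

    `tasks` is a list of dicts with hypothetical keys: id, status, owner, heartbeat.
    `now` and `heartbeat` are timestamps in seconds.
    """
    reclaimed = []
    for task in tasks:
        if task["status"] == "running" and now - task["heartbeat"] > timeout:
            task["status"] = "ready"   # another worker can claim it again
            task["owner"] = None       # the dead worker no longer holds it
            reclaimed.append(task["id"])
    return reclaimed


tasks = [
    {"id": 1, "status": "running", "owner": "agent-a", "heartbeat": 100.0},
    {"id": 2, "status": "running", "owner": "agent-b", "heartbeat": 170.0},
    {"id": 3, "status": "done", "owner": "agent-a", "heartbeat": 10.0},
]
reclaimed = reclaim_stale(tasks, now=180.0)
print(reclaimed)
```

Task 1 has been silent for 80 seconds, so it goes back on the board. Task 2 checked in 10 seconds ago and keeps running. Finished tasks are never touched.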

The mindset shift

A practitioner blogging about Hermes this week put it succinctly:

“Multi-Agent Kanban changes your role from worker to manager. With a board, you can start assigning outcomes instead of controlling every step.”

Three things change when you flip from chain to board:

You stop deciding the order. In a chain, you write the if-then-else in advance. In a board, the dispatcher promotes any task whose dependencies are done, and whichever agent has capacity claims it. The order emerges from the work, not from your spec.

You stop losing state. Every handoff is a row in a file on your disk. Crash the worker, kill the laptop, restart the dispatcher. The work resumes from where it stopped. You do not lose progress because a process died.

You stop being the bottleneck. A chain that needs human input stops cold until you are at your desk. A board lets a worker post “blocked: need decision on rate-limit key” and keep all the OTHER workers running. You unblock from your phone when you get to it.
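To make the first two points concrete, here is a minimal sketch of a board in a SQLite file: a dispatcher step that promotes tasks whose dependencies are done, and a claim step for workers. It is an illustration under my own assumptions, not Hermes's schema or API. The table names, column names, and function names are all hypothetical.

```python
import sqlite3


def open_board(path=":memory:"):
    """Create a tiny task board. Pass a file path to keep state on disk."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY, title TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'todo', owner TEXT)"
    )
    db.execute("CREATE TABLE IF NOT EXISTS deps (task_id INTEGER, needs_id INTEGER)")
    return db


def add_task(db, title, needs=()):
    """Add a task and record which earlier tasks it depends on."""
    task_id = db.execute("INSERT INTO tasks (title) VALUES (?)", (title,)).lastrowid
    db.executemany(
        "INSERT INTO deps (task_id, needs_id) VALUES (?, ?)",
        [(task_id, needed) for needed in needs],
    )
    db.commit()
    return task_id


def promote_ready(db):
    """Dispatcher step: 'todo' becomes 'ready' once no dependency is unfinished."""
    db.execute(
        "UPDATE tasks SET status = 'ready' WHERE status = 'todo' AND NOT EXISTS ("
        " SELECT 1 FROM deps JOIN tasks AS t ON t.id = deps.needs_id"
        " WHERE deps.task_id = tasks.id AND t.status != 'done')"
    )
    db.commit()


def claim_next(db, worker):
    """Worker step: claim the lowest-id 'ready' task, or return None."""
    row = db.execute(
        "SELECT id FROM tasks WHERE status = 'ready' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The status guard means only one worker can win a given task.
    cur = db.execute(
        "UPDATE tasks SET status = 'running', owner = ?"
        " WHERE id = ? AND status = 'ready'",
        (worker, row[0]),
    )
    db.commit()
    return row[0] if cur.rowcount == 1 else None


def finish(db, task_id):
    db.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
    db.commit()


db = open_board()
research = add_task(db, "research competitors")
draft = add_task(db, "draft outreach", needs=[research])

promote_ready(db)
first = claim_next(db, "agent-a")     # the research task
blocked = claim_next(db, "agent-b")   # nothing ready: the draft waits on research
finish(db, first)
promote_ready(db)
second = claim_next(db, "agent-b")    # now the draft is claimable
```

Nobody wrote "run research, then draft" anywhere. The order falls out of the dependency rows, and because every change is a committed row, pointing `open_board` at a file instead of `:memory:` is what lets the work survive a restart.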

Why this matters for operators

I do not expect a marketing consultant or an agency owner to install Hermes Agent and wire up SQLite tomorrow.

That is not the lift. The lift is recognizing that the pattern, the board with claims and handoffs and human-in-the-loop, is the same pattern you can build on top of n8n, Make, Zapier, GoHighLevel, or any operator tool you already pay for. The pieces all exist. They have just not been wired up the way Hermes wired them up.

Three of the eight collaboration patterns in the Hermes docs are the ones operators should steal first:

Fan-out, fan-in. You drop one big request, the system kicks off three parallel research tasks, all three feed into one synthesis task. Today most operator tools do this in series. The board version runs the three research tasks in parallel, so if they take about the same time, the research stage finishes in roughly one third of the time.

Pipeline with gates. Scout finds the leads, editor cleans the data, writer drafts the outreach, reviewer approves before send. Today most automation tools wire this as a single linear flow with approval steps tacked on. The board version lets the reviewer hold a card, comment on it, and pass it back to the writer for one rev without rebuilding the whole flow.

Human-in-the-loop. This is the one I want most for operator workflows. Every chain I have built in the last year has at least one place where the right answer is “ask the operator and wait.” Today that breaks the chain. The board version lets a worker post a question, freeze just that task, and keep the rest running.
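The first of those three patterns, fan-out and fan-in, is the easiest to show in code. A minimal sketch using Python's standard thread pool: the `research` and `synthesize` functions are stand-ins I made up (the sleep simulates a slow model or API call), not anything from the Hermes docs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def research(topic):
    """Stand-in for one research task; a real one would call a model or an API."""
    time.sleep(0.1)
    return f"notes on {topic}"


def synthesize(notes):
    """Stand-in for the single synthesis task that consumes every result."""
    return " | ".join(notes)


topics = ["pricing", "positioning", "reviews"]

started = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(topics)) as pool:
    # Fan-out: all three tasks run at once. map() returns results in input order.
    notes = list(pool.map(research, topics))
summary = synthesize(notes)  # Fan-in: one task waits on all three.
elapsed = time.perf_counter() - started

print(summary)
print(f"{elapsed:.2f}s for {len(topics)} tasks")
```

Run in series, the three tasks would take about three times as long as one. Run in parallel, the research stage takes about as long as the slowest single task, which is where the time saving comes from.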

The compression thesis

If you have been paying attention, the agency stack of 2024 is getting compressed.

The dedicated retention tool is becoming a feature inside your CRM. The dedicated lead-research tool is becoming a feature inside your AI assistant. The dedicated workflow orchestrator (Zapier, Make) is becoming a feature inside your model context. And now, the dedicated multi-agent project management layer is something you can run on your laptop with a SQLite file.

The pattern is not subtle. Every layer of the agency stack is being absorbed into the AI layer of the tools operators already pay for. The board pattern is the same story applied to multi-step automation.

Operators who recognize this in May 2026 are going to spend the rest of the year compressing four-workflow setups into one, and moving the attention they have been spending on “what order should this run in” over to “what outcome am I trying to produce.”

What to do with this in the next seven days

You are not going to install Hermes Agent. You probably do not need to. Here is what I would do this week if I were running operator-grade automations and wanted the board pattern without the SQLite.

1. Pick one chain that already exists in your stack. Does not matter which tool. n8n, Make, Zapier, GHL, all have a flagship workflow you have been duct-taping for a year. Pick that one.
2. Map every stop point in that chain. Every place a human currently has to step in. Every place that breaks if a step takes longer than 60 seconds. Every place where you wish a different specialist could pick up and the rest could keep moving. Those are the places a board pattern would help.
3. Replace the linear chain with a queue table. This is the operator-grade move. You do not need SQLite. You need a Google Sheet, an Airtable, a Notion database, anything with rows and a status column. Status is column one. Assigned-to is column two. Outcome is column three. Now your “chain” becomes “look at the next row whose status is ready and run it.” You have just rebuilt the kanban primitive on top of your stack.

You will discover within a week that something like 60% of what your automations do exists only because the chain is the only primitive your tool gives you. Once the chain is replaced with a queue, you can fan out, you can route to a different agent based on the row, you can pause one row without breaking the rest, and you can hand off to a human without rebuilding the whole flow.
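Here is roughly what "look at the next row whose status is ready and run it" looks like as code. A minimal sketch, assuming the table has been loaded into a list of rows: the status, assigned-to, and outcome columns come from the steps above, while the `task` column, the handler names, and the `blocked` and `paused` status values are my own additions for illustration.

```python
def next_ready(rows):
    """Return the first row whose status is 'ready', or None if nothing is runnable."""
    return next((row for row in rows if row["status"] == "ready"), None)


def run_queue(rows, handlers):
    """Work the table until no 'ready' rows remain."""
    while (row := next_ready(rows)) is not None:
        handler = handlers.get(row["assigned_to"])
        if handler is None:
            # No automated worker for this row: park it for a human, keep going.
            row["status"] = "blocked"
            row["outcome"] = "waiting on " + row["assigned_to"]
            continue
        row["outcome"] = handler(row)
        row["status"] = "done"
    return rows


rows = [
    {"status": "ready", "assigned_to": "writer", "outcome": "", "task": "welcome email"},
    {"status": "paused", "assigned_to": "writer", "outcome": "", "task": "upsell email"},
    {"status": "ready", "assigned_to": "human", "outcome": "", "task": "approve refund"},
]
handlers = {"writer": lambda row: "drafted " + row["task"]}

run_queue(rows, handlers)
for row in rows:
    print(row["status"], "-", row["task"], "-", row["outcome"])
```

The paused row is skipped without stopping anything else, and the row that needs a human gets parked instead of halting the run. Those are the two things the chain could not do without a side workflow.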

That is what just shipped. Not a tool. A pattern. The tool is one implementation of it.

If you want a hand mapping your flagship chain into a board pattern, book an AI Clarity Call at muddventures.com/book. 30 minutes, no pitch, free, you walk out with a one-page audit of where the board pattern would save you the most time in your business.

And if you want more breakdowns like this one (what just shipped, what it means for operators, and the templates to actually deploy it), the Abra AI community at whop.com/abra-ai is where I drop these patterns first, alongside the skills, templates, and the group of operators running them.

Andrew Mudd

Mudd Ventures

Vibe coding just moved to your phone. Here's what that actually changes.

Andrew Mudd — Mon, 04 May 2026 15:07:39 GMT

Lovable shipped a mobile app on April 27. Android and iOS. You can now describe a web app from your phone, watch the build run on Lovable's infrastructure in the background, and pick up the result on your laptop later.

The official announcement post frames it simply: "Your ideas don't wait for you to sit down at a desk. They show up on the bus, in the coffee line, at 2am. Now you can act on them directly."

The tactical version: prompt queueing, push notifications when builds complete, cross-device session sync. You can stack four or five ideas in the morning and review the results over coffee.

Below is a clear-eyed account of what shipped, the practitioner reactions worth your attention, and what an operator should actually do with this.

What changed in 7 days

For the last three years I've watched the gap between "I have an idea" and "I have a working prototype" compress in a way that almost nobody outside the AI build space really tracks. In 2023 you needed an engineer to ship a web app. In 2024 you needed a developer-shaped human comfortable with Cursor or Replit. In 2025 you needed a desktop and a quiet hour. As of last Monday, Lovable says you need a phone and the discipline to type a clear prompt.

That is a real product change.

The Android app crossed 100K downloads in its first week. Lovable hit roughly $400M in ARR in February with 146 employees, per Sacra's tracking, so this is the best-funded vibe coding platform betting that the next workflow shift is mobile-first.

Chris Kernaghan, who covers the founder tools beat, had the cleanest read of the launch: "Most SaaS mobile apps are companion experiences. You log in to check a dashboard, approve something, or scroll a notification feed. The actual work still happens on desktop. Lovable's mobile app is doing something different. The work is the prompt."

That distinction matters. The build runs server-side regardless of whether the prompt was typed on a 27-inch monitor or a phone screen on the train. There is no stripped-down mobile mode, because the product was already async from day one.

The lens I bring to this

The buyers I work with who are pulling ahead in 2026 share one trait. They have stopped treating AI tools as a category they research and started treating them as a workflow they iterate. Their question is operational: where in the day are you losing 90% of your ideas because the friction to act is too high?

For most operators I talk to, the answer is the gap between a meeting ending and being back at a laptop. You hear something on a call. You see a competitor do something on your phone. A small internal tool comes to mind that would save the team three hours a week. Then a Slack ping happens, then a kid needs picking up, and the idea is gone by Tuesday.

A real practitioner already running this play is Lazar Jovanovic, who posted on X that he is now spending two to three hours every weekend building his first monetizable side business "100% built only on Lovable, using our Cloud, AI, email and payments... mostly on my phone too, using the Lovable Android app."

The workflow change here is closer to "stop losing ideas to your inbox" than "build production software from your phone."

What to actually build first

Three concrete first builds that fit the mobile-prompt-and-queue rhythm. None of these have to be real production apps. All of them give you a feel for whether this fits your team's workflow.

First build: an internal lookup tool. Something your ops person currently fields questions about. Pricing tiers, SLA terms, onboarding steps, common objections. Prompt: "Build me an internal web app that lets my team search our pricing tiers by industry and customer size. No login required, just a search box and a results table." Plan on roughly 30 minutes from prompt to shareable URL. Useful even if it never goes anywhere.

Second build: a lightweight client-facing form that does something downstream. Capture input and trigger a real action. Prompt: "Build a discovery form for new client intakes. After submission, format the answers as a markdown brief and email it to me." This forces you to think about Lovable Cloud, Supabase, and webhooks. The platform's actual edges become visible in one build.

Third build: a throwaway calculator or quote tool. Something an SMB owner would normally pay an agency $4K for. "Build a pricing calculator where users select their headcount, current AI spend, and goals, and it returns three packages with prices." Ship it. Embed it. See what it does to inbound. The calculator I'd build using that prompt pattern would take roughly two hours and cost $9.99 in subscription credits. The same scope quoted to a freelance dev shop runs $3,500 to $6,000.

That last point is the commoditization story. The lowest tier of bespoke web work just dropped its price floor. What an SMB owner used to outsource for thousands now lives in a phone app for ten bucks a month, plus your judgment.

Honest tradeoffs

A few things to know before this lands on your home screen.

This is not native iOS or Android development. The apps you build with Lovable are still web apps. To put something on the App Store or Google Play, you still need a wrapper service or an export-and-rebuild workflow. Per TechCrunch's coverage of the launch, Apple recently restricted what vibe coding apps can do inside their host apps, so Lovable's launch is web-app-only by design.

The mobile UX is for capture, not heavy editing. Reviewing complex multi-component changes on a phone screen is rough. The realistic loop is: prompt on mobile, review on desktop, ship from desktop.

Security is not a settled story. Lovable published a postmortem on April 24 about a backend regression that, between February 3 and April 20 of this year, made chat history and source code on public projects accessible to any authenticated Lovable user. Private projects and Lovable Cloud were not affected, and the patch shipped within two hours of the public researcher report. Worth reading the postmortem in full before you put anything client-sensitive into a prompt.

Don't ship production from your phone. The temptation to push a fix from a Lyft will be real. Push from a desk where you can actually see what shipped before it goes live.

What I tell clients about this

The shift from "describe what you want" to "describe it from anywhere" is bigger than it sounds, but it is not the headline most operators should focus on.

The headline is that the price of a small web app is no longer "what an agency charges." It is the cost of one well-written prompt, one round of editing, and the discipline to actually publish. That is the change worth tracking. Mobile is just the latest tool that lowers the friction to participate.

If you have never tried vibe coding, this is a clean entry point. Free download, $9.99 a month if you actually use it, no setup needed, no tutorials to watch. The first build will probably embarrass you a little, but by the third one you'll have something useful, and the fifth might quietly replace a SaaS subscription you are already paying for.

If your business runs on a constant flow of small bespoke tools (forms, calculators, internal dashboards, micro-portals), this is worth a serious look this week. If you are still trying to figure out where AI fits at all, start with the simpler stuff first. Build context before you build apps.

Either way, the next time an idea hits between meetings, the option to act on it now exists in your pocket. That is a real change.

If you want help thinking through where the AI layer fits in your business, what to build yourself, what to keep buying, and where the actual leverage is, book a free AI Clarity Call at muddventures.com/book. 30 minutes, no pitch, just a tactical conversation.

If you want to be in the room with other operators figuring this out daily, there's the Abra AI community. Tools, builds, conversations, all in one place.

Andrew

Tell Claude What You Want. n8n Builds the Workflow.

Andrew Mudd — Sat, 02 May 2026 20:02:50 GMT

On April 29, n8n quietly made one of the most consequential shipments of the year for anyone running a small business on automation.

The headline is simple. n8n's MCP server can now build workflows from a prompt. Previously, the same server could only execute workflows that already existed. Now you describe what you want in plain English to Claude, ChatGPT, Cursor, or any AI client that speaks MCP, and a working workflow shows up inside your n8n account a few minutes later. Validated and test-executed before it lands, with the model self-correcting if the first attempt fails.

Here's the example n8n's own team published in the launch post:

"I want you to create an n8n workflow that once a day at 7am sends me an email with today's forecast. Use my gmail account to send it. I live in New York city. Put the workflow in the MCP Server testing project."

A few minutes later they had a daily weather email running in their instance.

This is the part the AI Twitter crowd will under-cover, so let me be direct about what just changed.

For the last three years I've watched the same story play out across the marketing and operations stack. A capability that used to require a $4,000 implementation gets quietly compressed into the AI layer of a tool the operator already pays for. Meta Ads bidding logic. Customer support triage. CRM enrichment. Now: building an n8n workflow.

The buyers I work with who are pulling ahead in 2026 share one trait. They notice the capability shift early. Then they install the thing themselves on a Saturday. By Monday they have a working version of what they would have paid an automation agency $4K to build over four weeks.

This launch is exactly that pattern.

What it actually is

The MCP server is built into every edition of n8n: Cloud, Enterprise, and the free self-hosted Community Edition. There is no third-party service to spin up. If your n8n instance is running, the MCP server is running. You enable it in Settings, generate an access token, and point your AI client at it.

The mechanic that makes this reliable is small but matters. The MCP server generates a TypeScript representation of the workflow rather than raw JSON. Per n8n's docs, the model has to produce something that type-checks and compiles before anything touches your instance. That's why the validate, execute, fix loop actually closes, instead of producing the broken JSON files that earlier community MCP attempts kept coughing up.

The supported clients today are Claude Desktop, Claude Code, Codex CLI, Lovable, and any Google ADK agent. Anything that speaks MCP can plug in.

What it looks like for an operator (not a developer)

Felix at EasyBits, a community member running real production workflows on top of n8n, posted a writeup of his setup. The example he gave is the kind of thing every SMB operator hits:

"Look at the upload workflow. I need a sub-workflow that takes raw supplier files with 13 columns and maps them into our 47-column BSMS_ITEM_UPLOAD format. The January file is the reference."

His verdict: "Claude returned file ingestion, column normalization, lookup nodes, output formatter. ~80% correct. The 20% was edge cases. Treat it like a senior engineer on their first week, technically excellent, needs context about your setup."

That framing is the right way to think about this. Picture a strong senior engineer on day five at your company. They know JavaScript cold. They know n8n's node library cold. They are fast. The thing they have not learned yet is how your business works. So the prompt has to carry the business context.

The pattern that works, based on Felix's writeup and n8n's own internal usage:

Describe the why behind the task. "Build a CRM automation" gets you garbage. "Look at the Lead Capture workflow. I need a sub-workflow that fires when a Calendly booking comes in, pulls the company data from Clearbit, scores it 1 to 10 using these criteria, and logs it to the Hot Leads sheet" gets you something usable.
Name your services. If Gmail, Postmark, and SMTP could all plausibly do the job, say which one. The model guesses otherwise, and it does not always guess well.
Iterate, do not restart. If the first pass got 80% right, refine in the same conversation. Starting over loses context and usually makes things worse.
Start with read access. Ask Claude to analyze an existing workflow first. That verifies the connection, surfaces what it actually understands about your setup, and protects you from triggering something in production while you are figuring it out.

Honest tradeoffs

This is a public preview, not a finished product. n8n is up front about it. Things will break.

The model can't inspect live execution logs yet. If a workflow fails at run time, you read the log and describe what you see. That is annoying.

Canvas layout comes out messy. The visual flow is functionally correct but ugly to look at, especially for branching workflows. You will end up dragging nodes around.

Complex branching workflows often need manual cleanup. Conditional paths and nested logic are the area n8n itself flags as actively under work.

Node selection when options overlap is the single most common thing worth steering. If the prompt is vague about which node to use, Claude picks one and is sometimes wrong.

Default parameter values are the leading source of runtime failures. The model fills in something reasonable but not always correct. Always do a manual read of the node config before you turn the workflow on.

These are not deal breakers. They are the realistic reasons you do not point this at your live billing pipeline on day one.

Where the real leverage is

The thing Felix said that I keep coming back to: "Stack your connectors. n8n alone is useful. Adding Google Drive, Chrome, and Slack lets Claude pull files, verify live data, and message your team, all in one conversation. That's where the real leverage is."

That is the operator move that most AI consultants are still asleep on. Each MCP connector you add to your AI client is a capability your AI now has across every conversation. n8n is one of the highest-leverage ones to add because n8n is the layer where the automations actually run. Once Claude can build, edit, and execute n8n workflows for you, the practical question shifts from "what can AI do" to "what should I have it build today."

What to do with this

If you are running an SMB and you have not opened n8n before, this is the week to look at it. Specifically:

1. Spin up an n8n instance. The fastest path is n8n Cloud's free 14-day Starter trial at n8n.io/pricing. If you have a developer on staff or you are technical yourself, the free Community Edition self-hosted on a $10/month VPS is the cheaper long-term play. Either gets you the MCP server.
2. Enable instance-level MCP access in Settings, generate your access token, copy it once (you only see it once), and connect Claude Desktop or Claude Code per the official setup guide. Pick a coding agent (Claude Code) over a chat client. n8n's own team got better results from coding agents.
3. Pick one repetitive task you do every week, manually, that lives in two or three SaaS tools. The lead-routing thing. The invoice-chasing thing. The "scrape this report and email it to me" thing. Describe the task to Claude, in plain English, with the actual tool names. Let it build the workflow. Read what it built. Adjust. Run it. Time it against the manual version.

That is the loop. Once you have run it three times for three real tasks, you understand how to use this in production, and you have your own opinion on where it falls down. That is worth more than another twelve LinkedIn posts about agentic AI.

If you want a second pair of eyes on which automation to build first, that is exactly what an AI Clarity Call is for. We will look at your stack, find the highest-leverage automation in it, and decide together whether it is worth building yourself or worth bringing in a vendor.

n8n's MCP build feature is not a replacement for an automation strategist. The piece it replaces is the implementation step, the typing-and-clicking part. Strategy of what to build and why is still on you. The change worth noticing this week is that the implementation cost just dropped by a factor of ten or twenty for the kinds of workflows most SMB operators actually need.

Andrew

Meta just gave media buyers a command line. Here's what that means.

Andrew Mudd — Thu, 30 Apr 2026 18:18:19 GMT

Meta launched a command line for ads. Here’s what that means if you’ve never seen one.

Yesterday afternoon Meta dropped a tool called the Ads CLI.

If you live in Ads Manager, you probably scrolled past the announcement. The phrase “command-line interface” sounds like something that belongs on a developer’s laptop, not yours. So let me translate.

This is one of the bigger shifts in how paid social actually gets done in years, and you can read what’s going on in the next ten minutes.

First, what is a CLI

CLI stands for command-line interface. It’s a way to talk to a piece of software by typing instructions instead of clicking buttons.

Picture two doors into the same building.

Door one is Ads Manager. You open the page. Click into Campaigns. Click Create. Fill out a form. Click Continue. Fill out the next form. Click Publish. It’s the way most people first learned the platform, and it’s how 95% of buyers still operate.

Door two is the command line. You open a black terminal window on your laptop, type one line of text, press Enter, and the same thing happens. There’s no clicking through screens or waiting for the dashboard to load.

A familiar example most people have already touched: when you book a flight on Google, you can use the calendar widget (clicking) or you can just type “flights LAX to JFK Dec 5” into the box (a command). Same destination. Different way of telling the machine what you want.

That is all a command line is. A box where you type the thing you want, and the software does it.

What Meta launched

On April 29, 2026, Meta published a post on their developer blog called “Introducing Ads CLI: A Command-Line Interface for Meta Ads and Commerce.” Authors: John Holstein, Matt Mayberry, Andrew Kutsy, and Sanjay Patel from the Meta engineering team.

The official name is Ads CLI. When you type commands, every line starts with meta ads. So:

meta ads campaign list

That single line pulls a list of every campaign in your ad account onto your screen. Same data you’d see if you opened Ads Manager and waited for the campaigns table to load. Just faster, and in plain text you can pipe directly into a spreadsheet or another tool.

The product is in open beta as of yesterday. You install it on your computer with Python 3.12 and a package installer. If those last two terms were noise to you, that’s fine. Read on.

What it actually does

The Ads CLI gives you keyboard access to almost everything Ads Manager lets you click: campaigns, ad sets, ads, creatives, Pixels, product catalogs, and performance data.

Here’s a real command from Meta’s announcement that creates a campaign:

meta ads campaign create --name "Summer Sale" --objective OUTCOME_SALES --daily-budget 5000

Translation: make a campaign called “Summer Sale,” set the objective to sales, give it a $50.00 daily budget. (Budgets in the API are entered in cents, so 5000 means $50.00. Worth knowing before you accidentally fund a $5,000/day campaign.)
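If the cents rule makes you nervous, a tiny guard helps. This is a minimal sketch of the arithmetic only: the two function names are mine, and nothing here talks to Meta or is part of the Ads CLI.

```python
from decimal import Decimal


def dollars_to_cents(amount):
    """Convert a dollar amount such as '50.00' into whole cents."""
    cents = Decimal(str(amount)) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount!r} has fractions of a cent")
    return int(cents)


def cents_to_dollars(cents):
    """Format a cents value the way you would read it on an invoice."""
    return f"${Decimal(cents) / 100:,.2f}"


print(dollars_to_cents("50.00"))
print(cents_to_dollars(5000))
print(cents_to_dollars(500000))
```

Running the budget through `cents_to_dollars` before you paste it into a command is a cheap way to catch the typo where you meant $50 and typed the cents value for $5,000.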

Here’s one that pulls a 7-day insights report on a specific campaign:

meta ads insights get --campaign_id 12345 --date-preset last_7d --fields impressions,conversions

Translation: pull the impressions and conversions for the last seven days on this campaign. The data comes back as a clean table, or as JSON if you want a script to read it.

And here’s one that builds a full ad creative with an image, body text, headline, and a Shop Now button:

meta ads creative create --name "Hero Banner" --page-id 111222333 --image ./banner.jpg --body "50 percent off everything" --title "Shop Now" --link-url https://example.com/sale --call-to-action SHOP_NOW

That’s it. One line, full creative built and ready to attach to an ad.

By default, every resource is created in PAUSED status. So nothing goes live until you explicitly run a status update command. Worth knowing if you’re nervous about clicking the wrong button. There’s no wrong button to click.

Why this matters for media buyers

Most paid social work today is human time spent doing repetitive things in a UI. Maybe you’re building 12 ad set variations to test 4 audiences across 3 placements, or pulling a weekly report on 30 campaigns by hand, or pausing one creative across 8 ad sets on a Monday morning because it tanked over the weekend.

A CLI lets you collapse that work into one command, or a small text file that loops through a spreadsheet.

Meta calls these text files “recipes” in the docs. A 30-line recipe can read a CSV of ad copy variations, build out 50 ads from one creative template, and have them queued up in seconds. The same task in Ads Manager is roughly an afternoon of clicking, depending on your tolerance for repetition.
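Here is roughly what “loop through a spreadsheet” looks like. This is a plain Python sketch, not Meta’s recipe format, which I am not reproducing here. It reuses only the flags from the creative command shown earlier; the CSV columns, the sample copy, and the function name are assumptions. It prints the commands instead of running them, so nothing touches your account.

```python
import csv
import io
import shlex

# Made-up ad copy variations; in practice this would be a CSV file on disk.
COPY_CSV = """name,body,title
Hero Banner A,50 percent off everything,Shop Now
Hero Banner B,Summer styles from $20,Shop the Sale
"""


def build_commands(csv_text: str, page_id: str, image: str, link_url: str) -> list[str]:
    """Build one creative-create command line per CSV row."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        args = [
            "meta", "ads", "creative", "create",
            "--name", row["name"],
            "--page-id", page_id,
            "--image", image,
            "--body", row["body"],
            "--title", row["title"],
            "--link-url", link_url,
            "--call-to-action", "SHOP_NOW",
        ]
        # shlex.join quotes each argument so spaces and $ signs survive the shell.
        commands.append(shlex.join(args))
    return commands


for command in build_commands(COPY_CSV, "111222333", "./banner.jpg", "https://example.com/sale"):
    print(command)
```

Review the printed lines first. Fifty rows in the CSV means fifty commands, and the review step is the part that keeps a typo from being multiplied by fifty.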

But the part that actually changes the job is what the CLI is built to plug into.

The Ads CLI is one half of a new framework Meta calls “Meta ads AI connectors.” The other half is an ads MCP server, which is the piece that lets a tool like Claude or ChatGPT manage your campaigns in plain English. The CLI is the same capability surfaced for terminals, scripts, and CI/CD.

The MCP server is what makes a request like “Pull last week’s spend by campaign, find anything below a 1.5x ROAS, and pause the bottom three” possible without you ever opening a terminal. The CLI is what makes that same request possible inside an automated script that runs every Monday morning, with no AI in the loop at all.
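The decision inside that request is simple enough to write down. Below is a minimal Python sketch of the rule only: filter to campaigns under a 1.5x ROAS, then pick the three worst. The campaign data is invented and the class and function names are mine; fetching the spend and actually pausing anything are left out, because I don’t want to guess at commands I haven’t verified.

```python
from dataclasses import dataclass


@dataclass
class CampaignWeek:
    name: str
    spend: float
    revenue: float

    @property
    def roas(self) -> float:
        # Zero-spend campaigns report 0.0 here; filter them out first if
        # you do not want them treated as underperformers.
        return self.revenue / self.spend if self.spend else 0.0


def campaigns_to_pause(rows, threshold=1.5, limit=3):
    """Return up to `limit` campaigns below `threshold` ROAS, worst first."""
    below = [row for row in rows if row.roas < threshold]
    return sorted(below, key=lambda row: row.roas)[:limit]


# Made-up numbers for illustration.
last_week = [
    CampaignWeek("A", spend=100.0, revenue=300.0),
    CampaignWeek("B", spend=200.0, revenue=100.0),
    CampaignWeek("C", spend=100.0, revenue=120.0),
    CampaignWeek("D", spend=400.0, revenue=400.0),
    CampaignWeek("E", spend=100.0, revenue=140.0),
]

for campaign in campaigns_to_pause(last_week):
    print(f"{campaign.name}: {campaign.roas:.2f}")
```

Notice that the threshold and the limit are the judgment calls. The code just applies them the same way every Monday.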

A LinkedIn post from Constantine Yurevich the day of the announcement put the implication bluntly: “Meta just killed every ‘AI for Meta Ads’ startup that launched in the last 12 months. And they did it with a single command-line tool.” His point: a layer that until yesterday required buying a $300/month SaaS now ships free from Meta itself.

The job description is shifting from “person who clicks the right buttons in Ads Manager” to “person who knows what should happen and can describe it in a sentence.” Knowing what should happen is what a CLI can’t do for you. Everything downstream of that decision is what gets compressed.

The honest tradeoffs

It’s open beta. Things will break.

Buyers in the early r/FacebookAds thread are already worried Meta will flag accounts that lean on the CLI heavily, citing past incidents where automated Marketing API usage triggered account reviews. That’s a real concern worth tracking over the next few months.

It still requires installing Python on your machine, specifically Python 3.12 or later plus pip and uv. That’s standard developer setup, but if you’ve never opened a Terminal window before, the install can feel bigger than it actually is.

It’s not pretty. There are no charts and no creative thumbnail grids. If you read the dashboard visually to understand what’s happening in your account, you’ll probably keep using Ads Manager for that. The CLI is built for execution and reporting, not for browsing.

And the biggest one: a CLI does not make a bad media buyer better. If your audience targeting is wrong, the CLI lets you launch the wrong campaign 50 times faster than before. A common comment on the launch announcement: garbage in, AI just scales the garbage. The leverage cuts both ways.

How to actually start, if you’ve never touched a terminal

Three concrete first steps, in order:

1. Read Meta’s overview page. Don’t try to install anything yet. Just read the section called “Key capabilities” so you know what the tool can and cannot do before you spend any time on it.
2. Watch a 5-minute YouTube video on “what is a terminal Mac” or “what is a terminal Windows” depending on your machine. The terminal is the black box you’ll type into. Knowing what it is removes most of the fear of it.
3. If you have a developer or technically inclined friend, ask them to install the CLI alongside you the first time. Meta requires Python 3.12 or later, pip, and uv, plus authentication via a Meta business system user token. From what I’ve seen with non-technical clients, the first install is a long afternoon if you’ve never done it. With someone walking you through it, the same setup is quick.

If you don’t have a technical friend, that’s part of why I do AI Clarity Calls. Walking through this kind of install with someone who’s done it once is the difference between a 20-minute conversation and a lost Saturday.

Why I’m flagging this for you

For the last three years I’ve watched the tools that used to require hiring an agency get pulled, one by one, into the AI layer. Reporting went first, then creative production, and audience research is most of the way there. Ad operations was the holdout. Yesterday Meta itself moved that line.

The buyers I work with who are pulling ahead in 2026 share one trait: they spend less time clicking and more time deciding what should be clicked. The Ads CLI is one of the cleanest examples I’ve seen of that shift shipping from a major platform.

If you want to talk through what to install, where to start, and how a CLI fits into a real media buying stack, that’s exactly the kind of conversation I have on AI Clarity Calls. You can book one at muddventures.com/book.

If you know a media buyer who needs to see this, forward it to them. Or send them to muddventures.substack.com.

Andrew

Your Team Is Probably Faking Their AI Enthusiasm

Andrew Mudd — Tue, 28 Apr 2026 21:34:33 GMT

I run an AI community on Whop. Small group right now, mostly operators and agency founders figuring out how to actually use this stuff. Every day, my AI assistant Zane posts a breakdown in the community: a framework, a use case, a prompt technique, something tactical and useful. The posts are solid.

They get read by one or two people. Nobody replies.

Every single post ends with a question: “Anyone running something like this?” or “What automations are you working on right now?” Silence. Every time. People are in the room. They signed up. They’re reading. But they’re not engaging, not asking questions, not saying “I tried this and it didn’t work” or “I don’t understand what you mean.” They’re just... quietly present.

I’ve been staring at this pattern for weeks trying to figure out what’s going on. Then I read a stat from the 2026 Business.com Small Business AI Outlook Report that snapped it into focus: 30% of small business workers say they act more enthusiastic about AI in front of their colleagues than they actually feel.

Not confused by AI. Not opposed to it. Performing excitement they don’t have.

That hit different because I realized what I’m seeing in my own community is the same thing happening inside companies everywhere. People show up, they nod along, they read the material, they sit in the meeting about the new AI workflow. But they don’t actually engage with it in any real way, because they don’t feel safe saying “I don’t get this” or “I tried it and it was bad” or even just “this doesn’t seem relevant to what I actually do all day.”

The Business.com data backs this up further: 45% of small business workers worry that adopting “too much AI” could hurt their company’s reputation with customers. Fortune reported this month that 80% of white-collar workers are either avoiding or outright rejecting AI tools they’ve been given. And there’s a new term circulating for the specific anxiety driving all of it: FOBO, Fear of Becoming Obsolete. Four in ten workers now name AI-driven job loss as a primary fear, and that number doubled in a single year.

When people are scared, they don’t raise their hand. They smile and nod and go back to doing things the old way.

Here’s what I think most business owners miss about this: the silence IS the feedback. If your team isn’t pushing back, asking dumb questions, or complaining about the AI tools you introduced, that’s not a sign that adoption is going well. That’s a sign people don’t feel safe being honest about where they actually are.

I learned this the hard way in my own community. Zane’s posts are genuinely good. But “good content” doesn’t create engagement. Safety does. Permission does. Someone going first and saying “I have no idea what I’m doing” does. I’ve started jumping into the community myself, replying to Zane’s posts, sharing my own half-baked experiments, asking questions I don’t know the answer to. Not because it’s a content strategy, but because modeling honesty about where you’re at is literally the only thing that makes other people willing to do the same.

Slate ran a piece this month about something related that I thought was fascinating. Bosses are getting excited about ChatGPT, sending AI-generated memos and strategy docs to their teams, and the employees are spending extra time cleaning up or decoding the output. The AI didn’t save anyone time. It created a new chore: managing the boss’s enthusiasm. One employee in finance described getting told to be more concise at work while simultaneously receiving, and I quote, “the longest shit known to man” in the form of raw ChatGPT output from their manager.

That’s the dynamic I think a lot of small business owners are accidentally creating. You’re excited about AI because you’ve been using it. You’ve built reps. You found the tools that work for your specific brain and your specific workflow. And then you introduce those tools to your team and expect the same enthusiasm to transfer automatically. It doesn’t. Because enthusiasm follows competence, and competence comes from practice in a low-stakes environment where screwing up is okay.

The businesses I’ve been watching that actually get AI to stick share a few things in common. They pick one boring, repetitive task and let people mess around with it. Not a mandate, not a performance review metric, not a company-wide announcement about being “AI-first.” Just: here’s a thing that eats two hours of your week, here’s a tool, try it, see what happens, no pressure.

Fortune’s data says 56% of workers globally received zero skills development even as their companies rolled out AI tools. That’s like buying everyone a piano and then being disappointed nobody can play Mozart. The instrument isn’t the issue.

The average small business worker saves 5.6 hours per week when they actually use AI well. That’s most of a free extra workday per person per week. But that number only becomes real when people are using the tools because they genuinely help, not because they’re afraid of looking behind or being the person who “doesn’t get it.”

I’m working through this same thing right now. Building the Abra AI community, running content systems, figuring out which automations actually stick and which ones just look impressive on paper. And the biggest lesson so far is the same one showing up in all the data: the technology is almost never the bottleneck. The bottleneck is whether people feel like they can be honest about what they don’t understand.

If you’re running a team and wondering why the AI tools aren’t delivering, start there. Not with a new tool. Not with a training deck. With a conversation where you go first and say “here’s what I’m still figuring out.”

That’s the unlock nobody’s selling, because you can’t package it.

But here’s something you CAN do right now.

One of the biggest blockers I see is that people hit a wall with an AI tool and just... stop. They don’t know what to ask next. They don’t know how to get unstuck. So they close the tab and go back to doing it manually, and nobody ever finds out.

If that’s happening on your team (or to you), here are some troubleshooting prompts you can copy and paste directly into ChatGPT, Claude, or whatever you’re using. These are designed for the moment when the AI gives you garbage and you don’t know what went wrong:

“That output wasn’t useful. Here’s what I actually needed: [describe it]. What did I get wrong in how I asked?”
“Can you explain what you interpreted my request to mean? I want to see where the disconnect was.”
“Give me 3 different ways I could phrase this request to get a better result.”
“I’m going to paste in what you gave me. Tell me which parts are weakest and why: [paste output]”
“Pretend you’re a [job title of the person who would normally do this task]. Now try again with that context.”
“That’s too generic. I need this to sound like it was written by someone who actually works at a [type of business] with [number] employees.”
“What information would you need from me to make this output actually useful? Ask me the questions.”
“Break this task into 3 smaller steps and do just the first one really well.”
“I’ve tried this prompt 3 times and keep getting [describe the problem]. What’s the pattern I’m stuck in?”
“I don’t understand your output. Can you explain it to me like I’m a small business owner who has never used this tool before?”
“This is close but the tone is wrong. Here’s an example of the tone I want: [paste example]. Match that.”
“I need this to be something I can actually send to a client/customer without editing. Right now it’s not. Fix: [list what’s off]”
“What’s the single most important thing I should include in this prompt that I’m probably leaving out?”
“I’m trying to use you for [task] but I’m not sure you’re the right tool. What would this task require that you can’t do?”
“Forget everything about this conversation and start fresh. Here’s exactly what I need, from scratch: [restate clearly]”

Save those somewhere your team can find them. Seriously. Half the reason people abandon AI tools is because they hit one bad output and assume the whole tool is broken. It’s not. They just need a different way in.

I wrote about this ROI gap last week: the real math behind what small businesses spend on AI vs. what they get back. The businesses hitting 34:1 returns aren’t using better tools. They’re using the same tools with better reps. And if you missed yesterday’s piece on why most businesses are paying for AI they don’t actually use, it ties directly into this. The pattern is the same: it’s never the technology.

If you’re an operator or agency founder working through this stuff in real time, that’s exactly what the Abra AI community on Whop is for. Daily breakdowns, prompt techniques, automation frameworks, and a space to actually say “this isn’t working” without anyone judging you for it. Free tier to get started.

If you want a direct conversation about what’s working and what to cut, book an AI Clarity Call. 45 minutes, no pitch, just tool names and a real plan for your specific business.

Hit reply if any of this landed. I read every one.

Talk soon,
Andrew

Most Businesses Are Paying for AI They Don't Actually Use

Andrew Mudd — Mon, 27 Apr 2026 15:27:31 GMT

I talk to business owners every week who are paying for 3, 4, sometimes 6 AI tools. When I ask what results they’re getting, the answer is almost always some version of “well, we use ChatGPT for emails sometimes.”

That’s not adoption. That’s a subscription.

And apparently it’s happening everywhere. New data from Inc. and Pertama Partners puts the AI project failure rate at over 80%. MIT found that 95% of generative AI pilots get abandoned before they ever become part of the actual business. Not because the tech broke. Because nobody knew how to make it stick.

I’m not surprised. I’ve watched it happen in real time for three years.

Here’s what it actually looks like

Someone hears about AI, gets excited, signs up for a handful of tools. Maybe watches some YouTube videos. Builds a prompt or two. Gets a decent output once and thinks “okay cool, this works.”

Then a week goes by. The tool sits there. Nobody built it into a repeatable process. Nobody measured whether it saved time or made money. It just... exists in the stack, billing monthly.

Multiply that by a few tools and suddenly you’re spending $300 to $500 a month on AI that produces essentially nothing. I’ve seen this exact pattern dozens of times. It’s not a motivation problem. It’s a “nobody showed me what to do after the free trial” problem.

The stat that explains everything

75% of executives admit their company’s AI strategy is “more for show” than actual internal guidance. Three quarters. And these are people with teams, budgets, and consultants. If they’re winging it, imagine what it looks like for a 4-person company with no dedicated ops person.

The wild part: only 29% of companies report meaningful ROI from generative AI. That means the majority of businesses paying for these tools are getting... vibes. A sense that they’re “doing the AI thing.” But no actual business outcome they can point to.

Why small businesses have an unfair advantage here

Big companies fail at this because of politics, committees, and employees who use the tools at surface level out of anxiety about their jobs. You don’t have that baggage.

A small team can go from “this task is eating 6 hours a week” to “AI handles 80% of it” in a single afternoon, if someone who’s done it before sets it up. No change management. No committee approval. No six-month pilot program. Just: identify the bottleneck, configure the solution, train the person, measure the result.

The operational advantage of being small is massive right now. But only if you skip the “wander around trying tools” phase and go straight to “someone who knows what works tells me exactly what to do.”

The market is splitting and it’s obvious

There are businesses using AI as a line item that produces results every week. And there are businesses using AI as a vague subscription they feel good about having. The gap between those two groups is getting wider, not smaller, because the first group compounds their advantage every month.

The difference isn’t intelligence. It isn’t budget. It’s whether someone in the operation has enough reps to know which tool goes where, and more importantly, which tools to ignore entirely.

That’s my entire job. Not teaching AI theory. Not running workshops. Getting it producing results in your specific business, this week. If your AI tools are collecting dust, hit reply and tell me what’s not working.

See you tomorrow,

Andrew

Everyone's Selling You an AI Agent. Most of Them Are Lying.

Andrew Mudd — Fri, 24 Apr 2026 14:19:33 GMT

The hottest sales pitch in tech right now? “Our AI agent will handle that for you.”

Every software vendor on the planet is racing to slap the word “agent” on their product. Your CRM has an agent. Your email tool has an agent. Your project management app just launched three agents. It’s agents all the way down.

Here’s what nobody selling you this stuff wants you to know: Gartner found that of the thousands of vendors claiming to sell AI agents, only about 130 are actually building genuinely agentic systems. The rest? They’re doing what Gartner calls “agent washing,” which is basically greenwashing for AI. They’re rebranding chatbots, basic automations, and scripted workflows as “agents” because the word sells.

And even the real ones have problems.

The 40% number nobody’s talking about

Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. The reasons are exactly what you’d expect: escalating costs, unclear ROI, and systems that aren’t mature enough to follow nuanced instructions over time.

CNBC ran a piece in March called “Silent Failure at Scale” that nailed the core risk. One example: an autonomous customer service agent at a major company started approving refunds outside policy guidelines after a customer talked it into making an exception. The agent then kept granting additional refunds on its own, optimizing for positive reviews instead of following the rules. Nobody noticed for weeks because nothing crashed. The failures were quiet.

As one VP of AI operations put it: “Autonomous systems don’t always fail loudly. It’s often silent failure at scale.”

What “agent” actually means (and doesn’t)

A real AI agent can perceive its environment, make decisions based on context, take actions, and learn from outcomes without someone scripting every step. That’s a high bar.

Most of what’s being sold to small businesses right now is closer to a fancy if/then automation with a chatbot strapped to the front. There’s nothing wrong with that. Automations are powerful. But calling it an “agent” creates expectations that the technology can’t deliver on yet, and that gap is where businesses waste money.

What actually works right now

The businesses getting real value from AI in 2026 aren’t chasing autonomous agents. They’re doing something less flashy but far more reliable: using AI as a capable assistant within clear boundaries.

That means AI drafting your proposals while you edit and approve. AI organizing your customer data while you decide what to do with the insights. AI handling first-pass research while you make the strategic calls.

The pattern that works is human judgment plus AI speed, not AI autonomy minus human oversight.

Fifty-seven percent of U.S. small businesses are now investing in AI technology, up from 36% in 2023. The ones seeing returns aren’t the ones buying the most sophisticated tools. They’re the ones who got clear on which specific, repeatable tasks AI handles well, and then built simple workflows around those tasks.

The real opportunity in all this hype

The gap between what AI agents are promised to do and what they actually do well creates a real advantage for operators who understand the difference right now.

The path that is working: set up three solid AI-assisted workflows. Use them consistently. Measure the time saved. Reinvest that time into the work that requires a human brain. Relationships, creative problem-solving, strategic decisions. That is what compounds.

The vendors selling you autonomous agents that “run your whole business while you sleep” are selling a version of the technology that does not exist yet at the reliability level they are implying. The operators who understand that are building something more durable in the meantime.

Most business owners do not have time to separate AI hype from AI reality. That is what we do.

If you want a clear picture of what AI can actually do for your specific operation, book an AI Clarity Call at muddventures.com/book. One hour, we map the specific places it actually makes sense for your business right now.

If you want to stay current on what is working and what is not, the Abra AI community is where I share what I am actually building. Join at whop.com/abra-ai/.

Andrew

Stop Learning Automation Tools. Just Describe What You Want.

Andrew Mudd — Thu, 23 Apr 2026 14:41:37 GMT

I built an automation this week that would have taken me a full afternoon two years ago. Someone joins my free community on Whop, the webhook fires to GHL, a contact gets created with their name, email, and membership tier, an if/else branch checks whether they paid or joined free, and they get tagged and enrolled in the right nurture sequence automatically. From there, nothing else needs to happen manually.
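Stripped of the tooling, the logic of that workflow fits in a few lines. This is a Python sketch of the branching only, not how Whop or GHL implement it; the payload field names, tags, and sequence names are all placeholders I made up.

```python
def route_new_member(payload: dict) -> dict:
    """Turn a new-member webhook payload into a tagged contact record."""
    contact = {
        "name": payload["name"],
        "email": payload["email"].strip().lower(),
        "tier": payload["tier"],
    }
    # The if/else branch: paid members and free members get different nurture.
    if payload.get("paid", False):
        contact["tags"] = ["member-paid"]
        contact["sequence"] = "paid-onboarding"
    else:
        contact["tags"] = ["member-free"]
        contact["sequence"] = "free-nurture"
    return contact


# Hypothetical webhook payload for a free signup.
print(route_new_member({"name": "Sam", "email": " Sam@Example.com ", "tier": "free"}))
```

The logic was never the hard part. Getting each platform to pass these fields to the next one in the format it expects was.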

The part that used to be hard wasn’t the logic. It was all the technical wiring in between. Variable formats, field mapping, conditional branches that only accept contact fields instead of raw webhook data. Things you had to already know, or spend hours figuring out.

What’s changed is that I can now describe what I want the system to do, work through the edge cases conversationally, and get there. Not by learning the tool’s internal rules first, but by staying focused on the outcome and letting the tooling catch up.

That shift is worth paying attention to.

The Technical Barrier Just Collapsed

For years, automation tools like Zapier, Make, and n8n were powerful but required you to think like a programmer. You had to understand triggers, actions, filters, data mapping, and API connections. Most business owners I talk to tried automation once, got frustrated, and went back to doing things manually.

That era is ending.

The major platforms have all added natural language interfaces in 2026. You describe what you want in plain English, and the AI builds the workflow for you. Zapier calls theirs Copilot. Lindy lets you spin up autonomous AI agents from a conversation. Even n8n and Make have added AI-assisted workflow builders.

This matters because the bottleneck was never willingness. It was accessibility. According to a 2026 Zapier survey of enterprise leaders, 30% said they see the most potential for AI agents in automating routine workflows. And nearly 72% of enterprises are already using or testing AI agents. But for small business owners, the tools were still too technical.

Not anymore.

What This Actually Looks Like in Practice

Here are three real examples of automations you could build today using just a plain English description:

“Every Friday at 5 PM, pull this week’s new customers from Stripe, calculate total revenue, and email me a summary.” That used to be a spreadsheet ritual. Now it runs itself after a single sentence.

“When someone books a call on Calendly, check if they’re already in my CRM. If yes, update their record. If not, create one and tag them as ‘new lead.’” No field mapping. No conditional logic menus. Just describe the outcome.

“Summarize every customer support email that comes in, categorize it by urgency, and post high-priority ones to my Slack channel.” This would have taken an afternoon to build a year ago. Now it takes two minutes of typing.

The common thread: you describe what you want to happen, not how to make it happen. The AI handles the technical wiring.
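For a sense of what the tool is wiring up behind the Calendly sentence, here is the same rule as a Python sketch. A plain dictionary stands in for the CRM, and the booking fields are assumptions; no real Calendly or CRM API is being called.

```python
def upsert_booking(crm: dict, booking: dict) -> str:
    """Update an existing contact or create a new one tagged as a new lead."""
    email = booking["email"].strip().lower()
    if email in crm:
        # Already in the CRM: update the record.
        crm[email]["last_booking"] = booking["time"]
        return "updated"
    # Not in the CRM: create the record and tag it.
    crm[email] = {
        "name": booking["name"],
        "last_booking": booking["time"],
        "tags": ["new lead"],
    }
    return "created"


crm = {"pat@example.com": {"name": "Pat", "last_booking": None, "tags": []}}
print(upsert_booking(crm, {"name": "Pat", "email": "Pat@example.com", "time": "2026-04-24T10:00"}))  # updated
print(upsert_booking(crm, {"name": "Lee", "email": "lee@example.com", "time": "2026-04-24T11:00"}))  # created
```

Your one-sentence description maps almost word for word onto the if and the else. That is why a clear sentence is enough.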

Why This Is a Bigger Deal Than It Sounds

The no-code automation market hit $12.3 billion in 2024 and keeps growing. About 80% of U.S. businesses already use some form of no-code or low-code tools. But adoption was always uneven, concentrated among the more technical teams.

Natural language changes the distribution. The person who knows the business best, whether that’s the founder, the ops lead, or the office manager, can now build the automation directly. No middleman. No developer ticket. No YouTube tutorial rabbit hole.

That means the people closest to the problem are now the ones solving it. And they can do it in the time it takes to write a clear sentence.

The One Catch Worth Knowing

These tools are good, not magic. The clearer your description, the better the output. “Automate my sales process” will give you something generic. “When a lead fills out my contact form, wait 10 minutes, then send Email Template A and add them to my Monday pipeline board” will give you something you can actually use.

Specificity is the skill now. Not technical knowledge.

Where This Fits for You

If you’ve been putting off automation because the tools felt too complicated, this is your window. The learning curve just got dramatically shorter.

Start with one workflow that eats your time every week. Describe it in a sentence. Try it in Zapier, Lindy, or whichever tool you already pay for. Most of these natural language features are included in existing plans.

At Mudd Ventures, this is the kind of shift we help businesses act on, not just read about. Turning a real bottleneck into a working automation in one sitting, then moving on to the next one.

Two ways to go deeper if you’re ready:

If you want a clear picture of what to automate first and exactly how to describe it, book an AI Clarity Call. One hour, we map your highest-leverage workflows and you leave with a plan you can execute the same day.

If you want to learn alongside other business owners who are building this in real time, join the Abra AI community. It’s where I share what I’m actually building, and members share what’s working for them.

Links for both are below. Either way, the window is open. Might as well use it.

Andrew

$120/Month In, $4,100 Back. The Real AI Math for Small Business.

Andrew Mudd — Wed, 22 Apr 2026 13:54:37 GMT

I talk to small business owners about AI every single day. The question I hear most isn’t “what tool is best.” It’s “what does this actually cost, and what do I actually get back?”

Fair question. So let’s talk real numbers.

SUCCESS Magazine recently profiled 10 small businesses and tracked what they spend on AI tools versus what they get back. The average monthly spend: $120. The average monthly benefit: $4,100.

That’s a 34:1 return.
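If you want to check the ratio yourself, the arithmetic is short. This Python snippet uses only the two averages quoted above.

```python
monthly_spend = 120     # average monthly AI tool spend, in dollars
monthly_benefit = 4100  # average monthly benefit, in dollars

ratio = monthly_benefit / monthly_spend
net_benefit = monthly_benefit - monthly_spend

print(f"{ratio:.1f}:1")      # 34.2:1
print(f"${net_benefit:,}")   # $3,980
```

Swap in your own two numbers. If you can’t fill in the second one, that is the real finding.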

Before you roll your eyes, here’s what makes that number credible: the ROI isn’t measured in headcount reduction. None of the 10 businesses fired anyone. The value came from redirected time, from moving people off repetitive work and into customer relationships, strategic planning, and revenue-generating activity.

That distinction matters. A lot.

What $120 Actually Buys

The tools in that $120 range aren’t exotic. We’re talking about ChatGPT at $20/month (OpenAI just dropped the Business plan by $5/seat on April 2nd), a scheduling or content tool in the $30-50 range, and maybe an AI-assisted customer service layer.

The SBE Council’s March 2026 survey of 517 small business employers backs this up at scale. 82% have adopted at least one AI tool. The typical small business now uses five. And here’s the stat I find most interesting: owners report saving a median of 5 hours per week personally, while their businesses save 11.5 employee hours per week total.

Five hours a week of owner time. That’s more than two full extra workdays every month that didn’t exist before.
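Here is the weekly-to-monthly conversion written out in Python. The two weekly figures are the SBE Council medians above; the 8-hour workday and 52-week year are my assumptions. By this arithmetic the owner’s saving alone comes to well over one workday a month.

```python
HOURS_PER_WORKDAY = 8   # assumption
WEEKS_PER_YEAR = 52     # assumption
MONTHS_PER_YEAR = 12


def monthly_workdays(hours_per_week: float) -> float:
    """Convert weekly hours saved into 8-hour workdays per month."""
    hours_per_month = hours_per_week * WEEKS_PER_YEAR / MONTHS_PER_YEAR
    return hours_per_month / HOURS_PER_WORKDAY


owner_days = monthly_workdays(5)     # owner's personal median saving
team_days = monthly_workdays(11.5)   # total employee hours saved

print(f"{owner_days:.1f}")  # 2.7
print(f"{team_days:.1f}")   # 6.2
```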

The Revenue Side

This part gets overlooked. Everyone talks about AI saving time, but the SBE Council data shows 66% of small businesses now report revenue increases directly linked to AI. And 22% report revenue gains exceeding 10%.

That’s not “AI might help someday” territory. That’s “AI is already moving the needle on revenue” territory.

How? The most common patterns I see: faster response times to leads (speed-to-lead is everything in service businesses), more consistent content output driving inbound traffic, and better data on which customers are worth more attention.

None of that is flashy. All of it compounds.

The Math That Actually Matters

Here’s how I think about AI spending for any small business. Start with one question: what’s the most expensive repetitive task in your operation right now?

Not expensive in dollars. Expensive in what it blocks. If your best salesperson spends 6 hours a week writing follow-up emails instead of having conversations, that’s your first AI win. If you personally spend 3 hours a week on invoicing and scheduling, that’s your first AI win.

The Versalence research shows first-year ROI for small business AI typically lands between 280% and 520%, with most businesses breaking even within the first 6 weeks. But here’s the catch: that only holds when the AI is pointed at a real bottleneck. The businesses that see 34:1 returns aren’t using AI because it’s trendy. They identified a specific constraint and solved it.

The businesses that spend $120/month and see nothing? They bought tools without identifying the bottleneck first.

The Honest Take

I’ve been using AI tools daily since February 2023. In that time, I’ve watched costs drop consistently while capability has gone up. ChatGPT Business was $25/seat three weeks ago. Now it’s $20. That trend isn’t slowing down.

For most small businesses, the cost of experimenting with AI is now less than a decent lunch for your team. The SBE Council data suggests the upside is showing up in revenue: 66% of AI-adopting businesses are already seeing top-line growth from it.

The math isn’t complicated. It just requires being honest about where your time and your team’s time actually goes.

That’s the starting line.

Andrew

If AI math like this is useful for your business, reply to this email. I read every one.

And if you want a second pair of eyes on where AI fits in your operation, book a conversation with Mudd Ventures. No pitch deck. Just a real look at what’s worth automating and what isn’t.