Webhook Architecture: What Fires Reliably

Webhooks are the event-driven plumbing underneath most of the internet's automation layer. Service A sends an HTTP POST to a URL you control whenever something happens — a payment completes, a commit lands, a form gets submitted. It's conceptually simple. In practice, the gap between "conceptually simple" and "works reliably in production" is where most AI-driven workflows quietly break.

The problem isn't that webhooks are bad. They're a reasonable design pattern for event notification. The problem is that people treat them like guaranteed message delivery, and they're not. They're a best-effort push notification with reliability characteristics that vary wildly by provider. Some providers do it well. Some treat it as an afterthought. Knowing which is which saves you from building on sand.

What It Actually Does

A webhook is an HTTP request — usually a POST — that a service sends to your endpoint when an event occurs. You register a URL. The service hits that URL with a JSON payload describing what happened. Your code receives it, does something, and returns a 200 status code to confirm receipt. That's it.
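The whole contract fits in a few lines. Here's a minimal receiver sketch using only Python's standard library — a stand-in for whatever framework you actually use, with no signature check or persistence yet (those come later):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookReceiver(BaseHTTPRequestHandler):
    """Minimal webhook endpoint: parse the JSON payload, acknowledge with 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            self.send_response(400)   # malformed payload: reject it
            self.end_headers()
            return
        # A real receiver would persist `event` here before acknowledging.
        print("received event:", event.get("type"))
        self.send_response(200)       # 200 tells the provider: delivered
        self.end_headers()

    def log_message(self, *args):     # silence the default request logging
        pass

def run(port: int = 8000) -> None:
    """Serve forever on localhost; port 8000 is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), WebhookReceiver).serve_forever()
```

Everything else in this article is about what happens when this naive version meets production.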

What a webhook is not: a message queue, a guaranteed delivery system, a real-time notification (there's usually some latency), or a replacement for polling in all cases. These distinctions matter because the failure modes of each are different, and conflating them leads to architectures that look correct on a whiteboard and lose data in production.

The reliability of a webhook system depends on several factors. Does the provider retry on failure? How many times, with what backoff? Does it sign the payload so you can verify it's authentic? Does it log delivery attempts so you can debug failures? Does it have a dead letter queue or notification when deliveries repeatedly fail? Does it guarantee ordering? Does it provide idempotency keys so you can handle duplicate deliveries safely?

Most providers answer yes to some of these and no to others. The ones that answer yes to all of them are the ones you can build production workflows on. The rest require you to build the missing pieces yourself.

What The Demo Makes You Think

The demo shows a webhook firing, your receiver catching it, and magic happening. New Stripe payment triggers a Slack message and a database update. GitHub push triggers a build pipeline. It looks bulletproof.

Here's what the demo doesn't cover.

It doesn't show your server being down for five minutes during a deployment. During those five minutes, Stripe sent three webhook events. If Stripe retries — and it does, aggressively — you'll get them later. If a less robust provider sent them, those events are gone. You won't know they happened unless you independently poll the source.

It doesn't show the duplicate delivery problem. Webhooks can — and do — fire more than once for the same event. Network timeouts, retries, provider bugs. If your handler processes a payment webhook and creates an order, and that webhook arrives twice, you now have two orders for one payment. Idempotency — the practice of making your handler produce the same result whether the event arrives once or five times — is not optional. It's not mentioned in most webhook tutorials either.

It doesn't show the payload change. You built your handler to parse a JSON structure. Six months later, the provider adds a field, changes a type, or deprecates a key. Your handler starts throwing parse errors on every event. There's no advance notice because the provider considers it a non-breaking change and you didn't version-pin your webhook to a specific API version. This happens regularly with services that iterate fast.

And it doesn't show the signature verification you skipped. Webhook endpoints are public URLs. Anyone who discovers them can send fake events. Without verifying the cryptographic signature that good providers include, your system will happily process a fabricated "payment received" event from an attacker. The signature check is ten lines of code. Skipping it is a security hole.

Platform-by-Platform: Who Does It Right

Not all webhook implementations are equal. Here's how the platforms people actually integrate with stack up.

Stripe is the gold standard. Retries failed deliveries for up to 72 hours with exponential backoff. Signs every payload with HMAC-SHA256 using a per-endpoint secret. Provides a delivery log in the dashboard showing every attempt, the response code, and the response body. Supports webhook endpoint testing from the dashboard. Offers an API version pinning system so payload structures don't change under you. If every provider implemented webhooks like Stripe, this article would be three paragraphs long.

GitHub is very good. Automatic retries are limited — a failed delivery is generally not retried on its own, so check the current delivery policy rather than assuming Stripe-like behavior. It signs payloads with HMAC-SHA256 (the X-Hub-Signature-256 header). It provides delivery logs in the webhook settings page, including full request/response details, and the "Recent Deliveries" tab lets you redeliver events manually. GitHub Actions has effectively replaced webhooks for CI/CD use cases, but for external integrations, the webhook system is mature and reliable.

Shopify is solid. It retries failed deliveries over roughly 48 hours, and if failures persist it can remove the webhook subscription entirely — a documented behavior worth monitoring for. Signs payloads with HMAC-SHA256. Provides a webhook delivery dashboard. The main gotcha is volume: Shopify can generate a flood of events — inventory updates, order status changes — and if your endpoint responds slowly, deliveries back up and all your webhooks get delayed, not just the slow ones.

Slack is decent for its core use cases but limited. Slack's Events API requires your endpoint to respond within 3 seconds; miss that window and Slack marks the delivery failed and retries, a few times with increasing delays. This means your webhook handler needs to acknowledge immediately and process asynchronously. If you try to do real work in the request handler — like calling an LLM, which takes seconds — Slack retries, your handler gets called again, and you process the event twice. The 3-second deadline is documented, but it trips up nearly everyone the first time.

Ghost (the CMS) supports webhooks but the implementation is basic. It fires events for content changes — post published, post updated, member added. Retry behavior is limited, signature verification is minimal compared to Stripe/GitHub, and the delivery logging is not as detailed. Usable for non-critical workflows. Not what you'd build a billing system on.

Generic SaaS webhooks — the long tail of smaller platforms — are a gamble. Many implement fire-and-forget: they send the request once, and if it fails, the event is lost. No retries, no signatures, no delivery logs. Before building a workflow on any platform's webhooks, check three things: do they retry, do they sign, and can you see delivery logs. If any answer is no, plan around the gap; if all three are no, treat the webhook as a convenience notification and build polling as your actual source of truth.

Building Webhook Receivers for AI Workflows

If you're using webhooks to trigger AI processing — a new document arrives via webhook, gets sent to an LLM for summarization, result gets stored — the architecture needs to account for the specific failure modes of LLM calls.

Acknowledge immediately, process asynchronously. Your webhook endpoint should accept the request, store the payload, return 200, and then process it in a background job. LLM calls take 1-30 seconds depending on the model, the prompt, and the load. No webhook provider is going to wait that long. If you try to process synchronously, you'll get timeouts and retries.
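The accept-store-acknowledge split can be sketched framework-free. This uses an in-process `queue.Queue` purely for illustration — in production the "store" step writes to something durable (a database table, Redis) so a restart doesn't lose events:

```python
import json
import queue

# Hypothetical in-process job store; production systems would use Redis,
# a database table, or a managed queue instead of an in-memory Queue.
jobs = queue.Queue()

def handle_webhook(raw_body: bytes) -> int:
    """Accept, store, acknowledge. No LLM calls, no slow work in here."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400            # bad payload: tell the provider it failed
    jobs.put(event)           # durable storage in real life, not memory
    return 200                # acknowledged: the provider stops retrying

def drain_one():
    """A background worker pulls the next event; slow processing lives here."""
    try:
        return jobs.get_nowait()
    except queue.Empty:
        return None
```

The handler's only jobs are validation, storage, and a fast status code; everything expensive happens on the worker side of the queue.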

Queue, don't process inline. Use a proper job queue — Redis-backed (Bull, Celery), a database-backed queue, even a simple file-based queue for small scale. The queue gives you retry logic, failure tracking, and rate limiting. When Claude returns a 529 (overloaded) or GPT-4o times out, the job fails and retries automatically instead of losing the event.
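The retry logic a real queue gives you looks roughly like this — a sketch of exponential backoff around a flaky downstream call, where `call_llm` is a hypothetical stand-in for whatever model client you actually use:

```python
import time

def process_with_retry(job, call_llm, max_attempts=5, base_delay=1.0):
    """Run a flaky downstream call (e.g. an LLM request) with exponential
    backoff. Re-raises on the final attempt so the queue can mark it failed."""
    for attempt in range(max_attempts):
        try:
            return call_llm(job)
        except Exception:
            if attempt == max_attempts - 1:
                raise                              # let the queue record the failure
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s...
```

Libraries like Bull and Celery implement this (plus failure tracking and rate limits) for you; the point is that the retry lives in the worker, not in the webhook handler.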

Idempotency is non-negotiable. Store the event ID (most providers include one) before processing. Check for it before processing. If you've seen this event before, skip it. This is a few lines of code and saves you from duplicate processing, double-charges, duplicate Slack messages, or sending the same email twice.
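Those few lines of code look like this. The in-memory set is illustrative only — in production the dedupe store is a database table with a unique index on the event ID, so the check survives restarts and works across workers:

```python
processed_ids = set()   # production: a DB table with a unique index on event ID

def handle_event(event: dict) -> str:
    event_id = event["id"]            # most providers include a stable event ID
    if event_id in processed_ids:
        return "skipped"              # duplicate delivery: already handled
    processed_ids.add(event_id)       # record BEFORE side effects, per the text
    # ... create the order, send the Slack message, call the LLM, etc. ...
    return "processed"
```

With a database, the insert-then-process order plus the unique constraint is what closes the race between two workers receiving the same duplicate simultaneously.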

Validate signatures. Every provider that signs its webhooks publishes the verification algorithm. Implement it. It's always some variant of HMAC — compute the hash, compare it to the header value, reject if they don't match. This prevents anyone from hitting your public endpoint with fake events.
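The generic shape of that HMAC check, in Python's standard library. Header names and encodings vary by provider — GitHub, for instance, sends `sha256=<hexdigest>` in X-Hub-Signature-256 — so treat the exact wire format here as an assumption to check against your provider's docs:

```python
import hashlib
import hmac

def verify_signature(payload: bytes, header_sig: str, secret: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw payload and compare it to the
    signature the provider sent in the request header."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest is constant-time, preventing timing attacks on the check
    return hmac.compare_digest(expected, header_sig)
```

One common pitfall: verify against the raw request bytes, not a re-serialized JSON object — re-serializing changes whitespace and key order, and the hashes won't match.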

Log everything. Every incoming webhook, every processing attempt, every failure. When something goes wrong — and it will — logs are how you figure out whether the provider didn't send the event, your receiver didn't accept it, or your processing logic broke.

The Polling Heresy

Sometimes polling is more reliable than webhooks. This is heresy in event-driven architecture circles, but it's true for a specific and common scenario: when the webhook provider is unreliable and the data source supports efficient polling.

If a platform's webhooks are fire-and-forget with no retries, and the platform has an API endpoint like "get all events since timestamp X," polling that endpoint every 5 minutes is strictly more reliable than hoping the webhook arrives. You control the retry logic. You control the polling interval. You don't lose events because your server was restarting when the webhook fired.
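A single polling pass reduces to "fetch everything after my cursor, advance the cursor." Here `fetch_since` is a hypothetical stand-in for the platform's "events since timestamp X" endpoint, and the event shape is assumed:

```python
def poll_once(fetch_since, cursor: float):
    """One polling pass: ask the source for everything newer than `cursor`,
    return the new events plus the advanced cursor for the next pass."""
    events = fetch_since(cursor)
    if events:
        # advance to the newest timestamp seen, so the next pass skips these
        cursor = max(e["created_at"] for e in events)
    return events, cursor
```

Persist the cursor between runs (a file or a database row) and the loop is crash-safe: if the poller dies mid-pass, the next run simply re-fetches from the last committed cursor — which is exactly why idempotent processing matters here too.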

The tradeoff is latency (you're checking periodically rather than reacting instantly) and API rate limits (polling consumes quota). For many AI workflows — especially batch processing jobs where a 5-minute delay doesn't matter — polling is the pragmatic choice. The webhook-first, polling-as-backup pattern gives you the best of both: fast reaction when webhooks work, guaranteed eventual consistency when they don't.

What's Coming

The webhook ecosystem is slowly improving. CloudEvents, a CNCF specification, standardizes event payload formats across providers — same envelope structure, same metadata fields. Adoption is early but growing. If it reaches critical mass, the "every provider has a different payload format" problem gets significantly better.

On the AI side, MCP doesn't currently have a standardized way to handle incoming webhooks — MCP servers call out to APIs but don't typically receive inbound events. Bridging webhooks to MCP would require an intermediary that receives the webhook and translates it into an MCP-compatible interaction. This is achievable with tools like n8n or Pipedream today, but there's no native MCP primitive for it.

The broader trend is toward webhook-to-queue architectures where a managed service (AWS EventBridge, Google Eventarc, Hookdeck, Svix) sits between the webhook provider and your processing logic. These services handle retry, deduplication, and routing, which means you stop building that infrastructure yourself. For production AI workflows, this intermediate layer is worth the cost.

The Verdict

Webhooks work well enough to build on — if you choose your providers carefully and build your receivers correctly. The short version: Stripe and GitHub webhooks are production-grade. Major platforms like Shopify and Slack are reliable with caveats. The long tail of SaaS webhooks ranges from adequate to unreliable, and you need to check before you depend on them.

For AI workflows specifically, the pattern is: receive fast, queue everything, process asynchronously, handle duplicates, log obsessively. The webhook is just the trigger. The reliability comes from everything after the trigger — your queue, your retry logic, your idempotency checks, your monitoring.

The most common mistake is treating webhook integration as a one-time setup task. It's not. Providers change their payload formats. Endpoints go down. Tokens expire. The webhook that worked fine for six months suddenly stops, and the only way you find out is that the downstream process stopped producing results. Build monitoring. Check the logs. Treat the plumbing like plumbing — it needs maintenance, or it floods the basement.


This is part of CustomClanker's MCP & Plumbing series — reality checks on what actually connects.