When an online store processes an order, that information has to reach ERP, CRM, shipping and accounting within seconds. Classic polling asks for updates at fixed intervals and wastes enormous resources doing so: in an analysis of 30 million requests, only 1.5% of polls actually found an update, while the other 98.5% came back empty (Zapier). A webhook architecture flips the principle around: the store actively reports an event the moment it happens. This event-driven approach is not only more efficient but also near real-time, provided it is built properly, with retries, idempotency, signature verification and queues. This guide shows what a resilient webhook architecture for shop integration looks like and where the typical pitfalls lie.
Webhooks vs. Polling: Why Event-Driven Wins
Polling and webhooks solve the same problem in opposite ways. With polling, the target system asks the store regularly: is there anything new? With a webhook, the store actively calls the target system the moment an event occurs. The difference is dramatic: polling an interface every 30 seconds creates 2,880 requests per day, even if only ten relevant events occur, meaning 2,870 wasted requests (Zapier). A webhook sends exactly ten calls, one per event.
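The arithmetic behind these figures can be reproduced in a few lines. This is only an illustrative sketch: `pollsPerDay` is a helper name made up for this example, and the ten events per day are the example value from the paragraph above.

```php
<?php
// Illustrative helper (not from any library): polls per day at a fixed interval.
function pollsPerDay(int $intervalSeconds): int
{
    if ($intervalSeconds <= 0) {
        throw new InvalidArgumentException('interval must be positive');
    }
    return intdiv(86400, $intervalSeconds); // 86,400 seconds per day
}

$polls  = pollsPerDay(30);   // 2880 requests per day
$events = 10;                // relevant events per day (example value)
$wasted = $polls - $events;  // 2870 polls that find nothing new

printf("polling: %d requests, %d wasted; webhooks: %d requests\n", $polls, $wasted, $events);
```

Halving the interval doubles the number of polls, while the webhook count stays tied to the number of events.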
This waste has real costs. Polling creates on average 66 times more server load than webhooks (Zapier), and the shorter the interval, the steeper the load: with 10,000 users and a 5-second interval, up to 10,000 requests per second can occur, while webhooks need only about 150 responses per second for the same updates (Zapier). On top of that comes latency: polling delivers data only at the next cycle, a webhook nearly in real time.
| Criterion | Polling | Webhooks (event-driven) |
|---|---|---|
| Trigger | Fixed interval | On event |
| Efficiency | 98.5% of requests in vain | One request per event |
| Server load | Up to 66x higher | Low, proportional to events |
| Latency | Until the next cycle | Near real time |
| Complexity | Simple but costly | Higher but scalable |
That does not mean polling is never useful. For rare batch reconciliations or systems without webhook support it remains a valid option, and it works well as a fallback when events are occasionally lost. For near real-time shop processes such as order handover, inventory synchronization or shipping status, however, the event-driven architecture is clearly superior.
In practice, robust integrations deliberately combine both approaches: webhooks carry the load of day-to-day operations and deliver events nearly in real time, while a rare, periodic reconciliation serves as a safety net to verify the data state between store and target system. This way you can catch the few events that slip through despite all precautions, for instance when a receiver was unreachable for a longer period and the retries are also exhausted. This hybrid setup combines the efficiency of the event-driven model with the completeness guarantee of a reconciliation, without accepting the drawbacks of pure polling.
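A periodic reconciliation of this kind can be very small. The following sketch assumes that both sides can export a list of order IDs for the period in question; the function name and the sample IDs are hypothetical.

```php
<?php
/**
 * Returns the order IDs that exist in the store but never reached the target system.
 * Both lists are assumed inputs, for example from a nightly export on each side.
 *
 * @param string[] $storeOrderIds
 * @param string[] $targetOrderIds
 * @return string[]
 */
function findMissingOrders(array $storeOrderIds, array $targetOrderIds): array
{
    // array_diff keeps the original keys, so re-index the result.
    return array_values(array_diff($storeOrderIds, $targetOrderIds));
}

$missing = findMissingOrders(['1001', '1002', '1003'], ['1001', '1003']);
// $missing now holds ['1002'].
```

Orders found this way would typically be fed back into the normal event queue, so they pass through the same idempotent processing as regular webhooks.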
Anatomy of a Webhook Architecture
A production-grade webhook architecture is more than a URL that receives data. It comprises the event producer (the store), the transport (HTTP POST with signature), an ingress layer at the receiver, a queue for decoupling, a retry mechanism and the actual consumers such as ERP, CRM or email dispatch. Only this interplay turns a fragile call into a resilient integration. On the receiving side, these endpoints are usually part of a REST API that accepts the incoming events.
Producer (shop)
Triggers an HTTP POST on events like checkout.order.placed. Shopware, for example, offers an app and event system for business and lifecycle events (Shopware Docs).
Transport with signature
The payload is sent as JSON over HTTPS and carries an HMAC signature so the receiver can verify authenticity and integrity.
Receiver endpoint
Accepts the POST, validates the signature and idempotency key, places the event into a queue instead of processing it synchronously, and responds quickly with 2xx.
Queue and consumers
A message queue decouples receiving from processing. Workers handle events with retries on failure and a dead-letter queue as the final safety net.
The decisive architectural principle is: acknowledge fast, process asynchronously. The endpoint should accept the webhook, validate minimally and respond immediately with a 2xx status. The actual work, writing to the ERP or updating the CRM, happens afterwards asynchronously from the queue. This keeps the endpoint responsive even under load spikes, and the producer receives no timeouts that would trigger unnecessary retries.
Idempotency: The Load-Bearing Element
Practically all webhook providers deliver events at least once, never exactly once (Hookdeck). That means: sooner or later your receiver will get the same event twice, for instance because an acknowledgement was lost and the producer sent again. Without protection this leads to double bookings, double shipments or duplicate emails. Idempotency is therefore not a nice-to-have but the load-bearing wall of any production webhook integration (Hookdeck).
The principle is simple: every event carries a unique ID that stays constant across all retries. The receiver stores processed IDs and skips any repeat. This guarantees that the same order is written to the ERP only once, no matter how often the webhook arrives.
```php
public function handle(Request $request): Response
{
    // 1. Verify signature (HMAC, constant-time)
    if (!$this->verifySignature($request)) {
        return new Response('invalid signature', 401);
    }

    $eventId = $request->header('X-Event-Id');
    if ($eventId === null || $eventId === '') {
        return new Response('missing event id', 400);
    }

    // 2. Idempotency: already processed?
    if ($this->store->seen($eventId)) {
        return new Response('ok', 200); // still acknowledge with 2xx
    }

    // 3. Enqueue, do not process synchronously
    $this->queue->push(new ProcessOrderEvent($request->getContent()));

    // Mark as seen only after the enqueue succeeded; otherwise a failed
    // push would make the producer's retry look like a duplicate.
    $this->store->markSeen($eventId);

    // 4. Acknowledge fast
    return new Response('accepted', 202);
}
```

Checking the ID at the endpoint alone is not enough if a worker crashes mid-processing and the event is delivered from the queue again. The actual processing should also be idempotent, for example through upsert operations instead of blind inserts and through unique keys per business transaction.
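Idempotent processing through an upsert can be sketched as follows. The repository is an in-memory stand-in for an ERP table with a unique key on the order number; the class name, the method names and the sample order are assumptions made for this example.

```php
<?php
// In-memory stand-in for a table with a unique key on the order number.
final class OrderRepository
{
    /** @var array<string, array{status: string, total: float}> */
    private array $rows = [];

    // Upsert: insert the order if it is new, otherwise overwrite the same row.
    public function upsert(string $orderNumber, string $status, float $total): void
    {
        $this->rows[$orderNumber] = ['status' => $status, 'total' => $total];
    }

    public function count(): int
    {
        return count($this->rows);
    }
}

$repo = new OrderRepository();

// The same event processed twice, e.g. after a worker crash and redelivery.
$repo->upsert('SW-10001', 'placed', 59.90);
$repo->upsert('SW-10001', 'placed', 59.90);

echo $repo->count(), "\n"; // 1
```

In a relational database the same effect is usually achieved with a unique constraint on the business key combined with an `INSERT ... ON CONFLICT` or `ON DUPLICATE KEY UPDATE` statement, depending on the database system.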
Signature Verification and Security
A webhook endpoint is a publicly reachable URL that triggers write actions. Without authentication, anyone could inject forged events. The standard for securing it is the HMAC signature: the producer signs the payload with a shared secret, the receiver recomputes the signature and compares. HMAC-SHA256 is the method used by most modern providers (Hooklistener). Shopware, for instance, signs its app webhooks with a SHA256 HMAC over the request body in the shopware-shop-signature header (Shopware Docs).
- Compute the signature over the raw body, not over a deserialized-then-reserialized object, since re-serializing JSON can change field order, whitespace or escaping, and the hash would then no longer match (Hooklistener)
- Use constant-time comparison to prevent timing attacks on signature verification (Hooklistener)
- Sign a timestamp as well and reject events whose timestamp is too old, typically more than five minutes, to prevent replay attacks (Hooklistener)
- Enforce HTTPS so that payload and signature are not transmitted in clear text
- Treat secrets like passwords: store them in environment variables or a secret management service, never in code or version control (Hooklistener)
A valid signature alone offers no protection if an attacker resends an intercepted, correctly signed event. Established providers therefore sign the combination of timestamp and payload and discard events whose timestamp deviates by more than a few minutes (Hooklistener). This way, old, repeated signatures become invalid automatically.
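The checks described above can be combined in a short verification routine. This is a minimal sketch under stated assumptions: the signed string is `<timestamp>.<raw body>`, the signature is a hex-encoded HMAC-SHA256, and the function names are made up for this example. Real providers differ in header names and signing format; Shopware, for instance, signs the request body.

```php
<?php
// Assumption: the producer signs "<timestamp>.<raw body>" with HMAC-SHA256.
function signPayload(string $secret, int $timestamp, string $rawBody): string
{
    return hash_hmac('sha256', $timestamp . '.' . $rawBody, $secret);
}

function verifyWebhook(
    string $secret,
    string $rawBody,
    int $timestamp,
    string $signature,
    int $now,
    int $toleranceSeconds = 300
): bool {
    // Replay protection: reject events outside the tolerance window (five minutes here).
    if (abs($now - $timestamp) > $toleranceSeconds) {
        return false;
    }
    $expected = signPayload($secret, $timestamp, $rawBody);
    // hash_equals compares in constant time to avoid timing side channels.
    return hash_equals($expected, $signature);
}

$secret = 'example-secret'; // in practice loaded from the environment, never hard-coded
$body   = '{"event":"checkout.order.placed","id":"evt_1"}';
$ts     = 1700000000;
$sig    = signPayload($secret, $ts, $body);

var_dump(verifyWebhook($secret, $body, $ts, $sig, $ts + 10));   // bool(true)
var_dump(verifyWebhook($secret, $body, $ts, $sig, $ts + 3600)); // bool(false): too old
var_dump(verifyWebhook($secret, $body . ' ', $ts, $sig, $ts));  // bool(false): body changed
```

Because the timestamp is part of the signed string, an attacker cannot simply attach a fresh timestamp to an intercepted event without invalidating the signature.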
Retries, Backoff and Dead-Letter Queues
Networks are unreliable, target systems have maintenance windows, and load spikes cause timeouts. Without retry logic, events are lost: the failure rate without retries is 3-5% (Hookdeck). That may sound harmless, but it is not: with 100 orders a day, 3-5% already means three to five lost transactions every day, each one a missing ERP booking or an unsent confirmation (Hookdeck).
The solution is a staged retry process with exponential backoff: after a failed attempt, delivery is retried with growing delays. So that thousands of simultaneously failed events do not all surge again at the exact same second (the so-called thundering herd problem), jitter is added, a random spread of the wait times (Hookdeck).
- First delivery immediately when the event occurs
- Retry with backoff on transient errors (5xx, timeout, 429), with growing delays plus jitter
- No retries on permanent errors such as 4xx (except 429), since a repeat changes nothing here (Hookdeck)
- Dead-letter queue (DLQ) as the final safety net once all attempts are exhausted, so no event silently disappears (Hookdeck)
- Monitoring and alerting on the DLQ so failed events can be reprocessed manually
Retries and idempotency work hand in hand: retries ensure that every event arrives at least once, idempotency ensures that multiple deliveries cause no harm. Only this combination reaches delivery rates in the high-90-percent range in practice without risking double processing.
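The retry rules above can be sketched in a few functions. All names and numbers here are illustrative assumptions: a base delay of 2 seconds, a cap of one hour, five attempts, and status code 0 as a stand-in for a timeout. Parking permanent errors in the dead-letter queue as well is a design choice of this sketch, not something the sources prescribe.

```php
<?php
// Exponential backoff with full jitter: a random wait between 0 and
// min(cap, base * 2^(attempt - 1)).
function backoffDelaySeconds(int $attempt, int $baseSeconds = 2, int $capSeconds = 3600): int
{
    $exponent = min(max($attempt, 1) - 1, 20); // clamp to avoid integer overflow
    $ceiling  = min($capSeconds, $baseSeconds * (2 ** $exponent));
    return random_int(0, $ceiling);
}

// Transient failures are worth retrying; status 0 stands for a timeout here (assumption).
function isTransient(int $statusCode): bool
{
    return $statusCode === 0 || $statusCode === 429 || $statusCode >= 500;
}

// Returns 'retry' or 'dead-letter' for a failed delivery attempt.
function nextAction(int $statusCode, int $attempt, int $maxAttempts = 5): string
{
    if (isTransient($statusCode) && $attempt < $maxAttempts) {
        return 'retry';
    }
    // Permanent errors and exhausted retries are parked for manual inspection.
    return 'dead-letter';
}

echo nextAction(503, 1), "\n"; // retry
echo nextAction(404, 1), "\n"; // dead-letter
echo nextAction(503, 5), "\n"; // dead-letter
```

The jitter comes from `random_int`, so two events that fail at the same moment will usually be retried at different times instead of hitting the target system together.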
Queues Decouple Receiving from Processing
The message queue is the heart of a scalable webhook architecture. It separates the fast acceptance of an event from the potentially slow processing. When a burst of orders arrives at once, for example during a promotion, the queue buffers the load and the workers handle it at their own pace, without the endpoint blocking or events getting lost.
This decoupling gain is measurable. Organizations that move to an event-driven architecture report 62% lower system latency, 58% higher throughput and 47% lower infrastructure costs compared to synchronous architectures (IJSAT 2025). Accordingly, 72% of enterprises already use event-driven workflows for scalable, loosely coupled systems (Growin).
Absorb load spikes
The queue buffers order surges so that no events are lost during peaks, a known weak point of direct processing.
Isolate failures
If a target system goes down, events pile up in the queue and are processed after recovery instead of being lost.
Scale independently
More load simply means more workers. Receiving and processing scale separately, which makes the architecture elastic.
For shop projects this means in practice: an incoming order webhook first lands in the queue. From there, workers distribute the event to several consumers, for example a shipping interface, the ERP and the confirmation email dispatch. Each consumer works independently, with its own retry logic. If one fails, the others keep running. It is precisely this robustness that distinguishes a real integration from a fragile point-to-point script.
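The fan-out with failure isolation might look like the following sketch. The consumer names and the `fanOut` function are hypothetical; in a real setup each consumer would usually have its own queue and retry policy rather than being called in a single loop.

```php
<?php
/**
 * Delivers one event to every consumer and collects failures per consumer,
 * so one failing target does not block the others.
 *
 * @param array<string, mixed>    $event
 * @param array<string, callable> $consumers consumer name => handler
 * @return array<string, string>  consumer name => error message
 */
function fanOut(array $event, array $consumers): array
{
    $failures = [];
    foreach ($consumers as $name => $consumer) {
        try {
            $consumer($event);
        } catch (Throwable $e) {
            // Here the failed consumer's job would go back to its own retry queue.
            $failures[$name] = $e->getMessage();
        }
    }
    return $failures;
}

$delivered = [];
$failures = fanOut(
    ['id' => 'evt_1', 'type' => 'checkout.order.placed'],
    [
        'erp'      => function (array $e) use (&$delivered): void { $delivered[] = 'erp'; },
        'shipping' => function (array $e): void { throw new RuntimeException('shipping API down'); },
        'email'    => function (array $e) use (&$delivered): void { $delivered[] = 'email'; },
    ]
);
// $delivered is ['erp', 'email']; $failures is ['shipping' => 'shipping API down'].
```

The ERP and the email dispatch both receive the event even though the shipping consumer fails in between.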
The payload itself also deserves attention. Webhook payloads should contain only the data the receiver needs to react, and for larger data volumes, transport-level compression pays off, much like the Brotli compression of shop assets. Lean, clearly versioned payloads not only reduce the amount of data transmitted but also make debugging easier, because every event stays traceable. A common pattern is the so-called thin payload: the webhook contains only the event ID and a reference, and the receiver fetches the full data on demand via the API. This keeps delivery fast and decouples the data volume from the event rate.
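A thin payload and its resolution could be sketched like this. The field names (`id`, `type`, `orderId`, `version`) and the fetcher callback are assumptions for illustration; the callback stands in for whatever API client the receiver uses to load the full order.

```php
<?php
// Hypothetical thin payload: only the event ID, type, version and a reference.
$rawBody = '{"id":"evt_1","type":"checkout.order.placed","orderId":"SW-10001","version":1}';

/**
 * Decodes a thin payload and loads the full order through a caller-supplied fetcher.
 *
 * @return array{eventId: string, order: array}
 */
function resolveThinPayload(string $rawBody, callable $fetchOrder): array
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $event = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);
    if (!isset($event['id'], $event['orderId'])) {
        throw new InvalidArgumentException('thin payload needs id and orderId');
    }
    return ['eventId' => $event['id'], 'order' => $fetchOrder($event['orderId'])];
}

$result = resolveThinPayload($rawBody, function (string $orderId): array {
    // Stand-in for a GET request against the store API.
    return ['orderNumber' => $orderId, 'total' => 59.90];
});
// $result['order'] now holds the full order data fetched on demand.
```

A side effect of this pattern is that the receiver always works with the current state of the order, even if the webhook itself was delayed by retries.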
Equally important is the observability of the entire chain. Without logging every delivery attempt, every signature check and every queue movement, a fault stays hidden until customers report it. A well-designed architecture therefore keeps an event log that records status, attempts and final state for every event, and alerts automatically as soon as the dead-letter queue grows. This turns a black box into a traceable system in which root causes can be pinpointed instead of being left in the dark.
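An event log of this kind can start out very simple. The sketch below keeps status and attempt count per event in memory and reports when the number of dead-lettered events reaches a threshold; the class, the status strings and the threshold are assumptions, and a production system would persist this data and hook it into its alerting tool.

```php
<?php
// Minimal in-memory event log: last status and attempt count per event.
final class EventLog
{
    /** @var array<string, array{status: string, attempts: int}> */
    private array $entries = [];

    // Each call records one delivery or processing attempt for the event.
    public function record(string $eventId, string $status): void
    {
        $attempts = ($this->entries[$eventId]['attempts'] ?? 0) + 1;
        $this->entries[$eventId] = ['status' => $status, 'attempts' => $attempts];
    }

    public function attempts(string $eventId): int
    {
        return $this->entries[$eventId]['attempts'] ?? 0;
    }

    public function deadLetterCount(): int
    {
        return count(array_filter(
            $this->entries,
            fn (array $entry): bool => $entry['status'] === 'dead-letter'
        ));
    }

    // The threshold is illustrative; sending the alert is left to the caller.
    public function shouldAlert(int $threshold = 1): bool
    {
        return $this->deadLetterCount() >= $threshold;
    }
}

$log = new EventLog();
$log->record('evt_1', 'failed');
$log->record('evt_1', 'delivered');
$log->record('evt_2', 'failed');
$log->record('evt_2', 'dead-letter');

var_dump($log->shouldAlert()); // bool(true)
```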
What Happens When the Architecture Is Missing
The consequences of poor integration are not theory. A survey shows that 60% of retailers in the United Kingdom suffered direct revenue losses due to interface failures, for instance through delayed orders or inventory mismatches (Uptrends, State of API Reliability 2025). Over the same period, average API uptime fell from 99.66% to 99.46%. Downtime thus rose from 0.34% to 0.54% of the time, which is roughly 60% more downtime year over year (Uptrends).
It becomes especially critical in multi-channel inventory management: faulty synchronization leads to overselling and marketplace penalties, and rate limits regularly become the breaking point during load spikes, resulting in failed requests, delayed payments and sync problems (Uptrends). An event-driven architecture with a queue and retries addresses exactly these weaknesses, because it buffers load spikes and automatically catches up on lost events instead of letting them silently slip away.
This is where a well-designed middleware layer comes in: instead of connecting every system directly to every other, it bundles events, unifies formats and ensures consistent retry and idempotency rules across all interfaces. This applies to standard processes as well as complex flows such as B2B order approvals or structured EDI exchange, where events must flow reliably between store, ERP and business partners.
Webhook Architecture as the Foundation of Reliable Integration
A webhook architecture is far more than a technical detail: it determines whether orders, stock and customer data flow reliably between systems. The data is clear: 98.5% wasted polling requests (Zapier), 3-5% event loss without retries (Hookdeck) and 62% lower latency with an event-driven architecture (IJSAT 2025) show how big the difference is between a fragile script and a clean architecture.
The four building blocks interlock: webhooks replace wasteful polling, queues decouple and stabilize, retries with backoff secure delivery, and idempotency plus signature verification make processing secure and repeat-safe. Only together do they form an integration that stays reliable even under load spikes and partial failures.
As an agency focused on e-commerce and integrations, we design and operate such architectures for online stores in Lower Saxony and beyond, from the first event definition through signature and idempotency logic to the queue with monitoring. This turns a collection of individual interfaces into a resilient, near real-time system that grows with your store.
This article is based on data from: Zapier (wasted polling requests, server load compared to webhooks), Hookdeck (failure rate without retries, at-least-once delivery, idempotency, backoff with jitter, dead-letter queues), Hooklistener (HMAC signature verification, raw body, constant-time compare, timestamps against replay attacks, secret management), Shopware Documentation (app and event system, shopware-shop-signature HMAC), IJSAT 2025 (latency, throughput and cost impact of event-driven architecture), Growin (adoption of event-driven workflows) and Uptrends State of API Reliability 2025 (revenue losses from interface failures, API uptime, rate limits). The figures cited may vary depending on industry, system and implementation.
Frequently Asked Questions About Webhook Architecture
What is the difference between polling and webhooks?
With polling, a system actively asks for updates at fixed intervals; with a webhook, the source reports an event on its own as soon as it occurs. Polling is simple but inefficient: in one analysis, about 98.5% of polling requests were in vain (Zapier). Webhooks, by contrast, send one request per event and deliver nearly in real time.
Why is idempotency so important for webhooks?
Practically all providers deliver events at least once, never guaranteed exactly once (Hookdeck). As a result the same event can arrive several times, for instance after a retry. Without idempotency this leads to double bookings or duplicate shipments. A unique event ID stored at the receiver ensures each event is processed only once.
How do you secure a webhook endpoint?
The standard is verifying an HMAC signature that the sender creates with a shared secret and the receiver recomputes (Hooklistener). HTTPS, a timestamp check against replay attacks and a constant-time comparison against timing attacks complete the setup. Shopware, for example, uses the shopware-shop-signature header (Shopware Docs).
What happens when a webhook delivery fails?
A resilient architecture retries delivery with exponential backoff and jitter. If it fails after all attempts, the event moves to a dead-letter queue for manual reprocessing (Hookdeck). Without this mechanism, 3-5% of events are typically lost (Hookdeck).
Why does a webhook architecture need a queue?
The queue decouples fast receiving from slower processing. The endpoint acknowledges immediately and places the event into the queue, then workers process it asynchronously. This absorbs load spikes, isolates failures and allows independent scaling. With an event-driven architecture, companies typically report significantly lower latency and higher throughput (IJSAT 2025).
Is a webhook architecture worthwhile for smaller shops as well?
As a rule yes, as soon as several systems have to work together reliably. Even with a few orders a day, a clean architecture prevents silent data loss that can lead to overselling in multi-channel retail (Uptrends). The effort scales: even a lean solution with signature verification, idempotency and simple retry logic already brings a large gain in reliability.