Webhook retry mechanism
When you receive events through webhooks, your system must be able to cope with delivery failures. Our webhook implementation uses AWS EventBridge, which automatically retries deliveries that fail.
Retry strategy
By default, Adfin retries sending an event for up to 24 hours. During this period, the service retries delivery up to 185 times. Retries follow an exponential backoff strategy with added jitter (a randomised delay), which avoids overwhelming your server with requests during high-traffic periods or outages.
Exponential Backoff
Exponential backoff is a standard error-handling strategy for network applications in which the sender increases the wait time between retries exponentially. For example, the wait time may double after each failed attempt, so each retry comes later than the one before it. Spreading retries out in this way gives a struggling endpoint time to recover instead of being hit with a steady stream of requests.
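To make the doubling concrete, here is a minimal sketch of a backoff schedule. The `base` and `cap` values and the function name `backoff_delay` are illustrative assumptions; they are not the values Adfin or EventBridge actually use.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 3600.0) -> float:
    """Return the wait in seconds before retry number `attempt` (0-based).

    `base` and `cap` are hypothetical values chosen for illustration only.
    The delay doubles with each attempt and never exceeds `cap`.
    """
    return min(cap, base * (2 ** attempt))


# Delays before the first six retries under these assumed parameters.
delays = [backoff_delay(n) for n in range(6)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

The cap keeps later delays from growing without bound, which is how a schedule of this kind can fit many retries into a fixed window.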
Jitter
Jitter adds a degree of randomness to each delay. Without it, deliveries that failed at the same moment would all be retried at the same moment. Randomising the delays spreads those retries out, which reduces the likelihood of simultaneous retries and improves overall reliability.
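As an illustration, the sketch below applies one common variant, often called "full jitter", in which each delay is drawn uniformly between zero and the exponential ceiling. This page does not state which jitter variant is used, so treat the variant, the `base` and `cap` values, and the function name `jittered_delay` as assumptions.

```python
import random


def jittered_delay(attempt: int, base: float = 1.0, cap: float = 3600.0,
                   rng=random) -> float:
    """Return a randomised wait in seconds before retry number `attempt` (0-based).

    Assumed "full jitter" variant: pick uniformly between 0 and the capped
    exponential ceiling. `base` and `cap` are hypothetical values.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# A seeded generator makes the example repeatable; real retries would not be seeded.
seeded = random.Random(42)
for attempt in range(5):
    print(attempt, round(jittered_delay(attempt, rng=seeded), 2))
```

Each printed delay falls somewhere below the ceiling for that attempt, so two senders that failed together are unlikely to retry at the same instant.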
What to Expect
If delivery of an event to your webhook endpoint fails, you can expect the following behavior:
- Retry Duration: Events will be retried for up to 24 hours.
- Maximum Attempts: A maximum of 185 retries will be attempted during this time.
- Backoff Strategy: The wait time between retries will increase exponentially.
- Randomisation: Delays will include a randomised component (jitter) to prevent spikes in traffic.
For more in-depth information on the principles of exponential backoff and jitter, you can refer to the AWS blog post on the topic.
Best Practices
To ensure your webhook can effectively handle incoming events, consider the following best practices:
- Idempotency: Implement idempotent operations in your webhook handler to safely process duplicate events.
- Error Logging: Maintain logs of failed attempts to diagnose and resolve issues promptly.
- Monitoring: Set up monitoring and alerting to keep track of webhook performance and retries.
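The idempotency and error-logging practices above can be sketched as follows. This is a minimal, framework-free illustration under several assumptions: that each event carries a unique identifier (the `id` field name is hypothetical), that a non-2xx status is treated as a delivery failure and triggers a retry, and that an in-memory set stands in for the durable store a real deployment would need.

```python
import logging

logger = logging.getLogger("webhook")


class IdempotentHandler:
    """Processes each event at most once, keyed by an assumed unique event ID."""

    def __init__(self, process):
        self._process = process   # callable that does the real work
        self._seen = set()        # in-memory only; use a durable store in production

    def handle(self, event: dict) -> int:
        """Return the HTTP status code to send back to the webhook sender."""
        event_id = event["id"]    # hypothetical field name
        if event_id in self._seen:
            # A retry of an event that was already processed: acknowledge and skip.
            logger.info("duplicate event %s ignored", event_id)
            return 200
        try:
            self._process(event)
        except Exception:
            # Log the failure and signal an error so the event can be retried
            # (assumes non-2xx responses count as delivery failures).
            logger.exception("processing failed for event %s", event_id)
            return 500
        self._seen.add(event_id)
        return 200


processed = []
handler = IdempotentHandler(processed.append)
handler.handle({"id": "evt_1", "type": "example"})
handler.handle({"id": "evt_1", "type": "example"})  # duplicate delivery
print(len(processed))  # 1
```

The event ID is recorded only after processing succeeds, so a failed attempt remains eligible for a later retry.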