Intermittent failure with some Webhook connectors
Incident Report for Fivetran
Postmortem

Hi All,

On 2020-04-07, between 11:50 UTC and 21:25 UTC, customers observed sync failures for our webhook-based connectors.

Summary of the issue:

  • Multiple customers reported 404 errors when attempting to post data via webhooks.

Steps taken to resolve the issue:

  • During troubleshooting, we found that a change deployed at around 2020-04-07 11:50 UTC had started rejecting certain events with a 404 error.
  • We reverted the change at 2020-04-07 21:25 UTC.

Current Status:

  • Webhook connectors have been operating normally since the change was reverted.
  • Fivetran cannot recover the missed data for sources that do not provide APIs for retrieving historical data.

Customer Impact:

  • Customers have lost the data sent during the connector downtime (2020-04-07 11:50 UTC to 21:25 UTC) if the source has no APIs for retrieving that data and does not allow a retry via webhooks.

Action Needed from Customers: None

Steps to prevent/mitigate these risks in the future:

  • Add further monitoring for 5xx and 4xx errors.
  • Tighten the alerting thresholds so that an elevated percentage of errors triggers an alert sooner.
  • Add more test cases so that changes are tested thoroughly in the staging environment.
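
As an illustration of the monitoring and alerting items above, here is a minimal sketch of a sliding-window error-rate check for 4xx and 5xx responses. It is not Fivetran's actual monitoring code. The class name `ErrorRateMonitor`, its parameters, and the default window and threshold values are all hypothetical and chosen only for illustration. The sketch assumes events are recorded in non-decreasing timestamp order.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ErrorRateMonitor:
    """Hypothetical sketch: track the share of 4xx and 5xx responses
    over a sliding time window and decide whether to alert."""

    def __init__(self, window_seconds: float = 300.0,
                 threshold: float = 0.05, min_requests: int = 20) -> None:
        self.window_seconds = window_seconds  # length of the sliding window
        self.threshold = threshold            # error share that triggers an alert
        self.min_requests = min_requests      # avoid alerting on tiny samples
        self._events: Deque[Tuple[float, int]] = deque()

    def record(self, timestamp: float, status_code: int) -> None:
        """Store one response; timestamps are assumed to be non-decreasing."""
        self._events.append((timestamp, status_code))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop events that are older than the window."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def error_rates(self, now: float) -> Dict[str, float]:
        """Return the fraction of 4xx and 5xx responses inside the window."""
        self._evict(now)
        total = len(self._events)
        if total == 0:
            return {"4xx": 0.0, "5xx": 0.0}
        count_4xx = sum(1 for _, s in self._events if 400 <= s < 500)
        count_5xx = sum(1 for _, s in self._events if 500 <= s < 600)
        return {"4xx": count_4xx / total, "5xx": count_5xx / total}

    def should_alert(self, now: float) -> bool:
        """Alert when either error class reaches the threshold."""
        rates = self.error_rates(now)
        if len(self._events) < self.min_requests:
            return False
        return (rates["4xx"] >= self.threshold
                or rates["5xx"] >= self.threshold)
```

Tracking 4xx responses separately from 5xx responses matters for an incident like this one, because rejected events surfaced as 404 errors and would not show up in a check that watches server errors only.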

We appreciate your patience and help through the issue and apologize for the inconvenience caused.

Regards,

Fivetran Team

Posted Apr 10, 2020 - 23:16 UTC

Resolved
The incident has been resolved.
Posted Apr 07, 2020 - 22:53 UTC
Monitoring
We had to roll back some instances and the issue has been taken care of. We have been monitoring since.
Posted Apr 07, 2020 - 22:08 UTC
Investigating
We are currently investigating an issue that is causing intermittent failures (404 errors) with some of our webhook connectors.
Posted Apr 07, 2020 - 20:26 UTC
This incident affected: Web Application and Fivetran API.