Postgres - Error connecting to source db seen in dashboard
Incident Report for Fivetran Inc
Postmortem

Root Cause:

A change to the boolean logic that detects empty Postgres LSNs (log sequence numbers) sent execution down the wrong code path. This reset the Postgres LSNs for all connectors that synced between the daily deployment (~9:30 am PDT) and the revert of that change (~11:30 am PDT).
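As a purely hypothetical illustration of this failure mode, the sketch below shows how negating an "is the saved LSN empty?" check can send a connector with a valid saved position down the reset path. The function names, the LSN strings, and the assumption that a "reset" means restarting from the server's current position are all invented for the example; this is not Fivetran's actual code.

```python
from typing import Callable, Optional

# Hypothetical connector state: the last saved WAL position (LSN) as a string
# such as "16/B374D848", or None / "" when no position has been saved yet.


def lsn_is_empty(saved_lsn: Optional[str]) -> bool:
    """Intended check: True only when no LSN has been saved."""
    return saved_lsn is None or saved_lsn.strip() == ""


def lsn_is_empty_inverted(saved_lsn: Optional[str]) -> bool:
    """Illustrative bug: the condition is negated, so a valid LSN looks 'empty'."""
    return not (saved_lsn is None or saved_lsn.strip() == "")


def next_start_position(
    saved_lsn: Optional[str],
    current_server_lsn: str,
    is_empty: Callable[[Optional[str]], bool],
) -> Optional[str]:
    # "Empty" state: start over from the server's current position (a reset).
    # Otherwise: resume replication from the saved LSN.
    if is_empty(saved_lsn):
        return current_server_lsn
    return saved_lsn


saved, server = "16/B374D848", "16/C0000000"
print(next_start_position(saved, server, lsn_is_empty))           # 16/B374D848 (resume)
print(next_start_position(saved, server, lsn_is_empty_inverted))  # 16/C0000000 (reset)
```

With the inverted check, every connector that already has a saved position is treated as new, which matches the symptom described above: LSNs were reset for every connector that synced while the change was live.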

Timeline:

11/17/2020 ~9:30 am PDT - The first issues appeared after the day's deployment

11/17/2020 ~11:30 am PDT - The problematic PR was reverted, and the deployment was rolled back to the 11/16 build

Resolution:

The change was reverted which fixed the immediate failures and prevented any further incorrect state changes.

XMIN Postgres connectors recovered once the change was reverted, because their states were unaffected.

We identified the affected WAL Postgres connectors and have emailed the affected customers with next steps for recovery.

Future Plans:

Improved release management monitoring process: we are scoping out additional monitoring to detect spikes in connector failures after a code release.
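One hypothetical way such a post-release check could work is to compare the connector failure rate after a deployment against a pre-deployment baseline. The function name, the 3x `factor`, and the 1% `min_rate` floor below are illustrative assumptions only, not a description of Fivetran's planned monitoring.

```python
def failure_rate(failed: int, total: int) -> float:
    """Fraction of syncs that failed; 0.0 when there were no syncs."""
    return failed / total if total else 0.0


def is_failure_spike(
    baseline_failed: int,
    baseline_total: int,
    post_failed: int,
    post_total: int,
    factor: float = 3.0,     # assumed: post-release rate must exceed 3x baseline
    min_rate: float = 0.01,  # assumed: ignore tiny absolute failure rates
) -> bool:
    base = failure_rate(baseline_failed, baseline_total)
    post = failure_rate(post_failed, post_total)
    # Flag only when the rate is both non-trivial and a clear jump over baseline.
    return post >= min_rate and post > factor * base


print(is_failure_spike(10, 1000, 50, 1000))  # True: 5% vs. a 1% baseline
print(is_failure_spike(10, 1000, 12, 1000))  # False: 1.2% is within 3x of 1%
```

Requiring both a relative jump and an absolute floor keeps a quiet baseline (for example, zero failures) from turning a single post-release failure into an alert.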

Posted Nov 19, 2020 - 20:17 UTC

Resolved
This incident has been resolved.
Posted Nov 18, 2020 - 03:03 UTC
Monitoring
The code change has been reverted, and we are observing successful syncs again. All Postgres connectors running WAL replication may have been affected.

We are investigating possible data integrity issues as a result of this incident.
Posted Nov 17, 2020 - 21:08 UTC
Identified
A recent code change in the Aurora Postgres connector has been identified as the root cause.
The change is being reverted now, and the revert will be complete by 2pm PDT.
Posted Nov 17, 2020 - 19:59 UTC
Investigating
Users may see the following error in the connector logs:
"ERROR: Function pg_current_xlog_location() is currently not supported for Aurora"

We are currently investigating the root cause. An ETA for a fix will be posted as soon as possible.
Posted Nov 17, 2020 - 19:18 UTC
This incident affected: Fivetran Services (Replication Servers).