Syncs for Data Lake Destinations Are Failing With Table Corruption Error

Incident Report for Fivetran

Resolved

Incident Summary:

What happened

On June 14, 2025, multiple connectors in Managed Data Lake destinations started failing with the "Data lake tables are corrupted" task.

Data lake tables become corrupt when either the metadata files of the table are deleted or the data files referenced in the snapshot are deleted.
During this incident, the tables were corrupted due to the unintentional deletion of some data files referenced in the snapshot.
This issue stemmed from a bug in a code change in the orphan file cleanup flow of Data Lake tables that was deployed on June 6, 2025. Although the change was deployed on June 6th, the issue began on June 14th (Saturday), as the orphan file cleanup is performed on alternate Saturdays.
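
To illustrate the failure mode, here is a minimal Python sketch of how an orphan file cleanup decides what to delete, and how a corruption check spots a snapshot that references missing files. The function names and the flat sets of file paths are hypothetical. This is not Fivetran's implementation, and the report does not describe the actual bug.

```python
def find_orphans(files_in_bucket: set[str], referenced: set[str]) -> set[str]:
    """Files safe to delete: present in storage but referenced by no snapshot."""
    return files_in_bucket - referenced


def find_missing(files_in_bucket: set[str], referenced: set[str]) -> set[str]:
    """Files whose absence corrupts the table: referenced but not in storage."""
    return referenced - files_in_bucket


# Hypothetical table state: two active files and one leftover file.
bucket = {"data/a.parquet", "data/b.parquet", "data/old.parquet"}
snapshot_refs = {"data/a.parquet", "data/b.parquet"}

orphans = find_orphans(bucket, snapshot_refs)

# A correct cleanup removes only orphans, so nothing referenced goes missing.
after_good_cleanup = bucket - orphans

# A faulty cleanup that also removes an active file leaves a dangling reference.
after_bad_cleanup = after_good_cleanup - {"data/b.parquet"}
```

In this sketch, `find_missing(after_bad_cleanup, snapshot_refs)` is non-empty, which is the condition the report describes as table corruption.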

Timeline (UTC, 24-hour format):
2025-06-14, 01:59 UTC - Detected multiple connectors failing with the "Data lake tables are corrupted" task.
2025-06-14, 07:45 UTC - Identified the root cause of the corrupted tables and reverted the problematic code change to prevent further connectors from being impacted.
2025-06-14, 16:19 UTC - Deployed a code change (behind a feature flag) that detects the tables affected by this issue and unreferences the deleted data files from the latest snapshot of each table.
2025-06-15, 09:26 UTC - Deployed a code change (behind a feature flag) that detects the tables affected by this issue and restores the unintentionally deleted data files in the AWS S3 bucket, provided bucket versioning is enabled.
2025-06-15, 16:38 UTC - The customer support team started reaching out to customers to check whether bucket versioning was enabled and to decide on the preferred solution for fixing the issue.

Resolution:

Fivetran presented the following two possible solutions for this problem:

1) Data Lake destinations with bucket versioning enabled:
a) Manual restoration: Customers could manually restore all the data files deleted from the bucket on June 14, 2025 using the UI of the cloud provider.
b) Automated restoration: Customers could make a change in the IAM policy to grant additional permission to the IAM role configured in their Data Lake destination, so that Fivetran could restore the deleted data files on behalf of the customer. (Applicable only for AWS S3)

2) Data Lake destinations without bucket versioning enabled:
Fivetran would unreference the deleted data files from the latest snapshot so that the successive syncs do not fail for the connector.
Fivetran would trigger a free resync to ensure data integrity and cover for any data loss due to the unintentionally deleted data files.
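
For option 1 b) above, the restoration can be sketched in Python as follows. In a versioned S3 bucket, deleting an object without specifying a version ID does not remove the data. It adds a delete marker, and removing that marker makes the previous version current again. The sketch assumes the cleanup deleted files this way, and that `s3` behaves like a boto3 S3 client. The function name, the prefix parameter, and the time window are hypothetical. The two calls used here require the `s3:ListBucketVersions` and `s3:DeleteObjectVersion` permissions. The report does not state which permissions Fivetran actually requested.

```python
from datetime import datetime, timezone

# Assumed window: the report says the files were deleted on June 14, 2025.
INCIDENT_START = datetime(2025, 6, 14, tzinfo=timezone.utc)
INCIDENT_END = datetime(2025, 6, 15, tzinfo=timezone.utc)


def restore_deleted_files(s3, bucket, prefix, start=INCIDENT_START, end=INCIDENT_END):
    """Remove delete markers created in [start, end) so prior versions reappear.

    Returns the list of object keys that were restored.
    """
    restored = []
    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for marker in page.get("DeleteMarkers", []):
            # Only undo deletes that are still in effect and fall in the window.
            if marker["IsLatest"] and start <= marker["LastModified"] < end:
                # Deleting the marker itself (by VersionId) undoes the delete.
                s3.delete_object(
                    Bucket=bucket, Key=marker["Key"], VersionId=marker["VersionId"]
                )
                restored.append(marker["Key"])
    return restored
```

Restricting the scan to a table prefix and to the incident window keeps the sketch from reviving files that were deleted intentionally at other times.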

Some customers proactively dropped the corrupted tables from the Data Lake destination after seeing the "Data lake tables are corrupted" task on the connector dashboard. However, the connector still failed in such cases because the table was still present in the Fivetran-managed Polaris catalog. For these connectors, Fivetran dropped the table from the Polaris catalog and performed a resync.

Impact:

149 Fivetran accounts and 460 Managed Data Lake destinations were affected.

Sync failures: When a deleted data file referenced in the snapshot had to be rewritten to merge it with the incoming data, the syncs would fail as the file did not exist in the bucket.

Temporary query outage: Queries on the corrupted Data Lake tables that tried to read the deleted data files failed because the files no longer existed.

Data integrity issue:

Solution 1 ensured data integrity for all connector services.
Solution 2 ensured data integrity for all connector services except those where the data source has a retention period. For such data sources, data older than the retention period may have been lost.
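
The repair step in Solution 2 can be pictured with the following Python sketch. The snapshot is modeled as a list of data files with record counts. The `DataFile` type, the function name, and the sample values are hypothetical and are not taken from Fivetran's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int


def unreference_missing(snapshot, existing_paths):
    """Split a snapshot into files that still exist and files that were deleted."""
    repaired = [f for f in snapshot if f.path in existing_paths]
    dropped = [f for f in snapshot if f.path not in existing_paths]
    return repaired, dropped


# Hypothetical snapshot in which one referenced file was deleted from the bucket.
snapshot = [DataFile("data/a.parquet", 100), DataFile("data/b.parquet", 250)]
repaired, dropped = unreference_missing(snapshot, {"data/a.parquet"})

# Rows in the dropped files are gone from the table until a resync reloads them.
rows_to_recover = sum(f.record_count for f in dropped)
needs_resync = rows_to_recover > 0
```

The repaired snapshot lets later syncs succeed, but the rows counted in `rows_to_recover` come back only if the source still holds them, which is why sources with a retention period are the exception noted above.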

What we will do moving forward:
- Enforce a stricter process for deploying code changes that could lead to critical data integrity issues.
- Stagger the orphan file cleanup process across the connectors to minimize the impact on the connectors.
- Always use a feature-flag-based slow rollout for such critical service areas to identify such issues early.
- Review our process for handling data integrity issues to ensure faster resolution and more effective communication.
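
One way to stagger a periodic job such as the orphan file cleanup is to assign each connector to a fixed slot derived from a hash of its ID, so that only a fraction of connectors is processed in any one run. The Python sketch below is an illustration under that assumption. The function names and the slot count are hypothetical, and the report does not say how Fivetran will implement staggering.

```python
import hashlib


def cleanup_slot(connector_id: str, num_slots: int) -> int:
    """Map a connector ID to a stable slot in [0, num_slots)."""
    digest = hashlib.sha256(connector_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def due_for_cleanup(connector_id: str, run_index: int, num_slots: int) -> bool:
    """A connector is cleaned only in the run that matches its slot."""
    return cleanup_slot(connector_id, num_slots) == run_index % num_slots
```

With `num_slots = 4`, each connector is cleaned in exactly one of every four consecutive runs, so a faulty change would reach only the connectors in the current slot before it could be detected.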
Posted Jun 23, 2025 - 16:55 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jun 23, 2025 - 13:55 UTC

Update

If you have Connection failures due to the "data_lake_corrupted" task, please raise a Support Ticket for assistance in resolving the issue. Please avoid dropping any tables before contacting support.

Communications have been sent to customers with AWS S3 destinations and communications for other destinations will follow soon.
Posted Jun 18, 2025 - 19:38 UTC

Update

If you have Connection failures due to the "data_lake_corrupted" task, please raise a Support Ticket for us to assist in resolving this issue.

We will be sending a communication to all affected customers with further details on this issue and the resolution.
Posted Jun 16, 2025 - 20:00 UTC

Update

If you have bucket versioning enabled, please raise a Support Ticket so that we can provide further guidance on resolving this issue.
Posted Jun 15, 2025 - 15:27 UTC

Update

Customers with bucket versioning enabled can proceed to restore all files for their bucket to allow the sync to complete successfully. For customers without bucket versioning enabled, our engineering team is actively working on determining the next steps and will provide further guidance shortly.
Posted Jun 15, 2025 - 02:59 UTC

Update

We have identified the root cause of the issue, which was related to orphan file cleanup deleting some active files. A hotfix has been deployed to prevent further impact. We are currently working on unlinking the deleted files to restore sync functionality. Our team continues to actively work on resolving this.
Posted Jun 14, 2025 - 15:12 UTC

Update

We are continuing to work on a fix for this issue.
Posted Jun 14, 2025 - 10:12 UTC

Identified

The issue has been identified and we are working to resolve it.
Posted Jun 14, 2025 - 07:45 UTC
This incident affected: Systems (Destinations).