Multiple connections with Managed Data Lake Service are failing in AWS region

Incident Report for Fivetran

Resolved

Incident Summary:

Description:
Connections to the Managed Data Lake Service in the AWS region were failing due to connectivity issues between Polaris and AWS STS.
Connections then hit a second issue and began failing with the following error:
"reason":"com.fivetran.warehouses.data_lake_v2.exception.UnifiedDataLakeException: org.apache.iceberg.exceptions.ValidationException: Found conflicting files that can contain records [...] "

Timeline:
The first issue began on Aug 11 at 09:48 UTC and was resolved on Aug 11 at 19:30 UTC.
The second issue began on Aug 11 at 19:30 UTC and was resolved on Aug 13 at 13:54 UTC.

Cause:
A recent change to the retry mechanism increased the number of connections. Combined with ongoing customer migrations onto the Polaris service, this produced high traffic volume through Cloud NAT in the GCP region us-east4, where Polaris runs.

Resolution:
The Polaris retry mechanism change has been reverted, and the Cloud NAT minimum ports per VM has been increased from the default of 64 to 512.
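To show what the port change means in practice, here is a small worked calculation in Python. It assumes Cloud NAT static port allocation, where each NAT IP address provides 64,512 usable source ports (65,536 minus the 1,024 well-known ports) and each VM reserves its configured minimum. The function names are hypothetical, and the figures are derived only from the 64 and 512 values above, not from Polaris's actual deployment.

```python
# Illustrative Cloud NAT port arithmetic, assuming static port allocation.
# Function names are hypothetical; only the 64 and 512 values come from the report.

# Each NAT IP address has 64,512 usable source ports
# (65,536 total minus the 1,024 well-known ports).
PORTS_PER_NAT_IP = 65_536 - 1_024  # 64,512


def vms_per_nat_ip(min_ports_per_vm: int) -> int:
    """Number of VMs that can share one NAT IP when each reserves min_ports_per_vm ports."""
    return PORTS_PER_NAT_IP // min_ports_per_vm


def connections_per_destination(min_ports_per_vm: int) -> int:
    """Concurrent connections one VM can hold to a single destination IP and port.

    Each simultaneous connection to the same destination needs its own
    source port, so the VM's allocated port count is the ceiling.
    """
    return min_ports_per_vm


for ports in (64, 512):
    print(
        f"min ports per VM = {ports}: "
        f"{connections_per_destination(ports)} concurrent connections per destination, "
        f"{vms_per_nat_ip(ports)} VMs per NAT IP"
    )
```

Under these assumptions, raising the minimum from 64 to 512 gives each VM eight times the headroom for simultaneous connections to one endpoint, such as AWS STS. The trade-off is that each NAT IP address can serve fewer VMs (126 instead of 1,008), so more NAT IP addresses may be needed.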

Posted Aug 15, 2025 - 16:30 UTC

Monitoring

We have deployed a fix that rolls back affected snapshots and recovers them to the latest stable version.

Affected connectors are resuming their normal sync functionality and we are continuing to monitor progress.

Posted Aug 14, 2025 - 23:27 UTC

Update

We are continuing to work on a fix for this issue. In certain cases, we will also contact customers directly to resolve the issue.

Posted Aug 14, 2025 - 08:00 UTC

Update

We are making code changes to roll back to the previous snapshot if the table is found to be corrupted. We will share further updates as soon as more information becomes available.
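The rollback approach can be illustrated with a simplified model of an Iceberg table's snapshot history, in which each snapshot points to its parent. The Python sketch below walks back from the current snapshot to the most recent one that passes a validity check. The `Snapshot` class, `find_rollback_target`, and the `is_valid` predicate are hypothetical stand-ins, since the update does not say how corruption is detected. In Apache Iceberg itself, the current snapshot can then be reset with the Java API's `ManageSnapshots.rollbackTo(snapshotId)`, which expects the target to be an ancestor of the current snapshot, matching the parent-chain walk here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified view of one table snapshot."""
    snapshot_id: int
    parent_id: Optional[int]  # None for the table's first snapshot


def find_rollback_target(
    snapshots: Dict[int, Snapshot],
    current_id: int,
    is_valid: Callable[[Snapshot], bool],
) -> Optional[int]:
    """Walk the parent chain from the current snapshot and return the id of the
    most recent snapshot that passes `is_valid`, or None if none does."""
    snapshot_id: Optional[int] = current_id
    while snapshot_id is not None:
        snap = snapshots[snapshot_id]
        if is_valid(snap):
            return snap.snapshot_id
        snapshot_id = snap.parent_id  # step back to the previous snapshot
    return None


# Example history: 1 <- 2 <- 3 <- 4 (current); snapshots 3 and 4 are corrupted.
history = {
    1: Snapshot(1, None),
    2: Snapshot(2, 1),
    3: Snapshot(3, 2),
    4: Snapshot(4, 3),
}
corrupted = {3, 4}
target = find_rollback_target(
    history, current_id=4, is_valid=lambda s: s.snapshot_id not in corrupted
)
print(target)  # 2
```

Walking back to the latest valid ancestor, rather than only the immediate parent, also covers the case where more than one recent snapshot is affected.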

Posted Aug 12, 2025 - 19:00 UTC

Update

We are continuing to investigate the root cause and work on a fix for this issue.

Posted Aug 12, 2025 - 07:22 UTC

Update

We are currently observing sync failures across multiple connectors, with the error "Found conflicting files that can contain records matching".

Our team is actively investigating the root cause of this issue. We will provide further updates as soon as more information becomes available.

Posted Aug 12, 2025 - 02:54 UTC

Update

We have deployed a fix that reduces the number of retries, and we have increased the minimum number of Cloud NAT ports per VM to handle the large number of requests. This has led to a reduction in failures, but some connectors are still affected.

We are still investigating intermittent connectivity issues with Polaris for the remaining affected connectors.
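As a hedged sketch of the general technique (not Fivetran's or Polaris's actual code), the Python below shows a retry loop with a small, fixed attempt budget and exponential backoff with jitter. `call_with_capped_retries`, its parameters, and the `flaky_fetch` example are hypothetical. A low attempt cap bounds how many extra outbound connections each failing call can generate, which is the kind of connection growth named in the Cause section.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_capped_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, making at most `max_attempts` attempts in total.

    Only connection errors and timeouts are retried. Between attempts the
    delay cap grows exponentially and the actual delay is drawn uniformly
    from [0, cap] ("full jitter") so many clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # budget exhausted: surface the last error
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")


# Example: an operation that times out twice, then succeeds.
calls = []


def flaky_fetch() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("Connect timed out")
    return "subscoped-credentials"


result = call_with_capped_retries(flaky_fetch, sleep=lambda _: None)
print(result, len(calls))  # subscoped-credentials 3
```

Injecting `sleep` keeps the example fast and testable; in production the default `time.sleep` applies the backoff delays.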

Posted Aug 11, 2025 - 22:00 UTC

Update

We are still working on a fix for this issue.

Posted Aug 11, 2025 - 19:01 UTC

Update

We are continuing to work on a fix for this issue.

Posted Aug 11, 2025 - 16:51 UTC

Update

We are currently investigating an issue affecting multiple connections, which are failing with the following error:
"org.apache.iceberg.exceptions.RESTException: Unable to process: Failed to get subscoped credentials: Unable to execute HTTP request, Connect timed out"

Posted Aug 11, 2025 - 14:56 UTC

Identified

The issue has been identified and we are working to resolve it.

Posted Aug 11, 2025 - 14:50 UTC

This incident affected: Systems (Destinations).