Reduced customer data access availability for data connector customers

Incident Report for Explo Status

Postmortem

Timeline

At approximately 06:16 UTC our primary data store for our data connector service performed it's scheduled weekly credential rotation process. Our data connector service member nodes also experienced an uptick in inbound requests approximately 10 minutes later leading to automatic database connection pool upsize across nodes. Due to a misconfiguration of our credential provider in our infrastructure our credential provider did not request a credential refresh for another approximately 30 minutes and the previous credentials were returned until the scheduled refresh occurred. At approximately 06:50 all cluster member nodes had completed credential refreshes allowing database connections to be created again.

Impact

For approximately 25 minutes between 06:25 and 06:50 UTC approximately 15% of requests failed in our data connector service.

Resolution

We've modified our credential provider to use the data store's credential rotation schedule when refreshing credentials instead of internally scheduling periodic refreshes.

Posted Sep 27, 2024 - 16:19 UTC

Resolved

Reduced availability for customer data access

Posted Sep 27, 2024 - 06:00 UTC