At approximately 06:16 UTC our primary data store for our data connector service performed it's scheduled weekly credential rotation process. Our data connector service member nodes also experienced an uptick in inbound requests approximately 10 minutes later leading to automatic database connection pool upsize across nodes. Due to a misconfiguration of our credential provider in our infrastructure our credential provider did not request a credential refresh for another approximately 30 minutes and the previous credentials were returned until the scheduled refresh occurred. At approximately 06:50 all cluster member nodes had completed credential refreshes allowing database connections to be created again.
For approximately 25 minutes between 06:25 and 06:50 UTC approximately 15% of requests failed in our data connector service.
We've modified our credential provider to use the data store's credential rotation schedule when refreshing credentials instead of internally scheduling periodic refreshes.