Explo system-wide outage

Incident Report for Explo Status

Update

Heroku has not indicated that services are fully restored, but based on our logs and metrics, the backend service appears healthy at this time.

Our team continues to monitor closely, but services seem to be back to normal.
Posted Jun 10, 2025 - 23:52 UTC

Update

A quick update: we are ready to cut over to the AWS service and have instructions and context ready to go. However, we held off given Heroku's most recent update.

Following that update, we attempted to reboot the nodes and then stopped them, as Heroku suggested. We are now seeing nodes come back up and respond to requests properly. Based on the information we have, things should slowly return to a stable state. Heroku has not marked the incident as resolved, so we don't think we are fully in the clear yet, but embedded dashboards and report builders should load again.

Marking as degraded performance for now as we continue to monitor.
Posted Jun 10, 2025 - 22:27 UTC

Update

New update from Heroku: "The Heroku dashboard is now back online. Customers can now attempt to recycle their dynos on the dashboard. Specific instructions for recycling dynos will be shared shortly."

We don't yet know exactly what this implies and are testing on our side.

Meanwhile, the team is doing some final testing on the AWS service.

We will update shortly based on what we learn from both efforts.
Posted Jun 10, 2025 - 22:05 UTC

Update

Update from Heroku: "We don't have additional information to share at this time. Our efforts remain focused on internal testing and validation as we continue to see incremental improvements. We will provide a resolution timeline as soon as possible, and we apologize for the continued trouble."

Our update: we are in the process of hydrating the AWS service's database from a backup. Once that is done, along with a few other items, we will be very close to having the backstop in place. The backstop will have restrictions, as we are optimizing for getting embeds working for end users rather than building a full-blown replacement. More info to come.

Thank you all so much for being patient and understanding.
Posted Jun 10, 2025 - 21:00 UTC

Update

This is the latest from Heroku: "We are observing positive results in our initial testing on pre-production environments. We're continuing to test the fix before applying it to production environments and will provide a resolution timeline once determined."

Taking this with a grain of salt, but sending it along for transparency.
Posted Jun 10, 2025 - 19:54 UTC

Update

Here is the latest from Heroku: "We are actively developing the fix and working to determine a resolution timeline. Once the fix is complete, it will undergo testing in our pre-production environments to validate its effectiveness."

Heroku and Salesforce are continuing to iterate on their fix, and we hope it will be available soon. Our engineering team continues to push hard on moving the backend service from Heroku to AWS. Given the restrictions we mentioned previously, there are clear roadblocks, but we feel optimistic about getting a backstop in place soon. This backstop will be independent of the fix Heroku is working on.
Posted Jun 10, 2025 - 19:42 UTC

Update

Latest note from Heroku: "We've restarted Heroku's internal platform apps and are actively monitoring to confirm whether the issue is resolved. However, errors are resurfacing." You can follow along here: https://status.salesforce.com/generalmessages/10001540

On the Explo side, we are simultaneously trying to move off Heroku. However, some of our secrets (passwords and hidden keys for the service) are stored in Heroku, and we are currently unable to access them. This makes it difficult to spin up the replacement service. We are tracking some leads and ideas that we hope will get it up and running.
Posted Jun 10, 2025 - 16:28 UTC

Update

We just received this hopeful update from Heroku: "We have restarted a subset of our Heroku instances. This has shown positive results for the impacted services. We are working to restart the rest of the Heroku instances to fully restore the services. We are continuing to validate and closely monitor the services."
Posted Jun 10, 2025 - 14:49 UTC

Update

Heroku services are slowly and intermittently coming back up. We verified this by repeatedly checking a wide range of requests and by following Heroku's updates. Full recovery will take some time, but we are finally seeing some requests succeed again. We are unable to access the internal Heroku resources that we can usually log into, so we can't get any more information on our own.
Posted Jun 10, 2025 - 14:10 UTC

Update

We are still actively monitoring this outage with our cloud provider, Heroku. Their latest update is that their parent company, Salesforce, has identified network configuration issues on one of the servers in the Heroku environment, which could be causing this issue. They are monitoring closely and will update us when they have more information.
Posted Jun 10, 2025 - 13:00 UTC

Identified

We've identified the cause as an issue with our cloud provider, Heroku, and are actively monitoring their status here: https://status.heroku.com/
Posted Jun 10, 2025 - 08:14 UTC

Update

We are actively monitoring this issue that Heroku is experiencing: https://status.salesforce.com/generalmessages/10001540
Posted Jun 10, 2025 - 07:52 UTC

Update

We are continuing to investigate this issue.
Posted Jun 10, 2025 - 07:45 UTC

Investigating

We are currently investigating an issue affecting all Explo services. We suspect there is an issue with one of our providers, Heroku.
Posted Jun 10, 2025 - 07:44 UTC
This incident affects: Explo Dashboards, Explo Report Builder, Exports: Email delivery, Exports: Image and PDF format, GovCloud Data Connector, Legacy Query Infrastructure (Customer Data Access, Dashboard Embeds, Report Builder Embeds, Tabular Exports), and Data Connector Infrastructure (Customer Data Access, Dashboard Embeds, Report Builder Embeds, Tabular Exports).