Some Exalate nodes unavailable

Incident Report for Exalate

Postmortem

Executive Summary

On September 24, 2025, Exalate experienced a service interruption lasting 17 hours and 28 minutes that affected our cloud-hosted integration nodes. During this time, customers were unable to synchronize data between their integrated systems. No customer data was lost. We sincerely apologize for the inconvenience and want to share what happened and how we're preventing future issues.

What Happened

Timeline:

  • 10:02 UTC (Sept 24): Infrastructure issue detected through customer reports
  • 13:00 UTC: Partial service restoration achieved
  • 13:45 UTC: Secondary technical issue caused complete service unavailability
  • 17:24 UTC: Core infrastructure restored
  • 21:00 UTC: Priority customer services online
  • 03:30 UTC (Sept 25): Full service restoration completed
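
The total duration quoted in the Executive Summary can be checked against the first and last timeline entries. The following is a minimal sketch of that arithmetic; the variable names and the `to_cest` helper are illustrative assumptions, and the only facts used are the two UTC timestamps above and the fixed UTC+2 offset of CEST, in which the status updates below are posted.

```python
from datetime import datetime, timedelta, timezone

# First and last timeline entries, as listed above (UTC).
detected = datetime(2025, 9, 24, 10, 2, tzinfo=timezone.utc)
restored = datetime(2025, 9, 25, 3, 30, tzinfo=timezone.utc)

# Elapsed time between detection and full restoration.
duration = restored - detected
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60
print(f"Outage duration: {hours} hours {minutes} minutes")

# CEST is UTC+2; this hypothetical helper converts a timeline entry
# so it can be compared with the "Posted ... CEST" update stamps.
CEST = timezone(timedelta(hours=2), name="CEST")

def to_cest(ts: datetime) -> datetime:
    """Return the same instant expressed in CEST."""
    return ts.astimezone(CEST)

print("Detected (CEST):", to_cest(detected).strftime("%b %d, %H:%M"))
print("Restored (CEST):", to_cest(restored).strftime("%b %d, %H:%M"))
```

Using timezone-aware datetimes keeps the subtraction correct across the midnight boundary between Sept 24 and Sept 25.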

Root Cause: A network connectivity issue on our hosting platform triggered a cascading infrastructure failure. The recovery process was complex due to database resilience challenges and infrastructure management system complications.

Customer Impact

During the outage:

  • Data synchronization between systems (Jira, ServiceNow, etc.) was unavailable
  • Automated workflow processes were temporarily halted

What was NOT affected:

  • No customer data was lost or corrupted
  • All existing synchronized data remained intact
  • Customer configurations and sync histories were preserved

Our Response

We immediately activated our 24/7 incident response team, maintained continuous status page updates, directly contacted Enterprise customers, and coordinated with infrastructure providers throughout the recovery.

Prevention Measures

We're implementing comprehensive improvements on an accelerated timeline:

Immediate (October 2025):

  • Enhanced infrastructure monitoring and alerting systems
  • Comprehensive disaster recovery documentation

Short-term (November 2025):

  • Infrastructure resilience improvements
  • Automated recovery procedures
  • Regular disaster recovery testing

Medium-term (Q1 2026):

  • Multi-cloud architecture implementation
  • Advanced predictive monitoring

Customer Support

If you need assistance related to this outage:

  • Enterprise Customers: Use your dedicated support channels
  • Standard Support: Submit tickets through our support portal
  • Status Updates: Monitor our status page for ongoing information

We apologize for this service disruption and appreciate your patience. Your trust is essential to our business, and we're committed to earning it through reliable service delivery and continuous improvement.

Posted Oct 10, 2025 - 12:42 CEST

Resolved

We have extensively monitored cluster health and found no outstanding issues.
Posted Sep 25, 2025 - 16:21 CEST

Monitoring

All nodes are fully restored and functioning normally now.
We continue to monitor the cluster carefully to ensure stability.
Posted Sep 25, 2025 - 05:23 CEST

Update

We are continuing the scale-up of the Exalate nodes. The team is monitoring closely to ensure stability throughout this process.

We will provide the next update in one hour.
Posted Sep 25, 2025 - 03:44 CEST

Update

The previous technical challenges have been addressed. We have re-initiated the scale-up of the Exalate nodes and are closely monitoring the environment for stability.

We will provide the next update within one hour.
Posted Sep 25, 2025 - 02:48 CEST

Update

During the scale-up phase, the restoration process has presented some challenges that are taking longer than initially expected to resolve.
The engineering team is actively working to address these stability issues.

We will provide the next update within one hour.
Posted Sep 25, 2025 - 01:50 CEST

Update

The engineering team continues working on the main restoration phase.
We'll continue to monitor stability closely.

Further updates to be expected in an hour.
Posted Sep 25, 2025 - 00:49 CEST

Update

Following the deliberate, multi-step sequence, the engineering team has now moved into the main restoration phase.
We are actively scaling up the restart of the affected Exalate nodes. We'll continue to monitor stability closely.

We will provide the next update in one hour.
Posted Sep 24, 2025 - 23:45 CEST

Update

The team is actively working through the restart sequence for all individual Exalate nodes. This is a deliberate, multi-step process to ensure stability upon full restoration.

Further updates to be expected in an hour.
Posted Sep 24, 2025 - 22:51 CEST

Update

We continue restarting the individual Exalate nodes impacted by the failure.
5% of the affected nodes have been brought back online.
Further updates to be expected in an hour.
Posted Sep 24, 2025 - 21:54 CEST

Update

Infrastructure component failures have been addressed by our Engineering team.
Next step: We are restarting the individual Exalate nodes impacted by the failure.
More information will be provided in an hour.
Posted Sep 24, 2025 - 20:47 CEST

Update

Node recovery has presented some challenges and is taking longer than expected.
We are working to bring all nodes back online as soon as possible.
The next update will be provided within 2 hours.
Posted Sep 24, 2025 - 18:56 CEST

Identified

Our monitoring has uncovered a lingering problem with the cluster.
We are working to restore full functionality as soon as possible.
Posted Sep 24, 2025 - 16:50 CEST

Monitoring

All nodes are fully recovered.
We continue to closely monitor the situation.
Posted Sep 24, 2025 - 15:00 CEST

Update

The recovery process is still ongoing.
We will provide an update within an hour.
Posted Sep 24, 2025 - 13:47 CEST

Identified

We have identified the root cause of the issue, and the restoration process is already underway.
The next update will be provided in 30 minutes.
Posted Sep 24, 2025 - 13:18 CEST

Investigating

There is an outage on one of the clusters in Exalate cloud that might affect access to the nodes temporarily. We are currently investigating the issue to ensure that service is restored as soon as possible.
Posted Sep 24, 2025 - 12:56 CEST
This incident affected: Zendesk (Exalate Console), Jira Cloud (Synchronisation node), Azure DevOps (Exalate for Azure DevOps), Service Now (Exalate for ServiceNow in Exalate Cloud), GitHub (Exalate for GitHub), Salesforce (Exalate for SalesForce), Exalate for Freshdesk in Exalate Cloud (Freshdesk), and Exalate for Freshservice in Exalate Cloud (Freshservice).