Exalate nodes unreachable

Incident Report for Exalate

Postmortem

Incident: Service Outage - December 12, 2025

Duration: 16:35 CET on December 12 to 03:01 CET on December 13 (approximately 10.5 hours)

Impact: A significant number of Exalate Cloud nodes experienced service unavailability, resulting in temporary disruption to data synchronization services. No data was lost during this incident.

Summary: On December 12, 2025, our infrastructure monitoring detected a service outage affecting Exalate Cloud customers on our production clusters. The root cause was identified as a storage system failure triggered by network latency on a legacy infrastructure component, which caused storage connectivity issues for customer workloads.

Timeline:

16:35 CET - Issue detected by monitoring systems
17:31 CET - Root cause identified, restoration initiated
21:20 CET - Majority of nodes restored to service
03:01 CET (December 13) - Full service restoration confirmed

Root Cause: A storage management component became unresponsive due to elevated network latency caused by a legacy networking layer. This resulted in storage disconnection for customer workloads.

Resolution: The storage system was restored, all affected workloads were rescheduled, and an external dependency was updated to a supported version. All nodes were returned to full operation with no data loss.

Preventive Measures:

Deploying patches to address the storage component stability issue
Accelerating migration away from the legacy networking layer to modern infrastructure
Implementing internal mirroring of critical external dependencies to eliminate reliance on third-party availability
Enhancing operational safeguards for infrastructure management procedures

We sincerely apologize for the disruption this incident caused. Our team remains committed to improving the reliability and resilience of the Exalate Cloud platform.

Posted Dec 22, 2025 - 05:45 CET

Resolved

All Exalate nodes are stable and working normally.
A post-mortem for the incident will be published in due course.

Posted Dec 13, 2025 - 05:26 CET

Monitoring

All nodes are back to operational status now.
We continue to monitor the stability closely.

Posted Dec 13, 2025 - 03:56 CET

Update

Most nodes are back to operational status. A full operational message will be posted once all nodes are back.

Posted Dec 13, 2025 - 02:51 CET

Update

Most nodes are back to operational status. A full operational message will be posted once all nodes are back.

Posted Dec 13, 2025 - 00:11 CET

Update

Nodes are being restored and most are now online. A full operational message will be posted once all nodes are back online.

Posted Dec 12, 2025 - 21:20 CET

Update

We continue to work on the restoration of the affected nodes.
We will be providing a new update in one hour.

Posted Dec 12, 2025 - 20:11 CET

Update

We continue to work on the restoration of the affected nodes.
We will be providing a new update in one hour.

Posted Dec 12, 2025 - 19:12 CET

Update

The recovery process is still ongoing.
We will provide an update within an hour.

Posted Dec 12, 2025 - 17:58 CET

Identified

We have identified the root cause of the issue, and the restoration process is already underway.
The next update will be provided in 30 minutes.

Posted Dec 12, 2025 - 17:31 CET

Investigating

There is an outage on one of the clusters in Exalate Cloud that might temporarily affect access to the nodes. We are investigating the issue and working to restore service as soon as possible.

A new update will be provided within 1 hour.

Posted Dec 12, 2025 - 17:15 CET

This incident affected: Exalate Cloud (connect.exalate.net (mapper), connect.exalate.cloud), Zendesk (Exalate Console), Jira Cloud (Synchronisation node), Azure DevOps (Exalate for Azure DevOps), Service Now (Exalate for ServiceNow in Exalate Cloud), GitHub (Exalate for GitHub), Salesforce (Exalate for SalesForce), Exalate for Freshdesk in Exalate Cloud (Freshdesk), Exalate for Freshservice in Exalate Cloud (Freshservice), and Exalate.app.