
How Rakuten SixthSense can help an SRE team quickly zero in on the root cause of a critical service degradation under high traffic load

...

Incident Summary

On August 1, 2023, between 2:00 PM and 4:30 PM (UTC), one of your critical web applications experienced service degradation and was intermittently unresponsive, impacting customer access to the platform. The incident was triggered by a surge in traffic, which increased server load and reduced application responsiveness.

Incident Impact

  • Approximately 30% of HTTP requests experienced delayed responses or timeouts during the incident.
  • HTTP 4XX errors increased by around 20%.
  • Customer experience was significantly impacted, leading to a 15% increase in customer support inquiries.
  • Potential revenue was lost due to a 10% increase in cart abandonment during the incident period.
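
Figures like the ones above are typically relative increases between a baseline window and the incident window. The snippet below is a minimal sketch of that arithmetic only; the function name and the sample counts are hypothetical and are not taken from the incident data or from any Rakuten SixthSense API.

```python
def relative_increase(baseline: float, incident: float) -> float:
    """Return the relative change from baseline to incident (0.2 means +20%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative increase")
    return (incident - baseline) / baseline


# Hypothetical counts: 500 4XX responses in a normal window, 600 during the incident.
increase = relative_increase(500, 600)
print(f"4XX errors changed by {increase:+.0%}")
```

The same helper applies to support inquiries or cart abandonment, as long as both windows cover comparable time spans.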

Rakuten SixthSense APM page

2:00 PM: DevOps engineers log in to Rakuten SixthSense and look at the APM dashboard.

Request Correlation

2:01 PM: They quickly identify a sharp increase in incoming HTTP requests and suspect a traffic surge.
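
A traffic surge of this kind can also be flagged programmatically. The sketch below is one simple approach, not how Rakuten SixthSense detects surges: it compares each minute's request count with the median of the preceding minutes. The function name, the window size, and the 2x threshold are all assumptions chosen for illustration.

```python
from statistics import median


def find_surge_minutes(counts, baseline_window=5, factor=2.0):
    """Return indexes of minutes whose request count exceeds
    `factor` times the median of the preceding `baseline_window` minutes."""
    surges = []
    for i in range(baseline_window, len(counts)):
        baseline = median(counts[i - baseline_window:i])
        if baseline > 0 and counts[i] > factor * baseline:
            surges.append(i)
    return surges


# Hypothetical per-minute request counts; the last two minutes spike.
per_minute = [100, 110, 90, 105, 95, 100, 400, 420]
print(find_surge_minutes(per_minute))
```

Using the median rather than the mean keeps the baseline stable for the first few minutes of a spike, so a sustained surge continues to be flagged.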


2:05 PM: The DevOps engineers also identify a significant increase in HTTP 4XX error rates.


2:15 PM: They quickly check the error traces from the dashboard.


2:16 PM: Database query performance is shown in the same frame alongside the application metrics.


2:17 PM: The application-database trace reveals an unexpected query pattern: a slow query that is causing database contention.
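
The idea behind this step, finding the database statement that dominates the time in a trace, can be sketched in a few lines. The span dictionaries below use a hypothetical shape (`kind`, `statement`, `duration_ms`) invented for this example; they do not reflect the actual Rakuten SixthSense span format or API.

```python
from collections import defaultdict


def slowest_queries(spans, top_n=3):
    """Group database spans by statement and rank them by total time spent."""
    totals = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for span in spans:
        if span.get("kind") != "db":
            continue  # skip HTTP and other non-database spans
        entry = totals[span["statement"]]
        entry["count"] += 1
        entry["total_ms"] += span["duration_ms"]
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)
    return [(stmt, s["count"], s["total_ms"]) for stmt, s in ranked[:top_n]]


# Hypothetical spans from one slow request.
trace = [
    {"kind": "http", "name": "GET /cart", "duration_ms": 900.0},
    {"kind": "db", "statement": "SELECT * FROM orders WHERE user_id = ?", "duration_ms": 400.0},
    {"kind": "db", "statement": "SELECT * FROM orders WHERE user_id = ?", "duration_ms": 450.0},
    {"kind": "db", "statement": "SELECT 1", "duration_ms": 2.0},
]
for statement, count, total_ms in slowest_queries(trace):
    print(f"{total_ms:8.1f} ms  x{count}  {statement}")
```

Ranking by total time rather than by single-span duration surfaces both one very slow query and a cheap query that is executed far too often.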

2:30 PM: The root cause is identified as an inefficient query introduced in the recent code deployment.

3:00 PM: The problematic query is optimized, and the fix is deployed to the production environment as an emergency patch.

3:05 PM: The engineers quickly review performance in the dashboard again.


Root Cause Summary

From a single Rakuten SixthSense dashboard, the SRE team was able to quickly validate the application metrics, trace them to the errors, and, from the span information, reach the database metrics that led to an inefficient database query.
Rakuten SixthSense's powerful distributed and method-level tracing capabilities have redefined the management of distributed systems. The technology, akin to a GPS for your requests, dramatically reduces RCA times, minimizes costs, and bolsters system reliability. Indeed, SixthSense has emerged as the backbone of observability tools in the ever-evolving landscape of distributed systems.