Partial API handling degradation
Incident Report for Cleeng
Postmortem

We have completed the root cause analysis (RCA) for this incident. If you have not received it, please request it via broadcastersupport@cleeng.com.

Posted Apr 29, 2024 - 12:36 CEST

Resolved
The infrastructure has been working as expected since 02:10 CET on 27/04/2024.
Posted Apr 27, 2024 - 12:29 CEST
Investigating
At 00:45 CET on 27/04/2024, we saw substantial growth in the traffic coming to our platform. Our Auto Scaling Group (ASG) kicked in and started adding instances to the infrastructure. By 01:00, the spike was so large that the ASG could not keep up with the rate of growth, and about 50% of traffic went unhandled between 01:00 and 01:10. Between 01:10 and 01:40, the share of unhandled traffic fell to 30%. At 01:44 CET, we took further manual action to scale out to a larger number of instances. From 01:53 CET, the unhandled rate improved further to 5-10%, and by 02:10 the infrastructure was handling all incoming traffic as expected.
Posted Apr 27, 2024 - 12:28 CEST
This incident affected: API.