Every outage has its learnings…

In early May 2019, Microsoft suffered an outage that left many customers unable to connect to Office 365 or (some) Azure services. At the root of the issue was a faulty DNS configuration that surfaced during an effort to move DNS services in-house at Microsoft.

Whilst I’m sure that Microsoft took all necessary precautions to avoid issues, I found one thing in the Post-Incident Report (PIR) very interesting: apparently, Microsoft did not pick up on the outage until after customers reported it. My guess is that this is because they have a heavily inside-focused approach to monitoring.
Indeed, from an “inside” perspective, Office 365 was working just fine. Externally, however, not so much. As the PIR states, Microsoft will review their monitoring configuration as a result of this outage.

So, as you can see: every outage has its learnings! Should you want to learn more, head over to the ENow blog, where I’ve written a little more on the topic.