In early May 2019, Microsoft suffered an outage that left many customers unable to connect to Office 365 or (some) Azure services. At the root of the issue was a faulty DNS configuration, which surfaced during an effort to move DNS services in-house at Microsoft.

Whilst I’m sure that Microsoft took all necessary precautions to avoid issues, I found one thing in the Post-Incident Report (PIR) very interesting: apparently, Microsoft did not pick up on the outage until after it was reported by customers. My guess is that this is because they take a heavily inside-focused approach to monitoring.
Indeed, from an “inside” perspective, Office 365 was working just fine. Externally, however, not so much. As the PIR states, Microsoft will review its monitoring configuration as a result of this outage.
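
To make the inside-versus-outside point a little more concrete, here is a minimal sketch of an external DNS probe in Python. It is only an illustration of the idea, not how Microsoft (or anyone else) actually monitors their services: the function name check_external_dns and the example hostnames are made up for this post, and the check only means something when it runs from a network outside the service provider, using the same public resolvers your customers would use.

```python
import socket


def check_external_dns(hostnames, resolve=socket.getaddrinfo):
    """Return {hostname: True/False} depending on whether the name resolves.

    Hypothetical helper for illustration. `resolve` defaults to the system
    resolver and can be swapped out, for example in tests.
    """
    results = {}
    for name in hostnames:
        try:
            # Port 443 is an assumption: we only care that the lookup succeeds.
            resolve(name, 443)
            results[name] = True
        except OSError:
            # socket.gaierror (failed lookup) is a subclass of OSError.
            results[name] = False
    return results


if __name__ == "__main__":
    # Placeholder names; replace with the endpoints your users actually hit.
    for host, ok in check_external_dns(["outlook.office365.com"]).items():
        print(f"{host}: {'resolves' if ok else 'DOES NOT RESOLVE'}")
```

Scheduled from a few locations outside your own network, a probe like this would flag a failed lookup even while every internal health check still reports green.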

So, as you can see: every outage has its learnings! Should you want to learn more, head over to the ENow blog, where I’ve written a little more on the topic.
