Note: All times mentioned in this article are in CEST (Central European Summer Time, UTC+2) unless noted otherwise.
17:45 – The Reports (Source 1, Source 2, Source 3, Source 4)
Starting at 17:44-17:45, hundreds of thousands of users reported to Downdetector within a few minutes that they couldn’t refresh their feed, access the website, send messages to family and friends, and much more. Many expected a short outage… but this was only the beginning.
Facebook was reported 476,746 times in the first hour.
Instagram was reported 372,009 times in the first hour.
WhatsApp was reported 120,612 times in the first hour.
Facebook Messenger was reported 33,099 times in the first hour.
18:47 – Dane Knecht (Source 1)
Cloudflare senior vice president Dane Knecht notes that Facebook’s border gateway protocol routes — BGP helps networks pick the best path to deliver internet traffic — have been “withdrawn from the internet.”
Dane Knecht turned out to be entirely correct. From this point on the root problem was known, but fixing it would require direct access to the routers themselves.
18:51 – IoT Doors (Source 1, Source 2)
According to reports on Twitter, Facebook’s own door locks rely on an IoT-based system connected to the same network as the rest of its servers. This is a serious problem: without that network being operational, staff can’t get through the doors to reach the routers. We don’t know why Facebook chose to lock the very thing that keeps it alive behind those doors, or what eventually opened them (we expect tools of some sort).
Was just on phone with someone who works for FB who described employees unable to enter buildings this morning to begin to evaluate extent of outage because their badges weren’t working to access doors.
JUST IN – Facebook employees reportedly can’t enter buildings to evaluate the Internet outage because their door access badges weren’t working (NYT)
21:11 – Routing Sections (Source 1)
So, someone deleted large sections of the routing….that doesn’t mean Facebook is just down, from the looks of it….that means Facebook is GONE.
(Image credit: @BenjaminEnfield)
23:11 – Cloudflare Releases Blog (Source 1)
The final piece that confirmed everything: Cloudflare released a blog post detailing what happened. We’ll include a few small snippets, but we highly recommend reading it in full if you’re interested.
Today at 15:51 UTC, we opened an internal incident entitled “Facebook DNS lookup returning SERVFAIL” because we were worried that something was wrong with our DNS resolver 1.1.1.1. But as we were about to post on our public status page we realized something else more serious was going on.
BGP stands for Border Gateway Protocol. It’s a mechanism to exchange routing information between autonomous systems (AS) on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver every network packet to their final destinations. Without BGP, the Internet routers wouldn’t know what to do, and the Internet wouldn’t work.
The Internet is literally a network of networks, and it’s bound together by BGP. BGP allows one network (say Facebook) to advertise its presence to other networks that form the Internet. As we write Facebook is not advertising its presence, ISPs and other networks can’t find Facebook’s network and so it is unavailable.
The individual networks each have an ASN: an Autonomous System Number. An Autonomous System (AS) is an individual network with a unified internal routing policy. An AS can originate prefixes (say that they control a group of IP addresses), as well as transit prefixes (say they know how to reach specific groups of IP addresses).
Cloudflare’s ASN is AS13335. Every ASN needs to announce its prefix routes to the Internet using BGP; otherwise, no one will know how to connect and where to find us.
At around 21:00 UTC we saw renewed BGP activity from Facebook’s network which peaked at 21:17 UTC.
This chart shows the availability of the DNS name ‘facebook.com’ on Cloudflare’s DNS resolver 1.1.1.1. It stopped being available at around 15:50 UTC and returned at 21:20 UTC.
Undoubtedly Facebook, WhatsApp and Instagram services will take further time to come online but as of 21:28 UTC Facebook appears to be reconnected to the global Internet and DNS working again.
BGP Updates Facebook
Queries for Facebook, WhatsApp, Messenger and Instagram
Queries for competitors Twitter, Signal, Telegram and TikTok
Availability Facebook.com on 1.1.1.1
BGP activity Facebook Network
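To make the Cloudflare explanation above a little more concrete, here is a minimal sketch of what "withdrawing" a prefix does to a router's view of the Internet, and why a DNS resolver would then answer SERVFAIL. This is a toy model, not how real BGP routers or resolvers are implemented: the `RoutingTable` class, the `resolve` helper, the documentation prefix 192.0.2.0/24 and the private-use ASN 64512 are all assumptions made up for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy model of the routes a single router has learned via BGP."""

    def __init__(self):
        # Maps an announced prefix to the ASN that originated it.
        self.routes = {}

    def announce(self, prefix, origin_asn):
        """An AS advertises that it can deliver traffic for this prefix."""
        self.routes[ipaddress.ip_network(prefix)] = origin_asn

    def withdraw(self, prefix):
        """The AS takes the advertisement back; the route disappears."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin ASN of the most specific matching prefix."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: packets for this address are dropped
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


def resolve(table, nameserver_ip):
    """Hypothetical resolver: it can only answer if the authoritative
    nameserver is reachable, otherwise it reports SERVFAIL."""
    if table.lookup(nameserver_ip) is None:
        return "SERVFAIL"
    return "NOERROR"


table = RoutingTable()
table.announce("192.0.2.0/24", 64512)   # hypothetical network goes online
print(resolve(table, "192.0.2.53"))      # nameserver reachable

table.withdraw("192.0.2.0/24")          # routes "withdrawn from the internet"
print(resolve(table, "192.0.2.53"))      # nameserver unreachable
```

In this sketch the servers behind the prefix never go down. They simply become unreachable once nobody advertises a path to them, which matches the description of Facebook no longer advertising its presence.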
23:28 – Facebook Is 100% Up Again
As shown in one of the graphs above, Facebook came back online at 21:28 UTC (23:28 CEST), with WhatsApp and Instagram following soon after. This marks the end of the Facebook outage for now, although the cause of the BGP routes disappearing is still unknown.
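Since this article uses CEST while the Cloudflare excerpts use UTC, a small sketch of the conversion may help when comparing timestamps. It assumes a fixed UTC+2 offset for CEST, and the helper name `utc_to_cest` is invented for this example.

```python
from datetime import datetime, timedelta, timezone

# CEST modelled as a fixed offset of UTC+2 (an assumption that only
# holds while summer time is in effect).
CEST = timezone(timedelta(hours=2), "CEST")


def utc_to_cest(hhmm):
    """Convert an 'HH:MM' clock time given in UTC to CEST."""
    utc_time = datetime.strptime(hhmm, "%H:%M").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(CEST).strftime("%H:%M")


# Times quoted from the Cloudflare excerpts above.
for utc in ("15:50", "21:17", "21:28"):
    print(f"{utc} UTC = {utc_to_cest(utc)} CEST")
# 15:50 UTC = 17:50 CEST
# 21:17 UTC = 23:17 CEST
# 21:28 UTC = 23:28 CEST
```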
Read more
Cloudflare – What is DNS? | How DNS works
Cloudflare – What is BGP? | BGP routing explained
Cloudflare – What is an autonomous system? | What are ASNs?