October 5, 2021 (updated Oct 6, Oct 7): In one of the longest group-wide outages Facebook has experienced, billions of users worldwide were unable to access Facebook, WhatsApp and Instagram for approximately six hours yesterday. The outage was caused by a breakdown in communication between Facebook’s systems, owing to a DNS error.
Facebook issued a statement on Tuesday confirming that the cause of the outage was a configuration change to the backbone routers that coordinate network traffic between the company’s data centres, which had a cascading effect, bringing all Facebook services to a halt.
In light of this occurrence, we share some comments from experts:
Lotem Finkelstein, Head of Threat Intelligence at Check Point Software Technologies: Facebook, Instagram, WhatsApp and other popular internet services have been unavailable for the last few hours. While it might look like a colossal failure in all those services and apps, the reason is probably a DNS service they all use to route their pages and services to our devices. So, what is DNS? Simply put, it is the internet protocol that converts the words we use, like Facebook.com, into the language computers understand: numbers, or internet addresses. DNS servers do the conversion and route us to the services and applications we asked to use. When this service fails, the services look like they are down, but they are actually just not accessible.
Taaran Chanana, MD and Co-Founder of MemeChat: The fact that WhatsApp, Facebook and Instagram went down for more than 10 hours just shows how important homegrown apps are to a country, and that we cannot rely on overseas platforms, be it for social media, communication or entertainment. If you notice, it has happened many times in the last few months: first for a few minutes, then for an hour or two, and now for 6-7 hours.
Fortunately, India is home to many homegrown apps that fulfil the needs of internet users. These apps have proven their worth in the heavily competitive Indian market with features, connectivity and agility that are on par with the best in the world. They provide users and marketers with more reliable and stable platforms for every need, be it communication, personal expression, connecting with others or marketing.
Candid Wuest, VP of Cyber Protection Research, Acronis: While Facebook Inc has not confirmed what caused the incident, it’s possible that the issue lies with the BGP or DNS protocols, which happen to be popular targets among cyber criminals. There are various potential attacks against DNS infrastructure, from DDoS attacks to local DNS rebinding or hijacking a domain’s DNS through social engineering against the registrar. Looking at overall attack statistics, they are a lot less common than malware and ransomware attacks, but they can be extremely devastating if a sophisticated attack succeeds. It’s like pulling the electric cable to your server room: the whole enterprise suddenly goes dark.
Protection against DNS attacks is not trivial, as they come in many forms. It requires strong authentication and patching to guard your own services, training against social engineering attacks, and classical DDoS mitigations from providers such as Cloudflare. Naturally, configuration issues should be avoided as well. The impact depends on which service is hit: if it is a central authentication server shared between multiple brands, as in this case, a single outage can take multiple brands offline. To be fair, we must note that such outages are most commonly caused by non-malicious actions, and we suspect that is the case here too.
Boris Cipot, Senior Security Engineer, Synopsys Software Integrity Group: Facebook maintains millions of servers to provide different offerings, such as the Facebook platform itself, Instagram, WhatsApp and Oculus Rift VR services, to its users. Part of this maintenance involves changing certain server settings that define how the server and the services on it work.
In this outage, the problem was caused by a change to the Domain Name System settings. The Domain Name System, or DNS for short, is a hierarchical and decentralised naming system that makes it possible for computers on the internet to find each other. If these settings are wrong, your servers will no longer be reachable, as other computers will not be able to find them. In simpler terms, think of a phone number. If you have a phone number, other people can call you. If you hand out the wrong phone number, others will not be able to contact you, and this is what happened to Facebook. Due to a misconfiguration of those DNS settings, the Facebook servers were no longer accessible, which took all services offline.
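To make the phone-book analogy more concrete, here is a toy Python model of hierarchical resolution: a client starts at a root server and follows delegations down to the server that is authoritative for the name. Everything in it is hypothetical, including the server labels, the example.com zone and the addresses (taken from the IPv4 documentation range), and real resolvers also deal with caching, TTLs and many record types that this sketch ignores. It illustrates the point above: the web server itself can be healthy, yet if the record is missing or the authoritative nameserver cannot be reached, clients cannot find it.

```python
# Toy model of hierarchical DNS resolution (illustrative only).
# All zone data, server labels and addresses are hypothetical; the
# addresses come from the IPv4 documentation range 192.0.2.0/24.
from typing import Dict, FrozenSet, Optional

# Each "nameserver" holds delegations (zone -> next nameserver)
# and/or final answers (fully qualified hostname -> IPv4 address).
NAMESERVERS: Dict[str, Dict[str, Dict[str, str]]] = {
    "root": {"delegations": {"com.": "com-tld"}, "answers": {}},
    "com-tld": {"delegations": {"example.com.": "ns.example.com"}, "answers": {}},
    "ns.example.com": {
        "delegations": {},
        "answers": {"www.example.com.": "192.0.2.10"},
    },
}


def resolve(hostname: str, unreachable: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Walk the delegation chain from the root; return an address or None."""
    if not hostname.endswith("."):
        hostname += "."  # work with fully qualified names
    server = "root"
    for _ in range(10):  # guard against delegation loops
        if server in unreachable or server not in NAMESERVERS:
            return None  # nobody answers, so the name cannot be resolved
        zone = NAMESERVERS[server]
        if hostname in zone["answers"]:
            return zone["answers"][hostname]
        # Follow the most specific delegation covering this name,
        # matching on whole labels (".example.com." not "example.com.").
        matches = [
            z for z in zone["delegations"]
            if hostname == z or hostname.endswith("." + z)
        ]
        if not matches:
            return None  # no record and no delegation: lookup fails
        server = zone["delegations"][max(matches, key=len)]
    return None


print(resolve("www.example.com"))  # 192.0.2.10
# The web server has not changed, but if its authoritative nameserver
# cannot be reached, the name no longer resolves and the site looks down.
print(resolve("www.example.com", unreachable=frozenset({"ns.example.com"})))  # None
```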
Changes of that magnitude are not made by hand. IT administrators use scripts and specialised programs to carry out their work on many machines quickly and automatically. However, the risk of running into glitches is always present: as with any programming language, scripting languages can contain bugs. The best way to avoid glitches on a large scale is to proceed step by step. That way, you roll out the changes in a small, controlled environment and contain any possible threat in a sustainable manner. Why this was not possible here, and why it happened, are questions only Facebook can answer. Still, changes of any kind should be made in smaller steps until it is confirmed that everything is working. Once that is confirmed you can scale up, but changing all services at once is a big step.
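The step-by-step approach described above is often called a staged or canary rollout. The following Python sketch shows the idea under simplifying assumptions: the stage fractions, server names and the apply/health-check/rollback functions are hypothetical placeholders for an operator's real tooling, and a single boolean health check stands in for the metrics and waiting periods a production system would use.

```python
# Sketch of a staged (canary) rollout. Server names, stage sizes and the
# apply/health-check/rollback callables are hypothetical placeholders.
from typing import Callable, List, Sequence


def staged_rollout(
    servers: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    stages: Sequence[float] = (0.01, 0.1, 0.5, 1.0),
) -> bool:
    """Apply a change to a growing fraction of the fleet, stage by stage.

    After each stage every changed server is health-checked. On the first
    failure all changed servers are rolled back and the rollout stops.
    Returns True only if every stage passed.
    """
    changed: List[str] = []
    for fraction in stages:
        # How many servers should carry the change after this stage
        # (at least one, so the first stage is never empty).
        target = max(1, int(len(servers) * fraction))
        for server in servers[len(changed):target]:
            apply_change(server)
            changed.append(server)
        if not all(is_healthy(s) for s in changed):
            # Undo in reverse order and stop before the change spreads further.
            for server in reversed(changed):
                rollback(server)
            return False
    return True


# Usage: a 200-server fleet where the (hypothetical) change breaks server-5.
fleet = [f"server-{i}" for i in range(200)]
applied = set()  # servers currently carrying the change
touched = []     # every server the change was ever applied to


def apply_change(server: str) -> None:
    applied.add(server)
    touched.append(server)


def rollback(server: str) -> None:
    applied.discard(server)


def is_healthy(server: str) -> bool:
    return server != "server-5"


ok = staged_rollout(fleet, apply_change, is_healthy, rollback)
print(ok, len(applied), len(touched))  # False 0 20
```

In this run the faulty change passes the first stage (2 servers), is caught at the 10% stage after reaching 20 of the 200 servers, and is fully rolled back. Pushing it to the whole fleet at once would have exposed every server before anyone noticed.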
On the other hand, we also see configuration mistakes with the opposite effect: rather than bringing services down, they keep them up and running without any security in place. For example, there have been many reports of misconfigured AWS.
Nanjunda Prasad Ramesh, CEO, Multi-Verse Technologies: The Facebook suite collapsed like a house of cards on Monday, leading to the Great Social Media Blackout of 2021, which lasted six hours. One thing to keep in mind is that such things do happen in the tech world and cannot be avoided. However, the impact could have been minimized or largely eliminated. Facebook ran three different apps from one centralized infrastructure. To top it all, that infrastructure was a single point of failure for users worldwide: if things go wrong, users across the world are impacted. This outage demonstrates the risks of the whole internet depending on one company, and this could have been minimized. A possible solution is to segregate infrastructure by country or region so that the impact of a failure is minimized and localized. This also helps respect the local data laws of each country or region. From a user’s perspective, we need to have alternatives available. It is not good for the entire world to depend on apps from one single company.
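The argument for regional segregation comes down to blast radius: what share of users lose service when one deployment fails. A small Python comparison makes the arithmetic explicit. The region names and user counts are made-up illustrative numbers, not data about Facebook or any real service.

```python
# Blast radius of a single failure: one global deployment versus
# infrastructure segregated by region. All numbers are hypothetical.
from typing import Dict


def affected_share(users_by_region: Dict[str, int], failed_region: str) -> float:
    """Fraction of all users who lose service when one region fails."""
    total = sum(users_by_region.values())
    return users_by_region.get(failed_region, 0) / total


# Centralized: every user depends on one shared infrastructure.
centralized = {"global": 3_000}

# Segregated: each region runs its own independent stack.
regional = {"americas": 900, "europe": 750, "asia": 1_050, "africa": 300}

print(affected_share(centralized, "global"))  # 1.0
print(affected_share(regional, "europe"))     # 0.25
```

The comparison only holds if the regions are truly independent; any layer they still share, such as a common DNS setup or backbone network, remains a single point of failure that can take every region down together.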
Whistleblower alleges Facebook puts "profit over safety."
The world's largest social media platform has been embroiled in a firestorm brought about by former employee Frances Haugen. As an unnamed whistleblower, she had earlier shared a trove of documents with US lawmakers and The Wall Street Journal alleging that the social media giant knew its products, including Instagram, were fuelling hate and harming children's mental health. Haugen revealed her identity on Sunday in a televised interview with the CBS News show "60 Minutes" and accused the company of choosing "profit over safety." Haugen, a 37-year-old data scientist from Iowa, has worked for companies including Google and Pinterest, but said in the interview that Facebook was "substantially worse" than anything she had seen before.
She called for the company to be regulated. "Facebook over and over again has shown it chooses profit over safety. It is subsidizing, it is paying for its profits with our safety…. The version of Facebook that exists today is tearing our societies apart and causing ethnic violence around the world," she said.
In the "60 Minutes" interview Haugen explained how the company's News Feed algorithm is optimized for content that gets a reaction. The company's own research shows that it is "easier to inspire people to anger than it is to other emotions," Haugen said.
"Facebook has realized that if they change the algorithm to be safer, people will spend less time on the site, they'll click on less ads, they'll make less money."