Amazon Web Services (AWS) suffered an outage lasting about four hours roughly three days ago, taking the internet security community by surprise. Speculation and rumors about the cause quickly began to spread.
However, the company has now publicly announced that the breakdown of its internet service was caused not by a scam or a hack but by a typo. Even more surprising, the company admitted that one of its own engineers made the error.
Amazon has released an official statement on the issue, which disconnected Amazon from the rest of the world for a few hours, and in it the company attributes the mistake to an engineer at Amazon Web Services (AWS).
In its recent blog post, Amazon stated: “At 9:37 AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that are used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”
This means that a single button press, or a command run with the wrong input, can cause all sorts of trouble. It is worth noting that about one-third of Amazon’s entire internet traffic flows through AWS servers. So, when the engineer removed a large set of these servers, a huge number of users were unable to visit Amazon.
This raises the question: why did it take Amazon hours to resolve the issue? The company clarified that some of its main systems had not been completely restarted in the past few years, which is why it took “longer than expected” to restore service.
RunKeeper, Medium, Trello, Imgur, Giphy, SoundCloud, Quora, Business Insider, Coursera, Time Inc. and several other sites were down during the outage.
Amazon is now making several changes to its systems after this mishap; for example, it will modify a tool so that such a large number of servers can no longer be deleted at once. But what Amazon is failing to understand is that, to avoid such problems in the future, it has to spread its internet service across multiple services. Currently, it relies heavily on a single service.
Typos like this one will continue to occur, as it is virtually impossible to eliminate the possibility of human error completely. It is therefore important that the company make its systems foolproof in every respect.
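To make the idea of a tool that refuses to delete too many servers at once more concrete, here is a minimal Python sketch of such a guard. It is only an illustration and not Amazon's actual tooling: the names `RemovalPolicy`, `RemovalRejected` and `plan_removal`, the 5 percent per-command limit and the floor of 10 remaining servers are all assumptions made up for this example.

```python
from dataclasses import dataclass


class RemovalRejected(Exception):
    """Raised when a removal request breaks one of the safety limits."""


@dataclass(frozen=True)
class RemovalPolicy:
    # Hypothetical limits chosen for illustration only.
    max_percent_per_command: int = 5   # share of the fleet one command may remove
    min_servers_remaining: int = 10    # capacity floor that must always be kept


def plan_removal(fleet, requested, policy=RemovalPolicy()):
    """Return the servers that may be removed, or raise RemovalRejected."""
    fleet_set = set(fleet)
    # Drop duplicate names while keeping the order the operator typed them in.
    targets = list(dict.fromkeys(requested))

    # A mistyped name should stop the command instead of being silently ignored.
    unknown = [name for name in targets if name not in fleet_set]
    if unknown:
        raise RemovalRejected(f"unknown servers: {unknown}")

    # Cap how much capacity a single command can take away (at least 1 server).
    limit = max(1, len(fleet_set) * policy.max_percent_per_command // 100)
    if len(targets) > limit:
        raise RemovalRejected(
            f"{len(targets)} servers requested, limit is {limit} per command"
        )

    # Never let the fleet drop below its minimum required capacity.
    if len(fleet_set) - len(targets) < policy.min_servers_remaining:
        raise RemovalRejected("removal would leave too few servers running")

    return targets


if __name__ == "__main__":
    fleet = [f"billing-{i}" for i in range(100)]  # made-up server names
    print(plan_removal(fleet, fleet[:3]))         # small request is accepted
    try:
        plan_removal(fleet, fleet[:50])           # a typo-sized request
    except RemovalRejected as err:
        print("rejected:", err)
```

With a check like this in front of the real command, an incorrectly entered input that selects half the fleet is rejected before anything is removed, and the operator has to repeat the removal in small, deliberate steps.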