IT Data Recovery: Human Error Takes Netflix Down


If you were counting on watching “It’s a Wonderful Life” on Netflix this past holiday season, you were probably disappointed. According to a post by Derrick Harris at Gigaom, a naughty developer-elf at Amazon Web Services (AWS) “accidentally deleted Elastic Load Balancer state data in Amazon’s East region that the service’s control plane needs to manage load balancers.” The developer’s goof affected Netflix users trying to stream movies to a wide variety of devices.

Uh-ohh.

IT Data Recovery Failure Precedes Success

The AWS team first tried to restore the Elastic Load Balancer (ELB) state data to a point immediately before the deletion. According to AWS’s post about the outage, the first attempt at data recovery “consumed several hours and failed to provide a useable snapshot of the data.” Fortunately, the team found another way to recover the critical data and merge in the changes. Approximately 24 hours after the event had begun, Netflix service was fully restored.
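For readers who want to picture what "restore to a point just before the deletion and merge in changes" can look like, here is a minimal Python sketch. It is not AWS's actual recovery procedure, which the post does not describe in detail. The function name `restore_point_in_time`, the snapshot and change-log shapes, and the sample timestamps are all hypothetical, chosen only for illustration.

```python
from datetime import datetime


def restore_point_in_time(snapshots, change_log, cutoff):
    """Rebuild state as it stood just before `cutoff`.

    snapshots:  list of (timestamp, dict) full copies of the state (hypothetical shape)
    change_log: list of (timestamp, key, value); value None means "key removed"
    cutoff:     the moment of the bad change; nothing at or after it is replayed
    """
    earlier = [s for s in snapshots if s[0] < cutoff]
    if not earlier:
        raise ValueError("no snapshot predates the cutoff")

    # Start from the most recent snapshot taken before the cutoff.
    snap_time, snap_state = max(earlier, key=lambda s: s[0])
    state = dict(snap_state)

    # Merge in every logged change made after the snapshot but before the cutoff.
    for ts, key, value in sorted(change_log, key=lambda c: c[0]):
        if snap_time < ts < cutoff:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


# Hypothetical data: two snapshots, one good change, one accidental deletion.
snapshots = [
    (datetime(2020, 1, 1, 8, 0), {"lb-1": "config-a"}),
    (datetime(2020, 1, 1, 11, 0), {"lb-1": "config-a", "lb-2": "config-b"}),
]
change_log = [
    (datetime(2020, 1, 1, 11, 30), "lb-3", "config-c"),  # legitimate change
    (datetime(2020, 1, 1, 12, 30), "lb-1", None),         # the accidental deletion
]
recovered = restore_point_in_time(snapshots, change_log, datetime(2020, 1, 1, 12, 0))
```

The sketch only works if a usable snapshot and a trustworthy change log both exist, which is exactly what the first AWS recovery attempt lacked.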

Data Recovery Lessons: AWS Post Mortem on Netflix Outage

In hindsight, AWS identified several human errors that led to the accidental loss of mission-critical data and the service interruption at Netflix. According to an AWS posting, the developer who accidentally deleted the crucial data had access to production ELB state data that was “incorrectly set to be persistent rather than requiring a per access approval.” In other words, one human error created the conditions that made the second, more costly human error possible.
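The difference between persistent access and per-access approval can be sketched in a few lines of Python. This is an illustrative model only, not how AWS implements its access controls. The `AccessPolicy` class, its fields, and the user and ticket names are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical access policy for a sensitive data store."""
    persistent: bool = False                     # True = standing access, no approval needed
    approvals: set = field(default_factory=set)  # outstanding (user, ticket) approvals

    def approve(self, user, ticket):
        """Record a one-time approval for a specific user and change ticket."""
        self.approvals.add((user, ticket))

    def authorize(self, user, ticket):
        """Return True if the user may touch production data right now."""
        if self.persistent:
            return True  # the risky setting: every request is allowed
        if (user, ticket) in self.approvals:
            # Each approval is consumed on use, so the next access needs a new one.
            self.approvals.discard((user, ticket))
            return True
        return False


per_access = AccessPolicy()
denied_first = per_access.authorize("dev", "CHG-1")   # no approval yet
per_access.approve("dev", "CHG-1")
allowed_once = per_access.authorize("dev", "CHG-1")   # approval is used up here
denied_again = per_access.authorize("dev", "CHG-1")   # must be re-approved

standing = AccessPolicy(persistent=True)
always_allowed = standing.authorize("dev", "CHG-2")
```

With per-access approval, a second person has to sign off before each risky operation. With the persistent setting, nothing stands between a mistaken command and production data.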

AWS indicates that it has a robust change management process to protect against data losses that can produce outages like this one. However, Amazon’s post about the outage says that the protocols were “not appropriately enabled for the ELB state data.” AWS has corrected this oversight, changed its data recovery process based upon what its IT professionals learned from this incident, and apologized to Netflix and its customers for being so naughty.

Recover Lost Data Quickly

Uh-ohh is going to happen. Human error, natural disaster, hardware failure or malicious attack may create an outage in a mission-critical system. The crucial question is how quickly your firm’s IT team will be able to recover and restore the lost data and resume normal business operations.

Powered by StorageCraft’s ShadowProtect engine, Kaseya’s new System Backup and Restore is the right tool to turn uh-ohh back into “It’s a Wonderful Life.” Try this IT data recovery tool now!

Sources:
http://gigaom.com/2012/12/31/amazon-blames-human-error-for-xmas-eve-outage-netflix-vows-better-resiliency/
http://aws.amazon.com/message/680587/
