September 19, 2024

For Fast Recovery, Plan for the Worst: An Emergency Checklist for Data Center Managers

by Kate Fulkert, Vertiv Company

Try as we might, there is a limit to what we can do to anticipate the unexpected. But when it comes to the electric grid in the modern era of severe weather events, we can rely on one thing: unreliability. For data center operators, this puts basic and essential business functions in harm’s way.

The devastating California fires of the last few years have shown just how exposed data centers can be to a natural disaster. Consider the 2018 Camp Fire. As the fire abated, utility giant PG&E announced it would begin preemptively interrupting service to fire-prone areas that had not burned that season. Those planned outages appear to be an unpleasant fact of life for California residents and businesses: the utility has indicated that it could take a decade to upgrade and harden its systems, and until then the outages are likely to continue.

This problem has been looming for some time. The American Society of Civil Engineers' 2017 Infrastructure Report Card, an analysis published every four years, gave America's energy infrastructure a D+, in large part because much of the country's power grid was built decades ago.

Extreme weather, whether climate-fueled fires, floods, winter storms or hurricanes, has grown worse as America's grid has aged. That harsh reality means data center managers would be wise to develop plans to help their facilities and networks survive these disruptions, especially in every location where they serve customers.

It’s impossible to anticipate every threat that could jeopardize a data center, but organizations can and should plan for emergencies. A robust business continuity and disaster recovery plan reduces one common cause of data center downtime: human error. A well-curated checklist lowers the risk of mistakes and oversights in the heat of the moment and helps an organization prepare for a wide range of eventualities.

The following are a few suggestions specific to the data center.

  1. Risk Assessment: This should be the first step for any organization developing a disaster recovery plan. What types of threats are relevant to your area? Hurricanes, floods, tornadoes, fires, earthquakes and volcanic eruptions are examples of natural disasters that require planning, but are any of your facilities located near areas where radiation exposure, toxic waste or explosives are a consideration? A good data center services provider can help with the assessment and the resulting emergency plan.
     
  2. Evacuation Plan: Human safety always comes first, so you should have a plan to evacuate any personnel potentially at risk. This should include a detailed plan for communication with staff to confirm their safety.
     
  3. Check the Generator: There is a tendency to set and forget a generator, but that piece of machinery requires maintenance and upkeep to ensure it performs as expected when needed. Is it full of clean fuel? Are the fuel line and air filter free of contaminants? Test the generator regularly and ahead of any anticipated weather event. Line up at least three vendors to deliver fuel in the event of an extended outage. Remember, fuel is often at a premium after a disaster, and yours will not be the only organization requiring delivery.
     
  4. Communicate with Utilities: Consider the ramifications of loss of power, water, phone or internet. Communicate early with utility providers to set up contingency plans. Create a contact list and have a plan for communicating if traditional channels are compromised.
     
  5. Weatherproof the Data Center: If the threat is a hurricane, flood or other weather event, take the necessary steps to harden your facility. Secure or store loose items and make sure servers are secured in their racks. Check gutters and storm drains to make sure they’re clear. Make sure doors can be sealed against high winds and blowing rain. Water is the enemy of the data center, so do everything necessary to keep it out of the server rooms.
     
  6. Backup Data: Many data centers run routine data backups once a week. If you know severe weather is coming, increase the frequency of those backups. We can’t always know when a disaster is going to strike, so organizations should consider making daily backups a regular practice. Consider where data is being backed up. It should go offsite, but make sure the offsite location is secure and unlikely to be hit by the same disaster.
     
  7. Emergency Staffing: In the event of a significant disaster, local employees may be unavailable to work. They may have evacuated with their families, be dealing with urgent damage to their homes or vehicles, or be unable to reach the data center because roads are impassable. Consider bringing in emergency crews and establishing crisis housing near the data center to ensure you have on-site personnel.
     
  8. Contact Vendors: Establish a list of vendors and prioritize those requiring communication in the event of an emergency. Reach out to them early and make the necessary arrangements so you can be free to focus on more immediate needs during the crisis.
     
  9. Trust Your Team: Bring together all parties (IT, Facilities, Security, HR, Communications, Legal, Logistics, Information Security and Business Continuity) and make sure everyone understands their responsibilities throughout the crisis. Have a plan for communicating with that team if regular communications are down. The more your teams can project a sense of calm, well-managed order, the better your customers and business partners will be able to weather the storm, and the more they will appreciate the cool heads that prevailed on their behalf.
     
  10. Confirm Insurance Coverage: This starts with insurance on the facility itself, but additional coverage may be warranted for the infrastructure or for business interruption. If the data center is down for a week, business interruption insurance can help compensate the organization for lost revenue.
     
  11. Remember the Edge: Today, the enterprise data center is just one piece of a distributed network. Many organizations manage multiple edge sites that are more critical than ever before, and those sites must be considered in disaster planning. In many cases, the core data center may be safe from a specific event while one or more edge sites are at risk. Prioritize by criticality and have a plan for those facilities and the personnel at those sites.
     
  12. Mind the Cloud: Just because some of your data and applications are housed in the cloud does not mean they are always safe from emergency events. Those cloud servers are in a data center somewhere, and you should know how your cloud provider will handle a potential disaster. How often are they backing up data? Do they have redundant sites? Ask these questions before a crisis, because once disaster strikes, it’s too late.
     
  13. Consider the Opportunists: Hackers see natural disasters or similar emergency events as an opportunity to access networks while attention is focused elsewhere. Make sure your information security and physical security teams are prepared for bad actors.
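The advice in item 6 to back up more often rests on simple arithmetic: if a disaster can strike at any moment, the data at risk is whatever was written since the last successful backup. The Python sketch below makes that concrete for the weekly and daily schedules mentioned in the checklist. It is illustrative only and assumes each backup completes instantly and successfully and that a disaster is equally likely at any moment; the schedule names and helper functions are hypothetical, not part of any product or standard.

```python
from datetime import timedelta

# Hypothetical backup schedules to compare; the intervals mirror the
# weekly and daily practices mentioned in the checklist.
SCHEDULES = {
    "weekly": timedelta(weeks=1),
    "daily": timedelta(days=1),
}


def worst_case_loss(interval: timedelta) -> timedelta:
    # Assumption: backups complete instantly and always succeed.
    # A disaster just before the next backup loses everything written
    # since the previous one, so the worst case equals the interval.
    return interval


def expected_loss(interval: timedelta) -> timedelta:
    # Assumption: the disaster is equally likely at any moment within the
    # interval, so on average half the interval's data is unprotected.
    return interval / 2


if __name__ == "__main__":
    for name, interval in SCHEDULES.items():
        print(f"{name}: worst case {worst_case_loss(interval)}, "
              f"average {expected_loss(interval)}")
```

Under these assumptions, moving from weekly to daily backups shrinks the worst-case loss window from seven days to one, the quantity continuity planners often call the recovery point. Real backup jobs take time and can fail, so actual exposure will be somewhat larger.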

Don’t forget that teams facing these situations in developing countries need contingency plans as unique as their locales. In some parts of the world, the electric grid is controlled by the government. During periods of civil unrest, access to these basic elements of the modern economy can be cut off suddenly, leaving businesses to find alternate sources of power and staffing.

Finally, keep in mind that an emergency preparedness plan is a living document, one that you should be prepared to alter as new information comes to light or as circumstances change. It is a guide for ongoing planning, exercises, and updates. Circumstances and personnel change, equipment ages and is refreshed, replaced or added, and risks evolve over time. Work across your organization and with your data center service provider to ensure your plan is up to date and the relevant personnel are prepared for the worst.

As global business continuity and disaster recovery manager for Vertiv, Kate Fulkert is responsible for developing strategies and solutions to minimize business disruptions related to crises or disasters experienced around the globe. Fulkert joined Vertiv in 2018, bringing with her more than 19 years of experience in business continuity, disaster recovery and crisis management. She holds a master’s degree in emergency and disaster management from Georgetown University and is a Master Business Continuity Professional (MBCP), the highest level of certification offered by DRI International. Fulkert also holds a crisis management certification from the Massachusetts Institute of Technology and has completed Certified Information Systems Security Professional (CISSP) training.