Executive Summary
- Infrastructure is crucial to maintaining operational resilience, but so are the processes that guide how the data centre is used and the people who follow those procedures.
- Well-defined processes, trained people and management with good communication and energy make the difference between two outcomes. One is a data centre in chaos, where a disruption drags on and everyone is all over the place. The other is a smoothly running data centre whose staff can deal with even the most unpredictable, complex outage.
- Data centre operators need to take more of a holistic approach to resilience; they should develop a robust operational resilience framework that includes all three of these elements: people, processes and infrastructure.
There are plenty of articles out there raising concerns about, commenting on and predicting what infrastructure resilience should look like. But that’s just one element of maintaining operational resilience. The processes and the people behind them are just as crucial to the treasure all pirates – I mean operators – strive for (uptime, if you hadn’t guessed!).
Ahh, 24/7 uptime, the big dream we’re all trying to make a reality, but achieving it is incredibly difficult. There are many factors at play, ranging from power failures, cooling system overloads, network issues and complex infrastructure to, alas, human error. There will always be a margin of error, a risk of downtime. Disruptions happen, it’s a fact: cyberattacks, supply chain issues, natural disasters and shifts in regulation. They are becoming more frequent too, so companies must be able to adapt and recover quickly should the worst happen – shudders – downtime.
Infrastructure, people and processes are all crucial
While infrastructure is essential, the processes guiding the use of it and the people behind the machines, executing those processes, are equally critical. It’s important that operators take into consideration all three critical components to be able to absorb and adapt to any disruptions.
You can have the best early-stage resilience planning, future-proof and evolving with the rapidly changing data centre sector, but it won’t be as effective if your staff aren’t trained, your SOPs (standard operating procedures) aren’t up to date, and your teams aren’t running a tight ship.
Why processes matter
Well-defined processes can make the difference between an unprepared data centre experiencing an outage it can’t fix right away, and a dynamic, response-ready system that can prevent or minimise the consequences should the worst happen.
Processes are repeatable frameworks that employees can follow during a crisis. They must be clear, easy to understand and consistent across the board, so there is no confusion in the midst of a crisis. All personnel should understand their roles and the steps needed to restore operations in the event of downtime. Processes also need to be updated regularly to reflect new regulations and newly incorporated technology.
Ensuring that processes are clear, consistent and up to date immediately reduces the risk of critical errors in high-pressure situations.
Why people matter
People are capable of doing things in an emergency that predefined processes or automated systems cannot – adapt and make real-time decisions in the face of the unpredictable.
There will be unexpected disruptions whose solutions aren’t covered by any process or manual. The industry as a whole is rapidly innovating, building and launching new technology to combat increasing risks and solve its problems, which means that if that technology fails, we may need equally innovative solutions to fix it. Trained employees can adapt, innovate and make sound judgements in the face of this unpredictability.
In a crisis, the correct processes are important, but so are trained staff and strong leadership who communicate effectively.
Communication is absolutely key to a coordinated response to disruptions, and those people skills are necessary to manage expectations and keep morale up when things get tough.
A holistic approach
Resilient infrastructure is the foundation for operational resilience. Processes and people actively integrate resilience into everyday operations, enabling teams to resolve disruptions quickly and restore uptime.
Data centre operators need to take a more holistic approach to resilience and develop a robust operational resilience framework that incorporates all three elements, enabling them to maintain uptime and adapt their resilience strategy to be fully prepared for outages.



