How to attain Operational Resilience

“The need for operational risk management is more acute than ever,” claims the Institute of Operational Risk (IOR) in its ‘Operational Resilience’ white paper.

Aimed at helping risk professionals improve the practice of operational risk management within their organizations, the guidance outlines the principles of operational resilience, together with a range of good practices, examples, and suggestions for attaining it. In this short blog post we consider some key takeaways from the guidance:

Operational resilience is an outcome, not a risk

Operational resilience is defined as “the ability of an organisation to deliver critical operations through disruption”. It has historically been managed proactively through an organization’s operational risk management framework (ORMF) and reactively via continuity management, incident response, and crisis or recovery planning. These, the IOR suggests, are “backward-looking and focus on recovery from specific high impact individual risk events (e.g. power outage, cyber incident). Operational resilience is the outcome of the effectiveness of these risk management activities so its management requires coordination and understanding across them all.”

What’s the best approach for your organization?

The IOR takes the stance that operational resilience is a component of the ORMF and should be managed within it. It acknowledges, however, that particularly in larger enterprises, operational resilience teams or functions may have been set up independently of operational risk and its supporting frameworks (IT, cyber, business continuity, incident management, etc.). When deciding what would be most appropriate for your business, the IOR suggests considering:

The size and scope of the organization
The breadth of services offered
The relative maturity of existing proactive and reactive risk management frameworks

Look at risk through a ‘service’ lens

According to the IOR, “Operational resilience requires organisations to look at risk through a service lens, not just by the system, business or operational area. Whilst this requires a shift in mindset, operational resilience is an outcome, not a risk, so the key activities to manage the risks that in aggregate determine operational resilience should follow existing risk management processes and use existing frameworks where practicable to do so.”

Since collaboration can improve operational resilience governance, it is advisable to identify all business services, each defined as “a service that an organisation provides to an external end-user”. A business service is deemed important if “its disruption would materially impact an organisation’s (financial or operational) viability, cause considerable customer harm or impact its ability to deliver its Board approved strategy.”

Mapping processes, measuring service impact

Why should processes be mapped? According to the IOR, mapping enables the identification and management of the key operational risks (people, processes, and systems) associated with delivering a service. “Understanding how a service is delivered and how it could be disrupted enables organisations to put proportionate measures in place to prevent service outage and, as a result, may create value through rationalisation of existing silo-based control activities.”

Since mapping is likely to be a major undertaking, the IOR advises that it should draw on the content of an organization’s risk register, which will help focus attention on the largest risks and their impact (rather than their likelihood).

When it comes to measurement, there is no requirement to quantify impact using specified metrics. It may be worth remembering that resilience is “dynamic and therefore measures should, where possible, be data-driven and able to be measured in as close to real-time as possible to support swift action should an operational event occur that impacts service delivery.”

Setting impact tolerance

Section 5 of the IOR’s Operational Resilience white paper details how to determine and set impact tolerances for important business services. “Tolerance will usually be expressed in terms of service outage time (this will be mandatory for UK Financial Services organisations) but can also be used in combination with other relevant metrics (for example, number of clients impacted or production volume).” The IOR’s advice is that tolerances should be set at or before the point at which disruption would cause intolerable harm: harm to consumers or market participants, financial harm to the organization itself, or a risk to its licence to operate.

Operational resilience scenario testing, monitoring, and control

The Institute recommends that operational resilience scenarios be designed and tested in the same way as operational risk scenarios. Though the outputs will differ, they will be useful for governance and compliance, and for supporting strategic and operational decision-making.

The IOR suggests that operational resilience be considered within existing operational risk management fora and reporting: “…the overarching rule should be to use existing information and wherever possible, don’t overcomplicate. The objective is to identify emerging risks that could impact an organisation’s ability to deliver IBS [important business services] (or stay within impact tolerance in the event of an outage) and, where appropriate, take action to mitigate them.”

Since managing operational resilience is about expecting change and reacting to external events, monitoring should take into account:

Interdependencies and interconnections between systems and services
Third-party dependencies and outsourcing
The external environment

Consider existing risk management processes

The IOR advises that existing risk management practices should be the first point of reference, since they will already hold much of the data, and many of the proactive and reactive practices, required to manage operational resilience.

It may also be useful to remember that Rome wasn’t built in a day. Operational resilience “is an evolving discipline and operational resilience management frameworks and policies should be designed to support agile decision making and be adaptable to respond to emerging threats, issues and regulations.”