
What the SIL? Safety Integrity Levels in Transportation Infrastructure

Introduction 

In the evolving landscape of Canadian railway transportation, the pursuit of enhanced safety, resilience, and reliability remains an enduring challenge. As the industry relies on ever larger and more intricate systems, conducting robust system assurance, thorough safety assessments, and accurate Safety Integrity Level (SIL) classification has become increasingly complex.

At its core, SIL is about safeguarding safety functions against both systematic and random failures, and it features prominently in discussions within the railway industry. However, its frequent misuse has led to substantial cost increases and project delays.

Transportation infrastructure project owners can use SILs to optimize cost and efficiency through judicious SIL allocation. Drawing on our experience in Canadian railway safety, we present insights and lessons learned from real-world projects, including a case that underscores the critical importance of careful SIL allocation to specific functions within railway systems.

System Assurance 

System Assurance is the justified confidence that a system functions as intended and satisfies its performance criteria. It seeks to ensure that the system is free of exploitable vulnerabilities, whether introduced intentionally or unintentionally at any point in its life cycle. The system must be not only safe but also useful and fit for its intended purpose. System Assurance helps project owners achieve high confidence in the Reliability, Availability, Maintainability and Safety (RAMS) of complex engineering systems.

System Assurance is achieved through a planned, systematic set of multi-disciplinary activities. These activities are part of the overall Systems Engineering discipline and include RAM, Safety Assurance, Cyber Security, Electromagnetic Interference (EMI), Electromagnetic Compatibility (EMC), and Human Factors (HF) analysis. Many enabling activities also support System Assurance, including Requirements Management and Configuration Management (as part of a rigorous Quality Assurance program), System Integration, and Verification and Validation.

Figure: System Assurance

Safety Assurance 

Safety Assurance is concerned with identifying hazards that can result in accidents, estimating their expected severity and probability, and minimizing them. Safety is distinct from security: safety means that no injury or loss of life is caused, whether deliberately or not, while security refers to protecting people, facilities, operations, and data from loss, interference, theft, or harmful change. The two overlap, however, as certain security risks can lead to hazardous outcomes that should not be overlooked.

Nevertheless, this distinction must be kept in mind when assessing System Safety. Safety can further be divided into Physical Safety and Functional Safety.

Physical safety concerns passengers, staff, and the public in the operating environment, and deals with dangerous materials, processes, and practices (e.g. dropping, slipping, tripping, falling, crushing, fire, and explosion).

Functional safety is the part of the overall safety of a system that depends on the system, including its automatic protection, operating correctly and predictably in response to its inputs or to failures. These can include human errors, hardware or software failures, and operational or environmental stress. As systems have become more complex, System Safety Assurance has increasingly focused on functional safety, particularly where existing safety codes and regulations already address physical safety.

Iceberg of Incidents: Hierarchical Approach to Safety   

The triangle diagram below shows that for every major injury there are more minor injuries and even more near misses. Faults can lead to errors, errors to failures, failures to hazards, and hazards to accidents. Controlling faults is therefore pivotal to accident prevention.

To better understand this hierarchical approach to safety, consider the following definitions from CENELEC EN 50126-1:2017:

Accident: Unintended event or series of events that result in death, injury, loss of a system or service, or environmental damage. 

Hazard: Condition that could lead to an accident. 

Failure: Loss of ability to perform as required. 

Error: Discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition. 

Fault: Abnormal condition that could lead to an error in a system. 
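These definitions form an ordered chain. The short Python sketch below models that ordering to show why a barrier placed earlier in the chain stops an event sooner. It is illustrative only: the names `Stage`, `furthest_stage` and `classify` are hypothetical, and labelling an event that reaches a hazard but stops short of an accident as a "near miss" is our simplifying assumption, not an EN 50126-1 definition.

```python
from enum import IntEnum


class Stage(IntEnum):
    """EN 50126-1 terms, ordered by how far an event has escalated."""
    FAULT = 1
    ERROR = 2
    FAILURE = 3
    HAZARD = 4
    ACCIDENT = 5


def furthest_stage(barriers: set) -> Stage:
    """Return the furthest stage reached by an event that starts as a fault.

    Hypothetical model: a barrier at a stage stops the event from
    escalating beyond that stage.
    """
    stage = Stage.FAULT
    while stage < Stage.ACCIDENT and stage not in barriers:
        stage = Stage(stage + 1)
    return stage


def classify(stage: Stage) -> str:
    # Assumed labelling: reaching a hazard without an accident is a near miss.
    if stage == Stage.ACCIDENT:
        return "accident"
    if stage == Stage.HAZARD:
        return "near miss"
    return "contained"


print(classify(furthest_stage(set())))           # accident
print(classify(furthest_stage({Stage.HAZARD})))  # near miss
print(classify(furthest_stage({Stage.ERROR})))   # contained
```

The earliest barrier wins: with barriers at both the error and hazard stages, the event is contained at the error stage and never becomes a near miss.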

The cost of addressing these incidents increases with their severity: near misses are the least costly to address, while accidents causing major injuries are the costliest, in addition to their devastating human impact.

To control and prevent accidents, we need to begin by analysing faults. The near misses shown above are warning signs that a fault in the system needs to be addressed. The good news is that near misses are inexpensive to fix, and this is where transportation infrastructure project owners and operators should spend most of their time and resources when assessing a system’s functional safety. Random hardware failures can be predicted statistically, but systematic faults cannot be quantified in the same way and pose a greater challenge; knowing how to prevent them is as crucial as predicting hardware failures. Systematic faults are prevented by scrutinizing their sources at an appropriate level of rigour and by following industry standards such as:

  • CENELEC EN 50128 – Software and Data
  • CENELEC EN 50159 – Safety-Related Communication in Transmission Systems
  • ISO/IEC/IEEE 15288 – Systems Engineering (System Life Cycle Processes)
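Random hardware failures are predictable in a statistical sense: given a constant dangerous undetected failure rate λ_DU and a proof-test interval T, the average probability of failure on demand (PFD) of a single-channel function can be estimated. The sketch below assumes perfect proof tests and negligible repair time; the function names and figures are hypothetical, and the widely used approximation λ_DU·T/2 is shown next to the exact average. Systematic faults have no equivalent formula, which is why they are controlled through process standards such as those listed above.

```python
import math


def pfd_avg_exact(lambda_du_per_hour: float, proof_test_interval_h: float) -> float:
    """Exact average PFD of one channel over a proof-test interval.

    Averages the unavailability 1 - exp(-lambda * t) over t in [0, T],
    assuming a constant failure rate, perfect proof tests and no repair time.
    """
    x = lambda_du_per_hour * proof_test_interval_h
    if x == 0.0:
        return 0.0
    # 1 - (1 - e^-x) / x, using expm1 for better accuracy at small x
    return 1.0 - (-math.expm1(-x)) / x


def pfd_avg_approx(lambda_du_per_hour: float, proof_test_interval_h: float) -> float:
    """Common simplified estimate, valid when lambda * T is much less than 1."""
    return lambda_du_per_hour * proof_test_interval_h / 2.0


# Hypothetical figures: lambda_DU = 2e-6 per hour, annual proof test (8760 h).
lam, T = 2e-6, 8760.0
print(f"exact  = {pfd_avg_exact(lam, T):.3e}")
print(f"approx = {pfd_avg_approx(lam, T):.3e}")
```

The approximation is slightly conservative (it overestimates the exact average), which is why it is commonly accepted for small λ_DU·T.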

Functional Safety Classification 

Functional safety is concerned with providing assurance and evidence that a hardware or software system meets its specified functional safety requirements. For a system, subsystem, or component to be classified as functionally safe, it must be independently certified through appropriate testing, ideally by an accredited body against recognized functional safety standards. A certified product can then be claimed to meet the requirements of a particular Safety Integrity Level (SIL).

Safety Integrity is defined in IEC 61508 as “the likelihood of a safety-related system satisfactorily performing the requisite safety functions under all specified conditions within a stipulated timeframe”. The roots of SIL can be traced directly to IEC 61508, and specialised railway standards such as EN 50126, EN 50128, and EN 50129 derive from this foundation. IEC 61508 defines four SILs, derived from an analysis of the associated risks or hazards, that indicate the degree to which a function can be relied upon to meet its specified safety requirements. It should be noted that SILs are ONLY assigned to functions, NOT to hardware, software, data, processes, or systems. Hardware, software, data, processes, and systems inherit the SIL of the function they implement.
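Because SILs belong to functions, an element’s integrity requirement is derived rather than assigned directly. The Python sketch below models that inheritance. The class and field names are hypothetical, the SIL values in the usage lines are illustrative rather than real allocations, and the rule that an element supporting several functions inherits the highest of their SILs is a common conservative convention that we assume here; a project may instead argue independence between functions.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyFunction:
    name: str
    sil: int  # 0 (no SIL) to 4, allocated to the function by hazard analysis


@dataclass
class Element:
    """Hardware, software, data, process or system: it has no SIL of its own."""
    name: str
    implements: list[SafetyFunction] = field(default_factory=list)

    @property
    def inherited_sil(self) -> int:
        # Assumed convention: take the highest SIL of the functions implemented.
        return max((f.sil for f in self.implements), default=0)


# Illustrative allocations only.
route_locking = SafetyFunction("route locking", sil=4)
passenger_info = SafetyFunction("passenger information", sil=0)

shared_server = Element("shared server", [route_locking, passenger_info])
platform_display = Element("platform display", [passenger_info])

print(shared_server.inherited_sil)     # 4
print(platform_display.inherited_sil)  # 0
```

Keeping the SIL on `SafetyFunction` and computing it on `Element` means a reallocation at the function level automatically propagates to every element that implements it.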

Recently, SIL allocations have been influenced not solely by safety considerations but also by security concerns and broader health and safety considerations. This trend has led to SILs being widely allocated to systems whose failures have no direct causal link to hazardous events. Unfortunately, this misallocation contributes to increased costs and delayed project timelines.

SILs 

SIL (Safety Integrity Level) is determined through a risk analysis that considers the severity of potential consequences, their probability of occurrence, and the safety system’s ability to mitigate them. The diagram below shows the safety integrity levels according to IEC 61508/IEC 61511, with higher SILs providing greater risk reduction.

Here, SIL is defined in terms of the probability of failure on demand (PFD): the likelihood that a safety system will fail to perform its intended function when demanded. IEC 61508/IEC 61511 use a discrete scale from SIL 1 to SIL 4, with each level corresponding to a specific range of acceptable PFD; the higher the SIL, the greater the risk reduction. This PFD-based definition applies to functions operating in low-demand mode; for high-demand or continuous mode, IEC 61508 uses the average frequency of dangerous failure per hour instead.
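The PFD bands can be expressed as a simple lookup. The sketch below encodes the IEC 61508 low-demand-mode ranges (from SIL 1 at 10⁻² to 10⁻¹ up to SIL 4 at 10⁻⁵ to 10⁻⁴), derives the risk reduction factor as 1/PFD, and computes a target PFD from a demand rate and a tolerable accident frequency. The function names and the rates in the usage lines are hypothetical. The band containing the target indicates which SIL to specify; the PFD actually achieved must still be shown to meet the target itself.

```python
from typing import Optional

# IEC 61508 low-demand mode: (SIL, PFDavg lower bound incl., upper bound excl.)
LOW_DEMAND_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose PFDavg band contains pfd_avg, or None if outside SIL 1-4."""
    for sil, low, high in LOW_DEMAND_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


def risk_reduction_factor(pfd_avg: float) -> float:
    """Risk reduction factor (RRF) is the reciprocal of PFDavg."""
    return 1.0 / pfd_avg


def required_pfd(demand_rate_per_year: float, tolerable_rate_per_year: float) -> float:
    """Target PFDavg so that demand rate x PFD does not exceed the tolerable rate."""
    return tolerable_rate_per_year / demand_rate_per_year


# Hypothetical: the function is demanded 0.5 times a year, and the tolerable
# frequency of the resulting accident is 1e-3 per year.
target = required_pfd(demand_rate_per_year=0.5, tolerable_rate_per_year=1e-3)
print(target, sil_for_pfd(target), risk_reduction_factor(target))
```

Here the target of 2 × 10⁻³ falls in the SIL 2 band, corresponding to a required risk reduction of 500.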

How to use SILs 

When using SILs, project owners must determine the intended system functions, understand the consequences of each functional failure, and assign SILs using one of the following methodical approaches:

  • The 61508 Method: an initial assessment based on the number of blocking functions for a given consequence, assigning equal weight to all mitigation candidates and creating a short list for further examination. One caveat is that “edge cases” require further review using other methods.
  • The 50126 Method: a more refined approach that relies on defining the Tolerable Hazard Rate (THR) and the Tolerable Functional Failure Rate (TFFR). This method should not be confused with Threat and Vulnerability Risk Analysis (TVRA), i.e. security risk assessment, or with requirements for High Availability (e.g. communication systems).
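The 50126 Method can be sketched in two steps: apportion a hazard’s THR among the functions whose failure can cause it, then map each resulting TFFR onto a SIL. The Python sketch below assumes the simplest causal structure, where the hazard occurs if any one listed function fails, with an optional factor for an independent external barrier that may stop a function’s failure from becoming the hazard. The TFFR bands follow the EN 50129 table (from SIL 4 at 10⁻⁹ to 10⁻⁸ per hour up to SIL 1 at 10⁻⁶ to 10⁻⁵ per hour). The function names, weights, barrier value, and THR are hypothetical, and a real apportionment would follow a causal analysis such as a fault tree.

```python
from typing import Optional

# TFFR bands per hour, after EN 50129: (SIL, lower bound incl., upper bound excl.)
TFFR_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def apportion_thr(thr_per_hour: float, weights: dict[str, float],
                  barrier_pfd: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Split a hazard's THR into one TFFR per contributing function.

    Assumes the hazard occurs if ANY listed function fails (an OR gate), so the
    weighted shares sum to the THR. barrier_pfd[name] is the probability that an
    independent external barrier fails to stop that function's failure from
    becoming the hazard; a smaller value relaxes that function's TFFR.
    """
    barrier_pfd = barrier_pfd or {}
    total = sum(weights.values())
    return {
        name: thr_per_hour * (w / total) / barrier_pfd.get(name, 1.0)
        for name, w in weights.items()
    }


def sil_for_tffr(tffr_per_hour: float) -> Optional[int]:
    """Map a TFFR onto a SIL; 0 means no SIL required, None means below 1e-9/h."""
    if tffr_per_hour >= 1e-5:
        return 0  # treated here as "SIL 0": no SIL required
    for sil, low, high in TFFR_BANDS:
        if low <= tffr_per_hour < high:
            return sil
    return None  # too demanding for a single function; needs architectural measures


# Hypothetical hazard with THR = 1e-7 per hour and two equally weighted functions,
# one of which is backed by an independent barrier that fails 1 time in 100.
tffrs = apportion_thr(1e-7, {"function_a": 1.0, "function_b": 1.0},
                      barrier_pfd={"function_b": 0.01})
for name, tffr in tffrs.items():
    print(name, f"{tffr:.1e}", sil_for_tffr(tffr))
```

In this example the independent barrier relaxes `function_b` from the SIL 3 band to the SIL 1 band while the overall hazard rate still equals the THR, which illustrates why careful allocation, rather than blanket SIL assignment, controls cost.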

Lessons Learnt  

One example of SIL misuse, and of the need for proper SIL allocation within a sub-system, comes from a railroad project owner that needed to evaluate the Safety Integrity Level of an upgraded telecommunication system interfacing with the back office. The sub-system was initially thought to warrant SIL 1 because it transmits sensitive and critical data and has a high reliability requirement. Data sensitivity and high reliability, however, are security and availability requirements; on their own they do not make a function’s failure hazardous. A careful assessment, drawing on our expertise in safety assurance, requirement definition and verification, hazard analysis, failure modes analysis, and risk mitigation, corrected the classification of the telecommunication system to SIL 0, saving the project both time and cost.

Safety Integrity Levels (SILs) are crucial to enhancing safety, resilience, and reliability in the railway transportation sector; however, their frequent misuse has led to substantial costs and project delays. Project owners must conduct SIL assessments early in the project and select vendor products that can be CENELEC certified to the required SIL. Assigning an incorrect SIL to a function, and therefore to the systems and components that implement it, can cause extensive project delays and cost overruns. It is also important to note that a SIL does not guarantee safety, nor does it deliver reliability or availability.