European associations and the European Commission

Below you will find important links from European associations and the European Commission:

  • cordis.europa.eu – SAFEC
  • cordis.europa.eu – STSARCES
  • www.cen.eu


Functional safety

From this directory you will find:

  • Functional safety for machines
  • Functional safety and IEC 61508 standard
  • Functional safety and IEC 61511 standard
  • Functional safety in ATEX

Harmonized standards list ATEX 94/9/EC - OJ-C 445-02 - 12/12/2014

Official Journal of the European Union C 445/5 (EN), 12.12.2014


Commission communication in the framework of the implementation of Directive 94/9/EC of the European Parliament and the Council of 23 March 1994 on the approximation of the laws of the Member States concerning equipment and protective systems intended for use in potentially explosive atmospheres

(Publication of titles and references of harmonised standards under Union harmonisation legislation)

(Text with EEA relevance)

(2014/C 445/02)

| ESO | Reference and title of the standard (and reference document) | First publication OJ | Reference of superseded standard | Date of cessation of presumption of conformity of superseded standard |
| --- | --- | --- | --- | --- |
| CEN | EN 1010-1:2004+A1:2010 — Safety of machinery — Safety requirements for the design and construction of printing and paper converting machines — Part 1: Common requirements | 8.6.2011 | EN 1010-1:2004 | Date expired (8.6.2011) |
| CEN | EN 1010-2:2006+A1:2010 — Safety of machinery — Safety requirements for the design and construction of printing and paper converting machines — Part 2: Printing and varnishing machines including pre-press machinery | 4.2.2011 | EN 1010-2:2006 | Date expired (28.2.2011) |
| CEN | EN 1127-1:2011 — Explosive atmospheres — Explosion prevention and protection — Part 1: Basic concepts and methodology | 18.11.2011 | EN 1127-1:2007 | Date expired (31.7.2014) |
| CEN | EN 1127-2:2014 — Explosive atmospheres — Explosion prevention and protection — Part 2: Basic concepts and methodology for mining | 12.12.2014 | EN 1127-2:2002+A1:2008 | 31.12.2014 |
| CEN | EN 1710:2005+A1:2008 — Equipment and components intended for use in potentially explosive atmospheres in underground mines | 20.8.2008 | EN 1710:2005 | Date expired (28.12.2009) |
|  | EN 1710:2005+A1:2008/AC:2010 |  |  |  |
| CEN | EN 1755:2000+A2:2013 — Safety of industrial trucks — Operation in potentially explosive atmospheres — Use in flammable gas, vapour, mist and dust | 4.5.2013 | EN 1755:2000+A1:2009 | Date expired (30.9.2013) |
| CEN | EN 1834-1:2000 — Reciprocating internal combustion engines — Safety requirements for design and construction of engines for use in potentially explosive atmospheres — Part 1: Group II engines for use in flammable gas and vapour atmospheres | 21.7.2001 |  |  |
| CEN | EN 1834-2:2000 — Reciprocating internal combustion engines — Safety requirements for design and construction of engines for use in potentially explosive atmospheres — Part 2: Group I engines for use in underground workings susceptible to firedamp and/or combustible dust | 21.7.2001 |  |  |
| CEN | EN 1834-3:2000 — Reciprocating internal combustion engines — Safety requirements for design and construction of engines for use in potentially explosive atmospheres — Part 3: Group II engines for use in flammable dust atmospheres | 21.7.2001 |  |  |
| CEN | EN 1839:2012 — Determination of explosion limits of gases and vapours | 22.11.2012 | EN 1839:2003 | Date expired (31.3.2013) |
| CEN | EN 1953:2013 — Atomising and spraying equipment for coating materials — Safety requirements | 5.11.2013 |  |  |
| CEN | EN 12581:2005+A1:2010 — Coating plants — Machinery for dip coating and electrodeposition of organic liquid coating material — Safety requirements | 17.9.2010 | EN 12581:2005 | Date expired (31.12.2010) |
| CEN | EN 12621:2006+A1:2010 — Machinery for the supply and circulation of coating materials under pressure — Safety requirements | 17.9.2010 | EN 12621:2006 | Date expired (31.12.2010) |
| CEN | EN 12757-1:2005+A1:2010 — Mixing machinery for coating materials — Safety requirements — Part 1: Mixing machinery for use in vehicle refinishing | 17.9.2010 | EN 12757-1:2005 | Date expired (31.12.2010) |
| CEN | EN 13012:2012 — Petrol filling stations — Construction and performance of automatic nozzles for use on fuel dispensers | 3.8.2012 | EN 13012:2001 | Date expired (31.12.2012) |
| CEN | EN 13160-1:2003 — Leak detection systems — Part 1: General principles | 14.8.2003 |  |  |
| CEN | EN 13237:2012 — Potentially explosive atmospheres — Terms and definitions for equipment and protective systems intended for use in potentially explosive atmospheres | 12.2.2013 | EN 13237:2003 | Date expired (30.4.2013) |
| CEN | EN 13463-1:2009 — Non-electrical equipment for use in potentially explosive atmospheres — Part 1: Basic method and requirements | 16.4.2010 | EN 13463-1:2001 | Date expired (31.12.2010) |
| CEN | EN 13463-2:2004 — Non-electrical equipment for use in potentially explosive atmospheres — Part 2: Protection by flow restricting enclosure ‘fr’ | 30.11.2005 |  |  |
| CEN | EN 13463-3:2005 — Non-electrical equipment for use in potentially explosive atmospheres — Part 3: Protection by flameproof enclosure ‘d’ | 30.11.2005 |  |  |
| CEN | EN 13463-5:2011 — Non-electrical equipment intended for use in potentially explosive atmospheres — Part 5: Protection by constructional safety ‘c’ | 18.11.2011 | EN 13463-5:2003 | Date expired (31.7.2014) |
| CEN | EN 13463-6:2005 — Non-electrical equipment for use in potentially explosive atmospheres — Part 6: Protection by control of ignition source ‘b’ | 30.11.2005 |  |  |
| CEN | EN 13463-8:2003 — Non-electrical equipment for potentially explosive atmospheres — Part 8: Protection by liquid immersion ‘k’ | 12.8.2004 |  |  |
| CEN | EN 13616:2004 — Overfill prevention devices for static tanks for liquid petroleum fuels | 9.3.2006 |  |  |
|  | EN 13616:2004/AC:2006 |  |  |  |
| CEN | EN 13617-1:2012 — Petrol filling stations — Part 1: Safety requirements for construction and performance of metering pumps, dispensers and remote pumping units | 3.8.2012 | EN 13617-1:2004+A1:2009 | Date expired (30.11.2012) |
| CEN | EN 13617-2:2012 — Petrol filling stations — Part 2: Safety requirements for construction and performance of safe breaks for use on metering pumps and dispensers | 4.5.2012 | EN 13617-2:2004 | Date expired (30.9.2012) |
| CEN | EN 13617-3:2012 — Petrol filling stations — Part 3: Safety requirements for construction and performance of shear valves | 4.5.2012 | EN 13617-3:2004 | Date expired (30.9.2012) |
| CEN | EN 13617-4:2012 — Petrol filling stations — Part 4: Safety requirements for construction and performance of swivels for use on metering pumps and dispensers | 5.11.2013 |  |  |
| CEN | EN 13760:2003 — Automotive LPG filling system for light and heavy duty vehicles — Nozzle, test requirements and dimensions | 24.1.2004 |  |  |
| CEN | EN 13821:2002 — Potentially explosive atmospheres — Explosion prevention and protection — Determination of minimum ignition energy of dust/air mixtures | 20.5.2003 |  |  |
| CEN | EN 13852-1:2013 — Cranes — Offshore cranes — Part 1: General-purpose offshore cranes | 5.11.2013 |  |  |
| CEN | EN 14034-1:2004+A1:2011 — Determination of explosion characteristics of dust clouds — Part 1: Determination of the maximum explosion pressure pmax of dust clouds | 8.6.2011 | EN 14034-1:2004 | Date expired (31.7.2011) |
| CEN | EN 14034-2:2006+A1:2011 — Determination of explosion characteristics of dust clouds — Part 2: Determination of the maximum rate of explosion pressure rise (dp/dt)max of dust clouds | 8.6.2011 | EN 14034-2:2006 | Date expired (31.7.2011) |
| CEN | EN 14034-3:2006+A1:2011 — Determination of explosion characteristics of dust clouds — Part 3: Determination of the lower explosion limit LEL of dust clouds | 8.6.2011 | EN 14034-3:2006 | Date expired (31.7.2011) |
| CEN | EN 14034-4:2004+A1:2011 — Determination of explosion characteristics of dust clouds — Part 4: Determination of the limiting oxygen concentration LOC of dust clouds | 8.6.2011 | EN 14034-4:2004 | Date expired (31.7.2011) |
| CEN | EN 14373:2005 — Explosion suppression systems | 9.3.2006 |  |  |
| CEN | EN 14460:2006 — Explosion resistant equipment | 15.12.2006 |  |  |
| CEN | EN 14491:2012 — Dust explosion venting protective systems | 22.11.2012 | EN 14491:2006 | Date expired (28.2.2013) |
| CEN | EN 14492-1:2006+A1:2009 — Cranes — Power driven winches and hoists — Part 1: Power driven winches | 16.4.2010 | EN 14492-1:2006 | Date expired (30.4.2010) |
|  | EN 14492-1:2006+A1:2009/AC:2010 |  |  |  |
| CEN | EN 14492-2:2006+A1:2009 — Cranes — Power driven winches and hoists — Part 2: Power driven hoists | 16.4.2010 | EN 14492-2:2006 | Date expired (16.4.2010) |
|  | EN 14492-2:2006+A1:2009/AC:2010 |  |  |  |
| CEN | EN 14522:2005 — Determination of the auto ignition temperature of gases and vapours | 30.11.2005 |  |  |
| CEN | EN 14591-1:2004 — Explosion prevention and protection in underground mines — Protective systems — Part 1: 2-bar explosion proof ventilation structure | 9.3.2006 |  |  |
|  | EN 14591-1:2004/AC:2006 |  |  |  |
| CEN | EN 14591-2:2007 — Explosion prevention and protection in underground mines — Protective systems — Part 2: Passive water trough barriers | 12.12.2007 |  |  |
|  | EN 14591-2:2007/AC:2008 |  |  |  |
| CEN | EN 14591-4:2007 — Explosion prevention and protection in underground mines — Protective systems — Part 4: Automatic extinguishing systems for road headers | 12.12.2007 |  |  |
|  | EN 14591-4:2007/AC:2008 |  |  |  |
| CEN | EN 14677:2008 — Safety of machinery — Secondary steelmaking — Machinery and equipment for treatment of liquid steel | 20.8.2008 |  |  |
| CEN | EN 14678-1:2013 — LPG equipment and accessories — Construction and performance of LPG equipment for automotive filling stations — Part 1: Dispensers | 4.5.2013 | EN 14678-1:2006+A1:2009 | Date expired (30.9.2013) |
| CEN | EN 14681:2006+A1:2010 — Safety of machinery — Safety requirements for machinery and equipment for production of steel by electric arc furnaces | 8.6.2011 | EN 14681:2006 | Date expired (8.6.2011) |
| CEN | EN 14756:2006 — Determination of the limiting oxygen concentration (LOC) for flammable gases and vapours | 12.12.2007 |  |  |
| CEN | EN 14797:2006 — Explosion venting devices | 12.12.2007 |  |  |
| CEN | EN 14973:2006+A1:2008 — Conveyor belts for use in underground installations — Electrical and flammability safety requirements | 7.7.2010 | EN 14973:2006 | Date expired (31.12.2010) |
| CEN | EN 14983:2007 — Explosion prevention and protection in underground mines — Equipment and protective systems for firedamp drainage | 12.12.2007 |  |  |
| CEN | EN 14986:2007 — Design of fans working in potentially explosive atmospheres | 12.12.2007 |  |  |
| CEN | EN 14994:2007 — Gas explosion venting protective systems | 12.12.2007 |  |  |
| CEN | EN 15089:2009 — Explosion isolation systems | 16.4.2010 |  |  |
| CEN | EN 15188:2007 — Determination of the spontaneous ignition behaviour of dust accumulations | 12.12.2007 |  |  |
| CEN | EN 15198:2007 — Methodology for the risk assessment of non-electrical equipment and components for intended use in potentially explosive atmospheres | 12.12.2007 |  |  |
| CEN | EN 15233:2007 — Methodology for functional safety assessment of protective systems for potentially explosive atmospheres | 12.12.2007 |  |  |
| CEN | EN 15268:2008 — Petrol filling stations — Safety requirements for the construction of submersible pump assemblies | 27.1.2009 |  |  |
| CEN | EN 15794:2009 — Determination of explosion points of flammable liquids | 16.4.2010 |  |  |
| CEN | EN 15967:2011 — Determination of maximum explosion pressure and the maximum rate of pressure rise of gases and vapours | 18.11.2011 | EN 13673-1:2003, EN 13673-2:2005 | Date expired (29.2.2012) |
| CEN | EN 16009:2011 — Flameless explosion venting devices | 18.11.2011 |  |  |
| CEN | EN 16020:2011 — Explosion diverters | 18.11.2011 |  |  |
| CEN | EN 16447:2014 — Explosion isolation flap valves | 12.12.2014 |  |  |
| CEN | EN ISO 16852:2010 — Flame arresters — Performance requirements, test methods and limits for use (ISO 16852:2008, including Cor 1:2008 and Cor 2:2009) | 17.9.2010 | EN 12874:2001 | Date expired (31.12.2010) |
| Cenelec | EN 50050:2006 — Electrical apparatus for potentially explosive atmospheres — Electrostatic hand-held spraying equipment | 20.8.2008 |  |  |
| Cenelec | EN 50050-1:2013 — Electrostatic hand-held spraying equipment — Safety requirements — Part 1: Hand-held spraying equipment for ignitable liquid coating materials | 14.3.2014 |  | 14.10.2016 |
| Cenelec | EN 50050-2:2013 — Electrostatic hand-held spraying equipment — Safety requirements — Part 2: Hand-held spraying equipment for ignitable coating powder | 14.3.2014 |  | 14.10.2016 |
| Cenelec | EN 50050-3:2013 — Electrostatic hand-held spraying equipment — Safety requirements — Part 3: Hand-held spraying equipment for ignitable flock | 14.3.2014 |  | 14.10.2016 |
| Cenelec | EN 50104:2010 — Electrical apparatus for the detection and measurement of oxygen — Performance requirements and test methods | 4.2.2011 | EN 50104:2002 | Date expired (1.6.2013) |
| Cenelec | EN 50176:2009 — Stationary electrostatic application equipment for ignitable liquid coating material — Safety requirements | 16.4.2010 |  |  |
| Cenelec | EN 50177:2009 — Stationary electrostatic application equipment for ignitable coating powders — Safety requirements | 16.4.2010 |  |  |
|  | EN 50177:2009/A1:2012 | 22.11.2012 |  | 23.7.2015 |
| Cenelec | EN 50223:2010 — Stationary electrostatic application equipment for ignitable flock material — Safety requirements | 17.9.2010 |  |  |
| Cenelec | EN 50271:2010 — Electrical apparatus for the detection and measurement of combustible gases, toxic gases or oxygen — Requirements and tests for apparatus using software and/or digital technologies | 4.2.2011 |  |  |
| Cenelec | EN 50281-2-1:1998 — Electrical apparatus for use in the presence of combustible dust — Part 2-1: Test methods — Methods for determining the minimum ignition temperatures of dust (IEC 61241-2-1:1994) | 6.11.1999 |  |  |
|  | EN 50281-2-1:1998/AC:1999 |  |  |  |
| Cenelec | EN 50303:2000 — Group I, Category M1 equipment intended to remain functional in atmospheres endangered by firedamp and/or coal dust | 16.2.2001 |  |  |
| Cenelec | EN 50381:2004 — Transportable ventilated rooms with or without an internal source of release (IEC/TR 60079-13:1982, IEC/TR 60079-16:1990) | 9.3.2006 |  |  |
|  | EN 50381:2004/AC:2005 |  |  |  |
| Cenelec | EN 50495:2010 — Safety devices required for the safe functioning of equipment with respect to explosion risks | 17.9.2010 |  |  |
| Cenelec | EN 60079-0:2012 — Explosive atmospheres — Part 0: Equipment — General requirements (IEC 60079-0:2011 (Modified), IEC 60079-0:2011/IS1:2013) | 14.3.2014 | EN 60079-0:2009 | 2.4.2015 |
|  | EN 60079-0:2012/A11:2013 | 14.3.2014 |  | 7.10.2016 |
| Cenelec | EN 60079-1:2007 — Explosive atmospheres — Part 1: Equipment protection by flameproof enclosures ‘d’ (IEC 60079-1:2007) | 20.8.2008 | EN 60079-1:2004 | Date expired (1.7.2010) |
| Cenelec | EN 60079-2:2007 — Explosive atmospheres — Part 2: Equipment protection by pressurized enclosure ‘p’ (IEC 60079-2:2007) | 20.8.2008 | EN 60079-2:2004 | Date expired (1.11.2010) |
| Cenelec | EN 60079-5:2007 — Explosive atmospheres — Part 5: Equipment protection by powder filling ‘q’ (IEC 60079-5:2007) | 20.8.2008 | EN 50017:1998 | Date expired (1.11.2010) |
| Cenelec | EN 60079-6:2007 — Explosive atmospheres — Part 6: Equipment protection by oil immersion ‘o’ (IEC 60079-6:2007) | 20.8.2008 | EN 50015:1998 | Date expired (1.5.2010) |
| Cenelec | EN 60079-7:2007 — Explosive atmospheres — Part 7: Equipment protection by increased safety ‘e’ (IEC 60079-7:2006) | 11.4.2008 | EN 60079-7:2003 | Date expired (1.10.2009) |
| Cenelec | EN 60079-11:2012 — Explosive atmospheres — Part 11: Equipment protection by intrinsic safety ‘i’ (IEC 60079-11:2011) | 4.5.2012 | EN 60079-11:2007, EN 61241-11:2006 | Date expired (4.8.2014) |
| Cenelec | EN 60079-15:2010 — Explosive atmospheres — Part 15: Equipment protection by type of protection ‘n’ (IEC 60079-15:2010) | 8.6.2011 | EN 60079-15:2005 | Date expired (1.5.2013) |
| Cenelec | EN 60079-18:2009 — Explosive atmospheres — Part 18: Equipment protection by encapsulation ‘m’ (IEC 60079-18:2009) | 7.7.2010 | EN 60079-18:2004, EN 61241-18:2004 | Date expired (1.10.2012) |
| Cenelec | EN 60079-20-1:2010 — Explosive atmospheres — Part 20-1: Material characteristics for gas and vapour classification — Test methods and data (IEC 60079-20-1:2010) | 17.9.2010 |  |  |
| Cenelec | EN 60079-25:2010 — Explosive atmospheres — Part 25: Intrinsically safe electrical systems (IEC 60079-25:2010) | 8.6.2011 | EN 60079-25:2004 | Date expired (1.10.2013) |
|  | EN 60079-25:2010/AC:2013 |  |  |  |
| Cenelec | EN 60079-26:2007 — Explosive atmospheres — Part 26: Equipment with equipment protection level (EPL) Ga (IEC 60079-26:2006) | 20.8.2008 |  |  |
| Cenelec | EN 60079-27:2008 — Explosive atmospheres — Part 27: Fieldbus intrinsically safe concept (FISCO) (IEC 60079-27:2008) | 16.4.2010 | EN 60079-27:2006 | Date expired (1.4.2011) |
| Cenelec | EN 60079-28:2007 — Explosive atmospheres — Part 28: Protection of equipment and transmission systems using optical radiation (IEC 60079-28:2006) | 11.4.2008 |  |  |
| Cenelec | EN 60079-29-1:2007 — Explosive atmospheres — Part 29-1: Gas detectors — Performance requirements of detectors for flammable gases (IEC 60079-29-1:2007 (Modified)) | 20.8.2008 | EN 61779-1:2000, EN 61779-2:2000, EN 61779-3:2000, EN 61779-4:2000, EN 61779-5:2000 | Date expired (1.11.2010) |
| Cenelec | EN 60079-29-4:2010 — Explosive atmospheres — Part 29-4: Gas detectors — Performance requirements of open path detectors for flammable gases (IEC 60079-29-4:2009 (Modified)) | 8.6.2011 | EN 50241-1:1999, EN 50241-2:1999 | Date expired (1.4.2013) |
| Cenelec | EN 60079-30-1:2007 — Explosive atmospheres — Part 30-1: Electrical resistance trace heating — General and testing requirements (IEC 60079-30-1:2007) | 20.8.2008 |  |  |
| Cenelec | EN 60079-31:2009 — Explosive atmospheres — Part 31: Equipment dust ignition protection by enclosure ‘t’ (IEC 60079-31:2008) | 7.7.2010 | EN 61241-1:2004 | Date expired (1.10.2012) |
| Cenelec | EN 60079-31:2014 — Explosive atmospheres — Part 31: Equipment dust ignition protection by enclosure ‘t’ (IEC 60079-31:2013) | 12.12.2014 | EN 60079-31:2009 | 1.1.2017 |
| Cenelec | EN 60079-35-1:2011 — Explosive atmospheres — Part 35-1: Caplights for use in mines susceptible to firedamp — General requirements — Construction and testing in relation to the risk of explosion (IEC 60079-35-1:2011) | 18.11.2011 |  | Date expired (30.6.2014) |
|  | EN 60079-35-1:2011/AC:2011 |  |  |  |
| Cenelec | EN 61241-4:2006 — Electrical apparatus for use in the presence of combustible dust — Part 4: Type of protection ‘pD’ (IEC 61241-4:2001) | 20.8.2008 |  |  |
| Cenelec | EN ISO/IEC 80079-34:2011 — Explosive atmospheres — Part 34: Application of quality systems for equipment manufacture (ISO/IEC 80079-34:2011 (Modified)) | 18.11.2011 | EN 13980:2002 | Date expired (25.5.2014) |

 

 


STSARCES - Annex 8 : Safety Validation of Complex Components - Validation by Analysis


Final Report of WP3.1

 

European Project STSARCES

Contract SMT 4CT97-2191


Abstract

The aim of the safety validation process is to prove that the product meets the safety requirements. Safety validation of complex programmable systems has become increasingly common as programmable systems have proved useful in safety related systems as well. However, a new kind of thinking covering the whole life cycle of the programmable product is needed, and new validation methods (analysis and testing) must supplement the old ones. Methods such as failure mode and effect analysis (FMEA) are still applicable, but they are not sufficient; methods are also needed to guarantee the quality of the hardware and software.

The main validation methods are analysis and tests, and usually both are needed to complete the validation process. Analysis is a very effective tool for validating simple systems thoroughly, but even a complete analysis may fail to reveal all failure modes of modern programmable electronics. Large programmable systems can be so complicated that a certain strategy in the validation process is necessary to keep the resources required reasonable. A good strategy is to start as early as possible and at the top level (system level). It is then possible to determine the safety critical parts by considering the safety requirements, categories (according to EN 954), safety integrity levels (according to IEC 61508), and the structure of the system. The critical parts are typically parts that the system relies on and whose properties cannot be seen clearly at the top level.

An emerging problem is that large programmable systems are becoming difficult to grasp, and the analysis is often difficult to understand. Figures can often illustrate the results of an analysis better than huge tables. There is, however, no single illustration method that suits every purpose; the analyst needs to draw the figures so that the main subject stands out clearly.


Preface

STSARCES, the Standards for Safety Related Complex Electronic Systems project, is funded mainly by the European Commission SMT (Standards, Measurement and Testing) programme (Contract SMT 4CT97-2191). Work package 3.1 is also funded by the Finnish Work Environment Fund, Nordtest, and VTT. The project aim is to support the production of standards for the functional safety sector of control systems in machinery. Some standards are already available, and industry and research institutes have gained their first experience in applying them. Harmonisation of methods, and additional guidelines showing how to apply the standards, are needed since the methods for treating and validating safety related complex systems are complicated and not particularly detailed. This report introduces the results of work package 3.1: Validation by analysis. The final format of the report was achieved with help, discussions and comments from Jarmo Alanen, Risto Kuivanen, Risto Tiusanen, Marita Hietikko, Risto Tuominen and partners from the consortium.

The following organisations participated in the research programme:

-     INERIS (Institut National de l’Environnement Industriel et des Risques, France)

-     BIA (Berufsgenossenschaftliches Institut für Arbeitssicherheit, Germany)

-     HSE (Health & Safety Executive, United Kingdom)

-     INRS (Institut National de Recherche et de Sécurité, France)

-     VTT (Technical Research Centre of Finland)

-     CETIM (Centre Technique des Industries Mécaniques, France)

-     INSHT (Instituto Nacional de Seguridad e Higiene en el Trabajo, Spain)

-     JAY (Jay Electronique SA, France)

-     SP (Sveriges Provnings- och Forskningsinstitut, Sweden)

-     TÜV (TÜV Product Service GmbH, Germany)

-     SICK AG (SICK AG Safety Systems Division, Germany)

The research programme work-packages were assigned as follows:

- Work-package 1: Software safety (leader – INRS)

WP 1.1 Software engineering tasks: CASE tools (CETIM)

WP 1.2 Tools for software fault avoidance (INRS)

- Work-package 2: Hardware safety (leader – BIA)

WP 2.1 Quantitative analysis (BIA)

WP 2.2 Methods for fault detection (SP)

- Work-package 3: Safety validation of complex components (leader – VTT)

WP 3.1 Validation by analysis (VTT)

WP 3.2 Intercomparison white-box/black-box tests (INSHT)

WP 3.3 Validation tests (TÜV)

- Work-package 4: Link between the EN 954 and IEC 61508 standards (leader – HSE)

- Work-package 5: Innovative technologies and designs (leader – INERIS)

Operational partners: industrial (SICK AG and JAY) and test-houses (INERIS and BIA)

- Work-package 6: Appendix draft to the EN 954 standard (leader – INERIS)

Operational partners: STSARCES Steering Committee and industrial partners

 


Contents

1        Introduction

2        VALIDATION PROCESS

2.1     The Need for Validation

2.2     Safety Validation

2.2.1       Validation Planning

2.2.2       Validation

3        SAFETY ISSUES RELATED TO COMPLEX COMPONENTS

3.1     Analysing Strategy

3.2     Complex Modules and Systems

3.2.1       Analysing Strategy for Modules and Systems

3.2.2       Safety Principles of Distributed Systems

3.3     Complex Components

3.3.1       Failure Modes for Complex Components

3.3.3       Safety Aspects

4        METHODS OF ANALYSIS

4.1     Common Analysis Methods

4.1.1       FMEA

4.1.2       FTA

4.2     Illustrating the Results of a Safety Analysis

4.2.1       The Need to Clearly Show the Results of the FMEA

4.2.2       Examples for Illustrating FMEA Results

4.2.3       Conclusions for Methods of Illustration

5        CONCLUSIONS


GLOSSARY

Bottom-up method/analysis/approach

The analysis begins with single failures (events) and their consequences are deduced.

CAN-bus

Controller area network; a communication method common in distributed systems, especially in mobile machines and cars.

Component level analysis

Analysis made at a level at which the smallest parts considered are components.

CPU

Central processing unit

FMEA

Failure mode and effect analysis

FTA

Fault tree analysis

Module level analysis

Analysis made at a level at which the smallest parts considered are modules (subsystems).

SIL

Safety integrity level (IEC 61508)

System level analysis

Analysis made at a high level at which the smallest parts considered are subsystems.

Top-down method/analysis/approach

The analysis begins with top events and the initiating factors are deduced.

 


1  Introduction

During the 1980s, it was realised that complex programmable electronic components could not be thoroughly validated, and as a result they were not used in safety critical systems. However, complex components make it possible to perform new complex functions economically, without many extra components, so the possibility of using complex programmable components also to perform safety functions increased. The methods for validating complex systems have developed significantly and continue to develop; as a result, there are now methods to validate control systems that include complex components.

Complicated integrated circuits and programmable circuits are considered complex components; however, small devices like sensors or motor control units can also be called complex components when the observer takes a system point of view. The component is usually a part that is not designed by the system designer but bought as a whole; it is therefore the smallest part that the system designer controls. This study considers the analysis of complex components from different points of view, and the concept of a complex component therefore has several meanings.

Complex components within safety related systems are becoming increasingly common. One reason for this trend is that systems in general are getting more and more complex, and the required monitoring and safety functions are also complicated, so more complex control systems are needed. The result is very complex systems whose structures and functions are difficult to understand, which can be a major problem for validators. Redundancy and diversity are desirable features in safety systems, but they make the systems even more complex and harder to understand thoroughly.

Since the components and the systems are complex, they tend to include design errors, because it is very difficult to verify, analyse, and test the complete system. Another problem is that the exact failure modes of complex components can be difficult to predict. The question is: can people trust complex safety systems? If a safety function fails, the consequences are often dramatic, since people take higher risks when they feel they can trust the safety system; it is therefore important for safety systems to perform their safety functions reliably. A validation process provides proof that a safety system fulfils its safety requirements. This report gives guidance on one part of the validation process, validation by analysis, and in particular considers systems including complex components.

Although complex programmable components can be difficult to validate, they make it possible to perform new kinds of safety and monitoring functions; for example, programmable systems can monitor the plausibility of inputs and complicated safety limits, functions that would be laborious and expensive to implement with hardwired technology. Complex programmable safety related systems are therefore becoming more common in areas where they are economically competitive. The designer has to decide whether the risks that programmable systems bring are acceptable, given the possibilities they offer.

2  VALIDATION PROCESS

2.1  The Need for Validation

In general, a validation process is carried out to confirm, by examination and provision of objective evidence, that the particular requirements for a specific intended use are fulfilled. When validation concerns the safety-related parts of a control system, the purpose is to determine their level of conformity to their specification within the overall safety requirements specification of the machinery [prEN 954-2 1999].

Carrying out a validation process can be a laborious task, especially for complicated systems with high safety demands. Laborious as it is, it is also necessary. Validation is often needed for the following purposes:

  • to prove to customers that the product is applicable for the intended purpose,
  • to prove to authorities that the product is safe and reliable enough for the intended purpose,
  • to prove to the manufacturer that the product is ready for the market,
  • to prove the reasons for specific solutions,
  • to provide documentation to help with future alterations of the product,
  • to prove the quality of the product.

The validation process has grown to meet common needs as the technology has developed. Simple systems can be analysed (FMEA) and tested (fault injection) quite thoroughly. Systems of moderate complexity can also be analysed quite thoroughly, but the tests cannot cover the whole system. Very complex systems can be neither completely analysed in detail nor thoroughly tested, so a number of different methods are needed in the process. Analysis is required at least at the system level and at the detailed component level, but requirements related to the different lifecycle phases also have to be fulfilled. This means that attributes such as quality control, correct design methods and management become more important, since most failures or errors are related to these kinds of issues.
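The tiering described above can be made concrete as a small sketch. Note that the tier names, activity labels and the function below are illustrative inventions, not terms from EN 954 or IEC 61508; the sketch only mirrors the point that the more complex the system, the more lifecycle and quality evidence must supplement analysis and testing.

```python
# Illustrative only: map a system's complexity tier to the kinds of
# validation activities suggested in the text. All names are hypothetical.

BASE_METHODS = ["FMEA", "fault injection testing"]

LIFECYCLE_EVIDENCE = [
    "quality management evidence",
    "design method review",
    "configuration management records",
]

def validation_activities(complexity: str) -> list[str]:
    """Return suggested validation activities for a complexity tier.

    'simple'   -> analysis and tests can cover the whole system.
    'moderate' -> thorough analysis, but tests cannot cover everything,
                  so system-level analysis is added.
    'complex'  -> neither complete analysis nor complete testing is
                  possible; lifecycle and quality evidence is essential.
    """
    if complexity == "simple":
        return BASE_METHODS
    if complexity == "moderate":
        return BASE_METHODS + ["system-level analysis"]
    if complexity == "complex":
        return BASE_METHODS + ["system-level analysis"] + LIFECYCLE_EVIDENCE
    raise ValueError(f"unknown complexity tier: {complexity}")
```
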

Confidence is a very important factor in the validation process. The user of the validation documents has to trust the quality of the validation; otherwise the validation has no meaning. The validation activities are carried out to convince someone that the product is properly designed and manufactured. One way to increase confidence is to perform the validation process according to existing requirements and guides, and to involve objective experts in the process.

2.2      Safety Validation

The safety validation process consists of planning and the actual validation. The same process can also be applied to subsystems. A checklist or similar guide is needed to ensure that the safety validation plan includes all the necessary actions.

The phases of the validation process are presented in

figure 1 . First, the validation plan is made according to known validation principles. Then the system is analysed according to the validation plan, the known criteria, and the design considerations. Testing is carried out according to the validation plan and the results of the analysis. All the phases have to be recorded in order to have reliable proof of the validation process and the documents to help future modifications.

Figure 1. Overview of the validation process [prEN 954-2 1999].

As the figure shows, it is possible to go back from one state to an earlier state.

2.2.1  Validation Planning

The purpose of safety validation planning is to ensure that a plan is in place for the testing and analysis of the safety requirements (e.g. standards EN 954 or IEC 61508). Safety validation planning is also performed to facilitate and enhance the quality of safety validation. The planning shows the organisation and states, in chronological order, the tests and verification activities needed in the validation process. A checklist is needed in the planning process in order to include all the essential analyses and tests in the safety validation plan. Such a checklist can be gathered from IEC 61508-1, prEN 954-2 or the Nordtest Method [Hérard et al. 1999]. Large control systems may include separate subsystems, which it is convenient to validate separately.

The main inputs for safety validation planning are the safety requirements. Each requirement shall be tested in the validation process, and the pass criteria shall be declared in the plan. It is also important to declare the person(s) who make the decisions if something unexpected happens, and who have the competence to do the validation. As a result, safety validation planning provides a guideline on how to perform safety validation.
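The bookkeeping implied above (each requirement paired with a declared pass criterion, a responsible person and a validation status) can be sketched in code. The Python sketch below is purely illustrative: the requirement identifiers, field names and criteria are invented, not taken from any standard.

```python
from dataclasses import dataclass

@dataclass
class SafetyRequirement:
    ident: str            # hypothetical identifier, e.g. "SR-1"
    description: str
    pass_criterion: str   # declared in the validation plan
    responsible: str      # person competent to decide on deviations
    validated: bool = False

def open_items(plan):
    """Requirements that still lack a passed validation result."""
    return [r.ident for r in plan if not r.validated]

# A toy two-requirement plan (contents invented for illustration).
plan = [
    SafetyRequirement("SR-1", "Emergency stop reaches the safe state",
                      "stopping time < 100 ms", "J. Smith", validated=True),
    SafetyRequirement("SR-2", "No single fault loses the safety function",
                      "FMEA complete, no unmitigated critical failure", "J. Smith"),
]
print(open_items(plan))  # ['SR-2']
```

Tracking open items this way makes it easy to show, at the end of validation, that every declared requirement has actually been exercised.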

2.2.2  Validation

The purpose of safety validation is to check that all safety related parts of the system meet the specification of the safety requirements. Safety validation is carried out according to the safety validation plan. As a result of the safety validation, it is possible to show that the safety related system meets the safety requirements, since all the safety requirements are validated. When discrepancies occur between expected and actual results, it has to be decided whether to issue a request to change the system, or the specifications and possible applications. It also has to be decided whether to continue and make the needed changes later, or to make the changes immediately and restart the validation process from an earlier phase.

3  SAFETY ISSUES RELATED TO COMPLEX COMPONENTS

3.1  Analysing Strategy

The traditional way to analyse an electronic control system is to apply a bottom-up approach using Failure Mode and Effect Analysis (FMEA, see 4.1.1). The method is effective and reveals random failures well. It is well suited to systems which can be analysed thoroughly. Systems are, however, getting more complex, so the top-down approach is becoming more and more applicable. A top-down approach like Fault Tree Analysis (FTA, see 4.1.2) helps to understand the system better, and systematic failures can also be revealed more easily. The top-down approach is also good at revealing failures other than the random failures that the bottom-up approach finds best.

Another development due to increasing system complexity has been analysis on a module-by-module basis rather than on a component-by-component basis. Non-programmable electronic systems of moderate complexity can and should be analysed component by component and, in some cases (large systems), also module by module to cover complicated module- or system-level errors. Analysing complex programmable systems component by component using bottom-up analysis (FMEA) would require a lot of resources, and yet the method is not the best way to find certain failures. The system functions can be understood better at the module or system level than at the component level, so the quality of that part of the analysis can be improved.

The system analysis could be started from the bottom (not preferable), so that each of the small subsystems is analysed first and finally the system as a whole. In the so-called V-model, the system is designed from the top down (to the finest details) and then validated from the bottom up. The analysis should, however, be made as early as possible during the design process in order to minimise possible corrections. The system should therefore be analysed starting from the top, at the system/module level. Detailed component level analysis can then be made for the modules found critical in the module level analysis. This method reduces the resources needed in the analysis. Table 1 illustrates the analysis activities at the different levels.

Table 1. System, module and component level analysis, and some aspects related to bottom-up analysis and top-down analysis.

System level

·     Bottom-up analysis (e.g. FMEA) is useful and it reveals random failures well.

·     Top-down methods (e.g. FTA) illustrate the failures well and reveal sequential failures and human errors. Useful when the number of top events is small.

·     At system level (without details) the analysis can often be made thoroughly.

·     Validated modules can be used to ease the analysis.

Module level

·     Bottom-up analysis is useful and it reveals random failures well.

·     Top-down methods illustrate the failures well and reveal sequential failures and human errors. Useful when the number of top events is small.

·     Some hints for analysing standardised systems can be found.

Component level

·     Bottom-up analysis can be laborious, but necessary for analysing low complexity systems and systems with high safety demands.

·     Top-down method or a mixture of top-down and bottom-up methods can be reasonable for analysing complex components, or systems with complex components.

·     Usually the whole system cannot be analysed thoroughly at component level.

The common analysing strategy is bottom-up analysis at different levels, but it has some weak points which have to be taken care of separately. The basic idea of FMEA is to analyse the system considering only one failure at a time. However, common cause failures can break all similar or related components at the same time, especially if there is a miscalculation in dimensioning. These kinds of failures have to be considered separately and then added to the analysis. If the safety demands are high, sequential failures also have to be considered carefully, since FMEA does not urge the analyst to do so.

More and more often, bottom-up analysis tends to become too massive and laborious, so tactics are needed to minimise the work and the amount of documentation. One strategy is to document only critical failures. Another is to start the analysis with the most questionable (likely to be critical) structure and initially document only the items and effects; the failure modes and other information are added only for critical failures. The FMEA table may then look rather empty, but it requires less work.
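The "document only critical failures" tactic can be illustrated with a minimal worksheet sketch. The Python fragment below is hypothetical: the row fields and items are invented, and a real FMEA worksheet would carry many more columns (causes, detection, severity, etc.).

```python
# Minimal FMEA worksheet sketch: every item is listed, but failure modes
# and effects are filled in only for rows judged critical.
fmea_rows = [
    {"item": "relay K1", "critical": True,
     "failure_mode": "contacts welded",
     "effect": "safety function lost"},
    # Non-critical rows stay almost empty, keeping the table manageable.
    {"item": "indicator LED", "critical": False},
]

def documented_in_detail(rows):
    """Items whose failure modes are documented in full."""
    return [r["item"] for r in rows if r["critical"]]

print(documented_in_detail(fmea_rows))  # ['relay K1']
```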

3.2  Complex Modules and Systems

3.2.1  Analysing Strategy for Modules and Systems

Many complex components are at present too complex to be validated thoroughly (with reasonable resources), and programmable components are becoming even more complicated and specially tailored (e.g. ASIC, FPGA). This means that, for safety purposes, systems including complex components have to cope with faults by being fault tolerant or by activating automatic safety functions; this can be achieved by concentrating on the architecture. Architecture is best understood at the system/module level and, therefore, architectural weaknesses can also be conveniently revealed through system or module level analysis. Additionally, complex systems nearly always contain some design errors (hardware or software), which can be difficult to find at component level. At module level the analysis can be made thoroughly. One factor supporting module level analysis is the quality of the analysis: an increasing number of components in the unit to be validated corresponds to a reduction in the efficiency of the analysis. Although module level analysis is becoming more and more important, the analysis at component level cannot be neglected, because certain failures are seen better at component level. A resource-saving strategy is to concentrate on critical failures at all levels of analysis. The category (according to EN 954) or the SIL (according to IEC 61508) affects the level of detail to which the analysis should be performed.

Usually both system/module level and component level analyses are needed in validating complex systems. Analyses on system/module level are performed in order to determine the critical parts of the system, and component level analyses are carried out for those parts of the system.

For module level analysis there are some references which give hints on the failure modes of modules, and for some standardised systems advice on analysis can be found; CAN-bus is considered in appendix A as an example. In system/module level analysis the failure modes resemble failures at component level, but the analyst has to consider which failure modes are relevant.

3.2.2      Safety Principles of Distributed Systems

Distributed systems are increasingly used in machinery. Distribution is normally realised by having multiple intelligent modules on a small area communication network. Each module may have several sensor inputs and actuator outputs. The trend, however, is to implement distributed systems with even smaller granularity, i.e. to give single sensors or actuators their own network interface.

Distribution helps designers to understand and grasp large systems better, as the amount of wiring is reduced [1] and the structure of the cabling is more comprehensible. Therefore, the number of mounting faults in a large system is most probably lower than in a traditional centralised system. Hence, with regard to understandability and cabling simplicity, distributed systems introduce some inherent dependability. Furthermore, in distributed systems it is easier to implement elaborate and localised diagnostic facilities, as the system consists of several CPUs capable of both off-line and on-line monitoring and diagnostics [2]. The modularity of a distributed system also makes it possible to implement 'limp home' capabilities in case of a failure in part of the system. These inherent characteristics increase the dependability of the system and therefore also affect its safety in a positive way. However, distributed systems are always complex and hence bring along new kinds of safety problems and aspects, such as:

  • communication sub-system faults and errors (faults in cables, connectors, joints, transceivers, protocol chips or in the communication sub-system software; transient communication errors)
  • communication sub-system design failures (like excessive communication delays, priority inversion and 'babbling idiot')
  • system design failures (e.g. scheduling errors, or the nodes in the system may have a different view of the time, of the system state or of the state variables)

When designing distributed systems and busses, various safety techniques can be used to achieve the required safety and dependability level. There is no single ideal solution for all applications.

The analysing strategy described in section 3.1 can also be applied to complex components including distributed systems. In addition to this, distributed systems may have several architectural safety features and techniques for detection, avoidance and control of failures. Such safety principles are described in sections 3.2.2.1 and 3.2.2.2.

3.2.2.1            Architectural Principles

Architectural principles crucially affect the safety performance of a safety critical distributed system. There are already some safety-validated busses used for safety critical communication, and such systems always have redundancy and component monitoring. All fieldbusses have some kind of signal monitoring to reveal most of the errors in messages. In some cases the bus standard forces the use of certain architectural solutions. At higher levels, however, there are more architectural alternatives, since large systems may have several different busses which are all used for the appropriate signalling. This section lists several architectural safety principles and techniques for distributed systems, and states the aim and gives a description of each.

The architectural principles also have to be taken into account during the validation process. Distributed systems are so complex that many kinds of undetected component failures are possible. This means that the architecture of the system has to support fault tolerance and provide a way to force the system into a safe state in case of a failure. Table 2 lists some architectural principles, which should be considered both in the design and validation processes.

Table 2. Architectural principles.

Method

Applicability

Hardware topology

Hardware topology affects the safe performance of the system. It should be chosen so that the consequences at the weakest point are minimised.

  • Redundant hardware topology detects failures by comparing signals between busses (See IEC 61508-7, A.7.3)
  • Star topology can operate even if one node is faulty except if it is the node in the star point
  • Ring topology can operate even if there is a failure between two nodes [Kuntz, W et al. 1993]
  • Redundant ring topology can operate in case of multiple failures in the communication system [Kuntz, W et al. 1993]
  • Power supply cabling in star topology can supply power from the power source to the other nodes even if one node fails or its power cables break, provided that each node is fused separately.

Galvanic isolation [DeviceNet specification]

Galvanic isolation prevents different potential levels on distinct nodes from causing unwanted currents between the nodes.

  • The communication lines and power supplies of the nodes are galvanically isolated with the help of optoisolators and DC/DC converters.

Use of a dead man switch line among the bus cables [M3S Specification]

A dead man switch provides information to all nodes that the operator of the machine or vehicle is still controlling the system.

  • A single wire passing information from the dead man switch is included in the bus cabling and connectors. A total break in the bus cable should correspond to the situation where the operator is not controlling the system.

Use of power up/down line among bus cables [M3S Specification]

Use of a power up/down line gives a power up signal to all the nodes simultaneously and a power down signal in the case of power down or in an emergency.

  • A single wire passing power up/down information is included in the bus cabling and connectors. A total break in the bus cable should correspond to the situation where the power down signal is active.

Single wire communication [Pers 1992]

Single wire communication offers the capability to communicate over a single wire in case of a malfunction of the other wire when twisted pair communication is used.

  • With the help of special transceiver circuitry, communication can continue with a reduced signal-to-noise ratio in case of an interruption or short circuit of the other twisted pair wire.

Redundant nodes [Kopetz 1994]

Redundant nodes enable continuous operation in the case of a node failure.

  • Safety relevant nodes may be replicated to provide a backup node in the case of a failure which might lead to an accident.

Global clock [Gregeleit et al. 1994]

A global clock provides a consistent view of time on all nodes.

  • All the nodes of the system should keep an accurate copy of the system time in order to be able to perform time-synchronised operation.

Shadow node [Kopetz 1994]

Shadow node provides a backup for services required in the system.

  • A single node is arranged to provide the services of an impaired node or nodes. The shadow node works as a backup for multiple nodes.

Time triggered communication system [Kopetz 1994], [Lawson et al. 1992]

A time triggered communication system assures the timeliness of state variables.

  • To implement hard real-time control systems, an event based communication system may not be adequate to guarantee the timeliness of the state variables. In the time triggered approach, communication is scheduled in the design phase of the system, prior to operation. All activities on the bus as well as on the nodes are triggered by time, not by events. Hence, the system is predictable and not controlled by stochastic events.

3.2.2.2            Detection, Avoidance and Control of Failures

This section presents several failure detection, avoidance and control techniques for distributed systems (Table 3), together with the aims and main features of each technique. When the person analysing a distributed system recognises any of these techniques, he should determine what the capabilities of the technique are with respect to enhancing the safety of the system.

Table 3. Failure detection, avoidance and control techniques.

Method

Applicability

Cyclic Redundancy Check (CRC)

Errors in received data can be detected by applying CRC over the transferred data.

  • The transmitter appends a CRC code to the end of the data frame. The receiver should get the same CRC value when applying the same CRC algorithm (polynomial) to the data of the frame. If the CRC value calculated by the receiver differs from the one received in the transmitted frame, the data is regarded as erroneous.

Communication error recovery by retransmission [Kopetz 1994], [ISO 11898]

Retransmission ensures reliable transfer of data in case of transient failures during transmission.

  • Messages that are discarded by some of the nodes are retransmitted.

Note: this may cause non-deterministic communication latencies if there is no way to control the retransmission process.

Message replication without hardware redundancy [Kopetz 1989] (see also IEC 61508-7, A.7.6)

Replicating a message by sending it twice or more allows the loss of N-1 copies if the message is sent N times.

  • Always sending the message twice or more in sequence over a single bus gives deterministic timing compared to retransmission on failure. If the message is sent twice and the receiver receives two messages with different data, both messages must be discarded. If the number of replicated messages is, for example, three, 2-out-of-3 voting can be used.

Monitoring shorts or open circuits of the bus wires [Pers 1992], [Tanaka et al. 1991]

Monitoring shorts or open circuits activates corrective or safety functions in case of a total communication blackout.

  • The bus wires are monitored by hardware and signalled to software in case of a malfunction.

Monitoring bus load [DIN 9684 Teil 3]

Monitoring bus load enables bus traffic to be restricted dynamically in case of excessive bus load.

  • The message rate is monitored by software and if the rate is too high, the nodes are forced to apply inhibit times in their transmission processes.

Monitoring presence of relevant nodes [CiA/DS301 1999]

Monitoring the presence of relevant nodes exposes the accidental drop-out of a node.

  • Some or all of the nodes monitor the presence of relevant nodes with the help of periodic messages.

Restricting transmission period of messages [DIN 9684 Teil 3]

Restricting the transmission period prevents excessive bus load and thus guarantees proper message latencies for all messages.

  • All the nodes of the system are forced to apply specific transmission rate rules in their transmission processes.

Babbling idiot avoidance [Tindell et al. 1995]

Babbling idiot avoidance prevents one or several nodes from erroneously sending a lot of (high priority) messages and thus gaining exclusive bus access.

  • The communication software of a node should not be able to enter such a mode, so the software should be carefully designed and analysed in order to avoid this type of situation. Runtime monitoring can be done jointly by hardware and software.

Priority inversion avoidance

Messages are controlled so that a low priority message cannot prevent a high priority message from entering the bus.

  • This type of situation occurs locally on a node if a low priority message enters the bus contention first and blocks a higher priority message. The situation can be avoided by software and sophisticated hardware, or by time triggered message scheduling.

Message scheduling based on inhibit times [Fredriksson 1995]

Message scheduling based on inhibit times ensures timeliness of the relevant messages on the communication bus.

  • Messages are scheduled by applying inhibit times to communication objects, thus guaranteeing bus access for low priority messages. This method can be used in event based bus systems.

Time synchronisation [Fredriksson 1995], [Lawson et al. 1992], [Kopetz 1994]

Time synchronisation ensures timeliness of all the messages on the communication bus.

  • Messages are scheduled by synchronising the transmission of a message with respect to time.

Time stamping

Time stamping enables the evaluation of the validity of the data or helps to recognise varying communication delays.

  • The arrival time of a message is stored.

Consistency control of state variables [ISO 11898]

Consistency control of state variables ensures that there is no discrepancy between data (and system state) on different nodes.

  • The communication protocol should be such that all the nodes accept the correct data from the bus at the same time. If one of the nodes receives incorrect data, all the nodes should discard the data.

Configuration check

Configuration check ensures that correct hardware and software versions are used on the nodes of the system.

  • A single node (master) or multiple nodes may check at start-up, with the help of a request message, whether the relevant nodes use the presumed hardware, software and parameter versions.

End-to-end CRC [Kopetz 1994]

End-to-end CRC can be used to detect data errors beyond bus communication errors.

  • A normal CRC checks data integrity between message transmission and reception, but an end-to-end CRC also checks data integrity from sensor measurement to message transmission and from message arrival to actuation.

Message numbering

Message numbering ensures correct assembly of a received stream of segmented data and enables duplicated messages to be discarded.

  • Consecutive messages are numbered in order to detect discontinuities in the data block or replication of data segments. Numbering can often be accomplished with a single bit (a toggle bit).
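Three of the message-level techniques in Table 3 (CRC over a frame, message replication with 2-out-of-3 voting, and toggle-bit numbering) can be sketched in a few lines. The Python sketch below is illustrative only: CRC-32 is chosen purely for convenience, not because any particular fieldbus uses it, and the frame layout is invented.

```python
import zlib
from collections import Counter

def frame_with_crc(payload: bytes) -> bytes:
    """Transmitter appends a CRC (here CRC-32) over the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes):
    """Receiver recomputes the CRC; returns the payload, or None if corrupted."""
    payload, rx_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == rx_crc else None

def vote_2oo3(messages):
    """2-out-of-3 voting over three replicated messages."""
    value, count = Counter(messages).most_common(1)[0]
    return value if count >= 2 else None

def dedup_toggle(frames):
    """Discard consecutive duplicates using a single toggle bit per frame."""
    out, last = [], None
    for toggle, data in frames:
        if toggle != last:
            out.append(data)
        last = toggle
    return out

good = frame_with_crc(b"\x01\x02")
bad = bytes([good[0] ^ 0xFF]) + good[1:]   # injected single-byte fault
print(check_frame(good), check_frame(bad))  # payload accepted, fault rejected
print(vote_2oo3([b"A", b"A", b"B"]))        # b'A'
print(dedup_toggle([(0, "m1"), (0, "m1"), (1, "m2")]))  # ['m1', 'm2']
```

Note how the voting function returns None when no majority exists, matching the rule above that two replicated messages with different data must both be discarded.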

3.3      Complex Components

Complex components contain more than 1 000 gates and/or more than 24 pins [EN 954-2 1999]. The definition gives only a rough indication of which components may be complex. The number of potentially different random failures in such a component is large: the number of two-out-of-24 combinations alone is 276, and these are just the simple short circuits in the smallest component that meets the definition. Complex components have several failure modes. Blindly analysing all combinations would produce many irrelevant failures, so failure exclusions are needed in order to focus the resources on the critical failures.
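The combination count quoted above is easy to verify, and the same formula shows how quickly the number of candidate pairwise shorts grows with pin count:

```python
from math import comb

# Pairwise short circuits between n pins: C(n, 2) = n*(n-1)/2.
print(comb(24, 2))   # 276, matching the figure in the text
print(comb(100, 2))  # 4950 candidate shorts for a 100-pin device
```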

3.3.1   Failure Modes for Complex Components

Table 4 shows the random failures of complex components according to prEN 954-2. The table concentrates on failures related to the inputs and outputs of the circuit. The exclusions column shows whether it is possible to ignore a certain type of failure; "No" means that the failure mode has to be considered in all cases.

Table 4. Faults to be considered with programmable or complex integrated circuits. [prEN 954-2]

Faults considered, each followed by the corresponding exclusion:

  • Faults of part or all of the function (see a and b); the fault may be static, change the logic, or be dependent on bit sequences. Exclusion: No (see a).

  • Open-circuit of each individual connection. Exclusion: No.

  • Short circuit between any two connections (see c). Exclusion: No.

  • Stuck-at fault; static "0" and "1" signal at all inputs and outputs, either individually or simultaneously (see c and d). Exclusion: No.

  • Parasitic oscillation of outputs (see e). Exclusion: No. A fault exclusion can be justified if such an oscillation cannot be simulated by realistic parasitic feedback (capacitors and resistors).

  • Changing value (e.g. in/output voltage of analogue devices). Exclusion: No.

  • Undetected faults in the hardware which remain unnoticed because of the complexity of the integrated circuit (see a and b). Exclusion: No.

Remarks

a - Faults in memory circuits and processors shall be avoided by self-tests, e.g. ROM-tests, RAM-tests, CPU-tests, external watchdog timers and the complete structure of the safety related parts of the control system.

b - The faults considered give only a general indication for the validation of programmable or complex integrated circuits.

c - Because of the assumed short-circuits within an integrated circuit, safety signals need to be processed in separate integrated circuits when redundancy is used.

d - i.e. short circuit to 1 and 0 with isolated input or disconnected output.

e - The frequency and the pulse duty factor depend on the switching technology and the external circuitry. When testing, the driving stages in question are disconnected.

However, the basic failures considered in the analysis can be simple compared to the actual failures that can occur inside the component. Such component-specific failures can be, for example, a failure in a microprocessor register or a failure in a certain memory location.

In the draft IEC 61508-2, failures typical of a certain component technology (e.g. CPU, memory, bus) are considered instead of the pins (input, output, etc.) of the component. A single component can include several technologies.

Table 5 shows some component dependent failures. The table is gathered from draft IEC 61508-2, and the listed failure modes need to be considered when the diagnostic coverage is high [3].

Table 5. Faults or failures of complex components, listed per component (faults or failures to be detected).

CPU
  • register, internal RAM: stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines, all these for data and addresses; dynamic cross-over for memory cells; no wrong or multiple addressing
  • coding and execution including flag register: no definite failure assumption
  • address calculation: no definite failure assumption
  • program counter, stack pointer: stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines

Bus
  • general: time out
  • memory management unit: wrong address decoding
  • direct memory access: all faults which affect data in the memory; wrong data or addresses; wrong access time
  • bus arbitration (see a): no, continuous or wrong arbitration

Interrupt handling
  • no or continuous interrupts; cross-over of interrupts

Clock (quartz)
  • sub- or superharmonic

Invariable memory
  • all faults which affect data in the memory

Variable memory
  • stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines, all these for data and addresses; dynamic cross-over for memory cells; no wrong or multiple addressing

Discrete hardware
  • digital I/O: stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines; drift and oscillation
  • analogue I/O: stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines; drift and oscillation
  • power supply: stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines; drift and oscillation

Communication and mass storage
  • all faults which affect data in the memory; wrong data or addresses; wrong transmission time; wrong transmission sequence

Electromechanical devices
  • does not energise or de-energise; individual contacts welded; no positive guidance of contacts; no positive opening

Sensors
  • stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines; drift and oscillation

Final elements
  • stuck-at faults, stuck-open, open or high impedance outputs, short-circuits between signal lines; drift and oscillation

a - Bus arbitration is the mechanism for deciding which device has control of the bus.

b - "Stuck-at" is a fault category which can be described as a continuous "0", "1" or "on" at the pins of a component.

3.3.2   Safety Aspects

It is obvious that the person validating the system has to decide which possible failures have to be documented. Usually an expert can see from the circuit diagram which failures can cause severe effects, but generic rules for excluding some failures can be hard to find. For some standardised technologies it is possible to identify in advance the critical failures to consider. This minimises the number of failures to be considered and improves the quality of the analysis.

Systematic failures become even more prominent in complex components and complex systems. There are errors in most commercial programs (usually more than one per 1 000 lines of code), but the errors usually appear relatively seldom [Gibbs 1994]. Hardware design failures are probable in complex components, and especially in tailored components. Consequently, in complex systems, systematic errors are more common than random failures. The whole system has to be validated, and both systematic and random failures have to be considered.

Appendix A shows, as an example, what kinds of failures are related to CAN-bus. Most of the described failures apply to other types of distributed systems as well, but the analyst has to know the special features of the system under consideration.

4     METHODS OF ANALYSIS

4.1    Common Analysis Methods

Different analysis techniques are needed in different phases of the design. At first, hazard identification and risk analysis techniques are useful, for example Hazard and Operability study (HAZOP), Preliminary Hazard Analysis (PHA) and techniques which use hazard lists. There are many techniques for software verification and for a probabilistic approach to determining safety integrity. In software verification, software errors are searched for systematically by using, for example, data flow analysis, control flow analysis, software FMEA or sneak circuit analysis (see IEC 61508-7). In the probabilistic approach it is expected that the verification process has already been carried out, and statistical values are used to calculate a probability of the program executing correctly. There are also methods for verifying component designs, such as ASICs. This chapter, however, concentrates on analysis techniques used for analysing control systems.

There are two basic types of techniques for analysing systems:

  • Top-down (deductive) methods, which begin with defined system level top event(s) and deduce the initiating factors.
  • Bottom-up (inductive) methods, which begin with single failures and conclude the system level consequences.

Both types of technique have their advantages and disadvantages, but ultimately the value of the results depends on the analyst. The techniques can, however, make the analyst more observant in detecting certain types of failures or events. Bottom-up methods tend to help the analyst detect all single failures and events, since all basic events are considered. Top-down methods tend to help the analyst detect how combined effects or failures can cause a certain top event. Top-down methods are good when only the critical events have to be analysed; bottom-up methods are good when the whole system has to be analysed systematically. The basic demand is that the analysis technique must be chosen so that all critical events are detected with the minimum effort. Top-down methods give an overview of the system and show the critical parts, systematic failures and human factors. Bottom-up methods cover the system systematically, and many failures are found.

A combined bottom-up and top-down approach is often the most efficient. The top-down analysis provides the global picture and can focus the analysis on the areas that are most significant from the overall performance point of view. Bottom-up methods can then be focused on the most critical parts. Bottom-up analysis aims at finding "the devil that hides in the details".

The most important point after choosing the analysing method is to concentrate on the weak points of the method, and this can be done by using strict discipline. The weak points of FMEA and FTA are described in chapters 4.1.1 and 4.1.2.

4.1.1      FMEA

When the safety and performance of a control system are assessed, Failure Mode and Effect Analysis (FMEA) is the most commonly used tool. An international standard (IEC 812:1985) defines the method. FMEA is a bottom-up (inductive) method, which begins with single failures and then considers the causes and consequences of each failure. In an FMEA, all components, elements or subsystems of the analysed system are listed. FMEA can be done at different levels and in different phases of the design, which affects the depth of the analysis. In an early phase of the design a detailed analysis cannot be done, and some parts of the system may be considered so straightforward and harmless that a deep analysis is unnecessary. In the critical parts, however, the analysis needs to be deep and should be made at the component level. If the safety of the system really depends on a certain complex component, the analysis may even cover some inner parts of the component; for example, this can mean software analysis or consideration of typical failures related to a certain logical function.

prEN 954-2 contains useful lists of failures of common components in different types of control systems for use in FMEA. The standard gives probable component failures, and the analyst decides whether those failures are valid in the system considered or whether other failures are possible. If functional blocks, hybrid circuits or integrated circuits are analysed, the lists in prEN 954-2 are not sufficient. In addition, systematic failures and failures typical of the technology (microprocessors, memories, communication circuits, etc.) have to be considered, since these failures are more common than basic random hardware failures.

FMEA is intended mainly for single random failures and so it has the following weak points:

  • It does not support detection of common cause failures or design failures (systematic failures).
  • Human errors are usually left out; the method concentrates on components, not on process events. A sequence of actions causing a certain hazard is difficult to detect.
  • Sequential failures causing a hazard can also be difficult to detect, since the basic idea of the method is to consider one failure at a time. With strict discipline, however, sequential failures can be detected: if a failure is not detected by the control system, other failures (or events) are studied assuming that the undetected failure has occurred.
  • Highly redundant systems can be difficult to analyse, since sequential failures can be important in them.
  • The method treats all failures equally, so even failures with very low probability are considered carefully. This may increase the workload and cause a lot of paperwork.
  • In large analysis documentation it can be difficult to identify the critical failures, to see which failures have to be considered first, and to determine the best means of dealing with them.

However, FMEA is probably the best method to detect random hardware failures, since it considers all components (individually or as blocks) systematically. Some critical parts can be analysed on a detailed level and some on a system level. If the method seems to become too laborious, the analysis can be done on a higher level, which may increase the risk that some failure effects are not detected.

An FMEA table always includes the failure modes of each component and the effects of each failure mode. Since the analysis is carried out to improve the system or to show that the system is safe or reliable enough, the table also always needs remarks and future actions. Severity ranking is needed to ease comparison between failure modes, and it therefore helps to rank the improvement actions. When the analysis includes criticality ranking it is called Failure Mode, Effects, and Criticality Analysis (FMECA). The criticality and probability factors can be general categories, such as impossible, improbable, occasional, probable and frequent, or exact failure probability values can be used. In many cases exact values are not available, because they are difficult to obtain or estimate; the circumstances greatly affect the probability of a failure.
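The ranking scheme just described can be sketched as a small data structure. This is a minimal sketch: the scale names, the severity/probability codes (e.g. "3C") and the `FmecaRow` fields are assumptions modelled on the coffee mill example sheet, not definitions taken from IEC 812.

```python
from dataclasses import dataclass

# Hypothetical category scales, modelled on the example FMECA sheet;
# IEC 812 does not fix these names or codes.
SEVERITY = {1: "negligible", 2: "marginal", 3: "critical", 4: "catastrophic"}
PROBABILITY = {"A": "frequent", "B": "probable", "C": "occasional",
               "D": "improbable", "E": "impossible"}

@dataclass
class FmecaRow:
    item: str
    failure_mode: str
    failure_cause: str
    failure_effect: str
    detection: str
    severity: int     # 1..4, higher is worse
    probability: str  # "A".."E", earlier letters are more likely

    @property
    def criticality(self) -> str:
        """Combined code such as '3C', as used in the example sheet."""
        return f"{self.severity}{self.probability}"

row = FmecaRow("Switch", "Short circuit", "insulation failure",
               "Coffee mill cannot be stopped", "mill does not stop", 3, "C")
```

One row then carries both the qualitative description and the compact criticality code, which is what makes later ranking of improvement actions straightforward.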

Table 6 shows an example of an FMECA sheet.

Table 6. Example sheet of an FMECA table (see Figure 2).

Safety Engineering FMECA
System: Coffee mill    Subsystem:    Page:    Date:    Compiled by:    Approved by:

| Item and function | Failure mode | Failure cause | Failure effects | Detection method | Probability & severity | Remarks |
| Switch | Short circuit | Foreign object, animal or liquid; insulation failure (moisture, dirt, ageing); bad or loose connection (vibration, temperature changes); overheating (lack of cooling) | a) The coffee mill cannot be stopped by actuating the switch; someone may cut a finger. b) When the plug is inserted, the coffee mill starts up although the switch is in the off position; someone may cut a finger. | The coffee mill does not stop; the switch may become dark. | a) 3C; b) 4C | The coffee mill can be stopped by unplugging it. |
| Switch | Open circuit | The switch mechanism fails to operate; the mechanism jams or breaks | The coffee mill cannot start up. | - | 1D | - |

 

4.1.2      FTA

Fault Tree Analysis (FTA) is a deductive technique that focuses on one particular accident or top event at a time and provides a method for determining the causes of that event. The purpose of the method is to identify combinations of component failures and human errors that can result in the top event. The fault tree is a graphic model that displays these combinations of component failures and other events. FTA can begin once the top events of interest have been determined, which may require prior use of Preliminary Hazard Analysis (PHA) or some other analysis method.

The advantages of FTA are typically:

  • It can reveal single point failures, common cause failures, and multiple failure sequences leading to a common consequence.
  • It can reveal when a safe state becomes unsafe.
  • The method is well known and standardised.
  • The method is suitable for analysing the entire system, including hardware, software, any other technologies, events and human actions.
  • The method provides a clear linkage between qualitative analysis and the probabilistic assessment.
  • It shows clearly the reasons for a hazardous event.

The disadvantages of FTA are typically:

  • It may be difficult to identify all hazards, failures and events of a large system.
  • The method is not practical on systems with a large number of safety critical failures.
  • It is difficult to introduce timing information into fault trees.
  • The method can become large and complex [Ippolito & Wallace 1995].
  • The method gives a static view of system behaviour.
  • The method typically assumes independence of events, although dependencies can be present; this affects the probability calculations. Dependencies also increase the work.
  • The analysis rarely reveals hazards that its participants do not already know about.
  • Different analysts typically end up with different representations, which can be difficult to compare.

Quite often, probability calculations are included in the FTA. FTA can be performed with special computer programs, which easily provide proper documentation. There are also programs that can switch between methods: the analysis needs to be entered only once, and the program can show the information in either FTA or FMEA form. Figure 2 shows an example of one hazardous event in FTA format. The figure on the right illustrates the system (Hammer 1980, IEC 1025 1990).

Figure 2. Example showing, as a fault tree analysis sheet, how some basic events or failures can cause one hazardous top event. The analysed coffee mill is shown in the top right corner [Hammer 1980].
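The probability calculations mentioned above can be sketched with two gate functions. This is a minimal sketch that assumes independent basic events, which is precisely the assumption that the dependencies listed among the disadvantages would violate; the top-event structure and probability values are illustrative, not taken from the coffee mill example.

```python
# Minimal sketch of the probability calculations often attached to a
# fault tree, assuming independent basic events.
def and_gate(probs):
    """All inputs must fail: P = product of the input probabilities."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def or_gate(probs):
    """Any input failing triggers the gate: P = 1 - prod(1 - q)."""
    p = 1.0
    for q in probs:
        p *= 1.0 - q
    return 1.0 - p

# Hypothetical top event "(A AND B) OR C" with illustrative values.
p_top = or_gate([and_gate([1e-3, 1e-3]), 1e-6])
```

Evaluating the tree bottom-up like this is what links the qualitative fault tree to the probabilistic assessment mentioned among the advantages of FTA.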

4.2    Illustrating the Results of a Safety Analysis

4.2.1      The Need to Clearly Show the Results of the FMEA

Complex components are increasingly used in circuits, and even one complex component can make a system very complicated. Complex components allow functions that are difficult to implement with traditional electronics, and they also make communication easier. Complex programmable components make it possible to construct large and complicated systems, which are also difficult to analyse.

When the system is large and the analysis is made at the component level, the analysis requires a lot of work and produces a lot of paper. Control systems with complex components are typically large, so the FMEA project is also large. A large analysis is difficult to verify and to take advantage of: the critical events can easily be lost in the huge amount of information, and it is also difficult to find the essential improvement proposals. Therefore a method is required which has the advantages of FMEA but does not come with the heavy paperwork.

The method needs to be simple, since people tend to avoid complicated new methods. Something familiar is also needed, so that the results are easy to understand. Quite often, when performing an FMEA, the analyst marks up the circuit diagram to help understand its functions and to confirm that all parts of the diagram have been considered. This kind of markup can also be useful for illustrating the results of the FMEA. However, since the main purpose of such markings is usually to help the analyst, good discipline is needed to make them readable for other people as well.

VTT has studied, in parallel with FMEA, some graphical techniques that can express the key results of the FMEA more effectively than text and tables. The analysis of the power distribution system of a large facility is used here as an example. A power failure could cause severe damage on some occasions. Critical failures were found in the FMEA, but since the FMEA was quite large (over 200 pages), the key information was lost in the tables. Therefore a technique was needed to highlight the essential results of the analysis.

The analysis was carried out at the system level, and only some parts were analysed at the component level. Three different techniques were used to illustrate the FMEA results. First, a fault tree analysis format was used to illustrate some top events; next, flux diagrams of the energy flows were used to illustrate the critical paths; finally, a circuit diagram with severity ranking numbers and colours was used to point out the most critical parts of the system. The probability factor was not shown in the figures, but it would not be difficult to add. The techniques were compared only in this single example case, but some general results can also be adapted to other systems.

4.2.2      Examples for Illustrating FMEA Results

FTA for illustrating FMEA results

Fault tree analysis (FTA) was carried out only for some top events, and its main purpose in this case was to point out the critical failures discovered in the FMEA (Figure 3). Only some top events were studied, since the overall number of such events was large. Several facilities were involved, each with several top events, which meant that many figures were needed. Depending on the operating mode, specific failures caused different top events, and since some single failures appeared in several figures, the amount of information increased. As a result, the amount of information became so large that it was difficult to find the essential information in the FTA figures.

The result was that FTA was not, in this case, a good method for illustrating the FMEA results. FTA is a good technique when the number of top events is small and there are no dependencies between the top events. The advantage of FTA is that it is a clear, well-known and well-documented technique. Software tools are also available for drawing the fault trees; a tool that can convert between FMEA tables and FTA trees is especially useful.

Figure 3. FTA format figure for one top event.

Flux diagrams for illustrating FMEA results

Flux diagrams were drawn to illustrate the criticality of the energy flows. An energy flow meant that the power was switched on and the critical function was able to proceed. The diagram was improvised to illustrate the different failure criticality properties of the energy flows; different arrows indicated different criticality levels. The same energy flow or facility did not always have the same criticality level: the criticality depended on the system operation mode (processing activities with different facilities). The figures were therefore able to point out the critical failures during a certain operation mode. One operation mode period lasted anywhere from a few minutes to several weeks. It was important to know the risks during a certain operation mode, so several diagrams were needed to illustrate the criticality of a specific energy flow or facility. The technique was new, and therefore each marking needed explanation. The meanings of the arrows were not obvious, and although only about six different arrows were used, reading the results required some experience. Figure 4 aims to show which functions are needed during a certain operation and the severity of the failure of each energy flow path.

Figure 4. The flux diagram shows which functions are needed during a certain operation and the criticality of each energy flow path. NA means not applicable: that function is not needed in the illustrated operation.

Circuit diagrams with a ranking system to illustrate FMEA results

The starting point in drawing the circuit diagram with a ranking system was the main circuit diagram of the system. The diagram was redrawn into a file to make quick changes easier. Different colours were then used to indicate the criticality of each circuit or piece of equipment. There were five severity levels, but only the three highest were assigned distinct colours, as the lowest levels were considered non-significant. For some parts of the system a number was also used, indicating the severity of a failure in the power supply. The FMEA consisted of over 200 pages, but the key results could be put into one colourful diagram. The people in the company were familiar with the diagram, since it resembled the original system-level circuit diagram. In this case the "circuit diagram with a ranking system" technique was quick to use and effectively illustrated the key results of the FMEA.

Although the illustration technique was used here for a large system, it can also be useful for smaller systems, especially during the design process. An expert can quickly colour the circuit diagram and add reference numbers (referring to the text) and criticality numbers to it. The colours and numbers can be drawn with pencils, but changes may then be messy to accomplish, so graphic files can be more useful.

Figure 5 shows an example of this simple method. In the figure, a high ranking number means high consequence severity and * means that the severity is low (i.e. the consequences of failure are insignificant). Each number is related to a certain operational unit, and beside the number is the name of the unit. If space is short, reference numbers can be used. The maximum severity of the consequences related to failure of switches or cables is expressed using different colours.

Figure 5. An example of the circuit diagram representation.
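The colour coding described above amounts to a small mapping from severity level to marking. This is a minimal sketch under stated assumptions: the colour names and level numbers are illustrative choices, since the report does not say which colours were used.

```python
# Sketch of the colour coding for the circuit diagram: five severity
# levels, of which only the three highest get a distinct colour.
# Colour names and level numbers are illustrative assumptions.
COLOUR_FOR_SEVERITY = {5: "red", 4: "orange", 3: "yellow"}

def marking(severity: int) -> str:
    """Colour for the three highest levels; '*' marks insignificant
    consequences, as in Figure 5."""
    return COLOUR_FOR_SEVERITY.get(severity, "*")
```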

The advantages of the technique are:

  • the technique is simple and easy to use
  • it is easy to improvise new, adequate markings
  • the technique is quick to use, especially when the circuit diagrams are easily available
  • the technique shows the risks of the analysed system in a very compact form
  • the technique displays the most critical risks well
  • electricians find the technique familiar, since circuit diagrams are used.

The disadvantages of the diagram are:

  • since the markings are improvised, they must be explained each time they are used
  • people have to learn the technique before it is useful
  • the technique is not standardised
  • information is lost when it is shrunk into one figure
  • the technique can effectively show only single hardware failures
  • it is difficult to express events other than failures.

4.2.3      Conclusions for Methods of Illustration

FMEA is a common and reliable but laborious method for analysing control systems. In large systems the analysis work can be made more effective if some other method, or FMEA at a very high level, is used to locate the critical parts of the system. The reasonable effort for analysing a system also depends on the safety demands: if the safety demands are high, more effort can be allocated to ensuring the performance of the safety-critical functions. Another problem arises when the amount of information in the FMEA is so huge that the essential results are lost. Figure 6 shows how the bottom-up method can be supported by other techniques.

Figure 6. Common methods of supporting the use of bottom-up analysis. Detailed bottom-up analysis is carried out only on some parts of the system.

It is not always clear how to point out the most essential results of the FMEA. Usually a critical items list and an improvement list are made to present the results of the analysis, but often a graphical method can show the results better than words. Graphical techniques are very powerful in pointing out certain results, and different graphical techniques provide different points of view on the results, so the analyst must decide which technique best illustrates the essential results. FTA shows well which events or failures may cause a top event. Flux diagrams can effectively show critical paths. Circuit diagrams with ranking information show which parts of the system are the most critical.

No single method can best illustrate the most critical failures or events in every case; the analyst therefore has to decide on a case-by-case basis which technique to use. Important factors to consider are:

  • type of results; should the results consider failures, events, human errors, and software errors?
  • extent of the analysis and resources
  • quantity of the results to be illustrated
  • audience
  • type of figures the audience is familiar with.

In some cases an FMEA is carried out to find a major critical failure, a "show stopper", which ends the analysis, because the system then has to be redesigned and the analysis starts again from the beginning. If no critical failure is found, all the documentation is important because it provides evidence of safety. If a critical failure is found, it must be well documented.

5  CONCLUSIONS

There is no single ideal approach for analysing complex systems. However, some guidelines for using top-down analysis such as FTA (Fault Tree Analysis) or bottom-up analysis such as FMEA (Failure Mode and Effect Analysis) can be stated. FMEA and FTA are the most common methods for analysing failures of control systems. One reason for selecting a certain method is common practice: if a person is familiar with a certain method and can use all the required tools for it, the analysis can be performed more effectively than with an unfamiliar method.

Both FTA and FMEA can be used at the system, module or component level. The difference between the methods appears when certain types of failures are sought. In FTA, good system specialists are essential, and the results depend very much on what they can find; good specialists can also point out the essential failures and so reduce the resources required for the analysis. In FMEA it is slightly easier to replace experience with hard work, since the system is analysed systematically. FMEA is usually more laborious than FTA, but it can reveal new random failures. One way to ease the FMEA is to document in the analysis table only the most critical consequences.

In comparison with simpler components, complex components introduce new aspects to be considered. Complex components are so complex that it is difficult to analyse them thoroughly, and it is very difficult to predict their failure modes. In addition, the programs related to programmable components may contain critical errors. All of this causes uncertainty in the analysis of complex components. A single complex component alone cannot control a safety function safely enough; some redundancy, diversity and/or monitoring is needed. This means that the architecture of the control system is important, and it can make the risks caused by complex components negligible.

REFERENCES

CiA/DS301. 1999. CANopen Application Layer and Communication profile V4.0. Nürnberg: CAN in Automation International Users and Manufacturers Group e.V. 111 pages.

DeviceNet specification, release 2. DeviceNet Physical Layer and Media (Layer 1 - Physical Layer), 1997, Open DeviceNet Vendor Association, Inc.

DIN 9684 Teil 3. 1993. Landmaschinen und Traktoren - Schnittstellen zur Signalübertragung - Initialisierung, Identifier (draft). Berlin: DIN Deutsches Institut für Normung. 28 pages.

EN 954-1. 1996. Safety of machinery – Safety related parts of control systems. Part 1: General principles for design.

Fredriksson, L.-B. 1995. CAN Kingdom. rev. 3.0. Kinnahult: Kvaser AB. 104 pages.

Gibbs, W. 1994. Software’s Chronic Crisis. Scientific American, September 1994. http://www.di.ufpe.br/~java/graduacao/referencias/SciAmSept1994.html

Gregeleit, M. & Streich, M. 1994 Implementing a distributed high-resolution real-time clock using CAN-bus. 1. International CAN Conference Proceedings. Erlangen: CAN in Automation e.V. pp. 9-2 ... 9-7.

Hammer, W. 1980. Product Safety Management and Engineering. Prentice-Hall, Inc., Englewood Cliffs, N.J. 324 p.

Hérard, J., Jacobson, J., Kivipuro, M., Malm, T., Ståhlhane, T. & Tiusanen, R. Guideline for the validation of Functional Safety according to IEC 61508. Proposal for Nordtest Method. 48 p.

Ippolito, L. & Wallace, D. 1995. A study on Hazard Analysis in High Integrity Software Standards and Guidelines. NIST, Gaithersburg, U.S. Department of commerce. http://hissa.ncsl.nist.gov/HHRFdata/Artifacts/ITLdoc/5589/hazard.html#36..., 45 p.

IEC 812. 1985. Analysis techniques for system reliability - Procedure for failure mode and effect analysis (FMEA). International Electrotechnical Commission. 41 p.

IEC 1025. 1990. Fault tree analysis (FTA). International Electrotechnical Commission. 40 p.

IEC 61508-1:1998. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 1: General requirements.

IEC 61508-3:1998. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 3: Software requirements.

IEC 61508-4:1998. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 4: Definitions and abbreviations.

IEC 61508-5:1998. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 5: Guidelines on the application of Part 1.

IEC 65A/254/CDV. Draft IEC 61508-2. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems.

IEC 65A/255/CDV. Draft IEC 61508-6. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 6: Guidelines on the application of Parts 2 and 3.

IEC 65A/256/CDV. Draft IEC 61508-7. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 7: Overview of techniques and measures.

ISO 11898. 1993. Road vehicles - Interchange of digital information - Controller area network (CAN) for high-speed communication. International Organization for Standardization (ISO). 58 pages.

Kopetz, H. & Damm, A. & Koza, C. & Mulazzani, M. & Schwabl, W. & Senft, C. & Zainlinger, R. 1989. Distributed Fault-Tolerant Real-Time Systems: The Mars Approach. IEEE Micro, February 1989. Pp. 25-40.

Kopetz, H. 1994. Fault Management in the Time Triggered Protocol (TTP). Warrendale: Society of Automotive Engineers (SAE). 7 pages. (SAE paper 940140).

Kuntz, W. & Mores, R. & Morse, M.J. 1993. CAN operated on an optical double ring at improved fault-tolerance. European Transactions on Telecommunications and Related Technologies. Vol. 4, issue 4, pp. 465 - 470. ISSN 1120-3862.

Lawson, H.W. & Lindgren, M. & Strömberg, M. & Lundqvist, T. & Lundbäck, K.-L. & Johansson, L.-Å. & Torin, J. & Gunningberg, P. & Hansson, H. 1992. Guidelines for Basement: A Real-Time Architecture for Automotive Systems. Göteborg: Mecel, Inc. and Lidingö: Lawson Publishing and Consulting, Inc. 42 pages.

Lehtelä, M. 1991. Failure Mode and Effect Analysis of Electronic Circuits. Tampere University of Technology, Electrical Engineering Department. Licentiate thesis. 77 p. + app. 12 p.

M3S Specification 2.0. 1995. An intelligent integrated and modular system for the rehabilitation environment. M3S draft standard, ISO/TC-173/SC-1/WG-7.

Pehrs, J-U. 1992. CAN bus failure management using the P8xC592 microcontroller. Philips Application Note, Report Nr: HKI/AN 91 020. Hamburg: Product Concept & Application Laboratory, 1992. 16 pages.

PrEN 954-2. 1999. Safety of machinery – Safety related parts of control systems. Part 2: Validation.

Tanaka, M. & Hashimoto, K. & Himino, Y. & Suzuki, A. 1991. High-reliability physical layer for in-vehicle high-speed local area network. Warrendale: Society of Automotive Engineers (SAE). 9 pages. (SAE paper 910464).

Tindell, K. & Burns, A. 1994. Guaranteed Message Latencies for Distributed Safety-Critical Hard Real-Time Control Networks. Technical report YCS229. York: University of York. 17 pages.

Tindell, K. & Hansson, H. 1995. Babbling Idiots, the Dual-Priority Protocol, and Smart CAN Controllers. 2nd International CAN Conference Proceedings. Pp. 7-22 ... 7-28.


APPENDIX A:

CAN bus FMEA

The CAN bus was originally designed for road vehicles, but it is increasingly used in machine automation. Some so-called "safety buses" are also based on CAN.

The CAN modules are analysed using FMEA at the I/O level; the individual components inside the modules are not analysed, since the results would depend on the component types, and the components are developing rapidly. The FMEA is carried out according to the principles of IEC 812.

During normal operation, several bus failures may occur that could influence the bus operation. These failures and the resulting behaviour of the network are illustrated in Figure A1 and described in Table A1. The possible open-circuit and short-circuit failures are given in the CAN standard [ISO 11898]. Failure 16 is not given exactly in the ISO standard, but it differs from failure 15 if the shield is grounded at a single point. These failure modes should be taken into account in a CAN bus FMEA.

Figure A1. Possible failure modes of bus line according to ISO 11898. Failure modes 10-15 are not numbered in ISO 11898[4] and failure mode 16 is not given in ISO 11898. Failure modes 10-12 are interpreted as a single failure mode in ISO 11898.

Table A1. Bus failure detection according to ISO 11898.

| Description of bus failure | Behaviour of network 1) | Quality of specification 2) |
| One node becomes disconnected from the bus (10, 11, 12) | The remaining nodes continue communicating. | Recommended |
| One node loses power (13) | The remaining nodes continue communicating with reduced signal-to-noise ratio. | Recommended |
| One node loses ground (14) | The remaining nodes continue communicating with reduced signal-to-noise ratio. | Recommended |
| The connection to the shield breaks off in any node (15) | All nodes continue communicating. | Recommended |
| The connection to the shield breaks off and all nodes lose the shield connection (16) | All nodes continue communicating, but disturbances are more probable. | ----- (no reference to ISO 11898) |
| CAN_H interrupted (1) | All nodes continue communicating with reduced signal-to-noise ratio. | Recommended |
| CAN_L interrupted (2) | - " - | Recommended |
| CAN_H shorted to battery voltage (3) | - " - | Recommended |
| CAN_L shorted to ground (4) | - " - | Recommended |
| CAN_H shorted to ground (5) | - " - | Recommended |
| CAN_L shorted to battery voltage (6) | - " - | Recommended |
| CAN_L wire shorted to CAN_H wire (7) | - " - | Optional |
| CAN_H and CAN_L wires interrupted at the same location (8) | No operation within the complete system. Nodes within the resulting subsystem that contains the termination network continue communicating. | Recommended |
| Loss of one connection to the termination network (9) | All nodes continue communicating with reduced signal-to-noise ratio. | Recommended |

1) The example in Figure A1 excludes all fault-tolerant modes.

2) The quality of specification is defined as follows.

Recommended: if the respective failure occurs, the network behaviour should be as described in the second column of the table. Excluding this specified functionality is the manufacturer's choice.

Optional: if the respective failure occurs, the network behaviour may be as described in the second column of the table. Including this optional functionality is the manufacturer's choice.
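Table A1 lends itself to a simple lookup keyed by the failure-mode numbers of Figure A1. In this sketch the short behaviour labels are our own summary ("degraded" stands for "communication continues with reduced signal-to-noise ratio"); they are not wording from ISO 11898, and only modes 1-9 of the table are included.

```python
# Table A1 as a lookup keyed by the failure-mode numbers of Figure A1.
# The short behaviour labels summarise the table; "degraded" means
# "communication continues with reduced signal-to-noise ratio".
CAN_BUS_FAILURES = {
    1: ("CAN_H interrupted", "degraded"),
    2: ("CAN_L interrupted", "degraded"),
    3: ("CAN_H shorted to battery voltage", "degraded"),
    4: ("CAN_L shorted to ground", "degraded"),
    5: ("CAN_H shorted to ground", "degraded"),
    6: ("CAN_L shorted to battery voltage", "degraded"),
    7: ("CAN_L shorted to CAN_H", "degraded"),  # detection is optional
    8: ("CAN_H and CAN_L interrupted at the same location",
        "no operation of the complete system"),
    9: ("loss of one connection to termination network", "degraded"),
}

def behaviour(mode: int) -> str:
    """Summarised network behaviour for a given failure-mode number."""
    return CAN_BUS_FAILURES[mode][1]
```

A structure like this makes it easy to check an FMEA worksheet row against the expected network behaviour for each failure mode.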

 


[1] At least the total length of the wires is reduced, but the number of items (number of wires, connectors or joints, etc.) is not reduced in all cases.

[2] It should, however, be noted that not all diagnostic facilities increase the dependability of the system compared to a less intelligent system; some diagnostic facilities must be implemented only to maintain the same level of dependability as that of the previous generation 'old-fashioned' control systems.

[3] =fractional decrease in the probability of dangerous hardware failure resulting from operation of the automatic diagnostic tests [IEC 61508-4:1998]

[4] Note also that in some editions of the ISO 11898 standard, the numbering of the failure modes is not consistent: the illustrative figure and the detailed table of the ISO standard do not match with the failure mode numbers. The numbering given in Figure A1 follows the numbering of the table in the ISO 11898.

 


Basic safety principles - Well-tried safety principles - Well-tried components


In addition to the functional requirements defined in functional safety standards, the Machinery Directive incorporates the following requirements, which are design rules and sometimes technological manufacturing choices.

ISO 13849-2:2012, Safety of machinery - Safety-related parts of control systems - Part 2: Validation, defines the basic safety principles and well-tried safety principles for different kinds of technologies in its annexes:

Annex A (informative) Validation tools for mechanical systems

Annex B (informative) Validation tools for pneumatic systems
Annex C (informative) Validation tools for hydraulic systems
Annex D (informative) Validation tools for electrical systems
Annex E (informative) Example of validation of fault behaviour and diagnostic means


For example, for electrical systems, Annex D defines the following tables:

Table D.1 — Basic safety principles. These principles define safety at a high level, such as:

  • Use of the de-energization principle, whose objective is to suppress the residual energy in order to suppress the dangerous movement.
  • Protection against unexpected start-up
  • ...

Table D.2 — Well-tried safety principles. These principles are more closely linked to technology and take into account the knowledge of safety devices regarding their characteristics:

  • "Positively mechanically linked contacts": the remark gives information on the use of positively mechanically linked contacts for, e.g., monitoring functions in Category 2, 3 and 4 systems (complying with EN 50205, IEC 60947-4-1 Annex F, IEC 60947-5-1 Annex L).
  • Positive mode actuation: direct action is transmitted by the shape (...) with no elastic elements, (...) complying with standards such as ISO 14119 and ISO 12100.
  • ...

Table D.3 — Well-tried components. This table lists well-tried components for which some failures do not have to be taken into account if they comply with a rule or a specific requirement of a standard, such as:

  • A switch with positive mode actuation (direct opening action) complying with the requirements of IEC 60947-5-1:2003, Annex K
  • ...

The following tables give, for all classic components, rules under which some failures need not be taken into account. For example, for a PCB (Table D.5), a short circuit between two adjacent tracks/pads can be excluded if the design of the PCB complies with the requirements of IEC 60664-1 for distances greater than (...) with pollution degree (...)

Table D.4 — Faults and fault exclusions — Conductors/cables.

Table D.5 — Faults and fault exclusions — Printed circuit boards/assemblies

Table D.6 — (...)

Table D.21 — Faults and fault exclusions — Electronic components — Programmable and/or complex integrated circuits


Similar tables exist for:

Annex A (informative) Validation tools for mechanical systems
Annex B (informative) Validation tools for pneumatic systems
Annex C (informative) Validation tools for hydraulic systems
Annex D (informative) Validation tools for electrical systems
Annex E (informative) Example of validation of fault behaviour and diagnostic means

 

English

Functional safety - error detection codes - CRC and Hamming codes

1 CRC (Cyclic Redundancy Check)

The CRC is an encoding method that consists in grouping the information to be transmitted into words of "n-k" bits and associating each of them with a word of "n" bits. The redundancy consists of the "k" added bits. The number of valid words of "n" bits is 2^(n-k); the other words correspond to words affected by errors.

The objective of these "verification by key" methods is to compress the information. From a finite information sequence of length "n-k", a compression mechanism (CRC - Cyclic Redundancy Check) characterizes this sequence with a piece of condensed information: the key. The key does not correct errors; it only detects differences between sequences.

 

 
 

1.1. Error detection mechanism

Consider the following data:

  • i = the number of information sequences having the same key.
  • n-k = size of the information sequence (fixed or variable).
  • k = key size (resulting from the polynomial division).
  • n = size of the message.

The verification mechanism with key consists in compressing a message of "n-k" symbols into a finite number of bits (k).

Each key, of value "Si", is representative of a number "mi" of information sequences of fixed or variable size. The key "Si" is a word of "k" bits and can take 2^k different values.

 
 

For an information sequence of constant size "n", there are 2^n possible forms, and for each of these forms there is only one possible key, whose value lies between "0" and "2^k - 1". The following is obtained:

The probability that an error in a sequence "mi" (represented by the key "Si") goes undetected is the probability of obtaining the same value of the key "Si" from an erroneous sequence.

The probability of error detection associated with a key "Si" is therefore defined as follows:

Pdet_i = 1 - (mi - 1)/(2^n - 1)

where:

mi - 1 represents the number of sequences having the same key as the exact sequence,

2^n - 1 corresponds to the total number of possible sequences (2^n) minus the exact sequence (1).

In order to calculate the average detection power, we must sum over all the keys. There are mi information sequences whose probability of error detection is Pdet_i, so the average value over all the 2^n possible sequences is:

Pdet = (1/2^n) · Σi mi · Pdet_i

Since Σi mi = 2^n, this gives:

Pdet = 1 - [Σi mi·(mi - 1)] / [2^n · (2^n - 1)]

 

In the case of network transmissions, "n" is generally not constant (frames vary in length). The residual errors are therefore composed of all the possible combinations of errors over the different frame lengths.

The number "mi" of information sequences associated with a key can be written as follows:

mi = 2^(n-k) + xi, with Σi xi = 0.

Substituting into the expression of Pdet, it follows that the probability of error detection of a compression mechanism is maximum when mi is constant regardless of Si, i.e. when mi = 2^(n-k) for every key. We then obtain:

Pdet max = 1 - (2^(n-k) - 1)/(2^n - 1)

When each key corresponds to a constant number of information sequences, the probability of error detection is therefore optimal.

Otherwise, if the number of information sequences differs from one key to another, the probability of error detection will be lower than Pdet max and must be estimated in each case.
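As an illustrative check (a sketch, not part of the standard text), the optimal case can be verified by brute force for a small assumed key: a CRC with the degree-4 generator g(x) = x^4 + x + 1 over 8-bit messages, where every key value turns out to have exactly 2^(n-k) = 16 preimages:

```python
# Brute-force sketch: for a CRC key of degree k = 4 over messages of
# n = 8 bits, every key value has exactly 2^(n-k) = 16 preimages, so the
# optimal detection probability Pdet_max is reached.
from collections import Counter

def crc_remainder(msg: int, msg_bits: int, gen: int, k: int) -> int:
    """Remainder of msg(x) divided by gen(x) over GF(2), deg(gen) = k."""
    for shift in range(msg_bits - 1, k - 1, -1):
        if msg & (1 << shift):
            msg ^= gen << (shift - k)
    return msg

n, k = 8, 4
g = 0b10011  # assumed example generator g(x) = x^4 + x + 1

counts = Counter(crc_remainder(m, n, g, k) for m in range(2 ** n))
assert len(counts) == 2 ** k                            # 16 distinct key values
assert all(c == 2 ** (n - k) for c in counts.values())  # 16 messages per key

p_det_max = 1 - (2 ** (n - k) - 1) / (2 ** n - 1)
print(f"Pdet_max = {p_det_max:.4f}")  # 0.9412
```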

 

1.2. Properties and choice of a generator polynomial

Generator polynomials for keys are generally classified into the following three types, regardless of their degree:

· Irreducible polynomials

  • A polynomial of degree k is irreducible if it is not divisible by any polynomial of degree less than k, other than 1.

For instance,

a(x) = x^4 + x^3 + x^2 + x + 1, of degree 4, and

b(x) = x^8 + x^4 + x^3 + x + 1, of degree 8,

are irreducible polynomials.

· Primitive polynomials

  • An irreducible polynomial of degree k is primitive if it divides the polynomial x^n - 1 with n = 2^k - 1, without dividing any polynomial x^m - 1 where m < n. Such a polynomial may also divide certain polynomials x^p - 1 where p > n.

Irreducible or primitive polynomials are obtained by a complex mathematical algorithm that is difficult to carry out manually when the degree is high. Note that primitive polynomials exist for every desired degree.

· Arbitrary polynomials (neither irreducible nor primitive)

The aim of a generator polynomial key is to detect as many errors as possible. To do so, it must comply with the following rules:

1 - Let s(x) be the key corresponding to the input sequence m(x), obtained using a primitive and irreducible generator polynomial g(x). If we call e(x) an error sequence, i.e. a sequence that has "1" in the erroneous locations and "0" elsewhere, then the original message m(x) and the erroneous message m(x) ⊕ e(x) have the same key s(x) if and only if e(x) is a multiple of g(x) modulo 2.

Therefore, the smallest undetectable erroneous sequence is g(x) itself.

2 - If the generator polynomial is of degree k, all errors affecting an input sequence of length n ≤ k are detected. It is then possible to determine the number of undetected erroneous sequences for n ≥ k, since they are all multiples of g(x).

Input sequences of length n correspond to polynomials of degree n-1. There are thus 2^n - 1 possible non-zero sequences, since each polynomial has "n" coefficients and each coefficient may take the value "0" or "1".

It can be shown by induction that there are N = 2^(n-k) - 1 undetectable erroneous sequences: for m = n we obtain the erroneous base sequences, then the erroneous sequences built by combining the base sequences two by two, etc., and finally one erroneous sequence built by combining all the base sequences, hence N = 2^(n-k) - 1.
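These counting rules can be checked by brute force on a small assumed example (g(x) = x^4 + x + 1, sequences of length n = 8, not values prescribed by the text): the undetected error patterns are exactly the non-zero multiples of g(x), and there are 2^(n-k) - 1 of them:

```python
# Brute-force sketch of rules 1 and 2: with the assumed example generator
# g(x) = x^4 + x + 1 and sequences of length n = 8, an error pattern e(x)
# is undetected exactly when it is a multiple of g(x), and there are
# N = 2^(n-k) - 1 = 15 such non-zero patterns.

def crc_remainder(msg: int, msg_bits: int, gen: int, k: int) -> int:
    """Remainder of msg(x) divided by gen(x) over GF(2), deg(gen) = k."""
    for shift in range(msg_bits - 1, k - 1, -1):
        if msg & (1 << shift):
            msg ^= gen << (shift - k)
    return msg

n, k = 8, 4
g = 0b10011  # g(x) = x^4 + x + 1

undetected = [e for e in range(1, 2 ** n) if crc_remainder(e, n, g, k) == 0]
assert len(undetected) == 2 ** (n - k) - 1   # N = 15 undetectable patterns
assert min(undetected) == g                  # the smallest one is g(x) itself
```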

3 - For an input sequence of length "n" (n > k), the probability that a key generator of degree k fails to detect an error is equal to [2^(n-k) - 1]/[2^n - 1], assuming that all erroneous sequences have the same probability of occurrence. When n is very large, this ratio tends towards 2^-k. This property is the most important one, because it precisely characterizes the detection power P of the key check, which depends almost exclusively on the degree of the chosen generator polynomial.

4 - Any key generator built from a generator polynomial with at least two non-zero coefficients detects all single errors.

5 - Any key generator built from a generator polynomial containing (x ⊕ 1) as a factor detects all odd errors.

6 - Any key generator built from a primitive generator polynomial of degree k detects all single and double errors if the input sequence is of length at most 2^k - 1.

7 - Any key generator built from a generator polynomial of the form g(x) = (x ⊕ 1)·f(x), where f(x) is a primitive polynomial of degree k, detects all odd and double errors, and therefore in particular all single, double and triple errors, if the input sequence is of length at most 2^k - 1.

8 - Any key generator built from a generator polynomial of degree k detects all error bursts of length less than or equal to k in a message of length n (n > k).

9 - If the chosen generator polynomial is irreducible of degree k, it detects repetitive errors with a probability close to 1 - 2^-k. If it is not irreducible, it detects this type of error only with a probability of 1 - 2^(-k/b), where b is the highest power appearing in its decomposition into irreducible factors.

These properties show that the generator polynomial plays a crucial role, through its type (primitive, irreducible, arbitrary) and its degree. Even with a polynomial of low degree, the key check provides a high fault-detection power, whatever the assumed fault hypotheses.

Each error-detecting code has an associated characteristic distance: the minimum HAMMING distance.

A code of minimum distance dmin detects all configurations of dmin - 1 errors.


1.3. Implementation of a CRC

Two methods can be used to implement the CRC:

·      polynomial division,

·      the exclusive-OR method.

1.3.1. Polynomial division

A sequence of digital information represents a message. Every message has an associated algebraic representation, i.e. a polynomial of degree n-1 if the message contains n information bits.

The message m = [ an-1 an-2 an-3 ... a2 a1 a0 ] is associated with a polynomial m(x):

m(x) = an-1·x^(n-1) ⊕ an-2·x^(n-2) ⊕ ... ⊕ a1·x ⊕ a0,

where x is a dummy variable and the coefficients ai are binary values. By convention, the coefficient an-1 is the first bit transmitted and corresponds to the highest degree of the polynomial. For example, the message n = (1,0,1,1,0,0,0,1) gives the polynomial n(x) = x^7 + x^5 + x^4 + 1.

The ⊕ operator (exclusive-OR) is used because only binary values are considered, each bit of the message being processed separately. The algebra of polynomials modulo 2 is defined by the following binary operations: addition, subtraction, multiplication and division. In modulo-2 arithmetic, addition and subtraction are identical.

Polynomial division is a division of m(x) by g(x) that consists in finding a quotient q(x) and a remainder r(x) of degree lower than that of g(x) such that:

m(x) = [q(x)·g(x)] ⊕ r(x)

or:   m(x) ⊕ [q(x)·g(x)] = r(x)

This second form reveals a mechanism of successive subtractions in which g(x) is shifted k positions to the left, i.e. multiplied by x^k, in order to reach the monomial of highest degree of m(x); q(x) then contains a monomial in x^k.

An information sequence corresponding to the polynomial m(x) is compressed by dividing m(x) by a generator polynomial g(x). The key is the result of this division: either the quotient or the remainder, depending on the implementation.

Let the generator polynomial be g(x) = x^k + b(k-1)·x^(k-1) + ... + b1·x + 1.
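A minimal software sketch of this division, reusing the example message (1,0,1,1,0,0,0,1) and assuming g(x) = x^4 + x + 1 purely for illustration:

```python
# Sketch of the CRC key computed by polynomial division over GF(2).
# Message bits are the polynomial coefficients, highest degree first.
# The generator g(x) = x^4 + x + 1 is an assumed example, not prescribed above.

def poly_divmod(msg_bits, gen_bits):
    """Return (quotient, remainder) of msg(x) / gen(x); bit lists, MSB first."""
    msg = list(msg_bits)
    k = len(gen_bits) - 1              # degree of g(x)
    quotient = []
    for i in range(len(msg) - k):
        quotient.append(msg[i])
        if msg[i]:                     # subtract (XOR) g(x), shifted left
            for j, gbit in enumerate(gen_bits):
                msg[i + j] ^= gbit
    return quotient, msg[-k:]          # remainder r(x) has degree < k

m = [1, 0, 1, 1, 0, 0, 0, 1]           # m(x) = x^7 + x^5 + x^4 + 1
g = [1, 0, 0, 1, 1]                    # g(x) = x^4 + x + 1
q, r = poly_divmod(m, g)
print(r)  # [1, 1, 1, 1], i.e. r(x) = x^3 + x^2 + x + 1
```

Depending on the implementation chosen in the text, either `q` (the quotient) or `r` (the remainder) is used as the key.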

 

1.1. Practical implementation of a CRC by the polynomial division method

The principle of dividing two polynomials can be implemented:

  • in hardware,
  • in software.

From a hardware point of view, shift registers and exclusive-OR gates are used, in either a serial or a parallel structure. The key is then the remainder of the division of m(x) by g(x), which is the final content of the shift register.

In the serial structure, the data arrive serially on the information line. In the parallel structure, the data arrive on several information lines.

Serial (see Figure 8) and parallel (see Figure 9) key generators built from the same primitive generator polynomial have the same efficiency.

In the serial structure presented below, the information enters through input E and then passes into the shift register.

           Figure: Serial key generator with g(x) = x^4 + x + 1

Consider the input sequence (1,1,0,1,0,0,0,1,1), i.e. the input polynomial E(x) = x^8 + x^7 + x^5 + x + 1.

The register states at each clock edge are described in the table: Register states of a serial key generator.

 

 

Table: Register states of a serial key generator

E(x) = [ q(x)·g(x) ] ⊕ r(x)

hence E(x)/g(x) = q(x) ⊕ [ r(x)/g(x) ]

The final content of the register is equal to the remainder of the division. In the same way, we can build a generator with parallel inputs.
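The behaviour of the serial register can be sketched in software; the loop below is a Horner-style model of the divider (each clock edge multiplies the register content by x, adds the incoming bit, and subtracts g(x) whenever a degree-4 term appears):

```python
# Horner-style software model of the serial key generator for
# g(x) = x^4 + x + 1: multiply the register by x, add the input bit,
# feed back (subtract g) when the x^4 coefficient is shifted out.

def serial_crc(bits, feedback=0b0011, width=4):
    """feedback = low-order part of g(x) (here x + 1); width = deg g(x)."""
    state = 0
    for b in bits:                            # highest-degree bit enters first
        msb = (state >> (width - 1)) & 1      # coefficient shifted out
        state = ((state << 1) | b) & ((1 << width) - 1)
        if msb:
            state ^= feedback                 # reduce modulo g(x)
    return state

E = [1, 1, 0, 1, 0, 0, 0, 1, 1]               # E(x) = x^8 + x^7 + x^5 + x + 1
r = serial_crc(E)
print(bin(r))  # 0b1011, i.e. r(x) = x^3 + x + 1
```

The final register content 1011 is the remainder of E(x) divided by g(x), as the text states.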

The serial information, denoted (an-1, an-2, an-3, an-4, ..., a1, a0), is grouped into blocks of 4, i.e. (an-1, an-2, an-3, an-4 / an-5, an-6, an-7, an-8 / ...), and presented in parallel on the inputs E0, E1, E2, E3. In a single clock edge, the result is established for these 4 values.

Let y0, y1, y2, y3 be the states of the registers at an arbitrary instant "t", and a, b, c, d the register values, which evolve according to the table: Register states of a parallel key generator.

Figure: Key generator with parallel inputs, with g(x) = x^4 + x + 1

Polynomial division requires more machine cycles, since as many operations (shift + exclusive-OR) are needed as there are bits in the message to be checked.

Table: Register states of a parallel key generator

We obtain the following 4 functions:

a = a' ⊕ y2 = an-1 ⊕ y3 ⊕ y2

b = b' ⊕ y1 = an-2 ⊕ y2 ⊕ y1

c = c' ⊕ y0 = an-3 ⊕ y3 ⊕ y1 ⊕ y0

d = d' = an-4 ⊕ y0 ⊕ y3

1.3.2. Method and practical implementation of a CRC using exclusive-OR

There is another method for determining the key. The key can be obtained using a shift register in which certain bits are fed back to the input through the exclusive-OR logic function (Figure: 16-bit shift register (key generator)). The key is then the final content of the register after all the information has passed through serially. The length of the register and the feedback taps depend on the polynomial P(x), which for common applications is generally of degree 8 or 16.

Let P(x) = x^16 + a15·x^15 + a14·x^14 + ... + a1·x + a0, with a15 ... a0 = "0" or "1".

Figure: 16-bit shift register (key generator)

The key obtained with this principle can also be computed in software with a suitable algorithm. The corresponding algorithm is obtained by observing the content of the register after 16 successive shifts.

If (y0 ... y15) is the initial state of the register at t = 0 and (a0 ... a15) are the 16 bits of the input message, then the final content of the register (x0 ... x15) after 16 shifts can be written out column by column.

Column 1 represents the whole input message; columns 2 to 7, outside the framed area, represent the initial content of the register shifted by 5 to 0 positions respectively.

It then only remains to perform the exclusive-ORs between the values resulting from the feedback (framed area). These are determined by a simple read at the address pointed to by the start address of the table, indexed by the value of the quintuple.

These values are read directly from a table of 2^5 = 32 elements in which, for each quintuple (x15, x14, x13, x12, x11), the corresponding values of x15, x15 ⊕ x14, x15 ⊕ x14 ⊕ x13, x15 ⊕ x14 ⊕ x13 ⊕ x12 and x15 ⊕ x14 ⊕ x13 ⊕ x12 ⊕ x11 are found.
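The table-lookup idea can be sketched generically with a byte-wise, table-driven CRC-16 (a common software generalization; the polynomial x^16 + x^12 + x^5 + 1 and the 256-entry table are illustrative assumptions, not the exact 32-entry layout of the figure above):

```python
# Generic sketch of the table-lookup idea: a byte-wise table-driven CRC-16.
# The polynomial x^16 + x^12 + x^5 + 1 (0x1021) and the 256-entry table are
# illustrative assumptions, not the 32-entry quintuple table of the figure.

POLY = 0x1021

def crc16_bitwise(data: bytes, crc: int = 0) -> int:
    """Reference bit-by-bit register: one shift + feedback per bit."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ POLY if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

# Precomputed table: the register effect of one whole input byte.
TABLE = [crc16_bitwise(bytes([b])) for b in range(256)]

def crc16_table(data: bytes, crc: int = 0) -> int:
    """Same key, but one table read per byte instead of 8 shift/XOR steps."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[((crc >> 8) ^ byte) & 0xFF]
    return crc

msg = b"123456789"
assert crc16_table(msg) == crc16_bitwise(msg)
print(hex(crc16_table(msg)))  # 0x31c3 (the CRC-16/XMODEM check value)
```

The trade-off is exactly the one described in the text: a larger lookup table buys fewer machine cycles per processed symbol.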

 


1.4. Error-correcting codes (HAMMING codes)

It is important not to confuse the Hamming distance with Hamming codes. Hamming codes are particular error-detecting and error-correcting codes.

1.4.1. Principle of Hamming codes

If a transfer of information is not correct, there are two ways to recover the data:

·      by requesting retransmission of the message,

·      by correcting the error.

It is not always possible to request a retransmission of the message. Correcting the transmission errors then becomes essential.

Hamming codes make it possible to detect and correct single errors.

The method consists in adding K control digits to a message of M digits, thus forming a set of (M + K) digits.

A word of M digits requires K control digits, but K digits can only define 2^K combinations, hence:

M + K + 1 ≤ 2^K

This gives, for different values of M, the following table:

M (bits)    K (control)
4           3
8           4
16          5
32          6
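The table can be reproduced by searching for the smallest K satisfying the inequality (a quick illustrative check):

```python
# Quick check of M + K + 1 <= 2^K: the smallest number of control digits K
# needed for M information digits.

def min_control_digits(m: int) -> int:
    k = 1
    while m + k + 1 > 2 ** k:
        k += 1
    return k

for m, k in [(4, 3), (8, 4), (16, 5), (32, 6)]:
    assert min_control_digits(m) == k      # matches the table above
```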

1.4.2. Using Hamming codes

To show how these digits are built, consider a message of 4 digits (M = 4); 3 control digits (K = 3) are then necessary.

The complete coded message then forms a 7-digit word written in the form:

X = [ a1, a2, a3, a4, a5, a6, a7 ]

The K control digits are placed at a1, a2 and a4, so that there is at least one of them in each of the groups (a1, a3, a5, a7), (a2, a3, a6, a7) and (a4, a5, a6, a7). The control digits are positioned so that each one checks only "information" digits and no other control digits.

The states of the control digits are given by the following relations:

a1 = a3 ⊕ a5 ⊕ a7

a2 = a3 ⊕ a6 ⊕ a7

a4 = a5 ⊕ a6 ⊕ a7

Hamming codes are error-detecting but also error-correcting codes. It is therefore possible to determine the position of the error in a coded message.

In our example, the error can occupy any of 7 positions. Three bits are therefore needed to encode the position of the error in the coded message in binary.

The position of the error is written in binary as [ e3, e2, e1 ], and the states e3, e2 and e1 are defined as follows:

e1 = a1 ⊕ a3 ⊕ a5 ⊕ a7

e2 = a2 ⊕ a3 ⊕ a6 ⊕ a7

e3 = a4 ⊕ a5 ⊕ a6 ⊕ a7

On emission, the equalities e3 = e2 = e1 = 0 must hold, which means that no error is detected.

If the equalities are not satisfied, the position of the error is given in the table below (Table 3: Position of the erroneous bit as a function of the binary error code):

Position of the error    e3    e2    e1    digit no.
No error                 0     0     0     -
a1                       0     0     1     1
a2                       0     1     0     2
a3                       0     1     1     3
a4                       1     0     0     4
a5                       1     0     1     5
a6                       1     1     0     6
a7                       1     1     1     7

Table: Position of the erroneous bit as a function of the binary error code

X = [ a1, a2, a3, a4, a5, a6, a7 ]       E = [e3, e2, e1 ]
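The encoding and syndrome relations above can be sketched as follows (indices follow the text: the control digits are a1, a2, a4 and the syndrome is [e3, e2, e1]):

```python
# Sketch of the Hamming (7,4) relations above: control digits a1, a2, a4
# computed from the information digits a3, a5, a6, a7, and single-error
# correction via the syndrome [e3, e2, e1]. x[0] is unused padding so that
# x[i] matches the digit "ai" of the text.

def encode(a3, a5, a6, a7):
    a1 = a3 ^ a5 ^ a7
    a2 = a3 ^ a6 ^ a7
    a4 = a5 ^ a6 ^ a7
    return [0, a1, a2, a3, a4, a5, a6, a7]

def syndrome(x):
    e1 = x[1] ^ x[3] ^ x[5] ^ x[7]
    e2 = x[2] ^ x[3] ^ x[6] ^ x[7]
    e3 = x[4] ^ x[5] ^ x[6] ^ x[7]
    return (e3 << 2) | (e2 << 1) | e1      # binary position of the error

word = encode(1, 0, 1, 1)
assert syndrome(word) == 0                 # e3 = e2 = e1 = 0 on emission

word[5] ^= 1                               # single error on digit a5
pos = syndrome(word)
assert pos == 5                            # the syndrome points at digit no. 5
word[pos] ^= 1                             # correct it
assert syndrome(word) == 0
```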

Hamming codes do not detect double errors. To achieve this, an additional control digit (a parity bit) covering the whole message X is added. The length of the new coded message is then (M + K + 1) digits. After transmission, the message is checked in order to verify its validity.

If the overall parity is incorrect, the previous test is applied to the (M + K)-digit word:

  • If a single, double or triple error is detected, it is corrected.
  • If no error is detected, the word is correct and the error concerns the last, overall control digit.

If the overall parity is correct, there are 0 or 2 errors; the Hamming test is applied:

  • If no error is detected, there is no error.
  • If an error is detected, there are 2 errors; the word is wrong and no correction is possible.
English

Functional safety - error detection codes - parity and checksum

Description of error detection codes and error correction codes

The Hamming distance and the residual error rate differ between the various methods that detect and correct transmission errors. The coding methods presented in the first part of this paper are:

  • error detection codes by parity
  • error detection codes by CHECKSUM
  • error detection codes by CRC
  • error-detecting and error-correcting HAMMING codes

The first two methods are described below. CRC and error-detecting/correcting codes are described in another article.

1 Parity

1.1. Principle of parity

This is a modulo-2 sum of the information bits.

  • Parity is said to be even when the parity code equals the modulo-2 sum of the information bits.
  • Parity is said to be odd when the parity code equals the complement of that sum.

This control method is used in RS-232 serial transmission and in microprocessors.

Before each transmission of a word, an extra digit is added: the parity bit. After transmission, the presence of a single error changes the parity value and makes the error detectable.
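A minimal sketch of the principle (even parity shown; odd parity would complement the result):

```python
# Minimal sketch of the parity principle: the even-parity bit is the
# modulo-2 sum of the information bits (odd parity would be its complement).
# A single bit error changes the received parity and is thus detected.

def even_parity(bits):
    p = 0
    for b in bits:
        p ^= b                       # modulo-2 sum
    return p

word = [1, 0, 1, 1, 0, 0, 1, 0]      # 8 information bits, four "1"s
p = even_parity(word)
assert p == 0                        # even number of "1"s
assert even_parity(word + [p]) == 0  # received word + parity bit checks out

word[3] ^= 1                         # single transmission error
assert even_parity(word + [p]) == 1  # parity check fails: error detected
```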

1.2. Detection of errors - residual errors

Let "p" be the probability of individual error of a single digit, and "n" the word length.

The probability of having a single error is:

P(1) = P[1st digit wrong and the (n-1) following digits valid] + P[1st digit valid and 2nd digit wrong and the (n-2) following digits valid] + ...

P(1) = p·(1-p)^(n-1) + (1-p)·p·(1-p)^(n-2) + (1-p)^2·p·(1-p)^(n-3) + ...

P(1) = n·p·(1-p)^(n-1)

Similarly, we obtain P(k), the probability of exactly k errors.

Parity detects all odd numbers of errors, with a detection power PD:

PD = P(1) + P(3) + ... + P(2k-1)

Undetected errors are all even numbers of errors, hence a non-detection power PND:

PND = P(2) + P(4) + ... + P(2k)

These values are functions of p and must be calculated depending on the environment.

Assume a frame consisting of 8 information bits and a verification key of one parity bit (n = 9). By changing an even number of bits e = {2, 4, 6, 8} in the frame, we obtain another frame in which the errors are not detected by the verification key.

The residual error rate can be calculated in accordance with the information provided on http://www.industry-finder.com/machinery-directive/functional-safety-and-safety-fieldbus.html, and the probability of residual error is equal to the sum of the even-error probabilities, which is:

R = R1 × R2

with:

R1 = P(2) + P(4) + P(6) + P(8), the probability of an undetected error on the data,

R2 = the probability that the frame delimiters are correct. In the case of simple parity, there are two delimiters (start and end of frame), hence a probability q × q = q^2.

 


2. The CHECKSUM

There are several methods for computing a CHECKSUM, such as:

  • the modified control-sum method,
  • the arithmetic addition of the contents of the message,
  • interleaved parity codes.

This last method is detailed below.

2.1. Principle of the CHECKSUM

The principle of the interleaved-parity CHECKSUM is to write a message of M digits in the form of an array of "L" lines and "C" columns (M = C·L). To this array, an (L+1)th row and a (C+1)th column are added, built such that the words read horizontally and vertically have even parity.

The method determines the number of "1"s contained in each row (column). If this number is even, the state "0" is assigned at the intersection of that row (column) and the last column (row); the state "1" if the number is odd. This operation is performed for each row and each column. The digit located at the intersection of the last row and the last column is chosen so as to ensure the parity of the entire message.

For example, a message M = 49 digits (7 words of 7 bit):

 

 

 

 

 

 

 

A set of 64 digits (49 digits for the transmission of the information and 15 digits reserved for control) is transmitted and received serially over the transmission channel. At reception, the table is restored. A single error causes the parity checks of the corresponding row and column to fail; detection and correction are therefore possible.

A double or quadruple error is detected but cannot be corrected; a triple error is not always detected.
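The row/column construction can be sketched on a small assumed example (a 3 × 3 message rather than the 7 × 7 of the text, to keep it short):

```python
# Sketch of the interleaved-parity CHECKSUM on an assumed 3 x 3 message
# (the text uses 7 x 7): one even-parity bit per row and per column; a
# single error is located at the crossing of the failing row and column.

def add_parity(rows):
    full = [r + [sum(r) % 2] for r in rows]            # (C+1)-th column
    full.append([sum(col) % 2 for col in zip(*full)])  # (L+1)-th row
    return full

def failing(full):
    bad_rows = [i for i, r in enumerate(full) if sum(r) % 2]
    bad_cols = [j for j, c in enumerate(zip(*full)) if sum(c) % 2]
    return bad_rows, bad_cols

msg = [[1, 0, 1],
       [0, 1, 1],
       [1, 1, 0]]
table = add_parity(msg)
assert failing(table) == ([], [])       # every row and column is even

table[1][2] ^= 1                        # single error
assert failing(table) == ([1], [2])     # located, hence correctable
```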

This code is more powerful than simple parity but requires additional hardware resources. A message of originally M digits (M = C·L) requires C + L + 1 check digits, hence an increase, or redundancy, DR = (C + L + 1)/M:

For 7-bit data words:          M = 49       DR = 15/49 = 0.31

For 15-bit information words:  M = 225      DR = 31/225 = 0.14

2.2. Detection coverage of a CHECKSUM - residual errors

The detection power (coverage) of a CHECKSUM is calculated differently depending on the number of bytes: either with an exact enumeration method or with a probabilistic method.

2.2.1. Enumeration method

This method is based on an exact count of all the combinations of "N" words of "P" bits of an information sequence leading to a sum equal to the sum "S" obtained on the "N" words of the sequence containing the original information.

The theoretical detection power PD is defined as the ratio (expressed as a percentage) between the number of errors NDET detected by the control and the total number of errors NTOT that can occur on the information sequence to be checked.

The total number of errors NTOT is the number of combinations that the (P × N) bits of the sequence can take, which gives for N bytes:

NTOT = 2^(P·N)

The number NDET, or the number of undetected errors NNDET, is determined from NDET + NNDET = 2^(P·N).

 

NNDET corresponds to the number of combinations of "N" bytes whose sum is identical to the sum "S" obtained on the bytes of the information sequence in the absence of errors.

The detection power can then be expressed as follows:

PD = 100 × (1 - NNDET / 2^(P·N))

The calculations that enumerate all the combinations of "N" bytes of an information sequence whose sum is "S" lead to the following formulas:

for S = 0 and P = 8 bits (the number of bits in a byte):

PD = 100 × (1 - 1/2^(8·N)) ≈ 100 %

for S ∈ [ k·(2^P - 1), (k+1)·(2^P - 1) ] with integer k ∈ [ 0, N - 1 - INT(N/2^P) ], with the convention 0! = 1.

Numerical calculation is possible only for small values of N (N < 40). Beyond this level, the "probabilistic" method must be used.

2.2.2. "Probabilistic" method

With the "probabilistic" method, each byte may be considered as a random variable. A given configuration is the sequence of the "N" corresponding random variables.

Considering that the probability distribution is the same for each byte (mean value m, variance σ^2), when "N" is large (N > 50) the distribution of the sum "S" of the random variables is very nearly normal (it follows the normal law used in statistics), with mean value N·m and variance N·σ^2. The probability that the sum of the "N" variables equals an integer value "s" is:

Pr[S = s] ≈ [1/√(2π·N·σ^2)] · exp[-(s - N·m)^2 / (2·N·σ^2)]

The probability is maximum for the value s = N·m:

Pr[S = N·m] ≈ 1/√(2π·N·σ^2)

Applied to memories of 8-bit data size (P = 8) and for a given data capacity, we get the following results.

For P = 8:

m = 255/2 = 127.5  and  σ^2 = (2^16 - 1)/12 ≈ 5461.25

For:

N = 512 bytes   Pr[ S = 65280 ]      ~     0.0002386       (99.976 %)

N = 1 KB        Pr[ S = 130560 ]     ~     0.0001688       (99.983 %)

N = 16 KB       Pr[ S = 2088960 ]    ~     0.000042        (99.9958 %)

N = 32 KB       Pr[ S = 4177920 ]    ~     0.0000298       (99.997 %)
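These figures can be reproduced numerically from the peak-probability formula (a quick check, assuming bytes uniformly distributed on 0..255, so that m = 127.5 and σ^2 = 65535/12):

```python
# Quick numerical check of the peak probability
# Pr[S = N*m] ~ 1/sqrt(2*pi*N*sigma^2), assuming each byte is uniform on
# 0..255, so m = 127.5 and sigma^2 = 65535/12.
import math

var = 65535 / 12                         # variance of a uniform byte
for n_bytes, s in [(512, 65280), (1024, 130560),
                   (16 * 1024, 2088960), (32 * 1024, 4177920)]:
    assert s == n_bytes * 127.5          # the peak is indeed at S = N*m
    p = 1 / math.sqrt(2 * math.pi * n_bytes * var)
    print(f"N = {n_bytes:5d} bytes  Pr[S = {s}] ~ {p:.7f}")
```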

English

Functional safety and safety fieldbus

The evolution of communications in industrial environments

Automated industrial applications have evolved over the last 20 years. Less than 10 years ago, the connection between the elements of an industrial system was made entirely with a multitude of cables, each piece of information being conveyed by its own cable.

Nowadays, communications in industrial applications are increasingly made over networks. This category of industrial LANs is called fieldbus because it is closer to the machine, closer to the field. In this context, the objective of a network ("communication system") is to ensure communication between the devices connected to the manufacturing process (sensors, actuators, machines, ...) and their control. The development of field networks is related to the following factors:

  • Fieldbus allows communication between entities at a cost, and with device sizes (smart devices), that are relatively small compared to the equipment into which they are integrated.
  • Fieldbus offers flexibility of use and implementation, and reduces the total cost of ownership over the lifetime of an application system.
  • Fieldbus creates another dimension in terms of distances. While parallel or serial point-to-point transmission covers about ten metres, a fieldbus extends from 100 m to 5 km.
  • Fieldbus allows up to several hundred subscribers. In addition, several fieldbuses can be connected to one system. The fieldbus is becoming the technological solution for connecting a range of simple or sophisticated products requiring harmonized connection interfaces.
  • Fieldbus makes it possible to handle critical architectures in which exchanges are performed through shared variables used by all the bus subscribers.
  • The fieldbus can convey more information and make the same information available to all the devices that need it. With its extensive features, it allows different devices to interoperate.
  • The fieldbus makes it possible to change and adapt configurations in applications where the manufacturing process is very long (several months to several years). It allows remote devices to be configured.

However, the use of a fieldbus must in no way be detrimental to information transfer times. In addition, the information conveyed must be reliable and safe. Approaches focusing on different aspects have led to different fieldbus architectures.

1 Networks and the transmission of information

1.1. Field networks in a communication system

Fieldbus communication is one of the communication systems within the levels of the overall control system of an enterprise. The figure below details the overall communication scheme of the business.

Figure  : Fieldbus and global communication

  • Factory level 
  • Cell level
  • Field level

Communications at the cell level are typically the one hand, between the cell level controllers and subordinate control and secondly, between controllers (PLC) and other organs.

At the sensors / actuator levels, the information flow is generally "vertical" ie between controllers and sensors / actuators.

The data transmission system must enable a reliable and efficient transfer of information in an imperfect environment. High data integrity and high-speed transmission are often contradictory properties: increased integrity requirements can be achieved only at the expense of a reduction in the actual information flow. The requirements for transmission rate and data integrity must therefore be selected consistently with the accuracy required of the system.

1.2. Network and protocol architecture: The reduced OSI Reference Model

The OSI (Open Systems Interconnection) model is a reference model for open systems interconnection, used to develop interconnection standards and the cooperation of distributed systems. It defines a layered architecture and is applicable to all types of networks. A system is said to be open when it enables communication between devices of different types within the rules of communication of an OSI environment.

For transmission systems that require particularly short reaction times (on networks with reduced transmission bandwidths), an Enhanced Performance Architecture (EPA) has been designed. Protocols based on this architecture use only three layers: the physical layer, the data link layer and the application layer. Protocols based on this reference model are defined in the EN 60870-5 series of standards and also in the EN 61784-X series.

 

Figure: Fieldbus in the OSI model

Field networks are based on a protocol architecture generally oriented towards the OSI (Open Systems Interconnection) model, which is composed of seven layers.

In safety applications, the transmission of information must be secure. This notion of safety covers two aspects:

  • the integrity of the information provided;
  • a controlled transmission time for the information.

1.3. Measures of the quality of information transmission

The fundamental purpose of the communication function in monitoring and process control is to achieve maximum coherence of the system, that is to say, consistency between the physical state of a process and its image in the database of the transmission system.

In digital systems, information is exchanged digitally as a succession of "0"s and "1"s. In stressed environments (EMI, potential differences between earths, component ageing, etc.), this succession of logic levels can be altered.

Data transmission must be carried out correctly even in the presence of harsh environmental conditions. It is therefore necessary to ensure effective protection of the message against:

  • undetected errors (on bits and frames);
  • undetected losses of information;
  • the insertion of unwanted information (message simulation by interference, etc.);
  • the separation or disruption of coherent information.

1.3.1. Transmission of information and classes of integrity

The efficiency and the level of integrity of a coding system will be compared with the integrity classes defined in the EN 60870-5-1 standard, which relates to telecontrol transmission protocols.

EN 60870-5-1 gives requirements in terms of residual error rate or residual error probability. This concept, close to that of the error detection rate, is however different because it is intended to "count" the residual errors. In the transmission of information, three characteristics must be taken into account: the quality of the transmission, the variable frame length and the frame delimiters.

  • The quality of the transmission involves a "probability of error on binary elements (bits)".
  • The variable length of the frame leads to calculating a coverage that takes the various possible cases into account.
  • The frame delimiters are start-of-frame and end-of-frame markers that allow exchanges to be "synchronized".

The EN 60870-5-1 standard defines three classes of integrity of transmissions (see Figure: Integrity classes related to the transmission channel). This figure provides a graphical representation of the integrity of the transmission as a function of three parameters:

  • the probability of error on the binary elements,
  • the Hamming distance "d" of a code,
  • the residual error rate R.

The curves define the upper limits of the residual error probability (or residual error rate R) as a function of the probability of error on the binary elements (bits). These curves stop at a bit error rate p = 0.5, which corresponds to purely random reception of the bits (the received signal is indistinguishable from noise). The slope of the curves for p < 10^-4 represents the Hamming distance "d" of the code that is used.

Three classes of integrity, I1, I2 and I3, have been defined for data transmission. The use of each class depends on the nature of the data.

Figure: Integrity classes related to the transmission channel

The quality of the transmission paths (which defines the probability of error on the binary elements, i.e. bits) should be monitored to ensure that the probability of error on the binary elements remains below an acceptable limit.

1.3.1.1. Probability of error on the bits

This protection must take into account the physical characteristics of the transmission of information, including:

  • the source that generates the message to be transmitted;
  • the transmitter that puts the signal to be transmitted at a defined level (electrical, optical, ...);
  • the transmission channel which transports the information;
  • the receiver which converts the signal back into a message;
  • the recipient that processes the received message.

Sources, recipients and transmission lines are at the root of transmission problems, which the choice of transmitter and receiver must address. For transmissions without a transmission channel the problems are negligible; it is otherwise for communications over transmission channels. Indeed, owing to the disturbances present on the transmission channel, the information delivered to the recipient is not always identical to the information provided by the source. Transmission errors are quantified by the "bit error rate" (BER): the lower the BER, the better the transmission quality. The transmission quality is thus measured by means of the bit error probability.

The received signal R(t) is the sum of the transmitted signal and the interfering transmission noise: R(t) = S(t) + N(t), where N(t) is white Gaussian noise.

The probability of error per binary element (or bit error rate, BER) is a characteristic of the transmission. The BER is identical to the average error probability and is a function of the ratio of the energy per bit to the noise spectral density (Eb/N0).

Experimentally, the BER is defined as the ratio of the number of erroneous bits after demodulation and decoding to the number of bits transmitted during a given time interval.

A poor quality connection has a BER on the order of 10^-4 for a telephone line and 10^-7 for data transmission.
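As an illustration, the experimental definition of the BER can be sketched in a few lines of Python; the noisy-channel simulation is a hypothetical example, not part of any standard:

```python
import random

def bit_error_rate(sent, received):
    """Experimental BER: erroneous bits / bits transmitted in the interval."""
    assert len(sent) == len(received)
    return sum(s != r for s, r in zip(sent, received)) / len(sent)

# Simulate a channel that flips each bit with probability p = 1e-3.
random.seed(0)
p = 1e-3
sent = [random.randint(0, 1) for _ in range(100_000)]
received = [b ^ (random.random() < p) for b in sent]
print(bit_error_rate(sent, received))  # close to p
```

With 100 000 bits the measured rate converges towards the true flip probability p.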

1.3.1.2. Coding and Hamming distance

A first means of assessing the effectiveness of a coding system for secure transmission is the "Hamming distance". This distance is used to study the similarity between two words of the same length. The coding acts as a more or less effective filter, so some errors may still get through during transmission; these residual errors can be quantified by a measure: the residual error rate.

The objective of coding is to detect and correct transmission errors. The detection/correction code is chosen on the basis of the distance between the encoded and modulated signals: the Hamming distance. This parameter characterizes the resemblance between words of equal length and may be defined as the number of bits by which two words differ. The Hamming distance is a function of the coding technique used and of the length of the words.

Consider two binary code words Ci and Cj.

The binary Hamming distance dH(Ci, Cj) of two binary code words Ci and Cj is defined as the number of bits in which the two combinations differ.

The Hamming distance can also be defined as the weight of the word Ci ⊕ Cj, the sum being made component by component (the number of bits equal to "1" in this sum). The Hamming distance d(u,v) of two words is thus calculated as the arithmetic sum, modulo 2 and rank by rank, of their digits.

For example, with u = 10110 and v = 11100:

d(u,v) = 2, because two digits differ (digits 2 and 4).

The minimum Hamming distance is the minimum number of inverted bits required to turn one codeword into another codeword. Without encoding, this distance is "1"; with parity encoding, it is "2".
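A minimal sketch of the Hamming distance computation (the two example words are arbitrary):

```python
def hamming_distance(u, v):
    """Number of positions at which two equal-length binary words differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

print(hamming_distance("10110", "11100"))  # 2: the words differ in two digits
```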

To calculate the residual error rate using the formulas provided in the EN 60870-5-1 standard, for defined frame formats and specific encodings, we need to know the Hamming distance and the minimum Hamming distance.

The minimum Hamming distance of a linear encoding system C(N, K) is defined by:

dmin = min { H > 0 : wH ≠ 0 }

where wH is the number of codewords of weight H. The weight distribution wH characterizes the ability of the coding system to detect errors and thus the performance of the code.

The minimum Hamming distance is specific to each error-detecting code. To determine this minimum Hamming distance, there are two solutions:

  • choose one of the frame formats defined in standard EN 60870-5-1;

  • iteratively calculate this distance as a function of the chosen code and the length of the frame. This is the solution that will be developed later to define the minimum Hamming distance specific to the CRC code and to the organization of the frame sequence information.
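The second, iterative solution can be sketched by brute force for a small code: systematically encode every message and take the minimum nonzero codeword weight, which for a linear code equals dmin. The generator polynomial below (x^3 + x + 1, which yields the (7,4) Hamming code) is chosen purely for illustration:

```python
from itertools import product

def crc_remainder(bits, poly):
    """Remainder of the modulo-2 polynomial division of bits by poly."""
    bits = bits + [0] * (len(poly) - 1)  # append zeroed check positions
    for i in range(len(bits) - len(poly) + 1):
        if bits[i]:
            for j, p in enumerate(poly):
                bits[i + j] ^= p
    return bits[-(len(poly) - 1):]

def minimum_distance(k, poly):
    """d_min of a linear code = minimum weight of a nonzero codeword."""
    weights = []
    for msg in product([0, 1], repeat=k):
        if any(msg):
            word = list(msg) + crc_remainder(list(msg), poly)
            weights.append(sum(word))
    return min(weights)

# Generator x^3 + x + 1 -> poly bits [1, 0, 1, 1]; 4 data bits per frame
print(minimum_distance(4, [1, 0, 1, 1]))  # 3 for the (7,4) Hamming code
```

This exhaustive enumeration is only practical for short frames; for realistic frame lengths the standard's tabulated frame formats are used instead.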

For BCH and CRC type codes, wm is approximated as follows for m ≥ dmin:

wm ≈ C(N, m) / 2^(N-K)

A linear code C(N, K) of minimum distance dmin can, with maximum-likelihood binary decoding, correct any error configuration comprising "t" errors such that dmin ≥ 2t + 1.

The code can also detect any configuration of H errors such that dmin ≥ H + 1 and, simultaneously, correct any configuration of t ≤ H errors such that dmin ≥ H + t + 1.

1.3.1.3. Residual error rate

In order to measure the characteristics of a transmission network in terms of error detection coverage, it is necessary to combine the error detection rate with the probability of occurrence of errors. The calculated result is called the "residual error rate".

The notation (n, j) below conforms with the standard EN 60870-5-1. Under these circumstances, and with the assumptions of the following paragraph for a CRC coding, we get:

Probability of occurrence of m errors among N binary elements

We will no longer speak of an error detection rate, but of the probability of there being "1, 2, ... m" errors. The calculation of this parameter introduces the probability of having "m" errors (m = 1, 2, 3, ...) among "N" symbols (bits).

In the case of a memoryless binary symmetric channel where the disturbance originates from thermal noise (Gaussian white noise, which is the case for transmission networks), the probability of m errors among N bits is:

P(m, N) = C(N, m) · p^m · (1 - p)^(N-m)

The number of combinations C(N, m) is calculated using the following formula:

C(N, m) = N! / (m! · (N - m)!)
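This binomial probability can be computed directly; `p_m_errors` is a hypothetical helper, not taken from the standard:

```python
from math import comb

def p_m_errors(m, n, p):
    """Probability of exactly m errors among n bits on a memoryless
    binary symmetric channel with bit error probability p."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

# Probability of exactly 2 errors in an 8-bit word when p = 0.01
print(p_m_errors(2, 8, 0.01))
```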

When a code C(N, K) with Hamming distance dmin is used for error detection, it is able to detect all error configurations that lead to a received word different from a codeword (in particular, all configurations of weight less than dmin).

Considering the error configurations of a given weight as equally likely, the probability that an error pattern of weight m ≥ dmin coincides with a codeword is given by:

wm / C(N, m)

Hence:

R = Σ (m = dmin to N) [wm / C(N, m)] · P(m, N)

(the ratio of the sum of non-detected faults to the set of all possible cases), or else:

R = Σ (m = dmin to N) wm · p^m · (1 - p)^(N-m)

Since the CRC (Cyclic Redundancy Check) excludes certain error configurations, namely those of weight below the minimum Hamming distance between two frames, we get the following general formula:

R ≈ 2^-(N-K) · Σ (m = dmin to N) C(N, m) · p^m · (1 - p)^(N-m)

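As a numerical illustration, the residual error rate can be evaluated with the BCH/CRC approximation of the weight distribution; the frame parameters below are arbitrary examples:

```python
from math import comb

def residual_error_rate(n, k, d_min, p):
    """Approximate residual error rate R of a code C(n, k), using the
    BCH/CRC approximation w_m ~ comb(n, m) / 2**(n - k) for m >= d_min."""
    return sum(
        (comb(n, m) / 2 ** (n - k)) * p**m * (1 - p) ** (n - m)
        for m in range(d_min, n + 1)
    )

# 64-bit frame with 16 check bits, d_min = 4, bit error probability 1e-4
print(residual_error_rate(64, 48, 4, 1e-4))
```

For small p the result is dominated by the first term, m = d_min.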
Effectiveness of the coding system

In order to ensure the integrity of the transmission in terms of information content, and to detect and/or correct transmission errors, techniques that introduce redundancy into the message to be transmitted (at the transmitter) are implemented.

The coding of a signal makes it possible to adapt this signal to the physical characteristics of the transmission channel. The coding takes into account the channel bandwidth and the signal-to-noise ratio.

The design of a transmission system must involve two parameters:

  • the modulation,
  • the error-correcting coding.

The coding transforms a binary word Mi of K symbols {mi,k} into a binary word Ci of N symbols {ci,n} called a codeword. The encoder introduces a redundancy which results in an increase in the symbol rate between the input Mi and the output Ci of the encoder. A correspondence is then established between the symbols at the output of the encoder and the transmitted signals. In the case of our study, the symbols at the input and output are binary.

The integrity control of the transmission is performed first by checking the frame format and then by checking the correctness of the verification key. The format control consists of checking the delimiters and the number of bits in the incoming frame.

There are several kinds of verification key, which improve the safety level of the transmission by introducing redundancy into the transmitted frame. At the receiver, one simply checks whether the coding rule used at transmission is satisfied. This encoding allows transmission errors to be detected.
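A minimal sketch of this receiver-side check, assuming a CRC-16/CCITT verification key and a hypothetical frame layout (payload followed by a two-byte key):

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE over a byte string."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def make_frame(payload: bytes) -> bytes:
    """Transmitter side: append the verification key to the payload."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")

def check_frame(frame: bytes) -> bool:
    """Receiver side: recompute the key and compare it with the received one."""
    payload, key = frame[:-2], frame[-2:]
    return crc16_ccitt(payload).to_bytes(2, "big") == key

frame = make_frame(b"sensor value 42")
assert check_frame(frame)
assert not check_frame(bytes([frame[0] ^ 0x01]) + frame[1:])  # bit error detected
```

The receiver never needs to know the payload content, only the coding rule; any single-bit error in payload or key makes the recomputed key differ from the received one.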

The errors considered are:

  • errors occurring in the information data;
  • errors occurring in the verification key;
  • errors on the delimiters and on the number of bits constituting the frame.

The code efficiency characterizes the amount of additional information that is not relevant to the transmitted parameters but is necessary to ensure their integrity. This efficiency is the ratio of the number of bits correctly transmitted to the total number of bits constituting the codeword or data frame:

with:

  • k = number of information binary elements per frame,
  • q = probability to receive correct bits,
  • n = total number of bits per frame, including frame delimiters and check bits.
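The exact formula from the standard is not reproduced above; one plausible reading, assuming the k information bits are only usable when the whole n-bit frame is received correctly (probability q^n), can be sketched as:

```python
def code_efficiency(k, n, q):
    """Useful throughput ratio: k information bits out of n transmitted,
    weighted by the probability q**n that the whole frame is correct.
    Illustrative interpretation only, not the standard's formula."""
    return (k * q**n) / n

# 48 information bits in a 64-bit frame, per-bit success probability 0.9999
print(code_efficiency(48, 64, 0.9999))
```

With an error-free channel (q = 1) this reduces to the plain redundancy ratio k/n.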

1.3.2. Response time of a communication system

Another characteristic parameter of the transmission of information is the time between the moment the information is sent and the moment it is used. Depending on the nature of the messages (monitoring the evolution of a parameter, stopping a cycle, ...) and the types of reaction, these times must be bounded.

 

 


Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on personal protective equipment


http://eur-lex.europa.eu/legal-content/FR/TXT/?uri=CELEX:52014PC0186

 

 


IECEx Quality System Requirements for Manufacturers - OD 005

This Operational Document, OD 005, sets out the IECEx System requirements for a manufacturer's quality system relating to the production of Ex products.
OD 005, IECEx Quality System Requirements for Manufacturers, has now been published in two parts:

  • Part 1: Guidance on the establishment and maintenance of a quality system 
  • Part 2: Audit Checklist 

This Document needs to be read in conjunction with ISO 9001:2008. 

You can download these documents from the www.iecex.com website at the following address:


Part 1: Guidance on the establishment and maintenance of a quality system

The content of the first part of the document is very similar to the content of ISO/IEC 80079-34:2011, Explosive atmospheres - Part 34: Application of quality systems for equipment manufacture.

CONTENTS OF part 1 of OD 005

Hereafter the contents of the document are given, with the meaning of the different colours:

  • text similar both to ISO 9001:2008 and EN ISO/IEC 80079-34:2011
  • text that is not applicable to EN ISO/IEC 80079-34:2011
  • text that is added to EN ISO/IEC 80079-34:2011
  • text of ISO 9001:2008 with additional requirements introduced by ISO/IEC 80079-34:2011

 

 

INTRODUCTION
1 Scope
1.1 General
1.2 Permissible exclusions
2 Normative references
3 Terms and definitions
4 Quality management system
4.1 General requirements
4.2 Documentation requirements
4.2.1 General
4.2.2 Quality manual

4.2.3 Control of documents
4.2.4 Control of records
5 Management responsibility
5.1 Management commitment
5.2 Customer focus
5.3 Quality policy
5.4 Planning
5.4.1 Quality objectives

5.4.2 Quality management system planning
5.5 Responsibility, authority and communication
5.5.1 Responsibility and authority
5.5.2 Management representative
5.5.3 Internal communication

5.6 Management review
5.6.1 General
5.6.2 Review input
5.6.3 Review output
6 Resource management
6.1 Provision of resources
6.2 Human resources
6.2.1 General

6.2.2 Competence, training and awareness
6.3 Infrastructure
6.4 Work environment

7 Product realization
7.1 Planning of product realization
7.2 Customer-related processes
7.2.1 Determination of requirements related to the product

7.2.2 Review of requirements related to the product
7.2.3 Customer communication
7.3 Design and development
7.3.1 Design and development planning 
7.3.2 Design and development inputs
7.3.3 Design and development outputs
7.3.4 Design and development review
7.3.5 Design and development verification
7.3.6 Design and development validation

7.3.7 Control of design and development changes (in the scope of EN ISO/IEC 80079-34:2011 but not in the scope of OD 005)
7.4 Purchasing
7.4.1 Purchasing process
7.4.2 Purchasing information
7.4.3 Verification of purchased product
7.5 Production and service provision
7.5.1 Control of production and service provision
7.5.2 Validation of processes for production and service provision
7.5.3 Identification and traceability
7.5.4 Customer property
7.5.5 Preservation of product
7.6 Control of monitoring and measuring equipment
8 Measurement, analysis and improvement 
8.1 General
8.2 Monitoring and measurement
8.2.1 Customer satisfaction

8.2.2 Internal audit
8.2.3 Monitoring and measurement of processes
8.2.4 Monitoring and measurement of product
8.3 Control of nonconforming product
8.4 Analysis of data 
8.5 Improvement
8.5.1 Continual improvement
8.5.2 Corrective action
8.5.3 Preventive action

Annex A (informative) Information relevant to particular types of protection and specific products
Annex B (informative) Verification criteria for elements with non-measurable paths used as an integral part of a type of protection
Bibliography

 


Part 2: Audit Checklist 

This second part of the document is a checklist, based this time on the OD 005-1 document.

This document is used as a basis by most ExCBs when performing their audits.

