Sunday, 6 April 2014

Failure investigation report.
     
     Sheds carry out failure investigation and analysis for line failures of diesel locomotives and prepare investigation reports. A review of these reports indicates that most do not meet the basic objective of the exercise. They generally remain confined to symptoms and rarely reach the root cause. The remedial measures suggested tend to be superficial and address the key objectives only obliquely.
It is essential to understand the basic objective of the whole exercise so that meaningful reports are prepared and corrective actions are taken. A report should not leave the reader speculating about the root cause or wondering about the further course of action. This write-up is an attempt to address these issues and to educate our supervisory staff who have not been formally trained in this extremely important field.
A failure investigation report should:
·        Tell us exactly why the component has failed
·        Tell us what we have learnt from this analysis
·        Tell us what has to be done to prevent a recurrence
Every failure leaves clues as to why it happened. In most failure cases, a trained person can use the basic techniques of failure analysis to diagnose the mechanical causes behind a failure without having to resort to expensive and sophisticated analytical tools like electron microscopy. Then, knowing how a failure happened, the investigator can arrive at the root causes of why it happened.
      The most common reasons for failure of components include:
·  Service or operating conditions (use and misuse)
·  Improper maintenance (intentional or unintentional)
·  Improper testing or inspection
·  Assembly errors
·  Fabrication/manufacturing errors
·  Design errors (stress, materials selection, and assumed material condition or properties)
Failure Analysis and Prevention, Vol. 11, ASM Handbook, ASM International, 2002, recommends nine steps of a failure investigation:


1.     Understand and negotiate goals of the investigation
2.    Obtain clear understanding of the failure
3.    Objectively and clearly identify all possible root causes
4.    Objectively evaluate likelihood of each root cause
5.    Converge on the most likely root cause(s)
6.     Objectively and clearly identify all possible corrective actions
7.    Objectively evaluate each corrective action
8.    Select optimal corrective action(s)
9.    Evaluate effectiveness of selected corrective action(s)
  Failure Analysis Procedures
The principal task of a failure analyst during a physical-cause investigation is to identify the sequence of events involved in the failure. Like the basic process of the scientific method, failure analysis is an iterative process of narrowing down the possible explanations for failure by eliminating those explanations that do not fit the observations. The basic steps are:
1.    Collect data
2.    Identify damage modes present
3.    Identify possible damage mechanisms
4.    Test to identify actual mechanisms that occurred
5.    Identify which mechanism is primary and which is/are secondary
6.    Identify possible root causes
7.    Test to determine actual root cause
8.    Evaluate and implement corrective actions
     Generally, a failure analyst will start with a broad range of possible explanations but, over time, will narrow and refine the existing possibilities. The failure analyst must repeatedly ask the following questions as an investigation develops possible explanation(s) for actual events:
·  What characteristics are present in the failed/damaged component?
·  What characteristics are present or expected in an undamaged component?
·  What are the possible explanations that would account for the differences between damaged and undamaged components?
· What test(s) can be performed to confirm or eliminate possible explanations and refine knowledge about the observed damage?
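The narrowing-down process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Explanation` class, the `eliminate` function, and the shaft example with its observation names are all hypothetical and are not taken from any handbook or Shed procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Explanation:
    """A candidate explanation and the observations it predicts."""
    name: str
    # Maps an observation name to the value this explanation predicts.
    predicts: dict = field(default_factory=dict)


def eliminate(explanations, observations):
    """Keep only explanations that no recorded observation contradicts.

    An explanation is contradicted when it predicts a value for an
    observation and the recorded value differs. Observations it makes
    no prediction about cannot eliminate it.
    """
    surviving = []
    for exp in explanations:
        contradicted = any(
            key in observations and observations[key] != expected
            for key, expected in exp.predicts.items()
        )
        if not contradicted:
            surviving.append(exp)
    return surviving


# Hypothetical example: a fractured shaft (names and values are illustrative).
candidates = [
    Explanation("fatigue", {"beach_marks": True, "gross_deformation": False}),
    Explanation("single overload", {"beach_marks": False}),
    Explanation("corrosion-assisted cracking", {"corrosion_product": True}),
]

# First pass: visual examination of the damaged component.
observed = {"beach_marks": True, "gross_deformation": False}
remaining = eliminate(candidates, observed)
print([e.name for e in remaining])

# A further test (checking for corrosion product) refines the list.
observed["corrosion_product"] = False
remaining = eliminate(candidates, observed)
print([e.name for e in remaining])
```

Each new test adds an entry to `observed`, and the list of surviving explanations can only shrink, which mirrors the repeated questioning listed above.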
Synthesis of failure:
Before concluding the investigation, study all the facts and evidence of the failure, both positive and negative, in order to answer the typical questions for mechanical failures of components given below:
·        Was the part properly installed?
·        Was the part maintained properly?
·        How long was the part in service?
·        What was the nature of the stress at the time of failure?
·        Was any vibration noted on the part before failure?
·        Was the part subjected to overload?
·        Was it subjected to service abuse?





·        Were there any changes in the environment before failure?
·        Was the part properly maintained during schedules?
·        Was the failure due to ductility, brittleness, or a combination of both?
·        Did the crack or defect start recently or had it been growing for a long time?
·        Did the failure start at one point, or did it originate at several points?
·        Did the failure start at or below the surface?
The investigator must understand the potential ways a component could be damaged, the clues that would differentiate between these various scenarios, and the physical meaning each of these clues would have. Comparison of observations with characteristics of expected damage and mechanisms will enable the analyst to narrow down the possible failure explanations and understand the meaning of the observations made.
  Limiting conditions that refine the scope of explanations for observed damage can be defined by using the following two rules of thumb:
·  The Sherlock Holmes Rule: When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
·  Occam's Razor: When two or more explanations exist for a sequence of events, the simplest explanation is more likely to be the correct one.
This, combined with the theoretical analysis, should indicate the problem that caused the failure.
Logic Tree:
To interpret a failure accurately, one has to gather all pertinent facts and then decide what caused them. To be consistent, it is essential to develop and follow a logic path and build a "Logic Tree" that ensures a critical feature will not be overlooked. A logic tree is a tool that uses deductive logic to guide the thought processes used to draw correct conclusions. It is a disciplined methodology that prompts the user to answer questions that will eventually identify the root causes of a failure event.
     The first step in building a logic tree is to properly define the failure event to ensure that the analyst is truly working on the problem and not the symptoms. To do this, one must identify the failure event in the top block, and the modes of the failure event on the second level of the tree. 
     The next step is to proceed from the clearly defined failure to an analysis of its root causes. The questioning used to build the logic tree is simple and consistent. One must keep asking "How can the preceding event occur?" One has to start out very broad and get more and more specific while extending the tree vertically. 
     As the investigator continues to each level and keeps asking the same question of "How Can?", he forces himself to look at all the "cause" possibilities instead of looking only at the most likely possibility. As these possibilities are explored it is necessary to verify whether they actually occurred or not. If they did indeed occur the analyst would go to the next level by asking "How Can" again. The process of hypothesizing and verifying continues until the various root causes are discovered. 
Intuitive logic that jumps to a conclusion must be avoided, even if the conclusion appears most obvious.
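The logic tree described above can be represented as a simple data structure. The following Python sketch is one possible way to do it, not a prescribed method: the `Node` class, the `how_can`, `root_causes` and `unchecked` helpers, and the low lube oil pressure example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One block in the logic tree: an event, a mode, or a hypothesised cause."""
    description: str
    # None = not yet checked, True = verified as having occurred,
    # False = ruled out by evidence.
    verified: Optional[bool] = None
    children: List["Node"] = field(default_factory=list)

    def how_can(self, *descriptions):
        """Record every answer to 'How can this occur?' as a child node."""
        new = [Node(d) for d in descriptions]
        self.children.extend(new)
        return new


def root_causes(node):
    """Return the deepest verified causes found so far under this node."""
    if node.verified is not True:
        return []
    deeper = [rc for child in node.children for rc in root_causes(child)]
    return deeper if deeper else [node.description]


def unchecked(node):
    """List hypotheses under verified branches that still await verification."""
    if node.verified is None:
        return [node.description]
    if node.verified is False:
        return []
    return [d for child in node.children for d in unchecked(child)]


# Hypothetical example: top block is the failure event, second level its mode.
event = Node("Loco failed on line: engine shut down", verified=True)
(mode,) = event.how_can("Low lube oil pressure trip")
mode.verified = True

pump, leak, switch = mode.how_can(
    "Lube oil pump defective", "External oil leak", "Pressure switch faulty"
)
pump.verified = False
leak.verified = True
switch.verified = False

gasket, pipe = leak.how_can("Gasket fitted incorrectly", "Pipe joint worked loose")
pipe.verified = True   # the gasket hypothesis has not been checked yet

print(root_causes(event))
print(unchecked(event))
```

Here `unchecked` still reports the gasket hypothesis, a reminder that every possibility at a level should be verified or ruled out before the tree is extended further. The deepest verified node is only the root cause found so far; asking "How can?" again would extend the branch further.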


     Component Roots (or Physical Roots) are the tangible things that fail. These are the loco components that generally fail, and they are typically the roots most familiar to us. The Human Roots are the points of inappropriate human intervention. This is generally where a person did something wrong or forgot to do something, i.e. an act of commission or omission. For this reason it is necessary to ask, "Why did the person decide to do what they did? What was the rationale for the decision?" People do not generally wake up in the morning and say, "I think I will go to work today and fail miserably!" The answer lies in why people do what they do. What was it about the systems in which we operate that allowed the person to do what they did? The answers to these questions are defined as the latent roots or the organizational system roots.
Road map for failure investigation is outlined below:
1.   Find out what happened. The most important step in failure analysis is to seek answers soon after the failure has happened and to talk to the people involved. Persons involved in maintenance and overhauling can throw a lot of light on the issue from their direct and tacit knowledge. Try to understand exactly what happened and the sequence of events leading up to it. 
2.    Make a preliminary investigation. Examine the broken parts, looking for clues. Do not clean them yet because cleaning could wash away vital information. Document the conditions accurately and take photographs from a variety of angles of both the failed parts and the surroundings.
3.    Gather background data. Check the drawing, OEM manual, specification, and schedule forms relevant to the failure. Note down the current operating conditions and relevant parameters: time, booster pressure, rack, temperatures, amperage, voltage, load, throttle condition, pressure, lubricants, materials, corrosives, vibration, etc. Compare the actual operating conditions with the design conditions. Look at everything that could have an effect on locomotive working. The first step in any failure investigation is to gain a good understanding of the conditions under which the part was operating. Collect all data regarding any repair done before the failure, any important work carried out prior to the failure, and the service period of the component, as well as any problems noticed during the operation of that particular component.
4.    Check the trend. Look for trends in the relevant data. Plot current, voltage, fuel oil pressure, lube oil pressure, etc. and examine the slope of each graph: is it positive, negative, or flat? Download the data from locos equipped with MBCS, MCBG, event recorder, TCC and CCB. Study the trend carefully and examine all fault messages logged. Check the recurrence, frequency and origin of the failure. If the problem is endemic to one Shed only, find out the reason: what is being done wrongly and what is not being done. A study of variations in maintenance practices will reveal clues to the cause of failures.
5.    Determine what failed. Look at the initial evidence and decide what failed first (the primary failure) and what secondary failures resulted from it. Sometimes these decisions are very difficult because of the amount of analysis that is necessary. Find out what changed. Compare current operating conditions with those in the past.
6.    Examine and analyze the primary failure. Clean the component and look at it under low-power magnification, 5x to 50x. What does the failure face look like? There are often "chevron marks" on the face of a brittle fracture that show the progression of the failure across the piece. These chevrons or "arrows" always point to where the crack started. From the failure face, determine the forces that were acting on the part. Important surfaces should be photographed and preserved for reference. The most important point to understand when doing failure analysis on a fractured part is that the crack always grows perpendicular to the direction of the maximum tensile stress.
7.    Carry out lab testing of the failed piece and the support material. Laboratory studies in the investigation of metal failure include verification that the chemical composition of the failed material is within the specified limits. The studies also include checking the dimensions and physical properties of the failed component. Perform hardness tests, non-destructive testing (NDT) such as dye penetrant and ultrasonic examination, lubricant analysis, alloy analysis, macroscopic examination, chemical analysis, etc.
8.    Incident Causation Scenario: Determine the failure type and the forces that caused it. Draw the incident timeline and logic diagram. Review all the steps listed. Leaving any questions unasked or unanswered reduces the accuracy of the analysis. Alternative incident causation scenarios should be explored and ruled out logically.
9.    Simulate the situation: recreate the situation where feasible and validate your hypothesis wherever possible.
10. Determine the root causes. Categorise root causes into technical, human and management/maintenance system, and always ask, "Why did the failure happen in the first place?" This question usually leads to human factors and management systems. Each root cause may need to be dealt with differently; people will have to recognize personal errors and change the way they think and act.
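Step 4 of the road map asks whether the graph of a parameter slopes upward, slopes downward, or stays flat. Where the readings are available as numbers, the slope can be estimated with an ordinary least-squares fit, as in the Python sketch below. The function names, the `flat_band` tolerance, and the lube oil pressure readings are hypothetical; a suitable tolerance would have to be chosen separately for each parameter.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times (value units per time unit)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all readings share the same time stamp")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def classify_trend(times, values, flat_band):
    """Label a trend 'rising', 'falling', or 'flat'.

    flat_band is an assumed tolerance: slopes whose magnitude is at or
    below it are treated as flat.
    """
    slope = trend_slope(times, values)
    if slope > flat_band:
        return "rising"
    if slope < -flat_band:
        return "falling"
    return "flat"


# Hypothetical lube oil pressure readings (kg/cm^2) on successive days.
days = [0, 1, 2, 3, 4, 5]
lube_oil_pressure = [4.0, 3.9, 3.8, 3.7, 3.6, 3.5]
print(classify_trend(days, lube_oil_pressure, flat_band=0.02))
```

A steadily falling pressure of this kind would be a prompt to look further back in the downloaded data and fault log, not a conclusion in itself.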
After completion of the failure investigation, the completed failure analysis report should include the following sections:
a) Description of the failed component
b) Service condition at the time of failure
c) Prior service history
d) Manufacturing and processing history of component
e) Mechanical and metallurgical study of failure
f) Metallurgical evaluation of quality
g) Summary of failure causing mechanism
h) Recommendations for prevention of similar failures
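Where reports are kept in electronic form, the section list above can double as a completeness check before a report is submitted. The Python sketch below assumes, purely for illustration, that a report is stored as a dictionary mapping section titles to text; the `missing_sections` function and the draft contents are hypothetical.

```python
REQUIRED_SECTIONS = [
    "Description of the failed component",
    "Service condition at the time of failure",
    "Prior service history",
    "Manufacturing and processing history of component",
    "Mechanical and metallurgical study of failure",
    "Metallurgical evaluation of quality",
    "Summary of failure causing mechanism",
    "Recommendations for prevention of similar failures",
]


def missing_sections(report):
    """Return the required sections that are absent or left blank.

    `report` is assumed to be a dict mapping section title to its text.
    """
    return [
        title
        for title in REQUIRED_SECTIONS
        if not report.get(title, "").strip()
    ]


# Hypothetical draft with two sections filled in and one left blank.
draft = {
    "Description of the failed component": "Turbocharger rotor assembly",
    "Service condition at the time of failure": "8th notch, full load",
    "Prior service history": "   ",
}
for title in missing_sections(draft):
    print("Missing:", title)
```

A check like this only confirms that every section has been written; it says nothing about whether the content reaches the root cause.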

When the cause of a failure has been determined, a corrective action plan should be developed, documented, and implemented to eliminate or reduce recurrences of the failure. To minimize the possibility of an unmanageable backlog of open failures, all open reports, analyses, and corrective action suspension dates should be reviewed to ensure closure. A failure report is closed out when the corrective action is implemented and verified or when the rationale is documented for any instances that are being closed without corrective action.
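The closure rule in the paragraph above can be written down explicitly so that a register of open reports is easy to review. The following Python sketch is a minimal illustration; the `FailureReport` fields, the report IDs, the dates, and the rationale text are all assumptions and do not describe any existing Shed system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FailureReport:
    report_id: str
    due_date: date                              # target date for corrective action
    action_implemented: bool = False
    action_verified: bool = False
    no_action_rationale: Optional[str] = None   # documented reason, if any

    def can_close(self):
        """A report can be closed when the corrective action is implemented
        and verified, or when a rationale is documented for closing it
        without corrective action."""
        if self.action_implemented and self.action_verified:
            return True
        return bool(self.no_action_rationale and self.no_action_rationale.strip())


def overdue_open_reports(reports, today):
    """IDs of reports that cannot yet be closed and whose target date has passed."""
    return [r.report_id for r in reports
            if not r.can_close() and r.due_date < today]


# Hypothetical register entries; IDs and dates are illustrative only.
register = [
    FailureReport("FR-001", date(2014, 3, 1), True, True),
    FailureReport("FR-002", date(2014, 3, 15), True, False),
    FailureReport("FR-003", date(2014, 3, 20),
                  no_action_rationale="Isolated case; component already phased out"),
    FailureReport("FR-004", date(2014, 5, 1)),
]
print(overdue_open_reports(register, today=date(2014, 4, 6)))
```

In this example FR-002 is flagged because its action was implemented but never verified, which is the kind of item a periodic review is meant to catch.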

Recommendations should pinpoint the specific actions to be taken to improve product quality, the method of inspection, and the knowledge and skill level of the staff involved. The action plan should cover both the short term and the long term. All line and Shed failure incidents should be investigated and an investigation report prepared for each.

     It is suggested that Sr.DME should discuss this article with all Shed officials, including all supervisors. A PowerPoint presentation and an interactive session will help in disseminating the key concepts of failure investigation and report making. ACMT should be closely associated with failure analysis and report preparation.

