Failure investigation report
Sheds carry out failure investigation and analysis for line failures of diesel locomotives and prepare investigation reports. A review of these reports indicates that most of them do not meet the basic objective of the exercise. They generally remain confined to symptoms and rarely reach the root cause. The suggested remedial measures tend to be superficial and address the key objectives only obliquely.
It is essential to understand the basic objective of the whole exercise so that meaningful reports are prepared and corrective actions are taken. Reports should not leave a reader speculating about the root cause and wondering about the further course of action. This write-up is an attempt to address these issues and educate our supervisory staff who have not been formally trained in this extremely important field.
A failure investigation report should tell us:
· Exactly why the component has failed
· What we have learnt from this analysis
· What has to be done to prevent a recurrence
Every failure leaves clues as to why
it happened. In most failure cases a trained person can use the basic
techniques of failure analysis to diagnose the mechanical causes behind a
failure, without having to resort to expensive and sophisticated analytical
tools like electron microscopy. Then, knowing how a failure happened, the
investigator can arrive at the root causes of why it happened.
The most common reasons for failure of
components include:
· Service or operating conditions (use and misuse)
· Improper maintenance (intentional or unintentional)
· Improper testing or inspection
· Assembly errors
· Fabrication/manufacturing errors
· Design errors (stress, materials selection, and assumed material
condition or properties)
Failure Analysis and Prevention, Vol 11, ASM Handbook, ASM International, 2002, recommends nine steps of a failure investigation:
1. Understand and negotiate goals of the investigation
2. Obtain clear understanding of the failure
3. Objectively and clearly identify all possible root causes
4. Objectively evaluate likelihood of each root cause
5. Converge on the most likely root cause(s)
6. Objectively and clearly identify all possible corrective actions
7. Objectively evaluate each corrective action
8. Select optimal corrective action(s)
9. Evaluate effectiveness of selected corrective action(s)
The principal task of a failure analyst during a physical-cause
investigation is to identify the sequence of events involved in the failure.
Like the basic process of the scientific method, failure analysis is an
iterative process of narrowing down the possible explanations for failure by
eliminating those explanations that do not fit the observations. The basic
steps are:
1. Collect data
2. Identify damage modes present
3. Identify possible damage mechanisms
4. Test to identify actual mechanisms that occurred
5. Identify which mechanism is primary and which is/are secondary
6. Identify possible root causes
7. Test to determine actual root cause
8. Evaluate and implement corrective actions
Generally, a failure analyst will start with
a broad range of possible explanations but, over time, will narrow and refine
the existing possibilities. The failure analyst must repeatedly ask the
following questions as an investigation develops possible explanation(s) for
actual events:
· What characteristics are present in the failed/damaged component?
· What characteristics are present or expected in an undamaged
component?
· What are the possible explanations that would account for the
differences between damaged and undamaged components?
· What test(s) can be performed to confirm or eliminate possible
explanations and refine knowledge about the observed damage?
Synthesis of failure:
Before concluding the investigation, study all the facts and evidence of the failure, both positive and negative, in order to answer the typical questions for mechanical failures of components given below:
· Was the part properly installed?
· Was the part maintained properly?
· How long was the part in service?
· What was the nature of the stress at the time of failure?
· Was any vibration noted on the part before failure?
· Was the part subjected to overload?
· Was it subjected to service abuse?
· Were there any changes in the environment before failure?
· Was the part properly maintained during schedules?
· Was the failure due to ductility, brittleness, or a combination of both?
· Did the crack or defect start recently or had it been growing for a long time?
· Did the failure start at one point, or did it originate at several points?
· Did the failure start at or below the surface?
The investigator must understand the potential ways a component
could be damaged, the clues that would differentiate between these various
scenarios, and the physical meaning each of these clues would have. Comparison
of observations with characteristics of expected damage and mechanisms will
enable the analyst to narrow down the possible failure explanations and
understand the meaning of the observations made.
Limiting conditions that refine the scope of explanations for observed damage
can be defined by using the following two rules of thumb:
· The Sherlock Holmes Rule: When you
have eliminated the impossible, whatever remains, however improbable, must be
the truth.
· Occam's Razor: When two or more explanations exist for a sequence of events, the simplest explanation is more likely to be the correct one.
This combined with the theoretical
analysis should indicate the problem that caused the failure.
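The two rules of thumb above can be pictured as a filter followed by a ranking. The Python sketch below is only an illustration under stated assumptions: the `Explanation` structure, the candidate explanations, the clue names and the assumption counts are all hypothetical and are not taken from any real investigation.

```python
from dataclasses import dataclass


@dataclass
class Explanation:
    """One candidate explanation for the observed damage (hypothetical structure)."""
    name: str
    contradicts: set   # clues that would rule this explanation out if observed
    assumptions: int   # rough count of separate assumptions the explanation needs


def shortlist(explanations, observed):
    """Apply the two rules of thumb to a list of candidate explanations."""
    # Sherlock Holmes Rule: eliminate anything contradicted by an observed clue.
    possible = [e for e in explanations if not (e.contradicts & observed)]
    # Occam's Razor: among what remains, put the simplest explanation first.
    return sorted(possible, key=lambda e: e.assumptions)


# Made-up candidates and clues, for illustration only.
candidates = [
    Explanation("fatigue from misalignment", {"necking"}, 1),
    Explanation("single overload", {"beach marks"}, 1),
    Explanation("fatigue plus material defect", {"necking"}, 2),
]
observed = {"beach marks"}

ranked = shortlist(candidates, observed)
print([e.name for e in ranked])
```

The ranking only orders the surviving explanations; each one still has to be confirmed or eliminated by test before it is reported as the cause.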
Logic Tree:
To interpret a failure accurately, one has to gather all pertinent facts and then decide what caused them. To be consistent, it is essential to develop and follow a logic path and build a "Logic Tree" that ensures a critical feature will not be overlooked. A logic tree is a tool that uses deductive logic to guide the thought processes used to draw correct conclusions. It is a disciplined methodology that prompts the user to answer questions that will eventually identify the root causes of a failure event.
The first step in building a logic
tree is to properly define the failure
event to ensure that the analyst is truly working on the problem and not
the symptoms. To do this, one must identify the failure event in the top
block, and the modes of the failure event on the second level of the
tree.
The next step, once the failure is clearly defined, is to analyze its root causes. Questioning to build the logic tree is simple and consistent. One must keep asking, "How can the preceding event occur?" One has to start out very broad, getting more and more specific while vertically extending the tree.
As the investigator continues to each level and keeps asking the same
question of "How Can?", he forces himself to look at all the
"cause" possibilities instead of looking only at the most likely
possibility. As these possibilities are explored it is necessary to verify
whether they actually occurred or not. If they did indeed occur the analyst
would go to the next level by asking "How Can" again. The process
of hypothesizing and verifying continues until the various root causes are
discovered.
Using intuition to jump to a conclusion must be avoided, even when the conclusion appears obvious.
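The "How Can?" questioning described above can be sketched as a small tree structure in Python. This is a minimal illustration, not a prescribed tool: the `Node` class, the helper functions and the lube oil example are all hypothetical. Each hypothesis is marked verified (True), ruled out (False) or not yet checked (None).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One block of a logic tree: the failure event, a mode, or a hypothesised cause."""
    statement: str
    verified: Optional[bool] = None   # None means not yet checked
    children: List["Node"] = field(default_factory=list)

    def how_can(self, *statements):
        """Ask "How can this occur?" and record every possibility, not only the likeliest."""
        new_nodes = [Node(s) for s in statements]
        self.children.extend(new_nodes)
        return new_nodes


def open_questions(node):
    """Hypotheses that have been neither verified nor ruled out."""
    found = [node.statement] if node.verified is None else []
    for child in node.children:
        found.extend(open_questions(child))
    return found


def root_causes(node):
    """Verified hypotheses with no verified cause recorded beneath them."""
    confirmed = [c for c in node.children if c.verified]
    if node.verified and not confirmed:
        return [node.statement]
    found = []
    for child in confirmed:
        found.extend(root_causes(child))
    return found


# Made-up example: top block is the failure event, second level is the failure mode.
top = Node("Loco failed on line", verified=True)
(mode,) = top.how_can("Engine shut down on low lube oil pressure")
mode.verified = True
pump, leak = mode.how_can("Lube oil pump failed", "Oil leaked from a pipe joint")
pump.verified, leak.verified = False, True
loose, gasket = leak.how_can("Joint not tightened at last schedule",
                             "Gasket of wrong size fitted")
loose.verified = True   # the gasket hypothesis has not been checked yet

print(root_causes(top))
print(open_questions(top))
```

As long as `open_questions` returns anything, the tree is incomplete and the listed root causes are provisional.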
Component Roots (or Physical Roots) are the tangible things that fail. These are the loco components that generally fail, and they are typically the roots most familiar to us. The Human Roots are the points of inappropriate human intervention. This is generally where a person did something wrong or forgot to do something, i.e. an act of commission or omission. For this reason it is necessary to ask, "Why did the person decide to do what they did?" What was the rationale for the decision? People do not generally wake up in the morning and say, "I think I will go to work today and fail miserably!" The answer lies in why people do what they do. What about the systems in which we operate allowed the person to do what they did? The answers to these questions are defined as the latent roots or the organizational system roots.
A road map for failure investigation is outlined below:
1. Find out what happened. The most important step in failure analysis is to seek answers soon after the failure and talk to the people involved. Persons involved in maintenance and overhauling can throw a lot of light on the issue from their direct and tacit knowledge. Try to understand exactly what happened and the sequence of events leading up to it.
2. Make a preliminary investigation. Examine the broken parts, looking for clues. Do not clean them yet because cleaning could wash away vital information. Document the conditions accurately and take photographs from a variety of angles of both the failed parts and the surroundings.
3. Gather background data. Check the drawings, OEM manual, specifications and schedule forms relevant to the failure. Note down the current operating conditions and relevant parameters: time, booster pressure, rack, temperatures, amperage, voltage, load, throttle condition, pressure, lubricants, materials, corrosives, vibration, etc. Compare the actual operating conditions with the design conditions. Look at everything that could have an effect on locomotive working. The first step in any failure investigation is to gain a good understanding of the conditions under which the part was operating. Collect all data regarding any repair done before the failure, any important work carried out prior to the failure, and the service period of the component, as well as any problems noticed during the operation of that particular component.
4. Check the trend. Look for trends in the relevant data. Plot current, voltage, fuel oil pressure, lube oil pressure, etc. and examine the slope of each graph: is it positive, negative or flat? Download the data from locos equipped with MBCS, MCBG, event recorder, TCC and CCB. Study the trend carefully and examine all fault messages logged. Check the recurrence, frequency and origin of the failure. If the problem is endemic to one Shed only, find out the reason: what is being done wrongly and what is not being done. A study of variation in maintenance practices will reveal clues to the cause of failures.
5. Determine what failed. Look at the initial evidence and decide what failed first (the primary failure) and what secondary failures resulted from it. Sometimes these decisions are very difficult because of the size of analysis that is necessary. Find out what changed. Compare current operating conditions with those in the past.
6. Examine and analyze the primary failure. Clean the component and look at it under low-power magnification, 5x to 50x. What does the failure face look like? There are often "chevron marks" on the face of a brittle fracture that show the progression of the failure across the piece. These chevrons or "arrows" always point to where the crack started. From the failure face, determine the forces that were acting on the part. Important surfaces should be photographed and preserved for reference. The most important point to understand when doing failure analysis on a fractured part is that the crack grows perpendicular to the direction of maximum tensile stress.
7. Lab testing of the failed piece and the support material. Laboratory studies in the investigation of metal failure include verification that the chemical composition of the material that failed is within the specified limits. The studies also include checking the dimensions and physical properties of the failed component. Perform hardness tests, non-destructive testing (NDT) such as dye penetrant and ultrasonic examination, lubricant analysis, alloy analysis, macroscopic examination, chemical analysis, etc.
8. Incident Causation Scenario: Determine the failure type and the forces that caused it. Draw the incident time line and logic diagram. Review all the steps listed. Leaving any questions unasked or unanswered reduces the accuracy of the analysis. Alternative Incident Causation Scenarios should be explored and ruled out logically.
9. Simulate the situation: recreate the situation where feasible and validate your hypothesis wherever possible.
10. Determine the root causes. Categorise root causes into technical, human and management/maintenance system, and always ask, "Why did the failure happen in the first place?" This question usually leads to human factors and management systems. Each root cause may need to be dealt with differently; people will have to recognize personal errors and change the way they think and act.
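For step 4 of the road map (check the trend), the question of whether a graph has a positive, negative or flat slope can be answered numerically once the data has been downloaded. The Python sketch below is one possible approach, assuming equally spaced readings; the function names, the `flat_band` threshold and the pressure values are hypothetical and would have to be chosen by the Shed for each parameter.

```python
def trend_slope(readings):
    """Least-squares slope of equally spaced readings, in units per sample."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_trend(readings, flat_band):
    """Label the trend; slopes within +/- flat_band are treated as flat."""
    slope = trend_slope(readings)
    if abs(slope) <= flat_band:
        return "flat"
    return "positive" if slope > 0 else "negative"


# Made-up lube oil pressure readings taken at equal intervals, for illustration only.
lube_oil_pressure = [4.0, 3.9, 3.9, 3.7, 3.6, 3.4]
print(classify_trend(lube_oil_pressure, flat_band=0.02))
```

A fitted slope is less sensitive to a single odd reading than comparing only the first and last values, but it does not replace looking at the graph itself.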
After completion of the failure investigation, the completed failure analysis report should include the following sections:
a) Description of the failed component
b) Service condition at the time of failure
c) Prior service history
d) Manufacturing and processing history of component
e) Mechanical and metallurgical study of failure
f) Metallurgical evaluation of quality
g) Summary of failure causing mechanism
h) Recommendations for prevention of similar failures
When the cause of a
failure has been determined, a corrective action plan should be developed,
documented, and implemented to eliminate or reduce recurrences of the failure.
To minimize the possibility of an unmanageable backlog of open failures, all
open reports, analyses, and corrective action suspension dates should be
reviewed to ensure closure. A failure report is closed out when the corrective
action is implemented and verified or when the rationale is documented for any
instances that are being closed without corrective action.
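The close-out rule above can be written down as a simple check, which helps when reviewing the backlog of open reports. The Python sketch below is an illustration only: the `FailureReport` fields and the report numbers are hypothetical and do not describe any existing Shed record system.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    """Minimal close-out status of one failure report (hypothetical fields)."""
    ref: str
    action_implemented: bool = False
    action_verified: bool = False
    no_action_rationale: str = ""   # filled in only when closing without corrective action

    def can_close(self):
        # Closed when the corrective action is implemented and verified ...
        if self.action_implemented and self.action_verified:
            return True
        # ... or when the rationale for closing without action is documented.
        return bool(self.no_action_rationale.strip())


def open_backlog(reports):
    """References of reports that still need review."""
    return [r.ref for r in reports if not r.can_close()]


# Made-up reports, for illustration only.
reports = [
    FailureReport("FR-001", action_implemented=True, action_verified=True),
    FailureReport("FR-002", action_implemented=True),
    FailureReport("FR-003", no_action_rationale="Component type withdrawn from service"),
    FailureReport("FR-004"),
]
print(open_backlog(reports))
```

Here FR-002 stays open because the action was implemented but not verified, which is the case most easily missed in a periodic review.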
Recommendations should pinpoint specific actions to be taken to improve product quality, methods of inspection, and the knowledge and skill level of the staff involved. The action plan should cover both the short term and the long term. All line and Shed failure incidents should be investigated and an investigation report prepared for each.
It is suggested that Sr.DME should discuss this article with all their Shed officials, including all supervisors. A PowerPoint presentation and an interactive session will help in disseminating the key concepts of failure investigation and report making. ACMT should be closely associated with failure analysis and report preparation.