Fault Tree Analysis (FTA) is a method for discovering the root causes of failures or potential failures. FTA is a top-down approach that helps you understand how to fix or prevent a failure.
FTA starts with a top-level event, such as a service outage. You then work downward, detailing all the faults that contribute to that failure and the causes of those faults. You use the resulting FTA diagram to establish countermeasures that eliminate the causes of the outage.
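The top-down decomposition can be sketched in code. The example below is a minimal illustration, not a standard implementation: it assumes the usual AND/OR gates of fault tree diagrams, and the `Event` class and every fault name in it (mains failure, UPS failure, database crash) are hypothetical, invented only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """A node in the fault tree. Nodes without children are basic faults."""
    name: str
    gate: str = "BASIC"  # "AND", "OR", or "BASIC" for a leaf
    children: List["Event"] = field(default_factory=list)

    def occurs(self, active_faults: Set[str]) -> bool:
        """Return True if this event occurs given the active basic faults."""
        if self.gate == "BASIC":
            return self.name in active_faults
        results = [child.occurs(active_faults) for child in self.children]
        if self.gate == "AND":
            return all(results)  # every contributing fault is required
        if self.gate == "OR":
            return any(results)  # any single contributing fault is enough
        raise ValueError(f"unknown gate: {self.gate}")


# Hypothetical tree: the top-level event is a service outage.
power_loss = Event("power loss", "AND",
                   [Event("mains failure"), Event("UPS failure")])
outage = Event("service outage", "OR",
               [power_loss, Event("database crash")])
```

With this tree, `outage.occurs({"mains failure"})` is `False`, because the UPS still covers the mains failure, while `outage.occurs({"mains failure", "UPS failure"})` and `outage.occurs({"database crash"})` are both `True`.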
FTA can be used in Availability and Continuity activities to find what can go wrong with the IT infrastructure. It can thus help isolate weak spots in the infrastructure, which in turn makes it possible to identify and implement countermeasures.
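One common way to make "weak spots" concrete is to compute the minimal cut sets of a fault tree: the smallest combinations of basic faults that are enough to cause the top-level event. A cut set containing a single fault is a single point of failure. The sketch below is only an illustration under stated assumptions: the cut-set technique is swapped in here and is not named in the text above, the tree is written as nested tuples with AND/OR gates, and the `cut_sets` function and all fault names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Set, Union

# A node is a basic fault name, or a tuple ("AND" | "OR", child, child, ...).
Node = Union[str, tuple]


def cut_sets(node: Node) -> Set[FrozenSet[str]]:
    """Return the minimal cut sets of the (sub)tree rooted at node."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, *children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child triggers an OR gate.
        combined = set().union(*child_sets)
    elif gate == "AND":
        # An AND gate needs one cut set from every child at the same time.
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate: {gate}")
    # Drop any set that strictly contains another one: it is not minimal.
    return {s for s in combined if not any(other < s for other in combined)}


# Hypothetical infrastructure tree for a service outage.
tree = ("OR",
        ("AND", "mains failure", "UPS failure"),
        "database crash",
        ("AND", "database crash", "disk full"))

minimal = cut_sets(tree)
single_points = sorted(next(iter(s)) for s in minimal if len(s) == 1)
```

Here `single_points` is `['database crash']`: that fault alone brings the service down, so it is the first candidate for a countermeasure, whereas power is only lost when both the mains and the UPS fail.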
It is also useful for proactive Problem Management, which aims to prevent Incidents by removing errors from the infrastructure.
This method was developed in 1961 by H.A. Watson of Bell Laboratories, and the Minuteman Launch Control System was the first major system to which FTA was applied.
It has been used in aeronautical projects, in Six Sigma (the Analyze phase of the Six Sigma business improvement process) and… ITIL projects.
An FTA article by Hank Marquis does a good job of introducing the Fault Tree Analysis method. It explains the method in “six easy steps” with an example.
You can create FTA diagrams with Microsoft Visio 2007.
For another take on FTA, see the Health and Safety Briefing 26 – Quantified Risk Assessment Techniques from The Institution of Engineering and Technology, which works through a Crash at Main Road Junction example.
Fault Tree Analysis has been used in NASA’s Apollo program and in some review studies of the Three Mile Island nuclear power accident. For more, see the History of FTA by Clifton A. Ericson II of Boeing.