## Decision Matrix Definition

Imagine you are in the market for a new house and several options are available to you. Going through all of them and picking your top 5 or 10 is fairly easy. The hard part is choosing the one final option from that top N. Since you are going to buy only one house and live there for the next several years, you can’t simply choose one based on gut feelings or recommendations from realtors. A decision matrix will help you make your final choice rationally and be confident about it.

When you have several options available, you use a decision matrix to compare them against selection criteria in order to make an objective choice. This involves deciding which criteria are most important and then using them as the basis for reaching an acceptable decision. The alternative solutions are each compared against the selection criteria, and the options are scored for each criterion. The team uses the final ranking as an input to its decision making.

## The Method – How to Create and Use a Decision Matrix

1. Develop a list of the top several alternative solutions.
2. Develop relevant, specific selection criteria against which the alternatives will be scored. Use weighting for the more important criteria. Separate the criteria into Musts and Wants.
3. Create a matrix, with the alternatives across the top and the Must/Want criteria down the left side. The Musts all have to be satisfied. The Want criteria should each be assigned an Impact weighting value from 1 to 10, with 10 being the most important.
4. Score the alternatives against the criteria. Musts are mandatory. The Wants for the remaining alternatives are scored on a 1 to 10 basis. If an alternative provides a complete/best solution to the Want, it scores a 10. Multiply the score by the Impact value for that Want. Total the scores for each alternative.
5. Consider looking at the better features of the various alternatives to see whether a new and even better solution can be synthesized, i.e., whether the best features of different solutions can be combined into a completely new solution.
6. The resulting matrix is variously called a Decision Matrix, Grid Analysis Matrix, Pugh Matrix, or Multi-Attribute Utility Theory matrix.
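The steps above can be sketched in a few lines of Python. Everything in this snippet — the house names, the Must, the Want criteria, the Impact weights, and the 1-10 scores — is a hypothetical placeholder, not data from the worked example:

```python
# Decision matrix sketch: Musts filter out alternatives, weighted Wants rank the rest.
# All names, criteria, weights, and scores here are hypothetical.

houses = {
    "Princess St": {"under_300k": True,  "wants": {"big_yard": 8, "near_school": 9}},
    "Oak Ave":     {"under_300k": True,  "wants": {"big_yard": 6, "near_school": 7}},
    "Bendejo St":  {"under_300k": False, "wants": {"big_yard": 9, "near_school": 8}},
}

# Impact weighting (1-10) for each Want, with 10 being the most important.
impact = {"big_yard": 7, "near_school": 10}

def total_score(wants):
    """Multiply each Want score by its Impact weight, then total."""
    return sum(score * impact[criterion] for criterion, score in wants.items())

# Step 1: a Must is pass/fail -- drop any alternative that misses one.
candidates = {name: h for name, h in houses.items() if h["under_300k"]}

# Step 2: rank the survivors by their weighted Want totals.
ranked = sorted(candidates, key=lambda name: total_score(candidates[name]["wants"]),
                reverse=True)
print(ranked[0])  # prints "Princess St" -- the top-scoring house that met the Musts
```

Note how an alternative that fails a Must never receives a Want score at all; it is filtered out before scoring begins.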

The house on Princess Street not only met all the Musts, but it also scored the highest on the Wants. Hence we can confidently decide to go with that choice.

For each Want, we come up with a score for each house that has first met all of the Musts. The 1-10 score for each house is multiplied by the 1-10 Impact weighting for that Want. The higher the score for a house, the better it meets that Want. The Impact weightings pertain only to the significance of each Want.

Why was the Bendejo street house not even scored?

Because it did not meet one of the Musts: it was over \$300,000.

## Understanding Process Variation through Examples

Let’s try to understand process variation using the examples below. Take two shooters, Joe and Max, and look at their shooting. Joe is on average dead on, whereas Max is on average way off, as shown below.

Who is more likely to be consistently on target over the long haul? How do we improve their shooting?

If Joe’s first shot were high, what might you be tempted to do with his sight? Adjust it down, right?

But if his next shot, due to his natural shooting variation, would have landed below the target anyway, look at what would happen if you had adjusted the sight (see the next picture).

The amount of process adjustment actually moves the second shot an equal amount further from where it would have been without the adjustment.  This harmful and unnecessary adjusting is called tampering.
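This tampering effect can be demonstrated with a small simulation — essentially Deming’s funnel experiment. Under the assumption that the shooter’s natural variation is normally distributed, “correcting” the sight by the full error of every shot inflates the spread by a factor of about √2. None of the numbers below come from the original example:

```python
import random

random.seed(1)
N = 10_000
sigma = 1.0  # the shooter's natural shot-to-shot variation

# No tampering: leave the sight alone, each shot is aim (0) plus noise.
no_adjust = [random.gauss(0, sigma) for _ in range(N)]

# Tampering: after every shot, move the sight by the full observed error
# in the opposite direction, so each shot becomes noise minus prior noise.
aim, tampered = 0.0, []
for _ in range(N):
    shot = aim + random.gauss(0, sigma)
    tampered.append(shot)
    aim -= shot  # "correct" the sight based on the last shot

def stdev(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

print(round(stdev(no_adjust), 2))  # close to sigma (1.0)
print(round(stdev(tampered), 2))   # close to sigma * sqrt(2), about 1.41
```

The adjustment rule makes each shot the difference of two independent errors, which is why the variance doubles rather than shrinks.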

## Who is a better bowler?

Let’s take another example of two bowlers, Jane and Pat, and look at their bowling scores in the picture below. If X is the bowling score, then X̄ (“x-bar”) is the average score.

Although Jane’s average score (x-bar) of 140 is lower than Pat’s, she is at least more consistent. When there is a lack of consistency between measured responses, there is less certainty (i.e., more risk) about what you can expect over time.

## Another Example: Comfort of a Chef

In this example, let’s take a chef who works in his kitchen all day. On average, is the chef comfortable with the temperature?

If X is temperature, what is the average temperature?

Avg(X) = (130 + 10) / 2 = 70

So the average doesn’t explain his discomfort. What would be a better measure?

The range, or the difference between the largest and smallest values, would be a better measure, indicating the amount of variation:

R  =  130 – 10 = 120

That large an amount of variation explains why he is so uncomfortable.
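The two summary statistics can be checked in a couple of lines of Python, using the 130° and 10° readings from the example:

```python
# The chef's two temperature readings from the example.
temps = [130, 10]

mean = sum(temps) / len(temps)   # the average alone hides the discomfort
rng = max(temps) - min(temps)    # the range exposes the variation

print(mean)  # 70.0
print(rng)   # 120
```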

## DFMEA Quiz

1. In the DFMEA process, the potential or known failure modes are identified, then the causes are identified, and then the effects are identified.
Answer: Incorrect. The effects are identified before the causes in the DFMEA process.
2. A DFMEA would normally contain the causes that occur in production but are due to the design, and the causes that are due to errors or flaws in a production process that was called out by the designer.
Answer: Incorrect. A DFMEA should not include causes that are due to execution errors or flaws in a production process (unless the product cannot be economically/reasonably manufactured by the process called out by the designer).
3. It is a good idea to start the PFMEA before the DFMEA is finished.
4. The following would be good examples of a part function for a light bulb – no out-of-box failures and provide bright light.
Answer: Incorrect. Better examples would be to provide 99.7% defect-free turn-on reliability, and to provide 75 watts of light for 450 hours. These are more exact, and clearer to the DFMEA team.
5. Asking “What does the customer experience as a result of the failure mode of…” or “Will the component or assembly be inoperative, intermittently operative, noisy, not durable, etc.?” are examples of questions asked in the step in which the effects are determined.
6. Asking what could happen to cause a loss of function, such as “How could this part fail?”, or “Could it break, deform, wear, corrode, bind, leak, short, open, etc.?” are examples of questions asked in the step in which the causes are determined.
Answer: Incorrect. These questions should be asked in the step in which failure modes are identified.
7. When describing the effects, it is important to use the customer’s  terminology.
Answer: Incorrect. Use the perspective of the external or internal customer, but it isn’t necessary to use the customer’s terminology here.
8. Do not consider the likelihood that the defect will occur in scoring the severity rating criterion.
9. Generally, to reduce the probability of the failure mode happening, better design verifications/controls will be required.
Answer: Incorrect. Generally, to reduce the probability of the failure mode happening, a design change will be required.
10. For the Occurrence rating, a 1 reflects a remote possibility of it happening and a 10 indicates a high probability of the cause and its failure occurring in major proportions.
11. If none of the failure mode causes are rated high, e.g., none have RPN scores over 500 or 600, then there is a good chance you missed some key failure modes or causes.
Answer: Incorrect. The absolute RPN scores do not mean anything. The relative ranking provides prioritization of the causes’ preventive/corrective actions.

## FMEA Risk Priority Number

* The Risk Priority Number (RPN) is calculated by multiplying together the values of the severity, occurrence, and detection risk assessment criteria.

* The RPN provides an indicator of the relative seriousness and priority of each failure mode. The higher the RPN, the more relatively serious is the failure mode.

## FMEA Risk Priority Number Formula

RPN = Severity x Occurrence x Detection

Although some FMEA instruction manuals say the RPN values are used to rank and compare the seriousness of failure modes, the scoring is actually done for each cause. The Corrective/Preventive Actions will be generated for the higher priority causes. The purpose of generating the RPN scores is to prioritize for which causes there should be Corrective and/or Preventive Actions developed and completed.

Note: the absolute values of the RPN do not mean anything. As long as you are consistent in applying the criteria, the RPN scores can be used for relative ranking and prioritizing of the causes and their actions within the project. It is not advised to compare or rank actions between projects from separate FMEAs unless the teams followed the same ranking criteria scales, and did so very consistently.
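As a sketch of the arithmetic, here is the per-cause RPN calculation and relative ranking in Python; the causes are borrowed loosely from the design-cause examples elsewhere in this material, and all of the 1-10 ratings are made up for illustration:

```python
# RPN = Severity x Occurrence x Detection, computed per cause.
# The causes and their 1-10 ratings below are hypothetical.
causes = [
    # (cause, severity, occurrence, detection)
    ("wrong polymer specified",   8, 4, 6),
    ("excess annealing",          5, 3, 4),
    ("insufficient gold plating", 7, 6, 7),
]

scored = [(name, s * o * d) for name, s, o, d in causes]

# Only the relative order matters -- the absolute RPN values mean nothing.
for name, rpn in sorted(scored, key=lambda c: c[1], reverse=True):
    print(f"{rpn:4d}  {name}")
```

The highest-RPN causes are the ones for which Corrective/Preventive Actions would be developed first.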

## Recommended Corrective/Preventive Action

* The Recommended Corrective/Preventive Actions for the root causes of the prioritized failure modes should be described specifically and briefly.
* The intent of any recommended action is to reduce the severity, occurrence, and then detection rankings.
* It should be stated in a positive way, e.g., “Increase spring wire diameter by .001 in.”  or  “Conduct trials to identify the optimum annealing method” or “Develop a more robust engagement tab design”.

To reduce a severity ranking, a design revision will be required. A design revision will also be required to reduce an occurrence rating, by removing or controlling one or more of the failure mode’s causes. Adding DV actions alone is less desirable since it does not address the severity or occurrence of the failure mode.

The solutions should either be based on data, or they should call for a trial or investigation to generate the necessary data for determining the best solution(s) to solve the root causes. One of the risks in the FMEA process is for the team to generate Corrective/Preventive Actions that are not proven with observed data. The Actions should end up calling for revised tolerances, specifications, etc.

## Responsibility Information for the Corrective/Preventive Action

* The Recommended Corrective/Preventive Actions should each show the person or department responsible for resolution, and the Estimated Completion Date.
* The action items should be reported on and monitored.
* The actual actions taken later are described. The severity, occurrence, and detection criteria are re-evaluated after the actions are taken, and a new RPN is then calculated.

The responsible party should usually be someone on the project team, even though the person(s) actually doing the tasks may be someone else.

## Quantifying FMEA: Risk Assessment

This is an alternate set of rating guidelines. Customized guidelines should be developed for your industry and company.

This summary of alternate ranking criteria can also be used as a guideline for developing your own standardized set of values. In the table, P refers to the Probability of an event. For example, for a rank of 6 in Occurrence, the probability of the cause and the failure mode happening is approximately 1 in 80.

At a level of 6 in Severity (of the failure mode’s effect), there would be some customer dissatisfaction, starting to approach a high level, with some inoperability of the part. For a rank of 6 in Detectability, the probability of missing the failure mode would be somewhere around 1 in 100. These probability values might be appropriate for some industries but not for others.

## Detectability or Probability of Detection of the Potential Failure Mode

* The probability that the failure mode caused by the design would be detected by the DV before the part is released to production on a 1 to 10 scale, where a 1 indicates a high probability and a 10 indicates a low probability.
* Ask  “If there is a design weakness, how likely is it to be detected by our current DVs before the part is released to production?”

Detectability is the third of the 3 Risk Priority Number (RPN) rating criteria. It is an assessment of the ability of the DV program to identify a potential design weakness before release to production. It is a relative ranking (within the FMEA) of the ability of the DV to assure the design adequacy for the failure mode and/or cause or failure mechanism under study.

A high score means a low probability of detection, and a low score indicates a high probability of detection. Because of this possible confusion, some users call this criterion the Probability of Non-Detection.

A sample of a Probability of Detection ranking scale is shown on the next page.

| Ranking | Description |
| --- | --- |
| 1-2 | There is a very high probability that the design weakness, when it exists, will be detected before the part is released to production, usually by the DV program. A very low probability of non-detection. |
| 3-4 | High probability that the design-based cause of the failure will be detected before the part’s release to production, usually by the DV program. A low probability of non-detection. |
| 5-6 | Moderate probability that the design-based cause of the failure will be detected before the part’s release to production, usually by the DV program. A moderate probability of non-detection. |
| 7-8 | Low probability that the design-based cause of the failure will be detected before the part’s release to production. High probability of non-detection. |
| 9-10 | Very low probability that the design-based failure will be detected before the part’s release to production. Very high probability of non-detection. |

The company should adapt these sample rankings to their industry and include specific probability values. A low score means that there is a low probability that the failure mode or defect based on a design weakness, when the failure mode does exist*, will escape the DV controls before the design is released to Production and the defect has a chance to reach the customer. *Do not consider the likelihood that the defect will occur in scoring this criterion; this was evaluated in the second criterion (Occurrence).

A high score for Detectability means that there is a high probability that the failure mode or defect, when it exists, will escape the DV controls before the design is released to Production. If there is more than one design control that could prevent or detect a given failure mode or its cause under study, consider in the scoring the one that would provide the best prevention or detection.

## DFMEA – Probability Of Occurrence Ranking

One possible source of reference data is the reject and service history for similar components. A sample of a Probability of Occurrence ranking scale is shown below:

## Probability Of Occurrence Ranking

| Ranking | Description |
| --- | --- |
| 1 | Remote possibility of happening. Where similar parts have been used for similar functions in previous designs or processes, failures have been non-existent. |
| 2-3 | Low failure rate with similar parts having similar functions in previous designs or processes. |
| 4-6 | Moderate failure rate with similar parts having similar functions in previous designs or processes. Generally a failure that has occurred occasionally in the past, but not in major proportions. |
| 7-9 | Frequent failure rate with similar parts having similar functions in previous designs or processes. |
| 10 | High probability of failure. |

The company should adapt these sample ratings to their industry and include specific probability values. This table of a Probability of Occurrence ranking scale provides guidance for assigning criteria values to the causes and their failure modes on a consistent basis. It is only a sample and should not be used without adaptation to the industry and company.
The project team should evaluate the likelihood of each cause and its associated failure mode and reach a consensus on their ranking values. The idea is to find comparative differences among the causes so they can later be prioritized (with the total RPN scores).
* List the design verification steps that have been or are being used with the same or similar designs.

* List all current design controls which are intended to prevent the specific design-based causes(s) of the potential failure modes from occurring, or are intended to detect the design-based causes(s) of the potential failure or the resultant failure mode.

Examples: design checklists, peer reviews, reliability analysis, variation stack-up analysis, FEA, rapid prototyping, hand-off checklists, environmental stress analysis, etc. Design controls are intended to assure the design adequacy for the failure mode and/or cause under consideration.

A suggested technique when listing the Design Controls in their column is to add the prefix ‘p’ to DVs which are intended to prevent the failure mode cause or the prefix ‘d’ to DVs which are intended to detect the failure mode or its cause.  It is not uncommon for a company just beginning to use FMEA to find that it does not have many DVs.

## DFMEA – design based root causes

The company should adapt these sample ratings to what is appropriate to their industry. This table of a Severity ranking scale provides guidance for assigning criteria values to the failure mode effects on a consistent basis. It is only a sample and should not be used without adaptation to the industry and company. The project team should evaluate each effect and reach a consensus on their ranking values. The idea is to find comparative differences among the effects so that their associated causes can later be prioritized. Failure modes with a severity rank of 1 probably do not need to be analyzed further.

A cause is an indication of a design weakness, the consequence of which is the failure mode. Consider the possible design mechanisms and/or causes of each failure mode. Analyze what conditions can help bring about the failure modes.

Make sure the list of causes is thorough. This helps point the way toward preventive/corrective actions for all pertinent causes. Consider the difference between effects, contributing causes, and root causes.

## Examples of causes:

* Wrong polymer specified → moisture absorption → loss of dielectric
* Excess annealing → malleable base material alloy → bent tab
* Maximum material condition stack-up → worn mating surface
* Insufficient gold plating specified →  loss of signal transmission integrity
* Least material condition stack-up → gap → Tab edge does not engage cam
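The cause chains above can also be written down as plain data — for instance one tuple per chain, running from design-based root cause to resulting condition. This encoding is just one possible sketch:

```python
# One tuple per chain, ordered from root cause to resulting condition,
# mirroring the arrow notation in the examples above.
chains = [
    ("wrong polymer specified", "moisture absorption", "loss of dielectric"),
    ("excess annealing", "malleable base material alloy", "bent tab"),
    ("maximum material condition stack-up", "worn mating surface"),
    ("insufficient gold plating specified", "loss of signal transmission integrity"),
    ("least material condition stack-up", "gap", "tab edge does not engage cam"),
]

for chain in chains:
    print(" → ".join(chain))
```

Keeping the chains as structured data makes it easy to carry each root cause into the FMEA worksheet later.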

Here you are looking for the potential design-based root causes of the potential failure modes. This includes causes that occur in production but are due to the design. Do not include causes that are due strictly to errors or flaws in the production process – save these for the PFMEA. Again, refer to the completed FTA to help ensure the list of causes is as thorough as possible, and that the root causes are identified. Other possible causes could include improper material specified for process or end-use operating environment, incorrect algorithm, incorrect software specification, insufficient re-lubrication capability, incorrect cam path, etc. Other possible failure mechanisms could include creep, fatigue, wear, galvanic action, EMI (electromagnetic interference), etc.

* The probability that the cause will happen, and that it will result in the failure mode, on a 1 to 10 scale. A 1 reflects a remote possibility of it happening and a 10 indicates a high probability of the cause and its failure occurring in major proportions.
* Ask “How likely to happen is the cause?” and then “How likely is it to result in the potential failure mode?”
* Generally, to reduce the probability of the failure mode happening, a design change will be required.

Occurrence is the second of the 3 Risk Priority Number (RPN) rating criteria. Think of the relative likelihood or probability of the cause actually happening during the design life of the part and then resulting in the failure mode.

## DFMEA – Severity ranking scale

A sample of a Severity ranking scale is shown here:

## Severity Ranking

| Ranking | Description |
| --- | --- |
| 1 | The effect of the failure is of such a minor nature as to be undetectable by the customer. For example, the part may be out of specification on a non-key quality characteristic but not have any noticeable effect on the system. |
| 2-3 | The failure’s effect is of a minor nature; although it is detectable by the customer, it causes only slight annoyance without any noticeable degradation in system performance. |
| 4-6 | The failure’s effect causes some customer dissatisfaction and some system degradation. |
| 7-9 | The failure’s effect causes major customer dissatisfaction and major system degradation. Serious safety/legal implications. |
| 10 | Sudden, catastrophic failure without any prior warning. Very serious legal implications. |

## DFMEA – Chain of events

The loss of dielectric is an example of a failure mode resulting in a low withstanding voltage, which in turn results in the assembly shorting out (the effect). Using a chain of events is a good technique if it helps you better explain the failure modes and their effects. Look for the outcomes or consequences of the failure mode on the part, assembly, other parts, end-user, etc. You should also include safety or regulatory non-compliance outcomes.
Other examples include unpleasant odor, unstable, regulatory non-compliance, intermittent operation, or poor appearance.
Assuming that the failures have occurred, give specific descriptions of the ways in which customers could observe each failure. Use the perspective of the external or internal customer, but it isn’t necessary to use the customer’s terminology here.
Note that there could be more than one effect for a given failure mode, or, an effect could be the result of several failure modes. Don’t forget to refer back to your FTA. Note: MTBF stands for mean time between failures.

* The severity or estimated consequence of the effect on a 1 to 10 scale, where a 1 is a minor nuisance and a 10 indicates a severe total failure of the system.
* When weighing the consequence (effect) of the failure, ask “How serious would the effect of this failure be to the customer, assuming it has happened?”
* To reduce the severity of the effect of a product failure mode, a part design action is usually required.

The part design action to reduce a high rating may include design additions or revisions that mitigate the resultant severity of the failure, e.g., seat belts in a car. Severity is the first of the 3 Risk Priority Number (RPN) rating criteria. Think of the estimated consequence or seriousness of the effect, assuming it does exist*, on the external and/or internal customer. If the effect is critical, the severity is probably high. *Do not consider the likelihood that the defect will occur in scoring this criterion; this will be evaluated in the second criterion (Occurrence).