## DPMO – Defects Per Million opportunities

We have calculators developed in JavaScript as well as in Excel. You can play with them below or download them for offline practice. DPMO is a measure of process performance. A defect occurs when a product's quality characteristic, such as color, weight, or size, does not conform to the product's specification. Defects per million opportunities is the average number of defects per unit divided by the number of opportunities to make a defect on the product during a production run, normalized to one million. Defects per million opportunities (DPMO) is also known as nonconformities per million opportunities (NPMO). We calculate DPMO using the below formula:

$DPMO = \frac{Defects\hspace{2mm}\times\hspace{2mm}1,000,000}{Defect\hspace{2mm}Opportunities\hspace{2mm}Per\hspace{2mm}Unit\hspace{2mm}\times \hspace{2mm}Number\hspace{2mm}Of\hspace{2mm}Units}$
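The core logic of the calculator can be sketched in a few lines of Python (a minimal illustration of the formula above, not the actual JavaScript calculator code):

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects Per Million Opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)

# Example: 9 defects across 150 units with 8 defect opportunities each
print(dpmo(9, 150, 8))  # 7500.0
```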

### DPMO Calculator

Number Of Defects:
Number Of Opportunities:
Number Of Defect Opportunities per Unit:
Click to Calculate DPMO:

Six Sigma:

## DPM – Defects Per Million

We calculate DPM using the below formula:
$DPM = \frac{Defects\hspace{2mm}*\hspace{2mm}1,000,000}{Sample\hspace{2mm}Size}$
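The DPM formula is a straight ratio scaled to one million; a one-line Python sketch (illustrative, not the calculator's source):

```python
def dpm(defects, sample_size):
    """Defects Per Million: defects scaled to a million-unit sample."""
    return defects * 1_000_000 / sample_size

print(dpm(25, 10_000))  # 2500.0
```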

### DPM Calculator

Number Of Defects:
Sample Size:
Click to Calculate DPM:

Six Sigma:

## Sample Size

Sample Size is calculated using the below formula:

$SS = \frac{Z^2\hspace{2mm}\times\hspace{2mm}p\hspace{2mm}\times\hspace{2mm}(1-p)}{CI^2}$

Where CI is the Confidence Interval in decimal format. If the Confidence Interval is ±4%, then use 0.04.
p is the percentage of the sample picking a choice, expressed as a decimal.
Z is the Z value corresponding to the chosen Confidence Level.

Sample Size Calculation: For a Confidence Level of 95%, the most commonly used Confidence Level among researchers, we use p = 0.50 and a Z value of 1.96. A confidence level of 95% means that you can be 95% certain about the outcome.

### Sample Size Correction for Finite Population

The above formula works for large populations. For smaller, finite populations, you need to correct the value obtained above using the below formula:

$SS_{finite} = \frac{SS}{1\hspace{1mm}+\hspace{1mm}\frac{SS-1}{Population}}$
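Both steps can be combined in a short Python sketch (the function names are illustrative, not taken from the Excel or JavaScript calculators):

```python
from math import ceil

def sample_size(z=1.96, p=0.5, ci=0.04):
    """Infinite-population sample size: Z^2 * p * (1 - p) / CI^2."""
    return z * z * p * (1 - p) / (ci * ci)

def finite_correction(n, population):
    """Correct the sample size for a finite population: n / (1 + (n - 1) / N)."""
    return n / (1 + (n - 1) / population)

n = sample_size()                        # 95% confidence level, +/-4% interval
print(ceil(n))                           # 601
print(ceil(finite_correction(n, 1000)))  # 376 for a population of 1,000
```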

### Sample Size Calculator

Confidence Level: 95%
Confidence Interval:
Population:
Click to Calculate Sample Size:

### Defect Metrics Calculators in Microsoft Excel Spreadsheet

Further reading: SixSigma Daily DPMO, iSixSigma discussion on how to calculate DPMO and Sigma

## Pareto Analysis using Pareto Chart

The Pareto chart is a Lean and Six Sigma tool used in Pareto Analysis to perform root cause analysis. The Pareto rule is also known as the 80/20 rule, since it states that 80% of the problems are caused by 20% of the causes or issues. The Pareto Chart, or Pareto Diagram, is named after the 19th-century Italian economist Vilfredo Pareto.

Using this Pareto tool, one can visually identify the most frequently occurring defects, the most important factors, or the most common problems. These “most important” factors are also known as “the vital few”.

## Data for Pareto Analysis

Let's take the example of customer returns of toys made by a toy manufacturer. We start by collecting data about these returns over a fixed time period, say one month. 854 data points have been collated and grouped into 5 categories:

1. Category 1 (Example: Damaged Packaging) 155 Occurrences
2. Category 2 (Example: Color Faded) 221 Occurrences
3. Category 3 (Example: Missing Part) 33 Occurrences
4. Category 4 (Example: Missing Brochure) 112 Occurrences
5. Category 5 (Example: Wrong Toy Sent) 333 Occurrences

Now let's take the above defect categories and sort them in descending order based on the frequency of occurrence:

1. Category 5 → 333 Occurrences → percentage of total occurrences = 333 ÷ 854 = 38.99 %
2. Category 2 → 221 Occurrences → percentage of total occurrences = 221 ÷ 854 = 25.88 %
3. Category 1 → 155 Occurrences → percentage of total occurrences = 155 ÷ 854 = 18.15 %
4. Category 4 → 112 Occurrences → percentage of total occurrences = 112 ÷ 854 = 13.11%
5. Category 3 → 33 Occurrences → percentage of total occurrences = 33 ÷ 854 = 3.86%

Let's create an Excel worksheet and place this data into it, sorted high to low. Column one is the defect category, column two is the frequency of occurrence, and column three is the percentage from the above list.

Let's create one more column and place the cumulative percentage of the frequency in it. Notice that for Category 5 the cumulative frequency percentage is the same as the frequency percentage. For the next category, Category 2, it is the sum for Categories 5 and 2, which amounts to 38.99 + 25.88 = 64.87. For the next category in the list the cumulative is 38.99 + 25.88 + 18.15 = 83.02, and so on. Finally, the last category, Category 3, has a cumulative frequency percentage of 100%.
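The same cumulative-percentage arithmetic can be reproduced programmatically; this Python sketch uses the toy-return data from the example:

```python
# Defect categories sorted by frequency, high to low (from the example)
counts = {"Category 5": 333, "Category 2": 221, "Category 1": 155,
          "Category 4": 112, "Category 3": 33}

total = sum(counts.values())  # 854
cumulative = 0.0
for name, freq in counts.items():
    pct = freq / total * 100
    cumulative += pct
    print(f"{name}: {freq} ({pct:.2f}%), cumulative {cumulative:.2f}%")
```

The last line printed shows the cumulative percentage reaching 100%, matching the Excel column described above.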

The below picture shows a screenshot of the Excel table we just created. We used the “Format As Table” feature to format the data as a table so we can sort the columns easily by clicking the arrow next to the column headings in Excel.

Next we need to create a Pareto Chart to represent the above data graphically. A Pareto chart is a bar chart with a line chart overlaid on top. The bars represent the categories with their frequencies in descending order; we use the left Y axis for the values and the X axis for the categories. The line chart represents the cumulative frequency percentage. We use the secondary Y axis for the line chart values, since the scales of the two sets of values do not match.

The above chart shows all the defect categories clearly in descending order in a bar chart along with the cumulative frequency of occurrence as a line chart with values on the secondary Y axis.

## How to do Pareto Analysis based on the above Pareto Chart

Pareto Analysis means analyzing the data from the above chart and finding where the line graph crosses the 80% mark on the secondary Y axis (right hand side). All the categories to the left of that point are the “vital few”, or most significant factors. The remaining categories (or factors) are called the “useful many” and are less significant.

Vital Few = categories up to the point where the cumulative frequency crosses 80% = Categories 5, 2, 1

Useful Many = categories beyond the 80% point = Categories 4, 3
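The 80% cut can be expressed as a short Python sketch, using the cumulative percentages from the example (category names and figures are the ones computed above):

```python
# Cumulative frequency percentages from the sorted Pareto table
cumulative = {"Category 5": 38.99, "Category 2": 64.87,
              "Category 1": 83.02, "Category 4": 96.14, "Category 3": 100.0}

# Vital few: every category up to and including the one that crosses 80%
vital_few = []
for name, cum in cumulative.items():
    vital_few.append(name)
    if cum >= 80:
        break
useful_many = [n for n in cumulative if n not in vital_few]

print(vital_few)    # ['Category 5', 'Category 2', 'Category 1']
print(useful_many)  # ['Category 4', 'Category 3']
```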

## Pareto Chart and Analysis Summary

• Pareto Charts help identify the issues that cause the most problems
• Pareto Charts are used in root cause analysis
• Pareto Charts are a decision-making tool; they do not contain detailed data analysis or costs of failure
• Pareto Analysis focuses efforts where they will have the most impact
• Pareto Charts help decide the order in which issues will be addressed

## What is cost of quality defects?

The cost of quality defects is the cost incurred due to poor quality. The cost can be measured as a percentage of lost sales. For many companies, the cost of quality defects amounts to as much as 40% of total sales turnover. In other words, you can improve your results by up to 40% of sales simply by improving your quality.

It is difficult to increase sales turnover in bad economic conditions such as a recession, but one can increase the company's profit by reducing the unnecessary costs caused by deficient product and/or information quality.

## What is the Hidden Factory? The visible costs of quality versus hidden costs of quality

The cost of quality can be divided into the visible costs of quality and the hidden, or unmeasured, costs of quality. The visible costs include:

• Scrap
• Returns
• Re-working
• Refurbishing

### The hidden cost of quality

Most of the costs of poor quality are hidden from our normal quality measures. These costs of quality are often referred to as the Hidden Factory. This includes:

• Unhappy customers
• Schedule interruptions
• Fire Fighting
• Unnecessary Procedures
• Equipment Failures
• Extra Operations such as touch ups and trimming
• Distracted Engineers
• Expediting time
• Poorly performing product
• Extra inspection and testing
• Wasted materials and energy
• Sorting
• Extra inventory
• Unexplained budget variations
• Missed shipments
• Complaint Investigation Cost

Many companies work only on eliminating the visible costs and usually miss the hidden ones. The hidden costs of quality defects are often buried in overhead, smoothed over or lumped together with other costs, which makes the cost of a quality defect difficult to identify. In 1977, the quality guru Armand Feigenbaum estimated that the cost within the hidden factory can be 10% to 40% of total company effort. Many companies do not know where quality defects originate, nor how to identify them. Systematically identifying and eliminating visible as well as hidden costs of quality defects can give big results for relatively little effort.

## What is COPQ

The term Cost Of Poor Quality (COPQ) is also used to represent the above two costs. COPQ can be estimated by multiplying the number of defects per period of time by the average unit cost to fix a defect (labour and materials). Such a simple calculation, however, omits costs such as loss of goodwill and loss of competitiveness, as well as warranty costs and legal damages.
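The estimate described above is a single multiplication; this Python sketch (with hypothetical figures) makes its scope and limitation explicit:

```python
def copq_estimate(defects_per_period, avg_cost_to_fix):
    """Simple COPQ: defects per period x average unit repair cost.
    Omits goodwill, competitiveness, warranty, and legal costs."""
    return defects_per_period * avg_cost_to_fix

# Hypothetical figures: 120 defects per month at $85 labour + materials each
print(copq_estimate(120, 85))  # 10200
```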

## What is a Fishbone diagram and Fishbone Analysis?

The fishbone diagram is an analysis tool that provides a systematic way of understanding effects and the causes that create them. The diagram looks like the skeleton of a fish, hence the name.

Dr. Kaoru Ishikawa, a Japanese quality control statistician, invented the fishbone diagram, so it is also referred to as the Ishikawa diagram, or as a cause-and-effect diagram.

It provides great value in helping teams categorize the many potential causes of problems or issues in a systematic way and in identifying root causes.

## Fishbone diagram can be used when:

• The team needs to study a problem to determine the root cause
• The team wants to study all the possible reasons why a process is having difficulties, problems, or breakdowns in its initial stages
• The team needs to identify areas for data collection
• The team wants to study why a process is not performing properly and/or producing the expected results

## Creating a Fishbone Diagram

1. Draw a fishbone diagram
2. List the problem/issue to be studied in the head of the fish
3. Label each bone of the fish. The major categories typically used are:
• The 6 M’s: Methods, Machines, Materials, Manpower, Measurement, Management
• The 4 P’s: Place, Procedure, People, Policies
• The 4 S’s: Surroundings, Suppliers, Systems, Skills
4. Repeat this procedure with each factor under the category to produce sub-factors. Continue asking, “Why is this happening?” and add additional segments under each factor and subsequently under each sub-factor.
5. Continue until you no longer get useful information as you ask, “Why is that happening?”
6. Analyze the results of the fishbone after team members agree that an adequate amount of detail has been provided under each major category. Do this by looking for items that appear in more than one category. These become the “most likely causes”.
7. For the items identified as the “most likely causes”, the team should reach consensus on listing them in priority order, with the first item being the most probable cause.

Just fill in your effects and causes to create your own fishbone diagram. Very useful for your Six Sigma projects.

## Six Sigma Defect Metrics – DPO, DPMO, PPM, DPU Conversion table

### What Is DPO? What Is DPMO?

A unit of product is defective if it contains one or more defects, and a single unit can have more than one opportunity for a defect. When defining opportunities:

• Determine all the possible opportunities for problems
• Pare the list down by excluding rare events, grouping similar defect types, and avoiding the trivial
• Define opportunities consistently between different locations

### Proportion Defective (p):

p = Number Of Defective Units / Total Number of Product Units

### Yield (Y1st-pass or Yfinal or RTY)

Y = 1 – p

The Yield proportion can be converted to a sigma value using the Z tables.

### Defects Per Unit – DPU, or u in SPC

DPU = Number Of Defects / Total Number Of Product Units

The probability of getting 'r' defects in a sample with a given DPU rate can be predicted with the Poisson distribution.

### Defects Per Opportunity – DPO

DPO = no. of defects / (no. of units X no. of defect opportunities per unit)

### Defects Per Million Opportunities (DPMO, or PPM)

DPMO = DPO x 1,000,000

Defects Per Million Opportunities (DPMO) can then be converted to sigma and equivalent Cp values using the conversion table below. The DPMO, DPM, Sample Size, CI Calculator will help you calculate the metrics.

If there are 9 defects among 150 invoices, and there are 8 opportunities for errors on every invoice, what is the DPMO?

DPU = no. of defects / total no. of product units = 9/150 = .06
DPO = no. of defects / (no. of units X no. of defect opportunities per unit) = 9/(150 X 8) = .0075
DPMO = DPO x 1,000,000 = .0075 X 1,000,000 = 7,500

What are the equivalent Sigma and Cp values? See the Sigma Table.
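The invoice example can be verified with a few lines of Python:

```python
defects, units, opportunities = 9, 150, 8

dpu = defects / units                    # defects per unit: 0.06
dpo = defects / (units * opportunities)  # defects per opportunity: 0.0075
dpmo = dpo * 1_000_000                   # per million opportunities: 7500.0

print(dpu, dpo, dpmo)
```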

## Converting Yield to sigma & Cp Metrics – Example

Given: a proportion defective of 1%

• Yield = 1 – p = .990
• Z Table value for .990 = 2.32σ
• Estimate process capability by adding 1.5σ to reflect the 'real-world' shift in the process mean: 2.32σ + 1.5σ = 3.82σ
• This σ value can be converted to an equivalent Cp by dividing it by 3σ: Cp = 3.82σ / 3σ = 1.27. Note: Cpk cannot be estimated by this method.
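This yield-to-sigma conversion can be reproduced with Python's standard library; `NormalDist().inv_cdf` plays the role of the Z table lookup (the small differences from the figures above come from the table rounding Z to 2.32):

```python
from statistics import NormalDist

def yield_to_sigma_cp(y, shift=1.5):
    """Convert a yield proportion to a shifted sigma level and Cp equivalent."""
    z = NormalDist().inv_cdf(y)  # Z value for the yield (the Z table lookup)
    sigma = z + shift            # add the 1.5 sigma 'real-world' shift
    return sigma, sigma / 3      # Cp equivalent = sigma / 3

sigma, cp = yield_to_sigma_cp(0.990)
print(round(sigma, 2), round(cp, 2))  # 3.83 1.28
```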

## Sigma conversion table

This six sigma conversion table converts yield to dpmo, sigma, copq etc.

| Yield | DPMO | Sigma (σ) | Cp Equiv. | COPQ (Cost of Poor Quality) |
| --- | --- | --- | --- | --- |
| .840 | 160,000 | 2.50 | 0.83 | 40% |
| .870 | 130,000 | 2.63 | 0.88 | |
| .900 | 100,000 | 2.78 | 0.93 | |
| .930 | 70,000 | 2.97 | 0.99 | |
| .935 | 65,000 | 3.01 | 1.00 | |
| .940 | 60,000 | 3.05 | 1.02 | |
| .945 | 55,000 | 3.10 | 1.03 | 30% |
| .950 | 50,000 | 3.14 | 1.05 | |
| .955 | 45,000 | 3.20 | 1.06 | |
| .960 | 40,000 | 3.25 | 1.08 | |
| .965 | 35,000 | 3.31 | 1.10 | |
| .970 | 30,000 | 3.38 | 1.13 | |
| .975 | 25,000 | 3.46 | 1.15 | |
| .980 | 20,000 | 3.55 | 1.18 | 20% |
| .985 | 15,000 | 3.67 | 1.22 | |
| .990 | 10,000 | 3.82 | 1.27 | |
| .995 | 5,000 | 4.07 | 1.36 | |
| .998 | 2,000 | 4.37 | 1.46 | |
| .999 | 1,000 | 4.60 | 1.53 | 10% |
| .9995 | 500 | 4.79 | 1.60 | |
| .99975 | 250 | 4.98 | 1.66 | 5% |
| .9999 | 100 | 5.22 | 1.74 | |
| .99998 | 20 | 5.61 | 1.87 | |
| .9999966 | 3.4 | 6.00 | 2.00 | |

DPMO, DPM, Sample Size, CI Calculator will help you calculate the metrics.

## χ² Confidence Interval for a Variance – Example

Calculate a 95% C.I. on the variance for a sample (n = 35) with an s of 2.3″

This interval represents the most likely distribution of population variances, given the sample's size and variance. 95% of the time, the population's variance will fall in this interval.

The Z Confidence Interval for Proportions applies to an average proportion (which comes from a binomial distribution).

## Z Confidence Interval for Means – Example

Calculate a 95% C.I. on the mean for a sample (n = 35) with an x̄ of 15.6″ and a known s of 2.3″

This interval represents the most likely distribution of population means, given the sample’s size, mean, and the population’s standard deviation. 95% of the time, the population’s mean will fall in this interval.
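A Python sketch of this Z interval, using the example's numbers; `NormalDist().inv_cdf` supplies the 1.96 critical value that a Z table would:

```python
from math import sqrt
from statistics import NormalDist

def z_ci_mean(xbar, sigma, n, confidence=0.95):
    """Z confidence interval for a mean with known standard deviation."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lo, hi = z_ci_mean(15.6, 2.3, 35)
print(round(lo, 2), round(hi, 2))  # 14.84 16.36
```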

Use the t distribution for the confidence interval for a mean if the sample size n is relatively small (< 30) and/or σ is not known. The confidence interval (C.I.) includes the shaded area under the curve between the critical values, excluding the tail areas (the α risk). The entire curve represents the most likely distribution of population means, given the sample's size, mean, and standard deviation.

Use the χ² (chi-squared) distribution for the confidence interval for the variance. The confidence interval (C.I.) includes the area under the curve between the critical values, excluding the tail areas (the α risk). The entire curve represents the most likely distribution of population variances (σ²), given the sample's size and variation.

## Six Sigma Z Confidence Intervals for Means

The Z Confidence Interval for Means applies to a mean from a normal distribution of variable data. Use the normal distribution for the confidence interval for a mean if the sample size n is relatively large (≥ 30) and σ is known. The confidence interval (C.I.) includes the shaded area under the curve between the critical values, excluding the tail areas (the α risk). The entire curve represents the most likely distribution of population means, given the sample's size, mean, and the population's standard deviation.

Here we are making the assumption that the underlying data are distributed like the bell curve shown. The most common confidence interval used in industry is probably the 95% confidence interval. If we were to use its formula on many sets of data from the population, then 95% of the intervals would contain the unknown population mean that we are trying to estimate, and 5% of the intervals would not. 2.5% of the time the interval would be too low, and 2.5% of the time it would be too high. The probability is 95% that the interval contains the population parameter. The 95% value is the confidence coefficient, or the degree of confidence. The end points of the interval are called the confidence limits. In the graphic at the top, the endpoints are defined by $\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

## Confidence Limits

Confidence limits are the lower and upper boundaries of a confidence interval. In our Acme example, the limits were 20 and 24.

## Confidence Level

The confidence level is the probability value attached to a given confidence interval. It can be expressed as a percentage (in our example it is 95%) or a number (0.95).

## Confidence Interval for a Mean

A confidence interval for a mean is a range of values within which the mean (the unknown population parameter) may lie.

Examples of a Confidence Interval for a Mean:

• A web master who wishes to estimate her mean daily hits on a certain webpage
• An environmental health and safety officer who wants to estimate the mean monthly spills

## Confidence Interval for the Difference between Two Means

A confidence interval for the difference between two means specifies a range of values within which the difference between the means of the two populations may lie.

Examples of a Confidence Interval for the Difference between Two Means:

• A web master who wishes to estimate the difference in mean daily visitors between two websites
• An environmental health and safety officer who wants to estimate the difference in mean monthly spills between two production sites

## Six Sigma Confidence Intervals

When we calculate a statistic, for example a mean, a variance, a proportion, or a correlation coefficient, there is no reason to expect that such a point estimate would be exactly equal to the true population value, even with increasing sample sizes. There are always sampling inaccuracies, or error. In most Six Sigma projects, there are at least some descriptive statistics calculated from sample data. In truth, it cannot be said that such data are the same as the population's true mean, variance, or proportion value.

There are many situations in which it is preferable instead to express an interval in which we would expect to find the true population value. This interval is called an interval estimate. A confidence interval is an interval, calculated from the sample data, that is very likely to cover the unknown mean, variance, or proportion. For example, after a process improvement, sampling has shown that the yield has improved from 78% to 83%. But what is the interval in which the population's yield lies? If the lower end of the interval is 78% or less, you cannot say with any statistical certainty that there has been a significant improvement to the process.

There is an error of estimation, or margin of error, or standard error, between the sample statistic and the population value of that statistic. The confidence interval defines that margin of error. The next page shows a decision tree for selecting which formula to use for each situation. For example, if you are dealing with a sample mean and you do not know the population's true variance (standard deviation squared), or the sample size is less than 30, then you use the t distribution confidence interval. Each of these applications will be shown in turn.

## Confidence Intervals in Six Sigma Methodology

Confidence intervals are very important to the Six Sigma methodology. To understand confidence intervals better, consider this example scenario: Acme Nelson, a leading market research firm, conducts a survey among voters in the USA, asking them whom they would vote for if elections were held today. The answer was a big surprise! In addition to the Democrats and Republicans, there is a surprise independent candidate, John Doe, who is expected to secure 22% of the vote. We asked Acme: how sure are you? In other words, how accurate is this prediction? Their answer: “Well, we are 95% confident that John Doe will get 22%, plus or minus 2%, of the vote.” In statistical terms, they are saying that John Doe will get between 20% and 24% of the vote (the confidence interval) with a probability of 95% (the confidence level).
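Acme's ±2% margin is consistent with a poll of roughly 1,650 voters; this Python sketch computes the Z interval for a proportion (the sample size is an assumption for illustration; the article does not state it):

```python
from math import sqrt
from statistics import NormalDist

def z_ci_proportion(p, n, confidence=0.95):
    """Z confidence interval for a proportion (binomial data)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # 1.96 for 95%
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical: 22% support in a poll of n = 1650 voters gives roughly +/-2%
lo, hi = z_ci_proportion(0.22, 1650)
print(round(lo, 2), round(hi, 2))  # 0.2 0.24
```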

## Definition of Confidence Intervals

According to the University of Glasgow Department of Statistics, a confidence interval is defined as follows: “A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter.” Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99%, or 99.9% confidence intervals for the unknown parameter. In our Acme research example:

• The confidence interval is the range 20 to 24
• The confidence level is 95%
• The confidence limits are 20 (lower limit) and 24 (upper limit)
• The unknown population parameter is “what percentage of the total vote will John Doe get”