Channel: Normal Distribution | Minitab

Common Assumptions about Data (Part 2: Normality and Equal Variance)


In Part 1 of this blog series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results.

Horse and Cart sign

The common data assumptions are:  random samples, independence, normality, equal variance, stability, and that your measurement system is accurate and precise.

I addressed random samples and statistical independence last time.  Now let’s consider the assumptions of Normality and Equal Variance.

What Is the Assumption of Normality?

Before you perform a statistical test, you should find out the distribution of your data. If you don’t, you risk selecting an inappropriate statistical test.  Many statistical methods start with the assumption your data follow the normal distribution, including the 1- and 2-Sample t tests, Process Capability, I-MR, and ANOVA.  If you don’t have normally distributed data, you might use an equivalent non-parametric test based on the median instead of the mean, or try the Box-Cox or Johnson Transformation to transform your non-normal data into a normal distribution.

Normal and Skewed Curves

Keep in mind that many statistical tools based on the assumption of normality do not actually require normally distributed data if the sample sizes are at least 15 or 20. However, if sample sizes are less than 15 and the data are not normally distributed, the p-value may be inaccurate and you should interpret the results with caution.

There are several methods to determine normality in Minitab, and I’ll discuss two of the tools in this post: the Normality Test and the Graphical Summary.   

Minitab’s Normality Test will generate a probability plot and perform a one-sample hypothesis test to determine whether the population from which you draw your sample is non-normal. The null hypothesis states that the population is normal. The alternative hypothesis states that the population is non-normal.

Choose Stat > Basic Statistics > Normality Test

Normality Test

When evaluating the distribution fit for the normality test:

  • The plotted points should roughly form a straight line. Some departure from the straight line at the tails may be okay, as long as the points stay within the confidence limits.
  • The plotted points should fall close to the fitted distribution line and pass the “fat pencil” test.  Imagine a "fat pencil" lying on top of the fitted line: If it covers all the data points on the plot, the data are probably normal.
  • The associated Anderson-Darling statistic will be small.
  • The associated p-value will be larger than your chosen α-level (commonly chosen levels for α include 0.05 and 0.10).

The Anderson-Darling statistic is a measure of how far the plot points fall from the fitted line in a probability plot. The statistic is a weighted squared distance from the plot points to the fitted line with larger weights in the tails of the distribution. For a specified data set and distribution, the better the distribution fits the data, the smaller this statistic will be.
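Minitab handles this entirely through the menus, but if you would like to see the same idea in code, here is a minimal sketch of mine using Python's scipy (not part of the original post). Note that scipy's Anderson-Darling routine reports critical values rather than the p-value Minitab displays.

```python
import numpy as np
from scipy import stats

# Hypothetical sample standing in for the data you would test in Minitab
rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=2, size=100)

# Anderson-Darling normality test: the smaller the statistic, the better the fit
result = stats.anderson(sample, dist="norm")
print(f"AD statistic: {result.statistic:.3f}")
for level, critical in zip(result.significance_level, result.critical_values):
    print(f"alpha = {level/100:.3f}: critical value = {critical:.3f}")

# If the statistic exceeds the critical value at your chosen alpha, reject the
# hypothesis that the data come from a normal distribution.
```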

Minitab’s Descriptive Statistics with the Graphical Summary will generate a nice visual display of your data and calculate the Anderson-Darling statistic and its p-value. The graphical summary displays four graphs: a histogram of the data with an overlaid normal curve, a boxplot, and 95% confidence intervals for both the mean and the median.

Choose Stat > Basic Statistics > Graphical Summary

Probability Plot

When interpreting a graphical summary report for normality: 

  • The data will be displayed as a histogram. Look for how your data are distributed (normal or skewed), how the data are spread across the graph, and whether there are outliers.
  • The associated Anderson-Darling statistic will be small.
  • The associated p-value will be larger than your chosen α-level (commonly chosen levels for α include 0.05 and 0.10).

For some processes, such as time and cycle data, the data will never be normally distributed. Non-normal data are fine for some statistical methods, but make sure your data satisfy the requirements for your particular analysis.

What Is the Assumption of Equal Variance?

In simple terms, variance refers to the data spread or scatter. Statistical tests, such as analysis of variance (ANOVA), assume that although different samples can come from populations with different means, they have the same variance. Equal variance (homoscedasticity) means the variances are approximately the same across the samples. Unequal variances (heteroscedasticity) can affect the Type I error rate and lead to false positives. If you are comparing two or more sample means, as in the 2-Sample t-test and ANOVA, a significantly different variance could overshadow the differences between means and lead to incorrect conclusions.
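To make the idea concrete outside of Minitab, here is a small Python sketch of mine (not part of the original post) that runs two common equal-variance tests in scipy on made-up samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Three hypothetical samples that might feed an ANOVA
group_a = rng.normal(loc=10, scale=1.0, size=30)
group_b = rng.normal(loc=12, scale=1.1, size=30)
group_c = rng.normal(loc=11, scale=2.5, size=30)   # noticeably larger spread

# Bartlett's test assumes normality; Levene's test is more robust to departures
bart_stat, bart_p = stats.bartlett(group_a, group_b, group_c)
lev_stat, lev_p = stats.levene(group_a, group_b, group_c)

print(f"Bartlett: p = {bart_p:.4f}")
print(f"Levene:   p = {lev_p:.4f}")
# A p-value below your chosen alpha suggests the variances are not all equal.
```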

Minitab offers several methods to test for equal variances. Consult Minitab Help to decide which method to use based on the type of data you have. You can also use the Minitab Assistant to check this assumption for you. (Tip: When using the Assistant, click “more” to see data collection tips and important information about how Minitab calculates your results.)

Hypothesis Assistant

After the analysis is performed, check the Diagnostic Report for the test interpretation and the Report Card for alerts to unusual data points or assumptions that were not met. (Tip: When performing the 2-Sample t test and ANOVA, the Assistant takes a more conservative approach and uses calculations that do not depend on the assumption of equal variance.)

Assistant Reports

The Real Reason You Need to Check the Assumptions

You will be putting a lot of time and effort into collecting and analyzing data. After all the work you put into the analysis, you want to be able to reach correct conclusions. Some analyses are robust to departures from these assumptions, but take the safe route and check! You want to be confident that you can tell whether observed differences between data samples are simply due to chance, or if the populations are indeed different! 

It’s easy to put the cart before the horse and just plunge in to the data collection and analysis, but it’s much wiser to take the time to understand which data assumptions apply to the statistical tests you will be using, and plan accordingly.

In my next blog post, I will review the common assumptions about stability and the measurement system.  


A Six Sigma Master Black Belt in the Kitchen


by Matt Barsalou, guest blogger

I know that Thanksgiving is always on the fourth Thursday in November, but somehow I failed to notice it was fast approaching until the Monday before Thanksgiving. This led to frantically sending a last-minute invitation, and a hunt for a turkey.

I live in Germany and this greatly complicated the matter. Not only is Thanksgiving not celebrated, but also actual turkeys are rather difficult to find.

turkey

I looked at a large grocery store’s website and found 15 types of cat and dog food that contain turkey, but the only human food I could find was one jar of baby food.

Close, but not close enough. I wanted a whole turkey, not turkey puree.

The situation was even more complicated due to language: Germans have one word for a male turkey and a different word for a female turkey. I did not realize there was a difference, so I wound up only looking for a male turkey. My conversation with the store clerk would sound like this if it were translated into English, where there is only one word commonly used for turkey:

Me: Do you carry turkey?

Clerk: No. We only have turkey.

Me: I don’t need turkey. I’m looking for turkey.

Clerk: Sorry, we don’t carry turkey, but we have turkey if you want it.

Me: No thank you. I need turkey, not turkey.

Eventually, I figured out what happened and returned to buy the biggest female turkey they had. It weighed 5 pounds.

This was not the first time I cooked a turkey, but my first attempt resulted in The Great Turkey Fireball of 1998. (Cooking tip: Don’t spray turkey juice onto the oven burner). My second attempt resulted in a turkey that still had ice in it after five hours in the oven. (Life hack: The inside of a turkey is a good place to keep ice from melting.)

This year, to be safe, I contacted an old friend who explained how to properly cook a turkey, but I was told I would need to figure out the cooking time on my own. This was not a problem...or so I thought. I looked online and found turkey cooking times for a stuffed turkey, but my turkey was too light to be included in the table.

Graphing the Data

I may not know much about cooking, but I do know statistics, so I decided to run a regression analysis to determine the correct cooking time for my bird. The weights and times in the table were given as ranges, so I selected the times that corresponded to the low and high ends of the weight ranges and entered the data into a Minitab worksheet as shown in Figure 1.

worksheet 1

Figure 1: Worksheet with weight and times

I like to look at my data before I analyze it, so I created a scatterplot to see how time compares to weight. Go to Graph > Scatterplot and select Simple. Enter Time as the Y variable and Weight as the X variable.

Visually, it looks as if there may be a relationship between weight and cooking time, so I then performed a regression analysis (see Fig. 2).

scatterplot of time vs. weight

Figure 2: Scatter plot of weight and times

Performing Regression Analysis

Go to Stat > Regression > Regression > Fit Regression Model... and select Time for the response and Weight as the continuous predictor. Click on Graphs and select Four in One, then OK out of the dialog boxes.

The p-value is < 0.05 and the adjusted R-squared is 97.04%, so it looks like I have a good model for time versus weight (see Fig. 3).

regression analysis of time versus weight

Figure 3: Session window for regression analysis for time versus weight

The residual plots for time shown in Figure 4 include a normal probability plot with residuals that look like they are normally distributed. My data did not need to follow the normal distribution, but the residuals should. But something seemed odd to me when I looked at the other three plots. Suddenly, I was not so sure my model was as good as I thought it was.

Residual Plots for Time

Figure 4: Residual plots for time

Regression Analysis with the Assistant

I then used the Minitab Assistant to perform another regression analysis. Since I was uncertain about my first model, I could use the reports generated by the Assistant to better assess my data and the resulting analysis.

Go to Assistant > Regression and select Simple Regression. Select Time for the Y column and Weight for the X column and select OK.

The first report provided by the Minitab Assistant is the summary report, shown in Figure 5. The report indicates a statistically significant relationship between time and weight using an alpha of 0.05. It also tells me that 99.8% of the variability in time is explained by weight. This does not match my previous results, and I can see why: I previously performed linear regression, and the Minitab Assistant identified a quadratic model for the data.

The regression equation is Time = 0.9281 + 0.3738(Weight) − 0.005902(Weight²).

For my 5-pound turkey:

Time = 0.9281 + 0.3738(5) − 0.005902(5²)

= 0.9281 + 1.869 − 0.1476 = 2.65 hours

That means the cooking time is roughly 2 hours and 39 minutes.
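If you would like to reproduce this kind of quadratic fit in code, here is a rough sketch using numpy. The weight and time values below are hypothetical stand-ins, since the original worksheet values are not reproduced in this post.

```python
import numpy as np

# Hypothetical (weight, time) pairs standing in for the stuffed-turkey table
weight = np.array([8, 12, 14, 18, 20, 24], dtype=float)   # pounds
time = np.array([3.0, 3.5, 4.0, 4.5, 4.75, 5.25])         # hours

# Fit a quadratic model, analogous to the model the Assistant selected
coeffs = np.polyfit(weight, time, deg=2)   # highest power first
model = np.poly1d(coeffs)

print("Fitted model:")
print(model)
print("Predicted time for a 5 lb bird:", round(model(5.0), 2), "hours")
# Note: 5 lb is below the smallest weight in this table, so the prediction
# is an extrapolation and should be treated with caution.
```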

regression for time vs. weight summary report

Figure 5: Summary report for time versus weight

Figure 6 depicts the model selection report, which includes a plot of the quadratic model and the r-squared (adjusted) for both the quadratic model and a linear model.

regression model selection report

Figure 6: Model Selection report for time versus weight

The diagnostic report in Figure 7 is used to assess the residuals and guidance on the interpretation of the report is provided on the right side.

regression for time vs weight diagnostic report

Figure 7: Diagnostic report for time versus weight

The prediction report in Figure 8 shows the prediction plot with the 95% prediction interval.

regression for time vs weight prediction report

Figure 8: Prediction report for time versus weight

The report card shown in Figure 9 helps us to assess the suitability of the data. Here, I saw a problem: my sample size was only six. Minitab still provided me with results, but it warned me that the estimate for the strength of the relationship may not be very precise due to the low number of values I used. Minitab recommended I use 40 or more values. My data did not include any unusual data points, but using fewer than 15 values means the p-value could be incorrect if the residuals were not normally distributed.

regression for time vs weight report card

Figure 9: Report card for time versus weight

It looks like my calculated cooking time may not be as accurate as I’d like it to be, but I don’t think it will be too far off since the relationship between weights and cooking time is so strong.

It is important to remember not to extrapolate beyond the data set when taking actions based on a regression model. My turkey weighs less than the lowest value used in the model, but I’m going to need to risk it. In such a situation, statistics alone will not provide us an answer on a platter (with stuffing and side items such as cranberry sauce and candied yams), but we can use the knowledge gained from the study to help us when making judgment calls based on expert knowledge or previous experience. I expect my turkey to be finished in around two and a half to three hours, but I plan to use a thermometer to help ensure I achieve the correct cooking time.

But first, it looks like I am going to need to perform a Type 1 Gage Study analysis, once I figure out how to use my kitchen thermometer.

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

 

DMAIC Tools and Techniques: The Define Phase


If you’re familiar with Lean Six Sigma, then you’re familiar with DMAIC.

DMAIC is the acronym for Define, Measure, Analyze, Improve and Control. This proven problem-solving strategy provides a structured 5-phase framework to follow when working on an improvement project.

This is the first post in a five-part series that focuses on the tools available in Minitab Statistical Software that are most applicable to each phase, beginning with Define.

The DEFINE Phase Defined

DMAIC begins once you have identified a problem to solve. The goal of this first phase is to define the project goals and customer deliverables. This includes developing a problem statement and identifying objectives, resources and project milestones.

Cause-and-Effect Diagram

Cause-and-Effect Diagram

Also known as a fishbone (because it resembles a fish skeleton) or Ishikawa (named after its creator Kaoru Ishikawa) diagram, this graphical brainstorming tool can help you and your team organize and investigate possible causes of a problem.

In a C&E diagram, the problem is identified on the far right, while the causes are arranged into major categories. For manufacturing applications, categories may include Personnel, Machines, Materials, Methods, Measurements, and Environment. Service applications often include Personnel, Procedures, and Policies.

Minitab location: Stat > Quality Tools > Cause-and-Effect

Pareto Chart

A Pareto chart is a basic quality tool used to highlight the most frequently occurring defects, or the most common causes for a defect.

This specialized type of bar chart is named for Vilfredo Pareto and his 80-20 rule. By ordering the bars from largest to smallest, a Pareto chart can help you separate the "vital few" from the "trivial many." These charts reveal where the largest gains can be made.

Minitab location: Stat > Quality Tools > Pareto Chart

Pareto Chart
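For readers who prefer code, here is a small matplotlib sketch of mine (not Minitab output) that builds the same kind of chart from made-up defect counts: sort the bars from largest to smallest and overlay the cumulative percentage.

```python
import matplotlib.pyplot as plt

# Hypothetical defect counts by category
defects = {"Scratch": 52, "Dent": 31, "Chip": 12, "Discoloration": 8, "Other": 5}

# Sort largest to smallest and compute the cumulative percentage
items = sorted(defects.items(), key=lambda kv: kv[1], reverse=True)
labels = [k for k, _ in items]
counts = [v for _, v in items]
total = sum(counts)
cum_pct = [100 * sum(counts[: i + 1]) / total for i in range(len(counts))]

fig, ax1 = plt.subplots()
ax1.bar(labels, counts)
ax1.set_ylabel("Count")

ax2 = ax1.twinx()                         # second axis for the cumulative line
ax2.plot(labels, cum_pct, marker="o", color="tab:red")
ax2.set_ylabel("Cumulative %")
ax2.set_ylim(0, 110)

plt.title("Pareto chart of defect categories")
plt.show()
```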

Boxplot

Also known as a box-and-whisker plot, the boxplot shows you how a numeric, continuous variable is distributed, including its shape and variability. You can create a single boxplot for a single data set, or you can create multiple boxplots to compare multiple data sets. Boxplots also help you identify outliers.

Minitab location: Graph > Boxplot

Boxplot

Histogram

Like boxplots, histograms reveal the shape and spread of a data set, and can help you assess if your data follow a normal distribution, are left-skewed, right-skewed, unimodal, etc.

Histograms divide data into bins of a given range plotted along the horizontal x-axis, and display the number of data points within each bin on the vertical y-axis.

Minitab location: Graph > Histogram

Histogram

Run Chart

Run charts graph your data over time, presuming it was collected and recorded in chronological order. This special type of time series plot can be used to determine if there are any patterns and non-random behavior in your process, such as trends, oscillation, mixtures and clustering.

Minitab location: Stat > Quality Tools > Run Chart

Run Chart

Descriptive Statistics

This is the first non-graphical tool in this post. This tool provides a summary of your data, and can include such statistics as the mean, median, mode, minimum, maximum, standard deviation, range, etc.

Minitab location: Stat > Basic Statistics > Display Descriptive Statistics

Descriptive Statistics
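As a quick illustration outside Minitab, the same kinds of summary statistics can be computed with a few lines of pandas; the measurements below are made up.

```python
import pandas as pd

# Hypothetical measurement data
data = pd.Series([19.1, 18.7, 20.3, 19.8, 21.0, 18.2, 19.5, 19.8])

print(data.describe())                    # count, mean, std, min, quartiles, max
print("median:", data.median())
print("mode:  ", data.mode().tolist())
print("range: ", data.max() - data.min())
```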

Graphical Summary

This Minitab feature provides a comprehensive summary for a numeric, continuous dataset, including a histogram, a boxplot, descriptive statistics, and more. It offers a snapshot of your data in a single output and includes a normality test along with confidence intervals for the mean and the median.

Minitab location: Stat > Basic Statistics > Graphical Summary

Graphical Summary

No Two Projects are Created Equal

While this post focuses on the Define tools available in Minitab, other Define tools—such as the project charter and SIPOC—are available in Quality Companion. Not every project includes the same exact set of tools, so it’s quite possible that a given Define phase for a given project includes only a few of the tools above. Moreover, not all tools discussed above are used solely within the Define phase of DMAIC. For example, histograms may also be used in other phases.

The tools you will use within each phase depend upon the type of project you’re working on, leadership preferences, and what types of tools best communicate your data and the problem you’re trying to solve.

Snowy Statistics!


As we enter late December, snow is falling here on the East Coast of the United States. The official start to winter is on December 21, 2016, but it’s certainly not uncommon to see snowflakes flying before this date.

If you live in the U.S., you know the winter of 2015 was one for the record books. In fact, more than 90 inches of snow fell in Boston in the winter of 2015! Have you ever wondered how likely an occurrence this was?

Dr. Diane Evans, Six Sigma Black Belt and professor of engineering management at Rose-Hulman Institute of Technology, and Thomas Foulkes, National Science Foundation Graduate Research Fellow in the electrical and computer engineering department at the University of Illinois at Urbana-Champaign, also wondered. They set out to explore the rarity of the 2015 Boston snowfall by examining University of Oklahoma meteorologist Sam Lillo’s estimate of the likelihood of this event occurring. Below I’ll outline some points from their article, A Statistical Analysis of Boston’s 2015 Record Snowfall.

Meteorologist’s Analysis of Boston’s Historic Snowfall in The Washington Post

Following this historic snowfall of 94.4 inches in a 30-day period in 2015, Lillo analyzed historical weather data from the Boston area from as far back as 1938 in order to determine the rarity of this event.

Lillo developed a simulated set of one million hypothetical Boston winters by sampling with replacement snowfall amounts gathered over 30-day periods. Eric Holthaus, a journalist with The Washington Post, reported that Lillo’s results indicated that winters like the 30 days of consecutive snowfall from January 24 to February 22, 2015 should “only occur approximately once every 26,315 years” in Boston:

Snowfall Image

To assess Lillo’s findings, Evans and Foulkes obtained snowfall amounts in a specified Boston location from 1891 to 2015 via the National Oceanic and Atmospheric Administration (NOAA) for comparison with his simulated data.

Recreating the Simulated Data

On March 15, 2015, the cumulative Boston snowfall of 108.6 inches surpassed the previous Boston record of 107.6 inches set in the winter of 1996. In the figure below, a graphical display of Boston snow statistics from 1938 to 2015 illustrates the quick rise in snowfall amounts in 2015 as compared to record setting snowfalls in years 1996, 1994, and 1948:

Snowfall Image 03

Also included in the figure is the annual average Boston snowfall through early June. The final tally on Boston’s brutal snowfall in 2015 clocked in at 110 inches!

The dashed rectangular region inserted in the graphic highlights the 30 days of snowfall from January 24 to February 22, 2015, which resulted in 94.4 inches of snow. In order to obtain hypothetical 30-day Boston snowfall amounts, Lillo first generated one million resampled winters by:

... stitching together days sampled from past winters. A three-day period was chosen, to represent the typical timescale of synoptic weather systems. In addition, to account for the effect of long-term pattern forcing, the random selection of 3-day periods was weighted by the correlation between consecutive periods. Anomalies tended to persist across multiple periods, such that there’s a better chance that a snowier than normal three days would follow a similarly snowy three days. This is well observed (and in extreme cases, like this year), so it’s important to include in the simulation.

After generating the one million resampled winters, Lillo recorded the snowiest 10-period stretches, i.e., 30 days, from each winter. Percentile ranges of the resampled distribution were compared to the distribution of observed winters to check the validity of the simulated data. In simulating the winters’ snowfalls in this manner, Lillo had to assume that consecutive winters and winter snow patterns within a particular year were independent and identically distributed (IID). Evans and Foulkes recognize that these assumptions are not necessarily valid.

Since they were unable to obtain Lillo’s simulated data and used actual historical data for their own Sigma Level calculations, Evans and Foulkes used a digitizer and the graphical display of Boston snow statistics above to simply create a “copy” of his data for further analysis.

Once they had data values for “Maximum 30-day snowfall (inches)” and “Number of winters,” they added them to Minitab and created histograms of the snowfall amounts with overlaid probability plots to offer reasonable distributions to fit the data:

Minitab Histograms

For more on how they used Minitab distributions to fit the snowfall data and how they determined Sigma levels for the 2015 Boston snowfall using Lillo’s data, check out the full article here.

And if you’re still in the mood for more winter weather, here’s a roundup of some of our other snowy statistical posts!

17 Common Words with Precise Statistical Meanings...or, More Bewildering Things Statisticians Say


The language of statistics is a funny thing, but there usually isn't much to laugh at in the consequences that can follow when misunderstandings occur between statisticians and non-statisticians. We see these consequences frequently in the media, when new studies—that usually contradict previous ones—are breathlessly related, as if their findings were incontrovertible facts.

Similar, though less visible, misinterpretations abound in meeting rooms throughout the business world. When people who work with data and know statistics share their analyses with colleagues who aren't well-versed in the world of data, the message that gets received may be very different than the one the analyst tried to send.    

There are two equally vital solutions to this problem. One is encouraging and instilling greater statistical literacy in the population. Obviously, that's a big challenge that can't be solved by any one statistician or analyst. But we individuals can control the second solution, which is to pay more attention to how we present the results of our analyses, and enhance our sensitivity to the statistical knowledge possessed by our audiences. 

I've written about the challenges of statistical communication before, but I've been thinking about it anew after a friend sent me a link to this post and subsequent discussion about replacing the term "statistical significance."  I won't speculate on the likelihood of that proposal, but it felt like a good time to review some words or phrases that mean one thing in statistical vernacular, but may signify something very different in a popular context.

Here's what I came up with. For each word, I've listed what statisticians mean and what most people mean:

  • Assumptions. Statisticians mean: constraints within which we can do a particular analysis, such as data needing to follow a normal distribution. Most people mean: bias, prejudices, opinions, or foregone conclusions about the topic or question under discussion.
  • Confidence. Statisticians mean: a measurement of the uncertainty in a statistical analysis. Most people mean: the strength with which a person believes or places faith in his or her abilities or ideas.
  • Confounded. Statisticians mean: variables whose effects cannot be distinguished. Most people mean: confused, perplexed, or inconvenient.
  • Critical value. Statisticians mean: the cutoff point for a hypothesis test. Most people mean: a measurement, sum, or number with great practical importance—such as a minimum cash balance in a checking account.
  • Dependent. Statisticians mean: a variable that's beyond our control—such as the outcome of an experiment. Most people mean: an outcome or thing we can control or influence. "Going to the party is dependent on completing my work."
  • Independent. Statisticians mean: a factor we can control or manipulate. Most people mean: an outcome or thing we cannot control or influence. "They will make the decision independent of whatever we might recommend."
  • Interaction. Statisticians mean: when the effect of one factor depends on the level of another. Most people mean: communications and social engagements with others.
  • Mean. Statisticians mean: the sum of all the values in your data divided by the number of values (Σx/n). Most people mean: an adjective signifying hostility or, in slang, positivity: "That mean response surprised us all."
  • Mode. Statisticians mean: the most frequent value in a data set. Most people mean: a manner or method of performing a task. "You'll finish faster if you change your operating mode."
  • Median. Statisticians mean: a data set's middle value. Most people mean: intermediate or average. So-so.
  • Normal. Statisticians mean: data that follow a bell-shaped curve. Most people mean: something that is commonplace, ordinary, plain, or unexceptional.
  • Power. Statisticians mean: the capability to detect a significant effect. Most people mean: degree of control or influence.
  • Random. Statisticians mean: a sample captured such that all individuals in a population have equal odds of selection. Most people mean: unpredictable; beyond control.
  • Range. Statisticians mean: the difference between the lowest and highest values in a data set. Most people mean: an array or collection.
  • Regression. Statisticians mean: predicting one variable based on the values of other variables. Most people mean: retreat or loss. Moving backwards.
  • Residuals. Statisticians mean: the differences between observed and fitted values. Most people mean: leftovers. Scraps.
  • Significance. Statisticians mean: the odds that the results observed are not just a chance result. Most people mean: importance or seriousness.

 

Can you add to my list? What statistical terms have complicated your efforts to communicate results?

Strangest Capability Study: Super-Zooper-Flooper-Do Broom Boom


by Matthew Barsalou, guest blogger

The great Dr. Seuss tells of Mr. Plunger who is the custodian at Diffendoofer School on the corner of Dinkzoober and Dinzott in the town of Dinkerville. The good Mr. Plunger “keeps the whole school clean” using a supper-zooper-flooper-do.

Unfortunately, Dr. Seuss fails to tell us where the supper-zooper-flooper-do came from and if the production process was capable.

supper-zooper

Let’s assume the broom boom length was the most critical dimension on the supper-zooper-flooper-do. The broom boom length drawing calls for a length of 55.0 mm with a tolerance of +/- 0.5 mm. The quality engineer has checked three supper-zooper-flooper-do broom booms and all were in specification, so he concludes that there is no reason to worry about the process producing out-of-specification parts. But we know this is not true. Perhaps the fourth supper-zooper-flooper-do broom boom will be out of specification. Or maybe the 1,000th.

It’s time for a capability study, but don’t fire up your Minitab Statistical Software just yet. First we need to plan the capability study. Each day the supper-zooper-flooper-do factory produces supper-zooper-flooper-do broom booms with a change in broom boom material batch every 50th part. A capability study should have a minimum of 100 values and 25 subgroups. The subgroups should be rational: that means the variability within each subgroup should be less than the variability between subgroups. We can anticipate more variation between material batches than within a material batch so we will use the batches as subgroups, with a sample size of four.

Once the data has been collected, we can crank up our Minitab and perform a capability study by going to Stat > Quality Tools > Capability Analysis > Normal. Enter the column containing the measurement values. Then either enter the column containing the subgroup or type the size of the subgroup. Enter the lower specification limit and the upper specification limit, and click OK.

Process Capability Report for Broom Boom Length

We now have the results for the supper-zooper-flooper-do broom boom lengths, but can we trust our results? A capability study has requirements that must be met. We should have a minimum of 100 values and 25 subgroups, which we have. But the data should also be normally distributed and in a state of statistical control; otherwise, we either need to transform the data, or identify the distribution of the data and perform capability study for nonnormal data.

Dr. Seuss has never discussed transforming data so perhaps we should be hesitant if the data do not fit a distribution. Before performing a transformation, we should determine if there is a reason the data do not fit any distribution.

We can use the Minitab Capability Sixpack to determine if the data is normally distributed and in a state of statistical control. Go to Stat > Quality Tools > Capability Sixpack > Normal. Enter the column containing the measurement values. Then either enter the column containing the subgroup or type the size of the subgroup. Enter the lower specification limit and the upper specification limit and click OK.

Process Capability Sixpack Report for Broom Boom Length

There are no out-of-control points in the control chart, and the p-value is greater than 0.05, so we fail to reject the null hypothesis that the data are normally distributed. The data are suitable for a capability study.

The within-subgroup variation is also known as short-term capability and is indicated by Cp and Cpk. The between-subgroup variability is also known as long-term capability and is given as Pp and Ppk. The Cp and Cpk fail to account for the variability that will occur between batches; Pp and Ppk tell us what we can expect from the process over time.

Both Cp and Pp tell us how well the process conforms to the specification limits. In this case, a Cp of 1.63 tells us the spread of the data is much narrower than the width of the specification limits, and that is a good thing. But Cp and Pp alone are not sufficient. The Cpk and Ppk account for where the process is centered by comparing the spread of the data to the distance between the process mean and the nearer specification limit. There is an upper and a lower Cpk and Ppk; however, we are generally only concerned with the lower of the two values.

In the supper-zooper-flooper-do broom boom length example, a Cpk of 1.10 is an indication that the process is off center. The Cp is 1.63, so we can reduce the number of potentially out-of-specification supper-zooper-flooper-do broom booms if we shift the process mean down to center the process while maintaining the current variation. This is a fortunate situation, as it is often easier to shift the process mean than to reduce the process variation.
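For anyone curious about the arithmetic behind these indices, here is a simplified Python sketch of mine with simulated lengths. It uses one overall standard deviation, whereas Minitab estimates a within-subgroup standard deviation for Cp/Cpk and an overall one for Pp/Ppk, so treat it as an illustration of the formulas rather than a replacement for the capability report.

```python
import numpy as np

# Simulated broom boom lengths (mm); the drawing calls for 55.0 +/- 0.5
rng = np.random.default_rng(42)
lengths = rng.normal(loc=55.1, scale=0.10, size=100)

lsl, usl = 54.5, 55.5
mean = lengths.mean()
sigma = lengths.std(ddof=1)   # simplification: one overall standard deviation

cp = (usl - lsl) / (6 * sigma)                    # spread vs. width of the spec
cpk = min(usl - mean, mean - lsl) / (3 * sigma)   # also accounts for centering

print(f"Cp  = {cp:.2f}")
print(f"Cpk = {cpk:.2f}")
# A Cpk noticeably below Cp signals an off-center process: centering the mean
# would lower the risk of out-of-specification parts without reducing variation.
```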

Once improvements are implemented and verified, we can be sure that the next supper-zooper-flooper-do the Diffendoofer School purchases for Mr. Plunger will have a broom boom that is in specification if only common cause variation is present.

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

 

DMAIC Tools and Techniques: The Measure Phase


In my last post on DMAIC tools for the Define phase, we reviewed various graphs and stats typically used to define project goals and customer deliverables. Let’s now move along to the tools you can use in Minitab Statistical Software to conduct the Measure phase.

Measure Phase Methodology

The goal of this phase is to measure the process to determine its current performance and quantify the problem. This includes validating the measurement system and establishing a baseline process capability (i.e., sigma level).

I. Tools for Continuous Data

Gage R&R

Before you analyze your data, you should first make sure you can trust it, which is why successful Lean Six Sigma projects begin the Measure phase with Gage R&R. This measurement systems analysis tool assesses if measurements are both repeatable and reproducible. And there are Gage R&R studies available in Minitab for both destructive and non-destructive tests.

Minitab location: Stat > Quality Tools > Gage Study > Gage R&R Study OR Assistant > Measurement Systems Analysis.

Gage Linearity and Bias

When assessing the validity of our data, we need to consider both precision and accuracy. While Gage R&R assesses precision, it’s Gage Linearity and Bias that tells us if our measurements are accurate or are biased.

Minitab location: Stat > Quality Tools > Gage Study > Gage Linearity and Bias Study.

Gage Linearity and Bias

Distribution Identification

Many statistical tools and p-values assume that your data follow a specific distribution, commonly the normal distribution, so it’s good practice to assess the distribution of your data before analyzing it. And if your data don’t follow a normal distribution, do not fear as there are various techniques for analyzing non-normal data.

Minitab location: Stat > Basic Statistics > Normality Test OR Stat > Quality Tools > Individual Distribution Identification.

Distribution Identification

Capability Analysis

Capability analysis is arguably the crux of “Six Sigma” because it’s the tool for calculating your sigma level. Is your process at a 1 Sigma, 2 Sigma, etc.? It reveals just how good or bad a process is relative to specification limit(s). And in the Measure phase, it’s important to use this tool to establish a baseline before making any improvements.

Minitab location: Stat > Quality Tools > Capability Analysis/Sixpack OR Assistant > Capability Analysis.

Process Capability Analysis

II. Tools for Categorical (Attribute) Data

Attribute Agreement Analysis

Like Gage R&R and Gage Linearity and Bias studies mentioned above for continuous measurements, this tool helps you assess if you can trust categorical measurements, such as pass/fail ratings. This tool is available for binary, ordinal, and nominal data types.

Minitab location: Stat > Quality Tools > Attribute Agreement Analysis OR Assistant > Measurement Systems Analysis.

Capability Analysis (Binomial and Poisson)

If you’re counting the number of defective items, where each item is classified as either pass/fail, go/no-go, etc., and you want to compute parts per million (PPM) defective, then you can use binomial capability analysis to assess the current state of the process.

Or if you’re counting the number of defects, where each item can have multiple flaws, then you can use Poisson capability analysis to establish your baseline performance.

Minitab location: Stat > Quality Tools > Capability Analysis OR Assistant > Capability Analysis.

Binomial Process Capability
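The PPM arithmetic itself is straightforward; here is a minimal sketch with made-up inspection counts (my own illustration, not Minitab output) showing how a defective rate converts to parts per million.

```python
# Hypothetical inspection results: each item is simply pass or fail
items_inspected = 2500
items_defective = 14

proportion_defective = items_defective / items_inspected
ppm_defective = proportion_defective * 1_000_000

print(f"Proportion defective: {proportion_defective:.4%}")
print(f"PPM defective:        {ppm_defective:,.0f}")
# Binomial capability analysis goes further, adding confidence intervals and
# checking whether the defective rate is stable over time.
```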

Variation is Everywhere

As I mentioned in my last post on the Define phase, Six Sigma projects can vary. Not every project uses the identical tool set every time, so the tools above merely serve as a guide to the types of analyses you may need to use. And there are other tools to consider, such as flowcharts to map the process, which you can complete using Minitab’s cousin, Quality Companion.

The Empirical CDF, Part 1: What's a CDF?


T'was the season for toys recently, and Christmas day found me playing around with a classic, the Etch-a-Sketch. As I noodled with the knobs, I had a sudden flash of recognition: my drawing reminded me of the Empirical CDF Plot in Minitab Statistical Software. Did you just ask, "What's a CDF plot? And what's so empirical about it?" Both very good questions. Let's start with the first, and we'll save that second question for a future post.

The acronym CDF stands for Cumulative Distribution Function. If, like me, you're a big fan of failures, then you might be familiar with the cumulative failure plot that you can create with some Reliability/Survival tools in Minitab. (For an entertaining and offbeat example, check out this excellent post, What I Learned from Treating Childbirth as a Failure.) The cumulative failure plot is a CDF.

Even if you're not a fan of failure plots and CDFs, you're likely very familiar with the CDF's famous cousin, the PDF or Probability Density Function. The classic "bell curve" is no more (and no less) than a PDF of a normal distribution.

For example, here's a histogram with a fitted normal PDF for PinLength.MTW, from Minitab's online Data Set Library.


To create this plot, do the following:

  1. Download the data file, PinLength.MTW, and open it in Minitab.
  2. Choose Graph > Histogram > With Fit, and click OK.
  3. In Graph variables, enter Length.
  4. Click the Scale button.
  5. On the Y-Scale Type tab, choose Percent.
  6. Click OK in each dialog box.

The data are from a sample of 100 connector pins. The histogram and fitted line show that the lengths of the pins (shown on the x-axis) roughly follow a normal distribution with a mean of 19.26 and a standard deviation of 2.154. You can get the specifics for each bin of the histogram by hovering over the corresponding bar.


The height of each bar represents the percentage of observations in the sample that fall within the specified lengths. For example, the fifth bar is the tallest. Hovering over the fifth bar reveals that 18% of the pins have lengths that fall between 18.5 mm and 19.5 mm. Remember that for a moment.

Now let's try something a little different.

  1. Double-click the y-axis.
  2. On the Type tab, select Accumulate values across bins.
  3. Click OK.


It looks very different, but it's the exact same data. The difference is that the bar heights now represent cumulative percentages. In other words, each bar represents the percentage of pins with the specified lengths or smaller.


For example, the height of the fifth bar indicates that 55% of the pin lengths are less than 19.5 mm. The height of the fourth bar indicates that 37% of pin lengths are 18.5 or less. The difference in height between the 2 bars is 18, which tells us that 18% of the pins have lengths between 18.5 and 19.5. Which, if you remember, we already knew from our first graph. So the cumulative bars look different, but it's just another way of conveying the same information.

You may have also noticed that the fitted line no longer looks like a bell curve. That's because when we changed to a cumulative y-axis, Minitab changed the fitted line from a PDF to... you guessed it, a cumulative distribution function (CDF). Like the cumulative bars, the cumulative distribution function represents the cumulative percentage of observations that have values less than or equal to X. Basically, the CDF of a distribution gives us the cumulative probabilities from the PDF of the same distribution.

I'll show you what I mean. Choose Graph > Probability Distribution Plot > View Probability, and click OK. Then enter the parameters and x-value as shown here, and click OK.


The "Left Tail" probabilities are cumulative probabilities. The plot tells us that the probability of obtaining a random value that is less than or equal to 16 is about 0.065. That's another way of saying that 6.5% of the values in this hypothetical population are less than or equal to 16.
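If you would like to check these cumulative probabilities in code rather than on the plot, a short scipy sketch (mine, not part of the original post) reproduces them from the same fitted parameters.

```python
from scipy.stats import norm

# Parameters of the fitted normal distribution for pin length (mm)
dist = norm(loc=19.26, scale=2.154)

print(round(dist.cdf(16), 3))      # ~0.065: proportion of lengths <= 16 mm
print(round(dist.cdf(19.26), 3))   # 0.5: half of the values fall at or below the mean
print(round(dist.ppf(0.95), 2))    # ~22.8: the length below which 95% of values fall
```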

Now we can create a CDF using the same parameters:

  1. Choose Graph > Empirical CDF > Single and click OK.
  2. In Graph variables, enter Length.
  3. Click the Distribution button.
  4. On the Data Display tab, select Distribution fit only.
  5. Click OK, then click the Scale button.
  6. On the Percentile Lines tab, under Show percentile lines at data values, enter 16.


The CDF tells us that 6.5% of the values in this distribution are less than or equal to 16, as did the PDF.

Let's try another. Double-click the shaded area on the PDF and change x to 19.26, which is the mean of the distribution.


Naturally, because we're dealing with a perfect theoretical normal distribution here, half of the values in the hypothetical population are less than or equal to the mean. You can also visualize this on the CDF by adding another percentile line. Click the CDF and choose Editor > Add > Percentile Lines. Then enter 19.26 under Show percentile lines at data values.


There's a little bit of rounding error, but the CDF tells us the same thing that we learned from the PDF, namely that 50% of the values in the distribution are less than or equal to the mean.

Finally, let's input a probability and determine the associated x-value. Double-click the shaded area on the PDF, but this time enter a probability of 0.95 as shown:


The PDF shows that the x-value associated with a cumulative probability of 0.95 is 22.80. Now right-click the CDF and choose Add > Percentile Lines. This time, under Show percentile lines at Y values, enter 95 for 95%.


Once again, other than a little rounding error, the CDF tells us the same thing as the PDF.

For most people (maybe everyone?), the PDF is an easier way to visualize the shape of a distribution. But the nice thing about the CDF is that there's no need to look up probabilities for each x-value individually: all of the x-values in the distribution and the associated cumulative probabilities are right there on the curve.


A Field Guide to Statistical Distributions


by Matthew Barsalou, guest blogger. 

The old saying “if it walks like a duck, quacks like a duck and looks like a duck, then it must be a duck” may be appropriate in bird watching; however, the same idea can’t be applied when observing a statistical distribution. The dedicated ornithologist is often armed with binoculars and a field guide to the local birds and this should be sufficient. A statologist (I just made the word up, feel free to use it) on the other hand, is ill-equipped for the visual identification of his or her targets.

Normal, Student's t, Chi-Square, and F Distributions

Notice the upper two distributions in figure 1. The normal distribution and Student’s t distribution may appear similar. However, the standard normal distribution assumes the population standard deviation is known, while Student’s t distribution is based on a sample estimate with n − 1 degrees of freedom. This may appear to be a minor difference, but when n is small, Student’s t distribution has noticeably heavier tails. Student’s t distribution approaches the normal distribution as the sample size increases, but for any finite sample size it never truly matches the shape of the normal distribution.

Observe the Chi-square and F distribution in the lower half of figure 1. The shapes of the distributions can vary and even the most astute observer will not be able to differentiate between them by eye. Many distributions can be sneaky like that. It is a part of their nature that we must accept as we can’t change it.

Distribution Field Guide Figure 1Figure 1

Binomial, Hypergeometric, Poisson, and Laplace Distributions

Notice the distributions illustrated in figure 2. A bird watcher may suddenly encounter four birds sitting in a tree; a quick check of a reference book may help to determine that they are all of a different species. The same can’t always be said for statistical distributions. Observe the binomial distribution, hypergeometric distribution and Poisson distribution. We can’t even be sure the three are not the same distribution. If they are together with a Laplace distribution, an observer may conclude “one of these does not appear to be the same as the others.” But they are all different, which our eyes alone may fail to tell us.

Distribution Field Guide Figure 2Figure 2

Weibull, Cauchy, Loglogistic, and Logistic Distributions

Suppose we observe the four distributions in figure 3. What are they? Could you tell if they were not labeled? We must identify them correctly before we can do anything with them. One is a Weibull distribution, but all four could conceivably be various Weibull distributions. The shape of the Weibull distribution varies based upon the shape parameter (κ) and scale parameter (λ). The Weibull distribution is a useful, but potentially devious distribution that can be much like the double-barred finch, which may be mistaken for an owl upon first glance.

Distribution Field Guide Figure 3Figure 3

Attempting to visually identify a statistical distribution can be very risky. Many distributions such as the Chi-Square and F distribution change shape drastically based on the number of degrees of freedom. Figure 4 shows various shapes for the Chi-Square, F distribution and the Weibull distribution. Figure 4 also compares a standard normal distribution with a standard deviation of one to a t distribution with 27 degrees of freedom; notice how the shapes overlap to the point where it is no longer possible to tell the two distributions apart.

Although there is no definitive Field Guide to Statistical Distributions to guide us, there are formulas available to correctly identify statistical distributions. We can also use Minitab Statistical Software to identify our distribution.

Distribution Field Guide Figure 4Figure 4

Go to Stat > Quality Tools > Individual Distribution Identification... and enter the column containing the data and the subgroup size. The results can be observed in either the session window (figure 5) or the graphical outputs shown in figures 6 through 9.

In this case, we can conclude we are observing a 3-parameter Weibull distribution based on the p value of 0.364.
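There is no exact scipy equivalent of Individual Distribution Identification, but the rough sketch below (my own illustration, with simulated data) shows the general idea: fit several candidate distributions and compare how well each one tracks the data. The comparison uses the Kolmogorov-Smirnov statistic purely as a screening measure, whereas Minitab reports Anderson-Darling statistics and properly adjusted p-values.

```python
import numpy as np
from scipy import stats

# Simulated data standing in for the measurement column in the worksheet
rng = np.random.default_rng(3)
data = stats.weibull_min.rvs(c=1.8, loc=10, scale=5, size=100, random_state=rng)

# Candidate distributions, loosely mirroring the ones Minitab screens
candidates = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "3-parameter Weibull": stats.weibull_min,
}

for name, dist in candidates.items():
    params = dist.fit(data)                            # maximum-likelihood fit
    ks_stat, _ = stats.kstest(data, dist.cdf, args=params)
    print(f"{name:20s} KS statistic = {ks_stat:.3f}")  # smaller = closer fit
# Because the parameters were estimated from the same data, the usual KS
# p-values would be optimistic, so only the statistic is shown here.
```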

Distribution Field Guide Figure 5

Figure 5

 

Distribution Field Guide Figure 6Figure 6

Distribution Field Guide Figure 7Figure 7

Distribution Field Guide Figure 8Figure 8

Distribution Field Guide Figure Figure 9

 

 

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

 

Using Designed Experiments (DOE) to Minimize Moisture Loss


cake!

As a person who loves baking (and eating) cakes, I find it bothersome to go through all the effort of baking a cake when the end result is too dry for my taste. For that reason, I decided to use a designed experiment in Minitab to help me reduce the moisture loss in baked chocolate cakes, and find the optimal settings of my input factors to produce a moist baked chocolate cake. I’ll share the details of the design and the results in this post.

Choosing Input Factors for the Designed Experiment

Because I like to use premixed chocolate cake mixes, I decided to use two of my favorite cake mix brands for the experiment. For the purpose of this post, I’ll call the brands A and B. Thinking about what could impact the loss of moisture, it is likely that the baking time and the oven temperature will affect the results. Therefore, the factors or inputs that I decided to use for the experiment are:

  1. Cake mix brand: A or B (categorical data)
  2. Oven temperature: 350 or 380 degrees Fahrenheit (continuous data)
  3. Baking time:  38 or 46 minutes (continuous data)

Measuring the Response

Next, I needed a way to measure the moisture loss. For this experiment, I used an electronic food scale to weigh each cake (in the same baking pan) before and after baking, and then used those weights in conjunction with the formula below to calculate the percent of moisture lost for each cake:

% Moisture Loss = 100 × (initial weight − final weight) / initial weight

Designing the Experiment

For this experiment, I decided to construct a 2³ full factorial design with center points to detect any possible curvature in the response surface. Since the cake mix brand is categorical and therefore has no center point between brand A and brand B, the number of center points will be doubled for that factor. Because of this, I’d have to bake 10 cakes which, even for me, is too many in a single day. Therefore, I decided to run the experiment over two days. Because differences between the days on which the data was collected could potentially introduce additional variation, I decided to add a block to the design to account for any potential variation due to the day.
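To see how the run list comes together before blocking, here is a small Python sketch of mine that enumerates the eight corner runs plus the doubled center points (one per brand at the numeric midpoints); splitting the runs across the two days and randomizing the run order are left to Minitab.

```python
from itertools import product

# Factor levels from the experiment
brands = ["A", "B"]
temps = [350, 380]      # oven temperature, degrees F
times = [38, 46]        # baking time, minutes

# Full 2^3 factorial: every combination of the three factor levels
runs = [{"brand": b, "temp": t, "time": m} for b, t, m in product(brands, temps, times)]

# Center points: the categorical factor has no middle level, so the center
# point is doubled -- one run per brand at the midpoint of temp and time
for b in brands:
    runs.append({"brand": b, "temp": 365, "time": 42})

for i, run in enumerate(runs, start=1):
    print(i, run)   # 10 runs in total, matching the 10 cakes mentioned above
```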

To create my design in Minitab, I use Stat > DOE > Factorial > Create Factorial Design:

select create factorial design

Minitab 17 makes it easy to enter the details of the design.  First, I selected 3 as the number of factors:

select three factors

Next, I clicked on the Designs button above. In the Designs window, I can tell Minitab what type of design I’d like to use with my 3 factors:

select type of design

In the window above, I’ve selected a full 2³ design, and also added 2 blocks (to account for variation between days), and 1 center point per block. After making the selections and clicking OK in the above window, I clicked on the Factors button in the main window to enter the details about each of my factors:

factors

Because center points are doubled for categorical factors, and because this design has two blocks, the final design will have a total of 4 center points.  After clicking OK in the window above, I ended up with the design shown below with 12 runs:

design data

Performing the Experiment and Analyzing the Data

After spending an entire weekend baking cakes and calculating the moisture loss for each one, I entered the data into Minitab for the analysis. I also brought in a lot of cake to share with my colleagues at Minitab!

data

With the moisture loss for each of my 12 cakes recorded in column C8 in the experiment worksheet, I’m ready to analyze the results. 

In Minitab, I used Stat > DOE > Factorial > Analyze Factorial Design... and then entered the Moisture Loss column in the Responses field:

Analyze factorial DOE

In the window above, I also clicked on Terms to make sure I’m only including the main effects and two-way interactions. After clicking OK in each window, Minitab produced a Pareto chart of the standardized effects that I could use to reduce my model:

pareto of standardized effects

I can see from the above graph that the main effects (A, B and C) all significantly impact the moisture of the cake, since the bars that represent those terms on the graph extend beyond the red vertical reference line.  All of the two-way interactions (AB, AC and BC) are not significant.

I can also see the same information in the ANOVA table in Minitab’s session window:

ANOVA results

In the above ANOVA table, we can see that the cake mix brand, oven temp, and baking time are all significant since their p-values are lower than my alpha of 0.05. 

We can also see that all of the 2-way interactions have p-values higher than 0.05, so I’ll conclude that those interactions are not significant and should be removed from the model.

Interestingly, the p-value for the blocks is also significant (with a p-value of 0.01). This indicates that there was indeed a difference between the two days on which the data was collected that impacted the results. I'm glad I accounted for that additional variation by including a block in my design!

Analyzing the Reduced Model

To analyze my reduced model, I can go back to Stat > DOE > Factorial > Analyze Factorial Design. This time when I click the Terms button I’ll keep only the main effects, and remove the two-way interactions. Minitab displays the following ANOVA table for the reduced model:

ANOVA for reduced model

The table shows that all the terms I’ve included (mix brand, oven temp, and baking time) are significant since all the p-values for these terms are lower than 0.05. We can also see that the test for curvature based on the center points is not significant (p-value = 0.587), so we can conclude that the relationship between the three factors and moisture loss is linear.

The r-squared, r-squared adjusted, and r-squared predicted are all quite high, so this model seems to be a very good fit to the data.

Checking the Residuals

Now I can take a look at the residual plots to make sure all the model assumptions for my model have been met:

residual plots

The residuals in the graph above appear to be normally distributed. The residuals versus fits graph appears to show the points are randomly scattered above and below 0 (which indicates constant variance), and the residuals versus order graph doesn’t suggest any patterns that could be due to the order in which the data was collected. 

Now that I'm confident the assumptions for the model have been met, I’ll use this model to determine the optimal settings of my factors so that going forward all the cakes I make will be moist and fabulous! 

Optimizing the Response

I can use Minitab’s Response Optimizer and my model to tell me exactly what combination of cake mix brand, oven temperature, and baking time I’ll want to use to get the moistest cake. I select Stat > DOE > Factorial > Response Optimizer:

response optimizer dialog 

In the above window, I can tell Minitab what my goal is. In this case, I want to know what input settings to use so that the moisture loss will be minimized. Therefore, I choose Minimize above and then click OK:

response optimizer

In the above graph, the optimal settings for my factors are marked in red near the top. Using the model that I’ve fit to my data, Minitab is telling me that I can use Brand B with an oven temperature of 350 and a baking time of 38 minutes to minimize the moisture loss. Using those values for the inputs, I can expect the moisture loss will be approximately 3.3034, which is quite low compared to the moisture loss for the cakes collected as part of the experiment.
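If you're curious what the optimizer is doing conceptually, here's a rough sketch that predicts over a grid of candidate settings using the hypothetical statsmodels model from the earlier sketch and keeps the combination with the smallest predicted moisture loss. The candidate levels are assumptions, not the design's actual settings:

```python
# A minimal sketch of response optimization: evaluate the fitted model over a
# grid of candidate settings and keep the combination with the lowest prediction.
import itertools
import pandas as pd

grid = pd.DataFrame(
    list(itertools.product(["A", "B"], range(325, 376, 5), range(38, 47))),
    columns=["Brand", "OvenTemp", "BakeTime"],   # hypothetical factor names/levels
)
grid["PredictedLoss"] = model.predict(grid)      # 'model' from the earlier sketch
best = grid.loc[grid["PredictedLoss"].idxmin()]
print(best)                                      # settings with the lowest predicted loss
```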

Success!  Now I can use these optimal settings, and I’ll never waste my time baking a dry cake again.

If you’ve enjoyed this post about DOE, you may also like to read some of our other DOE blog posts.

P-value Roulette: Making Hypothesis Testing a Winner’s Game


Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no ordinary game of roulette. This is p-value roulette!

Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.

Edvard Munch, At the Roulette Table in Monte Carlo

What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!

I’m sorry, but we can’t tell you which wheel we’re spinning.

Doesn’t that sound like a good game?

Not convinced yet? I assure you the odds are in your favor if you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing—no matter what—if we happen to spin the Null wheel.

histogram of p values for null hypothesis

What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:

histogram of p values from alternative hypothesis

And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.

 histogram of p-values from popular alternative hypothesis

Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.

I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.

So, you’d like to play? Great! Which slot would you like to bet on?

Is this on the level?

No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a 1-sample t-test. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from.  For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.
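If you'd like to recreate something like these graphs yourself, here's a rough sketch of the simulation in Python. It isn't the exact code behind my graphs, but it follows the same recipe, and the share of p-values below 0.05 under each alternative doubles as an estimate of the test's power (see point 6 below):

```python
# A minimal sketch: 10,000 one-sample, two-sided t-tests on samples of size 20,
# for the null (true mean 5) and the two alternatives (true means 5.3 and 5.75).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_pvalues(true_mean, n=20, sims=10_000, null_mean=5.0, sd=1.0):
    samples = rng.normal(loc=true_mean, scale=sd, size=(sims, n))
    return stats.ttest_1samp(samples, popmean=null_mean, axis=1).pvalue

for mu in (5.0, 5.3, 5.75):
    pvals = simulate_pvalues(mu)
    # Under the null this is near the 5% Type I error rate; under an
    # alternative it estimates the power of the test.
    print(f"true mean {mu}: share of p-values below 0.05 = {np.mean(pvals < 0.05):.3f}")
```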

For just about any hypothesis test you do in Minitab Statistical Software, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.

  1. Just as you didn't know whether we were spinning the Null or the Alternative wheel, you don't know for sure whether the null hypothesis is true or not. But basing your decision to reject the null hypothesis on the p-value favors your chance of making a good decision.
     
  2. If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A Type I error occurs if you incorrectly reject a true null hypothesis.
     
  3. If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.
     
  4. It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.
     
  5. In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.
     
  6. The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.
     
You Too Can Be a Winner!

To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test, and that your data fit the assumptions of that test. Minitab's Assistant menu can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. Then it gives you clear graphical output that shows you how to interpret your p-value and helps you evaluate whether your data are appropriate, so you can trust your results.

 

5 Simple Steps to Conduct Capability Analysis with Non-Normal Data


by Kevin Clay, guest blogger

In transactional or service processes, we often deal with lead-time data, and usually that data does not follow the normal distribution.

Consider a Lean Six Sigma project to reduce the lead time required to install an information technology solution at a customer site. It should take no more than 30 days—working 10 hours per day Monday–Friday—to complete, test and certify the installation. Following the standard process, the target lead time should be around 24 days.

Twenty-four days may be the target, but we know customer satisfaction increases as we complete the installation faster. We need to understand our baseline capability to meet that demand, so we can perform a capability analysis.

We know our data should fit a non-normal (positively skewed) distribution. It should resemble a ski-slope like the picture below:

  • ski slope distribution

In this post, I will cover five simple steps to understand the capability of a non-normal process to meet customer demands.

1. Collect data

First we must gather data from the process. In this scenario, we are collecting sample data. We pull 100 samples that cover the full range of variation that occurs in the process.

In this case the full range of variation comes from three installation teams. We will take at least 30 data points from each team.

2. Identify the Shape of the Distribution

We know that the data should fit a non-normal distribution. As Lean Six Sigma practitioners, we must prove our assumption with data. In this case, we can conduct a normality test to prove non-normality.

We are using Minitab as the statistical analysis tool, and our data are available in this worksheet. (If you want to follow along and don't already have it, download the free Minitab trial.)

From the menu, select Stat > Basic Statistics > Normality Test...

Populate the “Variable:” field with LeadTime, and click OK as shown:

normality test dialog

You should get the following Probability Plot:

probability plot of lead time

Since the p-value (outlined in yellow in the picture above) is less than 0.05, we reject the null hypothesis and conclude, with 95% confidence, that the data do not follow a normal distribution.
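If you want to run a comparable check outside Minitab, here's a rough sketch using scipy. The file and column names are assumptions, and scipy's anderson() reports the Anderson-Darling statistic with critical values rather than the exact p-value shown above:

```python
# A minimal sketch of an Anderson-Darling normality check in Python.
import pandas as pd
from scipy import stats

lead_time = pd.read_csv("lead_time.csv")["LeadTime"]   # hypothetical file/column

result = stats.anderson(lead_time, dist="norm")
print(f"A-squared = {result.statistic:.3f}")
for crit, sig in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > crit else "fail to reject"
    print(f"{sig:>4}% level: {decision} normality (critical value {crit:.3f})")
```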

3. Verify Stability

In a Lean Six Sigma project, we might find the answer to our problem anywhere on the DMAIC roadmap. Belts need to learn to look for these signals throughout the project.

In this case, signals can come from instability in our process. They show up as red dots on a control chart.

To see if this lead time process is stable, we will run an I-MR Chart. In Minitab, select Stat > Control Charts > Variables Charts for Individuals > I-MR…

Populate “Variables:” with “LeadTime” in the dialog as shown below:

I-MR Chart dialog

Press OK, and you'll get the following “I-MR Chart of LeadTime”:

I-MR Chart of Lead Time

The I-MR Chart shows two signals of instability (shown as red dots), appearing on both the Individuals chart at the top of the graph and the Moving Range chart at the bottom.

These data points indicate abnormal variation, and their cause should be investigated. These signals could offer great insight into the problem you are trying to solve. Once you have identified and resolved the causes of these points, you can collect additional data or remove the points from the data set.

In this scenario, we will leave the two points in the data set.
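For reference, here's a rough sketch of the standard I-MR limit calculations behind this chart, reusing the hypothetical lead_time series from the previous sketch; the constants 2.66 and 3.267 are the usual factors for a moving range of span 2:

```python
# A minimal sketch of I-MR control limits (moving range of span 2).
import numpy as np

x = np.asarray(lead_time, dtype=float)      # 'lead_time' from the sketch above
mr = np.abs(np.diff(x))                     # moving ranges between consecutive points
mr_bar = mr.mean()

i_ucl = x.mean() + 2.66 * mr_bar            # individuals chart limits
i_lcl = x.mean() - 2.66 * mr_bar
mr_ucl = 3.267 * mr_bar                     # MR chart UCL; LCL is 0 for span 2

flagged = np.flatnonzero((x > i_ucl) | (x < i_lcl))   # points a chart would mark in red
print(f"I chart limits: {i_lcl:.1f} to {i_ucl:.1f}; MR chart UCL: {mr_ucl:.1f}")
print("Out-of-control observations (0-based index):", flagged)
```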

4. What Non-Normal Distribution Does the Data Best Fit?

There are several non-normal data distributions that the data could fit, so we will use a tool in Minitab to show us which distribution fits the data best. Open the “Individual Distribution Identification” dialog by going to Stat > Quality Tools > Individual Distribution Identification…

Populate “Single column:” and “Subgroup size:” as follows:

individual distribution identification dialog

Minitab will output the four graphs shown below. Each graph includes four different distributions:

probability ID plots 1

probability id plots 2

probability id plots 3

probability id plots 4

Pick the distribution with the largest p-value (excluding the Johnson Transformation and the Box-Cox Transformation). In this scenario, the exponential distribution fits the data best.

5. What Is the Process Capability?

Now that we know the distribution that best fits these data, we can perform the non-normal capability analysis. In Minitab, select Stat > Quality Tools > Capability Analysis > Nonnormal…

Populate the “Capability Analysis (Nonnormal Distribution)” dialog box as seen below. Make sure to select “Exponential” next to Fit distribution. Then click Options.

capability analysis dialog

Fill in the “Capability Analysis (Non Normal Distribution): Options” dialog box with the following:

capability analysis options dialog

We chose “Percents” over “Parts Per Million” because in this scenario it would take years to produce over one million outputs (or data for each installation time).

Click OK in the Options and main dialog boxes, and you should get the following “Process Capability Report for LeadTime”:

process capability of lead time

We interpret the results of a non-normal capability analysis just as we do an analysis done on data with a normal distribution.

Capability is determined by comparing the width of the process variation (the voice of the process, VOP) to the width of the specification (the voice of the customer, VOC). We would like the process spread to be smaller than, and contained within, the specification spread.

That’s clearly not the case with this data.

The Overall Capability index on the right side of the graph depicts how the process is performing relative to the specification limits.

To quickly determine whether the process is capable, compare Ppk with your minimum requirement for the indices. Most quality professionals consider 1.33 to be a minimum requirement for a capable process. A value less than 1 is usually considered unacceptable.

With a Ppk of 0.23, it seems our IT Installation Groups have work ahead to get their process to meet customer specifications. At least these data offer a clear understanding of how much the process can be improved!
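If you're curious how a nonnormal Ppk can be computed outside Minitab, here's a rough sketch using one common approach, the percentile (ISO) method, with an exponential fit. It assumes an upper spec of 30 days and reuses the hypothetical lead_time series from earlier, so the numbers are illustrative rather than a reproduction of the report above:

```python
# A minimal sketch of nonnormal capability via the percentile (ISO) method:
# the 0.135th, 50th, and 99.865th percentiles of the fitted distribution
# stand in for mean +/- 3 sigma. Upper spec of 30 days is an assumption.
from scipy import stats

usl = 30.0                                            # assumed upper spec limit (days)

loc, scale = stats.expon.fit(lead_time, floc=0)       # plain exponential (location fixed at 0)
p_low, median, p_high = stats.expon.ppf([0.00135, 0.5, 0.99865], loc=loc, scale=scale)

ppu = (usl - median) / (p_high - median)              # upper-side overall capability
pct_above = 100 * stats.expon.sf(usl, loc=loc, scale=scale)
print(f"Ppk (upper side) = {ppu:.2f}; expected % above USL = {pct_above:.1f}%")
```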

 

About the Guest Blogger:

Kevin Clay is a Master Black Belt and President and CEO of Six Sigma Development Solutions, Inc., certified as an Accredited Training Organization with the International Association of Six Sigma Certification (IASSC). For more information visit www.sixsigmadsi.com or contact Kevin at 866-922-6566 or kclay@sixsigmadsi.com.

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

Gauging Gage Part 3: How to Sample Parts


In Parts 1 and 2 of Gauging Gage we looked at the numbers of parts, operators, and replicates used in a Gage R&R Study and how accurately we could estimate %Contribution based on the choice for each.  In doing so, I hoped to provide you with valuable and interesting information, but mostly I hoped to make you like me.  I mean like me so much that if I told you that you were doing something flat-out wrong and had been for years and probably screwed some things up, you would hear me out and hopefully just revert back to being indifferent towards me.

For the third (and maybe final) installment, I want to talk about something that drives me crazy.  It really gets under my skin.  I see it all of the time, maybe more often than not.  You might even do it.  If you do, I'm going to try to convince you that you are very, very wrong.  If you're an instructor, you may even have to contact past students with groveling apologies and admit you steered them wrong.  And that's the best-case scenario.  Maybe instead of admitting error, you will post scathing comments on this post insisting I am wrong and maybe even insulting me despite the evidence I provide here that I am, in fact, right.

Let me ask you a question:

When you choose parts to use in a Gage R&R Study, how do you choose them?

If your answer to that question required any more than a few words (and it can be done in one word), then I'm afraid you may have been making a very popular but very bad decision.  If you're in that group, I bet you're already reciting your rebuttal in your head now, without even hearing what I have to say.  You've had this argument before, haven't you?  Consider whether your response was some variation on the following popular schemes:

  1. Sample parts at regular intervals across the range of measurements typically seen
  2. Sample parts at regular intervals across the process tolerance (lower spec to upper spec)
  3. Sample randomly but pull a part from outside of either spec

#1 is wrong.  #2 is wrong.  #3 is wrong.

You see, the statistics you use to qualify your measurement system are all reported relative to the part-to-part variation, and none of the schemes I just listed accurately estimates your true part-to-part variation.  The answer to the question that would have provided the most reasonable estimate?

"Randomly."

But enough with the small talk—this is a statistics blog, so let's see what the statistics say.

In Part 1 I described a simulated Gage R&R experiment, which I will repeat here using the standard design of 10 parts, 3 operators, and 2 replicates.  The difference is that in only one set of 1,000 simulations will I randomly pull parts, and we'll consider that our baseline.  The other schemes I will simulate are as follows:

  1. An "exact" sampling - while not practical in real life, this pulls parts corresponding to the 5th, 15th, 25th, ..., and 95th percentiles of the underlying normal distribution and forms a (nearly) "exact" normal distribution as a means of seeing how much the randomness of sampling affects our estimates.
  2. Parts are selected uniformly (at equal intervals) across a typical range of parts seen in production (from the 5th to the 95th percentile).
  3. Parts are selected uniformly (at equal intervals) across the range of the specs, in this case assuming the process is centered with a Ppk of 1.
  4. 8 of the 10 parts are selected randomly, and then one part each is used that lies one-half of a standard deviation outside of the specs.

Keep in mind that we know with absolute certainty that the underlying %Contribution is 5.88325%.
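To make the mechanism behind what follows concrete, here's a deliberately simplified sketch in Python. It is not the full Gage R&R ANOVA from my simulations; it simply assumes a part standard deviation of 1 and a total gage standard deviation of 0.25 (which puts the true %Contribution near 5.9%) and shows how each sampling scheme changes the apparent part-to-part variance, and with it the estimated %Contribution:

```python
# A simplified illustration, not a full Gage R&R: the variance components
# below are assumptions chosen so the true %Contribution is roughly 5.9%.
import numpy as np

rng = np.random.default_rng(7)
gage_var = 0.25 ** 2
true_pct = 100 * gage_var / (gage_var + 1.0)

def pct_contribution(parts):
    # %Contribution when part-to-part variance is estimated from the sampled parts.
    return 100 * gage_var / (gage_var + np.var(parts, ddof=1))

uniform_specs = np.linspace(-3, 3, 10)   # evenly spaced across specs (centered, Ppk = 1)

random_avg = np.mean([pct_contribution(rng.normal(0, 1, 10)) for _ in range(1000)])
oos_avg = np.mean(
    [pct_contribution(np.append(rng.normal(0, 1, 8), [-3.5, 3.5])) for _ in range(1000)]
)

print(f"true %Contribution:         {true_pct:.1f}%")
print(f"random sampling (average):  {random_avg:.1f}%")
print(f"uniform across specs:       {pct_contribution(uniform_specs):.1f}%")
print(f"8 random + 2 outside specs: {oos_avg:.1f}%")
```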

Random Sampling for Gage

Let's use "random" as the default to compare to, which, as you recall from Parts 1 and 2, already does not provide a particularly accurate estimate:

Pct Contribution with Random Sampling

On several occasions I've had people tell me that you can't just sample randomly because you might get parts that don't really match the underlying distribution. 

Sample 10 Parts that Match the Distribution

So let's compare the results of random sampling from above with our results if we could magically pull 10 parts that follow the underlying part distribution almost perfectly, thereby eliminating the effect of randomness:

Random vs Exact

There's obviously something to the idea that the randomness that comes from random sampling has a big impact on our estimate of %Contribution...the "exact" distribution of parts shows much less skewness and variation and is considerably less likely to incorrectly reject the measurement system.  To be sure, implementing an "exact" sample scheme is impossible in most cases...since you don't yet know how much measurement error you have, there's no way to know that you're pulling an exact distribution.  What we have here is a statistical version of chicken-and-the-egg!

Sampling Uniformly across a Typical Range of Values

Let's move on...next up, we will compare the random scheme to scheme #2,  sampling uniformly across a typical range of values:

Random vs Uniform Range

So here we have a different situation: there is a very clear reduction in variation, but also a very clear bias.  So while pulling parts uniformly across the typical part range gives much more consistent estimates, those estimates are likely telling you that the measurement system is much better than it really is.

Sampling Uniformly across the Spec Range

How about collecting uniformly across the range of the specs?

Random vs Uniform Specs

This scheme results in an even more extreme bias, making qualification of this measurement system a certainty and in some cases even classifying it as excellent.  Needless to say, it does not result in an accurate assessment.

Selectively Sampling Outside the Spec Limits

Finally, how about that scheme where most of the points are taken randomly but just one part is pulled from just outside of each spec limit?  Surely just taking 2 of the 10 points from outside of the spec limits wouldn't make a substantial difference, right?

Random vs OOS

Actually those two points make a huge difference and render the study's results meaningless!  This process had a Ppk of 1; a higher-quality process would make this result even more extreme.  Clearly this is not a reasonable sampling scheme.

Why These Sampling Schemes?

If you were taught to sample randomly, you might be wondering why so many people would use one of these other schemes (or similar ones).  They actually all have something in common that explains their use: all of them allow a practitioner to assess the measurement system across a range of possible values.  After all, if you almost always produce values between 8.2 and 8.3 and the process goes out of control, how do you know that you can adequately measure a part at 8.4 if you never evaluated the measurement system at that point?

Those that choose these schemes for that reason are smart to think about that issue, but just aren't using the right tool for it.  Gage R&R evaluates your measurement system's ability to measure relative to the current process.  To assess your measurement system across a range of potential values, the correct tool to use is a "Bias and Linearity Study" which is found in the Gage Study menu in Minitab.  This tool establishes for you whether you have bias across the entire range (consistently measuring high or low) or bias that depends on the value measured (for example, measuring smaller parts larger than they are and larger parts smaller than they are).

To really assess a measurement system, I advise performing both a Bias and Linearity Study as well as a Gage R&R.

Which Sampling Scheme to Use?

In the beginning I suggested that a random scheme be used but then clearly illustrated that the "exact" method provides even better results.  Using an exact method requires you to know the underlying distribution from having enough previous data (somewhat reasonable although existing data include measurement error) as well as to be able to measure those parts accurately enough to ensure you're pulling the right parts (not too feasible...if you know you can measure accurately, why are you doing a Gage R&R?).  In other words, it isn't very realistic.

So for the majority of cases, the best we can do is to sample randomly.  But we can do a reality check after the fact by looking at the average measurement for each of the parts chosen and verifying that the distribution seems reasonable.  If you have a process that typically shows normality and your sample shows unusually high skewness, there's a chance you pulled an unusual sample and may want to pull some additional parts and supplement the original experiment.

Thanks for humoring me and please post scathing comments below!

see Part I of this series
see Part II of this series

Making the World a Little Brighter with Monte Carlo Simulation


If you have a process that isn’t meeting specifications, using the Monte Carlo simulation and optimization tool in Companion by Minitab can help. Here’s how you, as a chemical technician for a paper products company, could use Companion to optimize a chemical process and ensure it consistently delivers a paper product that meets brightness standards.

The brightness of Perfect Papyrus Company’s new copier paper needs to be at least 84 on the TAPPI brightness scale. The important process inputs are the bleach concentration of the solution used to treat the pulp, and the processing temperature. The relationship is explained by this equation:

Brightness = 70.37 + 44.4 Bleach + 0.04767 Temp – 64.3 Bleach*Bleach

Bleach concentration follows a normal distribution with a mean of 0.25 and a standard deviation of 0.0095 percent. Temperature also follows a normal distribution, with a mean of 145 and a standard deviation of 15.3 degrees C.
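If you want to see the core of such a simulation in code, here's a rough sketch in Python using the stated distributions and transfer equation. The results should land near the figures reported below, but this is a sketch, not Companion's engine:

```python
# A minimal sketch of the Monte Carlo simulation described above.
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
lsl = 84.0                                    # minimum brightness on the TAPPI scale

bleach = rng.normal(0.25, 0.0095, size=n)     # bleach concentration (percent)
temp = rng.normal(145, 15.3, size=n)          # processing temperature (degrees C)

brightness = 70.37 + 44.4 * bleach + 0.04767 * temp - 64.3 * bleach ** 2

pct_below = 100 * np.mean(brightness < lsl)
capability = (brightness.mean() - lsl) / (3 * brightness.std(ddof=1))   # lower spec only
print(f"% below spec = {pct_below:.1f}%, capability index = {capability:.2f}")
```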

Building your process model

To assess the process capability, you can enter the parameter information, transfer function, and specification limit into Companion's straightforward interface, and instantly run 50,000 simulations.

paper brightness monte carlo simulation

Understanding your results

monte carlo simulation output

The process performance measurement (Cpk) is 0.162, far short of the minimum standard of 1.33. Companion also indicates that under current conditions, you can expect the paper’s brightness to fall below standards about 31.5% of the time.

Finding optimal input settings

Companion's smart workflow guides you to the next step for improving your process: optimizing your inputs.

parameter optimization

You set the goal—in this case, maximizing the brightness of the paper—and enter the high and low values for your inputs.

optimization dialog

Simulating the new process

After finding the optimal input settings in the ranges you specified, Companion presents the simulated results for the recommended process changes.

optimized process output

The results indicate that if the bleach amount was set to approximately 0.3 percent and the temperature to 160 degrees, the % outside of specification would be reduced to about 2% with a Cpk of 0.687. Much better, but not good enough.

Understanding variability

To further improve the paper brightness, Companion’s smart workflow suggests that you next perform a sensitivity analysis.

sensitivity analysis

Companion’s unique graphic presentation of the sensitivity analysis gives you more insight into how the variation of your inputs influences the percentage of your output that doesn’t meet specifications.

sensitivity analysis of paper brightness

The blue line representing temperature indicates that variation in this factor has a greater impact on your process than variation in bleach concentration, so you run another simulation to visualize the brightness with a 50% reduction in temperature variation.

final paper brightness model simulation

The simulation shows that reducing the variability will result in 0.000 percent of the paper falling out of spec, with a Cpk of 1.34. Thanks to you, the outlook for the Perfect Papyrus Company’s new copier paper is looking very bright.

Getting great results

Figuring out how to improve a process is easier when you have the right tool to do it. With Monte Carlo simulation to assess process capability, Parameter Optimization to identify optimal settings, and Sensitivity Analysis to pinpoint exactly where to reduce variation, Companion can help you get there.

To try the Monte Carlo simulation tool, as well as Companion's more than 100 other tools for executing and reporting quality projects, learn more and get the free 30-day trial version for you and your team at companionbyminitab.com.

How Can a Similar P-Value Mean Different Things?


One highlight of writing for and editing the Minitab Blog is the opportunity to read your responses and answer your questions. Sometimes, to my chagrin, you point out that we've made a mistake. However, I'm particularly grateful for those comments, because it permits us to correct inadvertent errors. 

I feared I had an opportunity to fix just such an error when I saw this comment appear on one of our older blog posts:

You said a p-value greater than 0.05 gives a good fit. However, in another post, you say the p-value should be below 0.05 if the result is significant. Please, check it out!

You ever get a chill down your back when you realize you goofed? That's what I felt when I read that comment. Oh no, I thought. If the p-value is greater than 0.05, the results of a test certainly wouldn't be significant. Did I overlook an error that basic?  

Before beating myself up about it, I decided to check out the posts in question. After reviewing them, I realized I wouldn't need to put on the hairshirt after all. But the question reminded me about the importance of a fundamental idea. 

It Starts with the Hypothesis

If you took an introductory statistics course at some point, you probably recall the instructor telling the class how important it is to formulate your hypotheses clearly. Excellent advice.

However, many commonly used statistical tools formulate their hypotheses in ways that don't quite match. That's what this sharp-eyed commenter noticed and pointed out.

The writer of the first post detailed how to use Minitab to identify the distribution of your data, and in her example pointed out that a p-value greater than 0.05 meant that the data were a good fit for a given distribution. The writer of the second post—yours truly—commented on the alarming tendency to use deceptive language to describe a high p-value as if it indicated statistical significance.

To put it in plain language, my colleague's post cited the high p-value as an indicator of a positive result. And my post chided people who cite a high p-value as an indicator of a positive result. 

Now, what's so confusing about that? 

Don't Forget What You're Actually Testing

You can see where this looks like a contradiction, but to my relief, the posts were consistent. The appearance of contradiction stemmed from the hypotheses discussed in the two posts. Let's take a look. 

My colleague presented this graph, output from the Individual Distribution Identification:

Probability Plot

The individual distribution identification is a kind of hypothesis test, and so the p-value helps you determine whether or not to reject the null hypothesis.

Here, the null hypothesis is "The data follow a normal distribution," and the alternative hypothesis would be "The data DO NOT follow a normal distribution." If the p-value is over 0.05, we will fail to reject the null hypothesis and conclude that the data follow the normal distribution.

Just have a look at that p-value:

P value

That's a high p-value. And for this test, that means we can conclude the normal distribution fits the data. So if we're checking these data for the assumption of normality, this high p-value is good. 

But more often we're looking for a low p-value. In a t-test, the null hypothesis might be "The sample means ARE NOT different," and the alternative hypothesis, "The sample means ARE different." Seen this way, the arrangement of the hypotheses is the opposite of the one used in the distribution identification.

Hence, the apparent contradiction. But in both cases a p-value greater than 0.05 means we fail to reject the null hypothesis. We're interpreting the p-value in each test the same way.

However, because the connotations of "good" and "bad" are different in the two examples, how we talk about these respective p-values appears contradictory—until we consider exactly what the null and alternative hypotheses are saying. 

And that's a point I was happy to be reminded of. 

 


The Null Hypothesis: Always “Busy Doing Nothing”


The 1949 film A Connecticut Yankee in King Arthur's Court includes the song “Busy Doing Nothing,” and that song could have been written about the Null Hypothesis as it is used in statistical analyses.

The words to the song go:

We're busy doin' nothin'
Workin' the whole day through
Tryin' to find lots of things not to do

And that summarises the role of the Null Hypothesis perfectly. Let me explain why.

What's the Question?

Before doing any statistical analysis—in fact even before we collect any data—we need to define what problem and/or question we need to answer. Once we have this, we can then work on defining our Null and Alternative Hypotheses.

The null hypothesis is always the option that maintains the status quo and results in the least amount of disruption, hence it is “Busy Doin’ Nothin'”. 

When the p-value is very low and we reject the Null Hypothesis, we will have to take some action, and we will no longer be “Doin' Nothin'”.

Let’s have a look at how this works in practice with some common examples.

Question: Do the chocolate bars I am selling weigh 100g?

Null hypothesis: Chocolate weight = 100g.

If I am giving my customers the right size chocolate bars, I don’t need to make changes to my chocolate packing process.

Question: Are the diameters of my bolts normally distributed?

Null hypothesis: Bolt diameters are normally distributed.

If my bolt diameters are normally distributed, I can use any statistical technique that relies on the standard normal approach.

Question: Does the weather affect how my strawberries grow?

Null hypotheses: The number of hours of sunshine has no effect on strawberry yield; the amount of rain has no effect on strawberry yield; temperature has no effect on strawberry yield.

Note that the last instance in the table, investigating whether weather affects the growth of my strawberries, is a bit more complicated. That's because I needed to define some metrics to measure the weather. Once I decided that the weather was a combination of sunshine, rain, and temperature, I established my null hypotheses. These all assume that none of these factors impacts the strawberry yield. I only need to start controlling the sunshine, temperature, and rain if the data lead me to reject those null hypotheses.

Is Your Null Hypothesis Suitably Inactive?

So, in conclusion, in order to be “Busy Doin’ Nothin’,” your Null Hypothesis has to be the option that maintains the status quo, so that no action is needed unless the data give you good reason to reject it.

Need to Validate Minitab per FDA Guidelines? Get Minitab's Validation Kit


Last week I was fielding questions on social media about Minitab 18, the latest version of our statistical software. Almost as soon as the new release was announced, we received a question that comes up often from people in pharmaceutical and medical device companies:

"Is Minitab 18 FDA-validated?"

How Software Gets Validated

That's a great question. To satisfy U.S. Food and Drug Administration (FDA) regulatory requirements, many firms—including those in the pharmaceutical and medical device industries—must validate their data analysis software. That can be a big hassle, so to make this process easier, Minitab offers a Validation Kit.

We conduct extremely rigorous and extensive internal testing of Minitab Statistical Software to assure the numerical accuracy and reliability of all statistical output. Details on our software testing procedures can be found in the validation kit. The kit also includes an automated macro script to generate various statistical and graphical analyses on your machine. You can then compare your results to the provided output file that we have validated internally to ensure that the results on your machine match the validated results.

Intended Use

FDA regulations state that the purchaser must validate software used in production or as part of a quality system for the “intended use” of the software. FDA’s Code of Federal Regulations Title 21 Part 820.70(i) lays it out:

“When computers or automated data processing systems are used as part of production or the quality system, the manufacturer shall validate computer software for its intended use according to an established protocol.”

FDA provides additional guidance for medical device makers in Section 6.3 of “Validation of Automated Process Equipment and Quality System Software” in the Principles of Software Validation; Final Guidance for Industry and FDA Staff, January 11, 2002.

“The device manufacturer is responsible for ensuring that the product development methodologies used by the off-the-shelf (OTS) software developer are appropriate and sufficient for the device manufacturer's intended use of that OTS software. For OTS software and equipment, the device manufacturer may or may not have access to the vendor's software validation documentation. If the vendor can provide information about their system requirements, software requirements, validation process, and the results of their validation, the medical device manufacturer can use that information as a beginning point for their required validation documentation.”

Validation for intended use consists of mapping the software requirements to test cases, where each requirement is traced to a test case. Test cases can contain:

  • A test case description. For example, Validate capability analysis for Non-Normal Data.
  • Steps for execution. For example, go to Stat > Quality Tools > Capability Analysis > Nonnormal and enter the column to be evaluated and select the appropriate distribution.
  • Test results (with screen shots).
  • Test pass/fail determination.
  • Tester signature and date.
An Example

There is good reason for the “intended use” guidance when it comes to validation. Here is an example:

Company XYZ is using Minitab to estimate the probability of a defective part in a manufacturing process. If the size of Part X exceeds 10, the product is considered defective. They use Minitab to perform a capability analysis by selecting Stat > Quality Tools > Capability Analysis > Normal.

In the following graph, the Ppk (1.32) and PPM (37 defects per million) are satisfactory.

Not Validated for Non-Normal Capability Analysis

However, these good numbers would mislead the manufacturer into believing this is a good process. Minitab's calculations are correct, but this data is non-normal, so normal capability analysis was the wrong procedure to use.

Fortunately, Minitab also offers non-normal capability analysis. As shown in the next graph, if we choose Stat > Quality Tools > Capability Analysis > Nonnormal and select an appropriate distribution (in this case, Weibull), we find that the Ppk (1.0) and PPM (1343 defects per million) are actually not acceptable:

Validated for Non Normal Capability Analysis

Thoroughly identifying, documenting, and validating all intended uses of the software helps protect both businesses that make FDA-regulated products and the people who ultimately use them.

Software Validation Resources from Minitab

To download Minitab's software validation kit, visit http://www.minitab.com/support/software-validation/

In addition to details regarding our testing procedures and a macro script for comparing your results to our validated results, the kit also includes software lifecycle information.

Additional information about validating Minitab relative to the FDA guideline CFR Title 21 Part 11 is available at this link:

http://it.minitab.com/support/answers/answer.aspx?id=2588

If you have any questions about our software validation process, please contact us.

Making Steel Even Stronger with Monte Carlo Simulation


If you have a process that isn’t meeting specifications, using Monte Carlo simulation and optimization can help. Companion by Minitab offers a powerful, easy-to-use tool for Monte Carlo simulation and optimization, and in this blog we'll look at the case of product engineers involved in steel production for automobile parts, and how they could use Companion to improve a process.

The tensile strength of Superlative Auto Parts’ new steel parts needs to be at least 600 MPa. The important inputs for this manufacturing process are the melting temperature of the steel and the amount of carbon, manganese, cobalt, and phosphorus it contains. The following transfer equation models the steel’s tensile strength:

Strength = -1434 + 1.1101*MeltTemp + 1495*Carbon + 174.3*Manganese - 7585*Cobalt - 3023*Phosphorus

Building your process model

To assess the process capability, you can enter information about your current process inputs into Companion’s straightforward interface.

Suppose that while you know most of your inputs follow a normal distribution, you’re not sure about the distribution of melting temperature. As long as you have data about the process, you can just select the appropriate column in your data sheet and Companion will recommend the appropriate distribution for you. (If you'd like to try this yourself, here's the tensile strength data set.)

determining distribution from data

In this case, Companion recommends the Weibull distribution as the best fit and then automatically enters the "MeltTemp" distribution information into the interface.
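If you'd like to sanity-check a best-fit distribution outside Companion, here's a rough sketch in Python. The file and column names are hypothetical, and it uses the Kolmogorov-Smirnov statistic as a stand-in for the Anderson-Darling comparison that Minitab and Companion use, so treat the p-values as rough guides (they ignore the fact that the parameters were estimated):

```python
# A minimal sketch of comparing candidate distribution fits for MeltTemp.
import pandas as pd
from scipy import stats

melt_temp = pd.read_csv("steel_process.csv")["MeltTemp"]   # hypothetical file/column

candidates = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "Weibull": stats.weibull_min,
    "gamma": stats.gamma,
}

for name, dist in candidates.items():
    params = dist.fit(melt_temp)
    ks_stat, ks_p = stats.kstest(melt_temp, dist.cdf, args=params)
    print(f"{name:>9}: KS statistic {ks_stat:.3f} (smaller is a better fit), p = {ks_p:.3f}")
```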

companion monte carlo tool - define model

Once you have entered all of your input settings, your transfer equation, and the lower specification limit, Companion completes 50,000 simulations for the steel production.

Understanding your results

initial monte carlo simulation results

The process performance measurement (Cpk) for your process is 0.417, far short of the minimum standard of 1.33. Companion also indicates that under current conditions, 14 percent of your parts won’t meet the minimum specification.

Finding optimal input settings

The Companion Monte Carlo tool’s smart workflow guides you to the next step for improving your process: optimizing your inputs.

parameter optimization guidance

You set the goal—maximizing the tensile strength—and enter the high and low values for your inputs. Companion does the rest.

parameter optimization dialog

Simulating the new process

After finding the optimal input settings in the ranges you specified, Companion presents the simulated results for the recommended process changes.

monte carlo simulation of tensile strength

The simulation indicates that the optimal settings identified by Companion will virtually eliminate out-of-spec product from your process, with a Cpk of 1.56—a vast improvement that exceeds the 1.33 Cpk standard. Thanks to you, Superlative Auto Parts’ steel products won’t be hitting any bumps in the road.

Getting great results

Figuring out how to improve a process is easier when you have the right tool to do it. With Monte Carlo simulation to assess process capability and Parameter Optimization to identify optimal settings, Companion can help you get there. And with Sensitivity Analysis to pinpoint exactly where to reduce variation, you can further improve your process and get the product results you need.

To try the Monte Carlo simulation tool, as well as Companion's more than 100 other tools for executing and reporting quality projects, learn more and get the free 30-day trial version for you and your team at companionbyminitab.com.

What Does It Mean When Your Probability Plot Has Clusters?


Have you ever had a probability plot that looks like this?

Probability Plot of Patient Weight Before and After Surgery

The probability plot above is based on patient weight (in pounds) after surgery minus patient weight (again, in pounds) before surgery.

The red line appears to go through the data, indicating a good fit to the Normal, but there are clusters of plotting points at the same measured value. This occurs on a probability plot when there are many ties in the data. If the true measurement can take on any value (in other words, if the variable is continuous), then the cause of the clusters on the probability plot is poor measurement resolution.

The Anderson-Darling Normality test typically rejects normality when there is poor measurement resolution. In a previous blog post (Normality Tests and Rounding) I recommended using the Ryan-Joiner test in this scenario. The Ryan-Joiner test generally does not reject normality due to poor measurement resolution. 

In this example, the Ryan-Joiner p-value is above 0.10. A probability plot that supports using a Normal distribution would be helpful to confirm the Ryan-Joiner test results. How can we see a probability plot of the true weight differences? Simulation can be used to show how the true weight differences might look on a probability plot.

The differences in weight were rounded to the nearest pound. In effect, we want to add a random value from -0.5 to +0.5 to each value to get a simulated measurement. The steps are as follows:

  1. Store simulated noise values from -0.5 to +0.5 in a column using Calc > Random Data > Uniform.
  2. Use Calc > Calculator to add the noise column to the original column of data.
  3. Create a normal probability plot using Stat > Basic Statistics > Normality Test.
  4. Repeat steps 1-3 several times if you want to see how the results are affected by the simulated values.
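Here's a rough sketch of those steps in Python, with a hypothetical file standing in for the column of rounded weight differences:

```python
# A minimal sketch of the jitter-and-replot steps above.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

weight_diff = np.loadtxt("weight_diff.txt")            # rounded to the nearest pound
rng = np.random.default_rng(0)

# Steps 1-2: add uniform noise on (-0.5, +0.5) to "undo" the rounding.
simulated = weight_diff + rng.uniform(-0.5, 0.5, size=weight_diff.size)

# Step 3: normal probability plot of the simulated measurements.
stats.probplot(simulated, dist="norm", plot=plt)
plt.title("Normal probability plot of simulated weight differences")
plt.show()

# Step 4: rerun the lines above a few times to see how much the picture
# depends on the particular noise values drawn.
```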

The resulting graph from one iteration of these steps is shown below. It suggests that the Normal distribution is a good model for the difference in weights for this surgery.

Probability plot with simulated measurements

 

How Many Samples Do You Need to Be Confident Your Product Is Good?


How many samples do you need to be 95% confident that at least 95%—or even 99%—of your product is good?

The answer depends on the type of response variable you are using, categorical or continuous. The type of response will dictate whether you'll use:

  1. Attribute Sampling: Determine the sample size for a categorical response that classifies each unit as Good or Bad (or, perhaps, In-spec or Out-of-spec).
     
  2. Variables Sampling: Determine the sample size for a continuous measurement that follows a Normal distribution.

The attribute sampling approach is valid regardless of the underlying distribution of the data. The variables sampling approach has a strict normality assumption, but requires fewer samples.

In this blog post, I'll focus on the attribute approach.

Attribute Sampling

A simple formula gives you the sample size required to make a 95% confidence statement about the probability an item will be in-spec when your sample of size n has zero defects:

n = ln(1 – Confidence) / ln(Reliability) = ln(0.05) / ln(Reliability)

where the reliability is the probability of an in-spec item.

For a reliability of 0.95 or 95%, n = ln(0.05) / ln(0.95), which rounds up to 59 samples.

For a reliability of 0.99 or 99%, n = ln(0.05) / ln(0.99), which rounds up to 299 samples.
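If you'd rather compute these sample sizes programmatically, here's a rough sketch; the zero-defect function comes straight from the formula above, and the general function finds the smallest n for a plan that allows c defects using the binomial distribution:

```python
# A minimal sketch of attribute sampling sample-size calculations.
import math
from scipy import stats

def n_zero_defects(reliability, confidence=0.95):
    """Smallest n with zero allowed defects: reliability**n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

def n_c_defects(reliability, c, confidence=0.95):
    """Smallest n so that P(at most c defects) <= 1 - confidence
    when the true defect rate is 1 - reliability (a C=c plan)."""
    n = c + 1
    while stats.binom.cdf(c, n, 1 - reliability) > 1 - confidence:
        n += 1
    return n

print(n_zero_defects(0.95))    # 59  (95% reliability, 95% confidence)
print(n_zero_defects(0.99))    # 299 (99% reliability, 95% confidence)
print(n_c_defects(0.95, c=1))  # 93  (allowing one defect in the sample)
```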

Of course, if you don't feel like calculating this manually, you can use the Stat > Basic Statistics > 1 Proportion dialog box in Minitab to see the reliability levels for different sample sizes.  

one-sample-proportion

1-proportion test

1-proportion output

These two sampling plans are really just C=0 Acceptance Sampling plans with an infinite lot size. The same sample sizes can be generated using Stat > Quality Tools > Acceptance Sampling by Attributes by:

  1. Setting RQL at 5% for 95% reliability or 1% for 99% reliability.
  2. Setting the Consumer’s Risk (β) at 0.05, which results in a 95% confidence level.
  3. Setting AQL at an arbitrary value lower than the RQL, such as 0.1%.
  4. Setting Producer’s Risk (α) at an arbitrary high value, such as 0.5 (note, α must be less than 1-β to run).

By changing RQL to 1%, the following C=0 plan can be obtained:

If you want to make the same confidence statements while allowing 1 or more defects in your sample, the sample size required will be larger. For example, allowing 1 defect in the sample will require a sample size of 93 for the 95% reliability statement. This is a C=1 sampling plan. It can be generated, in this case, by lowering the Producer’s risk to 0.05.

As you can see, the sample size for an acceptance number of 0 is much smaller—in this case, raising the acceptance number from 0 to 1 has raised the sample size from 59 to 93.

Check out this post for more information about acceptance sampling.

 

 
