
The Easiest Way to Do Capability Analysis


A while back, I offered an overview of process capability analysis that emphasized the importance of matching your analysis to the distribution of your data.

If you're already familiar with different types of distributions, Minitab makes it easy to identify what type of data you're working with, or to transform your data to approximate the normal distribution.

But what if you're not so great with probability distributions, or you're not sure about how or even if you should transform your data? You can still do capability analysis with the Assistant in Minitab Statistical Software. Even if you're a stats whiz, the Assistant's easy-to-follow output can make the task of explaining your results much easier to people who don't share your expertise. 

Let's walk through an example of capability analysis with non-normal data, using the Assistant.

The Easy Way to Do Capability Analysis on Non-normal Data

For this example, we'll use a data set that's included with Minitab Statistical Software. (If you're not already using Minitab, download the free trial and follow along.) Click File > Open Worksheet, and then click the button labeled "Look in Minitab Sample Data folder." Open the dataset named Tiles.

This is data from a manufacturer of floor tiles. The company is concerned about the flexibility of the tiles, and the data set contains data collected on 10 tiles produced on each of 10 consecutive working days.

Select Assistant > Capability Analysis in Minitab:

Capability Analysis

The Assistant presents you with a simple decision tree that will guide you to the right kind of capability analysis:

The first decision we need to make is what type of data we've collected—Continuous or Attribute. If you're not sure what the difference is, you can just click the "Data Type" diamond to see a straightforward explanation.

Attribute data involve counts and characteristics, while continuous data involve measurements of factors such as height, length, and weight. So it's pretty easy to recognize that the measurements of tile flexibility are continuous data. With that question settled, the Assistant leads us to the "Capability Analysis" button:

capability analysis option

Clicking that button brings up the dialog shown below. Our data are all in the "Warping" column of the worksheet.  The subgroup size is "10", since we measured 10 samples on each day. Enter "8" as the upper spec limit, because that's the customer's guideline.

capability dialog

Then press OK.

Transforming Non-normal Data

Uh-oh—the Assistant immediately gives us a warning. Our data don't meet the assumption of normality:

normality test

When you click "Yes," the Assistant will transform the data automatically (using the Box-Cox transformation) and continue the analysis. Once the analysis is complete, you'll get a Report Card that alerts you if there are potential issues with your analysis, a Diagnostic Report that assesses the stability of your process and the normality of your data, a detailed Process Performance Report, and Summary Report that captures the bottom line results of your analysis and presents them in plain language.

capability analysis summary report

The Ppk of 0.75 is below the typical industry acceptability benchmark of 1.33, so this process is not capable. Looks like we have some opportunities to improve the quality of our process!
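
If you want to see the arithmetic behind an upper-spec-only Ppk, here is a minimal Python sketch. It uses a hypothetical array of warping measurements (not the Tiles data), applies a Box-Cox transformation before computing the index (the same general idea the Assistant uses for non-normal data), and uses the upper spec of 8 from the example. It is an illustration of the formula, not Minitab's exact implementation.

```python
import numpy as np
from scipy import stats
from scipy.special import boxcox as boxcox_transform

# Hypothetical warping measurements standing in for the Tiles data
rng = np.random.default_rng(1)
warping = rng.gamma(shape=2.0, scale=1.2, size=100)

usl = 8.0  # upper spec limit from the example

# Box-Cox requires positive data; returns the transformed data and lambda
transformed, lam = stats.boxcox(warping)

# The spec limit must be transformed with the same lambda
usl_t = boxcox_transform(usl, lam)

# Ppk with only an upper spec: (USL - mean) / (3 * overall standard deviation)
ppk = (usl_t - transformed.mean()) / (3 * transformed.std(ddof=1))
print(f"lambda = {lam:.3f}, Ppk on the transformed scale = {ppk:.2f}")
```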

Comparing Before and After Capability Analysis Results

Once we've made adjustments to the process, we can also use the Assistant to see how much of an impact those changes have had. The Assistant's Before/After Capability Analysis is just what we need:

Before/After Capability Analysis

The dialog box for this analysis is very similar to that for the first capability analysis we performed, but this time we can select a column of data from before we made improvements (Baseline process data), and a column of data collected after our improvements were implemented: 

before-after capability analysis dialog box

Press OK and the Assistant will again check if you want to transform your data for normality before it proceeds with the analysis. Then it presents us with a series of reports that make it easy to see the impact of our changes. The summary report gives you the bottom line quickly. 

before/after capability analysis summary report

The changes did affect the process variability, and this process now has a Ppk of 1.94, a vast improvement over the original value of 0.75 and well above the 1.33 benchmark for acceptability.

I hope this post helps you see how the Assistant can make performing capability analyses easier, and that you'll be able to get more value from your process data as a result. 

 


What Should I Do If My Data Is Not Normal?


As a Minitab trainer, one of the most common questions I get from training participants is "what should I do when my data isn’t normal?" A large number of statistical tests are based on the assumption of normality, so not having data that is normally distributed typically instills a lot of fear.

Many practitioners suggest that if your data are not normal, you should use a nonparametric version of the test, which does not assume normality. In my experience, if you have non-normal data, it is worth considering the nonparametric version of the test you want to run. More importantly, though, if the test you are running is not sensitive to normality, you can still run it even if the data are not normal.

What tests are robust to the assumption of normality? 

Several tests are "robust" to the assumption of normality, including t-tests (1-sample, 2-sample, and paired t-tests), Analysis of Variance (ANOVA), Regression, and Design of Experiments (DOE). The trick I use to remember which tests are robust to normality is to recognize that tests which make inferences about means, or about the expected average response at certain factor levels, are generally robust to normality. That is why even though normality is an underlying assumption for the tests above, they should work for nonnormal data almost as well as if the data (or residuals) were normal. 

The following example illustrates this point. (You can download the data set here and follow along if you would like. If you don't have Minitab, download the free 30-day trial version, too.)

Generating random data from a Gamma distribution with a scale of 1 and a shape of 2 will produce data that is bounded at 0, and highly skewed. The theoretical mean of this data is 2. It should be clear that the data is not normal—not even approximately normal!

What if I want to test the hypothesis that the population mean is 2? Would I be able to do it effectively with a 1-sample t-test? If normality is not strictly required, I should be able to run the test and reach the correct conclusion about 95% of the time, or, in more technical terms, with roughly 95% confidence, right?

To test this, I am providing a little bit of code that will generate 40 samples from a Gamma distribution with scale 1 and shape 2, and will store the p-value for a 1-sample t-test in column C9 of an empty worksheet. To reproduce similar results on your own, copy the following commands and paste them into Notepad. Save the file with the name “p-values.mtb,” and make sure to use the double quotes as part of the file name to ensure that the extension becomes MTB and not the default TXT.

Once the file is saved, choose File > Other Files > Run an Exec. In Number of times to execute the file, enter a relatively large number, such as 1000, then hit Select File, browse to the location on your computer where the p-values.MTB file is saved, and select the executable you just created. Click Open. Grab a cup of coffee, and once you get back to your desk, the simulation should be about done. Once the simulation is complete, create a histogram of column C9 (p-values) by choosing Graph > Histogram. This shows that the p-values are uniformly distributed between 0 and 1, just as they should be when the null hypothesis is true.
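
If you prefer to reproduce the idea outside Minitab, here is a rough Python sketch of the same simulation (it is not the Minitab exec described above): it repeatedly draws 40 observations from a Gamma distribution with shape 2 and scale 1 and stores the p-value of a 1-sample t-test of the null hypothesis that the mean is 2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n = 1000, 40
p_values = np.empty(n_sims)

for i in range(n_sims):
    sample = rng.gamma(shape=2.0, scale=1.0, size=n)   # true mean = shape * scale = 2
    p_values[i] = stats.ttest_1samp(sample, popmean=2.0).pvalue

# When the null hypothesis is true, p-values should be roughly uniform on [0, 1]
print("Fraction of tests that fail to reject at alpha = 0.05:",
      np.mean(p_values > 0.05))
```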

What percentage of the time in the simulation above did we fail to reject the null?  In the simulation I ran, this happened in 95.3% of the instances.  Amazing, huh?  What does this really mean, though?  In layman’s terms, the test is working with approximately 95% confidence, despite the fact that the data is clearly not normal!  

For more on simulated p-values for a 1-sample t-test, be sure to check out Rob Kelly’s recent post on “p-value roulette.”

Note that if you’re following along and running your own simulation, the histogram and output above will most likely not match yours precisely. This is the beauty of simulated data—it’s going to be a little different each time. However, even with slightly different numbers, the conclusion you reach from the analysis should be about the same.

At this point, I hope you feel a little more comfortable about using these tests that are robust to normality, even if your data don't meet the normality assumption. The Assistant menu can also help you, as some of these “rules of thumb” are built into the Report Card, which informs you that this particular test would be accurate even with nonnormal data. 

It is also worth mentioning that the unusual data check in the Assistant even offers a warning about some unusual observations. These observations could have been flagged as outliers if the data were normally distributed. In this case, though, since we know this data was generated at random, we can be confident that they are not outliers, but legitimate observations that reflect an underlying nonnormal distribution.

Whenever a normality test fails, an important skill to develop is determining why the data are not normal. A few common reasons include:

  • The underlying distribution is nonnormal.
  • Outliers or mixed distributions are present.
  • A low discrimination gauge is used.
  • Skewness is present in the data.
  • The sample size is large.

These are some of the topics that Daniel Griffith and I have been researching and presenting at conferences recently, and they are among the topics that are discussed in detail in Minitab's newest training course, Analysis of Nonnormal Data for Quality.

What tests are not robust to the normality assumption?

If tests around means are—in general—robust to the normality assumption, then when is normality a critical assumption? In general, tests that try to make inferences about the tails of the distribution will require the distribution assumption to be met. 

Some examples include:

1. Capability Analysis and determining Cpk and Ppk
2. Tolerance Intervals
3. Acceptance Sampling for variable data
4. Reliability Analysis to estimate low or high percentiles

Tests for equal variances are also known to be extremely sensitive to the normality assumption.

If you would like to learn how to assess normality by understanding the sensitivity of normality tests under different scenarios, or to analyze nonnormal data when the assumption is extremely critical, or not so critical, you should check out our new course on Analysis of Nonnormal Data for Quality.

 

 

A Field Guide to Statistical Distributions


by Matthew Barsalou, guest blogger. 

The old saying “if it walks like a duck, quacks like a duck and looks like a duck, then it must be a duck” may be appropriate in bird watching; however, the same idea can’t be applied when observing a statistical distribution. The dedicated ornithologist is often armed with binoculars and a field guide to the local birds and this should be sufficient. A statologist (I just made the word up, feel free to use it) on the other hand, is ill-equipped for the visual identification of his or her targets.

Normal, Student's t, Chi-Square, and F Distributions

Notice the upper two distributions in Figure 1. The normal distribution and Student’s t distribution may appear similar. However, Student’s t distribution is based on n − 1 degrees of freedom, because it accounts for estimating the standard deviation from the sample rather than knowing it. This may appear to be a minor difference, but when n is small, Student’s t distribution has noticeably heavier tails. Student’s t distribution approaches the normal distribution as the sample size increases, but it never exactly matches the shape of the normal distribution for any finite sample size.
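
To see those heavier tails numerically, here is a small Python check comparing the upper-tail probability beyond 2 for a standard normal distribution and for t distributions with a few illustrative (hypothetical) degrees of freedom:

```python
from scipy import stats

# Probability of observing a value greater than 2 under each distribution
for df in (3, 10, 30):
    print(f"P(T > 2) with {df} degrees of freedom: {stats.t.sf(2, df):.4f}")
print(f"P(Z > 2) for the standard normal:        {stats.norm.sf(2):.4f}")
```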

Observe the Chi-square and F distribution in the lower half of figure 1. The shapes of the distributions can vary and even the most astute observer will not be able to differentiate between them by eye. Many distributions can be sneaky like that. It is a part of their nature that we must accept as we can’t change it.

Figure 1

Binomial, Hypergeometric, Poisson, and Laplace Distributions

Notice the distributions illustrated in figure 2. A bird watcher may suddenly encounter four birds sitting in a tree; a quick check of a reference book may help to determine that they are all of a different species. The same can’t always be said for statistical distributions. Observe the binomial distribution, hypergeometric distribution and Poisson distribution. We can’t even be sure the three are not the same distribution. If they are together with a Laplace distribution, an observer may conclude “one of these does not appear to be the same as the others.” But they are all different, which our eyes alone may fail to tell us.

Figure 2

Weibull, Cauchy, Loglogistic, and Logistic Distributions

Suppose we observe the four distributions in figure 3. What are they? Could you tell if they were not labeled? We must identify them correctly before we can do anything with them. One is a Weibull distribution, but all four could conceivably be various Weibull distributions. The shape of the Weibull distribution varies based upon the shape parameter (κ) and scale parameter (λ). The Weibull distribution is a useful, but potentially devious, distribution that can be much like the double-barred finch, which may be mistaken for an owl at first glance.

Figure 3

Attempting to visually identify a statistical distribution can be very risky. Many distributions, such as the Chi-square and F distributions, change shape drastically based on the number of degrees of freedom. Figure 4 shows various shapes for the Chi-square, F, and Weibull distributions. Figure 4 also compares a standard normal distribution with a standard deviation of one to a t distribution with 27 degrees of freedom; notice how the shapes overlap to the point where it is no longer possible to tell the two distributions apart.

Although there is no definitive Field Guide to Statistical Distributions to guide us, there are formulas available to correctly identify statistical distributions. We can also use Minitab Statistical Software to identify our distribution.

Figure 4

Go to Stat > Quality Tools > Individual Distribution Identification... and enter the column containing the data and the subgroup size. The results can be observed in either the session window (figure 5) or the graphical outputs shown in figures 6 through 9.

In this case, we can conclude we are observing a 3-parameter Weibull distribution, based on the p-value of 0.364.
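
Minitab's Individual Distribution Identification fits each candidate distribution and reports an Anderson-Darling statistic and p-value. As a rough stand-in outside Minitab, the sketch below fits a 3-parameter Weibull to a hypothetical data column with SciPy and checks the fit with a Kolmogorov-Smirnov test (not the Anderson-Darling test Minitab uses). Note that using parameters estimated from the same data makes the KS p-value optimistic.

```python
import numpy as np
from scipy import stats

# Hypothetical data column
rng = np.random.default_rng(7)
data = stats.weibull_min.rvs(c=1.8, loc=5.0, scale=3.0, size=100, random_state=rng)

# Fit a 3-parameter Weibull (shape c, location, scale)
c, loc, scale = stats.weibull_min.fit(data)
print(f"shape = {c:.2f}, location = {loc:.2f}, scale = {scale:.2f}")

# Goodness-of-fit check against the fitted distribution
ks = stats.kstest(data, "weibull_min", args=(c, loc, scale))
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```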

Figure 5

 

Figure 6

Figure 7

Figure 8

Figure 9

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

 

Improving Recycling Processes at Rose-Hulman, Part III


In previous posts, I discussed the results of a recycling project done by Six Sigma students at Rose-Hulman Institute of Technology last spring. (If you’re playing catch up, you can read Part I and Part II.)

The students did an awesome job reducing the amount of recycling thrown into the normal trash cans across all of the institution’s academic buildings. At the end of the spring 2014 quarter, recyclable items made up 24% of the trash cans' contents (by weight). At the beginning of that quarter, the figure was 36%, so you can see that they were very successful in reducing this percentage!

The fall quarter (2015) brought a new set of Six Sigma students to Rose-Hulman who were just as dedicated to reducing the amount of recycling thrown into normal trash cans, and I want to cover their success in this post, as well as some of the neat statistical methods they used when completing their project.

Fall 2015 goals

This time around, the students wanted to at least maintain or improve on the percentage the spring 2014 students achieved. They set out with a specific goal of reducing the amount of recycling in the trash to 20% by weight.

In order to further reduce the recyclables in the academic buildings in fall 2015, the standard “Define, Measure, Analyze, Improve, Control” (DMAIC) methodology of Six Sigma was once again implemented. The main project goal focused on standardizing the recycling process within the buildings, and their plan to reduce the amount of recyclables focused on optimizing the operating procedure for collecting recyclables in all academic building areas (excluding classrooms) where trash and recycling are collected.

Many of the same DMAIC tools used by the spring 2014 students were also used here, including Critical to Quality diagrams, process maps, Attribute Agreement Analysis, Gage R&R, statistical plots, FMEA, and regression, among many others.

Making and measuring improvements

The spring 2014 initiative added recycling bins to every classroom, which created a measurable improvement. The fall 2015 effort focused on improvement through standardization of operation. For example, many areas in the academic buildings suffer from random placement and arrangement of trash cans and recycling bins. The students thought standardization of bin areas (one trash, one plastic/aluminum recycling, and one paper recycling) would lessen the confusion of recycling, and clear signage and stickers on identically shaped trash cans and recycling bins would be better visual cues of where to place waste of both kinds.

For fall 2015, there were seven teams, and they were assigned different academic building floors (not including classrooms) and common areas. Unlike the spring 2014 data collection, the teams did not combine the trash from their assigned areas. They treated each recycling station as a unique data point.

After implementing the improvements to standardize the bins, the teams collected data for four days across twenty-nine total stations. Thus, there were a total of 116 fall 2015 improvement percentages. The fall 2015 students used the post-improvement percentage of recyclables in the trash from spring 2014 (24%) as their baseline for determining improvement in fall 2015.

The descriptive statistics for the percentage of recyclables (by weight) in the trash were as follows:

Below, the students put together a histogram and a boxplot of the data using Minitab Statistical Software. Over half of the stations (61 out of 116) had less than 5% recyclables in the trash. Forty-six of the 116 recycling stations had no recyclables. The value of the third quartile (16.6%) meant that 75% of the stations had less than 16.6% recyclables. The descriptive statistics above showed that the sample mean was much larger than the sample median, and the graphs confirmed why: the data have a strongly positively skewed shape.

Even though the 116 data points didn’t follow a normal distribution and there was a large mound of 0’s as part of the distribution from collection spots that had no recyclables, the students trusted that the Central Limit Theorem with a sample size of 116 would generate a sampling distribution of the means that was normally distributed. Because of the large sample size and unknown standard deviation, they used a t distribution to create a 95% confidence interval for the true mean percentage of recyclables in the trash for fall 2015.

Also using Minitab, they constructed the 95% confidence interval:

The 95% confidence interval meant that the students were 95% certain that the interval [9.94, 18.22] contains the true mean percentage of recyclables in the trash for fall 2015. At an alpha level of 0.025, they were able to reject the null hypothesis, where H0: μ = 24% versus Ha: μ < 24%, because 24% was not contained in the two-sided 95% confidence interval. (Remember that 24% was the mean percentage of recyclables in trash after the spring 2014 improvement phase.) The null hypothesis for H0: μ = 20% versus Ha: μ < 20% was also rejected. This meant that they had met their goal of reducing the percentage of recyclables in the trash to below 20% for this project!
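
Here is a minimal Python sketch of the t-based confidence interval calculation the students describe, using a hypothetical array of 116 station percentages rather than their actual data:

```python
import numpy as np
from scipy import stats

# Hypothetical percentages of recyclables (by weight) at each station
rng = np.random.default_rng(3)
pct = rng.gamma(shape=0.6, scale=23.0, size=116)   # skewed, many small values

n = len(pct)
mean = pct.mean()
se = pct.std(ddof=1) / np.sqrt(n)

# 95% CI for the mean, using the t distribution with n - 1 degrees of freedom
lower, upper = stats.t.interval(0.95, df=n - 1, loc=mean, scale=se)
print(f"95% CI for the mean percentage: [{lower:.2f}, {upper:.2f}]")
```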

Continuing to analyze the data

The students also subgrouped their data by collection day. Each day consisted of data from 29 recycling stations. The comparative boxplots and individual value plots below show the percentage of recyclables in the trash across the four collection dates. (The horizontal dotted line in the boxplot is the mean from spring 2014’s post-improvement data.)

Though all four collection days have sample means less than 24%, it’s obvious from the boxplots that the first three collection days are clearly below 24%, and the medians from all four days are less than 11%. The individual value plots reveal the large number of 0’s on each day, which represented collection spots that had no recyclables. Both graphs display the positively skewed nature of the data. Because of the positive skewness, each day’s mean is much larger than its median.

How capable was the process?

Next, the students ran a process capability analysis for the seven areas where trash was collected over four days:

The process capability indices were Pp = 0.48 and Ppk = 0.42. (The Pp value corresponds to a 1.44 Sigma Level, while the Ppk value corresponds to a 1.26 Sigma Level.) Recall that the previous Ppk value after improvements in spring 2014 was 0.22. The fall 2015 index is almost double that value!

The students knew that they still needed to account for the total weight of the trash and recyclables by calculating the percentage of recyclables per station. Some collection stations with the highest percentage of recyclables had the lowest total weight, while some stations with the lowest percentage of recyclables had the highest total weight. Instead of strictly using a capability index to indicate their improvement, they incorporated a regression model for the trash weight versus the total weight of trash and recyclables to show that the percentage of recyclables in the trash was less than 20%.

The 95% confidence interval for the true mean slope of the regression line was [0.856, 0.954]. The students were 95% certain that the trash weight was somewhere between 0.856 and 0.954 of the total weight of the collection. Hence, the recycling weight was between 0.046 and 0.144 of the total weight, which is clearly below 20% with 95% confidence. From this, they were able to state, through yet another type of analysis, that there was a statistically significant improvement over the spring 2014 recycling project, and that they met their goal of reducing the percentage of recyclables in the trash to below 20%. Compared to the spring 2014 project, where 24% of the trash was recyclables, the fall 2015 students saved at least 4% more recyclables from ending up in the local landfill!
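
As an illustration of that regression step, here is how a slope and its 95% confidence interval might be computed in Python. The weights below are hypothetical stand-ins for the students' station-level data, and the statsmodels library is assumed to be available:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical station-level weights (kg): total collected and trash-only portion
rng = np.random.default_rng(11)
total_weight = rng.uniform(2, 15, size=116)
trash_weight = 0.9 * total_weight + rng.normal(0, 0.5, size=116)

# Regress trash weight on total weight
X = sm.add_constant(total_weight)
model = sm.OLS(trash_weight, X).fit()

slope_ci = model.conf_int(alpha=0.05)[1]   # row 1 = slope (row 0 = intercept)
print(f"Slope estimate: {model.params[1]:.3f}")
print(f"95% CI for slope: [{slope_ci[0]:.3f}, {slope_ci[1]:.3f}]")
```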

For even more on this topic, be sure to check out Rose-Hulman student Peter Olejnik’s blog posts on how he and the recycling project team at the school used regression to evaluate project results:

Using Regression to Evaluate Project Results, part 1

Using Regression to Evaluate Project Results, part 2

Many thanks to Dr. Diane Evans for her contributions to this post!

Graphing Distributions with Probability Distribution Plots


Scientists who use the Hubble Space Telescope to explore the galaxy receive a stream of digitized images in the form of binary code. In this state, the information is essentially worthless: these 1s and 0s must first be converted into pictures before the scientists can learn anything from them.

The same is true of statistical distributions and parameters that are used to describe sample data. They offer important information, but the numbers can be meaningless without an illustration to help you interpret them. For instance, what does it mean if your data follow a gamma distribution with a scale of 8 and a shape of 7? If the distribution shifts to a shape of 10, is that good or bad? And how would you explain all of this to an audience that is more interested in outcomes than in statistics?

Minitab’s probability distribution plots create the pictures that bring the numbers to life. Even novice users can reap the benefits that come from understanding their data’s distribution. Here are a few examples.

See what you’ve been missing

Estimates of Distribution Parameters output

A building materials manufacturer develops a new process to increase the strength of its I-beams. The output shows that the old process fit a gamma distribution with a scale of 8 and a shape of 7, whereas the new process has a shape of 10. The manufacturer does not know what this change in the shape parameter means.

Probability distribution plots that compare Gamma distributions

Minitab’s probability distribution plots show that the subtle shape change increases the percentage of acceptable beams from 91.4% to 99.5%, an improvement of 8.1 percentage points. Additionally, the right tail appears to be much thicker, which indicates many more unusually strong units. Perhaps these could lead to a premium line of products.
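
To reproduce this kind of comparison outside the plot, here is a short Python sketch. The lower strength limit below is a hypothetical value backed out from the 91.4% figure for the old process, since the post does not state the actual spec:

```python
from scipy import stats

old = stats.gamma(a=7, scale=8)    # old process: shape 7, scale 8
new = stats.gamma(a=10, scale=8)   # new process: shape 10, scale 8

# Hypothetical lower strength limit chosen so the old process passes 91.4%
lsl = old.ppf(1 - 0.914)

print(f"Assumed lower limit: {lsl:.1f}")
print(f"Old process acceptable: {old.sf(lsl):.1%}")
print(f"New process acceptable: {new.sf(lsl):.1%}")
```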

Communicate your results

A quality improvement specialist at a grocery store chain wants to implement a new but expensive program to reduce discrepancies between the item’s shelf price and the amount that is charged at the register. No difference in prices is ideal, but any difference within the range of ± 0.5% is considered acceptable.

Descriptive statistics of grocery program results before and after

In the pilot study, the mean improvement is tiny and the president doesn’t see the benefits of the smaller standard deviation. Therefore, the president is reluctant to approve the costly program.

Probability distribution plots that compare the before and after for the pilot study

The specialist knows that the tighter distribution is key to the program’s success. To illustrate this, she creates this plot to show that the differences are clustered much closer to zero and most are in the acceptable range. Now the president can see the improvement.

Compare distributions

The fabrication department of a farm equipment manufacturer counts the number of tractor chassis that are completed per hour. A Poisson distribution with a mean of 3.2 best describes the sample data. However, the test lab prefers to use an analysis that requires a normal distribution and wants to know if it is appropriate. If the normal distribution does not approximate the Poisson distribution, then the test results are invalid.

Probability distribution plot that compares a Poisson distribution to a Normal distribution

The distribution plot can easily compare the known distribution with a normal distribution. In this case, lab workers can clearly see that the normal distribution, as well as the analyses that require it, won’t be a good fit.

How to create probability distribution plots in Minitab

You can easily create a probability distribution plot to visualize and compare distributions, and even to scrutinize an area of interest. For example, an analyst wants to interview customers who have customer satisfaction scores between 115 and 135. Minitab’s Individual Distribution Identification feature shows that these scores are normally distributed with a mean of 100 and a standard deviation of 15. However, the analyst can’t visualize where his subjects fall within the range of scores or their proportion of the entire distribution.

Dialog box to create probability distribution plot

  1. Choose Graph > Probability Distribution Plot > View Probability.
  2. Click OK.
  3. From Distribution, choose Normal.
  4. In Mean, type 100.
  5. In Standard deviation, type 15.

Shade area dialog for creating probability distribution plots

  1. Click the Shaded Area tab.
  2. In Define Shaded Area By, choose X Value.
  3. Click Middle.
  4. In X value 1, type 115.
  5. In X value 2, type 135.
  6. Click OK.

Probability distribution plot that shows the probability of IQ scores from 115 to 135

The scores in the region of interest (115-135) represent 14.9% of the population. This somewhat small percentage suggests that the analyst may have to expend extra effort to find a sufficient number of qualified subjects.
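
You can verify that shaded-area percentage directly from the normal CDF; here is a quick check in Python:

```python
from scipy.stats import norm

# P(115 <= X <= 135) for X ~ Normal(mean = 100, sd = 15)
p = norm.cdf(135, loc=100, scale=15) - norm.cdf(115, loc=100, scale=15)
print(f"{p:.1%}")   # about 14.9%
```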

Putting probability distribution plots to use

Probability distribution plots provide valuable insight because they reveal the deeper meaning of your distributions. Use these graphs to highlight the effect of changing distributions and parameter values, to show where target values fall in a distribution, and to view the proportions that are associated with shaded areas. These simple plots also clearly and easily communicate these advanced concepts to a non-statistical audience.

Don’t let your audience be confused by hard-to-understand concepts and numbers. Instead, use Minitab to illustrate what your data are telling you.

Pencils and Plots: Assessing the Normality of Data


By Matthew Barsalou, guest blogger.  

Many statistical tests assume the data being tested came from a normal distribution. Violating the assumption of normality can result in incorrect conclusions. For example, a Z test may indicate a new process is more efficient than an older process when this is not true. This could result in a capital investment for equipment that actually results in higher costs in the long run.

Statistical Process Control (SPC) requires either normally distributed data or data that have been transformed toward normality. It would be very risky to monitor a process with SPC charts created from data that violate the assumption of normality.

What can we do if the assumption of normality is critical to so many statistical methods? We can construct a probability plot to test this assumption.

Those of us who are a bit old-fashioned can construct a probability plot by hand, by plotting the ordered values (Xj) against the observed cumulative frequencies ((j − 0.5)/n). Using the numbers 16, 21, 20, 19, 18, and 15, we would construct a normal probability plot by first creating the table shown below.

j    Xj    (j − 0.5)/6
1    15    0.083
2    16    0.250
3    18    0.417
4    19    0.583
5    20    0.750
6    21    0.917

We then plot the results as shown in the figure below.

normal probability plot
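
For larger data sets, the same hand calculation is easy to script. The sketch below computes the plotting positions and the corresponding normal quantiles for the six values above:

```python
import numpy as np
from scipy import stats

x = np.sort(np.array([16, 21, 20, 19, 18, 15]))
n = len(x)
j = np.arange(1, n + 1)

# Plotting positions (j - 0.5) / n and the normal quantiles they map to
positions = (j - 0.5) / n
quantiles = stats.norm.ppf(positions)

for xi, p, q in zip(x, positions, quantiles):
    print(f"x = {xi:2d}  position = {p:.3f}  normal quantile = {q:+.3f}")
```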

That's fine for a small data set, but nobody wants to plot hundreds or thousands of data points by hand. Fortunately, we can also use Minitab Statistical Software to assess the normality of data. Minitab uses the Anderson-Darling test, which compares the actual distribution to a theoretical normal distribution. Anderson-Darling test’s null hypothesis is “The distribution is normal.”

Anderson-Darling test:

H0: The data follow a normal distribution.

Ha: The data don’t follow a normal distribution.

Test statistic: A² = −N − S, where

S = Σ (i = 1 to N) [(2i − 1)/N] [ln F(Y(i)) + ln(1 − F(Y(N+1−i)))],

the Y(i) are the ordered data, and F is the cumulative distribution function of the specified distribution. We can assess the results by looking at the resulting p-value.
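
SciPy also ships an Anderson-Darling test for normality; unlike Minitab, it reports the statistic with critical values rather than a p-value. A minimal sketch, using a hypothetical sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(loc=50, scale=4, size=100)   # hypothetical sample

result = stats.anderson(data, dist="norm")
print(f"A-squared = {result.statistic:.3f}")
for cv, sig in zip(result.critical_values, result.significance_level):
    print(f"  reject normality at the {sig}% level if A-squared > {cv:.3f}")
```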

The figure below shows a normal distribution with a sample size of 27. The same data are shown in a histogram, probability plot, dot plot, and box plot.

Probability plot

The next figure shows a normal distribution with a sample size of 208. Notice how the data are concentrated in the center of the histogram, probability plot, dot plot, and box plot.

normal probability plot

A Laplace distribution with a sample size of 208 is shown below. Visually, this data almost resembles a normal distribution; however, the Minitab-generated p-value of less than 0.05 tells us that the data are not normally distributed.

normality plot

The figure below shows a uniform distribution with a sample size of 270. Even without looking at the P value we can quickly see that the data is not normally distributed.

 

normal probability distribution assumption plot

Back in the days of hand-drawn probability plots, the “fat pencil test” was often used to evaluate normality. The data was plotted and the distribution was considered normal if all of the data points could be covered by a thick pencil. The fat pencil test was quick and easy. Unfortunately, it is not as accurate as the Anderson-Darling test and is not a substitute for an actual test.

probability plot of normal
Fat pencil test with normally distributed data

Probability Plot of Non-Normal
Fat pencil test with non-normally distributed data

 

The proper identification of a statistical distribution is critical for properly performing many types of hypothesis tests and for control charting. Fortunately, we can now assess our data without having to rely on hand-drawn plots and a large-diameter pencil.

To test for normality, go to the Graph menu in Minitab and select Probability Plot.

Selecting the probability plot

Click on OK to select Single if you are only looking at one column of data.

probability plot selection

Select your column of data and then click OK.

Single Probability Plot

Minitab will generate a probability plot of your data. Notice that the p-value below is 0.829. Using an alpha level of 0.05 for 95% confidence, we would fail to reject the null hypothesis that our data come from a normal distribution.

Probability Plot of C1

Using Minitab to test data for normality is far more reliable than a fat pencil test and generally quicker and easier. However, the fat pencil test may still be a viable option if you absolutely must analyze your data during a power outage. 

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

T-tests for Speed Tests: How Fast Is Internet Speed?


Every now and then I’ll test my Internet speed at home using sites such as http://speedtest.comcast.net or http://www.att.com/speedtest/. My urge to perform these tests may stem from the cool-looking interfaces these sites employ, displaying the results on analog speedometers and RPM meters. It may also stem from the validation I'm looking for that I'm "getting what I am paying for," although I realize that there are other factors that determine the Internet speed you ultimately end up with when you browse the Web.

Recently I started thinking about the distribution of these speeds. If I were to run enough tests, would these speeds be normally distributed?

When performing an Internet speed test, you are given an estimated download and upload speed. The download speed is the rate at which data travels from the Internet to your device, and the upload speed is the rate at which data travels from your device to the Internet. I was also curious as to whether the population means of these speeds were statistically different.

Is the Data Normally Distributed?

I ran 30 speed tests from my office at Minitab and recorded the download and upload data in a Minitab Statistical Software worksheet. Here is a sample of the data:

I went to Stat > Basic Statistics > Normality Test. Here are the probability plots for download and upload speed.

I’ll be comparing the p-values to an alpha level of 0.05. Both probability plots show p-values greater than alpha, so we do not have enough evidence to reject the null hypothesis. As a quick reminder, the null hypothesis is that our data follow a normal distribution, so we can assume normality.

Is There a Difference Between Upload and Download Speed?

Let’s find out if there was a statistical difference between the download speed and the upload speed.

Go to Stat > Basic Statistics > 2-Sample t:

I chose “Each Sample is in its own column” under the dropdown, and entered in the column for download speed for Sample 1 and upload speed for Sample 2.

If you click on Options, you’ll see a checkbox for "Assume Equal Variances." Checking this box will result in a slightly more powerful 2-sample t-test. But how do I know whether the variances are equal or not? By using a quick test in Minitab!

I cancelled out of the 2-Sample t dialog window and quickly ran an Equal Variances test (Stat > Basic Statistics > 2 Variances) and received these results:

Given that our p-value is greater than an alpha of 0.05, we don’t have enough evidence to say that the two variances are statistically different. Therefore, we can go back to the 2-Sample t test and check the box for "Assume Equal Variances."

Here's the output from my 2-Sample t-test:

Since our p-value is less than 0.05, we can reject the null hypothesis (that both means are the same) and say that the population means for download and upload speed are statistically different.
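
Outside Minitab, the same two steps (checking equal variances, then running a pooled 2-sample t-test) might look like this in Python, with hypothetical speed measurements standing in for the recorded data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
download = rng.normal(60, 5, size=30)   # hypothetical download speeds (Mbps)
upload = rng.normal(65, 5, size=30)     # hypothetical upload speeds (Mbps)

# Step 1: test for equal variances (Levene's test is robust to non-normality)
lev = stats.levene(download, upload)
print(f"Levene p-value: {lev.pvalue:.3f}")

# Step 2: pooled (equal-variance) 2-sample t-test
tt = stats.ttest_ind(download, upload, equal_var=True)
print(f"t = {tt.statistic:.2f}, p-value = {tt.pvalue:.4f}")
```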

Vrrrrooooooooom!

I was curious as to why the upload speeds were higher than the download speeds during my testing. Whenever I’ve tested speeds at my house, I’ve always seen the reverse.

I asked someone here at Minitab who is well versed in network setup, and he said that there could have been more bandwidth consumption from my coworkers than normal at the time of data collection. This extra consumption can push the download speeds below the upload speeds. He also said that the nature of how the Internet is configured at a company can be a contributing factor as well. 

If you were given an expected download rate by your cable company, you could add to this experiment by performing a 1-Sample t-test. The expected download rate would serve as your hypothesized mean. You would then be able to perform a hypothesis test to see if your mean is statistically different from your hypothesized mean.
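
That follow-up test is a one-liner; here is a sketch using a hypothetical advertised rate of 75 Mbps as the hypothesized mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
download = rng.normal(60, 5, size=30)   # hypothetical download speeds (Mbps)

# H0: the mean download speed equals the advertised 75 Mbps
result = stats.ttest_1samp(download, popmean=75)
print(f"t = {result.statistic:.2f}, p-value = {result.pvalue:.4f}")
```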

If you find that you're not getting the speeds you wanted, I wouldn't start running around with pitchforks just yet. According to http://www.cnet.com/how-to/how-to-find-a-reliable-network-speed-test/ , accuracy and consistency in speeds may depend on what online speed test you are using. But comparing the different speed testing tools is an analysis for another day! 

 

 

 

 

Lessons from a Statistical Analysis Gone Wrong, part 1


I don't like the taste of crow. That's a shame, because I'm about to eat a huge helping of it. 

I'm going to tell you how I messed up an analysis. But in the process, I learned some new lessons and was reminded of some older ones I should remember to apply more carefully. 

This Failure Starts in a Victory

My mistake originated in the 2015 Triple Crown victory of American Pharoah. I'm no racing enthusiast, but I knew this horse had ended almost four decades of Triple Crown disappointments, and that was exciting. I'd never seen a Triple Crown won before. It hadn't happened since 1978. 

So when an acquaintance asked to contribute a guest post to the Minitab Blog that compared American Pharoah with previous Triple Crown contenders, including the record-shattering Secretariat, who took the Triple Crown in 1973, I eagerly accepted. 

In reviewing the post, I checked and replicated the contributor's analysis. It was a fun post, and I was excited about publishing it. But a few days after it went live, I had to remove it: the analysis was not acceptable. 

To explain how I made my mistake, I'll need to review that analysis. 

Comparing American Pharoah and Secretariat

In the post, we used Minitab's statistical software to compare Secretariat's performance to other winners of Triple Crown races. 

Since 1926, the Belmont Stakes has been the longest of the three races at 1.5 miles. The analysis began by charting 89 years of winning horse times: 

Only two data points were outside of the I-chart's control limits:

  • The fastest winner, Secretariat's 1973 time of 144 seconds
  • The slowest winner, High Echelon's 1970 time of 154 seconds

The average winning time was 148.81 seconds, which Secretariat beat by more than 4 seconds. 

Applying a Capability Approach to the Race Data

Next, the analysis approached the data from a capability perspective: Secretariat's time was used as a lower spec limit, and the analysis sought to assess the probability of another horse beating that time. 

The way you assess capability depends on the distribution of your data, and a normality test in Minitab showed this data to be nonnormal.

When you run Minitab's normal capability analysis, you can elect to apply the Johnson transformation, which can automatically transform many nonnormal distributions before the capability analysis is performed. This is an extremely convenient feature, but here's where I made my mistake. 

Running the capability analysis with Johnson transformation, using Secretariat's 144-second time as a lower spec limit, produced the following output:

The analysis found a 0.36% chance of any horse beating Secretariat's time, making it very unlikely indeed.

The same method was applied to Kentucky Derby and Preakness data. 

We found a 5.54% chance of a horse beating Secretariat's Kentucky Derby time.

We found a 3.5% probability of a horse beating Secretariat's Preakness time.

Despite the billions of dollars and countless time and effort spent trying to make thoroughbred horses faster over the past 43 years, no one has yet beaten “Big Red,” as Secretariat was known. So the analysis indicated that American Pharoah may be a great horse, but he is no Secretariat. 

That conclusion may well be true...but it turns out we can't use this analysis to make that assertion. 

My Mistake Is Discovered, and the Analysis Unravels

Here's where I start chewing those crow feathers. A day or so after sharing the post about American Pharoah, a reader sent the following comment: 

Why does Minitab allow a Johnson Transformation on this data when using Quality Tools > Capability Analysis > Normal > Transform, but does not allow a transformation when using Quality Tools > Johnson Transformation? Or could I be doing something wrong? 

Interesting question. Honestly, it hadn't even occurred to me to try to run the Johnson transformation on the data by itself.

But if the Johnson Transformation worked when performed as part of the capability analysis, it ought to work when applied outside of that analysis, too. 

I suspected the person who asked this question might have just checked a wrong option in the dialog box. So I tried running the Johnson Transformation on the data by itself.

The following note appeared in Minitab's session window: 

no transformation is made

Uh oh.  

Our reader hadn't done anything wrong, but it was looking like I made an error somewhere. But where?

I'll show you exactly where I made my mistake in my next post. 

 

Photo of American Pharoah used under Creative Commons license 2.0.  Source: Maryland GovPics https://www.flickr.com/people/64018555@N03 


Lessons from a Statistical Analysis Gone Wrong, Part 3


If you've read the first two parts of this tale, you know it started when I published a post that involved transforming data for capability analysis. When an astute reader asked why Minitab didn't seem to transform the data outside of the capability analysis, it revealed an oversight that invalidated the original analysis.

I removed the errant post. But to my surprise, the reader who helped me discover my error, John Borneman, continued looking at the original data. He explained to me, "I do have a day job, but I'm a data geek. Plus, doing this type of analysis ultimately helps me analyze data found in my real work!"

I want to share what he did, because it's a great example of how you can take an analysis that doesn't work, ask a few more questions, and end up with an analysis that does work. 

Another Look at the Original Analysis

At root, the original post asked, "What is the probability of any horse beating Secretariat's record?" A capability study with Secretariat's winning time as the lower spec limit would provide an estimate of that probability, but as the probability plot below indicates, the data was not normal:

So we ran Stat > Capability Analysis > Normal and selected the option to apply the Johnson transformation before calculating capability. Minitab returned a capability analysis, but the resulting graph doesn't explicitly note that the Johnson transformation was not used.

Note the lack of information about the transformation in the preceding graph. If you don't see details about the transformation, it means the transformation failed. But I failed to notice what wasn't there. I also neglected to check the Session Window, which does tell you the transformation wasn't applied:

Applying the Transformation by Itself

When you select the Johnson transformation as part of the capability analysis in Minitab, the transformation is just a supporting player to the headliner, capability analysis. The transformation doesn't get a lot of attention.

But using Stat > Quality Tools > Johnson Transformation places the spotlight exclusively on the transformation, and Minitab highlights whether the transformation succeeds—or, in this case, fails.

When I looked at this data, I saw that it wasn't normally distributed. But Borneman noticed something else: the data had an ordinal pattern—the race times fell into buckets that were one full second apart.

That means the data lacked discrimination: it was not very precise.

While ordinal data can be used in many analyses, poor discrimination often causes problems when trying to transform data or fit it to a common distribution. Capability studies, where the data at the tails is important, really shouldn't be performed with ordinal data—especially when there is low discrimination. 

What Can We Do If the Data Is Truly Ordinal?

But other techniques are available, particularly graphical tools such as box plots and time series plots. And if you wish to compare two groups of ordinal data with more than 10 categories, you can use ANOVA, a t-test, or even a nonparametric test such as Mood's median test.

Playing out the "what if" scenario that this data was ordinal, Borneman used this approach to see if there was a difference between the Kentucky derby winning times in races run between 1875 and 1895 and those between 1896 and 2015. 

Minitab Output

"The race was 1.5 miles until 1896, when it was shortened to 1.25 miles," Borneman says when looking at the results. "So obviously we'd expect to see a difference, but it's a good way to illustrate the point." 

Ordinal data is valuable, but given its limited discrimination, it can only take you so far. 
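
For reference, Mood's median test is available in SciPy; here is a minimal sketch comparing two hypothetical groups of winning times (the values and group sizes are illustrative, not the actual Derby data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
era_1 = rng.normal(157, 2, size=21).round()   # hypothetical 1.5-mile era times (s)
era_2 = rng.normal(124, 2, size=120).round()  # hypothetical 1.25-mile era times (s)

stat, p, grand_median, table = stats.median_test(era_1, era_2)
print(f"Mood's median test: statistic = {stat:.2f}, p-value = {p:.4f}")
```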

What Can We Do If the Data Is Not Truly Ordinal?

Borneman soon realized that the original data must have been rounded, and that more precise data might not be ordinal. "Races clock the horse's speed more accurately than to the nearest second," he says. "In fact, I found that the Derby has clocked times to the nearest 1/100 of a second since 2001. The race was timed to the nearest 1/4 second from 1875 to 1905, and to the nearest 1/5 second from 1906 to 2000."

He found Kentucky Derby winning race times with more precise measurements, and not the rounded times:

Then he compared the rounded and non-rounded data. "The dot plot really shows the differences in discrimination between these two data sets," he says.

Does the New Data Fit the Normal Distribution?

Borneman wondered if the original analysis could be revisited with this new, more precise data. But a normality test showed the new data also was not normally distributed, and that it didn't fit any common distribution.

However, running the Johnson Transformation on this data worked! 

That meant the more detailed data could be used to perform the capability analysis that failed with the original, rounded data. 

An Even More Dramatic Result

Running the capability study using the Johnson transformation and using Secretariat's time as the lower spec limit, Borneman found that the probability of another horse getting a time less than 119.4 seconds is 0.32%. 

This is quite a difference from the original analysis, which found about a 5% chance of another horse beating Secretariat's time. In fact, it adds even more weight to the original post's argument that Secretariat was unique among Triple Crown winners. 

Now, it should be noted that using a capability study to assess the chance of a future horse beating Secretariat's time is a bit, well, unorthodox. It may make for a fun blog post, but it does not account for the many factors that change from race to race. 

"And as my wife—a horse rider and fanatic—pointed out, we also don't know what type of race each jockey and trainer ran," Borneman told me. "Some trainers have the goal to win the race, and not necessarily beat the fastest time."

Borneman's right about this being an off-label use of capability analysis.  "On the other hand," he notes, "Secretariat's time is definitely impressive."  

What Have I Learned from All This? 

In the end, making this mistake reinforced several old lessons, and even taught me some new ones. So what am I taking away from all of this? 

  • Graphs are great, but you can't assume they tell the whole story. Check all of the feedback and results available. 
     
  • Know what the output should include. This is especially important if it's been a while since you performed a particular analysis. A quick peek at Minitab Help or Quality Trainer is all it takes. 
     
  • Try performing the analyses in different ways. If I had performed this capability analysis using the Assistant in addition to using the Stat menu, for example, I would have discovered the problem earlier. And it would only have taken a few seconds. 

And here's the biggest insight I'm taking from this experience:

  • When your analysis fails, KEEP ASKING QUESTIONS. The original analysis failed because the data could not be transformed. But by digging just a little deeper, Borneman realized that rounded data was inhibiting the successful transformation. And by asking variations on "what if," he demonstrated that you can still get good insights—even when your data won't behave the way you'd hoped. 

I'm glad to learn these lessons, even at the cost of some embarrassment over my initial mistake. I hope sharing my experience will help you avoid a similar situation. 

High School Researchers: What Do We Do with All of this Data?


by Colin Courchesne, guest blogger, representing his Governor's School research team.  

High-level research opportunities for high school students are rare; however, that was just what the New Jersey Governor’s School of Engineering and Technology provided. 

Bringing together the best and brightest rising seniors from across the state, the Governor’s School, or GSET for short, tasks teams of students with completing a research project chosen from a myriad of engineering fields, ranging from biomedical engineering to, in our team's case, industrial engineering.

Tasked with analyzing, comparing, and simulating queue processes at Dunkin’ Donuts and Starbucks, our team of GSET scholars spent five days tirelessly collecting roughly 250 data points on each restaurant. Our data included how much time people spent waiting in line, what type of drinks customers ordered, and how much time they spent waiting for their drinks after ordering.

data collection interface
The students used a computerized interface to collect data about customers in two different coffee shops.

But once the data collection was over, we reached a sort of brick wall. What do we do with all this data? As research debutantes not well versed in the realm of statistics and data analysis, we had no idea how to proceed.

Thankfully, the helping hand of our project mentor, engineer Brandon Theiss, guided us towards Minitab.

Getting Meaning Out of Our Data

Our original, raw data told us nothing. In order to compare data between stores and create accurate process simulations, we needed a way to sort the data, determine descriptive statistics, and assign distributions; it is these very tools that Minitab offered. Getting started was both easy and intuitive.

First, we all managed to download Minitab 17 (thanks to the 30-day trial). Our team then went on to learn the ins and outs of Minitab, both through instructional videos on YouTube as well as helpful written guides, all of which are provided by Minitab. Less than an hour later, we were able to navigate the program with ease.

The nature of the simulations our team intended to create called for us to identify the arrival process for each store, the distributions for the wait time of a customer in line at each restaurant, as well as the distributions for the drink preparation time, sectioned off by both restaurant as well as drink type. In order to input this information into our simulation, we also needed certain parameters that were dependent on the distribution. Such parameters ranged from alpha and beta values for Gamma distributions to means and standard deviations for Normal distributions.

Thankfully, running the necessary hypothesis tests and calculating each of these parameters was simple. We first used the “Goodness of fit for Poisson” test in order to analyze our arrival rates.

All Necessary Information

Rather than having to fiddle with equations and arrange cells like in Excel, Minitab quickly provided us with all necessary information, including our P-value to determine whether the distribution fit the data as well as parameters for shape and scale.

As for distributions for individual drink preparation times, the process was similarly simple. Using the “Individual Distribution Identification” tool, Minitab ran a series of hypothesis tests, comparing our data against a total of 16 possible distributions. The software output graphs along with P-values and Anderson-Darling values for each distribution, allowing us to graphically and empirically determine the appropriateness of fit. 

Probability Plot for Latte S

Within 3 hours, we had sorted and analyzed all of our data.

Not only was Minitab a fantastic tool for our analysis purposes, but the software also provided us with a graphical platform, a means by which to produce most of the graphs used in our research paper and presentation. Once we determined which distribution to use with what data, we used Minitab to output histograms with fitted data distributions for each set of data points. The ease of use for this feature served to save us time, as a series of simple clicks allowed us to output all 10 of our required histograms at the same time.

Histogram of Line Time S

The same tools first used to analyze our data were also used to analyze the success of our simulations: we ran a Kolmogorov-Smirnov test to determine whether two sets of data—in this case, our observed data and the data output by our simulation—share a common distribution. Like most other features in Minitab, it was extremely easy to use and provided clear and immediate feedback as to the results of the test, both graphically and through the requisite critical and KS values.
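
The same comparison can be sketched in Python with a two-sample Kolmogorov-Smirnov test; the arrays below are hypothetical stand-ins for the observed and simulated line times:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
observed = rng.gamma(shape=2.0, scale=45.0, size=250)    # hypothetical observed line times (s)
simulated = rng.gamma(shape=2.0, scale=45.0, size=250)   # hypothetical simulated line times (s)

ks = stats.ks_2samp(observed, simulated)
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```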

Empirical CDF of IcedSimulated vs Actual
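For anyone who wants to reproduce a similar check outside of Minitab, here is a minimal sketch of a two-sample Kolmogorov-Smirnov test in Python using SciPy. The wait-time arrays below are made-up stand-ins, not our actual data.

```python
# Two-sample Kolmogorov-Smirnov test: do two samples share a common
# distribution? The arrays below are hypothetical stand-ins for the
# observed wait times and the simulation output.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
observed_wait = rng.gamma(shape=2.0, scale=1.5, size=50)    # measured wait times (minutes), made up
simulated_wait = rng.gamma(shape=2.1, scale=1.4, size=500)  # simulation output, made up

result = stats.ks_2samp(observed_wait, simulated_wait)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
# A large p-value gives no evidence that the two samples come from
# different distributions, so the simulation looks consistent.
```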

Research isn’t always fun. It’s often long and tedious, and it can amount to nothing. Thankfully, that wasn’t our case. Using Minitab, our entire analysis process was simple and painless. The software was easy to learn and was able to run any test quickly and efficiently, providing us with both empirical and graphical evidence of the results as well as high-quality graphs which were used throughout our project. It really was a pleasure to work with.

GSET Coff(IE) Team

—The GSET COFF[IE] Team, whose members were Kenneth Acquah, Colin Courchesne, Sheela Hanagal, Kenneth Li, and Caroline Potts. The team was mentored by Juilee Malavade and Brandon Theiss, PE. Photo courtesy Colin Courchesne. 

 

 

About the Guest Blogger:

Colin Courchesne was a scholar in the 2015 New Jersey Governor's School of Engineering and Technology, a summer program for high-achieving high school students. Students in the program complete a set of challenging courses while working in small groups on real-world research and design projects that relate to the field of engineering. Governor’s School students are mentored by professional engineers as well as Rutgers University honors students and professors, and they often work with companies and organizations to solve real engineering problems.

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

 

  

The Null Hypothesis: Always “Busy Doing Nothing”


The 1949 film A Connecticut Yankee in King Arthur's Court includes the song “Busy Doing Nothing,” and this could be written about the Null Hypothesis as it is used in statistical analyses. 

The words to the song go:

We're busy doin' nothin'
Workin' the whole day through
Tryin' to find lots of things not to do

And that summarises the role of the Null Hypothesis perfectly. Let me explain why.

What's the Question?

Before doing any statistical analysis—in fact even before we collect any data—we need to define what problem and/or question we need to answer. Once we have this, we can then work on defining our Null and Alternative Hypotheses.

The null hypothesis is always the option that maintains the status quo and results in the least amount of disruption, hence it is “Busy Doin’ Nothin'”. 

When the p-value is very low, meaning our data would be very unlikely if the Null Hypothesis were true, we reject the Null Hypothesis. Then we have to take some action, and we will no longer be “Doin’ Nothin'”.

Let’s have a look at how this works in practice with some common examples.

Question: Do the chocolate bars I am selling weigh 100g?
Null Hypothesis: Chocolate weight = 100g
Doin’ Nothin’: If I am giving my customers the right size chocolate bars, I don’t need to make changes to my chocolate packing process.

Question: Are the diameters of my bolts normally distributed?
Null Hypothesis: Bolt diameters are normally distributed.
Doin’ Nothin’: If my bolt diameters are normally distributed, I can use any statistical techniques that rely on the standard normal approach.

Question: Does the weather affect how my strawberries grow?
Null Hypotheses: The number of hours of sunshine has no effect on strawberry yield; the amount of rain has no effect on strawberry yield; temperature has no effect on strawberry yield.

Note that the last instance in the table, investigating whether weather affects the growth of my strawberries, is a bit more complicated. That's because I needed to define some metrics to measure the weather. Once I decided that the weather was a combination of sunshine, rain and temperature, I established my null hypotheses. These all assume that none of these factors impacts the strawberry yield. I only need to start controlling the sunshine, temperature and rain if the data give strong evidence that they do affect the yield.
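To make the chocolate-bar example concrete, here is a minimal sketch of how the corresponding hypothesis test could be run in Python with SciPy; the weights below are hypothetical, not real measurements.

```python
# One-sample t-test of the null hypothesis "mean chocolate bar weight = 100g".
# The weights below are made up for illustration.
from scipy import stats

weights = [99.8, 100.4, 100.1, 99.6, 100.2, 99.9, 100.3, 99.7, 100.0, 100.1]
result = stats.ttest_1samp(weights, popmean=100)
print(f"t = {result.statistic:.2f}, p-value = {result.pvalue:.3f}")

# Large p-value: keep "doin' nothin'", because there is no evidence the
# packing process is off target. Small p-value (below 0.05): reject the
# null hypothesis and adjust the packing process.
```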

Is Your Null Hypothesis Suitably Inactive?

So in conclusion, in order to be “Busy Doin’ Nothin’”, your Null Hypothesis has to maintain the status quo: it should assume there is no effect, no difference, and no change, so that no action is required unless the data provide strong evidence against it.

Using Probability Distribution Plots to See Data Clearly


When we take pictures with a digital camera or smartphone, what the device really does is capture information in the form of binary code. At the most basic level, our precious photos are really just a bunch of 1s and 0s, but if we were to look at them that way, they'd be pretty unexciting.

In its raw state, all that information the camera records is worthless. The 1s and 0s need to be converted into pictures before we can actually see what we've photographed.

We encounter a similar situation when we try to use statistical distributions and parameters to describe data. There's important information there, but it can seem like a bunch of meaningless numbers without an illustration that makes them easier to interpret.

For instance, if you have data that follow a gamma distribution with a scale of 8 and a shape of 7, what does that really mean? If the distribution shifts to a shape of 10, is that good or bad? And even if you understand it, how easy would it be to explain to people who are more interested in outcomes than statistics?

Enter the Probability Distribution Plot

That's where the probability distribution plot comes in. Making a probability distribution plot using Minitab Statistical Software will create a picture that helps bring the numbers to life. Even novices can benefit from understanding their data’s distribution.

Let's take a look at a few examples.

Changing Shape

A building materials manufacturer develops a new process to increase the strength of its I-beams. The old process fit a gamma distribution with a scale of 8 and a shape of 7, whereas the new process has a shape of 10. 

estimates

The manufacturer does not know what this change in the shape parameter means, and the numbers alone don't tell the story. 

But if we go in Minitab to Graph > Probability Distribution Plot, select the "View Probability" option, and enter the information about these distributions, the impact of the change will be revealed.

Here's the original process, with the shape of 7:

And here is the plot for the new process, with a shape of 10: 

The probability distribution plots make it easy to see that the shape change increases the proportion of acceptable beams from 91.4% to 99.5%, an improvement of 8.1 percentage points. What's more, the right tail appears to be much thicker in the second graph, which indicates the new process creates many more unusually strong units. Hmmm...maybe the new process could ultimately lead to a premium line of products.
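If you'd like to check the underlying probabilities yourself, here is a minimal sketch in Python using SciPy. The gamma scale and shapes come from the example above, but the lower spec limit is a made-up placeholder, since the actual spec isn't stated here.

```python
# Probability of an acceptable I-beam under the old and new processes.
# The gamma scale and shapes come from the example; the lower spec limit
# (lsl) is a made-up placeholder, so substitute your real spec.
from scipy.stats import gamma

scale = 8
lsl = 30   # hypothetical lower spec limit for beam strength

for shape in (7, 10):
    p_ok = gamma.sf(lsl, a=shape, scale=scale)   # P(strength > lsl)
    print(f"shape = {shape:2d}: P(acceptable beam) = {p_ok:.3f}")
```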

Communicating Results

Suppose a chain of department stores is considering a new program to reduce discrepancies between an item’s tagged price and the amount charged at the register. Ideally, the system would eliminate any discrepancies, but a ± 0.5% difference is considered acceptable. However, implementing the program will be extremely expensive, so the company runs a pilot test in a single store. 

In the pilot study, the mean improvement is small, and so is the standard deviation. When the company's board looks at the numbers, they don't see the benefits of approving the program, given its cost. 

communicate results data

The store's quality specialist thinks the numbers aren't telling the story, and decides to show the board the pilot test data in a probability distribution plot instead: 

By overlaying the before and after distributions, the specialist makes it very easy to see that price differences using the new system are clustered much closer to zero, and most are in the ± 0.5% acceptable range. Now the board can see the impact of adopting the new system. 

Comparing Distributions

An electronics manufacturer counts the number of printed circuit boards that are completed per hour. The sample data is best described by a Poisson distribution with a mean of 3.2. However, the company's test lab prefers to use an analysis that requires a normal distribution and wants to know if it is appropriate.

The manufacturer can easily compare the known distribution with a normal distribution using the probability distribution plot. If the normal distribution does not approximate the Poisson distribution, then the lab's test results will be invalid.

As the graph indicates, the normal distribution—and the analyses that require it—won’t be a good fit for data that follow a Poisson distribution with a mean of 3.2.
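If you prefer numbers to pictures, here is a minimal sketch in Python that compares the Poisson(3.2) probabilities with a normal curve that has the same mean and variance (matching the moments is my assumption for the comparison).

```python
# Compare Poisson(mean = 3.2) with a normal distribution that has the
# same mean and variance, to see how poor the approximation is.
import numpy as np
from scipy.stats import norm, poisson

mu = 3.2
normal_approx = norm(loc=mu, scale=np.sqrt(mu))

for k in range(9):
    print(f"k = {k}: Poisson P(X = {k}) = {poisson.pmf(k, mu):.3f}, "
          f"normal density at {k} = {normal_approx.pdf(k):.3f}")

# The normal curve even puts probability on impossible negative counts:
print(f"P(X < 0) under the normal approximation = {normal_approx.cdf(0):.3f}")
```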

Creating Probability Distribution Plots in Minitab

It's easy to use Minitab to create plots to visualize and to compare distributions and even to scrutinize an area of interest.

Let's say a market researcher wants to interview customers with satisfaction scores between 115 and 135. Minitab’s Individual Distribution Identification feature shows that these scores are normally distributed with a mean of 100 and a standard deviation of 15. However, the analyst can’t visualize where his subjects fall within the range of scores or their proportion of the entire distribution.

Choose Graph > Probability Distribution Plot > View Probability.
Click OK.

dialog box

From Distribution, choose Normal.
In Mean, type 100.
In Standard deviation, type 15.
Click on the "Shaded Area" tab. 

distribution plot dialog box 2

In Define Shaded Area By, choose X Value.
Click Middle.
In X value 1, type 115.
In X value 2, type 135.
Click OK.

Minitab creates the following plot: 

distribution plot

About 15% of sampled customers had scores in the region of interest (115-135). This is not a very large percentage, so the researcher may face challenges in finding qualified subjects.
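If you want to verify that percentage outside of Minitab, the shaded-area probability is a one-line calculation; here is a small Python sketch.

```python
# Proportion of customers with satisfaction scores between 115 and 135,
# assuming scores follow a Normal(mean = 100, sd = 15) distribution.
from scipy.stats import norm

p = norm.cdf(135, loc=100, scale=15) - norm.cdf(115, loc=100, scale=15)
print(f"P(115 <= score <= 135) = {p:.3f}")   # about 0.149, i.e., roughly 15%
```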

Using Probability Distribution Plots

Just like your camera when it assembles 1s and 0s into pictures, probability distribution plots let you see the deeper meaning of the numbers that describe your distributions. You can use these graphs to highlight the impact of changing distributions and parameter values, to show where target values fall in a distribution, and to view the proportions that are associated with shaded areas. These simple plots also communicate such advanced concepts clearly and easily to a non-statistical audience that might otherwise be lost in the numbers. 

Regression with Meat Ants: Analyzing a Count Response (Part 1)


Ever use dental floss to cut soft cheese? Or Alka Seltzer to clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects online. Some are more peculiar than others.

Ever use ordinary linear regression to evaluate a response (outcome) variable of counts? 

Technically, ordinary linear regression was designed to evaluate a continuous response variable. A continuous response variable, such as temperature or length, is measured along a continuous scale that includes fractional (decimal) values. In practice, however, ordinary linear regression is often used to evaluate a response of count data, which are whole numbers such as 0, 1, 2, and so on.

You can do that. Just like you can use a banana to clean a DVD. But there are things to watch out for if you do that. To examine issues related to performing ordinary linear regression analysis with count data, consider the following scenario.

Kids, Ants, and Sandwiches

A bored kid in a backyard makes a great scientist. One day, three Australian kids wondered which of their lunch sandwiches would attract more meat ants: Peanut butter, Vegemite, or Ham and pickles.

Note: Meat ants are an aggressive species of Australian ant that can kill a poisonous cane toad. Vegemite is a slightly bitter, salty brown paste made from brewer’s yeast extract.

To test their hypotheses, the kids started dropping pieces of the three sandwiches and counting the number of ants on each sandwich after a set amount of time. Years later, as an adult, one of the kids replicated this childhood experiment with increased rigor. You can find the details of his modified experiment and the sample data it produced on the web site of the American Statistical Association.

Preparing the Data

To make the data and the results easier to interpret, I coded and sorted the original sample data set using the Code and Sort commands in Minitab's Data menu. If you want to see those data manipulation maneuvers, click here to open the project file in Minitab, then open the Report Pad to see the instructions. If you don't have a copy of Minitab, you can download a free 30-day trial version.

After coding and sorting, the combination of factor levels for each piece of sandwich bait is easy to see in the worksheet, and the data values are arranged in the order in which they were collected. 

For example, row 9 shows that ham and pickles on rye with butter was the 9th piece of sandwich bait used—and it attracted 65 meat ants.

Performing Linear Regression

Are meat ants statistically more likely to swarm a ham sandwich—or will the pickles be a turnoff? Do they gravitate to the creamy comfort of butter? Or will salty, malty Vegemite drive them wild?

To evaluate the data using ordinary linear regression, choose Stat > Regression > Fit Regression Model. Fill out the dialog box as shown below and click OK.

First, examine the ANOVA table to determine whether any of the predictors are statistically significant.

At the 0.1 level of significance, both Filling and Butter predictors are statistically significant (p-value < 0.1). What matters to a meat ant, it seems, is not the bread, but what's between it.

To see how each of the levels of the factors relate to the number of ants (the response), examine the Coefficients table.

Each coefficient value is calculated in relation to the reference level for the variable, which has a coefficient of 0. Whatever level isn’t shown in the table is the reference level. So for the Filling variable, the reference level is Vegemite.

Tip: You can see the reference levels used for each variable by clicking the Coding button on the Regression dialog box. If you want the coefficients to be calculated relative to a different level, simply change the reference level in the drop-down list and rerun the analysis.

So what do these coefficient values mean? Generally speaking, larger coefficients are associated with a response of greater magnitude. The positive coefficients indicate a positive association, and the negative coefficients indicate a negative association.

For example, the positive coefficient of 27.28 for ham and pickles indicates that many more ants are attracted to the ham and pickles over Vegemite.  The p-value of 0.000 for the coefficient indicates that the difference between ham and pickles and Vegemite is statistically significant. Based on these results, meat ants appear to be aptly named!

The Regression Equation: Caveat with a Count Response

The output for ordinary linear regression also includes a regression equation. The equation can be used to estimate the value of the response for specific values of the predictor variables.

For categorical predictors, substitute a value of 1 into the equation for the levels at which you want to predict a response, and substitute 0 for the other levels.

For example, using the equation above, the number of meat ants that you can expect to be attracted by a peanut butter sandwich, without butter, on white bread, is estimated at: 24.31 + 7.04(0) + 1.12(0) - 1.21(1) + 0.0(0) + 8.31(1) + 27.28(0) + 0.0(1) + 11.40(0) ≈ 31.41 ants. (You can have Minitab do these calculations for you. Simply choose Stat > Regression > Regression > Predict and enter the predictor levels in the dialog box.)
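For readers who like to see the mechanics, here is a rough sketch of the same kind of model fit and prediction in Python with statsmodels. The tiny data set and the column names (Ants, Filling, Butter) are made up purely so the code runs; it is not the meat-ant data, and it omits the Bread factor for brevity.

```python
# Ordinary least squares with categorical predictors, then a prediction.
# The tiny data set below is made up purely to make the code runnable;
# it is NOT the meat-ant data, and the column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

toy = pd.DataFrame({
    "Ants":    [34, 65, 22, 58, 30, 70, 25, 49],
    "Filling": ["PeanutButter", "HamPickles", "Vegemite", "HamPickles",
                "PeanutButter", "HamPickles", "Vegemite", "PeanutButter"],
    "Butter":  ["No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No"],
})

model = smf.ols("Ants ~ C(Filling) + C(Butter)", data=toy).fit()
print(model.params)   # coefficients relative to the reference levels

# Estimated ant count for a peanut butter sandwich without butter
new_point = pd.DataFrame({"Filling": ["PeanutButter"], "Butter": ["No"]})
print(model.predict(new_point))
```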

One issue that can arise if you use ordinary linear regression with a count response is that, at certain predictor levels, the regression equation may estimate negative  values for the response. But a negative "count" of ants—or anything else—doesn't make any sense. In that case, the equation may not be practically useful.

For this particular data set, it's not a problem. Using the regression equation, the lowest possible estimated response is for a Vegemite sandwich on white bread without butter (24.31 - 1.21), which yields an estimate of about 23 ants. Negative estimates don't occur here, primarily because the counts in this data set are all considerably greater than 0. But often that's not the case.

Evaluating the Model Fit and Assumptions

Regardless of whether you're performing ordinary linear regression with a continuous response variable or a discrete response variable of counts, it's important to assess the model fit, investigate extreme outliers, and check the model assumptions. If there's a serious problem, your results might not be valid.

The R-squared (adj) value suggests this model explains about half of the variation in the ant count (47.35%). Not great—but not bad for a linear regression model with only a few categorical predictors. For this particular analysis, the ANOVA output also includes a p-value for lack-of-fit.

If the p-value for lack-of-fit is less than 0.05, there's statistically significant evidence that the model does not fit the data adequately. For this model, the p-value here is greater than 0.05.  That means there's not sufficient evidence to conclude that the model doesn't fit well. That's a good thing.

Minitab's regression output also flags unusual observations, based on the size of their residuals. Residuals, also called "model errors," measure how much the response values estimated by the regression model differ from the actual response values in your data. The smaller a residual, the closer the value estimated by the model is to the actual value in your data. If a residual is unusually large, it suggests that the observation may be an outlier that's "bucking the trend" of your model.

For the ant count sample data, three observations are flagged as unusual:

If you see unusual values in this table, it's not a cause for alarm. Generally, you can expect roughly 5% of the data values to have large standardized residuals. But if there's a lot more than that, or if the size of a residual is unusually large, you should investigate.

For this sample data set of 48 observations, the number of unusual observations is not worrisome. However, two of the observations (circled in red) appear to be very much out-of-whack with the other observations. To figure out why, I went back to the original sample data set online, and found this note from the experimenter:

"Two results are large outliers. A reading of 97 was due to…leaving a portion of sandwich behind from the previous observation (i.e., there were already ants there); and one of 2 was due to [the sandwich portion be placed] too far away from the entrance to the [ant] hill.”

Because these outliers can be attributed to a special (out-of-the-ordinary) cause, it would be OK to remove them and re-run the analysis, as long as you clearly state that you have done so (and why). However, in this case, removing these two outliers doesn't significantly change the overall results of the linear regression analysis anyway (for brevity, I won't include those results here).

Finally, examine the model assumptions for the regression analysis. In Minitab, choose Stat > Regression > Fit Regression Model. Then click Graphs and check Four in one.

The two plots on the left (the Normal Probability Plot and the Histogram) help you assess whether the residuals are normally distributed. Although normality of the residuals is a formal assumption for ordinary linear regression, the analysis is fairly robust (resilient) to departures from this assumption if the data set is sufficiently large (greater than 15 observations or so). Here, the points fall along the line of the normal probability plot and the histogram shows a fairly normal distribution. All is well.

Constant variance of the residuals is a more critical assumption for linear regression. That means the residuals should be distributed fairly evenly and randomly across all the fitted (estimated) values. To assess constant variance, look at the Residuals versus Fits plot in the upper right. In the plot above, the points appear to be randomly scattered on both sides of the line representing a residual value of 0. Again, no evidence of a problem.

With this sample data, using ordinary linear regression with a count response seems to work OK. But with different count data, might things have worked out differently? We'll examine that in the next post (Part 2).

Meanwhile, kick back and fix yourself a ham and pickle sandwich on rye with butter. And keep an eye out for meat ants.

Which Supplier Should You Choose? Check the Data.


Whatever industry you're in, you're going to need to buy supplies. If you're a printer, you'll need to purchase inks, various types of printing equipment, and paper. If you're in manufacturing, you'll need to obtain parts that you don't make yourself. 

But how do you know you're making the right choice when you have multiple suppliers vying to fulfill your orders?  How can you be sure you're selecting the vendor with the highest quality, or eliminating the supplier whose products aren't meeting your expectations?

Let's take a look at an example from automotive manufacturing to see how we can use data to make an informed decision about the options. 

Camshaft Problems

Thanks to camshafts that don’t meet specifications, too many of your car company's engines are failing. That’s harming your reputation and revenues. Your company has two different camshaft suppliers, and it's up to you to figure out if camshafts from one or both of them are failing to meet standards. 

The camshafts in your engines must be 600 mm long, plus or minus 2 mm. To acquire a basic understanding of how your suppliers are doing, you measure 100 camshafts from each supplier, sampling 5 shafts from each of 20 different batches, and record the data in a Minitab worksheet.  

Once you have your data in hand, the Assistant in Minitab Statistical Software can tell you what the data say. If you're not already using our software and you want to play along, you can get a free 30-day trial version

Step 1: Graph the Data

Seeing your data in graphical form is always a good place to start your analysis. So in Minitab, select Assistant > Graphical Analysis.

Assistant Graphical Analysis menu

The Assistant offers you a choice of several different graph options for each of three possible objectives. Since we're not graphing variables over time, nor looking for relationships between variables, let's consider the options available under "Graph the distribution of data," which include histograms, boxplots, and Pareto charts among others.  

Assistant graph chooser

A basic summary of the data is a good place to start. Click the Graphical Summary button and complete the dialog box as shown. 

The Assistant outputs a Diagnostic Report, a Report Card, and the Summary Report shown below.

The data table in this summary report reveals that the means of the camshafts sampled from supplier 2 and supplier 1 are both very close to the target of 600 mm.

But the report also reveals a critical difference between the suppliers: while neither data set contains any outliers, there is clearly more variation in the lengths of camshafts from supplier 2.

The supplier 1 distribution graph on the left is clustered tightly around the mean, while the one for supplier 2 reflects a wider range of values. The graph of the data in worksheet order shows that supplier 1’s values hew tightly to the center line, compared to supplier 2’s more extreme up-and-down pattern of variation.

Returning to the table of summary statistics at the bottom of the output, a check of the basic statistics quantifies the difference in variation between the samples: the standard deviation of supplier 1's samples is 0.31, compared to 1.87 for supplier 2.

We already have solid evidence to support choosing supplier 1 over supplier 2, but the Assistant in Minitab makes it very easy to get even more insight about the performance of each supplier so you can make the most informed decision. 

Step 2: Perform a Capability Analysis

You want to assess the ability of each supplier to deliver camshafts relative to your specifications, so you select the Capability Analysis option in the Assistant menu.

The analysis you need depends on what type of data you have. The measurements you’ve collected are continuous. The Assistant directs you to the appropriate analysis.  

Clicking the more… button displays the requirements and assumptions that need to be satisfied for the analysis to be valid.

You already know your data are reasonably normal, that they were collected in rational subgroups, and that you have enough data for reliable estimates—we saw this in the initial Graphical Analysis.

However, the Assistant also notes that you should collect data from a process that is stable. You haven’t evaluated that yet…but fortunately, the Assistant automatically assesses process stability as part of its capability analysis.

Confident that your data were collected appropriately, you click the Capability Analysis button and complete the dialog box as shown to analyze Supplier 1:

The Assistant produces all the output you need in a clear, easy-to-follow format. The Diagnostic Report offers detailed information about the analysis, while the Report Card flags potential problems. In this case, the Report Card verified that the data were from a stable process and that the analysis should be reasonably precise and accurate.

The Summary Report gives you the bottom-line results of the analysis.

As shown in the scale at the top left of the report, Supplier 1’s process is very capable of delivering camshafts that meet your requirements. The histogram shows that all of the data fall within specifications, while the Ppk measurement of overall capability is 1.94, exceeding the typical industry benchmark of 1.33. 

Now you perform the analysis for Supplier 2. Once again, the Report Card verifies the stability of the process and confirms that the capability analysis should be reasonably accurate and precise.  However, the Assistant’s Summary Report for Supplier 2 reveals a very different situation in most other respects:

 

The scale at the top left of the report shows that Supplier 2’s ability to provide parts that meet your specifications is quite low, while the histogram shows that an alarming number of the camshafts in your sample fall outside your spec limits. Supplier 2’s Ppk is 0.31, far below the 1.33 industry benchmark. And with a defect rate of 28.95%, you can expect more than a quarter of the motors you assemble using Supplier 2’s camshafts to require rework!
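To see how those headline numbers hang together, here is a minimal sketch of the Ppk and expected out-of-spec calculations under a normal assumption. The means and standard deviations plugged in are illustrative round numbers in the spirit of the example, not the exact values from the Assistant reports.

```python
# How the headline numbers relate: Ppk and the expected percentage out
# of spec, computed from an overall mean and standard deviation under a
# normal assumption. The means and SDs below are illustrative round
# numbers, not the exact values from the Assistant reports.
from scipy.stats import norm

LSL, USL = 598.0, 602.0   # camshaft length spec: 600 mm plus or minus 2 mm

def ppk_and_defects(mean, sd):
    ppk = min(USL - mean, mean - LSL) / (3 * sd)
    pct_out = (norm.cdf(LSL, mean, sd) + norm.sf(USL, mean, sd)) * 100
    return ppk, pct_out

for name, mean, sd in [("Supplier 1", 600.2, 0.31), ("Supplier 2", 600.2, 1.87)]:
    ppk, pct = ppk_and_defects(mean, sd)
    print(f"{name}: Ppk = {ppk:.2f}, expected out of spec = {pct:.1f}%")
```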

You can use the Assistant output you created to explain exactly why you should continue to acquire camshafts from Supplier 1. In addition, even though you’ll discontinue using Supplier 2 for now, you can give this supplier important information to help them improve their processes for the future.

A Clear Answer

Whatever your business, you count on your suppliers to provide deliverables that meet your requirements. You have seen how the Assistant can help you visualize the quality of the goods you receive, and perform a detailed analysis of your suppliers’ capability to deliver quality products.  

How Many Samples Do You Need to Be Confident Your Product Is Good?


How many samples do you need to be 95% confident that at least 95%—or even 99%—of your product is good?

The answer depends on the type of response variable you are using: categorical or continuous. The type of response will dictate whether you'll use:

  1. Attribute Sampling: Determine the sample size for a categorical response that classifies each unit as Good or Bad (or, perhaps, In-spec or Out-of-spec).
     
  2. Variables Sampling: Determine the sample size for a continuous measurement that follows a Normal distribution.

The attribute sampling approach is valid regardless of the underlying distribution of the data. The variables sampling approach has a strict normality assumption, but requires fewer samples.

In this blog post, I'll focus on the attribute approach.

Attribute Sampling

A simple formula gives you the sample size required to make a 95% confidence statement about the probability that an item will be in spec, when your sample of size n has zero defects:

n = ln(1 - confidence) / ln(reliability) = ln(0.05) / ln(reliability)

where the reliability is the probability of an in-spec item.

For a reliability of 0.95 or 95%, n = ln(0.05) / ln(0.95) ≈ 59 samples.

For a reliability of 0.99 or 99%, n = ln(0.05) / ln(0.99) ≈ 299 samples.

Of course, if you don't feel like calculating this manually, you can use the Stat > Basic Statistics > 1 Proportion dialog box in Minitab to see the reliability levels for different sample sizes.  

one-sample-proportion
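If you'd rather script the calculation than use the dialog box, here is a minimal Python sketch of the zero-failure sample-size formula above.

```python
# Zero-failure (C=0) sample size: the smallest n such that seeing zero
# defects in n samples gives the stated confidence that the reliability
# (probability an item is in spec) is at least R.
import math

def zero_failure_sample_size(reliability, confidence=0.95):
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

for r in (0.95, 0.99):
    print(f"reliability {r:.0%}: n = {zero_failure_sample_size(r)}")
# Expected output: n = 59 for 95% reliability, n = 299 for 99% reliability.
```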

These two sampling plans are really just C=0 Acceptance Sampling plans with an infinite lot size. The same sample sizes can be generated using Stat > Quality Tools > Acceptance Sampling by Attributes by:

  1. Setting RQL at 5% for 95% reliability or 1% for 99% reliability.
  2. Setting the Consumer’s Risk (β) at 0.05, which results in a 95% confidence level.
  3. Setting AQL at an arbitrary value lower than the RQL, such as 0.1%.
  4. Setting Producer’s Risk (α) at an arbitrary high value, such as 0.5 (note, α must be less than 1-β to run).

By changing RQL to 1%, the following C=0 plan can be obtained:

If you want to make the same confidence statements while allowing 1 or more defects in your sample, the sample size required will be larger. For example, allowing 1 defect in the sample will require a sample size of 93 for the 95% reliability statement. This is a C=1 sampling plan. It can be generated, in this case, by lowering the Producer’s risk to 0.05.

As you can see, the sample size for an acceptance number of 0 is much smaller, but how realistic is this objective? That's a question you will need to answer.

Check out this post for more information about acceptance sampling.

 

 


Using a Catapult as a Minitab Capability Sixpack Training Aid


By Matthew Barsalou, guest blogger

Teaching process performance and capability studies is easier when actual process data is available for the student or trainee to practice with. As I have previously discussed at the Minitab Blog, a catapult can be used to generate data for a capability study. My last blog on using a catapult for this purpose was several years ago, so I would like to revisit this topic with an emphasis on interpreting the catapult study results in the Minitab Statistical Software’s Capability Sixpack™. The catapult can be used in various configurations, but here the settings will stay constant to simulate a manufacturing process. The plans and assembly instructions are available here.

The Catapult Study

The catapult used a 120 mm diameter heavy-duty rubber band originally intended for use in model airplanes. The rubber band guide was set at 4 cm and the arm stopper was set at 1 cm. The starting point was set at 8 cm, and these settings were held constant for the duration of the study. Three operators each performed 2 runs of 20 shots each to simulate two days of production with three shifts per day. Each run was used as a separate subgroup in the capability and performance study.

The capability indices Cp and Cpk use short-term data to tell us what the process is capable of doing, and the performance indices Pp and Ppk use long-term data to tell us what the process is actually doing. The capability indices use “within” variation in the formula, and the performance indices use “overall” variation; within variation is based on the pooled standard deviations of the subgroups, and overall variation is based on the standard deviation of the entire data set.
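To make the within-versus-overall distinction concrete, here is a minimal Python sketch that computes both standard deviations from some made-up subgroup data. Note that Minitab may also apply unbiasing constants, which this sketch skips.

```python
# Within vs. overall variation: "within" pools the subgroup standard
# deviations, "overall" treats all shots as one sample. The subgroup
# data below are made up; unbiasing constants are skipped.
import numpy as np

rng = np.random.default_rng(7)
# 6 runs (subgroups) of 20 catapult shots each, hypothetical distances
subgroups = [rng.normal(loc=200 + shift, scale=5, size=20)
             for shift in (0, 2, -1, 3, 1, -2)]

overall_sd = np.std(np.concatenate(subgroups), ddof=1)
pooled_var = np.mean([np.var(g, ddof=1) for g in subgroups])  # equal subgroup sizes
within_sd = np.sqrt(pooled_var)

print(f"within (pooled) SD = {within_sd:.2f}")   # feeds Cp and Cpk
print(f"overall SD         = {overall_sd:.2f}")  # feeds Pp and Ppk
```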

There are requirements that must be met to perform a capability or performance study. The data must be normally distributed and the process needs to be in a state of statistical control. The data must also be randomly selected, and it needs to represent the population. There should be at least 100 values in the data set; otherwise, there will be a very wide confidence interval for the resulting capability and performance values. The person planning the study must ensure there is sufficient data and the data represents the values in the population; however, the Capability Sixpack in Minitab Statistical Software can be used to ensure the other requirements are fulfilled.

The figure below shows a Capability Sixpack for the catapult study.

The Capability Sixpack

The Capability Sixpack provides an I chart when the data consists of individual values; i.e. the subgroup size is 1. An Xbar chart is provided when the data is entered as subgroups. Either control chart can be used to assess the stability of the process. The process will need improvement to achieve stability if out of control values are seen in a control chart. The source of the variability in the process should be sought out and removed and then the study should be repeated.

A moving range chart is given when the subgroup size is 1, and an S chart is given when the subgroup size is greater than 1. The values in the moving range chart should be compared to the values in the I chart to ensure no patterns are present. The same should be done for the Xbar and S chart if they are used. This is done to help ensure the data are truly random. Either the last 25 observations or the last 5 subgroups will be shown. The last 25 observations are shown if the data is entered as 1 subgroup, and the last 5 subgroups are shown if the data are entered as subgroups. The values should appear random and without trends or shifts if the process is stable.

A capability histogram is shown to compare the histogram of the data to the specification limits. The data should be approximately normally distributed. The overall line shows the shape of a histogram using the overall standard deviation. The within line shows the shape of the histogram using the pooled standard deviation of the subgroups.

A normal probability plot is provided to assess the normality of the data. A p-value of less than 0.05 indicates the data is not normally distributed. Data that is not normally distributed can't be used in a capability study. You may transform non-normal data, or identify and remove the cause of the lack of normality. The better option is to improve the process so that the data is normally distributed. The Capability Sixpack can’t be used if the data hits a boundary such as 0 or an upper or lower limit; however, the regular capability study option can still be used if a checkmark is placed next to the boundary indicator beside the specification limit.

The capability plot displays the capability and performance of the process. The capability of a process is measured using Cp and Cpk, and both tell us what the process is capable of. They are intended for use with short-term data and use the pooled standard deviation of rational subgroups. Rational subgroups use homogenous data so that only common cause variation is present. For example, parts may have all been produced on the same machine, using the same batch of raw material, by the same operator. The Cp compares the spread of the process to the specification limits; a process with a high Cp value may still produce parts out of specification if the process is off-center. The Cpk considers the position of the process mean relative to the specification limits, and there are actually two values for Cpk: the Cpk of the upper specification limit and the Cpk of the lower specification limit. The Capability Sixpack lists the worse performing of the two Cpk values.

The performance of a process is measured using Pp and Ppk with long-term data. Generally, more than 30 days' worth of production data should be used for Pp and Ppk. Unlike the capability indices Cp and Cpk, the Pp and Ppk calculations are performed using the overall standard deviation, which uses the same formula as a sample standard deviation. The Pp compares the spread of the process to the specification range. The Ppk considers the position of the process mean relative to the specification limits, and only the worse performing of its two values is given.

The process capability index of the mean is the Cpm, which uses a target value to account for the process mean relative to the target. However, this is only given if a target value is entered in Minitab.

Conclusion

The Minitab Capability Sixpack will quickly and easily provide a capability study; however, it will not, by itself, tell you whether your data are suitable for a capability study. It does provide methods for assessing the suitability of the data, and they should be used every time a capability study is performed.

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

"The Thing" and Your Data: Meet the Shapeshifter Distribution


Since it's the Halloween season, I want to share how a classic horror film helped me get a handle on an extremely useful statistical distribution. 

The film is based on John W. Campbell's classic novella "Who Goes There?", but I first became  familiar with it from John Carpenter's 1982 film The Thing.  

In the film, researchers in the Antarctic encounter a predatory alien with a truly frightening ability: it can assume the form of any living thing it touches. It's a shapeshifter. A mimic with the uncanny ability to take on the characteristics of other beings. Soon, the researchers realize that they can no longer be sure who among them is really human, and who is not. 

So what does that have to do with statistics? Meet the Weibull distribution, or, as I like to think of it, "The Thing" of statistical distributions. The Weibull distribution can take on the characteristics of many other distributions. The good news is, unlike The Thing, the Weibull distribution's ability to shapeshift is very helpful. 

The Weibull Distribution Can't Be Nailed Down 

Because the Weibull distribution can assume the form of many different distributions, it's a favorite among quality practitioners and engineers, and it's by far the most commonly used distribution for modeling reliability data. Just like "The Thing," the Weibull distribution is adaptable enough to be able to pass for other things—in this case, a variety of other distributions. 

Got right-skewed, left-skewed, or symmetric data? You can model it with Weibull, no problem. That flexibility lets engineers use the Weibull distribution to evaluate the reliability of everything from ball bearings to vacuum tubes.

The Weibull distribution can also model hazard functions that are decreasing, increasing, or staying constant, so it can be used to model any phase of an item’s lifetime, from right after launch to the end of its usefulness.

How "The Thing" the Weibull Curve Changes Shape

To illustrate how flexible the Weibull distribution is, let's look at some examples in Minitab Statistical Software. (Care to follow along, but don't have Minitab? Just download the free 30-day trial.) 

Start by choosing Graph > Probability Distribution Plot, which brings up this dialog box: 

 probability distribution plots

Select "View Single," and then choose "Weibull" in the Distribution drop-down menu. The subsequent dialog box will let you specify three parameters: shape, scale, and threshold.  

The threshold parameter indicates the distribution's shift away from 0. A negative threshold will shift the distribution to the left of 0, while a positive threshold shifts it to the right. (All data must be greater than the threshold.)

The scale parameter is the 63.2 percentile of the data, and this value defines the Weibull curve's relation to the threshold, in the same way that the mean defines a normal curve's position. For our purposes, let's say we're testing reliability, and that 63.2 percent of the items we test fail within the first 10 hours following the threshold time. So our scale would be 10.

The shape parameter, unsurprisingly enough, describes the Weibull curve's shape. Changing the  shape value enables you to use Weibull to model the characteristics of many different life distributions.

Entire books have been written about how these three parameters affect the characteristics of the Weibull distribution, but for this discussion we'll focus on how the value of shape can influence the curve. I'll show these examples one-by-one, but you can have Minitab display them together on a single plot if you select "Vary Parameters" instead of "View Single" in the first dialog box shown above. 
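If you want to explore the same shapeshifting outside of Minitab, here is a minimal Python sketch that evaluates the Weibull density for several shape values; the scale of 10 matches the example above, and SciPy's loc parameter plays the role of the threshold.

```python
# Evaluate the Weibull density for several shape values, holding the
# scale at 10 and the threshold (SciPy's loc) at 0, to see how the
# curve changes character as the shape parameter grows.
import numpy as np
from scipy.stats import weibull_min

x = np.linspace(0.5, 40, 5)
for shape in (0.4, 1, 1.5, 2, 3.5, 20):
    pdf = weibull_min.pdf(x, c=shape, loc=0, scale=10)
    print(f"shape {shape:>4}: density at {np.round(x, 1)} -> {np.round(pdf, 4)}")
```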

Weibull with Shape Less Than 1

Let's start with a shape between 0 and 1. You may choose any value you like in that range. I'm going to enter 0.4, and when I press "OK", Minitab gives me the graph below: 

The graph shows that probability decreases exponentially from infinity. If you're thinking about reliability, or the rate of failures, the Weibull distribution with these parameters would fit data that have a high number of initial failures. Then the failures decrease over time as the defective items are eliminated from the sample. These early failures are frequently referred to as "infant mortality," because they occur in the early stage of a product's life.  

Weibull Distribution with shape between 0 and 1

Weibull with Shape = 1

When the shape is equal to 1, the Weibull distribution decreases exponentially from 1/alpha, where alpha = the scale parameter. In other words, the failure rate remains fairly consistent over time. This Weibull distribution's shape is applicable to data about random failures and multiple-cause failures, and can be used to model the useful life of products. 

Weibull Distribution with shape = 1

Weibull with Shape Between 1 and 2

When the shape parameter is between 1 and 2,  Weibull crests quickly, then decreases more gradually. The most rapid failure rate occurs initially. This shape indicates failures due to early wear-out. 

Weibull Distribution with shape value between 1 and 2

Weibull with Shape = 2

When the shape parameter is equal to 2, Weibull approximates a linearly increasing failure rate, where the risk of wear-out failure increases steadily over the product's lifetime. (This variant of the Weibull distribution is also referred to as the Rayleigh distribution.)

Weibull Distribution with Shape = 2 AKA Rayleigh Distribution

Weibull with Shape Between 3 and 4

When the shape parameter falls between 3 and 4, Weibull becomes symmetric and bell-shaped, like the normal curve. For reliability, this form of the distribution suggests rapid wear-out failures during the final period of product life, when most failures happen.

Weibull distribution symmetric shape value = 3.5

Weibull with Shape > 10

When the shape is more than 10, the Weibull distribution is similar to an extreme value distribution. This form of the distribution can approximate the final stage of a product's life. 

Weibull Distribution shape value = 20 skewed

Will "The Thing" Weibull Always Win?

When it comes to analyzing reliability, Weibull is the de facto default distribution, but other distribution families also can model a variety of distributional shapes. You want to find the distribution that gives you the very best fit for your data, and that may not be a Weibull. For instance, the lognormal distribution is typically used to model failures caused by chemical reactions or corrosion.

To assess the fit of your data using Minitab’s Distribution ID plot, you can use Stat > Reliability/Survival > Distribution Analysis (Right-Censoring or Arbitrary Censoring). If you want more details about that, check out this post on identifying your data's distribution.  

Practical Statistical Problem Solving Using Minitab to Explore the Problem


By Matthew Barsalou, guest blogger

A problem must be understood before it can be properly addressed. A thorough understanding of the problem is critical when performing a root cause analysis (RCA), and an RCA is necessary if an organization wants to implement corrective actions that truly address the root cause of the problem. An RCA may also be necessary for process improvement projects; it is necessary to understand the cause of the current level of performance before attempts are made to improve the performance.

Many statistical tests related to problem-solving can be performed using Minitab Statistical Software. However, the actual test you select should be based upon the type of data you have and what needs to be understood. The figure below shows various statistical options structured in a cause-and-effect diagram with the main branches based on characteristics that describe what the tests and methods are used for.

The main branch labeled “differences” is split into two high-level sub-branches: hypothesis tests that have an assumption of normality, and non-parametric tests of medians. The hypothesis tests assume data is normally distributed and can be used to compare means, variances, or proportions to either a given value or to the value of a second sample. An ANOVA can be performed to compare the means of two or more samples.

The non-parametric tests listed in the cause-and-effect diagram are used to compare medians, either to a specified value, or two or more medians, depending upon which test is selected. The non-parametric tests provide an option when data is too skewed to use other options, such as a Z-test.

Time may also be of interest when exploring a problem. If your data are recorded in order of occurrence, a time series plot can be created to show each value at the time it was produced; this may give insights into potential changes in a process.

A trend analysis looks much like the time series plot; however, Minitab also tests for potential trends in the data such as increasing or decreasing values over time. Exponential smoothing options are available to assign exponentially decreasing weights to the values over time when attempting to predict future outcomes.

Relationships can be explored using various types of regression analysis to identify potential correlations in the data such as the relationship between the hardness of steel and the quenching time of the steel. This can be helpful when attempting to identify the factors that influence a process. Another option for understanding relationships is Design of Experiments (DoE), where experiments are planned specifically to economically explore the effects and interactions between multiple factors and a response variable.

Another main branch is for capability and stability assessments. There are two main sub-branches here; one is for measures of process capability and performance and the other is for Statistical Process Control (SPC), which can assess the stability of a process.

The measures of process performance and capability can be useful for establishing the baseline performance of a process; this can be helpful in determining whether process improvement activities have actually improved the process. The SPC sub-branch is split into three lower-level sub-branches; these are control charts for attribute data such as the number of defective units, control charts for continuous data such as diameters, and time-weighted charts that don’t give all values equal weights.

Control charts can be used both for assessing the current performance of a process, such as by using an individuals chart to determine if the process is in a state of statistical control, and for monitoring the performance of a process, such as after improvements have been implemented.

Exploratory data analysis (EDA) can be useful for gaining insights into the problem using graphical methods. The individual value plot is useful for simply observing the position of each value relative to the other values in a data set. A box plot, for example, can be helpful when comparing the means, medians and spread of data from multiple processes. The purpose of EDA is not to form conclusions, but to gain insights that can be helpful in forming tentative hypotheses or in deciding which type of statistical test to perform.

The tests and methods presented here do not cover all available statistical tests and methods in Minitab; however, they do provide a large selection of basic options to choose from.

These tools and methods are helpful when exploring a problem, but their use should not be limited to problem exploration. They can also be helpful for planning and verifying improvements. For example, an individual value plot may indicate one process performs better than a comparable process, and this can then be confirmed using a two-sample t test. Or, the settings of the better process can be used to plan a DoE to identify the optimal settings for the two processes and the improvements can be monitored using an xBar and S chart for the two processes.  
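As a small illustration of that last point, here is a minimal sketch of a two-sample t-test in Python with SciPy, using made-up cycle-time data and Welch's version so equal variances are not assumed.

```python
# Confirming an apparent difference between two comparable processes
# with a two-sample t-test (Welch's version, so equal variances are
# not assumed). The cycle-time data below are made up.
from scipy import stats

process_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
process_b = [12.9, 13.1, 12.7, 13.0, 12.8, 13.3, 12.6, 13.2]

result = stats.ttest_ind(process_a, process_b, equal_var=False)
print(f"t = {result.statistic:.2f}, p-value = {result.pvalue:.4f}")
# A small p-value supports the impression from the individual value
# plot that the two processes really do differ.
```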

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

Control Charts - Not Just for Statistical Process Control (SPC) Anymore!


Control charts are a fantastic tool. These charts plot your process data to identify common cause and special cause variation. By identifying the different causes of variation, you can take action on your process without over-controlling it.

Assessing the stability of a process can help you determine whether there is a problem and identify the source of the problem. Is the mean too high, too low, or unstable? Is variability a problem? If so, is the variability inherent in the process or attributable to specific sources? Control charts answer these questions, which can guide your corrective efforts.

Determining that your process is stable is good information all by itself, but it is also a prerequisite for further analysis, such as capability analysis. Before assessing process capability, you must be sure that your process is stable. An unstable process is unpredictable. If your process is stable, you can predict future performance and improve its capability.

While we associate control charts with business processes, I’ll argue in this post that control charts provide the same great benefits in other areas beyond statistical process control (SPC) and Six Sigma. In fact, you’ll see several examples where control charts find answers that you’d be hard pressed to uncover using different methods.

The Importance of Assessing Whether Other Types of Processes Are In Control

I want you to expand your mental concept of a process to include processes outside the business environment. After all, unstable process levels and excessive variability can be problems in many different settings. For example:

All of these processes can be stable or unstable, have a certain amount of inherent variability, and can also have special causes of variability. Understanding these issues can help improve all of them.

The third bullet relates to a research study that I was involved with. Our research goal was to have middle school subjects jump from 24-inch steps, 30 times, every other school day to determine whether it would increase their bone density. We defined our treatment as the subjects experiencing an impact of 6 body weights. However, we weren’t quite hitting the mark.

To guide our corrective efforts, I conducted a pilot study and graphed the results in the Xbar-S chart below.

Xbar-S chart of ground reaction forces for pilot study

The in-control S chart (bottom) shows that each subject has a consistent landing style that produces impacts of a consistent magnitude—the variability is in control. However, the out-of-control Xbar chart (top) indicates that, while the overall mean (6.141) exceeds our target, different subjects have very different means. Collectively, the chart shows that some subjects are consistently hard landers while others are consistently soft landers. The control chart suggests that the variability is not inherent in the process (common cause variation) but rather assignable to differences between subjects (special cause variation).

Based on this information, we decided to train the subjects how to land and to have a nurse observe all of the jumping sessions. This ongoing training and corrective action reduced the variability enough so that the impacts were consistently greater than 6 body weights.

Control Charts as a Prerequisite for Statistical Hypothesis Tests

As I mentioned, control charts are also important because they can verify the assumption that a process is stable, which is required to produce a valid capability analysis. We don’t often think of using control charts to test the assumptions for hypothesis tests in a similar fashion, but they are very useful for that as well.

The assumption that the measurements used in a hypothesis test are stable is often overlooked. As with any process, if the measurements are not stable, you can’t make inferences about whatever you are measuring.

Let’s assume that we’re comparing test scores between group A and group B. We’ll use this data set to perform a 2-sample t-test as shown below.

two sample t-test results

The results appear to show that group A has the higher mean and that the difference is statistically significant. Group B has a marginally higher standard deviation, but we’re not assuming equal variances, so that’s not a problem. If you conduct normality tests, you’ll see that the data for both groups are normally distributed—although we have a sufficient number of observations per group that we don’t have to worry about normality. All is good, right?

The I-MR charts below suggest otherwise!

I-MR chart for group A

I-MR chart of group B

The chart for group A shows that these scores are stable. However, in group B, the multiple out-of-control points indicate that the scores are unstable. Clearly, there is a negative trend. Comparing a stable group to an unstable group is not a valid comparison even though the data satisfy the other assumptions.

This I-MR chart illustrates just one type of problem that control charts can detect. Control charts can also test for a variety of patterns in the data and for out-of-control variability. As these data show, you can miss problems using other methods.

Using the Different Types of Control Charts

The I-MR chart assesses the stability of the mean and standard deviation when you don’t have subgroups, while the XBar-S chart shown earlier assesses the same parameters but with subgroups.
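For readers curious about what's behind an I chart, here is a minimal Python sketch of the usual moving-range control limits; the scores are made up, and Minitab's additional tests for special-cause patterns are not included.

```python
# Individuals (I) chart limits from the usual moving-range formula:
# center line +/- 2.66 * average moving range. The scores are made up.
import numpy as np

scores = np.array([82, 85, 80, 84, 79, 83, 81, 86, 78, 84], dtype=float)
moving_range = np.abs(np.diff(scores))

center = scores.mean()
mr_bar = moving_range.mean()
ucl = center + 2.66 * mr_bar
lcl = center - 2.66 * mr_bar

print(f"center = {center:.1f}, UCL = {ucl:.1f}, LCL = {lcl:.1f}")
print("points beyond the limits:", scores[(scores > ucl) | (scores < lcl)])
```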

You can also use other control charts to test other types of data. In Minitab, the U Chart and Laney U’ Chart are control charts that use the Poisson distribution. You can use these charts in conjunction with the 1-Sample and 2-Sample Poisson Rate tests. The P Chart and Laney P’ Chart are control charts that use the binomial distribution. Use these charts with the 1 Proportion and 2 Proportions tests.

If you're using Minitab Statistical Software, you can choose Assistant > Control Charts and get step-by-step guidance through the process of creating a control chart, from determining what type of data you have, to making sure that your data meet necessary assumptions, to interpreting the results of your chart.

Additionally, check out the great control charts tutorial put together by my colleague, Eston Martz.

Why You Should Use Non-parametric Tests when Analyzing Data with Outliers


There are many reasons why a distribution might not be normal/Gaussian. A non-normal pattern might be caused by several distributions being mixed together, by a drift over time, by one or several outliers, by asymmetrical behavior, by some out-of-control points, and so on.

I recently collected the scores of three different teams (the Blue team, the Yellow team and the Pink team) after a laser tag game session one Saturday afternoon. The three teams represented three different groups of friends wishing to spend their afternoon tagging players from competing teams. Gengiz Khan turned out to be the best player, followed by Tarantula and Desert Fox.

One-Way ANOVA

In this post, I will focus on team performances, not on single individuals. I decided to compare the average scores of each team. The best tool I could possibly think of was a one-way ANOVA using the Minitab Assistant (with a continuous Y response and three sample means to compare).

To assess statistical significance, the differences between team averages are compared to the within (team) variability. A large between-team variability compared to a small within-team variability (the error term) means that the differences between teams are statistically significant.

In this comparison (see the output from the Assistant below), the p-value was 0.053, just above the usual 0.05 threshold. The p-value is the probability of observing differences between means at least this large if only random causes were at work. A p-value above 0.05 therefore indicates that this probability is not negligible, so the differences are not considered statistically significant (there is "not enough evidence that there are significant differences," according to the comments in the Minitab Assistant). Still, the result remains somewhat ambiguous, since the p-value is very close to the significance threshold (0.05).

Note that the variability within the Blue team appears to be much larger than for the other two teams (see the confidence interval plot in the means comparison chart below). This is not a cause for concern here, since the Minitab Assistant uses Welch's method of ANOVA, which does not assume equal variances within groups.
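
For readers who like to see the arithmetic, here is a hand-rolled sketch of Welch's one-way ANOVA in Python (SciPy's built-in f_oneway assumes equal variances, so the Welch version is coded explicitly). The team scores are hypothetical stand-ins, so the resulting F and p-value will not match the Assistant's output.

# Hand-rolled Welch's one-way ANOVA. The team scores below are
# hypothetical, not the actual laser tag data.
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA: F statistic, degrees of freedom, and p-value."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    w = n / variances                         # precision weights
    grand_mean = np.sum(w * means) / w.sum()

    between = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    lam = 3.0 * np.sum((1 - w / w.sum()) ** 2 / (n - 1)) / (k ** 2 - 1)
    f_stat = between / (1 + 2 * lam * (k - 2) / 3)
    df1, df2 = k - 1, 1.0 / lam
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value

# Hypothetical team scores (not the actual data from this post)
blue   = [2550, 1200, 900, 450, 300, 150, -100, -300, -500]
yellow = [1500, 1200, 975, 900, 700, 500, 300, 100, -200, -400]
pink   = [800, 500, 200, 0, -100, -300, -450, -600, -700, -800, -900, -1000, -1100]

f_stat, df1, df2, p_value = welch_anova(blue, yellow, pink)
print(f"F = {f_stat:.2f}, df = ({df1}, {df2:.1f}), p = {p_value:.3f}")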

Outliers and Normality

When looking at the distribution of the individual data (below), one point appears to be an outlier, or at least a suspect, extreme value (marked in red). This is Gengiz Khan, the best player. In my worksheet, the scores were entered from best to worst (not in time order), which is why we see a downward trend in the chart on the right side of the Diagnostic Report (see below).

The Report Card from the Minitab Assistant (see below) shows that normality might be an issue (the yellow triangle is a warning sign) because the sample sizes are quite small, so we need to check normality within each team. The second warning sign is due to the unusual, extreme data point (the score in row 1), which may bias our analysis.

Following the suggestion from the warning signal in the Minitab Assistant Report Card, I decided to run a normality test. I performed a separate normality test for each team in order not to mix different distributions together.

A low p-value in a normal probability plot signals a significant departure from normality, and the p-value is below 0.05 for the Blue team (see below). Points located along the normal probability plot line represent "normal," common, random variation; points at the upper or lower extremes, far from the line, represent unusual values or outliers. The non-normal behavior in the Blue team's probability plot is clearly due to the outlier on the right side of the line.
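
If you want to run a similar per-team normality check outside Minitab, here is a small sketch. Minitab's probability plot relies on the Anderson-Darling statistic; this sketch uses the Shapiro-Wilk test instead, and the scores are hypothetical.

# Per-team normality check, one test per team so that different
# distributions are not mixed together. Minitab's probability plot uses
# the Anderson-Darling statistic; this sketch uses Shapiro-Wilk instead.
# The scores are hypothetical.
from scipy import stats

teams = {
    "Blue":   [2550, 1200, 900, 450, 300, 150, -100, -300, -500],
    "Yellow": [1500, 1200, 975, 900, 700, 500, 300, 100, -200, -400],
    "Pink":   [800, 500, 200, 0, -100, -300, -450, -600, -700, -800],
}

for name, scores in teams.items():
    stat, p_value = stats.shapiro(scores)
    print(f"{name}: W = {stat:.3f}, p = {p_value:.3f}")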

Should we remove this value (Gengiz Khan's score) from the Blue group and rerun the analysis without him?

Even though Gengiz Khan is more experienced and talented than the other team members, there is no particular reason to remove him: he is certainly part of the Blue team. There are probably many other talented laser tag players around, and if another session takes place in the future, there will probably still be a large gap between Gengiz Khan and the rest of his team.

The problem is that this extreme value inflates the within-group variability. Because the within-team variability is much larger for the Blue team, the differences between teams no longer stand out when compared to the residual (within) variability, which pushes the p-value just above the significance threshold.

A Non-parametric Solution

One possible solution is to use a non-parametric approach. Non-parametric techniques are based on ranks or medians. A rank represents an individual's position relative to the others and is not affected by extreme values, whereas a mean is sensitive to outliers. Ranks and medians are therefore more "robust" to outliers.
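
Here is a quick numerical illustration of that robustness: in the hypothetical scores below, adding one extreme value shifts the mean and inflates the standard deviation, while the median barely moves.

# One extreme value shifts the mean and inflates the standard deviation,
# while the median barely moves. Values are hypothetical.
import numpy as np

scores = np.array([300, 450, 500, 600, 700, 750, 800, 900], dtype=float)
with_outlier = np.append(scores, 2550.0)   # add one extreme score

for label, data in [("without outlier", scores), ("with outlier", with_outlier)]:
    print(f"{label}: mean={data.mean():.0f}, "
          f"std={data.std(ddof=1):.0f}, median={np.median(data):.0f}")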

I used the Kruskal-Wallis test (see the correspondence table between parametric and non-parametric tests below). The p-value (see the output below) is now significant (less than 0.05), and the conclusion is completely different: we can consider the differences to be significant.

Kruskal-Wallis Test: Score versus Team

Kruskal-Wallis Test on Score

Team      N   Median   Ave Rank      Z
Blue      9   2550.0       23.7   2.72
Pink     13   -450.0       11.6  -2.44
Yellow   10    975.0       16.4  -0.06
Overall  32                16.5

H = 8.86  DF = 2  P = 0.012
H = 8.87  DF = 2  P = 0.012  (adjusted for ties)
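
If you want to run a similar Kruskal-Wallis comparison outside Minitab, here is a minimal sketch. The team scores are hypothetical, so the H statistic and p-value will not match the session output above.

# Kruskal-Wallis test on the same kind of data, outside Minitab.
# The team scores are hypothetical, so H and p will not match the
# session output shown above. SciPy reports the tie-adjusted statistic.
from scipy import stats

blue   = [2550, 1200, 900, 450, 300, 150, -100, -300, -500]
yellow = [1500, 1200, 975, 900, 700, 500, 300, 100, -200, -400]
pink   = [800, 500, 200, 0, -100, -300, -450, -600, -700, -800, -900, -1000, -1100]

h_stat, p_value = stats.kruskal(blue, yellow, pink)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")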

See the correspondence table for parametric and non-parametric tests below:

Conclusion

Outliers do happen, and removing them is not always straightforward. One nice thing about non-parametric tests is that they are more robust to such outliers. However, this does not mean that non-parametric tests should be used in every circumstance: when there are no outliers and the distribution is normal, standard parametric tests (t-tests or ANOVA) are more powerful.
