Design of experiments via taguchi methods: one and two way layouts
Title: Design of experiments via taguchi methods: one and two way layouts
Authors: Joseph Casler, Andry Haryanto, Seth Kahle and Weiyin Xu
Date Presented: 5 December 2006
Date Revised: 30 November 2006
Experimental design is a strategy to gather empirical knowledge, i.e. knowledge based on the analysis of experimental data and not on theoretical models. It can be applied whenever you intend to investigate a phenomenon in order to gain understanding or improve performance.
Design of Experiment (DOE) is a structured, organized method to define the relationship between factors (X) that affect a process and its responses (Y). It involves designing a set of experiments, in which all relevant factors are varied systematically. The results of these experiments are analyzed to help identify optimal conditions, the factors that most influence the results, and those that do not. The existence of interactions and synergies between factors can also be determined.
A 1-way experimental layout is the method used to determine the effect of a single factor on the process output. Similarly, a 2-way experimental layout determines the effects of two factors on a process output. A 2-way layout can be done instead of two separate 1-way layouts; doing this saves time and money. 1-way and 2-way layouts are used for determining whether different factors affect the system/population and thus help in isolating factors that can be used for the optimization of processes. The data obtained from these layouts can be used to perform linear and quadratic regressions, unlike factorial designs which can only perform linear analysis.
One should be aware that 1-way and 2-way layouts (and therefore ANOVA too) can only be used to look at 2 factors at a maximum. If larger numbers of factors need to be considered, orthogonal arrays should be used.
A working mean is a figure that is roughly equal to the overall mean. It is typically subtracted from every data point to simplify the numbers, a step called data transformation. Because the same constant is subtracted from every point, the differences between values, and therefore the sums of squares and the F statistic, are unchanged. The transformation is performed only to make hand calculations faster and easier.
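As a rough check of this claim, the sketch below subtracts a working mean from a small data set and compares the total sum of squares before and after. The nine values are the yields that appear in Worked out Example 1; the working mean of 60 and the helper name `total_sum_of_squares` are assumptions chosen only for illustration.

```python
def total_sum_of_squares(values):
    """Sum of squared deviations of each value from the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Yields taken from Worked out Example 1 of this wiki.
yields = [46, 44, 47, 64, 62, 65, 84, 87, 86]

# Hypothetical working mean: any convenient figure near the overall mean.
working_mean = 60
transformed = [y - working_mean for y in yields]

ss_raw = total_sum_of_squares(yields)
ss_transformed = total_sum_of_squares(transformed)

print(transformed)             # smaller numbers, easier to square by hand
print(ss_raw, ss_transformed)  # the two sums of squares agree
```

Any other constant would work equally well; the only requirement is that the same value is subtracted from every data point.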
Analysis of Variance (ANOVA)
ANOVA is an analysis of the variation present in an experiment. Variation can be defined as the degree to which two or more things being compared differ. ANOVA is a general technique that can be used to test the hypothesis that the means among two or more groups are equal by comparing the variance at a specific confidence level. In other words, it tests the hypothesis that the variation in an experiment is no greater than that due to normal variation in measurements. ANOVA is performed as an alternative to taking several t-tests between group means. Assumptions of this technique include that the sampled populations are normally distributed and have equal variances.
The term one-way ANOVA, for a one-way layout, describes the analysis of variance performed on a single factor. The term two-way ANOVA, for a two-way layout, describes the analysis of variance performed on two factors. Typically each factor has two or more levels with many repetitions within each level. Repetitions are defined as experimental trials that are repeated at the same conditions.
The F statistic
The null hypothesis (H0) of the ANOVA is always that the group means are equal (factor is insignificant) and the alternative hypothesis (Ha) is always that the group means are not equal (factor is significant). To determine which hypothesis is correct, a statistic known as the F statistic must be calculated and compared to literature values. The point of setting up an ANOVA table and carrying out all of the calculations is to determine the F statistic. From this, you can easily draw conclusions about the null hypothesis.
In this wiki the calculated test statistic will be referred to as F or F0. The literature value of F is referred to as the "critical F value" and will be denoted as Fcrit. The F0 statistic calculated from the ANOVA table must be compared to the Fcrit found in an F table in order to draw a conclusion from the analysis. The decision rule (DR) is to conclude H0 when the calculated F0 statistic is less than or equal to Fcrit; otherwise conclude Ha. Choosing the right Fcrit can sometimes be tricky. The information needed is the alpha level, the degrees of freedom of the numerator (the factor being tested), and the degrees of freedom of the denominator (the error). The alpha value is set by the confidence level at which one is testing. For example, if a 95% confidence level (often loosely called a 95% level of significance) is required, then alpha = 1 - 0.95 = 0.05. This alpha value is then used to choose the table from which the Fcrit value is obtained.
An alternative method for determining how data is related is to use the p value. Typically the p value is given in ANOVA outputs from computer analysis and would not be calculated when computing an ANOVA table by hand. The universal decision rule (UDR) is to conclude H0 when the calculated p value is equal to or greater than the significance level (alpha value), otherwise conclude Ha. The conclusion reached from the p value must agree with the conclusion reached from the F0 statistic.
The ANOVA table
An ANOVA table contains the sources of variation, the degrees of freedom, the sum of squares, the mean square, the F0 statistic, and sometimes the P-value. The following are the suggested steps to complete an ANOVA table.
1) Calculate the degrees of freedom for each factor and for the error
2) Calculate the overall mean
3) Calculate the means of each of the levels of each factor
4) Calculate the sum of squares for each factor, of the error, and of the total experiment
5) Calculate mean square of the factors and of the error
6) Calculate the F0 statistic for each factor
The row labeled "between groups" refers to the analysis of the levels of a specific factor. For example, if an experiment is run at different temperature levels (e.g. 50 °C, 100 °C, 150 °C), "between groups" refers to the variation between the data collected at each temperature level (50 °C compared to 100 °C compared to 150 °C); it is not concerned with variation within each level (i.e. different trials at 50 °C). If there are multiple factors (e.g. temperature and concentration) that affect the system, there will be multiple "between groups" lines on the ANOVA table. For one-way experiments there is one "between groups" line, and for two-way experiments there are two "between groups" lines.
The row labeled "within groups" contains the error within the levels. The last row will always be the total variation in the experiment. An F0 statistic is calculated only in the "between groups" rows and not in the error and total rows. This is because each F0 is the ratio of a "between groups" mean square to the error mean square, so the error row only supplies the denominator.
Once the ANOVA table is finished, the factors can be analyzed for significance. The Fcrit is found for each factor and compared to the calculated F0 statistic. The decision rules were outlined in the previous section called "The F Statistic".
A one-way layout is a type of experiment with one factor and 'a' factor levels. An example is an experiment where temperature is a factor (tested at levels 50, 100, and 150 °C), and the yield is the response. There are two types of one-way layouts: those with an equal number of repetitions and those with an unequal number of repetitions. The two tables below illustrate this point.
One-Way Anova Table
When you have an experiment with a one-way layout, you compute the F statistic using a one-way ANOVA table. Below is how the ANOVA table is calculated.
Some helpful definitions for this table:
a = the number of levels for a factor
i = level of factor
j = trial # at a given level
ni = the number of trials at the ith factor level
N = the total number of trials (the sum of ni over all levels)
yij = the response value at ith factor level and the jth trial
y.. = overall mean of data
yi. = the mean at ith factor level
The Fcrit for such an analysis is Falpha, a-1, N-a and can be found in the F table. Please note that this ANOVA table can be used for both cases (equal and unequal number of repetitions) of a one-way layout.
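The table itself is not reproduced in this text. As a sketch, the entries below are the standard one-way ANOVA formulas written with the symbols defined above. They are reconstructed rather than copied from the original table, so treat them as an assumption, although they agree with the calculations in Worked out Example 1. Here N is the total number of trials and a bar denotes a mean.

```latex
\begin{array}{lllll}
\text{Source} & \text{d.f.} & \text{SS} & \text{MS} & F_0 \\ \hline
\text{Between groups (factor A)} & a-1 &
  SS_A = \sum_{i=1}^{a} n_i \,(\bar{y}_{i.}-\bar{y}_{..})^2 &
  MS_A = SS_A/(a-1) & MS_A/MS_E \\
\text{Within groups (error)} & N-a &
  SS_E = SS_T - SS_A &
  MS_E = SS_E/(N-a) & \\
\text{Total} & N-1 &
  SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (y_{ij}-\bar{y}_{..})^2 & &
\end{array}
```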
A two-way layout is a type of experiment with two different factors. In this wiki the first factor will be denoted as factor 'A' and the second will be denoted as factor 'B'. Factor A has 'a' levels and factor B has 'b' levels. An example is an experiment where temperature (levels of 50, 100, 150 °C) and concentration (levels of 1.0, 2.0, 3.0 M) are the factors and the yield is the response. An example is shown in the table below.
Two-Way ANOVA Table
When you have an experiment with a two-way layout, use a two-way ANOVA table to calculate the F statistic. Below is how the two-way ANOVA table is calculated.
Some helpful definitions for this table:
a = the number of levels for the first factor
b = the number of levels for the second factor
i = level of the first factor
j = level of the second factor
ni = the number of trials at the ith factor level for the first factor
nj = the number of trials at the jth factor level for the second factor
yij = the response value at ith and jth factor levels
y.. = overall mean of data
yi. = the mean at the ith factor level for the first factor
y.j = the mean at the jth factor level for the second factor
The Fcrit for such an analysis is Falpha, a-1, (a-1)(b-1) for factor A and Falpha, b-1, (a-1)(b-1) for factor B. This can be found in the F table. Please note that this table is used to analyze a two-way layout without repetition and doesn't take into account interactions between factors A and B.
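The table itself is not reproduced in this text. As a sketch, the entries below are the standard formulas for a two-way layout without repetition and without an interaction term, written with the symbols defined above. They are reconstructed rather than copied from the original table, so treat them as an assumption. The degrees of freedom match the Fcrit subscripts quoted above.

```latex
\begin{array}{lllll}
\text{Source} & \text{d.f.} & \text{SS} & \text{MS} & F_0 \\ \hline
\text{Factor A} & a-1 &
  SS_A = b\sum_{i=1}^{a} (\bar{y}_{i.}-\bar{y}_{..})^2 &
  MS_A = SS_A/(a-1) & MS_A/MS_E \\
\text{Factor B} & b-1 &
  SS_B = a\sum_{j=1}^{b} (\bar{y}_{.j}-\bar{y}_{..})^2 &
  MS_B = SS_B/(b-1) & MS_B/MS_E \\
\text{Error} & (a-1)(b-1) &
  SS_E = SS_T - SS_A - SS_B &
  MS_E = SS_E/\big((a-1)(b-1)\big) & \\
\text{Total} & ab-1 &
  SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b} (y_{ij}-\bar{y}_{..})^2 & &
\end{array}
```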
ANOVA using Excel
In order to perform ANOVA using Microsoft Excel, the analysis toolpak needs to be installed. To do so, go to "Tools" then "Add-Ins" then check the box that says "Analysis Toolpak-VBA" and click "Ok". Now you will be equipped to use ANOVA. To use ANOVA now, just go to "Tools" then "Data Analysis" and you will be able to choose one of 3 different types of ANOVAs to perform. In the sections below, we will outline how and when to use each type of ANOVA analysis.
Imagine that you have been tasked to ensure that the SHORTSNORT decongestant that your manufacturing plant produces over 3 days is not significantly different even though the decongestant has been subjected to a different set of process controls each day. In order to guarantee that the populations are not significantly different at a 95% level of significance, you decide to use Anova to test the populations. 5 random samples were collected from each of the days and their percentage purity is as shown in the table below.
|Sample Number||Day 1||Day 2||Day 3|
Input the data into Excel then use "Anova: Single Factor" under the Data Analysis Tools. The excel file used for this example can be found here.
Below are the results of the Anova single factor analysis:
The above notations used are:
- SS - Sum of squares
- df - Degrees of Freedom
- MS - Mean square for groups
- F - F-value given by MS between groups divided by MS within groups
- Fcrit - Critical F-value that is tabulated in statistical tables
Since the P-value is > 0.05 and F < Fcrit, we can conclude that the 3 days do not have significantly different distributions. Thus, the decongestants over the 3 days are not significantly different at 95% level of significance.
Two-Factor With Replication
Imagine that in the SHORTSNORT decongestant plant, you are asked to optimize profits by optimizing the rate of the recycle stream and the rate of discharging profitable product in the holding tank (T2). In order to do this, you will need to vary V4 and V2. Thus, you hold V4 constant while varying V2 and vice versa. Now, you need to determine whether optimizing profits by optimizing V4 and V2 individually is feasible. Since you know that it will be feasible only if V4 and V2 are independent of each other, you need to test if this is true. Thus, you decide to use the ANOVA two-factor with replication test. The results that you obtained are as shown below.
Using the "Anova: Two-Factor With Replication" data analysis, you get the following results:
Only the important portion of the data analysis results are shown here. For the full results, click here. From our results we see that all 3 p-values are less than 0.05.
- The first p-value for the "sample" is the p-value for effect on profits when varying V2. Since the p-value is less than 0.05, we can conclude that the profits obtained when varying V2 are significantly different. Thus, varying V2 will affect the profits.
- The second p-value for the "column" is the p-value for effect on profits when varying V4. Since the p-value is less than 0.05, we can conclude that the profits obtained when varying V4 are significantly different. Thus, varying V4 will affect the profits.
- Lastly, we wanted to know if there is any interaction between V2 and V4. If there is interaction, then they will not be independent of each other. From our results, the p-value for "interaction" is less than 0.05. Thus, we conclude that the interaction between V2 and V4 is significant, so the two are not independent of each other. Optimizing V2 while keeping V4 constant, and vice versa, is therefore not guaranteed to find the most profitable combination; V2 and V4 should be optimized together.
Two-Factor Without Replication
Two-factor without replication is useful when data is dependent on 2 variables but there is only one observation for each pair of variables. We will use the same example as above, but assume that you only had time to take one data point for each pair of V2 and V4 conditions. The data will now look like this.
Using the "Two-Factor Without Replication" analysis in Excel, we obtain the following results.
Using this analysis, we can only tell whether
- the differences in profits are due to random error or to the changes in V2
- the differences in profits are due to random error or to the changes in V4
but not whether there is any interaction between V2 and V4. Since both p-values are less than 0.05, we see once again that the differences in profits are due to the changes in V2 and V4 rather than to random error. Because this analysis cannot detect an interaction, it cannot by itself show whether V2 and V4 can be optimized individually.
Other Useful Excel Functions
The function FDIST(x, degrees of freedom 1, degrees of freedom 2) returns the right-tail probability of the F distribution at x, i.e. the p-value for a calculated F statistic of x. For example, suppose you have a calculated F value of 23 with 3 and 3 degrees of freedom. The function used will be FDIST(23,3,3), which returns approximately 0.014. You then compare it to the alpha value that you require. If we want to test this at a 95% level of confidence, then alpha = 1 - 0.95 = 0.05, and since 0.014 < 0.05 we conclude Ha.
The function FINV(alpha, degrees of freedom 1, degrees of freedom 2) returns the critical F value at the specified alpha and degrees of freedom. For the above example, if we want to know Fcrit, we use FINV(0.05,3,3), which returns approximately 9.277. This can then be compared to your calculated F value.
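For readers working outside Excel, the sketch below shows one way the same two quantities could be computed with only the Python standard library. It is not how Excel implements FDIST or FINV. It assumes the textbook relation between the F distribution and the regularized incomplete beta function, evaluated with a continued fraction, and finds the critical value by bisection. The function names `f_right_tail` and `f_critical` are hypothetical.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction, in that order.
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_right_tail(f_value, df1, df2):
    """P(F > f_value): the p-value, the quantity FDIST reports."""
    if f_value <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


def f_critical(alpha, df1, df2, tol=1e-9):
    """Fcrit such that P(F > Fcrit) = alpha, found by bisection."""
    low, high = 0.0, 1.0
    while f_right_tail(high, df1, df2) > alpha:
        high *= 2.0  # widen the bracket until the tail is below alpha
    while high - low > tol:
        mid = (low + high) / 2.0
        if f_right_tail(mid, df1, df2) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


print(round(f_right_tail(23, 3, 3), 4))   # p-value, roughly 0.014
print(round(f_critical(0.05, 3, 3), 3))   # Fcrit, roughly 9.28
```

A statistics library would normally be the better choice in practice; the standard-library version is used here only so the example has no dependencies.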
To summarise the article, when using 1-way and 2-way layouts,
|Hypothesis||Populations are the same. There is no correlation.||Populations are different. There is a correlation.|
|F-statistic||F0 < Fcrit||F0 > Fcrit|
|P-value||p > alpha||p < alpha|
NOTE: The p value and the F statistic represent the same thing in different ways. Thus, it is impossible to have any combination of results not listed in the table above.
Worked out Example 1
You have just finished experiments on optimizing the yield of a reaction. You are interested in whether the reactor temperature affects the yield of this reaction. To investigate this relationship you ran the reaction at temperatures of 50, 100, and 150 °C. You performed each trial 3 times. Determine if the temperature of the reactor affects the yield of this reaction using ANOVA. Use a 95% significance level (alpha = 0.05). The data can be found in Table 1 of this wiki.
To solve this problem, the procedure outlined in the ANOVA section will be used. As discussed earlier, this set of data is a one-way layout with temperature as the factor, with 3 factor levels, and an equal number of repetitions.
a = 3
N = 9
ni = n1 = n2 = n3 = 3
d.f. for temperature = 3-1 = 2
d.f. for error = 9-3 = 6
y.. = 65
y1. = 45.67
y2. = 63.67
y3. = 85.67
SSA = 3*(45.67-65)^2 + 3*(63.67-65)^2 + 3*(85.67-65)^2 = 2408
SST = (46-65)^2 + (44-65)^2 + (47-65)^2 + (64-65)^2 + (62-65)^2 + (65-65)^2 + (84-65)^2 + (87-65)^2 + (86-65)^2 = 2422
SSE = SST - SSA = 2422 - 2408 = 14
MSA = 2408/2 = 1204
MSE = 14/6 = 2.333
F0 = 1204/2.333 = 516
Fcrit = F0.05, 2, 6 = 5.14
Since Fcrit < F0, conclude Ha and reactor temperature does affect the yield of the reaction.
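The hand calculation above can be checked with a short script. The sketch below is a minimal one-way ANOVA under the definitions used in this wiki. The function name `one_way_anova` and the dictionary keys are assumptions made for illustration, and Fcrit is taken from the example rather than computed.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of groups of responses."""
    all_values = [y for group in groups for y in group]
    n_total = len(all_values)               # N, total number of trials
    a = len(groups)                         # number of factor levels
    grand_mean = sum(all_values) / n_total  # y..

    # Between groups: n_i * (level mean - overall mean)^2, summed over levels.
    ss_a = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Total: every observation compared with the overall mean.
    ss_t = sum((y - grand_mean) ** 2 for y in all_values)
    ss_e = ss_t - ss_a

    df_a, df_e = a - 1, n_total - a
    ms_a, ms_e = ss_a / df_a, ss_e / df_e
    return {"SSA": ss_a, "SSE": ss_e, "SST": ss_t,
            "dfA": df_a, "dfE": df_e,
            "MSA": ms_a, "MSE": ms_e, "F0": ms_a / ms_e}


# Yields at 50, 100 and 150 °C, as used in the sums of squares above.
yields_by_temperature = [[46, 44, 47], [64, 62, 65], [84, 87, 86]]
table = one_way_anova(yields_by_temperature)

f_crit = 5.14  # F(0.05, 2, 6), the value quoted in the example
for name, value in table.items():
    print(f"{name}: {value:.3f}")
print("conclude Ha" if table["F0"] > f_crit else "conclude H0")
```

Because the between-groups term weights each level by its own number of trials, the same function also handles layouts with an unequal number of repetitions.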
Worked out Example 2
You are in charge of a chemical process that needs to be optimized. You suspect that two factors of production are not optimized but are not sure if one or both need to be fully tested. To save time and money you test factor A at 5 levels and factor B at 7 levels with no repetitions. Afterwards you intend to do a thorough study on all significant factor(s). Finish the ANOVA table below and determine the significance of factors A and B. The level of significance is 95% (alpha = 0.05).
MSA = 2032/4 = 508
MSB = 726/6 = 121
MSE = 3480/24 = 145
For factor A:
F0 = MSA/MSE = 508/145 = 3.50
Fcrit = F0.05, 4, 24 = 2.78
F0 > Fcrit
For factor B:
F0 = MSB/MSE = 121/145 = 0.83
Fcrit = F0.05, 6, 24 = 2.51
F0 < Fcrit
Factor A is significant while factor B is insignificant. Perform a more thorough study on factor A to determine the level that optimizes the process.
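The remaining table entries follow mechanically from the sums of squares, so they can be sketched in a few lines. The sums of squares and the two Fcrit values below are the ones given in this example; the variable names are assumptions made for illustration.

```python
a, b = 5, 7  # number of levels of factor A and factor B

# Sums of squares given in the example.
ss = {"A": 2032, "B": 726, "error": 3480}

# Degrees of freedom for a two-way layout without repetition.
df = {"A": a - 1, "B": b - 1, "error": (a - 1) * (b - 1)}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# F0 for each factor is its mean square over the error mean square.
f0 = {factor: ms[factor] / ms["error"] for factor in ("A", "B")}

# Critical values quoted in the example: F(0.05, 4, 24) and F(0.05, 6, 24).
f_crit = {"A": 2.78, "B": 2.51}
significant = {factor: f0[factor] > f_crit[factor] for factor in f0}

for factor in ("A", "B"):
    verdict = "significant" if significant[factor] else "insignificant"
    print(f"Factor {factor}: F0 = {f0[factor]:.2f}, "
          f"Fcrit = {f_crit[factor]}, {verdict}")
```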
Multiple Choice Question 1
Which of the following is never true for ANOVA?
A) Equal variances of the sample population isn't an assumption when using ANOVA.
B) A one-way ANOVA describes the analysis of variance used on a single factor.
C) A normal distribution of the sample population is an assumption when using ANOVA.
D) The F statistic is used to perform ANOVA.
Multiple Choice Question 2
Which of the following ANOVA results would make you accept the alternative hypothesis at a 95% level of significance?
A) p > 0.05
B) p < 0.05
C) F > Fcrit
D) Answers B and C
- Montgomery, D.C. Design and Analysis of Experiments, 6th ed. John Wiley & Sons, 2004. ISBN 047148735X