Tuesday, April 10, 2012

SPC - Univariate and Bivariate Analysis

The next tools in this SPC pocketbook are Histogram and Correlation.



In modern terms, these are called Univariate and Bivariate Analysis.

Histogram - aka Univariate Analysis


A histogram is one aspect of univariate analysis. According to the pocketbook, the histogram:
  1. Gives a picture of the distribution: how scattered are the data?
  2. Shows the pattern of the data (evenly spread? normally distributed?)
  3. Can be used to compare the distribution to the specification

With modern computers, it is easy to create histograms with just a few clicks (using the $1,800 software JMP). In JMP, go to Analyze > Distribution.


You're going to get a dialog where you get to choose which columns you want to make into histograms. Select the columns and hit Y, Columns. Then click OK.


And voila, you get your histograms (plotted vertically by default) and more metrics than Ron Paul gets media coverage.


You get metrics like mean, standard deviation, and standard error. Most importantly, you get a visual of how the data are spread.
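For readers without JMP, the same summary can be roughed out in a few lines of Python. This is only a sketch using the standard library, not what JMP does internally: the `summarize` function, the bin width, and the titer values are all made up for illustration.

```python
import math
import statistics

def summarize(values, bin_width):
    """Return mean, sample standard deviation, standard error, and bin counts."""
    n = len(values)
    mean = statistics.fmean(values)
    std_dev = statistics.stdev(values)   # sample standard deviation (n - 1)
    std_err = std_dev / math.sqrt(n)     # standard error of the mean
    # Count how many values fall into each bin of width bin_width,
    # keyed by the lower edge of the bin.
    counts = {}
    for v in values:
        lower_edge = math.floor(v / bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return mean, std_dev, std_err, dict(sorted(counts.items()))

# Hypothetical final titers (g/L) from ten runs; illustrative numbers only.
titers = [1.9, 2.1, 2.0, 2.3, 1.8, 2.2, 2.0, 2.1, 1.7, 2.4]
mean, std_dev, std_err, counts = summarize(titers, bin_width=0.5)

print(f"mean={mean:.3f}  sd={std_dev:.3f}  se={std_err:.3f}")
for edge, count in counts.items():
    print(f"{edge:4.1f} | {'*' * count}")   # crude text histogram
```

The text histogram is crude, but it answers the same first question: how scattered are the data, and where do most of the values sit?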

Correlation - aka Bivariate Analysis


A correlation is likewise one specific type of bivariate analysis: the type where you plot numerical values against each other. Other types of bivariate analysis include comparison of means and ANOVA. For SPC, though, the correlation is the most popular.

The pocketbook says that the correlation illustrates the relationship between two variables, if one exists. From where I sit, the correlation feature is one of the most used functions in applying SPC to large-scale cell culture. Here's why:

While cell culture is complex, many manufacturing phenomena are simple. Mass-balance across a system is a linear process. Media batching is a linear process. The logarithm of cell density against time is linear during exponential growth. Many things can be explored by plotting Y vs. X and seeing if there's a correlation.
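To make the cell density case concrete, here is a minimal sketch in Python. It fits a straight line to the natural log of density against time, so the slope is the specific growth rate. The `fit_line` helper and the data (a culture doubling every 24 hours from 0.2e6 cells/mL) are hypothetical and chosen so the answer is easy to check.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: density doubling every 24 hours from 0.2e6 cells/mL.
hours = [0, 24, 48, 72, 96]
density = [0.2e6 * 2 ** (t / 24) for t in hours]

# Fit ln(density) vs. time; the slope is the specific growth rate (1/h).
slope, intercept = fit_line(hours, [math.log(d) for d in density])
doubling_time = math.log(2) / slope

print(f"growth rate = {slope:.4f} per hour, doubling time = {doubling_time:.1f} h")
```

Real cultures will not fall exactly on the line, and the size of the departure from it is what makes the plot useful.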

To get correlations with JMP, go to Analyze > Fit Y by X on the menu bar.


You're going to get a dialog where you can specify which columns to plot on the y-axis (click Y, Response). Then you get to specify which columns to plot on the x-axis (click X, Factor).


When you click OK, you're going to get your result. If it turns out that your Y has nothing to do with X, you're going to get something like this: a scatter of points where the mean line and the fitted line are basically on top of each other.
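The same check can be sketched outside JMP. The Python below computes the Pearson correlation coefficient and the least-squares slope for two made-up data sets: one where Y ignores X and one where Y rises with X. The function names and numbers are illustrative assumptions, not JMP output.

```python
import math

def correlation_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy), sxy / sxx

xs = [1, 2, 3, 4, 5, 6]
unrelated = [5, 3, 4, 4, 3, 5]                  # hypothetical: Y ignores X
related = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]      # hypothetical: Y rises with X

r_flat, slope_flat = correlation_and_slope(xs, unrelated)
r_line, slope_line = correlation_and_slope(xs, related)

# A slope near zero means the fitted line sits on top of the mean line.
print(f"unrelated: r={r_flat:.3f}, slope={slope_flat:.3f}")
print(f"related:   r={r_line:.3f}, slope={slope_line:.3f}")
```

When the slope is close to zero, the fitted line predicts the mean of Y at every X, which is why the two lines overlap on the plot.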


If you get a response that does vary with the factor, you're going to get something like this:



SPC in the information age is effortless. There really is no excuse to not have data-driven decisions that yield high-impact results.

