Big Data presentation by Roger Hoerl


Big Data – A Challenge for Statistical Leadership

Dr. Roger W. Hoerl
Brate-Peschel Assistant Professor at Union College

2013 Statistical Advocate of the Year Award Luncheon
Chicago Chapter of the American Statistical Association

Thursday 9 May 2013
Noon – 2:00 PM
Maggiano’s Little Italy, 516 North Clark Street, Chicago IL

The Wall Street Journal, New York Times and other respected publications have recently run major features on Big Data, the massive data sets that are becoming commonplace, and on the new, “sexy” data mining methods developed to analyze them. These articles, as well as much of the professional data mining and Big Data literature, may give casual users the impression that if one has a powerful enough algorithm and a lot of data, good models and good results are guaranteed at the push of a button. Obviously, this is not the case.

The leadership challenge to the statistical profession is to ensure that Big Data projects are built upon a sound foundation of good modeling, and not upon the sandy foundation of hype and unstated assumptions. Further, we need to accomplish this without giving the impression that we are “against” Big Data or newer methods. I feel that the principles of statistical engineering (see Anderson-Cook and Lu 2012) can provide a path to do just this.

Three statistical engineering principles that are often overlooked or underemphasized by Big Data enthusiasts are the importance of data quality – knowing the “pedigree” of the data; the need to view statistical studies as part of the sequential process of scientific discovery – versus the “one-shot study” so common in textbooks; and the criticality of using subject-matter knowledge when developing models.

I will present examples of the severe problems that can arise in Big Data studies when these principles are not understood or are ignored. In summary, I argue that the development of Big Data analytics provides significant opportunities for the profession, but at the same time requires a more proactive role from us if we are to provide true leadership in the Big Data phenomenon.

About the Speaker: Dr. Roger W. Hoerl is the Brate-Peschel Assistant Professor of Statistics at Union College in Schenectady, NY. Previously, he led the Applied Statistics Lab at GE Global Research. While at GE, Dr. Hoerl led a team of statisticians, applied mathematicians, and computational financial analysts who worked on some of GE’s most challenging research problems, such as developing personalized medicine protocols, enhancing the reliability of aircraft engines, and managing risk for a half-trillion-dollar portfolio.

Dr. Hoerl has been named a Fellow of the American Statistical Association and the American Society for Quality, and has been elected to the International Statistical Institute and the International Academy for Quality.  He has received the Brumbaugh and Hunter Awards, as well as the Shewhart Medal, from the American Society for Quality, and the Founders Award and Deming Lectureship Award from the American Statistical Association.  In 2006 he received the Coolidge Fellowship from GE Global Research, honoring one scientist a year from among the four global GE Research and Development sites for lifetime technical achievement.  He used his six-month Coolidge sabbatical to study the global HIV/AIDS pandemic, spending a month traveling through Africa in 2007.

His introductory text Statistical Thinking: Improving Business Performance, co-authored with Ron Snee and now in its second edition, was described as “…probably the most practical basic statistics textbook that has ever been written within a business context” by the journal Technometrics. He is co-author of Leading Six Sigma: A Step-by-Step Guide Based on Experience With GE and Other Six Sigma Companies, and Six Sigma Beyond the Factory Floor: Deployment Strategies for Financial Services, Healthcare, and the Rest of the Real Economy, both published by Financial Times/Prentice Hall, and served as an editor of the fourth edition of Statistics: A Guide to the Unknown, published by Duxbury Press. His book Use What You Have: Resolving the HIV/AIDS Pandemic, based on his Coolidge Sabbatical research and co-authored with Professor Presha Neidermeyer of West Virginia University, was published in 2009.

About the Award: The SAY Award recognizes those whose careers are distinguished by their leadership in championing respect for data and the effective use of statistical reasoning and data analysis in business, public policy, healthcare, education and other sectors. The award was inspired by the life and work of the late Harry V. Roberts, a professor of statistics and quality management at the University of Chicago and the exemplar of statistical advocacy.

