Imagine the difference between a buffalo stampede and a cheeseburger. Both are tasty sources of protein. The difference lies in their requisite culinary tools. Predictive Analytics (PA) is the buffalo stampede of quantitative research: data is big, fast, and shaggy. Interactive Analytics (IA) is a cheeseburger: structured, convenient, and easy to grill. Both go well with fries and a coke.
Here’s an introductory comparison of these two forms of analytics:
Predictive Analytics typically mines data from multiple sources, then applies statistical and modeling techniques, perhaps machine learning, to calculate a score that may prove useful in solving a practical problem. Check your credit score or view your personalized Amazon book recommendations—that’s PA at work.
In 1990, I used logistic regression to calculate a vote propensity score for every voter in the California voter file. Combining data preserved in the voter file with data appended from other sources, I calculated each voter’s probability of voting. Updated versions of this propensity index remain the best predictor of voting.
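The mechanics of a propensity score are simple once the model is fit. The sketch below shows the logistic form with hypothetical predictors and coefficients (past votes cast and years registered are illustrative choices, not the actual variables in the California model):

```python
import math

def vote_propensity(past_votes, years_registered, weights=(-2.0, 1.1, 0.15)):
    """Turn predictor values into a 0-to-1 turnout probability.

    The predictors and coefficients here are hypothetical; a real model
    fits the weights against observed turnout in the voter file.
    """
    b0, b1, b2 = weights
    z = b0 + b1 * past_votes + b2 * years_registered
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) link

# A habitual voter scores far higher than a brand-new registrant.
frequent = vote_propensity(past_votes=4, years_registered=20)
new_reg = vote_propensity(past_votes=0, years_registered=1)
```

Every voter in the file gets such a score, and the campaign sorts or selects on it.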
Interactive Analytics replaces data analysis formerly performed with crosstabulation tables, providing broad functionality not possible with crosstabs alone. Moreover, IA replaces static presentation tools, such as PowerPoint slides, with interactive presentation. This lets market researchers, pollsters, consultants, and candidates ask questions of the data in real time during the presentation of results.
Explanation of variability
The main similarity between PA and IA is the quest to explain variability. Behavioral research does just one thing: it attempts to explain variability in human behavior. Why do some people turn out to vote while others do not? Why do some voters respond favorably to a particular message while others, hearing the exact same message, decide to vote for your opponent? If we can explain variability in behavior, perhaps we can influence that behavior. Both PA and IA search for and test relationships between predictor variables and the behavior to be explained. Explanation is the precursor to application.
Different data requirements
The biggest difference between PA and IA rests with their respective data requirements. Although data may originate from multiple sources, IA data must be compiled into a single, structured, rectangular data file. Rows, for example, might represent voters and columns different variables. An Excel or .csv file would be typical. By contrast, PA data is typically unstructured, coming from any number of relational databases. PA often accommodates big data as well as rapidly streaming data; hence the analogy to a buffalo stampede. This gives PA greater flexibility to address a wider array of research problems.
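A rectangular file is easy to picture in miniature. Here is a toy voter file (the column names and values are invented for illustration) read with Python's standard csv module; each row is one voter, each column one variable:

```python
import csv
import io

# A minimal rectangular voter file: one row per voter, one column per variable.
raw = """voter_id,age,party,voted_2014
1001,34,D,1
1002,61,R,1
1003,27,I,0
"""

# DictReader maps each row to {column name: value}, the structure IA expects.
rows = list(csv.DictReader(io.StringIO(raw)))
```

Every row has a value for every column; that regularity is what lets IA automate its tables and tests.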
Cost and learning curves
Flexibility and range come with greater cost. Hillary Clinton will employ more people and spend more money on PA than most down-ticket races will spend on their entire campaign. And given the more esoteric tools employed in PA, technicians will have spent years learning their craft. The learning curve for PA is long and steep.
By contrast, IA software is inexpensive and can be mastered in hours, not years. Smart features of the software automate many user tasks. Typically, IA functions are executed with just two or three mouse clicks. Selects, sorts, tables, graphic displays, and text are generated automatically. For example, IA looks at the type of data employed and automatically selects the appropriate measures of statistical significance and strength of relationship.
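The kind of automation described above boils down to a lookup on variable types. The mapping below is a simplified sketch, not Porpoise's actual rule set:

```python
def pick_test(predictor_type, outcome_type):
    """Choose a significance test and association measure from variable types.

    An illustrative mapping of the automation described above; real IA
    software would also account for sample size, cell counts, etc.
    """
    rules = {
        ("categorical", "categorical"): ("chi-square", "Cramer's V"),
        ("categorical", "numeric"): ("one-way ANOVA", "eta-squared"),
        ("numeric", "numeric"): ("Pearson correlation", "Pearson's r"),
        ("numeric", "categorical"): ("logistic regression", "pseudo R-squared"),
    }
    return rules[(predictor_type, outcome_type)]

# Cross-tabbing party (categorical) against vote choice (categorical):
test, strength = pick_test("categorical", "categorical")
```

The user never sees this decision; the software just reports the right statistic for the table it built.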
Garbage in, garbage out
Neither PA nor IA grants immunity to the principle of “garbage in, garbage out” (GIGO). In both cases the quality of the output depends on the quality of the predictor variables. Regardless of technical sophistication, both types require careful selection and measurement of variables. Both can discover hidden patterns within data, but only when data collection is well designed and executed.
Payoffs
Both PA and IA promise big payoffs. Perhaps the most commonly used PA technique is uplift modeling. Uplift modeling has been credited with the PA success of Obama’s 2012 campaign. The Obama campaign used uplift modeling to isolate four categories of swing-state voters: “sure things,” “lost causes,” “do-not-disturbs,” and “persuadables.” Campaign efficiency was improved by profiling, then focusing on persuadables.
IA also performs uplift modeling. IA, however, takes a less esoteric approach. IA examines voter migration across ballot tests within a poll. Users of Porpoise Analytics just click on the “Select Plus” tab to isolate all four groups (Porpoise calls them “stays positive,” “stays negative,” “moves negative,” and “moves positive”). Then users select the “Profile” tab and Porpoise creates demographic profiles of each of the groups.
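The migration grouping above can be sketched in a few lines. The function below classifies each respondent by movement between two ballot tests, using Porpoise's four labels; the respondent data and the "pos"/"neg" coding are invented for illustration:

```python
from collections import Counter

def migration_group(first_ballot, second_ballot):
    """Classify a voter by movement between two ballot tests.

    Ballots are coded "pos"/"neg" toward our candidate; labels follow
    the Porpoise terminology described above.
    """
    if first_ballot == "pos":
        return "stays positive" if second_ballot == "pos" else "moves negative"
    return "moves positive" if second_ballot == "pos" else "stays negative"

# Hypothetical poll respondents: (first ballot, second ballot).
respondents = [
    ("pos", "pos"), ("pos", "neg"), ("neg", "pos"),
    ("neg", "neg"), ("neg", "pos"),
]
profile = Counter(migration_group(a, b) for a, b in respondents)
```

With the groups isolated, profiling them demographically is just another crosstab on the same rectangular file.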
Val Smith, Ph.D. has been a political pollster for over 35 years, and is a Principal at Data Analysis and Display, LLC, authors of Porpoise Survey Analytics and Orca Data Editor. Reach Val at (916) 932-2374 or valsmith@PorpoiseAnalytics.com.