Being in the world of big data and data science, I see a lot of stuff like this:
Analytics at the speed of big data:
There is this idea endemic to the marketing of data science that big data analysis can happen quickly, supporting an innovative and rapidly changing company. But in my experience and in the experience of many of the analysts I know, this marketing idea bears little resemblance to reality.
Over the course of my career, I’ve built optimization models for a number of businesses, some large, like Royal Caribbean or Coke, some smaller, like MailChimp circa 2012. And the one thing I’ve learned about optimization models is that as soon as you’ve “finished” coding and deploying your model, the business changes right under your nose, rendering your model fundamentally useless. Then you have to change the optimization model to address the new process.
Once upon a time, I built a model for Dell that optimized the distribution of their chassis and monitors from China to their fulfillment centers in the U.S. Over and over again, my team worked on customizing our model to Dell’s supply chain. The moment the project was over…Dell closed down a factory and jacked the formulation. Now, we had done some things to make the model robust in such scenarios (made factories a flexible set in the ILOG OPL code for example). But nonetheless, the model was messed up, and someone needed to fix it.
And this example was for a relatively large and stable company. Dell sure moves slower than, say, a tech startup. But with each passing year, the young, turbulent company seems more the norm than the old rigid enterprise. The speed at which businesses are changing is accelerating.
And most data science models of any degree of sophistication require stability.
A good demand forecast might need several seasonal cycles of historical data.
A good optimization model requires an ironed-out process (manufacturing, logistics, customer support, etc.).
A good predictive model requires a stable set of inputs with a predictable range of values that won’t drift away from the training set. And the response variable needs to remain of organizational interest.
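To make the drift point concrete, here is a minimal sketch of the kind of sanity check an analyst might run before trusting a predictive model on new data. It only measures how much of the incoming data falls outside the range the model was trained on. The function name and the customer ages are made up for illustration, and a real drift check would likely look at more than min and max.

```python
def out_of_range_share(training_values, live_values):
    """Fraction of live values that fall outside the training set's min/max range."""
    lo, hi = min(training_values), max(training_values)
    outside = sum(1 for v in live_values if v < lo or v > hi)
    return outside / len(live_values)


# Hypothetical customer ages: the model was trained on an older customer base,
# and the business has since shifted to a younger demographic.
training_ages = [34, 41, 45, 52, 38, 60, 47]
live_ages = [22, 25, 36, 19, 44, 28]

share = out_of_range_share(training_ages, live_ages)
print(f"{share:.0%} of live customers are outside the training range")
```

If a large share of incoming values sits outside what the model has seen, that is a hint the business has moved out from under the model.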
Process stability and “speed of BLAH” are not awesome bedfellows. Supervised AI models hate pivoting. When a business is changing a lot, that means processes get monkeyed with. Maybe customer support starts working in different shifts, maybe a new product gets released or prices are changed and that shifts demand from historical levels, or maybe your customer base changes to a younger demographic than your ML models have training data for targeting.
Whatever the change may be, younger, smaller companies mean more turbulence and less opportunity for monolithic analytics projects.
And that is not primarily a tool problem.
A lot of vendors want to cast the problem as a technological one. The pitch is that if only you had the right tools, your analytics could stay ahead of the changing business, informing the change rather than lagging behind it.
This is bullshit. As Kevin Hillstrom put it recently:
In other words, it’s very hard for sophisticated analytics software and techniques running on “big data” to run out in front of your changing business and radically benefit it.
The most sophisticated analytics systems we have examples of run on stable problems. For example, ad targeting at Facebook and Google. This business model isn’t changing much, and when it does, it’s financially worth it to modify the model.
Airline scheduling. Oil exploration. High-frequency trading.
For a model operating on these problems, the rules of the game are fairly established and the potential revenue gains/losses are substantial.
But what about forecasting demand for your new bedazzled chip clip on Etsy? What about predicting who’s a fraudster lurking within your online marketplace? Is your business stable enough and the revenue potential high enough to keep someone constantly working on “analytics at the speed of big data” to use a model in this context?
Analytics at the speed of meat and potatoes
You know what can keep up with a rapidly changing business?
Solid summary analysis of data. Especially when conducted by an analyst who’s paying attention, can identify what’s happening in the business, and can communicate their analysis in that chaotic context.
Boring, I know. But if you’re a nomad living out of a yurt, you dig a hole, not a sewer system.
Simple analyses don’t require huge models that get blown away when the business changes. Just yesterday I pulled a bunch of medians out of a system here at MailChimp. What is the median time it takes for 50% of a user’s clicks to come in after they’ve sent an email campaign? I can pull that, I can communicate it. And I can even give some color commentary on why that value is important to our business. (It lends perspective to our default A/B test length for example.)
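For a sense of how small this kind of analysis is, here is a rough sketch in Python. It assumes you already have, for each campaign, a list of click delays in hours after the send. The campaign names, the numbers, and the function names are all invented for illustration. This is not MailChimp's actual system or query.

```python
import math
from statistics import median


def time_to_half_clicks(click_delays):
    """Delay by which at least 50% of one campaign's clicks had arrived."""
    ordered = sorted(click_delays)
    half_index = math.ceil(len(ordered) / 2) - 1
    return ordered[half_index]


def median_time_to_half_clicks(campaigns):
    """Median, across campaigns, of each campaign's time to half its clicks."""
    return median(
        time_to_half_clicks(delays) for delays in campaigns.values() if delays
    )


# Hypothetical click delays, in hours after each campaign was sent.
campaigns = {
    "campaign_a": [0.5, 1.0, 2.0, 8.0, 30.0],
    "campaign_b": [0.2, 0.4, 3.0, 12.0],
    "campaign_c": [1.5, 4.0, 6.0],
}

print(median_time_to_half_clicks(campaigns))
```

There is nothing here to retrain when the business changes. If the definition of a campaign or a click shifts, you adjust a few lines and rerun it.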
If you want to move at the speed of “now, light, big data, thought, stuff,” pick your big data analytics battles. If your business is currently too chaotic to support a complex model, don’t build one. Focus on providing solid, simple analysis until an opportunity arises that is revenue-important enough and stable enough to merit the type of investment a full-fledged data science modeling effort requires.
But how do I feel good about my graduate degree if all I’m doing is pulling a median?
If your goal is to positively impact the business, not to build a clustering algorithm that leverages Storm and the Twitter API, you’ll be OK.