Data scientists, we are our own worst enemies.

Let me whip out a little Torah here for a moment to explain. When the Israelites fled Egypt, they were pretty stoked about going to the promised land. But then after wandering for a while in the desert, they started to doubt that this whole desert-wandering was worth the trouble. And they began to miss their lives back in Egypt:
"remember the fish we ate in Egypt at no cost—also the cucumbers, melons, leeks, onions and garlic." -Numbers 11:5

Who WOULDN'T miss melons with garlic...wait, I must be reading that wrong.

Well, this fed-up-with-the-desert situation is something data scientists are helping to create rather than alleviate. We've taken on a gatekeeper role. We rightly say that data science is hard and doing it wrong is downright dangerous. This is true. But to be honest, there's also a bit of protectionism to this truth. Data scientists enjoy their rarity. It makes them sexier.

But by saying things like, "Even K-means is touchy!!! Don't try this at home!!!" we're simultaneously right and wrong.

The truth is that for businesses data science is not like milk -- as in, why buy the scientist when I can get the science milk for free? (That sounds gross)

No, data science is more like crack. Businesses want more not when it's held entirely out of their reach but when they've been allowed to successfully smoke a little. Then they're hooked. But for some SMBs, the upfront cost of hiring a data scientist before they see any value in analytics is too high to commit to (for others, "when you have the opportunity to buy LeBron James, you buy.").

I just got back from the Big Data Innovation Summit in Toronto, and while the typical topics came up in the actual talks (here's how we used Hadoop!), a different set of topics was prevalent in the lobby conversations. Chief concerns among attendees were:

"This all sounds great, but how does my business get started? How would we even use this stuff?"
"I'm feeling pressured to 'do big data' but a lot of the terms and techniques baffle me. And I feel embarrassed to ask questions to get 'caught up.'"
"And the reading materials are of no help...they all start with me %$&#ing configuring Hadoop on a VM."
"People are moving on from new technologies to even newer technologies, and I'm feeling left behind. Hell, I don't even understand exactly what data science is."

Recently I spoke with someone from an unnamed Fortune 500 who, I shit you not, was part of a team creating an internal knowledge base where "executives who don't understand anything about big data and data science can read plain explanations of things safely without embarrassment." Lordy! How did we get there? Embarrassment minimization as a business objective.

Businessfolk are attending conferences in an attempt to figure out if their business has a big data play -- this is a bit of a precursor to, and a bit of a substitute for, hiring an actual analytics person. They need to identify and fundamentally understand their opportunities before they commit, especially outside of the tech start-up world, where data scientists are being hired just because any self-respecting tech company should have one.

Here's what I propose. The analytics industry should concern itself with these lost souls. Sure, an MBA isn't going to get all of data science, but some basic training around the core practices and techniques would be helpful. Yes, I mean you, K-means! 

It's not enough to just tell people "about" this stuff either. "Collaborative filtering is the practice of blah blah blah....go hire someone to do it." I think even managers need an opportunity to prototype a little bit with this stuff (much like how most MBA programs teach restocking point calculations, forecasting, and basic operations research). 
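To make "prototype a little bit" concrete, here's roughly what a restocking (reorder) point calculation looks like. This is a minimal sketch under the usual textbook assumptions (steady average demand, a fixed lead time, and an optional safety-stock cushion), not something lifted from any particular MBA syllabus. The function name, parameters, and numbers are all made up for illustration, and Python is just standing in for a spreadsheet here.

```python
import math


def reorder_point(avg_daily_demand, lead_time_days,
                  demand_std_dev=0.0, z_score=0.0):
    """Textbook reorder point: expected lead-time demand plus safety stock.

    All names here are hypothetical. z_score is the service-level
    multiplier (about 1.65 for a ~95% service level, if daily demand is
    assumed to be roughly normal and independent from day to day).
    """
    # Units we expect to sell while waiting for the next shipment.
    expected_demand = avg_daily_demand * lead_time_days
    # Extra cushion for demand variability over the lead time.
    safety_stock = z_score * demand_std_dev * math.sqrt(lead_time_days)
    return expected_demand + safety_stock


# Illustrative numbers only: 20 units/day, 4-day lead time.
no_cushion = reorder_point(20, 4)
with_cushion = reorder_point(20, 4, demand_std_dev=5, z_score=2)
print(no_cushion, with_cushion)
```

A manager who has played with even this much can ask a pro sensible questions: where does the demand variability come from, and what service level are we actually paying for?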

The MBA is not likely to do the data science (like operations research) themselves. But when the opportunities to do data science come to them, they'll feel comfortable starting conversations with the pros and speccing out projects. They'll have just enough knowledge to be dangerous. That's a good thing.

That's why I started my other blog. And that's why I've got a book coming out that teaches data science in Excel. It's because we don't win by leaving people behind. We should separate data science education for these folks from functional programming and configuring software. That way they can get a foothold.

Data scientists' job security is bound up in industry perceptions of the value of data science. And while the "promise of a promised land" might hold people for a while, eventually they want the garlic and melon. Or crack...that too. (I get the mixed metaphor award today) Let's go ahead and find ways to get people hooked before they move on to the next fad.


Adam L
07/05/2013 11:37am

I think it's important to realize that part of the job is taking what you hear from the business side and turning it into some form of output (whether that be analytics, a model, software, or data viz). You're never going to get a clear one-liner: I want you to classify users who like monkeys. This is nothing new - the same held true for data-mining.

I think the marketing hype has put a lot of focus on the "what" - e.g., "you need Hadoop with Impala". I spend a lot of time with execs talking over what we've built/what it means, etc. I think you have to spend that time with them.

Getting data science established as a culture in a company is a long-term goal - not a promised land - but it does improve agility. Its importance isn't going away.

07/06/2013 4:56am

Well Ramesh, we must be looking at different markets. It's hard finding qualified candidates who can grasp relational databases (go ahead, ask the next 5 people you interview to explain relational algebra) or OOP, and only a fraction of potential candidates can even do a fraction of what they say. An even smaller fraction graduate from CS programs (freshman year you'll have a ton, but by senior year, they've all moved down to business, the social sciences or the absolute bottom, education). I make the point all the time that something as simple as a mean means nothing without understanding the distribution (aka you can drown in an average of 1 cm of water). Many people, especially with the proliferation of the oh-so-revered "Project Manager", even brag about being non-technical as though it's something to be proud of instead of embarrassed about.

About a month ago a client had a very complex optimization scenario with a 'lot' of data and only a few vectors. Pretty savvy client generally, but the second I started talking about linear programming - "No, our programmers will never understand that - even if you do all the programming, they won't be able to maintain it." Right now we have one proposal sitting on the table for a HUGE sum of money for a major client that wants a "Big Data" solution, and it was going well until I started talking about linear regression. Heck, at least people will pretend they've heard of it; try discussing ANOVA and all you'll generally see is a bunch of really uncomfortable project managers wanting to change the subject, dying to get out of the meeting so they can convene and discuss why this will never work. I don't even recall hearing a technical exception to anything since I moved into the space b/c I honestly don't think any of the naysayers could formulate a serious technical critique, and I don't think they want to even bother taking the time to look it up.

"We just want it to work" (who doesn't), "It needs to be simple, so everyone can understand or it won't work", "Our ____ will never be able to understand what an R Square is." Literally had one group tell me - "If you can verify that the R Square thing just needs to be 1 or close to it and that's good - we're ok, otherwise, forget it". I'm old enough that it was hard getting into an MBA program and it took 2 years. If you think the average product of the condensed weekend 4-month program knows the business math or has any desire to learn it - we live in different universes and I need to move to yours - it sounds awesome over there ;-)

I truly hope your assessment is right and I've just experienced the wrong data points. My fear is that what we're seeing is a big shift - where data science is top-tier stuff and the barrier to entry will keep it out of the reach of many. It'll get watered down just like MBAs did, so you can have a no-work, 4-month Data Science degree which will mean nothing except that you know buzzwords, and then you'll have all the old-school standard app dev stuff. The smarter apps will get smarter and smarter and the lower end will go the way of COBOL development. Heck, it wasn't that long ago that finding a good .NET contractor for 100.00/hr was on the low end of the cost curve; now you see plenty down at McDonald's cashier wages and there's only more downward pressure from what I see. We'll see, but when you look at what passes for "BI", I think my bleak prediction of a really divergent world in this regard is where Vegas would be betting its money.

07/25/2013 6:26am

Is your book going to be available in Kindle format? If so, when? I would love to get it on Kindle.

John Foreman
08/29/2013 5:20am

Absolutely the book will be out on Kindle. It comes out the same day as the paper copy.

10/06/2013 1:32am

I have always admired the ability to bite off more than one can chew and then chew it.

