John Foreman, Data Scientist

Don't Forget the "What" and "Why" in Big Data

5/13/2013

I'm a film nut. The irritating kind who tries to convince you that Tinker, Tailor, Soldier, Spy wasn't the most boring movie of all time.

And back in college, I got this idea in my head that I wanted to be a director. That's where all the predictable guys end up once they realize they'll never be an athlete or a guitarist.

Excited by the prospect of producing the next Sneakers (best. film. ever.), I set about gathering equipment. I built my own Steadicam out of parts from Lowe's. I secured a wheelchair to use for tracking shots. I knew how I'd do the long, fluid takes that eventually would become synonymous with "Foreman."

The problem was that none of this really mattered. Because I didn't know what I wanted to shoot or why I wanted to shoot it. Creating even the bones of a story, much less a compelling narrative, didn't really interest me. No, all I cared about was the How. How I would shoot the film. I obsessed over the tools. 

And ultimately, I never shot one scene.

This is exactly where we find ourselves in the world of big data today. The many vendors who drive the conversation at conferences and on tech blogs are concerned primarily with the How. That's their business: building technology and providing services to make your big data fantasy a reality. It's your job, not theirs, to articulate whatever that fantasy is.

Whenever I meet other data science practitioners, I listen carefully to how they introduce their work.

If they say something like, "We're using some cool technology to do X," and then they proceed to tell me about this X they're doing, then I know this person's project stands a fighting chance. They know what they're building.

But when I hear, "We're doing some cool stuff using X technology," and then they proceed to tell me about their stack, I get a little nervous. Can they even define "cool stuff" or are they just tinkering?

Now, I get that you need to wisely choose the technologies you use to solve a problem. But the exciting part should be the business that's being done. So many folks are being pressured into doing a project, ANY PROJECT WILL DO, that uses Hadoop. Because their boss's boss wants a report on how the company is "doing big data." This is a regrettable situation. Not every business needs to do Big Data, which is why I really appreciated Evan Miller's grounded post on predictive analytics last week.

If I can't clearly articulate to my peers what my analytics project is and why I'm doing it, then forget everything else. Hell, that's why I'm writing an analytics book completely in spreadsheets -- because I'm tired of the tool discussion. When you use the most vanilla tools, the business problems come back into view.

Since my failed movie venture, I've swung completely in the other direction. I obsess over which business problems I should be solving with analytics and why they need solving. What does it get my company (MailChimp), and how does it help our customers?

Can you articulate the business problem that you're throwing software, talent, and hardware at? Or are you just buying tools that are looking for a use?

    Author

    Hey, I'm John, the data scientist at MailChimp.com.

    This blog is where I put thoughts about doing data science as a profession and the state of the "analytics industry" in general.

    Want to get even dirtier with data? Check out my blog "Analytics Made Skeezy", where math meets meth as fictional drug dealers get schooled in data science.

    Reach out to me on Twitter at @John4man
