John Foreman, Data Scientist

The Perilous World of Machine Learning for Fun and Profit: Pipeline Jungles and Hidden Feedback Loops

1/5/2015

1 Comment

 
I haven't written a blog post in ages. And while I don't want to give anything away, the main reason I haven't been writing is that I've been too busy doing my day job at MailChimp. The data science team has been working closely with others at the company to do some fun things in the coming year.

That said, I got inspired to write a quick post by this excellent short paper out of Google, "Machine Learning: The High Interest Credit Card of Technical Debt."

Anyone who plans on building production mathematical modeling systems for a living needs to keep a copy of that paper close.

And while I don't want to recap the whole paper here, I want to highlight some pieces of it that hit close to home.



Why is Big Data Creepy? And What Can We Do?

7/24/2014

1 Comment

 
I'm going to start off this post with a film clip. It's 4 minutes long, but I hope you'll watch it. The scene is from my favorite film ever, Sneakers, and in this clip, Robert Redford and his crack team of penetration testers root through a guy's trash.

Whose trash are they rooting through? A character named Werner Brandes.
Why are they rooting through his trash? They want to learn about Werner, his personality, his routine, his weaknesses. Because Robert Redford wants to find a way to get close to Werner and exploit him to break into his workplace.

Let's watch.


Data science is crack, not milk. Act like it.

7/5/2013

4 Comments

 
Data scientists, we are our own worst enemies.

Let me whip out a little Torah here for a moment to explain. When the Israelites fled Egypt, they were pretty stoked about going to the promised land. But then after wandering for a while in the desert, they started to doubt that this whole desert-wandering was worth the trouble. And they began to miss their lives back in Egypt:
"remember the fish we ate in Egypt at no cost—also the cucumbers, melons, leeks, onions and garlic." -Numbers 11:5

Who WOULDN'T miss melons with garlic...wait, I must be reading that wrong.


Don't Forget the "What" and "Why" in Big Data

5/13/2013

0 Comments

 
I'm a film nut. The irritating kind who tries to convince you that Tinker Tailor Soldier Spy wasn't the most boring movie of all time.

And back in college, I got this idea in my head that I wanted to be a director. That's where all the predictable guys end up once they realize they'll never be an athlete or a guitarist.

Excited by the prospect of producing the next Sneakers (best. film. ever.), I set about gathering equipment. I built my own Steadicam out of parts from Lowe's. I secured a wheelchair to use for tracking shots. I knew how I'd do the long, fluid takes that eventually would become synonymous with "Foreman."

The problem was that none of this really mattered. Because I didn't know what I wanted to shoot or why I wanted to shoot it. Creating even the bones of a story, much less a compelling narrative, didn't really interest me. No, all I cared about was the How. How I would shoot the film. I obsessed over the tools. 

And ultimately, I never shot one scene.

This is exactly where we find ourselves in the world of big data today. The ever-growing crowd of vendors driving the conversation at conferences and on tech blogs is concerned primarily with the How. That's their business: building technology and providing services to make your big data fantasy a reality. It's your job, not theirs, to articulate whatever that fantasy is.

Whenever I meet other data science practitioners, I listen carefully to how they introduce their work.

If they say something like, "We're using some cool technology to do X," and then they proceed to tell me about this X they're doing, then I know this person's project stands a fighting chance. They know what they're building.

But when I hear, "We're doing some cool stuff using X technology," and then they proceed to tell me about their stack, I get a little nervous. Can they even define "cool stuff" or are they just tinkering?

Now, I get that you need to choose wisely the technologies you use to solve a problem. But the exciting part should be the business that's being done. So many folks are being pressured into doing a project, ANY PROJECT WILL DO, that uses Hadoop. Because their boss's boss wants a report on how the company is "doing big data." This is a regrettable situation. Not every business needs to do Big Data, which is why I really appreciated Evan Miller's grounded post on predictive analytics last week.

If I can't clearly articulate to my peers what my analytics project is and why I'm doing it, then forget everything else. Hell, that's why I'm writing an analytics book completely in spreadsheets -- because I'm tired of the tool discussion. When you use the most vanilla tools, the business problems come back into view.

Since my failed movie venture, I've swung completely in the other direction. I obsess over which business problems I should be solving with analytics and why they need solving. What does it get my company (MailChimp), and how does it help our customers?

Can you articulate the business problem that you're throwing software, talent, and hardware at? Or are you just buying tools that are looking for a use?


    Author

    Hey, I'm John, the data scientist at MailChimp.com.

    This blog is where I put thoughts about doing data science as a profession and the state of the "analytics industry" in general.

    Want to get even dirtier with data? Check out my blog "Analytics Made Skeezy", where math meets meth as fictional drug dealers get schooled in data science.

    Reach out to me on Twitter at @John4man

