John Foreman, Data Scientist

Data Science and the Technological Realization of Postmodern Thinking

10/17/2013

I'm taking a break from writing things that try to sell my book. Except for that intro sentence about my awesome data science book. And that one too.

No, I'm starting to bore myself. So I thought instead that today I'd write about postmodernism, data science, and how the two intersect. I really love the concepts that come out of postmodernism. They changed how I view everything from how I read film to how I practice my religion.

But what does postmodernism mean for my work as a data scientist? Let's step back a moment.

Many decades ago, postmodern theory changed the way we understood what it meant to create or engage with art, literature, and culture. As a reaction to modernism, postmodernism rejected the notion that there was one pure, perfect way to create or understand something. People had grown tired of the dictatorial pursuit of soulless perfection that characterized many modern thinkers. The great Jacques Tati relentlessly mocked such ideas in films such as Mon Oncle.
Postmodern thinkers realized that when you or I write a book, build a building, or roast a chicken, we bring to that activity our entire context, biases, beliefs, and blind spots. My perfect building might look like a cube and yours like a popcorn popper, because our backgrounds have given us different views of perfection.
Out of postmodernism came some concepts that have been true for a long time, but only recently, through the data science and big data movement, have we seen those concepts take hold in the way people approach business problems in marketing and operations. Seeing these connections has allowed me to integrate the way I live my professional life with the way I view the world in general. So let's take a tour of some of these concepts.

Writable Texts

Michael Lewis recently saw folks going into finance after reading his book on the financial crisis. He meant it as a cautionary tale that accurately portrayed Wall Street's awfulness, but some readers saw only a glimpse of the easy money Wall Street offered the incompetent, and they wanted in. As Lewis put it, "You never know what book you wrote until you know what book people read."

Postmodernism presents us with the idea of the writable text -- a text doesn't have one meaning; rather, the reader is inherently an interpreter and can find meaning in the text other than what the author intended. In business, this heightened view of context and individuality has long been ignored in favor of monolithic solutions for dealing with customers. It was ignored out of necessity, because a business could never understand individual context. But today, through data science, even a small business like MailChimp (where I work) can.

Let me give an example from my own life.

In the past, Bayesian models presented a modern definition of spam -- there is some platonic notion of spam and through analysis of words and phrases in an email, we can make a determination once and for all whether something is spam.
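To make that "modern" view concrete, here is a minimal Python sketch of a token-based classifier. It assumes multinomial naive Bayes with add-one smoothing and a tiny made-up training corpus; the corpus, tokenizer, and function names are hypothetical and are not MailChimp's filter. Once the model is trained, the verdict depends only on the words in the message: nothing about the recipients enters the calculation.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; real filters use far richer tokenizers.
    return text.lower().split()

def train(docs: list[tuple[str, str]]):
    """Count words per class from (text, label) pairs, labels 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_log_odds(text: str, model) -> float:
    """log P(spam | words) - log P(ham | words) under naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for w in tokenize(text):
            # Smoothed per-word likelihood; unseen words still get a small nonzero probability.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores["spam"] - scores["ham"]

# Hypothetical toy corpus, for illustration only.
corpus = [
    ("cheap viagra buy now", "spam"),
    ("claim your prince inheritance now", "spam"),
    ("meeting notes for tuesday", "ham"),
    ("lunch on tuesday with the team", "ham"),
]
model = train(corpus)
print(spam_log_odds("buy cheap viagra", model) > 0)      # True (leans spam)
print(spam_log_odds("team meeting tuesday", model) > 0)  # False (leans ham)
```

Note that `spam_log_odds` takes only the message text: the same words get the same score no matter who is on the receiving end.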

But that's not true these days. A lot of spam (as the law defines it) is not about Viagra or Nigerian Princes -- it's just regular ol' marketing material for small businesses that lacks permission. In other words, the text can be interpreted many ways from a postmodern perspective, but what makes spam super-spammy is the interpretation of that mail by the recipients. Not the content itself -- the recipients' reading of that content.

So our models must now predict and interpret those readings. At MailChimp we do this via data -- we track all kinds of things about email addresses and then predict whether a list of emails wants content from a user or not. If the list were different, even for the same content, the prediction would change. That's the postmodern, reader-centric perspective of data science in action.
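A reader-centric model adds features of the audience. The sketch below is hypothetical throughout: the list features, the hand-picked logistic weights, and the function names are illustrative assumptions, not MailChimp's actual model or data. It scores the same content against two different lists, and only the list features move the prediction.

```python
import math
from dataclasses import dataclass

@dataclass
class ListProfile:
    # Hypothetical per-list features; a real system would track many more signals.
    double_opt_in_rate: float  # share of addresses that confirmed their subscription
    bounce_rate: float         # share of addresses that hard-bounced in the past
    complaint_rate: float      # share of recipients who previously reported mail as spam

# Illustrative, hand-picked weights (assumption: not fitted to any real data).
WEIGHTS = {"bias": -1.0, "content": 0.5, "double_opt_in_rate": -4.0,
           "bounce_rate": 6.0, "complaint_rate": 40.0}

def p_unwanted(content_score: float, lst: ListProfile) -> float:
    """Logistic estimate that this list does not want this content."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["content"] * content_score
         + WEIGHTS["double_opt_in_rate"] * lst.double_opt_in_rate
         + WEIGHTS["bounce_rate"] * lst.bounce_rate
         + WEIGHTS["complaint_rate"] * lst.complaint_rate)
    return 1.0 / (1.0 + math.exp(-z))

# Same benign real-estate content, two different readerships.
content_score = 0.0  # assume a token model finds the words neutral
opted_in_agents = ListProfile(double_opt_in_rate=0.95, bounce_rate=0.01, complaint_rate=0.0005)
purchased_list = ListProfile(double_opt_in_rate=0.0, bounce_rate=0.20, complaint_rate=0.02)

print(round(p_unwanted(content_score, opted_in_agents), 3))
print(round(p_unwanted(content_score, purchased_list), 3))
```

With these illustrative weights, the double-opted-in list scores well below 0.5 and the purchased list well above it, even though `content_score` is identical. In practice the weights would be learned from labeled outcomes rather than set by hand.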

Intertextuality

Postmodern literary theory introduced the concept of intertextuality -- that the meaning of a text is shaped by other texts through allusions, quotations, parody, homage, etc. This is to say that no one text stands in isolation with meaning unto itself, but rather everything lends meaning to everything else it touches in an intertextual web.

This is the bread and butter of data science. For a business, its users, readers, and customers on the internet are all texts to be read and understood. All of my interests, likes, posts, and clicks are references that lend context and meaning to me. And these interactions connect me to millions of other humans on the internet. We then all become texts, each lending meaning to each other directly and indirectly as we create and interact with content online.

If I respond to your tweet, then I'm naturally referencing you, so you become, in a way, an intertextual reference for anyone trying to understand the body of online content that makes up my digital presence.

And if we both buy the same juicer on Amazon, then by both referencing that juicer, we are lending context to each other. If you then buy an electric chain saw, your next purchase says something vague about me given that at one time our juicing interests aligned. Perhaps I too would like to buy an electric chain saw.

This type of intertextuality gives rise to the data science practice of collaborative filtering: making suggestions to a user based on the behaviors of those whom the user references.
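The juicer example can be sketched as user-based collaborative filtering. This minimal Python version assumes purchase histories stored as sets of made-up item names and uses Jaccard overlap as the similarity between users; real recommenders such as Netflix's or Amazon's are far more elaborate, and nothing here reflects their methods.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two purchase histories, from 0 (nothing shared) to 1 (identical).
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user: str, purchases: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Score items the user lacks by the summed similarity of the users who bought them."""
    mine = purchases[user]
    scores: dict[str, float] = {}
    for other, theirs in purchases.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue  # no shared references, so no influence
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]

# Hypothetical purchase histories.
purchases = {
    "me":  {"juicer", "blender"},
    "you": {"juicer", "electric chain saw"},
    "sam": {"tent", "camp stove"},
}
print(recommend("me", purchases))  # ['electric chain saw']
```

Jaccard similarity is a simple choice that ignores how popular an item is; cosine similarity or item-to-item co-occurrence are common alternatives.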

Collaborative filtering moves us away from monolithic approaches to marketing where we believe all users want the same thing (perhaps causing us to design to the middle) and closer to a fluid, neotribalistic marketing approach where individuals are targeted with specific content based on how we read their intertextual presence.

Reactions -- Playfulness and Paranoia

Folks are going to handle this new way of viewing the world differently, and just as in postmodern literature, their responses seem to vacillate between playfulness and paranoia. In postmodern art, intertextual references are used for fun. A great example is Gilmore Girls, which exemplified referential humor. Similarly, in the world of data science, examining the interconnectedness of people and making decisions based on model outputs is a blast, and you see individuals and companies toying with these ideas. Friend suggestion, social navigation, and sentiment analysis are all examples of play: people are toying with this intertextual data and creating new and fun products out of it. Take Cleverbot, a chatbot that recycles past conversations with humans and throws them back at you as if the words were its own.
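Friend suggestion is one of the easier of these playthings to sketch. A common heuristic, assumed here rather than taken from any particular product, is to rank the people you aren't yet connected to by how many friends you share. The graph below is made up.

```python
from collections import Counter

def suggest_friends(user: str, graph: dict[str, set[str]], top_n: int = 3) -> list[tuple[str, int]]:
    """Rank non-friends of `user` by their number of mutual friends."""
    friends = graph[user]
    counts: Counter = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            if candidate != user and candidate not in friends:
                counts[candidate] += 1  # one more mutual friend
    # Most mutual friends first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical, undirected friendship graph stored as adjacency sets.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
print(suggest_friends("ann", graph))  # [('dan', 2), ('eve', 1)]
```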

On the flip side, postmodern thought gives rise to the concept of paranoia. We cannot understand things holistically. This world is complex, chaotic, and any situation, person, text, etc. can be read in multiple ways. So then how do we respond when we discover secret ordering principles behind the world we once thought was un-understandable? The NSA is reading our mail and doing the dreaded 3 hop query. LinkedIn knows I just had a bowel movement. Amazon knows I want that blender. We respond (and rightfully so) just like Heller did in Catch-22 -- with paranoia.

I'd argue that a mixture of these emotions, excitement on one hand and fear on the other, is healthy. Those who end up too paranoid may fail to reap the benefits of the world these technologies are ushering us into. I for one love the collaborative filtering products offered by Netflix, Amazon, and Twitter. Their recommendations save me time and brain power that I'd rather spend blazing through episodes of Parks and Recreation.

On the other hand, those who get too excited end up hurting people. They violate privacy and propriety in unexpected ways. Excitement can blind your critical eye to the ways these new tools can marginalize people.

Shilling my book in the conclusion

I wrote my book because I wanted to communicate this perspective on data to more people than a small class of data science practitioners. The explosion in transactional data storage, and in the intellectual capital around acting on that data, has shifted how businesses can engage with their customer base. Even a small company can now move from a modern way of thinking ("Everyone is the same. There is one strategy for dealing with customers") to a postmodern way of thinking ("Everyone is different. Context matters. I can use data to better communicate individually with folks"). I wanted to help enable that shift. That's why I wrote one chapter on using supervised AI to target certain customers and another on using unsupervised AI to detect communities of people in social graphs.

So buy my book. I packed it with intertextual humor -- hell, I've even got a 90s dance music playlist in there. And a chapter on Anduril, Flame of the West. It doesn't get more postmodern than that, does it?
5 Comments
Zack
10/17/2013 23:26:53

Interesting. With R, Python, and Hadoop being the main tools of a data scientist nowadays, why would someone learn the data analysis techniques involved in spreadsheets, instead of R, for example?

John Foreman
10/18/2013 07:15:02

Great question. If you check out the books that teach data science in R and Python, you'll find that they really don't teach the algorithms -- they just load up data science packages and *call* the algorithms. For example, I love Torgo's data science book, but when he teaches random forests, all he does is call the randomForest function from the randomForest package.

So what if you want to learn what's really going on in those functions? (Not all people care, but some, like me, have an unease calling AI functions they don't truly understand).

Well, that's where my book comes in. It teaches these approaches step by step in a spreadsheet where you can see the state of the data changing along the way. You walk through the algorithms in excruciating detail. At the same time, the book is conversational and very down to earth, unlike, say, Hastie's book.

The last chapter of the book reproduces all the book's examples in R code, so it doesn't leave the reader stranded in Excel. But by the time you get to R in my book, you know exactly what's going on. You're not taking anything for granted. Hope that answers your question.

Reuben Thomas
10/19/2013 09:13:42

I don't understand the characterisation of Bayesian spam filtering as giving some platonic notion of spam: the whole point is surely that the user always has to provide a corpus to train it on, of both spam and ham. The entire Bayesian approach, the right-hand-side of that vertical bar in *conditional* probability, is surely an exemplar of what you're talking about here, not a counter-example!

John Foreman
10/21/2013 02:21:03

That makes sense. My point is more that I see content where the words can be very benign "Houses for sale in Atlanta: url1, url2, etc.," and no NB filter is going to work at an acceptable level of FPR, but some of that content is indeed spam. And the reason it's spam is because of the readership and how they attribute meaning to the content. If the list is full of double-opted-in real estate agents, then the content is read as ham, but if the list is full of purchased public university faculty addresses in the Atlanta area, then that content is read as spam.

So any approach that considers the context of the readership is going to perform better than a model that just gives a single conditional probability based on tokens as if the context didn't matter.

Jack
2/2/2014 10:47:51

Good blog post about science i also shear about "Fluid"!!In physical science, a liquid is a substance that persistently twists (streams) under a connected shear stress. Liquids are a subset of the stages of matter and incorporate fluids, gases, plasmas and, to some degree, plastic solids.



    Author

    Hey, I'm John, the data scientist at MailChimp.com.

    This blog is where I put thoughts about doing data science as a profession and the state of the "analytics industry" in general.

    Want to get even dirtier with data? Check out my blog "Analytics Made Skeezy", where math meets meth as fictional drug dealers get schooled in data science.

    Reach out to me on Twitter at @John4man

    Click here to buy the most amazing spreadsheet book you've ever read (probably because you've never read one).
