John Foreman, Data Scientist

Facebook's solution to big data's "content problem": dumber users

6/29/2014

In the early days of cinema, Soviet filmmakers were fascinated with film editing, i.e. placing shots in an arranged order. One of these filmmakers (and possibly the first film theorist), Lev Kuleshov, proposed that the emotional power of cinema lay not in the images themselves but in the way they were edited together.

For Kuleshov, the sequential juxtaposition of content lends meaning to images that may have nothing to do with each other. 

And he conducted an experiment, the so-called Kuleshov effect, to highlight this principle. Kuleshov took a clip of Ivan Ilyich Mozzhukhin (the Ryan Gosling of Tsarist Russia) staring at the camera and intercut it with some other images: a bowl of soup, a girl in a coffin, an attractive woman. When he showed this sequence to an audience, the viewers noted how the emotional state of the actor changed from cut to cut. He's hungry. He's sad. He's got the hots for that lady.

The audience praised Mozzhukhin's emotive performance. But the actor's stare that Kuleshov used was the same in each cut. Here's an example of the effect:

The Kuleshov effect tells us that a viewer's understanding of what they're seeing can be affected not just by the content on the screen but also by how it's edited together.

Editing manipulates and creates meaning by connecting (potentially unrelated) content together. It provides emphasis and perspective.

EdgeRank is a frenemy

Think of the content going through your Facebook stream as shots in a film. These shots are just pure life, right? Just an unedited stream of your friends' cats, kids, and poorly lit photos of quinoa entrees.

We know this isn't true. After all, Facebook uses EdgeRank to select what is shown to each user out of all the content available. The algorithm's whole purpose is to maximize an outcome by editing content together. And until recently, we've assumed that the objective of EdgeRank was more or less to maximize the engagement and relevance of posts in each user's stream. Ain't nothing wrong with increasing relevance! Sure, EdgeRank is an editor, but it's on your side. 
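To make that editing concrete, here's a toy sketch of what EdgeRank-style scoring might look like, built only from the commonly cited factors (affinity, edge weight, and time decay). The real algorithm is proprietary; every name and number below is invented for illustration.

```python
# Toy EdgeRank-style feed editing. Not Facebook's code; the real algorithm is
# proprietary. This just shows how "editing" a feed amounts to sorting content
# by a score the platform chooses, using the commonly cited factors.
from dataclasses import dataclass
import math
import time

@dataclass
class Post:
    author: str
    kind: str          # "photo", "status", "link", ...
    created_at: float  # unix timestamp

# Hypothetical inputs: how close the viewer is to each friend, and how much
# the platform values each content type. All values are made up.
AFFINITY = {"ryan_gosling_fan_club": 1.0, "coworker_dave": 0.3}
WEIGHT = {"photo": 1.5, "status": 1.0, "link": 0.8}

def edge_score(post: Post, now: float) -> float:
    hours_old = max((now - post.created_at) / 3600, 1.0)
    decay = 1.0 / math.log(hours_old + math.e)   # older content fades
    return AFFINITY.get(post.author, 0.1) * WEIGHT.get(post.kind, 1.0) * decay

def edit_feed(posts: list[Post]) -> list[Post]:
    """Return the posts sorted by score: the 'edit' the viewer never sees."""
    now = time.time()
    return sorted(posts, key=lambda p: edge_score(p, now), reverse=True)
```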


You're Scorsese, and EdgeRank is a Thelma Schoonmaker. Or maybe not.

Facebook's PNAS dysfunction

In the current issue of the Proceedings of the National Academy of Sciences (PNAS), data scientists from Facebook show how they can use editing within feeds to affect the emotions of Facebook's users. They demonstrated that when they prioritized happy content in a user's feed, that user was more likely to post happy content back to the network. The same went for disgruntled content.

These posts aren't necessarily related to each other. They're just content generated by a user's friends, liked pages, etc. But just like Kuleshov did for film, Facebook can do for their network -- they can stitch images and text together into a stream that meets their needs, one that conveys a concept perhaps not present in the sum total of all the social content lying on the cutting room floor.
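To see how little machinery that kind of stitching requires, here's a crude sketch. The PNAS study classified posts as positive or negative using LIWC word counts; this stand-in uses a tiny made-up keyword list and simply re-sorts the same content so a target mood floats to the top. It's an illustration of the idea, not the study's code.

```python
# Crude illustration of emotion-skewed feed editing. The real study used LIWC
# word counts to classify posts; this toy version uses a made-up keyword list.
POSITIVE = {"happy", "love", "great", "wonderful"}
NEGATIVE = {"sad", "angry", "awful", "terrible"}

def mood_score(text: str, target: str) -> int:
    words = set(text.lower().split())
    lexicon = POSITIVE if target == "positive" else NEGATIVE
    return len(words & lexicon)

def skew_feed(posts: list[str], target: str = "positive") -> list[str]:
    """Re-edit the same content so the target mood floats to the top."""
    return sorted(posts, key=lambda p: mood_score(p, target), reverse=True)

feed = ["Had an awful commute.", "So happy about the wedding!", "New quinoa recipe."]
print(skew_feed(feed, "positive"))   # the happy post now leads the stream
```

Nothing new gets created; the same posts are just cut together in a different order.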

Now, people have a problem with this. The whole experiment feels like a violation. Facebook emotionally manipulated people. And to add insult to injury, Facebook used user-generated, supposedly perspective-free content (at least free of Facebook's perspective).

The counterarguments I keep hearing are these:
  1. Facebook's TOS covers such experiments
  2. Facebook was already using and continues to use algorithms to edit your stream. This is nothing new
  3. Facebook didn't create content to manipulate people. They used existing content
Points (1) and (2) demonstrate extreme naiveté on the part of Facebook. 

What is allowable in data science is only partially governed by TOSs and precedent. There's an inherent creepiness to data science, so it's important that a company always ask itself, "What do our users expect from us?" Not "what is legal?" or "can we point to what we've already been doing that's similar?"

Actually, Facebook may not be naive. They may just not care. After all, the customers they're trying to impress and engage with data science are their advertisers, not their users.

Counterargument (3) is where the Kuleshov effect comes into play. Editing is powerful. If you're stewing up a pot of social slop, then you have power over the final product. A stream is nothing more than a montage of the social content that constitutes its ingredients. And in the creation of that stream, Facebook wields immense power even though they create none of the stream's content.

Regardless of where you fall in the debate over whether this was an appropriate experiment, its results lead to a more haunting realization. Before we get to it, let's talk about the "content problem" present in data-driven targeted marketing.

Big data has a content problem

A lot of digital marketing tools are coming out these days that promise hyper-specific tracking and data collection of leads, customers, users, etc. for the purposes of surgically targeted marketing.

There's only one problem: the only reason to target someone at a personal level is if you've got personalized marketing content to show that person. 

Understanding a person intimately and being able to target them is nothing without something to say.

And most companies don't have anything to say. Getting a marketer to finish one monolithic piece of creative is hard enough. Imagine needing personalized content for everybody! 

So shortcuts are taken ("just write 'customers like you also bought this' and then use data science to pull some product suggestions") to produce "relevantized" generic content.
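In code, that shortcut is about this deep: one template for everybody, with data science filling in only the product slots. The recommender below is a made-up stand-in that just counts what similar buyers purchased.

```python
# "Relevantized" generic content: one template for everyone, personalization
# limited to the product slots. The recommender is a hypothetical stand-in.
from collections import Counter

TEMPLATE = "Hi {name}, customers like you also bought: {products}."

def recommend(similar_baskets: list[list[str]], already_owns: set[str], k: int = 3) -> list[str]:
    counts = Counter(item for basket in similar_baskets
                     for item in basket if item not in already_owns)
    return [item for item, _ in counts.most_common(k)]

def relevantize(name: str, similar_baskets: list[list[str]], owns: set[str]) -> str:
    picks = recommend(similar_baskets, owns)
    return TEMPLATE.format(name=name, products=", ".join(picks))

print(relevantize("Pat", [["socks", "mug"], ["mug", "poster"]], {"socks"}))
# -> Hi Pat, customers like you also bought: mug, poster.
```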

No matter how sophisticated data-driven targeting products get, there will always be a content gap.

But Facebook may have found a shortcut. And this is where things get depressing.

Data science: a sheep dog, corralling people toward content

If I have a bunch of unique people, and I need to target them, I need a bunch of unique content to make that effective. Is there another way?

Rather than tailor marketing content to a user's unique emotional make-up, Facebook has shown that they can use tangentially related (and free!) user-generated content to push a user toward marketing content generated for a more general emotional state: insecure, hungry, lonely, etc. They can edit together photos and posts in a stream to skew a user's view of reality and shift them into one of these compromised emotional states.

In other words, if they can't use data to generate enough personalized content to target people, maybe they can use data to generate vanilla people within a smaller set of emotional states. Once you have a set of vanilla people, then your American Apparel ads will work on them without customization.
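If you want to picture that end state, it's something like this sketch: edit people into a handful of emotional states, then serve one generic ad per state. The states and ad copy here are invented purely for illustration.

```python
# Speculative sketch of the "vanilla people" scenario: no per-person creative,
# just one generic ad per induced emotional state. Everything here is made up.
AD_POOL = {
    "insecure": "You deserve a new look. Shop the sale.",
    "hungry":   "Dinner in 20 minutes, delivered.",
    "lonely":   "Meet people near you tonight.",
}

def target_with_generic_content(induced_state: str) -> str:
    # The hard per-person content problem disappears; only the state matters.
    return AD_POOL.get(induced_state, "Check out what's trending.")

print(target_with_generic_content("insecure"))
```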

As Greg McNeal put it:

"What harm might flow from manipulating user timelines to create emotions?  Well, consider the controversial study published last year (not by Facebook researchers) that said companies should tailor their marketing to women based on how they felt about their appearance.  That marketing study began by examining the days and times when women felt the worst about themselves, finding that women felt most vulnerable on Mondays and felt the best about themselves on Thursdays.

The marketing study suggested companies should “[c]oncentrate  media during prime vulnerability moments, aligning with content involving tips and tricks, instant beauty rescues, dressing for the success, getting organized for the week and empowering stories… Concentrate media during her most beautiful moments, aligning with content involving weekend guides, weekend style, beauty tips for social activities and positive stories.”  The Facebook study, combined with last year’s marketing study suggests that marketers may not need to wait until Mondays or Thursdays to have an emotional impact, instead  social media companies may be able to manipulate timelines and news feeds to create emotionally fueled marketing opportunities."

This is part of the dehumanizing effect of AI and big data I wrote about a while ago.  Rather than data being used to make computers more intelligent, data is being used to make humans more predictable (read: more stupid, unhappy, and willing to buy something to palliate their discontent).

Yann LeCun, who runs Facebook's AI lab, said I'm utterly wrong on this point. In his response to my last post, he contends:

"The promise of ML (and AI) is, on the contrary, to let humans be more human, to free their minds from having to reason like machines, and to let them concentrate on things that are uniquely human, like communicating with other people."

In this particular study in PNAS, we can see that the promise of data modeling at Facebook is not to "let humans be more human." It's not to "free their minds."

All of that machine reasoning isn't trying to make us more human so much as it is trying to make us more sad and predictable. And just wait until deep learning applied to image recognition can recognize and stream my selfie at Krispy Kreme next to a tagged photo of me and my love handles at the beach. Data-driven inferiority complexes for all!

The promise of data modeling at Facebook is to place us in chains made from the juxtaposition of our own content. We'll be driven into pens made of a few profitable emotional states where marketing content waits for us like a cattle gun to the skull.

That said, where else am I going to share photos of my kids with old friends? Can't do that on Twitter...I only use Twitter to express faux indignation and convenient morality concerning trending causes. Looks like I'm stuck with Zuck.
4 Comments
Scott Edwards
6/29/2014 15:20:51

Great piece - I wasn't worried too much about this until I considered these points. I wonder if it will ever make sense for FB to transition to a paid option, once they hit as close to 100% penetration as they can get. Then we could celebrate their research, knowing it is all about giving us more value/engagement rather than the advertisers. I happily pay for InstaPaper and Pandora - a few bucks a month for no ads and aligned interests would be well worth it to me.

(Just got your book in the mail - looks good!)

Joerg
6/30/2014 04:18:25

The effect, which took 700k people to even be statistically significant, was 2 percent of one standard deviation, and furthermore there could be any number of causal ways for this to happen. How does this lead to these huge sweeping conclusions about brainwashing people? I'd interpret it as a tiny effect that would be extremely difficult to use.

Alfred
7/4/2014 00:28:10

Tiny effects that are extremely difficult to use are the meat and potatoes of digital business models. Ask a marketer what a tiny effect on a population the size of Facebook's might be worth.

Frank
7/6/2014 20:30:03

I enjoyed your article, thank you. I can't believe people are still OK with this! I don't FB, and this is one of the reasons I don't!


