Berkman Talk—Living with Data: Stories that Make Data More Personal

My Berkman lunch talk is coming up soon! Join in person if you can make it. The talk will also be webcast live and archived on the website shortly after. 

Living with Data: Stories that Make Data More Personal
with Berkman Fellow Sara Watson

April 29, 2014 at 12:30pm ET
Berkman Center for Internet & Society, 23 Everett St, 2nd Floor
RSVP required for those attending in person via the form below
This event will be webcast live (on this page) at 12:30pm ET.

We are becoming data. Between our mobile phones, browser history, wearable sensors, and connected devices in our homes, there’s more data about us than ever before. So how are we learning to live with all this data?

Inspired by her ethnographic interview work with members of the quantified self community, Sara hopes to make these larger systemic shifts more relatable and concrete with personal narratives. This talk will share some examples of how we find clues, investigate, and reverse engineer what’s going on with our data, and call for more stories to help personalize our evolving relationship to data and the algorithms that govern it.

The Daily Beast interviewed me about our love-hate relationship with surveillance when our expectations about how it’s supposed to work aren’t met.

“I think especially in this case we have a lot of expectations about how surveillance and tracking of international air traffic is supposed to work. There are a lot of ways in which the things we have taken for granted failed,” says Sara M. Watson, a fellow at Harvard’s Berkman Center for Internet and Society. “There’s the disconnect from what we expect to happen and what is happening.” Read more.

I was struck by this revelation in the NYTimes coverage: “Using a system that looks for flashes around the world, the Pentagon reviewed preliminary surveillance data from the area where the plane disappeared and saw no evidence of an explosion, said an American government official who spoke on the condition of anonymity because the subject matter is classified.” Whoa, there’s a system for detecting flashes around the world? Of course the Pentagon has a system for detecting flashes around the world.

What didn’t make the cut is the idea that we have expectations about how plane and travel security and surveillance are supposed to work because they are out in the open. Interpol is supposed to stop people from getting on planes with stolen passports. Security theater exists to develop those expectations. But we didn’t have expectations about the extent of civilian surveillance in all the NSA revelations because it was all covert, and we never bought into it.

As for our obsession with disappearance stories in a connected age, I believe the fascination stems from the drama of getting lost or going missing. Missed connections and miscommunication are dramatic plot devices, and most of that drama is lost with constant connectivity. That’s what makes going completely off the grid so compelling now (and is incidentally why contemporary literature hasn’t done a great job of integrating communications technologies).

The Uncanny Valley of Targeted Marketing

Scott Howe of Acxiom spoke at Harvard last week as part of the Topics in Privacy series. I’ve been really interested to follow the steps Acxiom is taking to set an example in the advertising industry by engaging more directly with consumers through aboutthedata.com. Howe talked at length about the philosophy behind the site and its success in its first couple of months, and he addressed some of the early criticisms. I’m encouraged by what Acxiom is doing, but I walked away from the talk with more questions than answers.

Howe shared some statistics on the site after it first opened in September: 500K visitors in the first month, with only 2% of visitors opting out after logging in (I have to wonder whether that’s an intentional design choice, with the opt-out buried or hard to find). He also shared that 11% of people made corrections to their data, most often to political party, income, education, marital status, and occupation. The site has very low return rates so far; that is, once people have logged in, they aren’t coming back. He acknowledged that individuals won’t have reason to return until the value of updating and maintaining a relationship with Acxiom is clearer to consumers.

Acxiom, like other data brokers, is in the business of collecting, cleaning, analyzing, and segmenting consumer data from all kinds of sources. Aboutthedata only shows the demographic data, plus some inferred insights about it, perhaps based on behaviorally tracked data. For example, in my profile I am “inferred married,” and the site gives me the option to declare that I’m married. But Acxiom doesn’t expose the 70+ proprietary market segmentations* it has developed to describe me to marketers. I don’t get to object to being called a “Rolling Stone” or a “Midtown Minivanner,” because these segmentations or “clusters” are fixed based on demographic details like household age, marital status, income, and “urbanicity.” Acxiom doesn’t expose this life-stage segmentation to you, but I imagine mine is likely incorrect, given that Acxiom thought I owned a truck (I own no vehicle, let alone a truck).
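
The fixed, demographics-driven nature of these clusters can be sketched with a toy rule set. This is a hypothetical illustration only: the cluster names come from the post, but every rule and threshold here is invented.

```python
# Hypothetical sketch of a fixed, demographics-driven segmentation.
# Cluster names appear in the post; all rules and thresholds are invented.

def assign_cluster(age: int, married: bool, income: int, urbanicity: str) -> str:
    """Assign a household to a fixed life-stage cluster from demographic details."""
    if married and urbanicity == "suburban" and 30 <= age <= 50 and income >= 75_000:
        return "Midtown Minivanner"
    if not married and urbanicity == "urban" and age < 35:
        return "Rolling Stone"
    return "Unclassified"

# The consumer never sees the label, let alone gets to contest it:
print(assign_cluster(age=29, married=False, income=60_000, urbanicity="urban"))
# prints "Rolling Stone"
```

The point of the sketch is that the label is a deterministic function of demographic inputs; short of editing the inputs themselves, there is nothing for a consumer to object to.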


But aside from these broad generalizations about types of people based on demographic variables, it seems like there isn’t much customer segmentation happening based on behavioral data (or at least proprietary segmentation like that is being kept under wraps). Market segmentation is still almost entirely demand driven: marketers come to Acxiom looking for specific parameters to define their customer segmentations. It hasn’t yet evolved to take advantage of the promise of big data to drive segmentation from correlative discoveries in the data.

For example, Howe described Porsche looking for the set of consumers likely to purchase a luxury vehicle in the next two weeks. The marketer, in this case Porsche, comes to Acxiom with the set of parameters and models that will yield that set of customers to market to. Some of Acxiom’s customers are more sophisticated than others about which parameters they are interested in (i.e., they have data scientists on their teams). So these segmentation parameters are generated by the demands of the marketer. They are essentially hypothesis driven, matching the product to the desired consumer behavior and interest data. The marketer says, “I’m looking for these people, Acxiom; show me where they are.” Acxiom runs those parameters, gets rid of the twelve-year-olds who are ogling cars on the Porsche website, and delivers Porsche the men who are looking to buy their next midlife-crisis fix, so that Porsche can better target advertisements to those ready and willing customers.
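
That hypothesis-driven flow might look something like this: a toy sketch of demand-driven filtering, not Acxiom’s actual system, with all records and parameters invented.

```python
# Demand-driven segmentation: the marketer supplies the parameters,
# the broker filters its consumer records to match. All data invented.

consumers = [
    {"id": 1, "age": 12, "income": 0,       "luxury_auto_intent": True},
    {"id": 2, "age": 48, "income": 250_000, "luxury_auto_intent": True},
    {"id": 3, "age": 35, "income": 90_000,  "luxury_auto_intent": False},
]

# Parameters as a marketer like Porsche might specify them:
params = {"min_age": 25, "min_income": 150_000, "wants_luxury_auto": True}

segment = [
    c for c in consumers
    if c["age"] >= params["min_age"]             # drops the ogling twelve-year-olds
    and c["income"] >= params["min_income"]
    and c["luxury_auto_intent"] == params["wants_luxury_auto"]
]

print([c["id"] for c in segment])  # prints [2]: the deliverable audience
```

The segment definition lives entirely in `params`, i.e., in the marketer’s hypothesis; the broker’s data never proposes a segment of its own.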

We haven’t gotten anywhere close to letting the data tell us what kinds of segmentations might be interesting to market to, or, even more advanced, letting the data define new market opportunities. This would be the supply-defined segmentation model of the data broker, and it seems like an underdeveloped opportunity for brokers to take on a role in defining markets, with a supply-driven market segmentation derived from the correlations. But it’s also reassuring that the big data promise of correlative discovery hasn’t yet resulted in the creation of new markets. The marketers, for the most part, are still defining the segmentation.
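
A supply-driven alternative would let a clustering step propose the segments from the data itself. A minimal sketch, assuming nothing about Acxiom’s actual methods: toy two-feature data and a bare-bones k-means, with every value invented.

```python
# Supply-driven segmentation sketch: instead of a marketer's hypothesis,
# an unsupervised clustering step lets the data propose the segments.
import random

random.seed(0)

def kmeans(points, k, iters=20):
    """Minimal k-means: assign points to nearest center, then recenter."""
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        centers = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# (age, income) pairs with two latent groups hiding in them:
points = [(25, 40_000), (27, 45_000), (24, 38_000),
          (52, 200_000), (55, 220_000), (49, 180_000)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # prints [3, 3]: two discovered segments
```

Here no marketer hypothesis enters at all; the two segments fall out of the structure of the data, which is the paradigm shift described above.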

But it’s only a matter of time before defining markets with correlative methods becomes the value-adding, differentiating business of data brokers. Right now, segmentation for marketing purposes is only as useful as the market you think you are targeting. But given what we’ve seen of dragnet surveillance techniques for flagging behavioral patterns, I imagine we will begin to see the industry shift to include both demand-side segmentation (from marketers) and supply-side proprietary insights (from data brokers) gleaned from amassing and analyzing these huge datasets of demographic and behavioral details. Right now there isn’t enough finesse with correlations, in handling false positives and false negatives, to differentiate signal from noise. But I expect that will change with time.
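
The false-positive/false-negative problem can be made concrete with standard precision and recall measures (a toy sketch with invented labels):

```python
# Toy sketch: why false positives and negatives matter before correlative
# segment definitions can be trusted. All labels are invented.
actual    = [1, 1, 1, 0, 0, 0, 0, 0]   # consumers truly in-market
predicted = [1, 1, 0, 1, 1, 0, 0, 0]   # consumers a correlative model flagged

tp = sum(a and p for a, p in zip(actual, predicted))        # true positives
fp = sum((not a) and p for a, p in zip(actual, predicted))  # false positives
fn = sum(a and (not p) for a, p in zip(actual, predicted))  # false negatives

precision = tp / (tp + fp)  # share of flagged consumers really in-market
recall    = tp / (tp + fn)  # share of in-market consumers the model found
print(precision, recall)    # prints 0.5 and 0.666...
```

A model like this one, right half the time about whom it flags, is the “not enough finesse” problem in miniature: the correlations surface candidates, but the noise is still large relative to the signal.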

I’ve been thinking a lot lately about who gets to define these segmentations and categorizations after reading Ian Hacking on human kinds, especially as the big data promise moves definitions away from humans with power to the largest databases with the best algorithms. When that correlative categorization paradigm shift happens, I wonder about the looping effects (Hacking’s term) of correlative categorizations of consumers. What does it mean for an algorithm to define a market segmentation, compared to a marketer hypothesizing their targeted demographic? And more importantly, what are the looping effects of these marketing segmentations if we as consumers don’t get to explicitly engage with and respond to them? Regardless of how they are defined, these categorizations are all influence with no accountability right now.

Howe briefly touched on the problem that, once exposed and editable, consumers might falsify or obscure demographic data in aboutthedata, thus diminishing its value. But Howe countered that if you are a fifty-year-old man who feels more like 39, and you want the advertising you see to reflect that, there’s not a lot of harm in making that change to the data (assuming, of course, that advertising is the only intended use of that demographic data). Howe noted that all marketing is aspirational, so there’s no harm in giving consumers the ability to declare their aspirations. Market segmentations have been likened to opinions that marketers have about who they think their customers are. But isn’t data-based decision making supposed to remove the vicissitudes and messiness of opinions? Lying about your age shouldn’t be the only recourse when personal preferences don’t match up with statistical norms in a population. That presumes consumers know enough to understand the effects of any given data point on their desired advertising outcomes. Right now, we don’t have the ability to understand causality in the uses of data because it’s all opaque and hidden. At a large scale, marketers haven’t been interested in this disconnect because statistically significant patterns have been good enough. But if behavioral targeting is to reach its full potential, as I think Acxiom is invested in, it has to account for how individuals respond to, relate to, and rectify these demographically and behaviorally defined categorizations.

Howe also talked about the need to trust common-sense means of defining inappropriate uses of data (for insurance or healthcare underwriting purposes, for example) as opposed to regulatory measures. To me, it is still very hard to have a productive discussion as a society about what “common sense” is and should be when we don’t have the means to understand how our data is and is not being used right now, let alone how it should be used. Aboutthedata is a step toward declaring what data exists about us, but we still need better means of understanding how it is used. One step toward that for Acxiom would be to expose at least the basic market segmentations that result from our demographic details (though Acxiom avoids this now by saying they are both proprietary within Acxiom and in large part defined by its marketer customers). I maintain that we can’t really have a common-sense conversation about appropriate uses of data (i.e., norms) until we can actually start to trace the data and its uses in everyday practice.

All this leads me back to something I’ve been ruminating on for a while: the idea that right now we are in an uncanny valley of targeted advertising. Everything feels a little creepy; we can begin to infer how some of this is working, but it turns out what marketers are doing is still remarkably coarse, not nearly as granular and personally tailored as we think or expect it might be. The creepiest ads we see on Facebook are still based on very coarse demographic categorizations or are retargeted from cookies. Still, when we come across a retargeted ad (if we even know for sure that’s what it is) showing us a shirt we were browsing for on J.Crew, which we already bought in the store with the company’s loyalty credit card, we feel offended and annoyed, because we’ve already taken steps to expose our preferences and our loyalty to the brand, and yet they still don’t understand us in a way that feels right. With little guessing, I can assume that’s because there are silos between browser data, retargeted advertising buys, store records, and credit card data. But it feels like we’re already at a point where that shouldn’t be the case anymore. I think we expect the personalization to be more advanced than it actually is. The uncanny valley of targeted advertising lies in the fact that we have some idea of just how detailed these things could be, yet we have no means of understanding or confirming what’s going on behind the scenes.

Howe also talked about the potential for developing more of a direct relationship with consumers by building the infrastructure, the pipes, and the connections between all these disparate data sources, such that consumers might be able to say more about what they want, with the expectation of some value exchange. I remain doubtful that consumers are willing to spend time and energy managing those relationships except where they already have a vested interest in maintaining profile information. That’s Facebook’s and Google’s promise (and valuation). And that’s also the catch-22: because we’ve updated our aboutthedata profile, or used the rewards credit card, we think we should be getting something back for agreeing to participate in a closer relationship and exposing ourselves, and we aren’t yet.

*This pdf for Personicx seems to be outdated, because the URL reroutes to Acxiom’s new “Audience Operating System” product, but I think this document still gives a sense of the methods behind proprietary segmentation work Acxiom is offering on the supply side of the equation.

I’ve been translated into French! On the quantified self, taking data from an industrial to a personal scale, and autobiography through data:

Pour ces communautés, les données sont un miroir d’elles-mêmes. […] L’analogie est souvent faite avec une autobiographie. Lors des rencontres, ils mettent en scène leurs jeux de données, disent comment ils les ont collectées et ce qu’ils en ont appris. Ils utilisent leurs données pour raconter leur histoire et savent qu’en agissant sur leurs données, ils la modifieront.

[For these communities, data is a mirror of themselves. […] The analogy is often made with autobiography. At meetups, they present their datasets, explain how they collected them, and share what they learned from them. They use their data to tell their story, and they know that by acting on their data, they will change that story.]

Fellow Berkman fellow Judith Donath and I were on WGBH’s Greater Boston with Emily Rooney last night, talking about the early days of Facebook and how it has changed over the last ten years. My big takeaway: the audience for my profile and status updates has gone from being my friends and family to being the algorithms mining the data. More on the segment from WGBH here.

[the]facebook at 10

I remember reading about thefacebook in a paper copy of The Crimson at breakfast, and going back to my room to sign up on my Dell desktop because I thought it was a pretty good idea and became user number 1082.

I remember when Facebook memorialized your early adopter status with “member since” on your profile page. Member since: February 9, 2004.

I remember when we used our Facebook interest in movies and bands as coded signals which became the fodder for flirtation. 


I remember getting upset when the Newsfeed consolidated all our activity in one place because it meant that you didn’t have to go to friends’ individual pages to hunt for updates to their profile. 

I remember when Nick joined Facebook so he could post pictures from our trip to London together.

I remember being shocked that my brother was using Facebook to friend people in his college program before he got to college.

I remember when I purged my Facebook of tagged photos, wall posts, and inane profile interests to prepare for the job search.  

I remember when my mom joined Facebook.

I remember when I had my Facebook portrait painted by Matt Held.

I remember when Facebook asked me about my fiancé and showed me an ad for custom engagement rings.


Some of it is still there, in the archived messages and pictures. And it’s all probably still there on the servers, somewhere. But some of it was lost to me when I deleted it. Expecting that it might mean something in the future, I saved some of it: there’s a folder of embarrassing old Facebook profile images on my desktop that have since been deleted from my profile, and a Word doc where I copied my profile interests and favorite bands before I deleted them all.

Facebook was still important to me over the last year when I went back to school and communicated with the rest of my cohort at OII. And when I was away in China it was a connection to home, as long as I could get through over VPN.

And it’s still the place where I keep up to date on my friends’ and family’s lives, even though we’re far dispersed. It does the work of updating us on major life events so that we can have more meaningful conversations when we reunite with old acquaintances in person.

But now Facebook is all spammy friend requests and birthday timeline posts. And Facebook keeps asking for more from us. It prompts us to share our “life events” and to share where we’ve worked. 

Facebook used to be a place where nuanced interactions happened. Flirtations, pokes, inside jokes, hidden meanings. I no longer think of my audience on Facebook as friends, acquaintances, humans. Instead, it’s the machine. It’s the algorithms, churning on all the data that’s in my profile and my status updates.

I was interviewed for an article in Data Informed on Nest, Google and data in our homes:

“It’s a Catch-22 problem. The way the Internet of Things is going to be appealing is when it becomes seamless and completely integrated into all the other data sources that you have,” says Sara M. Watson, a fellow at the Berkman Center for Internet and Society at Harvard University, who studies how technology is blurring the lines between the physical and virtual. “The problem with all this technology is that we’re adding all this functionality but that requires a lot of trust in those companies that handle that data.” …

“When all the digital information was on the Web, it was more contained and didn’t feel like encroachment. Now that it’s the world of physical devices, it’s much more personalized and graspable,” says Watson…

“I think the larger thing that’s going to happen is that we’re going to start demanding more visibility and legibility of data trails. Until now, there hasn’t been enough of a demand for that mostly because it’s been happening in the background,” says Watson, the technology researcher. “Encroachment into the physical world is where it gets more complicated.”

Read more