This Week In Media Measurement

Papers about Media Measurement

Cold Open

Jenny If a website says someone clicked, how sure are we that a real person actually meant to?
Davis I'm less sure than the dashboard is, because a click can be a thumb, a bot, or now an AI assistant doing errands in the background.
Jenny Exactly, and I don't trust any chart that treats every click like a tiny human confession when so much online activity is automated traffic in a trench coat.
Davis So the next question isn't just how many visits happened, it's who or what acted, and whether that action changed anything a person cared about.
Jenny That's the measurement problem this week, from vanishing users to agent-aware analytics, which just means counting the actor behind the signal. Welcome to This Week In Media Measurement on paperboy.fm.

Stats Overview

Davis This week is bigger across the board: about sixteen hundred query hits, one hundred seventeen qualified papers, three hundred forty-nine unique authors, and thirty-one countries in the mix.
Jenny Qualified papers rose from ninety-five to one hundred seventeen, so that's twenty-two more papers, up 23.2 percent, and the first thing I want to know is whether that growth is signal or just a wider net.
Davis The net did widen: query hits went from one thousand three hundred seventy-five to one thousand six hundred forty-two, up two hundred sixty-seven, or 19.4 percent, while the semantic shortlist stayed fixed at two hundred, meaning the system still only kept two hundred papers for closer reading.
Jenny So the filter got more selective, and the mix underneath is very survey-heavy: thirty-three surveys, twenty-three qualitative studies, and twenty quantitative papers, with social media at twenty-three papers, consumer behavior at ten, and digital media at eight.
Davis That fits the through-line: measurement here is less about counting who saw media and more about asking what the signal represents, whether it's a survey response, a consumer action, or a platform trace.
Jenny The author pool jumped from two hundred fifty-five to three hundred forty-nine, up ninety-four authors, or 36.9 percent, and countries rose from twenty-seven to thirty-one, led by Indonesia with ten papers, the U.S. with nine, and China with eight.
Davis And the author tiers make this feel like an open week: eighty-four first-time authors, meaning their first-ever paper in the metadata, one hundred seventy-six emerging authors, and eighty-nine experienced authors, so emerging authors alone are about half the field, and with first-timers it's roughly three-quarters, rather than the same senior names repeating.
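
The week-over-week figures in this segment can be rechecked with a few lines of arithmetic. This is a minimal sketch that uses only the counts quoted above; the names pct_change, counts, and tiers are illustrative and not part of the show's pipeline.

```python
# Week-over-week changes, using only the counts quoted in the segment above.
# The names here (pct_change, counts, tiers) are illustrative assumptions.

def pct_change(previous: int, current: int) -> float:
    """Return the percent change from previous to current, rounded to 1 decimal."""
    return round((current - previous) / previous * 100, 1)

# (last week, this week)
counts = {
    "qualified papers": (95, 117),
    "query hits": (1375, 1642),
    "unique authors": (255, 349),
}

# Map each metric to (absolute change, percent change).
changes = {
    name: (cur - prev, pct_change(prev, cur))
    for name, (prev, cur) in counts.items()
}

for name, (delta, pct) in changes.items():
    print(f"{name}: +{delta} ({pct}%)")

# Author tiers quoted above; they should sum to the unique-author count.
tiers = {"first-time": 84, "emerging": 176, "experienced": 89}
total_authors = sum(tiers.values())
emerging_share = round(tiers["emerging"] / total_authors * 100, 1)
first_or_emerging_share = round(
    (tiers["first-time"] + tiers["emerging"]) / total_authors * 100, 1
)
print(f"emerging: {emerging_share}%, first-time or emerging: {first_or_emerging_share}%")
```

The tier shares are printed both ways because "early-career" can be read as emerging authors only or as first-time plus emerging.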

Paper Walkthrough

Jenny Alright, let's get into the papers with a big measurement problem hiding in plain sight. Babu George and Divya Choudhary have a twenty-twenty-six position paper in Information called The Vanishing User: Web Analytics in an Agent-Dominated Internet, and the basic claim is that the old unit of web analytics, the human user, is getting shaky.
Jenny Plain version: a click may no longer mean a person wanted something. The authors argue that crawlers, bots, AI agents, LLM-powered agents, and more autonomous agents can all create web traces, and they flag three things that make LLM agents especially messy: identity discontinuity, meaning they don't keep one stable identity; task-based instantiation, meaning they appear for one job and vanish; and agent-to-agent loops, meaning one system may be talking to another with no human in the moment.
Davis So if a click might come from a person, a bot, or an AI agent acting for someone, what can a publisher still safely infer from it?
Jenny Not as much as dashboards usually imply. This isn't an empirical test with a measured share of traffic; it's a conceptual roadmap that synthesizes work on bot detection, agent architecture, web measurement validity, automated-system governance, and digital trace data, then proposes five measurement primitives: task chain, actor class, interaction provenance, objective alignment, and signal authenticity.
Jenny In normal language, they want analytics to ask what task produced the trace, what kind of actor made it, where the interaction came from, whether it matched a human goal, and whether the signal is trustworthy. The limitation is real, though: the paper names the measurement problem more than it proves how big the problem is.
Davis That still feels like the right opening frame for this week. If AI scrambles signals, then engagement and conversion can't just be treated as little pieces of human intention anymore; they need actor labels before anyone builds a newsroom plan, an ad report, or a product decision on top of them.
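
One way to picture the five primitives is as fields attached to each logged event. The sketch below is hypothetical: the paper is described here as a conceptual roadmap and no schema is given, so the class names, field types, actor categories, trust scale, and the rule in counts_as_human_intent are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ActorClass(Enum):
    # Hypothetical categories, loosely following the actors named in the episode.
    HUMAN = "human"
    CRAWLER = "crawler"
    BOT = "bot"
    LLM_AGENT = "llm_agent"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class TraceEvent:
    """One web trace annotated with the five proposed primitives (illustrative schema)."""
    task_chain: tuple[str, ...]      # what task produced the trace
    actor_class: ActorClass          # what kind of actor made it
    interaction_provenance: str      # where the interaction came from
    objective_alignment: bool        # did it match a human goal?
    signal_authenticity: float       # assumed 0.0 to 1.0 trust score


def counts_as_human_intent(event: TraceEvent, min_authenticity: float = 0.8) -> bool:
    """Assumed rule: only trusted, goal-aligned traces from humans count as intent."""
    return (
        event.actor_class is ActorClass.HUMAN
        and event.objective_alignment
        and event.signal_authenticity >= min_authenticity
    )


# Three made-up events: a person, an assistant running an errand, and a crawler.
events = [
    TraceEvent(("search", "click"), ActorClass.HUMAN, "organic_search", True, 0.95),
    TraceEvent(("price_check",), ActorClass.LLM_AGENT, "assistant_api", True, 0.90),
    TraceEvent(("crawl",), ActorClass.CRAWLER, "direct", False, 0.99),
]

raw_clicks = len(events)
human_clicks = sum(counts_as_human_intent(e) for e in events)
print(f"raw clicks: {raw_clicks}, human-intent clicks: {human_clicks}")
```

The point of the design is that a dashboard could report both numbers side by side, so the raw count stays visible while the actor-labelled count carries the inference.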
Davis That actor-label point is a nice bridge, because this next paper asks what happens after a signal gets believed by real newsrooms. S. Dvir-Gvirsman and Lidor Ivan call it Contextual gatekeeping in a platformized news ecology, meaning editorial choices made inside platform rules, dashboards, and audience feedback loops.
Davis The plain finding is that audience engagement did predict more coverage later, but not evenly. Across two and a half years of data from thirty-nine English-language outlets in the United States, the United Kingdom, Canada, and Ireland, the strongest bump showed up on Facebook pages of digitally born outlets, weaker on Facebook pages of legacy outlets, smallest on legacy websites, with digitally born websites in the middle.
Jenny How did they know engagement came before the extra coverage, rather than just showing that a topic was already getting hotter and everyone was chasing it?
Davis They modeled whether engagement in a topical beat in the prior month predicted later story counts, split by platform and by outlet lineage, so the timing is built into the test. That gives this more weight than a vibes-based dashboard story, especially with thirty-nine outlets across four countries from twenty-seventeen to twenty-nineteen, but it's still a historical baseline from a social-media-dependent era, not automatically a map of subscriber-first newsrooms today.
Jenny So the takeaway isn't, engagement controls journalism. It's sharper than that: this is the When Metrics Push Back thread, because the metric seems to push hardest where the newsroom is already organized around the platform, and weakest where the website and legacy routines still give editors more room to say no.
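
The timing logic described above, prior-month engagement lined up against later story counts, can be sketched as a simple lagged regression. This is not the authors' model, which is split by platform and outlet lineage; it is a toy version with made-up numbers, and the function names are illustrative.

```python
# Toy lagged predictor: does engagement in month t-1 relate to story counts in month t?
# All numbers below are invented for illustration only.

def lagged_pairs(engagement, stories, lag=1):
    """Pair engagement at month t-lag with story counts at month t (lag must be >= 1)."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return list(zip(engagement[:-lag], stories[lag:]))


def ols_slope(pairs):
    """Ordinary least squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov / var


# Monthly engagement and story counts for one hypothetical topical beat.
engagement = [100, 140, 90, 200, 160, 120]
stories = [10, 11, 15, 10, 21, 17]

pairs = lagged_pairs(engagement, stories, lag=1)
slope = ols_slope(pairs)
print(f"{len(pairs)} lagged pairs, slope = {slope:.3f} extra stories per unit of prior engagement")
```

Shifting the series by one month is what keeps the predictor earlier in time than the outcome; it does not by itself rule out a topic that was already heating up.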
Jenny That last paper treated engagement as a signal that can push a newsroom around, and this one asks what happens when the platform changes the meaning of the signal itself: Stephan Carney, Ignacio Riveros, and Stephanie M. Tully have a Journal of Consumer Research paper from twenty-twenty-six called Made With AI: Consumer Engagement with Social Media Containing AI Disclosures.
Jenny The plain version is pretty sharp: when social posts carry an AI-generated-content disclosure, people engage less, not mainly because they think the content is worse or because they hate AI, but because the creator feels less personally present. The authors call that reduced parasocial connection, meaning the one-sided emotional bond a viewer feels with a creator, and they tie part of it to perceived effort.
Davis So is this really about AI, or is it about people feeling like the creator did less work?
Jenny They try to separate those with two kinds of evidence: real TikTok engagement after TikTok introduced its AI-generated content disclosure policy, and eight preregistered experiments, which means the tests were planned before the data were collected. Across those experiments, the drop didn't seem to come from quality worries, artificial-content wariness, or broad AI aversion, and disclosures that signaled more human effort softened the engagement hit, though the field evidence is centered on TikTok, so Instagram, YouTube, or niche creator communities could behave differently.
Davis That makes the policy problem more concrete: if disclosure is required, don't just slap on a label that says made with AI and walk away. This is the AI Scrambles Signals thread again, because the same like or comment now partly measures whether the audience still believes a human did meaningful creative work.
Davis That TikTok point about a like measuring belief in human effort sets up this next one nicely: P. Truyens has a twenty twenty-six Journalism paper called Imagined audiences as interpretive filters, and it's about what happens when the same audience dashboard enters two different magazine cultures.
Davis The plain finding is that metrics don't arrive as neutral facts; journalists read them through the reader they already have in their head, which is what imagined audience means. In one Flemish commercial publisher, Truyens compares two weekly brands, a lifestyle magazine and a business-finance magazine, during a rollout of more sophisticated audience metrics, and the same infrastructure produced very different editorial meanings.
Jenny So if the same metric lands differently in two newsrooms, is the metric itself less objective than we pretend, or are we just seeing two teams with different brand stories doing normal interpretation?
Davis The evidence points to the second, but with teeth. Truyens combines ethnographic material from extended work with the organization and seventeen expert interviews, and finds that the lifestyle journalists pictured a close community of readers whose lives resembled their own, so engagement numbers could feel like proof of closeness, while the business-finance journalists pictured distant, demanding expert readers and investors, so the same metrics often looked like managerial oversight.
Davis The limitation is real: this is deep qualitative evidence from one publisher and two magazine brands, so it explains a mechanism better than it tells us how common that mechanism is across journalism.
Jenny That makes the dashboard rollout sound less like installing a thermometer and more like dropping a new manager into the room. It's the When Metrics Push Back thread again: if you don't design for trust and interpretation, access to analytics can change role performance without anyone agreeing on what the numbers actually mean.
Jenny That dashboard-as-manager idea has a technical cousin here, because this paper asks whether we can measure richer social media signals without pulling everybody’s raw data into one big pot. Li Wan and Bin Zhang call it Federated high order tensor fusion for privacy preserving multimodal social media analysis, in PLoS ONE twenty twenty-six.
Jenny Plain version: their system learns from mixed media, like text, images, and audio, while keeping the raw data local. That’s federated learning, meaning each site trains where the data live and shares model updates, and their fusion step uses tensor Tucker decomposition, which is a way to compress a big multiway table so the model can capture relationships across media types without adding a pile of redundant parameters.
Davis So what exactly improves here: the privacy posture, the prediction accuracy, or both?
Jenny Both, at least on the benchmarks they test. On the TREC twenty seventeen Precision Medicine Track Scientific Abstracts dataset, which is mostly text, they report better mean average precision, meaning the relevant results rank higher; and on CMU-MOSI, the multimodal sentiment benchmark, they show the high-order fusion helps model the links across modalities. The limitation is that these are benchmark datasets, so we still don’t know how it behaves inside a real publisher stack, ad system, or messy social platform pipeline.
Davis That’s a useful kind of caution. In the AI Scrambles Signals thread, this is the hopeful side: privacy-preserving measurement shouldn’t only mean stripping identity after collection, it can mean building stronger local models so the signal never has to leave home in its rawest form.
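
Two ideas from this paper lend themselves to a small sketch: sharing model updates instead of raw data, and using a Tucker-style factorization to keep parameter counts down. The code below assumes plain sample-weighted federated averaging, which is a common scheme but is not confirmed here as the paper's aggregation rule, and the tensor sizes and ranks are made-up examples.

```python
# Assumption: sample-weighted federated averaging (FedAvg-style).
# The paper's actual aggregation rule is not described in the episode.

def federated_average(client_updates):
    """Average weight vectors from clients, weighted by each client's sample count.

    client_updates: list of (num_samples, weights), where weights is a list of floats.
    Only these weights leave each site; the raw data never does.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    return [
        sum(n * w[i] for n, w in client_updates) / total
        for i in range(dim)
    ]


def full_param_count(dims):
    """Entries in a dense tensor with the given mode sizes."""
    total = 1
    for d in dims:
        total *= d
    return total


def tucker_param_count(dims, ranks):
    """Entries in a Tucker model: one core tensor plus one factor matrix per mode."""
    core = 1
    for r in ranks:
        core *= r
    factors = sum(d * r for d, r in zip(dims, ranks))
    return core + factors


# Two hypothetical sites with different amounts of local data.
clients = [(100, [0.2, 0.4]), (300, [0.6, 0.0])]
global_weights = federated_average(clients)

# A hypothetical three-mode fusion tensor (for example text x image x audio features).
dims, ranks = (64, 64, 64), (8, 8, 8)
dense = full_param_count(dims)
compressed = tucker_param_count(dims, ranks)
print(f"global weights: {global_weights}")
print(f"dense: {dense} parameters, Tucker: {compressed} parameters")
```

With these example sizes the dense tensor has 262,144 entries while the factorized form has 2,048, which is the kind of saving the "without a pile of redundant parameters" remark points to.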
Davis Staying with that idea that the raw signal can get smarter before it travels, this next paper asks a very measurement-y question about text: what if a sentiment score is only useful when you know who is speaking and who they're judging? It's called Fine-Grained Sentiment Quantification of Media Texts Considering Sentence Type and Holder–Target Awareness, by Xiaoqing Ju and colleagues in ISPRS International Journal of Geo-Information, twenty twenty-six.
Davis The plain version is: don't give the whole article one mood score if one sentence is a factual setup, another is a quote, and a third is the outlet's own judgment. Their method scores sentiment at the sentence level, and it also extracts the opinion holder and opinion target, meaning it tries to say, for example, this minister criticized that country, not just this article feels negative.
Davis On the regression task, where the model is trying to predict a numeric sentiment score, they report an R-squared of zero point eight nine nine, a mean absolute error of zero point zero eight eight, and a mean squared error of zero point zero two seven. Their strongest RoBERTa baseline, which is a fine-tuned language model comparison, had an R-squared of zero point eight seven one, so the gain is modest but real: two point eight percentage points in R-squared and an eight point three percent reduction in mean absolute error.
Jenny Does a better sentiment score actually make the metric more interpretable for a media buyer or an editor, though? Because if the model is a little more accurate but still can't explain whether the negativity came from a journalist, a quoted opponent, or a source describing violence, that's not a decision tool yet.
Davis That's exactly why their design matters more than the score alone. They use a large language model mainly for semantic parsing, which means turning sentences into structured pieces, and they classify sentence types while extracting holders and targets; in a supplementary check, holder identification gets a weighted average F-one of zero point eight one two, while target identification gets a loose F-one of zero point six nine one. The limitation is that this looks stronger as structured extraction, but its usefulness still depends on the media texts and settings you apply it to.
Jenny That lands for me in the AI Scrambles Signals thread, because the AI isn't just producing a bigger black-box sentiment number here; it's trying to make the number auditable. If a dashboard says one country's coverage is more negative, I want to click through and see who held the opinion, what target it attached to, and whether the sentence was reporting, quoting, or judging.
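
For anyone who wants the regression metrics quoted for this paper spelled out, here is a minimal sketch of R-squared, mean absolute error, and mean squared error, plus a check of the quoted gain. The toy scores are made up, and the baseline mean absolute error computed at the end is only implied by the quoted eight point three percent reduction, not a figure reported in the episode.

```python
# Standard regression metrics, written out in plain Python.

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared gap between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of variance in the true scores explained by the predictions."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy sentence-level sentiment scores, invented for illustration.
y_true = [0.0, 0.5, 1.0, -0.5]
y_pred = [0.1, 0.4, 0.9, -0.4]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"toy MAE={mae:.3f}, MSE={mse:.3f}, R^2={r2:.3f}")

# Figures quoted in the episode.
r2_model, r2_baseline = 0.899, 0.871
r2_gain_points = round((r2_model - r2_baseline) * 100, 1)

# Implied, not reported: the baseline MAE that an 8.3% reduction to 0.088 would require.
implied_baseline_mae = round(0.088 / (1 - 0.083), 3)
print(f"R^2 gain: {r2_gain_points} points, implied baseline MAE: {implied_baseline_mae}")
```

The gain in R-squared is a difference in percentage points, while the error reduction is a relative percent, which is why the two improvements are not directly comparable.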
Jenny That click-through idea matters here too, because this next paper asks whether a media signal actually moved people, not just whether it described them. Zhao Li and Gregory Martin's paper is called Media and intraparty ideological movements: How Fox News built the Tea Party, and it's about Fox News exposure inside the Republican Party after two thousand nine.
Jenny The plain finding is big: Fox News didn't noticeably grow Tea Party rally sizes in early two thousand nine, but later in the two thousand nine to two thousand ten cycle, more exposure to Fox increased fundraising and primary vote shares for Tea Party candidates compared with other Republican candidates. So the effect shows up less as street protest, and more as money and votes inside Republican primaries.
Davis What makes channel position a credible way to separate media exposure from audience self-selection? Because if Tea Party voters already wanted Fox, then Fox isn't building the movement; it's just serving the people who were already there.
Jenny That's the design move: they use differences in where Fox News sat on local cable systems, because lower channel positions tend to get more casual viewing, and those positions varied for reasons that aren't just individual ideology. In plain terms, it's quasi-random exposure variation, meaning some similar places got an easier-to-watch Fox News feed than others; then they connect that to campaign fundraising, primary results, and a content analysis, which is just systematically coding coverage, showing pro-Tea Party slant emerging in two thousand ten.
Jenny That makes the support stronger than a simple correlation, especially because the timing lines up: no early rally boost, then later gains in donations and primary vote share when the coverage turns favorable. But the limit is real too: one outlet, one insurgent movement, and one unusually hot moment in United States politics.
Davis This is exactly the Effects Need Context thread for me. The takeaway isn't that cable news always manufactures factions; it's that if you want to claim media influence, look for a source of exposure people didn't fully choose, and then ask where the effect lands: attendance, money, votes, or power inside a party.
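
The "exposure people didn't fully choose" idea is the logic of an instrumental variable. As a rough sketch, the textbook Wald estimator below divides the outcome gap between instrument groups by the exposure gap. This is not the authors' specification, which is not described in the episode; the binary low-channel flag, the variable names, and every number are assumptions made for illustration.

```python
# Toy Wald (instrumental-variable) estimate with a binary instrument.
# z = 1 if the channel sits at a low position in a market (hypothetical flag),
# x = exposure (for example viewing), y = outcome (for example primary vote share).
# All numbers are invented.

def mean(values):
    return sum(values) / len(values)


def wald_estimate(z, x, y):
    """Outcome gap between instrument groups divided by the exposure gap."""
    x1 = mean([xi for zi, xi in zip(z, x) if zi == 1])
    x0 = mean([xi for zi, xi in zip(z, x) if zi == 0])
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    return (y1 - y0) / (x1 - x0)


z = [1, 1, 1, 0, 0, 0]
x = [3.0, 4.0, 5.0, 1.0, 2.0, 3.0]
y = [12.0, 13.0, 14.0, 10.0, 11.0, 12.0]

effect = wald_estimate(z, x, y)
print(f"estimated effect of one unit of exposure: {effect:.2f}")
```

The estimate is only as credible as the assumption that channel position affects the outcome solely through exposure, which is the part Davis's question is probing.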
Davis That last paper was all about not calling something influence until you know where the effect lands, like donations, votes, or party power, and this retail paper makes the same move with a checkout feature. Yang Yuan, Gregory Heim, and Michael Ketzenberg study Emerging technology-enabled e-retailer returns processes: Buy-Online-Return-In-Store and e-retailer performance.
Davis The plain version is that letting people buy online and return in a store sounds like easy convenience, but it didn't automatically lift the business. In annual data from the Top one thousand e-retailers in North America, from twenty thirteen to twenty nineteen, BORIS had negligible direct benefit for website sales and no meaningful impact across the main metrics for pure e-retailers.
Jenny If BORIS creates more convenience, why did the direct performance gains look so limited, especially when returns are one of the big frictions in online shopping?
Davis They tested that with regressions, robustness checks, and endogeneity corrections, meaning they tried to separate the BORIS effect from the fact that stronger retailers may choose better return systems in the first place. They looked at website sales, order conversion rates, average customer order value, and website traffic, and the useful twist is conditional: for bricks-and-clicks retailers, BORIS only weakly helped average order value and traffic when it interacted with things like free return shipping or sponsored search.
Jenny So the advertiser translation is: don't pitch a cross-channel feature as conversion magic just because it feels customer-friendly. This is moderate evidence from a big sample, but the effects depend on retailer type, return policy, and promotion tactic, which puts it right back in the Effects Need Context bucket.
Jenny That BORIS paper kept saying, don't treat convenience as magic, and this one says the same thing about conversation. S. Li and J. Dillard's Shaping Interpersonal Communication About Persuasive Media Campaigns by Inducing Critical Versus Collective Orientations asks what happens after a health campaign leaves the screen and people start talking about it.
Jenny They ran an experiment with two hundred seventy-five regular drinkers of sugar-sweetened beverages. The plain finding is sharp: people who saw the reduce-sugary-drinks message and didn't talk showed persuasion, but people who talked freely showed counter-persuasion, which means the conversation pushed them away from the campaign's goal.
Davis So how do we measure a campaign once the audience starts talking back to it? Did the authors actually shape the conversation, or are they just measuring attitudes after people chatted?
Jenny They shaped it directly. Participants were assigned to no-talk, unguided talk, or prompted talk, and the prompts pushed either a critical focus, meaning evaluate the message carefully, or a collective focus, meaning talk through what people like us could do together. Both prompts reduced the boomerang by about the same amount, and the transcript clues were concrete: more on-topic talk, different sentence length, more cognitive process words, and more negations, with the strongest effects among heavy drinkers. The limit is real, though: this is sugary drink advocacy among regular drinkers, so a climate campaign or a vaccine campaign could spark a totally different peer dynamic.
Davis That's a very Effects Need Context result. A decent sample and a clean experiment make the warning hard to ignore: don't just buy reach, then hope word of mouth helps. If the audience is going to talk, the campaign may need to design that talk, because the peer conversation can amplify the message or quietly flip it.
Davis That last paper said peer talk can flip a campaign, and this one asks a similar question from the company side: what happens when the post itself is already a mix of values, politics, and brand? In Business & Society, Mika Vehka, Robin Forsberg, Juho Vesa, and Matti Nelimarkka call it Theorizing and Measuring General Nonmarket Communication.
Davis Plainly, they're measuring company messages that aren't just selling a product. They group corporate political activity, social responsibility, and activism into one category called general nonmarket communication, or GNC, meaning public-facing firm talk about the social and political world around the business. Using English-translated Twitter/X messages from large Finnish firms, they find that GNC posts draw more audience engagement than the firms' other posts.
Jenny When a company post mixes politics, social responsibility, and brand identity, what are we actually measuring? Is the model finding a real kind of message, or just spotting the words companies use when they want applause?
Davis Their move is to make the fuzzy category explicit, then test whether it behaves like a category. They use a large language model classifier, basically an AI system trained to sort posts into buckets, to identify GNC in the Finnish firms' translated X messages; then they check whether the measure looks plausible, varies over time and by firm, and predicts real engagement. The big limit is that this is validated on large Finnish firms, in translated posts, so I'd want other countries and languages before treating it as universal.
Jenny That's a pretty constructive ending for the feature papers. AI scrambles signals all week, but here it also helps name one, as long as nobody turns “higher engagement” into “therefore this is good strategy” without checking the market, the language, and the politics around the post. Okay, let's zoom out for a second.

Speed Round

Davis Alright, lightning list.
Jenny First, polarization: a 1,201-study review says selective exposure and consensus pressure keep social media camps hardening.
Davis India's OTT surge is the crisis story: 412 respondents, and subscriptions reportedly jumped 180% during COVID.
Jenny Environmental virality review: 82 studies, mostly Twitter sentiment, so emotion looks powerful but narrowly measured.
Davis For Instagram brands, 296 responses point to simple colors, adaptable tags, and creativity doing the engagement work.
Jenny German election sampling over 40 days found emotion predicted attention right after campaign messages, not in hindsight.
Davis South Korea panel data: social news made people feel informed, but didn't lift quiz-based political knowledge.
Jenny Mentions of "recession" in financial newspapers became an indicator, forecasting US activity and recession odds six months ahead.
Davis In Slovenia's food sector, 145 social users with higher environmental awareness spotted greenwashing and trusted brands less.
Jenny Kathmandu survey, 406 young users: mobile ads worked when they felt useful, not merely present.
Davis Karnataka short-video study: 254 students reported lost time, distraction, and weaker study focus from Reels-style feeds.
Jenny Let's step back and see what signal all these measures are actually capturing.

Themes & News

Jenny The pattern this week is pretty clear to me. Measurement isn't just counting who saw something. It's asking what that count stands for, and whether anyone changed behavior after it.
Davis Right, and the thread I'd underline is Effects Need Context. A sales lift, a classroom outcome, or a newsroom decision only means something when you know the channel, the timing, and the people acting on the signal.
Jenny My caution is that a better dashboard doesn't automatically mean a truer answer. I'd want proof that the metric predicts an outcome outside the system that built it.
Davis And the practical consequence is big. If the signal is weak, brands waste budget, editors chase noise, and creators optimize for a number that may not represent real attention.
Davis And for some headlines that connect to what we've seen, here's our reporter.
Andrew (News) massmarketretailers.com reports that Coca-Cola is backing a new Universal Media Measurement framework, built with Top Line Marketing and Kantar.
Jenny That lands right on the theme. Paid, owned, earned, and shared media are usually measured in different languages, so a single currency sounds useful and dangerous.
Andrew (News) The framework uses AI-driven algorithms to turn channels like TV, retail media, packaging, influencers, and social into comparable quality ratings and impact metrics.
Davis So who actually feels this first, the brand team moving budget, or the agencies trying to defend what worked?
Andrew (News) The stated users are marketers comparing cost per impact across channels, with the system presented at the World Federation of Advertisers Media Forum in Stockholm last month.
Jenny The open question is validation. If AI harmonizes messy inputs, I still want to know what it treats as impact.
Andrew (News) Next, a TV and streaming measurement story.
Andrew (News) mediapost.com reports that OpenAP is unifying outcome measurement across its Big Nine member companies.
Davis This is the advertiser version of the same problem. Exposure is the easy count. The hard part is tying that exposure to sales, site activity, or brand lift.
Andrew (News) OpenAP says the initiative uses a standardized conversion API for TV, so campaign exposure and performance signals can be compared across major publishers.
Jenny Does this solve attribution, or does it mostly standardize the plumbing?
Andrew (News) The briefing frames it as reducing fragmented data silos, not as proving one universal causal model for every campaign.
Davis That distinction matters. Cleaner pipes help, but they don't turn correlation into proof.
Andrew (News) Final headline, from India.
Andrew (News) adgully.com reports that India's media landscape is moving beyond TRPs toward multi-dimensional attribution.
Jenny TRPs are television rating points, basically a standard estimate of TV viewing. If that single number weakens, everyone has to renegotiate what evidence counts.
Andrew (News) The story says a new Ministry of Information and Broadcasting registration window closed on April 27, while BARC India's status remained unclear in the coverage.
Davis Who gets squeezed when ratings lose authority: broadcasters, advertisers, or the measurement firms?
Andrew (News) The article points to all three, with advertisers moving toward TV-digital planning, deduplicated reach, incremental contribution, and sales lift models.
Jenny That's When Metrics Push Back in the wild. Change the metric, and you change the bargaining table.
Andrew (News) That's the news desk for this week.
Davis Thanks. So the takeaway is not that measurement is getting cleaner. It's that every cleaner-looking number needs a harder question behind it.

Sign-off

Jenny So the phrase I'm taking home is the vanishing user, because half the episode came back to the same problem: a click, an impression, or a household graph can look solid until you ask who it actually points to.
Davis Right, and the useful measurement wasn't just more signals. It was the signal that still reached a person, a choice, or an outcome someone could act on Monday.
Jenny If this was useful, maybe send it to the friend who has to explain a dashboard at 9 a.m. and secretly wants the caveats before the meeting starts.
Davis And if you want more shows that follow the papers instead of the vibes, there's a whole shelf of these at paperboy.fm.
Jenny I'm Jenny.
Davis I'm Davis, and this has been This Week In Media Measurement. See you next week.
