This Week In Wellbeing Measurement

What counts as progress, and who gets counted? Explore the tools, tradeoffs, and evidence behind wellbeing metrics, from GDP alternatives and resilience indicators to mental health, aging, climate, and care.

Cold Open

Jenny Have you ever noticed how people change what they do the moment they know they are being scored?
Davis I hate how true that is, because a score can focus a clinic or a school fast, but it also teaches everyone where the spotlight is.
Jenny Right, and my worry is the spotlight becomes the work: if a blood pressure number is tied to pay, people may recheck the reading more often without actually getting more patients under control.
Davis Which is the uncomfortable question for this whole week: is the metric helping people get better, or just helping the system look measured? Welcome to This Week In Wellbeing Measurement on paperboy.fm.

Stats Overview

Jenny This week we screened 1,407 hits and ended with 108 qualified papers, from about 550 authors across 30 countries. That qualified pile is up from 92 last episode, a 17.4 percent jump, and the center of gravity is familiar but consequential: mental health shows up 11 times, quality of life 8 times, and patient-reported outcomes 6 times.
Davis The bigger swing is upstream. Query hits rose from 741 to 1,407, up nearly 90 percent, while the semantic shortlist stayed fixed at 200, meaning the search got much broader but the closer-read gate did not widen. So what's driving the flood: more measurement papers, looser language around wellbeing, or databases tagging everything as health?
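The two jumps quoted above can be checked with a short calculation. This is a minimal sketch using only the counts read out in the episode; the function name `percent_change` is illustrative, and the share of hits reaching the fixed 200-paper shortlist is derived here rather than stated by the hosts.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Counts quoted in the episode.
qualified_change = percent_change(92, 108)   # qualified papers, last episode vs. this one
hits_change = percent_change(741, 1407)      # raw query hits, last episode vs. this one

# Derived, not quoted: the shortlist stayed at 200, so the share of
# hits that reach the closer-read gate shrinks as the hits grow.
shortlist_share_before = 200 / 741 * 100
shortlist_share_after = 200 / 1407 * 100

print(f"Qualified papers: {qualified_change:.1f}% change")
print(f"Query hits: {hits_change:.1f}% change")
print(f"Shortlist share of hits: {shortlist_share_before:.1f}% -> {shortlist_share_after:.1f}%")
```

The qualified pile grows by about 17.4 percent and the hits by about 89.9 percent, while the shortlist share of hits falls from roughly 27 percent to roughly 14 percent.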
Jenny And the methods tell me to keep my eyebrows up. We have 25 survey-tagged papers, then 17 qualitative papers and 12 quantitative papers, with 9 longitudinal studies and 7 RCTs behind that. Survey-heavy weeks can be useful, but they often tell us what people report, not what changed.
Davis The author mix also matters for whose measures get built. Of 547 authors, 97 are first-time authors, meaning first paper in the metadata, not just new to our feed; 262 are emerging researchers, and 188 are experienced. That's almost half emerging, which can mean fresh populations and tools, but also a lot of measures still proving themselves.
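The "almost half emerging" claim follows from the three author counts. A minimal sketch, using only the figures quoted above; the dictionary keys are illustrative labels, not categories taken from the underlying metadata.

```python
# Author counts quoted in the episode.
authors = {"first-time": 97, "emerging": 262, "experienced": 188}

total = sum(authors.values())
shares = {group: count / total * 100 for group, count in authors.items()}

print(f"Total authors: {total}")
for group, share in shares.items():
    print(f"{group}: {share:.1f}%")
```

The three groups sum to the 547 authors mentioned, and the emerging group comes out just under 48 percent.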
Jenny Country spread is broad but uneven: the USA leads with 18 papers, then China with 7, the UK with 6, and Canada with 4. So the through-line holds: wellbeing metrics aren't neutral scorecards this week. They're deciding what counts as mental health, whose quality of life gets measured, and which patient voices make it into care decisions.

Paper Walkthrough

Paper 1 Impact measurement and management as governance infrastructure in development finance: Catalytic or extractive?

Davis Alright, let's get into the papers with Kate Bennett's 2026 article, Impact measurement and management as governance infrastructure in development finance: Catalytic or extractive? Development finance means loans and investments meant to improve economic and social conditions, and Bennett is asking whether the scorecards used there help systems learn or just make organizations perform for funders.
Davis The plain version is this: measurement can shift power, or it can hide power. Bennett looks at impact measurement and management, or IMM, which is the whole process of defining, tracking, and acting on impact, and she says catalytic frameworks build in participatory metric design, adaptive feedback loops, long-term systems thinking, and accountability that runs in more than one direction.
Jenny How would we know whether a measurement framework is actually changing development outcomes, rather than just changing reports?
Davis That's the key limit here: Bennett uses qualitative, structured content analysis, which means she systematically reads widely used impact tools and standards for design patterns, not for measured effects in communities. The analysis is useful because it compares established frameworks across three dimensions: design conditions, impact flow dynamics, and systemic feedback or regenerative capacity, but it's still diagnosing the machinery rather than proving what the machinery did on the ground.
Jenny That makes the governance point land for me, because a context-free standard, one-way reporting, and a short impact horizon can look tidy in a dashboard while quietly deciding whose experience counts. So the practical takeaway is boring in the best way: before you demand standardized reporting, build participation, feedback, and a long enough time horizon into the metric itself.

Paper 2 Evaluating the Consequences of a Hypertension Management Incentive

Jenny That dashboard point has a very clinical version. Claire Boone and Ari Robicsek's Evaluating the Consequences of a Hypertension Management Incentive asks what happened when a large U.S. health system put a hypertension control target, blood pressure below 140/90, into physician contracts.
Jenny The cleanest finding is uncomfortable: the money changed the measuring more than the medicine. Across 334,364 patients with hypertension at 103 primary care practices, the 63 practices that got the incentive in January 2022 rechecked blood pressure more often, by 1.9 percentage points, but hypertension control, medication changes, and cardiovascular hospitalizations did not significantly improve overall.
Davis So if the number got better because clinicians measured again before the patient left, is that better care, or better scorekeeping?
Jenny The authors try to separate that with a quasi-experimental difference-in-differences design, which basically compares the before-and-after change in incentivized practices with the before-and-after change in practices that did not get the contract change. It covers 770,907 encounters, and that scale makes the evidence pretty strong, but it is still one U.S. health system, so the exact behavior may not travel to every payment model.
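The difference-in-differences logic described above can be sketched in its simplest two-group, two-period form. This is an illustration only: the rates below are invented for the example and are not figures from the Boone and Robicsek study, whose actual estimation is not described in the episode. The function name `diff_in_diff` is also hypothetical.

```python
def diff_in_diff(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change in the treated group minus change in the comparison group.

    Reads as an effect of the treatment only under the parallel-trends
    assumption: without the incentive, both groups would have changed
    by the same amount.
    """
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# INVENTED recheck rates in percent, for illustration only.
estimate = diff_in_diff(
    treated_pre=20.0, treated_post=25.0,   # practices with the incentive
    control_pre=20.0, control_post=22.0,   # practices without it
)

print(f"Difference-in-differences estimate: {estimate:.1f} percentage points")
```

With these made-up inputs, the incentivized group rises 5 points and the comparison group rises 2, so the estimate is 3 percentage points. Subtracting the comparison group's change is what removes trends that would have happened anyway.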
Davis This is measurement as governance with a stethoscope on it. A metric tied to pay can make someone take a second reading, which might be useful, but if prescriptions and dose adjustments do not move, and some patients may face more hospitalization risk, the metric has to be monitored like an intervention, not trusted like a thermometer.

Paper 3 Politicizing Expertise for Dutch Broad Wellbeing: Knowledge Accountability for Transformative Change

Davis That thermometer line is exactly where this next paper lives, but it takes the stethoscope out of the clinic and puts it in the Dutch cabinet room. T. Maas, Annet P. Pauwelussen, and Esther Turnhout have a 2026 paper called Politicizing Expertise for Dutch Broad Wellbeing: Knowledge Accountability for Transformative Change, and it's asking what happens when a government says progress means more than GDP.
Davis The plain version is this: a wellbeing dashboard doesn't just describe society, it helps decide what counts as a problem. The paper centers on Dutch Broad Wellbeing, the Netherlands' policy effort to measure quality of life, sustainability, and future impacts beyond gross domestic product, and the authors introduce knowledge accountability, meaning a way to ask what knowledge gets used in policy and what job that knowledge is supposed to do.
Davis Their big split is between two logics. A technocratic logic treats expertise like a cleaner instrument panel, where experts refine indicators and policymakers steer from the numbers, while a political logic says the indicators themselves carry values, conflicts, and tradeoffs that need to be argued about in public.
Jenny So when experts build a wellbeing dashboard, where do public values enter the process? Is this paper showing that one model works better, or is it more of a map of how expertise and policy are shaping each other?
Davis It's the second one. They analyze the co-production of expertise and policymaking in the Dutch Broad Wellbeing context, so co-production just means the knowledge system and the policy system are being built together, not one after the other. The evidence is useful but moderate, because this is a conceptual policy analysis with qualitative interpretation, not an evaluation showing that one accountability model produces better sustainability outcomes.
Jenny That feels like measurement as governance in its cleanest form. If the dashboard says what progress is, then the fight over the dashboard is a democratic fight, not a technical errand. I like the warning here: make expertise explicit, or the values still govern. They just do it from behind a spreadsheet.

