This Week in AI Research


Cold Open

Jenny When a computer gives advice, what makes you trust it enough to actually act on it?

Davis For me it’s who’s on the hook if it’s wrong, because “the AI said so” isn’t a person you can call at 2 a.m.

Jenny Okay, but what is “trust” there—do you mean you believe it’s accurate, or you feel safe following it when the stakes are high?

Davis Both, and it changes with the setting; I’ll take a route suggestion and shrug, but if it’s a health call I want a clinician in the loop and a clear reason, not just a confident answer.

Jenny And that gap between “sounds smart” and “I’d actually do it” is the story this week…welcome to AI Research on paperboy.fm.

Stats Overview

Davis Stats check for the week: we scanned about 870 papers in the date window and qualified 138 for the show. Those papers came from 294 unique authors across 15 countries, so it’s a real slice of the field, but narrower than last time.

Jenny And the first weird split is volume versus quality: total hits jumped to 872 from 675, up about 29%, but qualified papers fell to 138 from 156, down about 12%. Are we seeing more noise in the query results, like more “AI” tagged education pieces that aren’t actually research, or did our bar get stricter?
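As a quick check on the arithmetic in this exchange, here is a minimal sketch that recomputes the week-over-week changes from the counts quoted above. The helper name `pct_change` and the derived qualification rate are illustrative additions, not figures the hosts report.

```python
def pct_change(previous: float, current: float) -> float:
    """Relative change from `previous` to `current`, in percent."""
    return (current - previous) / previous * 100.0

# Counts quoted in the episode (last week -> this week).
hits_change = pct_change(675, 872)       # total query hits
qualified_change = pct_change(156, 138)  # papers that qualified

# Derived here for illustration: share of hits that qualified each week.
yield_last_week = 156 / 675 * 100.0
yield_this_week = 138 / 872 * 100.0

print(f"total hits: {hits_change:+.1f}%")
print(f"qualified:  {qualified_change:+.1f}%")
print(f"qualification rate: {yield_last_week:.1f}% -> {yield_this_week:.1f}%")
```

Under these counts the share of hits that qualify falls from roughly 23% to roughly 16%, which puts the "more noise or stricter bar" question into a single number.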

Davis The theme mix hints at why it might feel noisier: “Artificial Intelligence” shows up about 50 times, “artificial intelligence” another 44, and then you’ve got smaller clusters like natural language processing and machine learning at about 6 each, plus AI in education at about 6. That lines up with the through-line this week—less shiny new models, more real-world trust and learning settings—so you can get a lot of papers that mention AI without testing anything hard.

Jenny Second shift: unique authors dropped hard, to 294 from 481, down about 39%, which suggests a narrower pool of people behind the papers we’re catching. Is that because a few big teams are putting out multiple related papers, or because our top methods, 26 qualitative studies and 19 surveys, tend to come from tighter, repeated networks?

Davis And the author mix is young: 102 first-time authors, meaning first-ever paper, that’s about 35%, plus 124 emerging researchers at about 42%, and only 68 experienced at about 23%. Practically, that says a lot of this week’s evidence base may be early-stage—more exploratory interviews and surveys, fewer mature replication-heavy programs.
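The author-mix percentages can be recomputed the same way. This is a small sketch using only the three counts quoted above; the dictionary keys are illustrative labels.

```python
# Author counts quoted in the episode, keyed by illustrative labels.
author_counts = {"first_time": 102, "emerging": 124, "experienced": 68}

total_authors = sum(author_counts.values())
shares = {
    group: count / total_authors * 100.0
    for group, count in author_counts.items()
}

print(f"total authors: {total_authors}")
for group, share in shares.items():
    print(f"{group}: {share:.0f}%")
```

The three groups add up to the 294 unique authors mentioned earlier, so the shares are fractions of the same pool.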

Jenny Third shift is geography: we’re down to 15 countries from 23, about a 35% drop, and the top country counts are small—India 6, Indonesia 6, China 5, then the US and UK at 2 each. That’s a narrower map, so when we talk about “what people trust” or “what works in classrooms,” we should keep asking: which settings are missing, and would the same results hold in, say, a different school system or labor market?

Paper Walkthrough

Paper 1 AI-Induced Job Anxiety and the Perceived Effectiveness of AI-Enabled ESG Initiatives: Evidence from Bank Employees

Jenny Alright, let’s get into the papers, starting with “AI-Induced Job Anxiety and the Perceived Effectiveness of AI-Enabled ESG Initiatives: Evidence from Bank Employees.”

Jenny It’s a survey study of 858 employees at one big commercial bank, asking what happens in people’s heads when AI shows up in ESG work, with ESG meaning the company’s environmental, social, and governance promises.

Jenny Plain version: the more worried people feel that AI could threaten their job, the more they say AI is effective for ESG, even though they don’t actually report learning more or feeling more motivated.

Jenny The key number is an estimate of about 0.195 linking AI-induced job anxiety to perceived AI effectiveness in ESG, with p < 0.001, so it’s unlikely to be a chance fluke in this sample.

Davis If anxiety boosts perceived effectiveness, how do we know this isn’t just fear-driven attention—like, “I’m scared so I’m watching it closely,” not “we’re actually adopting it well”?

Jenny That’s basically the authors’ framing: they use a structural equation model—think of it as a linked set of regression paths—to separate a cognitive appraisal pathway, meaning how you judge the tool, from motivation and learning pathways, meaning whether you feel driven and whether you actually pick up knowledge.

Jenny And in their model, anxiety doesn’t significantly raise intrinsic motivation or knowledge acquisition in either AI or ESG, while motivation is the main driver of knowledge development, so anxiety looks like vigilance without capability-building.
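To make "a linked set of regression paths" concrete, here is a toy sketch on simulated data. It is not the authors' model: it fits three separate simple regressions, which is a much cruder stand-in for a structural equation model. The data are generated, not taken from the paper. The 0.195 coefficient is borrowed from the episode only to seed the simulation, and the 0.5 motivation-to-knowledge effect, the unit-variance noise, and all variable names are assumptions.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 858                 # sample size quoted in the episode

# SIMULATED data under assumed effects (not the paper's data).
anxiety = [rng.gauss(0, 1) for _ in range(n)]
perceived_effectiveness = [0.195 * a + rng.gauss(0, 1) for a in anxiety]
motivation = [rng.gauss(0, 1) for _ in range(n)]            # no anxiety effect
knowledge = [0.5 * m + rng.gauss(0, 1) for m in motivation]  # assumed 0.5 effect

# One regression per path in the toy diagram.
path_anxiety_to_appraisal = ols_slope(anxiety, perceived_effectiveness)
path_anxiety_to_motivation = ols_slope(anxiety, motivation)
path_motivation_to_knowledge = ols_slope(motivation, knowledge)

print(f"anxiety -> perceived effectiveness: {path_anxiety_to_appraisal:.3f}")
print(f"anxiety -> motivation:              {path_anxiety_to_motivation:.3f}")
print(f"motivation -> knowledge:            {path_motivation_to_knowledge:.3f}")
```

The pattern to look for is the one described above: a nonzero path into appraisal, a near-zero path from anxiety into motivation, and knowledge that moves with motivation instead.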

Jenny Big limitation though: it’s one banking context and it’s self-reported perceptions, so it can’t show anxiety causes better ESG outcomes in the real world.

Davis That paradox feels painfully plausible: people can rate the AI as “effective” because it feels powerful and inevitable, while still not learning a single new thing about how it works or how to do ESG better.

Davis So if you’re rolling out AI for ESG, you can’t just run an employee pulse survey and call it success. You’d measure actual training uptake, skill checks, and maybe who can explain the model’s limits afterward. This is a big sample and a pretty careful model, but it’s still sentiment inside one bank, not proof of impact.

Paper 2 Patient and Public Perceptions of Artificial Intelligence in Breast Imaging and Clinical Decision-Making: An Exploratory Cross-Sectional Survey Study

Davis You just said “big sample, careful model, still sentiment,” and it made me think of the healthcare version of that tradeoff.

Davis This one’s called Patient and Public Perceptions of Artificial Intelligence in Breast Imaging and Clinical Decision-Making, and it’s a paper survey of women in two UK breast care units in late twenty-twenty-five.

Davis Plainly: people aren’t anti-AI here, but their yes is conditional on a clinician still owning the call and on them actually knowing what the AI is doing.

Davis They had one hundred twenty respondents at Queen’s Hospital in Burton and the London Breast Institute, and only about half said they’d accept AI alongside clinicians for reading mammograms or ultrasound scans.

Jenny When they say “alongside clinicians,” what does that mean in a real appointment—like, does the AI talk first and the radiologist confirms, or is it just a second opinion nobody sees?

Davis The survey didn’t simulate a workflow; it asked about comfort and trust in scenarios like AI helping interpret images or triaging referrals, then they tested associations with Pearson chi-square—basically, “are these two things linked more than chance,” with Cramér’s V as a rough effect size.
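For readers who want to see how a Pearson chi-square statistic and Cramér's V relate, here is a minimal sketch in plain Python. The 2x2 counts are hypothetical and chosen only so that they sum to 120; they are not the survey's data and will not reproduce its reported values. The function name is also illustrative.

```python
import math

def chi_square_and_cramers_v(table):
    """Pearson chi-square (no continuity correction) and Cramér's V
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # V rescales chi-square by sample size and the smaller table dimension.
    k = min(len(table), len(table[0]))
    v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, v

# HYPOTHETICAL counts: rows = comfortable with tech (yes, no),
# columns = would accept AI alongside clinicians (yes, no).
table = [
    [40, 20],
    [20, 40],
]

chi2, v = chi_square_and_cramers_v(table)
print(f"chi-square = {chi2:.2f}, Cramér's V = {v:.2f}")
```

Because V divides out the sample size, it says how strong the link is, while the p-value only says whether a link of that size is plausible under chance.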

Davis Acceptance tracked with tech comfort pretty strongly: p < 0.001 with a Cramér’s V of 0.42. It was lower for people with a previous breast cancer diagnosis, at p = 0.02 and V = 0.22. The big limitation is that it’s a small, local sample, so it’s more a signal about concerns than a population estimate.

Jenny That “signal” is loud though: eighty percent said they had no knowledge of AI being used in breast clinics, and only thirty-seven percent said they’d trust AI findings, so if a hospital rolls this out quietly they’re basically manufacturing distrust.

Jenny It also nails that cross-paper thread—trust needs human oversight—because what patients are asking for is accountability in one sentence: tell me who’s responsible when the machine is wrong.

Paper 3 Generative AI for Collaborative Learning: Fostering Critical Thinking in Teacher Education

Jenny So that breast imaging survey basically screamed “tell me who’s responsible,” and eighty percent didn’t even know AI was in the clinic.

Jenny Okay, different setting, same trust problem: “Generative AI for Collaborative Learning: Fostering Critical Thinking in Teacher Education,” with seventy-five pre-service teachers using GenAI to co-write sustainable-development lesson plans.

Jenny The plain takeaway is that GenAI helps when students treat it like a draft partner they keep checking, not a vending machine for answers.

Jenny They tracked this with process mining—basically, mapping the sequence of actions over time—and the high-quality groups showed more loops of write, revise, refine the prompt, and then go verify the AI’s claims on other websites instead of just pasting.

Davis Before I buy “high-quality groups did better thinking,” how did they decide who was high versus low performers, and could that label bias what patterns they think they’re seeing?

Jenny They grouped teams by the quality of the final lesson plan, then used video-coded behaviors—writing, editing, searching other sources, copy-pasting—and the GenAI prompt types like conceptual, pedagogical, and refining prompts, to see what sequences tended to show up in each group.
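One common building block in process mining is a "directly-follows" count: how often one coded action is immediately followed by another. The sketch below shows the idea on two hypothetical traces. The action labels echo the behaviors and prompt types named above, but the sequences themselves are invented for illustration, and this is not claimed to be the authors' pipeline.

```python
from collections import Counter

def directly_follows(trace):
    """Count how often each action is immediately followed by another."""
    return Counter(zip(trace, trace[1:]))

# HYPOTHETICAL coded traces, one per group (not data from the paper).
high_quality_trace = [
    "prompt_conceptual", "write", "prompt_refining", "search_sources",
    "edit", "prompt_refining", "search_sources", "edit",
]
low_quality_trace = [
    "prompt_conceptual", "copy_paste", "prompt_pedagogical",
    "copy_paste", "write",
]

high = directly_follows(high_quality_trace)
low = directly_follows(low_quality_trace)

# Transition of interest: refining a prompt, then checking other sources.
verify_step = ("prompt_refining", "search_sources")
print("refine -> verify (high):", high[verify_step])
print("refine -> verify (low): ", low[verify_step])

# How often any action leads straight into copy-pasting.
paste_high = sum(c for (_, nxt), c in high.items() if nxt == "copy_paste")
paste_low = sum(c for (_, nxt), c in low.items() if nxt == "copy_paste")
print("-> copy_paste (high, low):", paste_high, paste_low)
```

Comparing these transition counts between groups is one way to turn "more loops of refine and verify" into something you can tabulate.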

Jenny So the “critical thinking” claim here is really “we saw more corroboration and iterative refinement behaviors in the better outputs,” and the big limitation is it’s one task in one teacher-education context, so we shouldn’t pretend it automatically transfers to every classroom or subject.

Davis I like that it turns “don’t use ChatGPT” into a concrete routine: refine the prompt, write in passes, and force a second source check, because that’s basically human oversight as a learning skill.

Davis And with seventy-five people it’s more than a cute demo, but it’s still a pretty specific teacher-ed bubble, so I’d want the same process map in, like, ninth-grade history before we call it a general law of learning with AI.