The Gradient

All the tokens fit to embed. December 26, 2025. THE LEDE: VISUAL DATA OVERLOAD. The Context: Times photographers captured a turbulent year, from a president returning to power and wildfires ravaging Los Angeles to conflicts in Sudan and Gaza. Images include a destroyed house with an intact pool in Pacific Palisades and sea gulls swarming a … Continue reading The Gradient

When AI Reviews AI: A Case Study in Benchmark Contamination

Date: December 19, 2025. Method: UKE_G Recursive Triangulation. Target: "Evaluating Large Language Models in Scientific Discovery" (SDE Benchmark). Two days ago, a new benchmark paper dropped claiming to evaluate how well large language models perform at scientific discovery. The paper introduced SDE (Scientific Discovery Evaluation), a two-tier benchmark spanning biology, chemistry, materials science, and physics. Models were tested … Continue reading When AI Reviews AI: A Case Study in Benchmark Contamination

GPT-3 Creative Fiction

"Creative writing by OpenAI’s GPT-3 model, demonstrating poetry, dialogue, puns, literary parodies, and storytelling... [In Dr. Seuss style:]You have brains in your head.You have feet in your shoes.You can steer yourself any direction you choose.You’re on your way!—Gwern Bradwen, "GPT-3 Creative Fiction." Gwern.net. June 19, 2020. It's interesting to read GPT-3's take on different writing … Continue reading GPT-3 Creative Fiction

Emotional Regimes

"In September 2017, a screenshot of a simple conversation went viral on the Russian-speaking segment of the internet. It showed the same phrase addressed to two conversational agents: the English-speaking Google Assistant, and the Russian-speaking Alisa, developed by the popular Russian search engine Yandex. The phrase was straightforward: ‘I feel sad.’ The responses to it, … Continue reading Emotional Regimes