When AI Reviews AI: A Case Study in Benchmark Contamination
Date: December 19, 2025
Method: UKE_G Recursive Triangulation
Target: "Evaluating Large Language Models in Scientific Discovery" (SDE Benchmark)

Two days ago, a new benchmark paper dropped claiming to evaluate how well large language models perform at scientific discovery. The paper introduced SDE (Scientific Discovery Evaluation), a two-tier benchmark spanning biology, chemistry, materials science, and physics. Models were tested …
Month: December 2025
Zuihitsu, 2025-11
These aren’t polished essays or tidy aphorisms. They’re scraps I’ve carried around this month (half-heard thoughts, borrowed lines, sudden recognitions) that refused to be forgotten. Zuihitsu literally means “following the brush,” and while my version is shorter and scrappier than the classical form, the impulse feels the same: to catch what drifts across the mind before it …
