When AI Reviews AI: A Case Study in Benchmark Contamination

December 19, 2025 ~ cafebedouin

Date: December 19, 2025
Method: UKE_G Recursive Triangulation
Target: "Evaluating Large Language Models in Scientific Discovery" (SDE Benchmark)

Two days ago, a new benchmark paper dropped claiming to evaluate how well large language models perform at scientific discovery. The paper introduced SDE (Scientific Discovery Evaluation), a two-tier benchmark spanning biology, chemistry, materials science, and physics. Models were tested …

Zuihitsu, 2025-11

December 18, 2025 (updated December 30, 2025) ~ cafebedouin

These aren’t polished essays or tidy aphorisms. They’re scraps I’ve carried around this month (half-heard thoughts, borrowed lines, sudden recognitions) that refused to be forgotten. Zuihitsu literally means “following the brush,” and while my version is shorter and scrappier than the classical form, the impulse feels the same: to catch what drifts across the mind before it …