When AI Reviews AI: A Case Study in Benchmark Contamination
Date: December 19, 2025
Method: UKE_G Recursive Triangulation
Target: "Evaluating Large Language Models in Scientific Discovery" (SDE Benchmark)
Two days ago, a new benchmark paper dropped, claiming to evaluate how well large language models perform at scientific discovery. The paper introduced SDE (Scientific Discovery Evaluation), a two-tier benchmark spanning biology, chemistry, materials science, and physics. Models were tested …
Tag: artificial intelligence models
The AI “Microscope” Myth
When people ask how we will control an Artificial Intelligence that is smarter than us, the standard answer sounds very sensible: "Humans can’t see germs, so we invented the microscope. We can’t see ultraviolet light, so we built sensors. Our eyes are weak, but our tools are strong. We will just build 'AI Microscopes' to …
