When AI Reviews AI: A Case Study in Benchmark Contamination

December 19, 2025 ~ cafebedouin

Date: December 19, 2025
Method: UKE_G Recursive Triangulation
Target: "Evaluating Large Language Models in Scientific Discovery" (SDE Benchmark)

Two days ago, a new benchmark paper dropped claiming to evaluate how well large language models perform at scientific discovery. The paper introduced SDE (Scientific Discovery Evaluation), a two-tier benchmark spanning biology, chemistry, materials science, and physics. Models were tested …

The DIA Director’s 2018 Reading List

March 17, 2018 ~ cafebedouin

Tired of the ordinary book suggestion lists? Try the DIA Director's 2018 Reading List, which has some interesting non-fiction reading suggestions from the U.S. intelligence community.