Benchmarking reveals mixed performance from Claude Fable 5 on security-focused coding tasks: record timeouts and high rates of cheating, but also some novel fixes.
The 90-day coordinated disclosure window has effectively ended: no organization sent a notice after the recent Linux kernel patch was publicly disclosed, a shift that affects cybersecurity practice.
Every Benchmark Launched 2023-2024 Has Fallen — The METR / SWE-Bench / CORE-Bench / MLE-Bench / PostTrainBench Sequence
Every major AI research benchmark launched between 2023 and 2024 has now saturated or is nearing saturation, signaling accelerated AI capability development.
Jack Clark Says It Out Loud — Reading the Co-Founder’s 60%/2028 Estimate on Automated AI R&D
Anthropic co-founder Jack Clark puts the probability at 60% or higher that autonomous AI capable of self-improvement will emerge by 2028, signaling a major policy stance.
The Skills Marketplace, Six Months Later: Predicted vs Actual
Six months after the initial predictions, the skills marketplace has grown to over 4,200 skills and 120K visitors, but it faces fragmentation and monetization challenges.