ricardobeat 1 day ago [-]
It’s interesting how little press Minimax M3 gets, given it outperforms Deepseek V4 Pro, previously the SOTA for open models. Meanwhile GLM has been in the news daily.
Reubend 23 hours ago [-]
It is strange, huh? But the hype cycles around these models often ignore good contenders. Xiaomi's MiMo-V2.5 Pro was doing really well and didn't get much hype either.
besterman23 1 day ago [-]
I wonder if multiple attempts at the opossum would produce better results.
If we didn’t have the previous example I would interpret this as pretty solid evidence that labs were training on the Pelican “benchmark”.
I just can’t imagine a model dropping so significantly from one version to the next on such a silly task.
ChrisArchitect 1 day ago [-]
Related:
GLM-5.2 is the new leading open weights model on Artificial Analysis
https://news.ycombinator.com/item?id=48567759