What is the best LLM for text, coding, vision, search, or X? LMArena knows, and the results may surprise you.

LMArena is a widely referenced platform that ranks large AI models across multiple areas—including text, coding, vision, and specialized tasks—using extensive human preference data. Its leaderboards reflect which models users find most capable in real-world tasks, based on head-to-head comparisons and live voting.
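To make the idea of ranking from head-to-head votes concrete, here is a minimal sketch of an Elo-style rating update. This is an illustration only: the article does not describe LMArena's actual scoring method, and the function names, the starting rating of 1000, and the K-factor of 32 are all assumptions chosen for the example.

```python
# Illustrative Elo-style rating from pairwise votes.
# NOT LMArena's actual pipeline; names and constants are assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rate(votes, initial: float = 1000.0, k: float = 32.0) -> dict:
    """Process (winner, loser) votes in order and return final ratings."""
    ratings = {}
    for winner, loser in votes:
        ratings.setdefault(winner, initial)
        ratings.setdefault(loser, initial)
        # The winner gains more when the win was unexpected.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [("model-a", "model-b")] * 3 + [("model-b", "model-c")]
ratings = rate(votes)
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.0f}")
```

Each vote moves rating points from the loser to the winner, so the total stays constant and a model that keeps winning comparisons climbs the board.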

LMArena assesses AI models in several distinct arenas:

  • Text: General language tasks (e.g., writing, reasoning, multi-turn dialogue)
  • WebDev: Coding and web development capabilities
  • Vision: Multimodal ability to handle images and text
  • Search: Information retrieval and synthesis
  • Copilot: Coding assistant and developer support
  • Text-to-Image: AI systems generating visual content from text prompts

Current Rankings by Area

Here are the current (July 2025) rankings.

Text Arena

Rank  Model                                Score  Votes
1     Gemini-2.5-Pro                       1462   18,297
2     o3-2025-04-16                        1452   24,554
2     ChatGPT-4o-Latest-20250326           1444   25,715
3     GPT-4.5-Preview-2025-02-27           1437   15,271
3     Grok-4-0709                          1433   4,227
5     Claude-Opus-4-20250514-Thinking-16k  1419   13,018
6     Claude-Opus-4-20250514               1416   21,129
6     DeepSeek-R1-0528                     1414   14,078
6     Gemini-2.5-Flash                     1414   23,738
6     GPT-4.1-2025-04-14                   1412   19,766

WebDev Arena

Rank  Model                         Score  Votes
1     Gemini-2.5-Pro                1423   3,010
1     DeepSeek-R1-0528              1407   1,978
1     Claude Opus 4 (20250514)      1404   4,322
3     Claude Sonnet 4 (20250514)    1378   3,258
4     Claude 3.7 Sonnet (20250219)  1357   7,481
6     Gemini-2.5-Flash              1299   3,681

Vision Arena

Rank  Model                       Score  Votes
1     Gemini-2.5-Pro              1268   4,382
2     ChatGPT-4o-Latest-20250326  1249   6,271
2     o3-2025-04-16               1238   5,097
2     GPT-4.5-Preview-2025-02-27  1231   3,066
3     Gemini-2.5-Flash            1224   5,184

Search Arena

Rank  Model                         Score  Votes
1     Gemini-2.5-Pro-Grounding      1142   1,215
1     PPL-Sonar-Reasoning-Pro-High  1136   861
3     PPL-Sonar-Reasoning           1097   1,644
3     PPL-Sonar                     1072   1,208

Copilot Arena

Rank  Model                      Score  Votes
1     DeepSeek V2.5 (FIM)        1028   2,292
1     Claude 3.5 Sonnet (06/20)  1012   3,544
1     Claude 3.5 Sonnet (10/22)  1004   3,596
1     Codestral (25.01)          1001   2,180
1     Qwen-2.5-Coder (FiM)       998    3,401

Text-to-Image Arena

Rank  Model                                    Score  Votes
1     GPT-Image-1                              1148   22,691
2     Imagen-4.0-Ultra-Generate-Preview-06-06  1113   11,552
3     Imagen-4.0-Generate-Preview-05-20        1097   22,211

And the winner is…

Gemini-2.5-Pro posts the top score in Text, WebDev, and Vision, and its grounded variant tops Search, so it leads four of the six arenas. Google, again…
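That tally can be checked against the rankings listed earlier. The sketch below copies the two highest-scoring rows from each arena and counts how many arenas a Gemini-2.5-Pro variant leads. The variable names are made up for the example, and treating Gemini-2.5-Pro-Grounding as a Gemini-2.5-Pro variant is an assumption of this tally.

```python
# Top two (model, score) rows per arena, copied from the July 2025 tables.
top_rows = {
    "Text": [("Gemini-2.5-Pro", 1462), ("o3-2025-04-16", 1452)],
    "WebDev": [("Gemini-2.5-Pro", 1423), ("DeepSeek-R1-0528", 1407)],
    "Vision": [("Gemini-2.5-Pro", 1268), ("ChatGPT-4o-Latest-20250326", 1249)],
    "Search": [("Gemini-2.5-Pro-Grounding", 1142),
               ("PPL-Sonar-Reasoning-Pro-High", 1136)],
    "Copilot": [("DeepSeek V2.5 (FIM)", 1028),
                ("Claude 3.5 Sonnet (06/20)", 1012)],
    "Text-to-Image": [("GPT-Image-1", 1148),
                      ("Imagen-4.0-Ultra-Generate-Preview-06-06", 1113)],
}

# Pick the highest-scoring model in each arena.
leaders = {
    arena: max(rows, key=lambda row: row[1])[0]
    for arena, rows in top_rows.items()
}

# Assumption: any name starting with "Gemini-2.5-Pro" counts as a variant.
gemini_wins = sum(name.startswith("Gemini-2.5-Pro") for name in leaders.values())

for arena, name in leaders.items():
    print(f"{arena}: {name}")
print(f"Gemini-2.5-Pro variants lead {gemini_wins} of {len(leaders)} arenas")
```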
