Luau Benchmark Rankings

Comparative performance and efficiency analysis for leading intelligence models.

Cohort: 61 models
Spec: v1.0
Updated: 07/03/2026
| # | Intelligence Model | ID | Weighted Score |
|---|---|---|---|
| 1 | OpenAI: GPT-5.2 Chat | openai/gpt-5.2-chat | 83.6% |
| 2 | Anthropic: Claude Opus 4.5 | anthropic/claude-opus-4.5 | 83.6% |
| 3 | Anthropic: Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | 82.6% |
| 4 | Anthropic: Claude Sonnet 4.5 | anthropic/claude-sonnet-4.5 | 81.9% |
| 5 | DeepSeek: DeepSeek V3.2 | deepseek/deepseek-v3.2 | 80.2% |
| 6 | Qwen: Qwen3.5-122B-A10B | qwen/qwen3.5-122b-a10b | 78.8% |
| 7 | MoonshotAI: Kimi K2 0711 | moonshotai/kimi-k2 | 78.4% |
| 8 | Anthropic: Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | 77.7% |
| 9 | Inception: Mercury Coder | inception/mercury-coder | 76.4% |
| 10 | Google: Gemini 3 Flash Preview | google/gemini-3-flash-preview | 76.3% |
| 11 | Google: Gemini 3.1 Flash Lite Preview | google/gemini-3.1-flash-lite-preview | 76.2% |
| 12 | OpenAI: GPT-5.4 | openai/gpt-5.4 | 76.2% |
| 13 | Mistral: Devstral 2 2512 | mistralai/devstral-2512 | 75.8% |
| 14 | Anthropic: Claude Opus 4.6 | anthropic/claude-opus-4.6 | 75.2% |
| 15 | Inception: Mercury | inception/mercury | 75.0% |
| 16 | MoonshotAI: Kimi K2 0905 | moonshotai/kimi-k2-0905 | 74.2% |
| 17 | Qwen: Qwen3.5-Flash | qwen/qwen3.5-flash-02-23 | 73.9% |
| 18 | OpenAI: GPT-5.2-Codex | openai/gpt-5.2-codex | 73.3% |
| 19 | DeepSeek: DeepSeek V3.1 Terminus | deepseek/deepseek-v3.1-terminus | 73.2% |
| 20 | OpenAI: GPT-5.3 Chat | openai/gpt-5.3-chat | 73.1% |
| 21 | xAI: Grok 4.1 Fast | x-ai/grok-4.1-fast | 72.8% |
| 22 | xAI: Grok 4 Fast | x-ai/grok-4-fast | 71.2% |
| 23 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google/gemini-2.5-flash-lite-preview-09-2025 | 69.4% |
| 24 | xAI: Grok Code Fast 1 | x-ai/grok-code-fast-1 | 68.9% |
| 25 | xAI: Grok 3 Mini | x-ai/grok-3-mini | 67.9% |
| 26 | Cohere: Command A | cohere/command-a | 67.9% |
| 27 | DeepSeek: DeepSeek V3.1 | deepseek/deepseek-chat-v3.1 | 66.0% |
| 28 | Mistral: Mistral Small Creative | mistralai/mistral-small-creative | 65.3% |
| 29 | Meta: Llama 4 Maverick | meta-llama/llama-4-maverick | 64.3% |
| 30 | Meta: Llama 4 Scout | meta-llama/llama-4-scout | 63.8% |
| 31 | Mistral: Ministral 3 8B 2512 | mistralai/ministral-8b-2512 | 62.5% |
| 32 | MiniMax: MiniMax M2-her | minimax/minimax-m2-her | 61.0% |
| 33 | Mistral: Ministral 3 14B 2512 | mistralai/ministral-14b-2512 | 58.2% |
| 34 | Meta: Llama 3.3 70B Instruct | meta-llama/llama-3.3-70b-instruct | 57.3% |
| 35 | Inception: Mercury 2 | inception/mercury-2 | 55.9% |
| 36 | Cohere: Command R+ (08-2024) | cohere/command-r-plus-08-2024 | 53.5% |
| 37 | OpenAI: GPT-5.3-Codex | openai/gpt-5.3-codex | 52.6% |
| 38 | xAI: Grok 4 | x-ai/grok-4 | 49.8% |
| 39 | Cohere: Command R (08-2024) | cohere/command-r-08-2024 | 45.7% |
| 40 | Cohere: Command R7B (12-2024) | cohere/command-r7b-12-2024 | 37.9% |
| 41 | Qwen: Qwen3.5 Plus 2026-02-15 | qwen/qwen3.5-plus-02-15 | 37.1% |
| 42 | Qwen: Qwen3.5-35B-A3B | qwen/qwen3.5-35b-a3b | 35.5% |
| 43 | Mistral: Ministral 3 3B 2512 | mistralai/ministral-3b-2512 | 34.7% |
| 44 | Meta: Llama 3.2 3B Instruct | meta-llama/llama-3.2-3b-instruct | 32.7% |
| 45 | Meta: Llama 3.2 1B Instruct | meta-llama/llama-3.2-1b-instruct | 22.4% |
| 46 | Z.ai: GLM 4.6V | z-ai/glm-4.6v | 18.8% |
| 47 | Z.ai: GLM 4.6 | z-ai/glm-4.6 | 16.9% |
| 48 | DeepSeek: R1 0528 | deepseek/deepseek-r1-0528 | 15.9% |
| 49 | MoonshotAI: Kimi K2 Thinking | moonshotai/kimi-k2-thinking | 14.4% |
| 50 | Z.ai: GLM 4.7 | z-ai/glm-4.7 | 13.2% |
| 51 | MiniMax: MiniMax M1 | minimax/minimax-m1 | 12.3% |
| 52 | Z.ai: GLM 5 | z-ai/glm-5 | 11.7% |
| 53 | DeepSeek: DeepSeek V3.2 Speciale | deepseek/deepseek-v3.2-speciale | 8.5% |
| 54 | MoonshotAI: Kimi K2.5 | moonshotai/kimi-k2.5 | 7.9% |
| 55 | MiniMax: MiniMax M2.5 | minimax/minimax-m2.5 | 7.6% |
| 56 | MiniMax: MiniMax M2.1 | minimax/minimax-m2.1 | 5.9% |
| 57 | Google: Gemini 3 Pro Preview | google/gemini-3-pro-preview | 5.7% |
| 58 | Google: Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | 5.0% |
| 59 | MiniMax: MiniMax M2 | minimax/minimax-m2 | 4.2% |
| 60 | Z.ai: GLM 4.7 Flash | z-ai/glm-4.7-flash | 3.5% |
| 61 | Qwen: Qwen3.5-27B | qwen/qwen3.5-27b | 3.0% |

Research Methodology

Each model is evaluated on 30 tasks in Luau-specific environments. Scores are generated through automated testing, with three replicates per task to ensure reproducibility.

View Protocol

Open Infrastructure

All prompts, model parameters, and raw logs are publicly auditable. Submit new models or tasks via GitHub.

GitHub Repository