Luau Benchmark Rankings

Comparative performance and efficiency analysis for leading intelligence models.

Cohort: 61 models | Spec: v1.0 | Updated: 07/03/2026
| # | Intelligence Model | Model ID | Weighted Score |
|---|---|---|---|
| 1 | OpenAI: GPT-5.4 | openai/gpt-5.4 | 80.3% |
| 2 | Google: Gemini 3 Flash Preview | google/gemini-3-flash-preview | 77.9% |
| 3 | Anthropic: Claude Opus 4.6 | anthropic/claude-opus-4.6 | 77.5% |
| 4 | Anthropic: Claude Opus 4.5 | anthropic/claude-opus-4.5 | 77.5% |
| 5 | MoonshotAI: Kimi K2 0711 | moonshotai/kimi-k2 | 77.4% |
| 6 | Anthropic: Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | 76.7% |
| 7 | OpenAI: GPT-5.2 Chat | openai/gpt-5.2-chat | 76.7% |
| 8 | xAI: Grok Code Fast 1 | x-ai/grok-code-fast-1 | 76.5% |
| 9 | Anthropic: Claude Sonnet 4.5 | anthropic/claude-sonnet-4.5 | 76.2% |
| 10 | OpenAI: GPT-5.3 Chat | openai/gpt-5.3-chat | 75.8% |
| 11 | Anthropic: Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | 75.8% |
| 12 | DeepSeek: DeepSeek V3.1 Terminus | deepseek/deepseek-v3.1-terminus | 75.6% |
| 13 | Google: Gemini 3.1 Flash Lite Preview | google/gemini-3.1-flash-lite-preview | 75.1% |
| 14 | Inception: Mercury Coder | inception/mercury-coder | 74.9% |
| 15 | OpenAI: GPT-5.2-Codex | openai/gpt-5.2-codex | 74.5% |
| 16 | xAI: Grok 4 Fast | x-ai/grok-4-fast | 73.7% |
| 17 | DeepSeek: DeepSeek V3.1 | deepseek/deepseek-chat-v3.1 | 73.2% |
| 18 | Meta: Llama 4 Maverick | meta-llama/llama-4-maverick | 72.0% |
| 19 | Mistral: Mistral Small Creative | mistralai/mistral-small-creative | 71.8% |
| 20 | MoonshotAI: Kimi K2 0905 | moonshotai/kimi-k2-0905 | 71.7% |
| 21 | DeepSeek: DeepSeek V3.2 | deepseek/deepseek-v3.2 | 71.4% |
| 22 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google/gemini-2.5-flash-lite-preview-09-2025 | 70.8% |
| 23 | xAI: Grok 3 Mini | x-ai/grok-3-mini | 70.5% |
| 24 | Mistral: Devstral 2 2512 | mistralai/devstral-2512 | 69.4% |
| 25 | Inception: Mercury | inception/mercury | 66.9% |
| 26 | Cohere: Command A | cohere/command-a | 65.9% |
| 27 | Qwen: Qwen3.5-Flash | qwen/qwen3.5-flash-02-23 | 64.2% |
| 28 | xAI: Grok 4.1 Fast | x-ai/grok-4.1-fast | 62.2% |
| 29 | Qwen: Qwen3.5-122B-A10B | qwen/qwen3.5-122b-a10b | 61.6% |
| 30 | Meta: Llama 4 Scout | meta-llama/llama-4-scout | 60.9% |
| 31 | Mistral: Ministral 3 14B 2512 | mistralai/ministral-14b-2512 | 60.5% |
| 32 | Meta: Llama 3.3 70B Instruct | meta-llama/llama-3.3-70b-instruct | 59.2% |
| 33 | OpenAI: GPT-5.3-Codex | openai/gpt-5.3-codex | 53.8% |
| 34 | Mistral: Ministral 3 8B 2512 | mistralai/ministral-8b-2512 | 53.0% |
| 35 | Inception: Mercury 2 | inception/mercury-2 | 49.1% |
| 36 | MiniMax: MiniMax M2-her | minimax/minimax-m2-her | 48.7% |
| 37 | Cohere: Command R (08-2024) | cohere/command-r-08-2024 | 47.6% |
| 38 | Qwen: Qwen3.5-27B | qwen/qwen3.5-27b | 45.1% |
| 39 | xAI: Grok 4 | x-ai/grok-4 | 44.5% |
| 40 | Mistral: Ministral 3 3B 2512 | mistralai/ministral-3b-2512 | 43.1% |
| 41 | Cohere: Command R+ (08-2024) | cohere/command-r-plus-08-2024 | 35.8% |
| 42 | Meta: Llama 3.2 3B Instruct | meta-llama/llama-3.2-3b-instruct | 33.7% |
| 43 | Cohere: Command R7B (12-2024) | cohere/command-r7b-12-2024 | 29.9% |
| 44 | Qwen: Qwen3.5-35B-A3B | qwen/qwen3.5-35b-a3b | 25.1% |
| 45 | DeepSeek: R1 0528 | deepseek/deepseek-r1-0528 | 21.4% |
| 46 | Meta: Llama 3.2 1B Instruct | meta-llama/llama-3.2-1b-instruct | 21.2% |
| 47 | MiniMax: MiniMax M2.1 | minimax/minimax-m2.1 | 21.1% |
| 48 | Z.ai: GLM 4.6 | z-ai/glm-4.6 | 21.1% |
| 49 | MoonshotAI: Kimi K2 Thinking | moonshotai/kimi-k2-thinking | 17.5% |
| 50 | MiniMax: MiniMax M1 | minimax/minimax-m1 | 16.6% |
| 51 | Z.ai: GLM 4.6V | z-ai/glm-4.6v | 15.9% |
| 52 | MiniMax: MiniMax M2.5 | minimax/minimax-m2.5 | 11.6% |
| 53 | Z.ai: GLM 4.7 | z-ai/glm-4.7 | 10.0% |
| 54 | Z.ai: GLM 5 | z-ai/glm-5 | 8.5% |
| 55 | Qwen: Qwen3.5 Plus 2026-02-15 | qwen/qwen3.5-plus-02-15 | 6.2% |
| 56 | Google: Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | 4.9% |
| 57 | Google: Gemini 3 Pro Preview | google/gemini-3-pro-preview | 4.8% |
| 58 | MiniMax: MiniMax M2 | minimax/minimax-m2 | 4.6% |
| 59 | MoonshotAI: Kimi K2.5 | moonshotai/kimi-k2.5 | 2.7% |
| 60 | DeepSeek: DeepSeek V3.2 Speciale | deepseek/deepseek-v3.2-speciale | 0.7% |
| 61 | Z.ai: GLM 4.7 Flash | z-ai/glm-4.7-flash | 0.0% |

Research Methodology

Each model is evaluated on 30 tasks in Luau-specific environments. Scores are generated through automated testing with three replicates per task to ensure reproducibility.
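As a rough illustration of this setup, the aggregation might look like the sketch below: average the three replicates for each of the 30 tasks, then combine task scores into a single weighted percentage. The per-task weights and the pass/fail criterion are assumptions for illustration; the published protocol defines the actual scheme.

```python
def weighted_score(replicate_results, weights=None):
    """Aggregate benchmark results into a single percentage.

    replicate_results: one list per task (30 here), each containing the
    pass/fail outcome of its replicates (3 here) as booleans.
    weights: optional per-task weights; uniform if omitted (an assumption,
    not the published spec).
    """
    # Task score = fraction of replicates that passed.
    task_scores = [sum(reps) / len(reps) for reps in replicate_results]
    if weights is None:
        weights = [1.0] * len(task_scores)
    # Weighted mean across tasks, expressed as a percentage.
    total = sum(w * s for w, s in zip(weights, task_scores))
    return 100.0 * total / sum(weights)
```

For example, a model that passes every replicate of every task scores 100.0%, while one that passes one of three replicates on each task scores about 33.3%.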

View Protocol

Open Infrastructure

All prompts, model parameters, and raw logs are publicly auditable. Submit new models or tasks via GitHub.

GitHub Repository