Luau Benchmark Rankings
Comparative performance and efficiency analysis for leading intelligence models.
Cohort: 61 Models
Spec: v1.0
Updated: 07/03/2026
| # | Model | Model ID | Weighted Score |
|---|---|---|---|
| 1 | OpenAI: GPT-5.2 Chat | `openai/gpt-5.2-chat` | 83.6% |
| 2 | Anthropic: Claude Opus 4.5 | `anthropic/claude-opus-4.5` | 83.6% |
| 3 | Anthropic: Claude Haiku 4.5 | `anthropic/claude-haiku-4.5` | 82.6% |
| 4 | Anthropic: Claude Sonnet 4.5 | `anthropic/claude-sonnet-4.5` | 81.9% |
| 5 | DeepSeek: DeepSeek V3.2 | `deepseek/deepseek-v3.2` | 80.2% |
| 6 | Qwen: Qwen3.5-122B-A10B | `qwen/qwen3.5-122b-a10b` | 78.8% |
| 7 | MoonshotAI: Kimi K2 0711 | `moonshotai/kimi-k2` | 78.4% |
| 8 | Anthropic: Claude Sonnet 4.6 | `anthropic/claude-sonnet-4.6` | 77.7% |
| 9 | Inception: Mercury Coder | `inception/mercury-coder` | 76.4% |
| 10 | Google: Gemini 3 Flash Preview | `google/gemini-3-flash-preview` | 76.3% |
| 11 | Google: Gemini 3.1 Flash Lite Preview | `google/gemini-3.1-flash-lite-preview` | 76.2% |
| 12 | OpenAI: GPT-5.4 | `openai/gpt-5.4` | 76.2% |
| 13 | Mistral: Devstral 2 2512 | `mistralai/devstral-2512` | 75.8% |
| 14 | Anthropic: Claude Opus 4.6 | `anthropic/claude-opus-4.6` | 75.2% |
| 15 | Inception: Mercury | `inception/mercury` | 75.0% |
| 16 | MoonshotAI: Kimi K2 0905 | `moonshotai/kimi-k2-0905` | 74.2% |
| 17 | Qwen: Qwen3.5-Flash | `qwen/qwen3.5-flash-02-23` | 73.9% |
| 18 | OpenAI: GPT-5.2-Codex | `openai/gpt-5.2-codex` | 73.3% |
| 19 | DeepSeek: DeepSeek V3.1 Terminus | `deepseek/deepseek-v3.1-terminus` | 73.2% |
| 20 | OpenAI: GPT-5.3 Chat | `openai/gpt-5.3-chat` | 73.1% |
| 21 | xAI: Grok 4.1 Fast | `x-ai/grok-4.1-fast` | 72.8% |
| 22 | xAI: Grok 4 Fast | `x-ai/grok-4-fast` | 71.2% |
| 23 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | `google/gemini-2.5-flash-lite-preview-09-2025` | 69.4% |
| 24 | xAI: Grok Code Fast 1 | `x-ai/grok-code-fast-1` | 68.9% |
| 25 | xAI: Grok 3 Mini | `x-ai/grok-3-mini` | 67.9% |
| 26 | Cohere: Command A | `cohere/command-a` | 67.9% |
| 27 | DeepSeek: DeepSeek V3.1 | `deepseek/deepseek-chat-v3.1` | 66.0% |
| 28 | Mistral: Mistral Small Creative | `mistralai/mistral-small-creative` | 65.3% |
| 29 | Meta: Llama 4 Maverick | `meta-llama/llama-4-maverick` | 64.3% |
| 30 | Meta: Llama 4 Scout | `meta-llama/llama-4-scout` | 63.8% |
| 31 | Mistral: Ministral 3 8B 2512 | `mistralai/ministral-8b-2512` | 62.5% |
| 32 | MiniMax: MiniMax M2-her | `minimax/minimax-m2-her` | 61.0% |
| 33 | Mistral: Ministral 3 14B 2512 | `mistralai/ministral-14b-2512` | 58.2% |
| 34 | Meta: Llama 3.3 70B Instruct | `meta-llama/llama-3.3-70b-instruct` | 57.3% |
| 35 | Inception: Mercury 2 | `inception/mercury-2` | 55.9% |
| 36 | Cohere: Command R+ (08-2024) | `cohere/command-r-plus-08-2024` | 53.5% |
| 37 | OpenAI: GPT-5.3-Codex | `openai/gpt-5.3-codex` | 52.6% |
| 38 | xAI: Grok 4 | `x-ai/grok-4` | 49.8% |
| 39 | Cohere: Command R (08-2024) | `cohere/command-r-08-2024` | 45.7% |
| 40 | Cohere: Command R7B (12-2024) | `cohere/command-r7b-12-2024` | 37.9% |
| 41 | Qwen: Qwen3.5 Plus 2026-02-15 | `qwen/qwen3.5-plus-02-15` | 37.1% |
| 42 | Qwen: Qwen3.5-35B-A3B | `qwen/qwen3.5-35b-a3b` | 35.5% |
| 43 | Mistral: Ministral 3 3B 2512 | `mistralai/ministral-3b-2512` | 34.7% |
| 44 | Meta: Llama 3.2 3B Instruct | `meta-llama/llama-3.2-3b-instruct` | 32.7% |
| 45 | Meta: Llama 3.2 1B Instruct | `meta-llama/llama-3.2-1b-instruct` | 22.4% |
| 46 | Z.ai: GLM 4.6V | `z-ai/glm-4.6v` | 18.8% |
| 47 | Z.ai: GLM 4.6 | `z-ai/glm-4.6` | 16.9% |
| 48 | DeepSeek: R1 0528 | `deepseek/deepseek-r1-0528` | 15.9% |
| 49 | MoonshotAI: Kimi K2 Thinking | `moonshotai/kimi-k2-thinking` | 14.4% |
| 50 | Z.ai: GLM 4.7 | `z-ai/glm-4.7` | 13.2% |
| 51 | MiniMax: MiniMax M1 | `minimax/minimax-m1` | 12.3% |
| 52 | Z.ai: GLM 5 | `z-ai/glm-5` | 11.7% |
| 53 | DeepSeek: DeepSeek V3.2 Speciale | `deepseek/deepseek-v3.2-speciale` | 8.5% |
| 54 | MoonshotAI: Kimi K2.5 | `moonshotai/kimi-k2.5` | 7.9% |
| 55 | MiniMax: MiniMax M2.5 | `minimax/minimax-m2.5` | 7.6% |
| 56 | MiniMax: MiniMax M2.1 | `minimax/minimax-m2.1` | 5.9% |
| 57 | Google: Gemini 3 Pro Preview | `google/gemini-3-pro-preview` | 5.7% |
| 58 | Google: Gemini 3.1 Pro Preview | `google/gemini-3.1-pro-preview` | 5.0% |
| 59 | MiniMax: MiniMax M2 | `minimax/minimax-m2` | 4.2% |
| 60 | Z.ai: GLM 4.7 Flash | `z-ai/glm-4.7-flash` | 3.5% |
| 61 | Qwen: Qwen3.5-27B | `qwen/qwen3.5-27b` | 3.0% |
Research Methodology
Each model is evaluated on 30 tasks in Luau-specific environments. Scores are generated through automated testing, with three replicates per task to ensure reproducibility.
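The page does not publish the exact aggregation formula behind the Weighted Score column, but one plausible reading of "30 tasks, 3x replicates" can be sketched as follows. The mean-of-replicates step, the per-task weights, and the function name `weighted_score` are all assumptions for illustration, not the benchmark's documented method.

```python
# Hypothetical aggregation sketch: the benchmark's real weighting scheme is
# not published on this page. Task weights and replicate averaging here are
# assumptions, not the official protocol.

def weighted_score(task_results: dict[str, list[float]],
                   task_weights: dict[str, float]) -> float:
    """Average each task's replicate scores (e.g. 3 runs scored in [0, 1]),
    then take the weight-normalized mean across tasks, as a percentage."""
    total_weight = sum(task_weights[task] for task in task_results)
    acc = 0.0
    for task, replicates in task_results.items():
        per_task = sum(replicates) / len(replicates)  # mean over 3x replicates
        acc += task_weights[task] * per_task
    return 100.0 * acc / total_weight  # report as a percentage

# Example: two hypothetical tasks, three replicates each, equal weights.
results = {"table-fmt": [1.0, 1.0, 0.0], "iter-loop": [1.0, 1.0, 1.0]}
weights = {"table-fmt": 1.0, "iter-loop": 1.0}
print(round(weighted_score(results, weights), 1))  # → 83.3
```

Averaging replicates before weighting means a single flaky run moves a task's contribution by at most a third of its weight, which is one way the 3x replication could serve the stated reproducibility goal.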
Open Infrastructure
All prompts, model parameters, and raw logs are publicly auditable. Submit new models or tasks via GitHub.
GitHub Repository