| {{{#!wiki style="margin:-0px -10px -5px" {{{#!folding [ Expand · Collapse ] {{{#!wiki style="margin:-5px -1px -11px; word-break:keep-all" | OpenAI | GPT (1/2/3/4/oss/5 · 6 in development) · (o1/o3/o4) |
| Google | Gemini (1/2/3) · Gemma · LaMDA · PaLM 2 |
| Anthropic | Claude (Opus/Sonnet/Haiku) |
| xAI | Grok |
| Meta | LLaMA · Muse Spark |
| Others | }}}}}}}}} |
| Muse Spark | |
| Release date | April 8, 2026 |
| Developer | Meta Superintelligence Labs |
| Function | Language model |
| Links | |
1. Overview
Muse Spark is a language model developed by Meta.
2. Products
2.1. Muse Spark
Released on April 8, 2026.
| <rowcolor=#ffffff> Category | Benchmark | Muse Spark Thinking | Opus 4.6 Max | Gemini 3.1 Pro High | GPT 5.4 Xhigh | Grok 4.2 Reasoning |
| MULTIMODAL | CharXiv Reasoning Figure Understanding | 86.4 | 65.3 (Self-Reported: 61.5) | 80.2 | 82.8 | 60.9 |
| | MMMU Pro Multimodal Understanding | 80.4 | 77.4 | 83.9 | 81.2 | 75.2 |
| | ERQA Embodied Reasoning | 64.7 | 51.6 | 69.4 | 65.4 | 54.1 |
| | SimpleVQA Visual Factuality | 71.3 | 62.2 | 72.4 | 61.1 | 57.4 |
| | ScreenSpot Pro Screenshot Localization (With Python) | 84.1 | 83.1 | 84.4 | 85.4 | — |
| | ZeroBench Multi-Step Visual Reasoning (pass@5, With Python) | 33.0 | — | 29.0 | 41.0 | — |
| TEXT/REASONING | Humanity’s Last Exam Multidisciplinary Reasoning (No Tools) | 42.8 | 40.0 | 45.4 (Self-Reported: 44.4) | 43.9 (Self-Reported: 39.8) | 31.6 |
| | Humanity’s Last Exam Multidisciplinary Reasoning (With Tools) | 50.4 | 53.1 | 51.4 | 52.1 | — |
| | ARC AGI 2 Abstract Reasoning Puzzles (Public) | 42.5 | 63.3 | 76.5 | 76.1 | 53.3 |
| | GPQA Diamond PhD Level Reasoning | 89.5 | 92.7 (Self-Reported: 91.3) | 94.3 | 92.8 | 88.5 |
| | LiveCodeBench Pro Competitive Coding | 80.0 | 70.7 | 82.9 (Self-Reported: 78.2) | 87.5 | 74.2 |
| HEALTH | HealthBench Hard Open-Ended Health Queries | 42.8 | 14.8 | 20.6 | 40.1 | 20.3 |
| | MedXpertQA (Text) Medical Multiple Choice | 52.6 | 52.1 | 71.5 | 59.6 | 50.2 |
| | MedXpertQA (MM) Medical Multiple Choice | 78.4 | 64.8 | 81.3 | 77.1 | 65.8 |
| AGENTIC | DeepSearchQA Agentic Search | 74.8 | 73.7 | 69.7 | 73.6 | 62.8 |
| | SWE-Bench Verified Agentic Coding | 77.4 | 80.8 | 80.6 | — | 76.7* |
| | SWE-Bench Pro Diverse Agentic Coding | 52.4 | 53.4 | 54.2 | 57.7 | 51.8* |
| | Terminal-Bench 2.0 Agentic Terminal Coding | 59.0 | 65.4 | 68.5 | 75.1 | 47.1* |
| | τ²-Bench Telecom Agentic Tool Use (Artificial Analysis) | 91.5 | 92.1 | 95.6 | 91.5 | 96.5 |
| | GDPval-AA Elo Office Tasks (Artificial Analysis) | 1444 | 1606 | 1320 | 1672 | 1055 |