Mixtral

Mistral's sparse mixture-of-experts LLM with GPT-3.5-level performance at a fraction of the compute

Mixtral 8x7B is a sparse mixture-of-experts language model developed by Mistral AI that matches or exceeds GPT-3.5 on most standard benchmarks while using only a fraction of the inference compute of a comparably capable dense model. Each layer contains 8 expert feed-forward networks, of which a router selects 2 for every token, so the model holds 46.7B parameters in total but uses only about 12.9B of them per token at inference time. Mixtral outperforms Llama 2 70B on most benchmarks while running significantly faster. The weights are openly released under the Apache 2.0 license, making the model free for commercial use and self-hosting. Its release in December 2023 set a new benchmark for open-weight model efficiency.
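
The sketch below illustrates, in simplified PyTorch, how top-2 expert routing of this kind works. It is a minimal illustration, not Mistral's implementation: the layer sizes, expert structure, and dispatch loop are illustrative assumptions, but the routing pattern (score all experts, run only the top 2 per token, mix their outputs by normalized router weights) is the mechanism described above.

```python
# Minimal sketch of a sparse mixture-of-experts layer with top-2 routing.
# Names and sizes are illustrative; Mixtral's real experts and dispatch differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, hidden_dim: int, ffn_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: one score per expert for each token.
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, ffn_dim), nn.SiLU(), nn.Linear(ffn_dim, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim)
        scores = self.router(x)                              # (tokens, experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)     # pick the 2 best experts per token
        top_w = F.softmax(top_w, dim=-1)                     # normalize their mixing weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                              # which tokens routed to this expert
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += top_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

# Every token touches only 2 of the 8 expert FFNs per layer, which is why a
# 46.7B-parameter model computes roughly like a ~13B dense model at inference.
x = torch.randn(4, 64)
layer = SparseMoELayer(hidden_dim=64, ffn_dim=256)
print(layer(x).shape)  # torch.Size([4, 64])
```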

Key Features

  • Mixture-of-experts architecture
  • GPT-3.5-level performance
  • Open Apache 2.0 license
  • Fast inference
  • Self-hostable
  • Strong code generation
#llm #open-source #mistral #mixture-of-experts #self-hosted

Get Started

Visit Mixtral
Free
Completely free to use
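
Because the weights are Apache 2.0 licensed, Mixtral can be self-hosted rather than accessed only through an API. Below is a hedged sketch of loading the instruct variant with Hugging Face Transformers: the model id is the published mistralai/Mixtral-8x7B-Instruct-v0.1 checkpoint, while the dtype, device mapping, and prompt are illustrative choices. In fp16 the full model needs roughly 90 GB of GPU memory, so quantized builds are common for smaller setups.

```python
# Sketch of self-hosting Mixtral with Hugging Face Transformers.
# dtype, device mapping, and the prompt are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # quantized loading (e.g. 4-bit) fits smaller GPUs
    device_map="auto",           # shard across available GPUs
)

messages = [{"role": "user", "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```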

Quick Info

Category: Code & Development
Pricing: Free
