Recommended models
Recommended models for self-hosted usage (LocalAI, Ollama)
Currently, we recommend the following LLMs (these models were tested with unctl and demonstrated very good results):
codellama
- Code Llama is a model for generating and discussing code, built on top of Llama 2. It's designed to make workflows faster and more efficient for developers and to make it easier for people to learn how to code. It can generate both code and natural language about code. Code Llama supports many of the most popular programming languages used today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more.
mixtral
- The Mixtral-8x7B Large Language Model (LLM) is a pre-trained generative Sparse Mixture of Experts. It outperforms Llama 2 70B on many benchmarks. As of December 2023, it is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs.
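Both LocalAI and Ollama expose an OpenAI-compatible HTTP API, so a quick way to smoke-test a pulled model is to send it a single chat completion. The sketch below is an assumption-laden example, not part of unctl itself: it assumes LocalAI's default port 8080 (Ollama's OpenAI-compatible endpoint usually listens on port 11434) and a locally available model named codellama; adjust the base URL and model name to match your setup.

```python
import requests

# Assumptions: LocalAI serving an OpenAI-compatible API on its default port,
# and a model pulled/configured under the name "codellama".
BASE_URL = "http://localhost:8080/v1"
MODEL = "codellama"


def complete(prompt: str) -> str:
    """Send a single chat completion request to the locally served model."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(complete("Explain what a Kubernetes CrashLoopBackOff means."))
```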
Model comparison
Notes (table column descriptions):
JSON valid: indicates whether the returned JSON was valid
Yes - valid python-parsable JSON
No - invalid, cannot be parsed
Yes/No - overall almost a valid JSON, but requires some additional normalization before loading
Required parsing: indicates whether the returned response needs additional parsing
Yes - the response contains a valid JSON with leftover text that we need to strip away (see the parsing sketch after this list)
No - response contains a valid JSON only
Avg. time: average time in seconds for a single completion response
Accuracy: indicates how accurate the response was, i.e. whether it contains all the required data
Bad - might return a valid JSON that contains invalid k8s commands, text instructions, or redundant descriptions
Poor - response might contain some valuable data, but in most cases fixes are missing and diagnostics is pretty small
Good - contains valid k8s commands, a good summary, all pieces are in place
Very good - Good + returns a valid JSON which doesn't require any modifications to get parsed
Max. tokens: maximum context window size supported by the model
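To make the "JSON valid" and "Required parsing" columns concrete, the hypothetical helper below shows one way to strip leftover text around a JSON payload before loading it. The regex-based extraction and the diagnostics/fixes field names are illustrative assumptions, not unctl's actual parsing code.

```python
import json
import re


def extract_json(response: str) -> dict:
    """Best-effort extraction of a JSON object from an LLM response.

    Covers the "Required parsing: Yes" case, where the model wraps a valid
    JSON object in extra prose that has to be stripped before json.loads().
    """
    # Grab everything between the first '{' and the last '}' in the response.
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in the response")
    return json.loads(match.group(0))


# Example: a completion that mixes prose with the JSON payload.
raw = 'Sure! Here is the result:\n{"diagnostics": ["kubectl get pods"], "fixes": []}\nLet me know.'
print(extract_json(raw))
```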
| Model | JSON valid | Required parsing | Avg. time | Accuracy | Max. tokens |
| --- | --- | --- | --- | --- | --- |
| llama2 7B | Yes | Yes | ~10 seconds | Poor | 4K |
| llama2 13B | Yes | Yes | ~50 seconds | Bad | 4K |
| codellama 7B | Yes | Yes | ~20 seconds | Poor | 100K |
| codellama 13B | Yes | No | ~50 seconds | Very good | 100K |
| mistral instruct 7B | Yes/No | Yes | ~20 seconds | Poor | 8K |
| mistral (default) 7B | Yes/No | Yes | ~20 seconds | Poor | 8K |
| mixtral 45B | Yes/No | No | ~4700 seconds | Good | 32K |
Considering the results, codellama 13B (or larger) seems to be the best choice for local usage.
A note on Mixtral: overall a pretty good model that produces meaningful completions, but it is not really usable locally, since it requires at least 64 GB of RAM and 2 GPUs. That limitation makes it a good choice for a "server-side" deployment, where you provision a bigger server instance, deploy the model there with LocalAI, and expose the API to unctl.
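As a quick sanity check for such a deployment, the snippet below queries LocalAI's OpenAI-compatible /v1/models endpoint to confirm which models the remote server actually exposes before pointing unctl at it. The hostname is a hypothetical example; substitute your own server address.

```python
import requests

# Hypothetical remote LocalAI deployment; replace the host with your server.
REMOTE_API = "http://llm.internal.example.com:8080/v1"

# /v1/models is part of the OpenAI-compatible API surface, so listing it is a
# cheap way to confirm mixtral (or another model) is being served.
models = requests.get(f"{REMOTE_API}/models", timeout=10).json()
print([m["id"] for m in models.get("data", [])])
```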