Recommended models

Currently, we recommend using the following LLMs (both models were tested with unctl and demonstrated very good results):

  • codellama - Code Llama is a model for generating and discussing code, built on top of Llama 2. It’s designed to make workflows faster and more efficient for developers and to make it easier for people to learn how to code. It can generate both code and natural language about code. Code Llama supports many of the most popular programming languages used today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more.

  • mixtral - The Mixtral-8x7B Large Language Model (LLM) is a pre-trained generative Sparse Mixture of Experts. It outperforms Llama 2 70B on many benchmarks.

    As of December 2023, it is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs.

Model comparison

Notes (description of the table columns):

  • JSON valid: indicates whether the returned JSON was valid

    • Yes - valid, Python-parsable JSON

    • No - invalid, cannot be parsed

    • Yes/No - almost valid JSON that requires some additional normalization before it can be loaded

  • Required parsing: indicates whether the returned response needs additional parsing (see the sketch after this list)

    • Yes - the response contains valid JSON surrounded by leftover text that needs to be stripped away

    • No - the response contains valid JSON only

  • Avg. time: average time in seconds for a single completion response

  • Accuracy: indicates how accurate the response was and whether it contains all the required data

    • Bad - may return valid JSON that contains invalid k8s commands, text instructions, or redundant descriptions

    • Poor - the response may contain some valuable data, but in most cases fixes are missing and diagnostics are minimal

    • Good - contains valid k8s commands and a good summary; all pieces are in place

    • Very good - same as Good, plus the returned JSON parses without any modifications

  • Max. tokens: maximum context window size allowed by the model
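
For models marked "Required parsing: Yes" or "JSON valid: Yes/No", the completion needs a cleanup pass before it can be loaded. The sketch below illustrates one way to do that; the extract_json helper and the trailing-comma normalization rule are illustrative assumptions, not the exact logic unctl uses.

```python
import json
import re

def extract_json(completion: str) -> dict:
    """Pull the first JSON object out of a completion that may be wrapped
    in leftover text ("Required parsing: Yes") and apply a light
    normalization pass for almost-valid JSON ("JSON valid: Yes/No")."""
    # Strip everything outside the outermost braces.
    start, end = completion.find("{"), completion.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in completion")
    candidate = completion[start:end + 1]

    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Normalization example: drop trailing commas before a closing
        # brace/bracket, a common LLM formatting mistake.
        normalized = re.sub(r",\s*([}\]])", r"\1", candidate)
        return json.loads(normalized)

# Example: a response that wraps almost-valid JSON in leftover text.
raw = ('Sure! Here is the diagnosis:\n'
       '{"summary": "pod crash-looping", '
       '"fix": "kubectl rollout restart deploy/app",}\n'
       'Let me know if you need more.')
print(extract_json(raw))
```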

Model                  JSON valid   Required parsing   Avg. time       Accuracy    Max. tokens
llama2 7B              Yes          Yes                ~10 seconds     Poor        4K
llama2 13B             Yes          Yes                ~50 seconds     Bad         4K
codellama 7B           Yes          Yes                ~20 seconds     Poor        100K
codellama 13B          Yes          No                 ~50 seconds     Very good   100K
mistral instruct 7B    Yes/No       Yes                ~20 seconds     Poor        8K
mistral (default) 7B   Yes/No       Yes                ~20 seconds     Poor        8K
mixtral 45B            Yes/No       No                 ~4700 seconds   Good        32K

Considering these results, codellama >= 13B seems to be the best choice for local usage.

A note on Mixtral: overall a pretty good model that produces meaningful completions. However, it is not practical to run locally, as it requires at least 64GB of RAM and 2 GPUs. That limitation makes it a better fit for a "server-side" deployment: provision a larger server instance, deploy the model there using LocalAI, and expose the API to unctl.
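
LocalAI exposes an OpenAI-compatible REST API, so a quick smoke test of such a server-side deployment could look like the sketch below. The host, port, and model name are assumptions for illustration; substitute the values from your own deployment.

```python
import requests

# Hypothetical LocalAI endpoint; adjust host/port/model to your deployment.
LOCALAI_URL = "http://my-llm-server:8080/v1/chat/completions"

resp = requests.post(
    LOCALAI_URL,
    json={
        "model": "mixtral",  # the model name configured in LocalAI
        "messages": [
            {"role": "user",
             "content": "Diagnose a CrashLoopBackOff pod. Respond with JSON only."}
        ],
        "temperature": 0.1,
    },
    timeout=300,  # mixtral completions can be slow (see the table above)
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```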
