Recommended models

Currently, we recommend using the following LLMs (both models were tested with unctl and demonstrated very good results):

  • codellama - Code Llama is a model for generating and discussing code, built on top of Llama 2. It’s designed to make workflows faster and more efficient for developers and to make it easier for people to learn how to code. It can generate both code and natural language about code. Code Llama supports many of the most popular programming languages used today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more.

  • mixtral - The Mixtral-8x7B Large Language Model (LLM) is a pre-trained generative Sparse Mixture of Experts. It outperforms Llama 2 70B on many benchmarks.

    As of December 2023, it is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs.

Model comparison

Notes (description of the table columns):

  • JSON valid: indicates whether the returned JSON was valid

    • Yes - valid, Python-parsable JSON

    • No - invalid, cannot be parsed

    • Yes/No - almost valid JSON that requires some additional normalization before it can be loaded

  • Required parsing: indicates whether the returned response needs additional parsing (see the sketch after this list)

    • Yes - the response contains valid JSON surrounded by leftover text that needs to be stripped away

    • No - the response contains valid JSON only

  • Avg. time: average time in seconds for a single completion response

  • Accuracy: indicates how accurate the response was and whether it contains all the required data

    • Bad - may return valid JSON that contains invalid k8s commands, text instructions, or redundant descriptions

    • Poor - the response may contain some valuable data, but in most cases fixes are missing and diagnostics are minimal

    • Good - contains valid k8s commands and a good summary; all pieces are in place

    • Very good - same as Good, plus the returned JSON parses without any modifications

  • Max. tokens: maximum context window size allowed by the model
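
For models marked "Required parsing: Yes" or "JSON valid: Yes/No", the completion needs a cleanup pass before it can be loaded. The sketch below illustrates one way to do that; the extract_json helper and the trailing-comma normalization rule are illustrative assumptions, not the exact logic unctl uses.

```python
import json
import re

def extract_json(completion: str) -> dict:
    """Pull the first JSON object out of a completion that may be wrapped
    in leftover text ("Required parsing: Yes") and apply a light
    normalization pass for almost-valid JSON ("JSON valid: Yes/No")."""
    # Strip everything outside the outermost braces.
    start, end = completion.find("{"), completion.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in completion")
    candidate = completion[start:end + 1]

    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Normalization example: drop trailing commas before a closing
        # brace/bracket, a common LLM formatting mistake.
        normalized = re.sub(r",\s*([}\]])", r"\1", candidate)
        return json.loads(normalized)

# Example: a response that wraps almost-valid JSON in leftover text.
raw = ('Sure! Here is the diagnosis:\n'
       '{"summary": "pod crash-looping", '
       '"fix": "kubectl rollout restart deploy/app",}\n'
       'Let me know if you need more.')
print(extract_json(raw))
```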

Model                  JSON valid   Required parsing   Avg. time       Accuracy    Max. tokens
llama2 7B              Yes          Yes                ~10 seconds     Poor        4K
llama2 13B             Yes          Yes                ~50 seconds     Bad         4K
codellama 7B           Yes          Yes                ~20 seconds     Poor        100K
codellama 13B          Yes          No                 ~50 seconds     Very good   100K
mistral instruct 7B    Yes/No       Yes                ~20 seconds     Poor        8K
mistral (default) 7B   Yes/No       Yes                ~20 seconds     Poor        8K
mixtral 45B            Yes/No       No                 ~4700 seconds   Good        32K

Considering these results, codellama >= 13B seems to be the best choice for local usage.

A note on Mixtral: overall a pretty good model that produces meaningful completions. However, it is not practical to run locally, as it requires at least 64GB of RAM and 2 GPUs. That limitation makes it a better fit for a "server-side" deployment: provision a larger server instance, deploy the model there using LocalAI, and expose the API to unctl.
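
LocalAI exposes an OpenAI-compatible REST API, so a quick smoke test of such a server-side deployment could look like the sketch below. The host, port, and model name are assumptions for illustration; substitute the values from your own deployment.

```python
import requests

# Hypothetical LocalAI endpoint; adjust host/port/model to your deployment.
LOCALAI_URL = "http://my-llm-server:8080/v1/chat/completions"

resp = requests.post(
    LOCALAI_URL,
    json={
        "model": "mixtral",  # the model name configured in LocalAI
        "messages": [
            {"role": "user",
             "content": "Diagnose a CrashLoopBackOff pod. Respond with JSON only."}
        ],
        "temperature": 0.1,
    },
    timeout=300,  # mixtral completions can be slow (see the table above)
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```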
