## Overview
This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.
If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.
See also:
- `libllama` API
## Recent API changes (most recent at the top)
| version | PR | desc |
| --- | --- | --- |
| TBD. | #17668 | Default model name removed; `"model"` value from request is no longer reflected back |
| TBD. | #17470 | Add model load / unload endpoints |
| TBD. | #17524 | Enable `--jinja` by default |
| TBD. | #16943 | Add `"model_alias"` to `/props` endpoint |
| b6890 | #16818 | `/metrics`: Rename `llamacpp:n_past_max` -> `llamacpp:n_tokens_max` |
| b6523 | #16109 | In stream mode, error events are now OAI-compatible |
| b6508 | #16052 | Include usage statistics only when `stream_options.include_usage` is specified |
| b6399 | #15827 | Added `return_progress` and `timings.cache_n` |
| b6243 | #15108 | Add multimodal support to `completions` and `embeddings` endpoints |
| b6205 | #15416 | Disable context shift by default |
| b5441 | #13660 | Remove `/metrics` fields related to KV cache tokens and cells |
| b5223 | #13174 | For chat completion, if the last message is from the assistant, it is treated as a prefilled message |
| b4599 | #9639 | `/v1/chat/completions` now supports `tools` & `tool_choice` |
| TBD. | #10974 | `/v1/completions` is now OAI-compat |
| TBD. | #10783 | `logprobs` is now OAI-compat, defaults to pre-sampling probs |
| TBD. | #10861 | `/embeddings` supports pooling type `none` |
| TBD. | #10853 | Add optional `"tokens"` output to `/completions` endpoint |
| b4337 | #10803 | Remove `penalize_nl` |
| b4265 | #10626 | CPU docker images working directory changed to `/app` |
| b4285 | #10691 | (Again) Change `/slots` and `/props` responses |
| b4283 | #10704 | Change `/slots` and `/props` responses |
| b4027 | #10162 | `/slots` endpoint: remove `slot[i].state`, add `slot[i].is_processing` |
| b3912 | #9865 | Add option to time limit the generation phase |
| b3911 | #9860 | Remove self-extend support |
| b3910 | #9857 | Remove legacy system prompt support |
| b3897 | #9776 | Change default security settings: `/slots` is now disabled by default, and endpoints now check for the API key if it's set |
| b3887 | #9510 | Add `/rerank` endpoint |
| b3754 | #9459 | Add `[DONE]\n\n` in OAI stream response to match spec |
| b3721 | #9398 | Add `seed_cur` to completion response |
| b3683 | #9308 | Environment variable updated |
| b3599 | #9056 | Change `/health` and `/slots` |
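Several of the streaming-related entries above fit together: #9459 added the `[DONE]\n\n` terminator to match the OpenAI spec, and #16052 made usage statistics appear only when `stream_options.include_usage` is set. As a rough sketch of what a client-side parser of that stream shape looks like — the sample body below is hand-written for illustration, not captured from a real server:

```python
import json

def parse_oai_stream(raw: str):
    """Parse an OpenAI-style SSE body: 'data: {...}' lines ending with 'data: [DONE]'."""
    content, usage = [], None
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # stream terminator added in #9459
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):   # only sent when stream_options.include_usage is set (#16052)
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if "content" in delta:
                content.append(delta["content"])
    return "".join(content), usage

# Illustrative sample body (not real server output):
sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: {"choices":[],"usage":{"prompt_tokens":3,"completion_tokens":2}}\n'
    'data: [DONE]\n'
)
text, usage = parse_oai_stream(sample)
```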
For older changes, use:
```shell
git log --oneline -p b3599 -- examples/server/README.md
```
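The b4599 entry (#9639) added OpenAI-style tool calling to `/v1/chat/completions`. A minimal sketch of a request body using the `tools` and `tool_choice` fields — the model name and the `get_weather` function are placeholders for illustration, not anything shipped with llama-server:

```python
import json

# OpenAI-style "tools" / "tool_choice" fields, supported since #9639.
request_body = {
    "model": "my-model",  # placeholder; per #17668 the server no longer reflects this back
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for illustration only
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
body = json.dumps(request_body)
```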
## Upcoming API changes