## Overview
This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.
If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.
See also:
- `libllama` API
## Recent API changes (most recent at the top)
| version | PR | desc |
| --- | --- | --- |
| TBD. | #17668 | Default model name removed; `"model"` value from request is no longer reflected back |
| TBD. | #17470 | Add model load / unload endpoints |
| TBD. | #17524 | Enable `--jinja` by default |
| TBD. | #16943 | Add `"model_alias"` to `/props` endpoint |
| b6890 | #16818 | `/metrics`: Rename `llamacpp:n_past_max` -> `llamacpp:n_tokens_max` |
| b6523 | #16109 | In stream mode, error events are now OAI-compatible |
| b6508 | #16052 | Include usage statistics only when `stream_options.include_usage` is specified |
| b6399 | #15827 | Added `return_progress` and `timings.cache_n` |
| b6243 | #15108 | Add multimodal support to `completions` and `embeddings` endpoints |
| b6205 | #15416 | Disable context shift by default |
| b5441 | #13660 | Remove `/metrics` fields related to KV cache tokens and cells |
| b5223 | #13174 | For chat completion, if the last message is from the assistant, it is treated as a prefilled message |
| b4599 | #9639 | `/v1/chat/completions` now supports `tools` & `tool_choice` |
| TBD. | #10974 | `/v1/completions` is now OAI-compat |
| TBD. | #10783 | `logprobs` is now OAI-compat, defaults to pre-sampling probs |
| TBD. | #10861 | `/embeddings` supports pooling type `none` |
| TBD. | #10853 | Add optional `"tokens"` output to `/completions` endpoint |
| b4337 | #10803 | Remove `penalize_nl` |
| b4265 | #10626 | CPU docker images working directory changed to `/app` |
| b4285 | #10691 | (Again) Change `/slots` and `/props` responses |
| b4283 | #10704 | Change `/slots` and `/props` responses |
| b4027 | #10162 | `/slots` endpoint: remove `slot[i].state`, add `slot[i].is_processing` |
| b3912 | #9865 | Add option to time limit the generation phase |
| b3911 | #9860 | Remove self-extend support |
| b3910 | #9857 | Remove legacy system prompt support |
| b3897 | #9776 | Change default security settings: `/slots` is now disabled by default, and endpoints now check for the API key if it's set |
| b3887 | #9510 | Add `/rerank` endpoint |
| b3754 | #9459 | Add `[DONE]\n\n` in OAI stream response to match spec |
| b3721 | #9398 | Add `seed_cur` to completion response |
| b3683 | #9308 | Environment variable updated |
| b3599 | #9056 | Change `/health` and `/slots` |
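Several of the streaming-related entries above fit together: #9459 added the `[DONE]\n\n` terminator to match the OpenAI spec, and #16052 made usage statistics appear only when `stream_options.include_usage` is set. As a rough sketch of what a client-side parser of that stream shape looks like — the sample body below is hand-written for illustration, not captured from a real server:

```python
import json

def parse_oai_stream(raw: str):
    """Parse an OpenAI-style SSE body: 'data: {...}' lines ending with 'data: [DONE]'."""
    content, usage = [], None
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # stream terminator added in #9459
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):   # only sent when stream_options.include_usage is set (#16052)
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if "content" in delta:
                content.append(delta["content"])
    return "".join(content), usage

# Illustrative sample body (not real server output):
sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: {"choices":[],"usage":{"prompt_tokens":3,"completion_tokens":2}}\n'
    'data: [DONE]\n'
)
text, usage = parse_oai_stream(sample)
```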
For older changes, use:
```shell
git log --oneline -p b3599 -- examples/server/README.md
```
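The b4599 entry (#9639) added OpenAI-style tool calling to `/v1/chat/completions`. A minimal sketch of a request body using the `tools` and `tool_choice` fields — the model name and the `get_weather` function are placeholders for illustration, not anything shipped with llama-server:

```python
import json

# OpenAI-style "tools" / "tool_choice" fields, supported since #9639.
request_body = {
    "model": "my-model",  # placeholder; per #17668 the server no longer reflects this back
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for illustration only
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
body = json.dumps(request_body)
```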
## Upcoming API changes