Ollama LLM API Integration

Ollama provides a platform to run large language models (LLMs) locally and exposes API endpoints for interacting with those models. The API responds in JSON format and can also stream responses. Let's walk through Ollama LLM API integration.

Load the Model into Memory

By default, Ollama keeps a model in memory for only 5 minutes. Sending a generate request with just the model name (and no prompt) loads the model into memory:

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1"
}'
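
The keep_alive parameter also accepts a duration string, so the model can be kept loaded for a custom amount of time. A small variation of the request above, assuming the same llama3.1 model, keeps it in memory for 10 minutes:

# Keep the model loaded for 10 minutes instead of the default 5
curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "keep_alive": "10m"
}'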

To load the model into memory and keep it there indefinitely, set keep_alive to -1:

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "keep_alive": -1
}'

To unload the model from memory immediately, set keep_alive to 0:

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "keep_alive": 0
}'

Get a List of Models

curl --location 'http://localhost:11434/api/tags' \
--header 'Content-Type: application/json'
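
The response contains a models array; if only the names are needed, the output can be piped through jq (assuming jq is installed locally):

# List only the model names from the "models" array (requires jq)
curl -s --location 'http://localhost:11434/api/tags' | jq -r '.models[].name'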

Get Details of Running Models

curl --location 'http://localhost:11434/api/ps' \
--header 'Content-Type: application/json'

Get Model Details

curl --location 'http://localhost:11434/api/show' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1"
}'

Write the First Prompt

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?"
}'
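
By default, the response is streamed as newline-delimited JSON objects, each carrying a small piece of the answer in its response field. A minimal sketch for joining the chunks back into plain text, assuming jq is installed:

# Concatenate the "response" field of each streamed JSON chunk (requires jq)
curl -s --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?"
}' | jq -j '.response'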

Generate a Response Without Streaming

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "prompt": "Why is the sky blue?",
    "stream": false
}'

JSON Response Format

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "prompt": "Why is the sky blue? Respond using JSON",
    "stream": false,
    "format": "json"
}'

Temperature

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "prompt": "Why is the sky blue?",
    "stream": false,
    "options": {
        "temperature": 0
    }
}'

All Parameters

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "prompt": "Why is the sky blue?",
    "stream": false,
    "options": {
        "num_keep": 5,
        "seed": 42,
        "num_predict": 100,
        "top_k": 20,
        "top_p": 0.9,
        "min_p": 0.0,
        "tfs_z": 0.5,
        "typical_p": 0.7,
        "repeat_last_n": 33,
        "temperature": 0.8,
        "repeat_penalty": 1.2,
        "presence_penalty": 1.5,
        "frequency_penalty": 1.0,
        "mirostat": 1,
        "mirostat_tau": 0.8,
        "mirostat_eta": 0.6,
        "penalize_newline": true,
        "stop": [
            "\n",
            "user:"
        ],
        "numa": false,
        "num_ctx": 1024,
        "num_batch": 2,
        "num_gpu": 1,
        "main_gpu": 0,
        "low_vram": false,
        "f16_kv": true,
        "vocab_only": false,
        "use_mmap": true,
        "use_mlock": false,
        "num_thread": 8
    }
}'

Chat

curl --location 'http://localhost:11434/api/chat' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "stream": false,
    "messages": [
        {
            "role": "user",
            "content": "why is the sky blue?"
        }
    ]
}'
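
The chat endpoint is stateless, so multi-turn conversations are handled by sending the full message history, including earlier assistant replies, with every request. A minimal sketch of a follow-up turn (the assistant content here is just a placeholder for the previous reply):

curl --location 'http://localhost:11434/api/chat' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "stream": false,
    "messages": [
        {
            "role": "user",
            "content": "why is the sky blue?"
        },
        {
            "role": "assistant",
            "content": "The sky appears blue because of Rayleigh scattering..."
        },
        {
            "role": "user",
            "content": "Explain it like I am five."
        }
    ]
}'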

Generate Embedding

curl --location 'http://localhost:11434/api/embed' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "input": "Why is the sky blue?"
}'
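
The input field also accepts an array of strings, which should return one vector per entry in the embeddings array of the response:

curl --location 'http://localhost:11434/api/embed' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "input": [
        "Why is the sky blue?",
        "Why is the grass green?"
    ]
}'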

Request Raw Prompt

Sometimes it is necessary to bypass the prompt template and provide a full, pre-formatted prompt. Ollama provides a raw parameter that disables the template:

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3.1",
    "prompt": "[INST] why is the sky blue? [/INST]",
    "raw": true,
    "stream": false
}'

Show the Modelfile of the LLM Model

docker exec -it ollama_container ollama show --modelfile llama3.1
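
The command above assumes Ollama is running in a Docker container named ollama_container; if Ollama is installed directly on the host, the same command works without docker exec:

ollama show --modelfile llama3.1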

Create a New Model from an Existing Model

curl --location 'http://localhost:11434/api/create' \
--header 'Content-Type: application/json' \
--data '{
  "name": "mario",
  "modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
}'

Now the new model can be used:

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
    "model": "mario",
    "prompt": "Why is the sky blue?",
    "stream": false
}'

Delete the Model

curl --location --request DELETE 'http://localhost:11434/api/delete' \
--header 'Content-Type: application/json' \
--data '{
  "name": "mario"
}'
