When experimenting with base or pretrained language models, the barrier to finding one can be high, since most readily available models are instruction-tuned.

That’s why I wanted to share a couple of quick, practical ways to run these models, both through hosted APIs and locally, so you can start experimenting without wasting time searching for one.

Using OpenAI’s API

Here’s a simple curl example using babbage-002:

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "babbage-002",
    "prompt": "I want to be a doctor when I",
    "max_tokens": 5,
    "temperature": 0.9
  }'
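
The completion text comes back inside the choices array of the JSON response. If you just want the generated text, you can pipe the response through jq (assuming you have it installed) to pull out the first completion:

curl -s https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "babbage-002",
    "prompt": "I want to be a doctor when I",
    "max_tokens": 5,
    "temperature": 0.9
  }' | jq -r '.choices[0].text'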

Running Local Models with Ollama

If you’d rather avoid APIs and run models locally, Ollama makes it surprisingly straightforward. You don’t need to worry about model weights, tokenizers, or custom serving scripts; just install Ollama and run:

ollama run mistral:7b-text-v0.2-q4_0

Or try another base model:

ollama run qwen2.5:0.5b-base-q8_0
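
Both commands drop you into an interactive prompt. If you’d rather call the model programmatically, Ollama also exposes a local HTTP API (on port 11434 by default). Here’s a minimal sketch against its generate endpoint; setting "raw" to true skips any prompt templating, which is what you want for a base model:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral:7b-text-v0.2-q4_0",
  "prompt": "I want to be a doctor when I",
  "raw": true,
  "stream": false,
  "options": {
    "num_predict": 5,
    "temperature": 0.9
  }
}'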
