When experimenting with base (pretrained, non-instruction-tuned) language models, the barrier to finding one can be high, since most models available today are instruction-tuned.
That's why I wanted to share a couple of quick, practical ways to run base models, both through hosted APIs and locally, so you can start experimenting without wasting time hunting for a model.
Here's a simple curl example using babbage-002, a base model OpenAI serves through its legacy Completions endpoint (replace sk-xxx with your own API key):
curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxx" \
-d '{
"model": "babbage-002",
"prompt": "I want to be a doctor when I",
"max_tokens": 5,
"temperature": 0.9
}'
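If you'd rather call the endpoint from Python than the shell, here's a minimal equivalent of the curl request using only the standard library. The payload mirrors the curl example exactly; the API key is read from the OPENAI_API_KEY environment variable, and the helper names (build_payload, complete) are just illustrative choices, not part of any SDK.

```python
# Minimal stdlib-only equivalent of the curl request above.
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/completions"

def build_payload(prompt: str, model: str = "babbage-002",
                  max_tokens: int = 5, temperature: float = 0.9) -> dict:
    """Assemble the JSON body for the Completions endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt: str) -> str:
    """POST the prompt and return the first completion's text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Because the model is a base model, complete("I want to be a doctor when I") simply continues the text rather than answering it as a chat message.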
If you'd rather avoid hosted APIs and run models locally, Ollama makes it surprisingly straightforward. You don't need to worry about model weights, tokenizers, or custom serving scripts; just install Ollama and run:
ollama run mistral:7b-text-v0.2-q4_0
Or try another base model:
ollama run qwen2.5:0.5b-base-q8_0
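Beyond the CLI, Ollama also exposes a local REST API, which is handy for scripting experiments against these base models. Here's a sketch against its /api/generate endpoint, assuming the Ollama server is running on its default port (11434); the helper names are my own. Setting "raw" to true sends the prompt verbatim, without a chat template, which is what you want for a base model.

```python
# Sketch: querying a base model through Ollama's local REST API.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str,
                  model: str = "mistral:7b-text-v0.2-q4_0") -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "raw": True,       # no chat template: feed the prompt verbatim
        "stream": False,   # return the whole completion in one response
    }

def complete(prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the completion."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

Swap the model argument for qwen2.5:0.5b-base-q8_0 (or any other tag you've pulled) to compare how different base models continue the same prompt.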