brew tap RayBytes/chatmock
brew install chatmock

pipx install chatmock

Download from releases (macOS & Windows)

See DOCKER.md

# 1. Sign in with your ChatGPT account
# If you are running this on a headless server, append --headless
chatmock login
# 2. Start the server
chatmock serve

The server runs at http://127.0.0.1:8000 by default. Use http://127.0.0.1:8000/v1 as your base URL for OpenAI-compatible apps.

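As a rough illustration of pointing a client at that base URL, the sketch below builds a chat request with only the Python standard library. It assumes the server exposes the conventional OpenAI `/chat/completions` route under `/v1` and accepts any placeholder API key; the helper names `build_chat_request` and `send` are hypothetical, and `gpt-5.4` is one of the model names listed in this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"  # default base URL stated above


def build_chat_request(prompt, model="gpt-5.4", base_url=BASE_URL):
    """Build (but do not send) a POST to the assumed /chat/completions route."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",  # assumption: standard OpenAI path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: a placeholder key is enough for a local server.
            "Authorization": "Bearer placeholder",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `send(build_chat_request("Hello"))` should return the parsed response body if the route assumption holds.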
Raycast Integration

- Configure the Host URL
  Open your Raycast Extensions preferences, navigate to the Ollama settings section, and enter the host URL (default is 127.0.0.1:8000).

- Sync Your Models
  Click the Sync Models button to register all available models.

- Start Chatting
  Open the Raycast AI Chat interface. The model slugs now appear there, ready to chat with.

Terax (Agentic Terminal) Integration

- Configure the provider settings
  Open your Terax settings and switch to the Models tab. Add a new provider (OpenAI Compatible), then enter the host URL (default is http://127.0.0.1:8000/v1) and the model IDs you wish to use. The API key may be anything.

- Favourite it and start using it!
  Go back to your main chat window, click the OpenAI Compatible icon, and select the model there. You can favourite it at this point to select it quickly the next time you switch between models.

`gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.3-codex-spark`

- Tool / function calling
- Vision / image input
- Thinking summaries (via think tags)
- Configurable thinking effort
- Fast mode for supported models
- Web search tool
- OpenAI-compatible `/v1/responses` (HTTP + WebSocket)
- Ollama-compatible endpoints
- Reasoning effort exposed as separate models (optional)
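
For clients that receive thinking summaries inline, a minimal sketch of separating the summary from the answer might look like the following. It assumes the reasoning arrives as a literal `<think>...</think>` block inside the message text, which this README does not spell out; `split_thinking` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is wrapped in a literal <think>...</think> block.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content):
    """Return (thinking, answer) from a message that may contain think tags."""
    thoughts = [match.strip() for match in THINK_PATTERN.findall(content)]
    answer = THINK_PATTERN.sub("", content).strip()
    return "\n".join(thoughts), answer
```

A message without any think tags passes through unchanged as the answer, with an empty string for the thinking part.
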
All flags go after chatmock serve. These can also be set as environment variables.
| Flag | Env var | Options | Default | Description |
|---|---|---|---|---|
| `--reasoning-effort` | `CHATGPT_LOCAL_REASONING_EFFORT` | none, minimal, low, medium, high, xhigh | medium | How hard the model thinks |
| `--reasoning-summary` | `CHATGPT_LOCAL_REASONING_SUMMARY` | auto, concise, detailed, none | auto | Thinking summary verbosity |
| `--reasoning-compat` | `CHATGPT_LOCAL_REASONING_COMPAT` | legacy, o3, think-tags | think-tags | How reasoning is returned to the client |
| `--fast-mode` | `CHATGPT_LOCAL_FAST_MODE` | true/false | false | Priority processing for supported models |
| `--enable-web-search` | `CHATGPT_LOCAL_ENABLE_WEB_SEARCH` | true/false | false | Allow the model to search the web |
| `--expose-reasoning-models` | `CHATGPT_LOCAL_EXPOSE_REASONING_MODELS` | true/false | false | List each reasoning level as its own model |

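
The environment-variable route can be sketched in Python as below: the settings are checked against the allowed values from the table and then passed to `chatmock serve` through the process environment. The helper names `build_env` and `start_server` are hypothetical, and the sketch assumes `chatmock` is on your `PATH`.

```python
import os
import subprocess

# Allowed values copied from the options table above.
OPTIONS = {
    "CHATGPT_LOCAL_REASONING_EFFORT": {"none", "minimal", "low", "medium", "high", "xhigh"},
    "CHATGPT_LOCAL_REASONING_SUMMARY": {"auto", "concise", "detailed", "none"},
    "CHATGPT_LOCAL_REASONING_COMPAT": {"legacy", "o3", "think-tags"},
    "CHATGPT_LOCAL_FAST_MODE": {"true", "false"},
    "CHATGPT_LOCAL_ENABLE_WEB_SEARCH": {"true", "false"},
    "CHATGPT_LOCAL_EXPOSE_REASONING_MODELS": {"true", "false"},
}


def build_env(settings, base_env=None):
    """Validate settings against the table and merge them into an environment."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in settings.items():
        if name not in OPTIONS:
            raise KeyError(f"unknown setting: {name}")
        if value not in OPTIONS[name]:
            raise ValueError(f"{name} must be one of {sorted(OPTIONS[name])}")
        env[name] = value
    return env


def start_server(settings):
    """Launch `chatmock serve` with the validated environment (hypothetical helper)."""
    return subprocess.Popen(["chatmock", "serve"], env=build_env(settings))
```

For example, `start_server({"CHATGPT_LOCAL_REASONING_EFFORT": "high"})` would start the server with a higher thinking effort, while a value outside the table raises `ValueError` before anything is launched.
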
Web search in a request

{
  "model": "gpt-5.4",
  "messages": [{"role": "user", "content": "latest news on ..."}],
  "responses_tools": [{"type": "web_search"}],
  "responses_tool_choice": "auto"
}

Fast mode in a request

{
  "model": "gpt-5.4",
  "input": "summarize this",
  "fast_mode": true
}

Use responsibly and at your own risk. This project is not affiliated with OpenAI.
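
The two request bodies shown in this README can also be built programmatically. The sketch below only constructs the JSON payloads; the function names are hypothetical. Which route each body is sent to is an assumption: the `messages` form presumably goes to a chat completions route, and the `input` form to `/v1/responses`.

```python
import json


def web_search_payload(prompt, model="gpt-5.4"):
    """Chat-style body that lets the model decide when to search the web."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "responses_tools": [{"type": "web_search"}],
        "responses_tool_choice": "auto",
    }


def fast_mode_payload(text, model="gpt-5.4"):
    """Responses-style body that requests fast mode for a single call."""
    return {"model": model, "input": text, "fast_mode": True}


def to_json(payload):
    """Serialize a payload to the JSON text sent as the request body."""
    return json.dumps(payload)
```

Note that Python's `True` serializes to JSON `true`, matching the fast mode example.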