OpenClaw Gateway: troubleshooting the OpenAI-compatible HTTP API (chat completions, models, CLI auth)

Shipping an OpenAI-compatible HTTP surface on the OpenClaw Gateway means two planes must agree: the REST paths your IDE or agent calls (/v1/models, /v1/chat/completions) and the CLI control channel that proves RPC, auth, and listener health. When only one side looks green, operators chase ghosts: streaming clients time out while the models list succeeds, or curl returns 401 while a dashboard still shows "listening". This runbook maps the HTTP routes to CLI checks, forces openclaw gateway status --require-rpc into the acceptance path, and documents reproducible steps for streaming read stalls versus hard 401s on 2026.5.x. For tailnet-only Gateways where HTTPS and token headers have already caused confusion, pair this with OpenClaw on Docker and Tailscale with zero public exposure and token-authenticated Gateway paths. When the same host also runs heavy Xcode workloads, see CLT-only vs full Xcode.app on remote bare-metal Apple Silicon (footprint and drift) so inference spikes do not starve the Gateway process.

1. Split the smoke tests: /v1/models then /v1/chat/completions

Start with GET /v1/models using the same base URL, TLS profile, and reverse-proxy hop as production clients. It is cheap, usually non-streaming, and isolates listener reachability from long-lived SSE or chunked bodies. Only after /v1/models returns 200 with a sane payload should you exercise POST /v1/chat/completions with stream: true. If the models call fails while the CLI prints listeners, suspect path rewriting, missing /v1 prefixes, or a second virtual host stealing traffic, not model weights.
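The two-step ladder can be sketched as a pair of shell helpers. The base URL and token below are placeholders, not real OpenClaw defaults, and each helper prints its curl invocation as a dry run; drop the leading echo to execute against a live Gateway.

```shell
#!/bin/sh
# Smoke-test ladder sketch: /v1/models first, /v1/chat/completions second.
# BASE and TOKEN are hypothetical placeholders for your deployment.
BASE="${BASE:-https://gateway.example.internal:8443}"
TOKEN="${TOKEN:-REPLACE_ME}"

probe_models() {
  # Step 1: cheap, non-streaming listener check.
  echo curl -sS -H "Authorization: Bearer $TOKEN" "$BASE/v1/models"
}

probe_chat_stream() {
  # Step 2: run only after /v1/models returned 200; exercises the SSE/chunked path.
  # -N disables curl's output buffering so chunks surface as they arrive.
  echo curl -sS -N -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"model":"default","stream":true,"messages":[{"role":"user","content":"ping"}]}' \
    "$BASE/v1/chat/completions"
}

probe_models
probe_chat_stream
```

Keeping the two probes as separate commands makes it obvious in a ticket which rung of the ladder failed first.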

Map this HTTP ladder to the CLI channel: the same operator identity that can run openclaw gateway status should be able to hit the HTTP base URL with curl from that host, proving loopback versus advertised hostname parity. When CLI status is green but HTTP fails off-box, you have a bind or ingress mismatch; when both fail together, chase process health and auth files first.

Rule: never declare the HTTP API "up" from models alone when streaming is in scope.

2. Align Authorization headers with what the Gateway actually validates

OpenAI-compatible clients typically send Authorization: Bearer …. Confirm the exact prefix your Gateway build expects; some deployments also honour auxiliary headers or query tokens for legacy bridges. Capture three failing examples in the ticket: missing header, wrong scheme (Token vs Bearer), and a correct header with a clock-skewed JWT. Compare against the secret file or vault reference the CLI uses so you are not rotating the wrong slot. After any change, hit /v1/models twice, once with and once without the header, to prove enforcement moved monotonically.
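The three failing examples plus the correct one can be generated mechanically so the ticket evidence is consistent. Everything below (base URL, token values) is a placeholder for illustration, and the helper prints dry-run curl lines rather than executing them.

```shell
#!/bin/sh
# Auth-variant matrix sketch: one line per Authorization header case.
BASE="${BASE:-https://gateway.example.internal:8443}"   # placeholder
GOOD="REPLACE_ME"                                       # placeholder token

auth_case() {
  # $1 = case label, $2 = header value ("" means send no header at all)
  if [ -n "$2" ]; then
    echo "[$1] curl -sS -o /dev/null -w '%{http_code}' -H 'Authorization: $2' $BASE/v1/models"
  else
    echo "[$1] curl -sS -o /dev/null -w '%{http_code}' $BASE/v1/models"
  fi
}

auth_case missing-header ""
auth_case wrong-scheme   "Token $GOOD"
auth_case stale-jwt      "Bearer EXPIRED_JWT_SAMPLE"
auth_case correct        "Bearer $GOOD"
```

Using curl's -w '%{http_code}' keeps the evidence to a single status code per row, which makes the before/after comparison after a rotation unambiguous.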

3. Cross-check with openclaw gateway status --require-rpc

Plain status can show process-local happiness while RPC dependencies are wedged. --require-rpc forces the same dependency graph your HTTP handlers touch (policy, plugin registry, upstream model router) to answer before you trust the surface. Run it immediately after deploy, after TLS renewals, and after editing openclaw.json. If status passes but HTTP still returns 401, bisect ingress headers (stripped Authorization at the CDN) before touching tokens again. Verify that:

  • Status output lists the advertised public base URL you curl against.
  • RPC-required checks finish within your SLO window, not only after manual warm-up.
  • Inner health loops use the same bearer material as external clients.
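Wiring --require-rpc into the acceptance path is easiest as a generic pass/fail gate. The gate function below is a plain wrapper of my own; the openclaw and curl invocations in the comment are the commands this article assumes, shown with placeholder variables.

```shell
#!/bin/sh
# Release-gate sketch: refuse to shift traffic unless every check passes.
gate() {
  # $1 = label, remaining args = the check command to run.
  label="$1"; shift
  if "$@"; then
    echo "PASS $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

# In a deploy script you would chain, e.g.:
#   gate rpc-status   openclaw gateway status --require-rpc &&
#   gate http-models  curl -fsS -H "Authorization: Bearer $TOKEN" "$BASE/v1/models"
# so a wedged RPC dependency blocks the rollout before HTTP traffic moves.
```

Chaining the gates with && means the HTTP probe never runs against a Gateway whose RPC plane already failed, which keeps the failure signal unambiguous.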

4. Reproducible streaming timeouts

Timeouts during streaming are often idle-read issues: intermediaries close quiet TCP sessions while tokens still generate. Reproduce with a fixed prompt that yields long gaps, log timestamps for first byte versus chunk n, and compare direct-to-Gateway curls with proxied paths. Raise idle ceilings on the proxy, enable HTTP/1.1 chunk flush hints where supported, and avoid running the Gateway on a host whose unified memory is already saturated by parallel Xcode indexing: swap pressure delays token flushes and looks like a network fault.
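Once you have per-chunk timestamps, the diagnostic question is simply "what was the longest quiet gap, and is it near the proxy's idle timeout?". A small awk filter answers that; the timestamp format (one epoch-seconds value per line) is an assumption about how you logged the stream.

```shell
#!/bin/sh
# Idle-gap analysis sketch: read one epoch timestamp per received chunk on
# stdin and print the largest gap between consecutive chunks, in seconds.
max_gap() {
  awk 'NR > 1 { gap = $1 - prev; if (gap > max) max = gap }
       { prev = $1 }
       END { print max + 0 }'
}

# Example: chunks arriving at t=0, 1, 9, 10 have a worst-case gap of 8s.
printf '%s\n' 0 1 9 10 | max_gap
```

If the reported gap sits just under the proxy's idle ceiling on direct curls but the proxied path dies, the intermediary's timeout is the prime suspect rather than the model.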

5. Reproducible 401 matrix

Build a tiny table: anonymous models call (expect deny), valid bearer models call, valid bearer chat non-stream, valid bearer chat stream. Attach raw response bodies. If only streaming fails with 401, suspect a different upstream route or middleware that re-authenticates mid-body. If everything fails after a deploy, diff gateway.auth and confirm hot reload actually picked up the new file inode โ€” editors that swap temp files can leave listeners on stale secrets.
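The four rows can be emitted as a reproducible script so every incident attaches the same matrix. The endpoint paths come from this article; BASE, TOKEN, and the abbreviated JSON bodies are placeholders, and the output is a dry-run command table rather than live requests.

```shell
#!/bin/sh
# 401-matrix sketch: one labelled dry-run curl per row, with the expected code.
BASE="${BASE:-https://gateway.example.internal:8443}"   # placeholder

row() { printf '%-20s expect:%-4s %s\n' "$1" "$2" "$3"; }

matrix() {
  row anonymous-models   401 "curl -sS $BASE/v1/models"
  row bearer-models      200 "curl -sS -H 'Authorization: Bearer \$TOKEN' $BASE/v1/models"
  row bearer-chat        200 "curl -sS -H 'Authorization: Bearer \$TOKEN' -d '{\"stream\":false, ...}' $BASE/v1/chat/completions"
  row bearer-chat-stream 200 "curl -sS -N -H 'Authorization: Bearer \$TOKEN' -d '{\"stream\":true, ...}' $BASE/v1/chat/completions"
}

matrix
```

Running the same generated table before and after a deploy turns "auth seems broken" into a concrete diff of which row flipped.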

Keep one row that uses the exact client library your app ships (for example the same SDK version and default timeout), not only hand-crafted curl: some stacks strip hop-by-hop headers or retry without Authorization on the second attempt, which reproduces flaky 401 patterns that raw curls never show.

6. 2026.5.x ops notes and a remote high-memory Mac layout

On 2026.5.x, pin one Node major per fleet, keep plugin installs out of the Gateway user's home directory, and snapshot doctor output before upgrades. For a remote high-memory Mac that stays resident: bind the Gateway to a stable tailnet or loopback-plus-ingress pair, cap concurrent streaming jobs so RAM pressure cannot delay TLS or token refresh RPCs, and park long agent sessions on a second machine when possible. Treat --require-rpc status as a release gate alongside HTTP probes so regressions are caught before traffic shifts.
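"Pin one Node major per fleet" is easy to enforce mechanically before an upgrade. The helper below compares a node --version string against a pinned major; the pin-file path in the comment and the sample versions are assumptions for illustration.

```shell
#!/bin/sh
# Node-major pin check sketch for pre-upgrade runs.
node_major() {
  # Strip the leading "v" and everything after the first dot: "v22.11.0" -> "22".
  printf '%s\n' "$1" | sed 's/^v//; s/\..*//'
}

check_pin() {
  # $1 = pinned major, $2 = full version string from `node --version`.
  pinned="$1"; actual="$2"
  if [ "$(node_major "$actual")" = "$pinned" ]; then
    echo "OK node major $pinned"
  else
    echo "DRIFT want=$pinned got=$(node_major "$actual")"
    return 1
  fi
}

# In production, e.g.:
#   check_pin "$(cat /etc/openclaw/node-major.pin)" "$(node --version)"
check_pin 22 v22.11.0
```

Failing this check in the same release gate as the --require-rpc probe keeps a drifted runtime from reaching traffic alongside an otherwise green deploy.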

Why Mac mini-class hardware fits this Gateway lane

OpenAI-compatible Gateways need predictable memory bandwidth, quiet thermals, and trustworthy NVMe more than headline core counts. A Mac mini on Apple Silicon pairs those traits with macOS primitives (launchd, code signing, Gatekeeper, SIP, and FileVault) that reduce risk for always-on listeners compared with ad-hoc PC stacks. Idle power stays low for 24/7 HTTP and RPC endpoints, while unified memory keeps concurrent streaming handlers responsive when large-context agent jobs contend for RAM. If you want a high-memory remote anchor without rack procurement, review the Macstripe home page to match region and RAM to your footprint; Mac mini M4 remains a practical baseline for the pinned, audited layout this article describes, and it keeps total cost of ownership sensible next to bespoke workstations.

If you want this HTTP plus --require-rpc workflow on hardware that stays out of your way, Mac mini M4 is a strong next step: explore options on the Macstripe home page and line up capacity before your next auth-plane change.