We’re currently in research preview! We’re excited to share our system with you, and we’d love to hear your feedback.
Why async?
Streaming responses are great when you want a human in the loop, but some workloads:
- Run for a long time (many tool calls, large searches, multi‑step plans).
- Must be durable (you care that they finish, not that you watched every token).
- Need to fan out results to other systems (pipelines, CRMs, warehouses, etc.).
To support these workloads, we provide:
- Async jobs – you submit work, we enqueue and process it in the background.
- Polling – you can check the status and result at any time.
- Webhooks – we can call back into your system when the job finishes.
High‑level architecture
At a high level, the async path looks like this:
- You call POST /v1/chat/completions/async with your usual model + messages.
- We create an async job and enqueue it into an internal SQS queue.
- A dedicated worker service pulls jobs from the queue and runs them against the engine.
- On completion we:
  - update the job status, and
  - optionally enqueue one or more webhook deliveries.
- You:
  - poll the status endpoint, or
  - receive a webhook at your configured callback URL.
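The flow above can be sketched as a small in-memory simulation. This is only an illustration of the job lifecycle, not the actual service: the `AsyncPipeline` class, the `engine` callable, and the tuple shape of a webhook delivery are all hypothetical, and Python's `queue.Queue` stands in for the SQS queue.

```python
import queue
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    request_id: str
    payload: dict
    status: str = "pending"
    result: Optional[str] = None
    error: Optional[str] = None
    callback_url: Optional[str] = None


class AsyncPipeline:
    """In-memory stand-in for the API, queue, and worker (illustrative only)."""

    def __init__(self, engine: Callable[[dict], str]):
        self.engine = engine                  # hypothetical: runs one completion
        self.jobs = {}                        # stands in for the job store
        self.job_queue = queue.Queue()        # stands in for the SQS queue
        self.webhook_queue = queue.Queue()    # pending webhook deliveries

    def submit(self, payload: dict, callback_url: Optional[str] = None) -> str:
        """Create the job, enqueue it, and return its request ID right away."""
        job = Job(str(uuid.uuid4()), payload, callback_url=callback_url)
        self.jobs[job.request_id] = job
        self.job_queue.put(job.request_id)
        return job.request_id

    def run_worker_once(self) -> None:
        """Pull one job, run it, update its status, and enqueue a webhook if asked."""
        job = self.jobs[self.job_queue.get_nowait()]
        job.status = "processing"
        try:
            job.result = self.engine(job.payload)
            job.status = "completed"
        except Exception as exc:
            job.error = str(exc)
            job.status = "failed"
        if job.callback_url:
            event = f"job.{job.status}"  # "job.completed" or "job.failed"
            self.webhook_queue.put((job.callback_url, event, job.request_id))

    def status(self, request_id: str) -> dict:
        """What a poller would see for this job."""
        job = self.jobs[request_id]
        return {"status": job.status, "result": job.result, "error": job.error}
```

The point of the sketch is the separation of concerns: `submit` returns as soon as the job is queued, and the caller learns the outcome only through `status` or the webhook queue.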
Async jobs & polling
The core entrypoint is:
POST /v1/chat/completions/async
This call will:
- return quickly with:
  - a requestId, status: "pending", and
  - an estimated completion time.
- do the work in the background.
To check on a job, poll:
GET /v1/requests/{requestId}/status
The response includes:
- status: pending | processing | completed | failed | cancelled
- jobId: internal job identifier
- model / engine
- result (when completed)
- error (when failed)
- timestamps (createdAt, updatedAt, startedAt, completedAt)
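A polling client for the status endpoint might look like the sketch below. It uses only the Python standard library. The base URL, the bearer-token header, and the interval and timeout defaults are assumptions for illustration. Only the endpoint path and the status values come from this page.

```python
import json
import time
import urllib.request
from typing import Callable

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def fetch_status(base_url: str, request_id: str, api_key: str) -> dict:
    """GET /v1/requests/{requestId}/status and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/v1/requests/{request_id}/status",
        headers={"Authorization": f"Bearer {api_key}"},  # auth scheme is an assumption
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def poll_until_done(
    get_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status until the job reaches a terminal status or we time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = get_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {body.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

A caller would wire the two together with something like `poll_until_done(lambda: fetch_status(base_url, request_id, api_key))`. Passing the fetcher in as a callable keeps the loop easy to test without a network.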
Webhooks
Polling is ideal when you have a single client watching a single job. For automation and integrations, it’s more natural for us to notify you when jobs finish. You have two options:
- Ephemeral webhooks – you pass a callbackUrl on the async request, and we call that URL once when the job finishes.
- Persistent subscriptions – you register webhook subscriptions for events like job.completed or job.failed, and we fan out to all matching subscribers.
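On the receiving side, a handler for these events could be sketched as follows. The payload shape is an assumption: this page does not specify it, so the field names `event`, `requestId`, `result`, and `error` are hypothetical, as are the `completed` and `failed` stores. The handler deduplicates on event type and request ID, since a receiver may see the same delivery more than once.

```python
import json

# In-memory stores for illustration; a real receiver would use durable storage.
seen_deliveries = set()
completed = {}
failed = {}


def handle_webhook(raw_body: bytes) -> tuple:
    """Return an (HTTP status, message) pair for one webhook delivery."""
    try:
        event = json.loads(raw_body)
        kind = str(event["event"])            # assumed field name
        request_id = str(event["requestId"])  # assumed field name
    except (ValueError, KeyError, TypeError):
        return 400, "malformed payload"

    key = (kind, request_id)
    if key in seen_deliveries:
        # Acknowledge duplicates so the sender does not keep retrying.
        return 200, "duplicate ignored"

    if kind == "job.completed":
        completed[request_id] = event.get("result")
    elif kind == "job.failed":
        failed[request_id] = event.get("error")
    else:
        return 200, "event ignored"

    seen_deliveries.add(key)
    return 200, "ok"
```

Keeping the handler a pure function of the request body makes it easy to mount behind any HTTP framework and to unit-test in isolation.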
Webhook deliveries:
- are queued and retried with exponential backoff,
- use timeouts to avoid hanging on bad receivers,
- eventually land in a dead-letter queue (DLQ) if they cannot be delivered,
- are stored in our database so you can inspect their status.
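To make the retry behaviour concrete, here is a minimal sketch of delivery with exponential backoff, a per-attempt timeout, and a dead-letter list. The attempt count, base delay, cap, and timeout are illustrative assumptions, not the service's actual settings, and `send` is a hypothetical callable that performs one HTTP attempt.

```python
import time
from typing import Callable


def backoff_delays(max_attempts: int, base_s: float = 1.0, cap_s: float = 60.0) -> list:
    """Delay before each retry: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** i) for i in range(max_attempts - 1)]


def deliver_with_retries(
    send: Callable[[dict, float], bool],
    dead_letters: list,
    delivery: dict,
    max_attempts: int = 5,
    timeout_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try send() up to max_attempts times; park the delivery in dead_letters on failure."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            if send(delivery, timeout_s):
                return True
        except Exception:
            pass  # treat errors and timeouts like any other failed attempt
        if attempt < len(delays):
            sleep(delays[attempt])  # no sleep after the final attempt
    dead_letters.append(delivery)
    return False
```

With the defaults above, a receiver that never responds is tried five times, with waits of 1, 2, 4, and 8 seconds between attempts, before the delivery is dead-lettered. Production systems often add random jitter to these delays so that many failing deliveries do not retry in lockstep.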
When to use what
Use sync + streaming when:
- a human is watching the response,
- you care more about interactivity than durability.
Use async + polling when:
- the job may take a while,
- you have a client that can easily poll (dashboard, CLI, cron job),
- you want a simple way to check status and surface errors.
Use async + webhooks when:
- you want to plug agents into other systems (HubSpot, Salesforce, workflow engines),
- you need a reliable, push‑based notification that work finished,
- you want a clear audit trail of deliveries and retries.
Related docs
- Quickstart – see the async polling example for a minimal curl flow.
- API Reference – POST /v1/chat/completions/async, GET /v1/requests/{requestId}/status, and the /v1/webhooks/* endpoints.
- Logs – how to inspect worker logs, queue age, and delivery errors.