Subconscious Cache - Subconscious Docs

Our TIMRUN inference runtime enables two features by default to improve agent performance and capability:

Auto Compaction -> compacts the message list at runtime.
Subconscious Cache -> keeps both the prefix and the suffix around pruned messages in the cache so they can be reused.

This guide shows how to disable auto compaction so that you can use the subconscious cache explicitly.

How It Works

To hit the subconscious cache, the cached tokens and the new input must satisfy two criteria:

The cached token chain can be split exactly into three consecutive sections A, B, C, where section B is the part that is pruned.
The new input chain can be split exactly into three consecutive sections A, C, D, where A and C match the cached prefix A and suffix C, and len(C) > threshold. We usually set threshold = 8 tokens so that the short suffixes shared by chat templates do not count as a match.
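The two criteria can be expressed as a check on token sequences. The sketch below is a brute-force illustration only, not TIMRUN's actual matcher: tokens are modeled as integer lists, and the function name find_subconscious_split is hypothetical. It prefers the longest shared prefix A, then the shortest pruned span B, and returns the section lengths, or None when only a plain prefix hit is possible.

```python
from typing import List, Optional, Tuple


def find_subconscious_split(
    cached: List[int], new: List[int], threshold: int = 8
) -> Optional[Tuple[int, int, int]]:
    """Return (len(A), len(B), len(C)) if cached = A + B + C and new = A + C + D.

    Hypothetical brute-force illustration of the two criteria, not TIMRUN's
    matcher. B must be non-empty and len(C) must exceed `threshold`.
    """
    # A can be no longer than the longest common prefix of the two chains.
    lcp = 0
    while lcp < min(len(cached), len(new)) and cached[lcp] == new[lcp]:
        lcp += 1

    # Prefer the longest prefix A, then the shortest pruned span B.
    for a in range(lcp, -1, -1):
        for b in range(1, len(cached) - a):
            c = cached[a + b:]  # C is everything after the pruned span B
            if len(c) <= threshold:
                break  # C only shrinks as B grows, so stop early
            if new[a:a + len(c)] == c:  # C must directly follow A in the new input
                return a, b, len(c)
    return None  # no pruned span with a long-enough suffix: at most a prefix hit


if __name__ == "__main__":
    cached = list(range(30))                     # A + B + C
    new = cached[:5] + cached[10:] + [100, 101]  # A + C + D, with tokens 5..9 pruned
    print(find_subconscious_split(cached, new))  # (5, 5, 20)

    no_pruning = cached + [100]                  # only extends the cached chain
    print(find_subconscious_split(cached, no_pruning))  # None
```

The second call returns None because nothing was pruned; in that case the request still benefits from the ordinary prefix cache.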

Manually Triggering Subconscious Cache

The Subconscious API enables auto compaction by default. In auto-compaction mode, you can send any message list to the API, and the inference system detects prunable messages on its own; pruning done this way hits the subconscious cache automatically. To hit the subconscious cache manually, by managing the context yourself instead of relying on auto compaction, set enable_auto_compaction to false in chat_template_kwargs. The chat_template_kwargs extension is accepted on both the Chat Completions and Messages endpoints.

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.subconscious.dev/v1",
)

response = client.chat.completions.create(
    model="subconscious/tim-qwen3.6-27b",
    messages=[{"role": "user", "content": "What is 127 * 849 + 3621?"}],
    extra_body={
        "chat_template_kwargs": {"enable_auto_compaction": False},
    },
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-api-key",
  baseURL: "https://api.subconscious.dev/v1",
});

const response = await client.chat.completions.create({
  model: "subconscious/tim-qwen3.6-27b",
  messages: [{ role: "user", content: "What is 127 * 849 + 3621?" }],
  // @ts-expect-error Subconscious extension
  chat_template_kwargs: { enable_auto_compaction: false },
});

console.log(response.choices[0].message.content);

from anthropic import Anthropic

client = Anthropic(
    auth_token="your-api-key",
    base_url="https://api.subconscious.dev",
)

message = client.messages.create(
    model="subconscious/tim-qwen3.6-27b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 127 * 849 + 3621?"}],
    extra_body={
        "chat_template_kwargs": {"enable_auto_compaction": False},
    },
)

print(message.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  authToken: "your-api-key",
  baseURL: "https://api.subconscious.dev",
});

const message = await client.messages.create({
  model: "subconscious/tim-qwen3.6-27b",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is 127 * 849 + 3621?" }],
  // @ts-expect-error Subconscious extension
  chat_template_kwargs: { enable_auto_compaction: false },
});

console.log(message.content[0].text);

curl https://api.subconscious.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "subconscious/tim-qwen3.6-27b",
    "messages": [{"role": "user", "content": "What is 127 * 849 + 3621?"}],
    "chat_template_kwargs": {"enable_auto_compaction": false}
  }'

curl https://api.subconscious.dev/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "subconscious/tim-qwen3.6-27b",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is 127 * 849 + 3621?"}],
    "chat_template_kwargs": {"enable_auto_compaction": false}
  }'
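The requests above only disable auto compaction. To actually reuse a cached prefix and suffix, the next request must drop exactly one contiguous span from the previous message list. The sketch below builds such a request body for the Chat Completions endpoint. The helper name build_pruned_request and the sample messages are hypothetical. It assumes that previous contains the last request's messages plus the assistant reply it produced, and that removing whole messages removes one contiguous token span once the chat template is applied; check that assumption against your model's chat template.

```python
import json
from typing import Any, Dict, List

Message = Dict[str, Any]


def build_pruned_request(
    previous: List[Message],
    prune_start: int,
    prune_end: int,
    new_messages: List[Message],
    model: str = "subconscious/tim-qwen3.6-27b",
) -> Dict[str, Any]:
    """Return a Chat Completions body that drops previous[prune_start:prune_end].

    Hypothetical helper. previous[:prune_start] plays the role of prefix A,
    previous[prune_end:] is suffix C, and new_messages are the new section D.
    """
    # B must be non-empty, and at least one message must remain after it to form C.
    if not 0 <= prune_start < prune_end < len(previous):
        raise ValueError("prune one non-empty span that leaves messages after it")
    kept = previous[:prune_start] + previous[prune_end:]
    return {
        "model": model,
        "messages": kept + new_messages,
        # Disable auto compaction so the runtime does not prune on its own.
        "chat_template_kwargs": {"enable_auto_compaction": False},
    }


if __name__ == "__main__":
    previous = [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Find why the test suite fails."},
        {"role": "assistant", "content": "Running the tests with verbose output."},
        {"role": "user", "content": "<long, now irrelevant test log>"},
        {"role": "assistant", "content": "The failure comes from an off-by-one error in the date parser."},
    ]
    # Drop messages 2 and 3 (the exploratory turn and its log) as one span.
    body = build_pruned_request(
        previous, 2, 4, [{"role": "user", "content": "Please write the fix."}]
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the JSON body in the curl example. With the OpenAI Python SDK, pass model and messages as arguments and chat_template_kwargs through extra_body, as in the first example.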

When to Turn Off Auto Compaction

If you turn off auto compaction, you must construct inputs that hit the subconscious cache yourself. Prune exactly one contiguous token sequence from the message list: if nothing is pruned, the new input simply hits the prefix cache, and if more than one chunk is pruned, no suffix can satisfy the subconscious cache criteria.

Use Auto Compaction for:

Programming tasks, where assistant-tool-user messages keep growing in the message list
Browser automation, where dead-end explorations are easy to prune
Workflow automation, where stale tool calls pile up quickly
Multi-turn conversations, where rigid context-pruning rules cannot handle arbitrary user inputs

Skip auto compaction for:

ReAct multimodal reasoning, where the subconscious cache works well if you keep only the latest turns or images in the message list
Other applications where you need careful control over context engineering
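For the ReAct case, one way to keep only the latest turns is a sliding window that always keeps the first few messages (for example, the system prompt and the task) plus the most recent ones. The sketch below assumes a loop that appends one model reply and one observation per step, and that the cached chain is the previous request plus the model's reply. Under those assumptions, each request drops a single contiguous span, which fits the single-span rule. The function window and its parameters are hypothetical, and each kept suffix must still exceed the token threshold.

```python
from typing import List, TypeVar

T = TypeVar("T")


def window(history: List[T], head: int, tail: int) -> List[T]:
    """Keep the first `head` and the last `tail` items of `history`.

    Hypothetical helper: everything between the head and the tail is dropped.
    In a loop that appends one reply and one observation per step, each request
    then drops a single contiguous span of the previous request plus its reply.
    """
    start_of_tail = max(head, len(history) - tail)
    return history[:head] + history[start_of_tail:]


if __name__ == "__main__":
    # Plain strings stand in for chat messages; "action" is the model's reply.
    history = ["system", "task", "action1", "observation1"]
    for step in range(2, 6):
        # Cached chain: the previous request followed by the model's new reply.
        cached = window(history, head=2, tail=4) + [f"action{step}"]
        history += [f"action{step}", f"observation{step}"]
        request = window(history, head=2, tail=4)
        dropped = [m for m in cached if m not in request]
        # Step 2 drops nothing (prefix hit); later steps drop one contiguous pair.
        print(f"step {step}: dropped {dropped}")
```

Step 2 prints an empty list because the window has not filled yet, so that request only hits the prefix cache. From step 3 onward, each request drops the oldest action/observation pair, which is a single contiguous span B.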
