Thinking Mode - Subconscious Docs

Thinking mode lets the model reason step by step before producing its final answer. This typically yields higher-quality outputs for complex tasks such as math, logic, code generation, and multi-step analysis.

Closed models usually hide their reasoning, returning only a summary or nothing at all. Because Subconscious serves open models, the model’s reasoning is completely visible. You get the full, unaltered chain of thought, giving you total transparency for debugging, auditing, and trust.

How It Works

When thinking mode is enabled, the model generates internal reasoning tokens before the final response. In the OpenAI format these tokens arrive wrapped in <think> tags; in the Anthropic format they arrive as a separate thinking content block. Reasoning tokens help the model work through complex problems, and they count toward your output token usage.

Enabling Thinking Mode

Each wire format controls thinking with its own syntax. With the OpenAI format, set the Subconscious extension chat_template_kwargs to enable_thinking: true. In the Python SDK this goes through the extra_body parameter; in the TypeScript SDK and in raw HTTP requests it is a top-level field of the request body. With the Anthropic format, use the native thinking parameter ({"type": "enabled", "budget_tokens": ...}):

# OpenAI format (Python SDK)
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.subconscious.dev/v1",
)

response = client.chat.completions.create(
    model="subconscious/tim-qwen3.6-27b",
    messages=[{"role": "user", "content": "What is 127 * 849 + 3621?"}],
    extra_body={
        "chat_template_kwargs": {"enable_thinking": True},
    },
)

print(response.choices[0].message.content)

// OpenAI format (TypeScript SDK)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-api-key",
  baseURL: "https://api.subconscious.dev/v1",
});

const response = await client.chat.completions.create({
  model: "subconscious/tim-qwen3.6-27b",
  messages: [{ role: "user", content: "What is 127 * 849 + 3621?" }],
  // @ts-expect-error Subconscious extension
  chat_template_kwargs: { enable_thinking: true },
});

console.log(response.choices[0].message.content);

# Anthropic format (Python SDK)
from anthropic import Anthropic

client = Anthropic(
    auth_token="your-api-key",
    base_url="https://api.subconscious.dev",
)

message = client.messages.create(
    model="subconscious/tim-qwen3.6-27b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "What is 127 * 849 + 3621?"}],
)

for block in message.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print(block.text)

// Anthropic format (TypeScript SDK)
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  authToken: "your-api-key",
  baseURL: "https://api.subconscious.dev",
});

const message = await client.messages.create({
  model: "subconscious/tim-qwen3.6-27b",
  max_tokens: 2048,
  thinking: { type: "enabled", budget_tokens: 2000 },
  messages: [{ role: "user", content: "What is 127 * 849 + 3621?" }],
});

for (const block of message.content) {
  if (block.type === "thinking") console.log("[thinking]", block.thinking);
  else if (block.type === "text") console.log(block.text);
}

# OpenAI format (curl)
curl https://api.subconscious.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "subconscious/tim-qwen3.6-27b",
    "messages": [{"role": "user", "content": "What is 127 * 849 + 3621?"}],
    "chat_template_kwargs": {"enable_thinking": true}
  }'

# Anthropic format (curl)
curl https://api.subconscious.dev/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "subconscious/tim-qwen3.6-27b",
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "budget_tokens": 2000},
    "messages": [{"role": "user", "content": "What is 127 * 849 + 3621?"}]
  }'

Both controls enable the same underlying feature. The OpenAI format toggles it with the enable_thinking extension; the Anthropic format uses the native thinking config and also lets you cap reasoning with budget_tokens.
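If one codebase talks to both wire formats, the two controls can be kept behind a single helper. The sketch below is illustrative only: thinking_kwargs is a hypothetical name, not part of either SDK, and the default budget of 2000 simply mirrors the examples above.

```python
from typing import Any


def thinking_kwargs(wire_format: str, budget_tokens: int = 2000) -> dict[str, Any]:
    """Return the extra request arguments that enable thinking for one wire format.

    Hypothetical helper: the field names come from the examples in this page,
    the function itself is an illustration.
    """
    if wire_format == "openai":
        # Subconscious extension, forwarded by the OpenAI Python SDK via extra_body.
        return {"extra_body": {"chat_template_kwargs": {"enable_thinking": True}}}
    if wire_format == "anthropic":
        # Native thinking config; budget_tokens caps the reasoning.
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    raise ValueError(f"unknown wire format: {wire_format!r}")
```

The result can be unpacked into the request, for example client.chat.completions.create(model=..., messages=..., **thinking_kwargs("openai")).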

Response Format

With the OpenAI format, the model’s response includes reasoning wrapped in <think> tags followed by the final answer:

<think>
Let me calculate this step by step.
127 * 849 = 127 * 800 + 127 * 49
127 * 800 = 101,600
127 * 49 = 6,223
101,600 + 6,223 = 107,823
107,823 + 3,621 = 111,444
</think>

The answer is **111,444**.
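Because the reasoning and the answer share one string in the OpenAI format, applications that show only the answer need to separate them. A minimal sketch, assuming the content contains at most one <think>...</think> block; split_thinking is a hypothetical helper name:

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content: str) -> tuple[str, str]:
    """Split message content into (reasoning, answer).

    Returns an empty reasoning string when no <think> block is present.
    """
    match = _THINK_RE.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (content[: match.start()] + content[match.end():]).strip()
    return reasoning, answer
```

With the OpenAI example above, this would be called as split_thinking(response.choices[0].message.content).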

With the Anthropic format, the reasoning is returned as a separate thinking content block before the text block, rather than inline tags:

{
  "content": [
    {"type": "thinking", "thinking": "Let me calculate this step by step...", "signature": ""},
    {"type": "text", "text": "The answer is **111,444**."}
  ]
}
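When working with the raw JSON (for example, the body returned to curl) rather than SDK objects, the same separation is a matter of filtering blocks by type. A small sketch under that assumption; collect_blocks is a hypothetical helper name:

```python
def collect_blocks(content: list[dict]) -> tuple[str, str]:
    """Return (reasoning, answer) from an Anthropic-format content list.

    Blocks of other types are ignored.
    """
    reasoning = "".join(
        block.get("thinking", "") for block in content if block.get("type") == "thinking"
    )
    answer = "".join(
        block.get("text", "") for block in content if block.get("type") == "text"
    )
    return reasoning, answer
```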

Streaming with Thinking

Thinking mode works with streaming. The reasoning tokens stream first, followed by the final answer:

# OpenAI format (Python SDK); reuses the OpenAI client created earlier
stream = client.chat.completions.create(
    model="subconscious/tim-qwen3.6-27b",
    messages=[{"role": "user", "content": "Solve: If 3x + 7 = 22, what is x?"}],
    stream=True,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": True},
    },
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

# Anthropic format (Python SDK); reuses the Anthropic client created earlier
with client.messages.stream(
    model="subconscious/tim-qwen3.6-27b",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "Solve: If 3x + 7 = 22, what is x?"}],
) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
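The Anthropic stream labels each delta as thinking or text, but the OpenAI-format stream delivers reasoning and answer in the same content field, separated only by the <think> tags. To route them differently while streaming (for example, to render reasoning in a collapsed panel), the tags have to be tracked across chunks. The sketch below is one possible approach, not a documented Subconscious utility: ThinkStreamSplitter is a hypothetical class, and it assumes a tag may be split across chunk boundaries.

```python
class ThinkStreamSplitter:
    """Label streamed text as "thinking" or "answer" based on <think> tags."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self) -> None:
        self._buffer = ""
        self._in_think = False

    def feed(self, chunk: str) -> list[tuple[str, str]]:
        """Consume one chunk and return the (kind, text) pieces that are ready."""
        self._buffer += chunk
        out: list[tuple[str, str]] = []
        while self._buffer:
            tag = self.CLOSE if self._in_think else self.OPEN
            kind = "thinking" if self._in_think else "answer"
            idx = self._buffer.find(tag)
            if idx != -1:
                # Full tag found: emit the text before it and switch state.
                if idx:
                    out.append((kind, self._buffer[:idx]))
                self._buffer = self._buffer[idx + len(tag):]
                self._in_think = not self._in_think
                continue
            # No full tag: hold back any suffix that could be the start of one.
            keep = 0
            for n in range(min(len(tag) - 1, len(self._buffer)), 0, -1):
                if self._buffer.endswith(tag[:n]):
                    keep = n
                    break
            cut = len(self._buffer) - keep
            if cut:
                out.append((kind, self._buffer[:cut]))
            self._buffer = self._buffer[cut:]
            break
        return out

    def flush(self) -> list[tuple[str, str]]:
        """Emit whatever is still buffered once the stream has ended."""
        kind = "thinking" if self._in_think else "answer"
        rest, self._buffer = self._buffer, ""
        return [(kind, rest)] if rest else []
```

In the OpenAI streaming loop above, each non-empty content value would be passed to feed(), with flush() called once after the loop.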

When to Use Thinking Mode

Use thinking mode for:

Math and arithmetic problems
Logic puzzles and reasoning tasks
Complex code generation
Multi-step analysis
Tasks requiring planning or strategy

Skip thinking mode for:

Simple Q&A
Creative writing
Translation
Summarization
Tasks where speed matters more than accuracy

Thinking tokens count toward your output token usage. For simple tasks, leaving thinking mode off will be faster and more cost-effective.

