Thinking mode enables the model to reason step by step before producing its final answer. This is useful for complex tasks like math, logic, code generation, and multi-step analysis.Documentation Index
Fetch the complete documentation index at: https://docs.subconscious.dev/llms.txt
Use this file to discover all available pages before exploring further.
How It Works
When thinking mode is enabled, the model generates internal reasoning tokens (wrapped in<think> tags) before the final response. These reasoning tokens help the model work through complex problems but are included in your output token usage.
Enabling Thinking Mode
Passchat_template_kwargs with enable_thinking: true in the request body. Since this is a Subconscious-specific extension, use the extra_body parameter in the OpenAI SDK:
Response Format
With thinking mode enabled, the model’s response includes reasoning wrapped in<think> tags followed by the final answer:
Streaming with Thinking
Thinking mode works with streaming. The reasoning tokens stream first, followed by the final answer:Python
When to Use Thinking Mode
Use thinking mode for:- Math and arithmetic problems
- Logic puzzles and reasoning tasks
- Complex code generation
- Multi-step analysis
- Tasks requiring planning or strategy
- Simple Q&A
- Creative writing
- Translation
- Summarization
- Tasks where speed matters more than accuracy
Thinking tokens count toward your output token usage. For simple tasks, leaving thinking mode off will be faster and more cost-effective.