Subconscious cache helps the inference system detect when context engineering happens in agent reasoning runs by matching the prefix and suffix of cached tokens against new inputs. The goal is to preserve the memory of the pruned tokens implicitly within the latent states of suffix tokens and improve the cache hit rate.Documentation Index
Fetch the complete documentation index at: https://docs.subconscious.dev/llms.txt
Use this file to discover all available pages before exploring further.
How It Works
To hit the subconscious cache, the cached tokens and new inputs need to satisfy two criteria:- The cached chain can be precisely split into three sections
A, B, C - The new input chain can be precisely split into three sections
A, C, D, such thatAandCmatch the prefixAand suffixCin the cache andlen(C) > threshold. We usually setthreshold = 8tokens to avoid matching the suffix of chat templates.
Manually Triggering Subconscious Cache
Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache. If you want to manually hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwardsWhen to Turn Off Auto Compaction
If you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune one continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules. Use Auto Compaction for:- Programming tasks, where assistant-tool-user messages keeps growing in a message list
- Multi-turn conversation - rigid context pruning rule cannot handle arbitrary user inputs
- ReACT multi-modal reasoning: Subconscious cache works perfectly when you only keep latest turns / images in the message list
- Other applications where you need to carefully control context engineering.