Deploy Subconscious models directly on local hardware for offline inference, ultra-low latency, and complete data privacy. All processing happens on-device, so no data ever leaves the machine.

No network latency

Inference runs directly on device with no network round-trip

Fully offline

Works without internet connectivity

Complete privacy

Data never leaves the device

Supported Devices

Workstations & Servers

  • Desktop machines with dedicated GPUs
  • Development workstations
  • Edge servers and on-site hardware

Laptops

  • GPU-equipped laptops (NVIDIA, Apple Silicon)
  • Development and field use

Mobile Devices

  • iOS and Android deployment
  • On-device inference for mobile applications

Use Cases

  • Sensitive data processing for healthcare, legal, and financial documents that cannot leave the device
  • Field operations where deployments don’t have reliable internet access
  • Development for local testing and iteration without API costs
  • Edge computing for real-time inference at the point of data collection

How It Works

We provide optimized model packages for different hardware targets. The local runtime exposes the same OpenAI-compatible API on localhost, so existing application code only needs to point its client at the local base URL:
```python
from openai import OpenAI

# Same API, running locally: point the client at the local runtime
client = OpenAI(
    api_key="local",
    base_url="http://localhost:8080/v1",
)

response = client.chat.completions.create(
    model="subconscious/tim-qwen3.6-27b",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Print the model's reply
print(response.choices[0].message.content)
```
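Because only the base URL and API key differ from the example above, one way to keep application code identical across environments is to read those two values from configuration and to check that the local runtime is listening before sending requests. The following is a minimal sketch under stated assumptions: the environment variable names `SUBCONSCIOUS_BASE_URL` and `SUBCONSCIOUS_API_KEY` and the helper functions are hypothetical and not part of any Subconscious tooling, and the defaults are simply the values from the example above.

```python
import os
import socket
from urllib.parse import urlparse

# Defaults taken from the localhost example above.
LOCAL_BASE_URL = "http://localhost:8080/v1"
LOCAL_API_KEY = "local"


def client_settings(env=None):
    """Build keyword arguments for an OpenAI-compatible client.

    The environment variable names are hypothetical; when they are
    unset, the settings fall back to the local runtime defaults.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("SUBCONSCIOUS_BASE_URL", LOCAL_BASE_URL),
        "api_key": env.get("SUBCONSCIOUS_API_KEY", LOCAL_API_KEY),
    }


def is_reachable(base_url, timeout=0.5):
    """Return True if a TCP connection to the URL's host and port succeeds.

    This only confirms that something is listening on the port, not that
    it is the Subconscious runtime or that a model is loaded.
    """
    parsed = urlparse(base_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With these helpers, the client from the example above could be created as `OpenAI(**client_settings())`, optionally after checking `is_reachable(client_settings()["base_url"])` to fail fast with a clear message when the local runtime has not been started.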

Talk to Sales

Schedule a call to discuss local deployment for your organization.