System architecture & data flow

ClawTouch is a hardware-level AI desktop automation product built for enterprise use. This document covers what the system is made of, how it executes a task, where data travels, what it runs on, and how it plugs into the systems you already have. It's written for technical decision-makers, security and compliance reviewers, and operations teams.

01 · Product overview

A complete ClawTouch deployment is three things: a desktop client, a WeChat mini-program for management, and a custom USB HID device. The AI agent issues keyboard and mouse events through the HID device, which the operating system handles via the standard USB HID class — the same input channel used by any physical keyboard or mouse. It is not software emulation like AutoHotkey or PyAutoGUI, and it is not an RPA script that reaches into application APIs or the DOM. It is a plug-and-play hardware input layer. Whether the resulting interaction patterns conform to any specific third-party platform's terms of service is the operating customer's responsibility; ClawTouch ships the execution layer, not a circumvention guarantee.

At this stage ClawTouch is sold only to enterprise customers who can handle their own compliance review. We ship hardware, software, the built-in AI models, and ongoing operational support as one package. For business inquiries, write to support@tinqiao.com.

02 · System components

A full deployment has three parts:

┌─────────────────────┐      ┌─────────────────────┐
│  Desktop client     │ ←──→ │  Mini-program       │
│  (Windows)          │      │  (management UI)    │
│  Local planning     │      │  Accounts / billing │
│  Built-in AI model  │      │  Devices / remote   │
└──────────┬──────────┘      └─────────────────────┘
           │ USB
           ▼
   ┌──────────────┐
   │  HID device  │
   └──────────────┘

Desktop client

A Windows application. It plans and decides locally, then drives the HID device over USB to actually press the keys. It works alongside whatever browser you use — a read-only Sensor extension lets it see DOM content and visible text. Task results and run logs stay on the customer's own machine.

Mini-program

A WeChat mini-program for the operations team. It handles accounts, subscriptions, device pairing, and day-to-day operational management. From a phone, an operator can check device status, review run history, and receive exception notifications. It's also where customer-support tickets and our managed customer-service channel live.

HID device

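A custom USB HID device that serves as the physical keyboard/mouse output layer. The desktop client sends it commands over USB, and the device presents the resulting keystrokes and pointer events to the operating system as a standard input device. HID is a standard device class that every mainstream operating system supports natively, so there are no vendor drivers to install, no admin privileges required, and no process injection.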
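To make "standard USB HID class" concrete, here is a minimal sketch of how keystrokes are encoded in the 8-byte boot-keyboard input report defined by the USB HID specification. This document doesn't describe ClawTouch's own firmware protocol or report descriptor, so treat this as an illustration of the generic HID keyboard format, not the product's wire format. The helper names (`usage_for_char`, `press_report`, `reports_for_text`) are hypothetical, and only a small US-layout subset is covered.

```python
from __future__ import annotations

# Standard USB HID boot-keyboard input report: 8 bytes.
#   byte 0     modifier bitmask (bit 1 = Left Shift)
#   byte 1     reserved, always 0
#   bytes 2-7  up to six concurrently pressed key usage IDs
MOD_LEFT_SHIFT = 0x02
KEY_ENTER = 0x28
KEY_SPACE = 0x2C
RELEASE_REPORT = bytes(8)  # all zeros: no modifiers, no keys held


def usage_for_char(ch: str) -> tuple[int, int]:
    """Return (modifiers, usage_id) for a small US-layout subset."""
    if "a" <= ch <= "z":
        return 0, 0x04 + ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return MOD_LEFT_SHIFT, 0x04 + ord(ch) - ord("A")
    if "1" <= ch <= "9":
        return 0, 0x1E + ord(ch) - ord("1")
    if ch == "0":
        return 0, 0x27
    if ch == " ":
        return 0, KEY_SPACE
    if ch == "\n":
        return 0, KEY_ENTER
    raise ValueError(f"character not covered by this sketch: {ch!r}")


def press_report(ch: str) -> bytes:
    """Build the report that signals 'this one key is down'."""
    modifiers, usage = usage_for_char(ch)
    return bytes([modifiers, 0, usage, 0, 0, 0, 0, 0])


def reports_for_text(text: str) -> list[bytes]:
    """Alternate press and release reports, one pair per character."""
    reports: list[bytes] = []
    for ch in text:
        reports.append(press_report(ch))
        reports.append(RELEASE_REPORT)
    return reports
```

Because the operating system only ever sees reports of this shape, it has no way to tell them apart from those of an off-the-shelf keyboard at the HID class level, which is why no extra driver is involved.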

For deployments of 50 devices or more, we can build a dedicated desktop admin console that replaces or augments the mini-program, so an IT team can manage everything from a Windows desktop instead of phones. See Deployment modes & private deployment.

03 · The three-layer model

Every desktop action runs through the same loop: Perception → Decision → Action. The whole loop is orchestrated locally by the desktop client.

Layer	What it does	How it's built
Perception	Reads the current screen and turns the UI into something the model can reason about	OS interfaces (window + control trees), the Sensor browser extension (read-only DOM and text), a vision model trained on desktop UI, and OCR — combined in a multi-channel fusion step
Decision	Looks at the current state plus the task goal and plans the next step	A dedicated LLM with context-aware step planning and multi-step task orchestration
Action	Turns the decision into keystrokes and clicks, then issues them physically	Keyboard and mouse signals sent through the USB HID device — the same OS input path a human's keyboard would take

All three layers run on the customer's own machine. The execution path itself doesn't depend on any external service.
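The Perception → Decision → Action loop can be sketched in a few lines. This is a simplified illustration, not the client's actual code: the `Step` type, the `run_task` function, the callback signatures, and the `max_steps` safety limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    kind: str          # e.g. "type", "click", or "done" (hypothetical kinds)
    payload: str = ""  # text to type, target description, etc.


def run_task(
    goal: str,
    perceive: Callable[[], str],          # Perception: current screen state as text
    decide: Callable[[str, str], Step],   # Decision: (state, goal) -> next step
    act: Callable[[Step], None],          # Action: dispatch the step to the HID device
    max_steps: int = 20,                  # assumed guard against endless loops
) -> bool:
    """Repeat perceive -> decide -> act until the planner reports completion."""
    for _ in range(max_steps):
        state = perceive()
        step = decide(state, goal)
        if step.kind == "done":
            return True
        act(step)
    return False  # step budget exhausted; report the task as failed
```

Passing the three layers in as callables keeps the loop itself trivial and makes each layer replaceable in isolation, for example swapping a stub `act` in for the real HID dispatch during testing.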

04 · A task, end to end

Here's the typical data flow for a single task:

1. User delegates a task     (from the mini-program or desktop client)
2. Client parses the task    (turn it into an executable goal)
3. Perception reads screen   (snapshot current state)
4. Decision plans next step  (LLM inference)
5. Action layer dispatches   (client → USB → HID device)
6. HID device fires output   (keystrokes / mouse events)
7. Screen state changes      (back to step 3 — next loop)
8. Task completes or fails   (result returned to the user)

A few things worth calling out about that flow:

Planning happens locally.
LLM calls carry only what's needed: a text summary of the current state and the task goal. Raw screenshots never leave the machine.
Task results stay local in client-side logs, with retention governed by whatever policy the customer sets.
Our backend isn't in the execution path. It handles accounts, subscriptions, and device metadata — not the task itself.
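One way to enforce the "text summary and task goal only" rule is to build the LLM request from an explicit allow-list of fields, so the screenshot can't end up in the payload by accident. The sketch below is a hypothetical illustration: the `ScreenState` type, the field names `task_goal` and `state_summary`, and the `max_chars` cap are assumptions, not the client's real request schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenState:
    summary: str       # text description produced by the perception layer
    screenshot: bytes  # raw pixels; stays on the local machine


def build_llm_request(state: ScreenState, goal: str, max_chars: int = 4000) -> str:
    """Serialize only the text summary and the goal, never the screenshot."""
    payload = {
        "task_goal": goal,
        "state_summary": state.summary[:max_chars],  # cap the outbound text
    }
    return json.dumps(payload, ensure_ascii=False)
```

Building the payload field by field, rather than serializing the whole state object, means a new field added to `ScreenState` later stays local unless someone deliberately adds it to the request.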

For the full picture on data handling, the local-first principle, and where our compliance boundaries sit, see Data security & compliance.

05 · How it differs from RPA and similar tools

Engineering-wise, ClawTouch is a desktop execution layer. It's fundamentally different from software emulation, traditional RPA, and browser-extension automation — both in how it runs and in what it can be used for:

Dimension	Software emulation (AutoHotkey / PyAutoGUI)	Traditional RPA (API / DOM-driven)	Browser-extension automation	ClawTouch (hardware-level)
Execution layer	User-mode input injection through OS APIs (e.g. SendInput)	Public APIs / DOM operations	Limited to what extensions can do	Physical output from a USB HID device
Where it works	Single-machine scripting	Systems that expose APIs	Inside the browser	Any Windows desktop application
Resilience to UI changes	Weak when coordinates are hard-coded	Weak (breaks when APIs or DOMs change)	Medium (DOM-dependent)	Strong (vision-based perception fills the gaps)
OS-level input path	Software emulation — not the path a human takes	Doesn't touch the OS input layer	Doesn't touch the OS input layer	The same path a real keyboard would take

ClawTouch isn't here to replace RPA or browser-extension automation. It fills the gap they leave: any Windows desktop application, still working after a UI redesign.

06 · Runtime environment

Item	Requirement
Operating system	Windows 10 / 11 (x64)
Privileges	No admin rights needed; no process injection; no need to disable antivirus
Hardware	One free USB port
Memory	8 GB minimum, 16 GB recommended
Network	Outbound access for LLM API calls plus lightweight heartbeats to our backend (not required for offline deployments)
Browser support	The Sensor extension works in Chrome, Edge, Firefox, and other mainstream browsers
Display	Adapts automatically to multi-monitor setups and arbitrary DPI / scaling ratios

ClawTouch doesn't rely on system-level hooks, kernel drivers, virtual display adapters, or custom input methods. Cross-environment compatibility is handled at the HID layer — because HID is a standard OS interface, behaviour stays consistent across Windows versions and hardware configurations.
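The Display row in the table above implies a coordinate conversion somewhere between "where the element is on screen" and "where the pointer should go". The sketch below shows one common approach under stated assumptions: it treats the HID pointer as an absolute device with a 0..32767 range spanning the whole virtual desktop. That range, the `Monitor` type, and both function names are hypothetical; this document doesn't specify how ClawTouch actually reports pointer positions.

```python
from __future__ import annotations

from dataclasses import dataclass

HID_ABS_MAX = 32767  # assumed logical maximum of an absolute pointer report


@dataclass(frozen=True)
class Monitor:
    left: int     # physical-pixel origin on the virtual desktop
    top: int
    width: int    # physical-pixel size
    height: int
    scale: float  # 1.0 = 100 %, 1.5 = 150 %, ...


def logical_to_physical(mon: Monitor, x: float, y: float) -> tuple[int, int]:
    """Map a DPI-scaled point inside one monitor to virtual-desktop pixels."""
    return mon.left + round(x * mon.scale), mon.top + round(y * mon.scale)


def physical_to_hid(monitors: list[Monitor], px: int, py: int) -> tuple[int, int]:
    """Normalise virtual-desktop pixels to the 0..HID_ABS_MAX range."""
    left = min(m.left for m in monitors)
    top = min(m.top for m in monitors)
    right = max(m.left + m.width for m in monitors)
    bottom = max(m.top + m.height for m in monitors)
    hx = round((px - left) * HID_ABS_MAX / (right - left - 1))
    hy = round((py - top) * HID_ABS_MAX / (bottom - top - 1))
    return hx, hy
```

Splitting the conversion into two steps keeps per-monitor scaling separate from the desktop-wide normalisation, so a mixed setup (say, a 100 % laptop panel next to a 150 % external display) only affects the first function.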

07 · AI models & integrations

Built-in AI model

The enterprise package ships with our own built-in models, covering both task planning and desktop UI recognition. There's no API key to provision and no separate usage allowance to buy — every device subscription includes a generous monthly quota.

Bring your own LLM (optional)

You can also point ClawTouch at your own LLM service — whether that's a cloud provider you already pay for or an open-source model running on your own infrastructure. The desktop client calls it directly; our side stays out of the loop and doesn't see or store the requests and responses.

API / webhook integration (custom add-on)

For customers that want to wire ClawTouch into existing CRM, ticketing, or OA systems, we offer custom integrations — both directions, so operational data can flow into your systems and your systems can trigger ClawTouch tasks. The exact shape depends on what you already have.
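For the inbound direction (your systems triggering ClawTouch tasks), the receiving side typically needs to confirm that a request really came from the system it claims to. The sketch below shows a generic HMAC-SHA256 check on a webhook body. It is not a description of ClawTouch's integration API, which is built per customer: the function names, the shared-secret scheme, and the JSON body are all assumptions for illustration.

```python
import hashlib
import hmac
import json


def sign(body: bytes, secret: bytes) -> str:
    """Sender side: hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_and_parse(body: bytes, signature: str, secret: bytes) -> dict:
    """Receiver side: reject the request unless the signature matches."""
    expected = sign(body, secret)
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("signature mismatch")
    return json.loads(body)
```

The signature is computed over the raw bytes rather than the parsed JSON, so any change to the body in transit, including whitespace, invalidates it.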

Custom-trained models (custom add-on)

If you have proprietary data and want differentiated results, we can fine-tune a model on your own data and hand you exclusive weights as an alternative to the built-in model. See Deployment modes & private deployment.