System architecture & data flow
ClawTouch is a hardware-level AI desktop automation product built for enterprise use. This document covers what the system is made of, how it executes a task, where data travels, what it runs on, and how it plugs into the systems you already have. It's written for technical decision-makers, security and compliance reviewers, and operations teams.
01 · Product overview
A complete ClawTouch deployment is three things: a desktop client, a WeChat mini-program for management, and a custom USB HID device. The AI agent issues keyboard and mouse events through the HID device, which the operating system handles via the standard USB HID class — the same input channel used by any physical keyboard or mouse. It is not software emulation like AutoHotkey or PyAutoGUI, and it is not an RPA script that reaches into application APIs or the DOM. It is a plug-and-play hardware input layer. Whether the resulting interaction patterns conform to any specific third-party platform's terms of service is the operating customer's responsibility; ClawTouch ships the execution layer, not a circumvention guarantee.
At this stage ClawTouch is sold only to enterprise customers who can handle their own compliance review. We ship hardware, software, the built-in AI models, and ongoing operational support as one package. For business inquiries, write to support@tinqiao.com.
02 · System components
A full deployment has three parts:
┌─────────────────────┐ ┌─────────────────────┐
│ Desktop client │ ←──→ │ Mini-program │
│ (Windows) │ │ (management UI) │
│ Local planning │ │ Accounts / billing │
│ Built-in AI model │ │ Devices / remote │
└──────────┬──────────┘ └─────────────────────┘
│ USB
▼
┌──────────────┐
│ HID device │
└──────────────┘
Desktop client
A Windows application. It plans and decides locally, then drives the HID device over USB to actually press the keys. It works alongside whatever browser you use — a read-only Sensor extension lets it see DOM content and visible text. Task results and run logs stay on the customer's own machine.
Mini-program
A WeChat mini-program for the operations team. It handles accounts, subscriptions, device pairing, and day-to-day operational management. From a phone, an operator can check device status, review run history, and receive exception notifications. It's also where customer-support tickets and our managed customer-service channel live.
HID device
A custom USB HID device that sits between the desktop client and the computer as the physical keyboard/mouse output layer. HID is a standard interface every operating system supports natively — no drivers to install, no admin privileges required, no process injection.
For deployments of 50 devices or more, we can build a dedicated desktop admin console that replaces or augments the mini-program, so an IT team can manage everything from a Windows desktop instead of phones. See Deployment modes & private deployment.
03 · The three-layer model
Every desktop action runs through the same loop: Perception → Decision → Action. The whole loop is orchestrated locally by the desktop client.
| Layer | What it does | How it's built |
|---|---|---|
| Perception | Reads the current screen and turns the UI into something the model can reason about | OS interfaces (window + control trees), the Sensor browser extension (read-only DOM and text), a vision model trained on desktop UI, and OCR — combined in a multi-channel fusion step |
| Decision | Looks at the current state plus the task goal and plans the next step | A dedicated LLM with context-aware step planning and multi-step task orchestration |
| Action | Turns the decision into keystrokes and clicks, then issues them physically | Keyboard and mouse signals sent through the USB HID device — the same OS input path a human's keyboard would take |
All three layers run on the customer's own machine. The execution path itself doesn't depend on any external service.
04 · A task, end to end
Here's the typical data flow for a single task:
1. User delegates a task (from the mini-program or desktop client)
2. Client parses the task (turn it into an executable goal)
3. Perception reads screen (snapshot current state)
4. Decision plans next step (LLM inference)
5. Action layer dispatches (client → USB → HID device)
6. HID device fires output (keystrokes / mouse events)
7. Screen state changes (back to step 3 — next loop)
8. Task completes or fails (result returned to the user)
A few things worth calling out about that flow:
- Planning happens locally. Raw screenshots are never uploaded.
- LLM calls carry only what's needed — a text summary of the current state and the task goal. Raw screenshots don't leave the machine.
- Task results stay local in client-side logs, with retention governed by whatever policy the customer sets.
- Our backend isn't in the execution path. It handles accounts, subscriptions, and device metadata — not the task itself.
For the full picture on data handling, the local-first principle, and where our compliance boundaries sit, see Data security & compliance.
05 · How it differs from RPA and similar tools
Engineering-wise, ClawTouch is a desktop execution layer. It's fundamentally different from software emulation, traditional RPA, and browser-extension automation — both in how it runs and in what it can be used for:
| Dimension | Software emulation (AutoHotkey / PyAutoGUI) | Traditional RPA (API / DOM-driven) | Browser-extension automation | ClawTouch (hardware-level) |
|---|---|---|---|---|
| Execution layer | Kernel-level input injection / OS APIs | Public APIs / DOM operations | Limited to what extensions can do | Physical output from a USB HID device |
| Where it works | Single-machine scripting | Systems that expose APIs | Inside the browser | Any Windows desktop application |
| Resilience to UI changes | Weak when coordinates are hard-coded | Weak (breaks when APIs or DOMs change) | Medium (DOM-dependent) | Strong (vision-based perception fills the gaps) |
| OS-level input path | Software emulation — not the path a human takes | Doesn't touch the OS input layer | Doesn't touch the OS input layer | The same path a real keyboard would take |
ClawTouch isn't here to replace RPA or browser-extension automation. It fills the gap they leave: any Windows desktop application, still working after a UI redesign.
06 · Runtime environment
| Item | Requirement |
|---|---|
| Operating system | Windows 10 / 11 (x64) |
| Privileges | No admin rights needed; no process injection; no need to disable antivirus |
| Hardware | One free USB port |
| Memory | 8 GB minimum, 16 GB recommended |
| Network | Outbound access for LLM API calls plus lightweight heartbeats to our backend (not required for offline deployments) |
| Browser support | The Sensor extension works in Chrome, Edge, Firefox, and other mainstream browsers |
| Display | Adapts automatically to multi-monitor setups and arbitrary DPI / scaling ratios |
ClawTouch doesn't rely on system-level hooks, kernel drivers, virtual display adapters, or custom input methods. Cross-environment compatibility is handled at the HID layer — because HID is a standard OS interface, behaviour stays consistent across Windows versions and hardware configurations.
07 · AI models & integrations
Built-in AI model
The enterprise package ships with our own built-in models, covering both task planning and desktop UI recognition. There's no API key to provision and no separate usage allowance to buy — every device subscription includes a generous monthly quota.
Bring your own LLM (optional)
You can also point ClawTouch at your own LLM service — whether that's a cloud provider you already pay for or an open-source model running on your own infrastructure. The desktop client calls it directly; our side stays out of the loop and doesn't see or store the requests and responses.
API / webhook integration (custom add-on)
For customers that want to wire ClawTouch into existing CRM, ticketing, or OA systems, we offer custom integrations — both directions, so operational data can flow into your systems and your systems can trigger ClawTouch tasks. The exact shape depends on what you already have.
Custom-trained models (custom add-on)
If you have proprietary data and want differentiated results, we can fine-tune a model on your own data and hand you exclusive weights as an alternative to the built-in model. See Deployment modes & private deployment.