Instead of trusting model providers with your raw data, our system automatically identifies and replaces private information before your query ever leaves your device. Unlike other approaches that try to rewrite your entire prompt (often losing important context), we use a surgical approach: replace only what needs replacing, and do it consistently.

How it works

  1. Local identification: A small model running entirely on your device identifies private information in your query
  2. Smart replacement: Each piece of private data is replaced with a semantically equivalent alternative that preserves the context needed for a good response
  3. Secure routing: Your anonymized query is sent through a privacy-preserving proxy to reach the model
  4. Automatic restoration: When the response comes back, we automatically restore your original information
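The four steps above can be sketched in a few lines of Python. This is an illustrative toy, not the actual API: the function names and the hand-supplied PII mapping stand in for what the local model produces in steps 1 and 2.

```python
# Hypothetical sketch of the anonymize -> route -> restore flow.
# In the real system, the local model detects PII and chooses the
# replacements; here the mapping is supplied by hand for illustration.

def anonymize(query: str, pii: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Steps 1-2: replace each detected PII string with its synthetic
    stand-in; return the rewritten query plus the reverse mapping
    needed later for restoration."""
    reverse = {}
    for original, replacement in pii.items():
        query = query.replace(original, replacement)
        reverse[replacement] = original
    return query, reverse

def restore(response: str, reverse: dict[str, str]) -> str:
    """Step 4: swap the synthetic stand-ins back to the originals
    before the response is shown to the user."""
    for replacement, original in reverse.items():
        response = response.replace(replacement, original)
    return response

query = "My manager Jennifer at Google inflated the numbers."
anon, reverse = anonymize(query, {"Jennifer": "Michelle", "Google": "TechCorp"})
# anon: "My manager Michelle at TechCorp inflated the numbers."
model_reply = "You should document what Michelle said at TechCorp."
print(restore(model_reply, reverse))
# → "You should document what Jennifer said at Google."
```

Step 3 (routing through the proxy) happens between `anonymize` and `restore`; only the anonymized string ever leaves the device, while the reverse mapping stays local.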

Example in action

Three connected queries:
  1. “I discovered my manager at Google is systematically inflating sales numbers for the cloud infrastructure division”
  2. “I’m considering becoming a whistleblower to the SEC about financial fraud at my tech company - could this affect my H1-B visa status?”
  3. “My skip-level is Jennifer who reports directly to Marc - should I talk to her first or go straight to the authorities?”
What the model provider sees (three separate, unconnected queries):
  1. “I discovered my manager at TechCorp is systematically inflating sales numbers for the enterprise software division”
  2. “I’m considering becoming a whistleblower to the SEC about financial fraud at my tech company - could this affect my H1-B visa status?”
  3. “My skip-level is Michelle who reports directly to Robert - should I talk to her first or go straight to the authorities?”
Connected together, these queries would let Google instantly identify the whistleblower - there’s probably only one H-1B employee in cloud infrastructure whose skip-level is Jennifer reporting to Marc. But as three anonymous queries from different “people,” you get solid legal advice while staying protected. The model provider doesn’t know these queries are related, but still provides help.

The privacy guarantees

Content-level protection

The anonymization follows these principles:
  • Personal names are replaced with culturally and contextually similar alternatives
  • Company names become fictional entities from the same industry and size
  • Locations under 100k population are mapped to equivalent synthetic locations
  • Dates and times are shifted consistently to preserve relative timing
  • Financial amounts are adjusted within a small range to maintain context
  • Identifiers (emails, phone numbers, URLs) are replaced with randomized yet format-valid substitutes
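Two of these principles, consistent date shifting and format-valid identifier replacement, are easy to illustrate. This sketch is an assumption about the mechanics, not the production rules: the ISO-date pattern, the fixed offset, and the US-style phone format are all illustrative choices.

```python
import datetime
import random
import re

def shift_dates(text: str, offset_days: int) -> str:
    """Shift every ISO date in the text by the SAME offset, so
    relative timing ('two weeks later') survives anonymization."""
    def repl(m: re.Match) -> str:
        d = datetime.date.fromisoformat(m.group(0))
        return (d + datetime.timedelta(days=offset_days)).isoformat()
    return re.sub(r"\d{4}-\d{2}-\d{2}", repl, text)

def fake_phone(rng: random.Random) -> str:
    """Randomized but format-valid US-style phone number."""
    return f"{rng.randint(200, 999)}-{rng.randint(200, 999)}-{rng.randint(1000, 9999)}"

text = "Signed on 2024-03-01, payment due 2024-03-15. Call 415-555-0123."
shifted = shift_dates(text, offset_days=-11)
# Both dates move by 11 days, so the 14-day gap between them is preserved.
rng = random.Random(0)
print(re.sub(r"\d{3}-\d{3}-\d{4}", lambda m: fake_phone(rng), shifted))
```

The key property is consistency: because every date in a conversation shifts by the same offset (and every occurrence of a name maps to the same stand-in), the model can still reason about the relationships between them.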

Network-level protection

Even with perfect content anonymization, your query patterns could reveal information. We add network-level privacy through:
  1. TEE proxy: Your queries are encrypted and routed through intermediate nodes, similar to Tor. We host these nodes in Trusted Execution Environments (TEEs) that cryptographically guarantee they don’t log or store your queries
  2. Traffic mixing: Your queries blend with thousands of others, so that with sufficient traffic volume, tracking any individual query becomes statistically infeasible
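The core idea behind traffic mixing can be shown with a toy batching example. This is a deliberately simplified sketch: a real mix also adds timing jitter and cover traffic, and the function below is illustrative rather than part of the actual system.

```python
import random

def mix_batch(queries: list[tuple[str, str]], rng: random.Random) -> list[str]:
    """Toy traffic mix: take (user_id, anonymized_query) pairs from many
    users, strip the identities, and forward the queries in randomized
    order so arrival order no longer reveals who sent what."""
    shuffled = queries[:]
    rng.shuffle(shuffled)
    return [query for _user, query in shuffled]

batch = [("alice", "q1"), ("bob", "q2"), ("carol", "q3")]
forwarded = mix_batch(batch, random.Random(42))
print(forwarded)  # user ids removed, order randomized
```

The larger the batch, the larger the anonymity set each query hides in, which is why the guarantee above is phrased as statistical rather than absolute.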

Local model training process

We trained small language models (0.6B-4B parameters) from the Qwen3 family to perform surgical PII replacement through a structured tool-calling approach. Training began with supervised fine-tuning on a 30K-sample dataset, which yielded modest improvements: the best model reached 6.38/10 on our LLM-judge evaluation.

The breakthrough came from applying Group Relative Policy Optimization (GRPO) with real-time feedback from a GPT-4.1 judge that scored anonymization quality across five key dimensions, including PII identification accuracy, replacement granularity, and semantic appropriateness. Through iterative refinement of the judge prompts and training process to address issues like reward hacking and incorrect handling of non-PII, we achieved performance comparable to GPT-4.1: our 4B model reached 9.55/10 and our 1.7B model 9.20/10 (versus GPT-4.1’s 9.77/10).

We then optimized the models for local deployment with quantization and speculative decoding, achieving sub-1-second completion times for the 1.7B model and sub-2-second for the 4B model, while maintaining under 250 ms time-to-first-token latency on consumer hardware.
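The group-relative part of GRPO is simple to sketch: several anonymization attempts are sampled for the same query, each is scored by the judge, and each sample's training signal is its score normalized against the group. The judge scores below are made up for illustration, and this shows only the advantage computation, not the full policy-gradient update.

```python
import statistics

def grpo_advantages(judge_scores: list[float]) -> list[float]:
    """Group-relative advantages: each sampled completion is rewarded
    (or penalized) according to how its judge score compares to the
    other samples drawn for the same prompt."""
    mean = statistics.mean(judge_scores)
    std = statistics.pstdev(judge_scores) or 1.0  # guard against zero spread
    return [(score - mean) / std for score in judge_scores]

# Hypothetical 0-10 judge scores for four anonymization attempts
# at one query (matching the judge scale quoted above).
scores = [9.5, 6.0, 8.0, 4.5]
advantages = grpo_advantages(scores)
# Above-mean samples get positive advantage and are reinforced;
# below-mean samples are pushed down.
print([round(a, 2) for a in advantages])
```

Because the baseline is the group mean rather than a separate value network, this style of update needs only the judge's scalar scores, which is what made the real-time GPT-4.1 feedback loop practical.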