AI Document Anonymizer

Drop a plain-text document. The AI detects and replaces personal data (names, emails, addresses, phone numbers, dates) privately, in your browser.

How document anonymization works

Anonimus runs openai/privacy-filter, an open-source named-entity recognition (NER) model trained to identify personally identifiable information (PII) in unstructured text. The model labels each token, and tokens flagged as personal data fall into one of eight privacy categories: person names, private addresses, email addresses, phone numbers, URLs, dates, account numbers, and secrets.
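As a rough illustration of how per-token labels become highlighted entities, the sketch below merges consecutive tokens that share a label into one span. The TokenPrediction shape, the "O" label for non-personal tokens, and label names such as PERSON and EMAIL are assumptions made for this example, not the model's documented output format.

```typescript
// Hypothetical shape of one model prediction: character offsets plus a label.
// "O" is assumed here to mean "not personal data".
interface TokenPrediction {
  start: number;
  end: number;
  label: string;
}

interface EntitySpan {
  start: number;
  end: number;
  label: string;
}

// Merge runs of consecutive tokens with the same label into a single span.
// Tokens are assumed to be sorted by position. Two different entities of the
// same category that sit directly next to each other would be merged as well.
function mergeTokens(tokens: TokenPrediction[]): EntitySpan[] {
  const spans: EntitySpan[] = [];
  let current: EntitySpan | null = null;
  for (const t of tokens) {
    if (t.label === "O") {
      current = null; // a non-PII token closes the open span
      continue;
    }
    if (current && current.label === t.label) {
      current.end = t.end; // extend the open span
    } else {
      current = { start: t.start, end: t.end, label: t.label };
      spans.push(current);
    }
  }
  return spans;
}
```

For the text "Contact Jane Doe at jane@example.com", the two PERSON tokens "Jane" and "Doe" would collapse into one span covering "Jane Doe", followed by a separate EMAIL span.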

When you upload a document, the text is split into segments and fed to the model running inside a Web Worker thread. Your text never leaves your device: the model weights are downloaded once from the Hugging Face CDN and cached in your browser, so every subsequent run works fully offline.
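The splitting step might look something like the sketch below. The function name splitIntoSegments, the 1,000-character default, and the rule of cutting at the last whitespace are all assumptions for illustration; the segmentation Anonimus actually uses is not described here.

```typescript
interface Segment {
  start: number; // offset of this segment in the original text
  text: string;
}

// Split text into segments of at most maxChars characters, preferring to cut
// just after the last space or newline so that words are not split in half.
function splitIntoSegments(text: string, maxChars: number = 1000): Segment[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const segments: Segment[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      const candidate = text.slice(start, end);
      const lastBreak = Math.max(candidate.lastIndexOf("\n"), candidate.lastIndexOf(" "));
      // Only move the cut if it still leaves a non-empty segment.
      if (lastBreak > 0) end = start + lastBreak + 1;
    }
    segments.push({ start, text: text.slice(start, end) });
    start = end;
  }
  return segments;
}
```

Keeping each segment's start offset means entity positions found inside a segment can be mapped back to positions in the full document.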

01

Upload or paste your document

Drop any plain text file: .txt, .md, .csv, .log, .json, .xml, .srt or .html. You can also paste text directly.

02

AI scans for personal data

The privacy-filter model runs locally, reading each token and predicting whether it belongs to a protected category.

03

Download the clean version

Review highlighted entities in the annotated view, then copy or download the clean text — ready to share safely.

Why anonymizing documents matters

Organizations routinely share documents — contracts, support tickets, medical records, research datasets — that contain names, contact details, and other personal identifiers. Sharing those without redaction can violate GDPR, HIPAA, CCPA, or similar regulations depending on your jurisdiction.

Manual redaction is slow and error-prone. A single missed email address in a 50-page contract can constitute a data breach. Automated anonymization catches patterns a human reviewer might skim past.

GDPR compliance

Before sending customer data to third-party processors or publishing research, stripping identifiers reduces your compliance surface significantly.

Safe AI prompting

Anonymize internal documents before feeding them to cloud LLMs. Detected names and emails are replaced before the text reaches external APIs, leaving only the structure you need analyzed.

Dataset preparation

Training data for machine learning must be free of real personal data. Sanitize corpora quickly before fine-tuning or publishing open datasets.

Legal discovery

Law firms and compliance teams redact third-party names from disclosed documents. Automated pre-screening saves hours of manual review.

Common use cases

Anonymize CSV exports before sharing

CRM exports, HR spreadsheets, and survey results often contain names and contact details. Drop the CSV, strip the PII, share safely with contractors or analytics teams.

Redact emails before forwarding

When escalating support tickets or forwarding client correspondence to third parties, remove the sender's personal details in one click.

Clean up AI training datasets

Sanitize text corpora scraped from public or internal sources before using them as fine-tuning data, reducing the risk of memorized personal information in outputs.

Prepare documents for LLM analysis

Before sending internal documents to ChatGPT, Claude, or similar cloud AI tools, anonymize them so names and emails never reach external APIs.

Frequently asked questions

Is my document uploaded to a server?

No. Anonimus processes everything locally in your browser using WebAssembly and optionally WebGPU. The document text never leaves your device and no data is transmitted to any server.
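A minimal sketch of the "optionally WebGPU" choice, assuming the page simply checks for the standard navigator.gpu entry point and otherwise falls back to WebAssembly. The pickBackend function and the Backend type are hypothetical names, and the real selection logic may differ.

```typescript
type Backend = "webgpu" | "wasm";

// Pass in the browser's `navigator` object (or undefined outside a browser).
// `navigator.gpu` is only defined where the WebGPU API is exposed.
function pickBackend(nav: { gpu?: unknown } | undefined): Backend {
  // A stricter check would also await navigator.gpu.requestAdapter() and
  // fall back to "wasm" when it resolves to null.
  return nav !== undefined && nav.gpu !== undefined ? "webgpu" : "wasm";
}
```

Either way, inference happens on the device. The choice only affects speed.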

What types of personal data does it detect?

Eight categories: person names, private addresses, email addresses, phone numbers, URLs, dates, account/ID numbers, and secrets such as API keys. Each is replaced with a labeled placeholder like [PERSON] or [EMAIL].
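The replacement step can be sketched as follows, assuming detected entities arrive as character ranges with a label. The EntitySpan shape and the redact function are illustrative names, not the tool's actual code.

```typescript
interface EntitySpan {
  start: number;
  end: number;
  label: string; // assumed label names, e.g. "PERSON" or "EMAIL"
}

// Replace each detected span with a bracketed placeholder such as [PERSON].
function redact(text: string, spans: EntitySpan[]): string {
  // Work on a sorted copy so the caller's array is left untouched.
  const ordered = spans.slice().sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const s of ordered) {
    if (s.start < cursor) continue; // skip spans overlapping an earlier one
    out += text.slice(cursor, s.start) + "[" + s.label + "]";
    cursor = s.end;
  }
  return out + text.slice(cursor);
}
```

With a PERSON span over "Jane Doe" and an EMAIL span over the address, "Contact Jane Doe at jane@example.com" becomes "Contact [PERSON] at [EMAIL]".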

Does it work on languages other than English?

The model was trained primarily on English text. It has some multilingual capability, but accuracy degrades for non-English names and addresses. For critical non-English documents, treat the output as a first pass and review manually.

How large can my document be?

The upload limit is 5 MB of plain text. At roughly six bytes per English word, including the trailing space, that is on the order of 800,000 words. Very large files may take longer to process depending on your device.

Why does the model need to download on first use?

The quantized model weights are approximately 900 MB. They are downloaded once from the Hugging Face CDN and cached in your browser. All subsequent visits run fully offline from the cache.

Can I edit the anonymized output?

Yes. The Clean text tab is fully editable — manually fix any missed entities or adjust labels before copying or downloading the final document.

What file formats are supported?

Any plain text format: .txt, .md, .csv, .json, .xml, .log, .srt, .html. Binary formats like .docx or .pdf are not currently supported — export to plain text first.
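A pre-upload check along these lines would enforce the format list and the size limit. This is a sketch only: checkFile is a hypothetical helper, and it assumes "5 MB" means 5 × 1024 × 1024 bytes.

```typescript
const SUPPORTED_EXTENSIONS = [".txt", ".md", ".csv", ".json", ".xml", ".log", ".srt", ".html"];
const MAX_BYTES = 5 * 1024 * 1024; // assumption: "5 MB" is read as 5 MiB

// Returns null when the file is acceptable, otherwise a short error message.
function checkFile(name: string, sizeBytes: number): string | null {
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot).toLowerCase() : "";
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return "Unsupported file type: " + (ext || "(none)");
  }
  if (sizeBytes > MAX_BYTES) {
    return "File exceeds the 5 MB limit";
  }
  return null;
}
```

In a browser, the name and size would come from the dropped File object (file.name and file.size).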