How to Find Personal Information in Text
Last Updated: January 30, 2026
What Does the Analyzer Do?
The Analyzer is like a detective. It reads through text and finds personal information that should be kept private. This could be things like:
- Someone's name
- An email address
- A phone number
- A credit card number
- A home address
This is useful because sometimes you need to share documents but you want to hide the private stuff first.
How to Use the Analyzer
Step 1: Open the Analyzer
- Go to anonym.legal and sign in
- Click "Analyzer" in the menu
- You will see a page with:
  - A big text box (where you paste your text)
  - Checkboxes to pick what to look for
  - A language picker
  - A slider for how careful the tool should be
  - An "Analyze" button
What Are Entity Types?
Entity types are the different kinds of personal information the tool can find. Think of them like categories.
People Information
- PERSON: Names like "John Smith" or "Maria Garcia"
- EMAIL_ADDRESS: Email addresses like "john@email.com"
- PHONE_NUMBER: Phone numbers in any format
- DATE_TIME: Dates like "January 15, 2026" or times
- AGE: When someone's age is mentioned
- GENDER: Words that tell someone's gender
Money Information
- CREDIT_CARD: Credit card numbers (with or without dashes)
- IBAN: International bank account numbers (used in Europe and many other countries)
- US_SSN: Social Security numbers (used in the USA)
- TAX_ID: Tax identification numbers
Location Information
- LOCATION: Addresses, cities, and countries
- IP_ADDRESS: Computer addresses (like 192.168.1.1)
Medical Information
- MEDICAL_LICENSE: Doctor's license numbers
- MEDICAL_RECORD_NUMBER: Hospital patient numbers
Organization Information
- ORGANIZATION: Company names like "Apple" or "Nike"
The tool can find 24+ different types of personal information!
You can also create your own custom types if you need to find something special.
Language Support
The Analyzer works with 48 different languages! Here are some examples:
- English
- Spanish
- French
- German
- Chinese
- Japanese
- Arabic
- Hindi
- Russian
- Portuguese
- Italian
- And many more!
Good to know: The first time you use a new language, it might take a few extra seconds. After that, it will be fast.
What Is the Confidence Threshold?
The confidence threshold is like asking "How sure do you want the tool to be?"
Imagine you are looking for your friend named "Jordan" in a crowd. You might see someone who looks a little like Jordan, or someone who looks exactly like Jordan.
- Low threshold (0.3): "Find anyone who might be Jordan" - You will find more people, but some might not actually be Jordan
- Medium threshold (0.5): "Find people who probably are Jordan" - Good balance
- High threshold (0.8): "Only show me if you are really sure it is Jordan" - Fewer results, but more accurate
What Number Should You Use?
- Start with 0.5 - This works well for most people
- Use 0.3-0.5 if you want to catch everything (even if some are wrong)
- Use 0.7-0.9 if you only want results the tool is very sure about
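If it helps to see the idea as code, here is a small sketch of what a threshold does. It is only an illustration: the `Detection` class, the sample scores, and the rule "keep a result when its score is at least the threshold" are assumptions made for this example, not the Analyzer's actual code.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One candidate found in the text (hypothetical structure)."""
    entity_type: str
    text: str
    score: float  # confidence between 0.0 and 1.0


def filter_by_threshold(detections, threshold):
    """Keep only the detections whose score is at least the threshold."""
    return [d for d in detections if d.score >= threshold]


# Made-up candidates with made-up scores, for illustration only.
candidates = [
    Detection("PERSON", "Jordan Lee", 0.85),
    Detection("PERSON", "Jordan", 0.55),
    Detection("LOCATION", "Jordan", 0.35),
]

# Try the low, medium, and high settings from the list above.
for threshold in (0.3, 0.5, 0.8):
    kept = filter_by_threshold(candidates, threshold)
    print(threshold, [(d.entity_type, d.text) for d in kept])
```

Raising the threshold never adds results. It only removes the ones the tool is less sure about.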
What Are Presets?
Presets are like recipe cards. Instead of picking every setting yourself, you can use a preset that already has the right settings for common tasks.
Ready-Made Presets
- General PII Detection
  - Good for everyday use
  - Finds names, emails, phone numbers, addresses, credit cards
- GDPR Compliance
  - For European privacy rules
  - Works with multiple languages
- HIPAA Medical
  - For hospital and doctor information
  - Very careful and accurate
- Financial Services
  - For banks and money stuff
  - Finds credit cards, bank accounts, Social Security numbers
- Development & Testing
  - For software developers
  - Good for checking test data
- Multi-Language European
  - For documents in European languages
  - Finds privacy information in multiple languages
How to Use a Preset
- Click the "Preset" dropdown menu
- Pick a preset from the list
- All the settings are filled in automatically
- You can still change any setting afterward if you want
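One way to picture a preset is as a saved bundle of settings that you can still adjust. The sketch below is hypothetical: the `PRESETS` dictionary, the `apply_preset` function, and the default language and threshold are invented for this example. The entity lists are a reading of the preset descriptions above, not the tool's real configuration.

```python
# Hypothetical preset definitions, based on the descriptions in this guide.
PRESETS = {
    "General PII Detection": {
        "entity_types": ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
                         "LOCATION", "CREDIT_CARD"],
    },
    "Financial Services": {
        "entity_types": ["CREDIT_CARD", "IBAN", "US_SSN"],
    },
}


def apply_preset(name, **overrides):
    """Fill in settings from a preset, then apply any manual changes."""
    # Assumed starting values; the real defaults may differ.
    settings = {"language": "en", "threshold": 0.5}
    preset = PRESETS[name]
    # Copy the list so later edits do not change the preset itself.
    settings["entity_types"] = list(preset["entity_types"])
    settings.update(overrides)
    return settings


# Pick a preset, then raise the threshold by hand.
settings = apply_preset("Financial Services", threshold=0.7)
print(settings)
```

The preset fills in everything first, and your own changes are applied last, so they always win.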
How Tokens Work
Using the Analyzer costs tokens. Think of tokens like arcade coins - you use them to play the game.
How Many Tokens Will I Use?
It depends on:
- How much text you have
- How many entity types you picked
- How much personal information is found
Typical Token Costs
| Amount of Text | Tokens Used |
|---|---|
| Short text (a paragraph) | 1-3 tokens |
| Medium text (a page) | 3-6 tokens |
| Long text (several pages) | 6-10 tokens |
The tool shows you an estimate before you click "Analyze" so you know what to expect.
Understanding Your Results
After the tool analyzes your text, you will see:
- Highlighted text - The personal information is marked in different colors
- A results list showing:
  - What type of information was found
  - Where in the text it was found
  - How sure the tool is (the confidence score)
  - The actual text that was found
Example
If your text says: "John Doe's email is john@example.com"
The results would look like this:
| Type | What Was Found | How Sure |
|---|---|---|
| PERSON | John Doe | 95% sure |
| EMAIL_ADDRESS | john@example.com | 98% sure |
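The same results can be pictured as simple records. This is a sketch only: the field names (`type`, `start`, `end`, `score`, `text`) and the idea that "where it was found" means character positions are assumptions made for this example. The scores are copied from the table above.

```python
text = "John Doe's email is john@example.com"


def make_result(entity_type, found, score):
    """Build one result record (hypothetical layout) for a piece of text."""
    start = text.find(found)  # position of the first character
    return {
        "type": entity_type,
        "start": start,
        "end": start + len(found),  # position just after the last character
        "score": score,
        "text": found,
    }


results = [
    make_result("PERSON", "John Doe", 0.95),
    make_result("EMAIL_ADDRESS", "john@example.com", 0.98),
]

for r in results:
    # Slicing the original text with start and end gives back what was found.
    print(r["type"], r["start"], r["end"], text[r["start"]:r["end"]], r["score"])
```

Keeping the positions next to each result is what would let a tool highlight the right words in the original text.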
Tips for Best Results
- Start with a preset - It is the easiest way to begin
- Only pick what you need - Do not select every entity type. Pick only the ones you are looking for
- Test with a small sample first - Try a little bit of text before doing a lot
- Check your results - Always look at what the tool found to make sure it is correct
- Pick the right language - Make sure you select the language your text is written in
Fixing Problems
Problem: Nothing Was Found
Why this might happen:
- The confidence threshold is too high (the tool is being too picky)
- You picked the wrong language
- You did not select the right entity types
- Your text does not have the type of information you are looking for
What to try:
- Lower the confidence threshold slider
- Check that you picked the correct language
- Make sure you selected the right entity types
- Try different entity types
Problem: Too Many Wrong Results
Why this might happen:
- The confidence threshold is too low (the tool is guessing too much)
- You selected too many entity types
What to try:
- Move the confidence threshold slider higher
- Only select the entity types you really need
Problem: The Tool Is Slow
Why this might happen:
- This is the first time using a new language (it needs to load)
- Your text is very long
- Your internet connection is slow
What to try:
- Wait a moment for the language to load (only slow the first time)
- Try smaller pieces of text
- Check your internet connection
Problem: Credit Cards Are Not Found
Check the following:
- The CREDIT_CARD entity type is selected
- The confidence threshold is not too high (try lowering it to 0.4)
- The credit card number in your text uses a valid format
Credit cards can look like:
- 4532-1234-5678-9010 (with dashes)
- 4532 1234 5678 9010 (with spaces)
- 4532123456789010 (just numbers)
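The three layouts above differ only in their separators. The short sketch below shows one way to check that yourself before pasting text into the tool. The `normalize_card_number` helper is made up for this example and is not part of the Analyzer, which may handle formats differently.

```python
import re


def normalize_card_number(raw):
    """Remove dashes and spaces; return the digits, or None if anything else is left."""
    digits = re.sub(r"[ -]", "", raw)
    return digits if re.fullmatch(r"[0-9]+", digits) else None


# The three sample layouts from this guide.
samples = [
    "4532-1234-5678-9010",
    "4532 1234 5678 9010",
    "4532123456789010",
]

normalized = {normalize_card_number(s) for s in samples}
print(normalized)  # a single entry means all three are the same number
```

If a number in your text contains other characters, such as letters or dots, this helper returns `None`, which is a hint that the format may be the reason it was not found.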
More Help
Check out these other guides:
- User Guide - USER_GUIDE.md - Everything about using the site
- Anonymizer Guide - ANONYMIZER_GUIDE.md - How to hide personal information
- Token System - TOKEN_SYSTEM.md - How tokens and payments work
- Quick Start - QUICK_START.md - Get started in 5 minutes
- FAQ - FAQ.md - Common questions and answers