anonym.legal

By · Last updated 2026-02-26

Multilingual NER: Where English Falls Short in Arabic

English NER models reach 85–92% accuracy. Arabic and Chinese? Often 50–70%. Learn the technical challenges and how to build reliable multilingual PII detection.

February 26, 2026 · 8 min read
NER · multilingual · Arabic NLP · Chinese NLP · PII detection

Multilingual NER: The Challenges of PII Detection

Updated for 2026

The accuracy gap

NER models trained on English reach 85–92% F1 on standard benchmarks. Apply the same models to Arabic or Chinese text and the score drops to 50–70%.

For PII work, that gap is a real problem. A recall of 70% means that 30% of the sensitive data passes through undetected.
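As a quick arithmetic check of that claim, here is a minimal Python sketch. It assumes the 70% figure is recall (the share of true PII items that are found). The function name and the 1,000-item example are illustrative and not taken from any product code.

```python
def expected_misses(total_items: int, recall: float) -> int:
    """Number of true PII items left undetected at a given recall."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    return round(total_items * (1.0 - recall))


# Hypothetical corpus containing 1,000 PII items.
print(expected_misses(1000, 0.70))  # 300
print(expected_misses(1000, 0.90))  # 100
```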

The causes are not software bugs. They stem from differences between writing systems.

Four root causes

1. Word boundaries

In English, words are separated by spaces. Tokenization is easy.

Chinese has no spaces.

"张伟住在北京"
→ Segment first: ["张伟", "住在", "北京"]

A model cannot tag what it cannot find. Segmentation has to run before NER.
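To make the segmentation step concrete, here is a minimal sketch of greedy forward maximum matching, one of the simplest dictionary-based segmenters. It is an illustration only: the lexicon is a toy made up for this example, and real segmenters are statistical and handle unknown words far better.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy forward maximum matching: take the longest lexicon entry at each position."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy lexicon (an assumption for this example).
LEXICON = {"张伟", "住在", "北京"}
print(segment("张伟住在北京", LEXICON))  # ['张伟', '住在', '北京']
```

Only after this step does a token-level tagger have units such as 张伟 to label as a person.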

Arabic letters are joined within a word. Short vowels are left out. The text runs right to left.

"محمد يعيش في دبي"
→ No short vowels, right to left, joined letters
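One practical consequence is that the same name can appear with or without short-vowel marks, and the two spellings are different strings. The sketch below assumes a normalization step that strips combining marks before matching. This step is an assumption for illustration and is not described in this article, and `strip_marks` is a hypothetical helper.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks, which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# "Muhammad" written with short-vowel marks, spelled out as code points.
vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
plain = "\u0645\u062D\u0645\u062F"  # the usual unvocalized spelling

print(vocalized == plain)               # False
print(strip_marks(vocalized) == plain)  # True
```

Note that this also removes combining accents in other scripts when they are stored in decomposed form, so it would need to be scoped to Arabic text in practice.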

2. Morphology

English verbs have only a few forms. Arabic uses a root system, where one root produces dozens of words.

كتب (k-t-b, "to write")
→ كاتب (writer), كتاب (book), مكتبة (library)

NER has to analyze roots to find names inside inflected word forms.
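The root idea can be illustrated with a deliberately naive check: do the root consonants appear in the word, in order? This is only a sketch. Real morphological analyzers use patterns and lexicons, and this test gives false positives on unrelated words that happen to contain the same letters. The function name is hypothetical.

```python
def contains_root(word: str, root: str) -> bool:
    """True if the root consonants occur in the word in order (not necessarily adjacent)."""
    letters = iter(word)
    # `consonant in letters` consumes the iterator, so order is enforced.
    return all(consonant in letters for consonant in root)


ROOT = "كتب"  # k-t-b
for form in ["كاتب", "كتاب", "مكتبة", "محمد"]:
    print(form, contains_root(form, ROOT))
```

The first three forms from the example above share the root; the last one, a personal name, does not.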

3. Naming conventions

Latin-script names follow the order given name, then family name. In right-to-left languages, names carry family links.

محمد بن عبد الله
(Muhammad, son of Abdullah)

Chinese names put the family name first. Names are two or three characters long.

张伟 (Zhang Wei): 2 characters
欧阳修 (Ouyang Xiu): 3 characters

A model built on Western name structures will miss these patterns.
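The two conventions above can be sketched as small parsing helpers. Both functions are hypothetical and assume the input is already known to be a personal name. The compound-surname set is a tiny illustrative sample, not a complete list.

```python
COMPOUND_SURNAMES = {"欧阳", "司马", "诸葛"}  # small illustrative sample


def split_chinese_name(name: str) -> tuple[str, str]:
    """Split a Chinese name into (family name, given name); the family name comes first."""
    if len(name) > 2 and name[:2] in COMPOUND_SURNAMES:
        return name[:2], name[2:]
    return name[:1], name[1:]


def split_patronymic(name: str) -> tuple[str, str]:
    """Split an Arabic name at the first 'بن' (son of) into (given name, father's name)."""
    given, _, father = name.partition(" بن ")
    return given, father


print(split_chinese_name("张伟"))    # ('张', '伟')
print(split_chinese_name("欧阳修"))  # ('欧阳', '修')
print(split_patronymic("محمد بن عبد الله"))
```

A tagger that expects "given name, then family name" in a fixed position would mislabel or truncate all three examples.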

4. Text direction

Some languages are read right to left. When right-to-left text contains an English name, the visual order and the logical order differ. This is called BiDi text. It requires careful handling.
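A minimal way to flag BiDi text is to look at each character's Unicode bidirectional class. The sketch below uses Python's standard `unicodedata` module; the function name `is_bidi` is an assumption for this example.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left (Hebrew-style and Arabic-style letters)
LTR_CLASSES = {"L"}        # strong left-to-right


def is_bidi(text: str) -> bool:
    """True if the text mixes strong left-to-right and right-to-left characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return bool(classes & RTL_CLASSES) and bool(classes & LTR_CLASSES)


print(is_bidi("محمد يعيش في دبي"))      # False: Arabic only
print(is_bidi("التقى محمد مع John"))    # True: Arabic with a Latin name
print(is_bidi("John lives in Dubai"))   # False: Latin only
```

Python strings are stored in logical order, so character offsets computed on them are logical offsets. The visual order only appears when the text is rendered.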

F1 scores by writing system

| Language | Script | F1 range | Difficulty |
|----------|--------|----------|------------|
| English | Latin | 85–92% | Low |
| German | Latin | 82–88% | Low |
| French | Latin | 80–87% | Low |
| Spanish | Latin | 81–86% | Low |
| Russian | Cyrillic | 75–83% | Medium |
| Arabic | Abjad | 55–75% | High |
| Chinese | Hanzi | 60–78% | High |
| Japanese | Mixed | 65–80% | High |
| Thai | Thai | 50–70% | Very high |
| Hindi | Devanagari | 60–75% | High |

Non-Latin scripts and the lack of spaces at word boundaries lower the scores the most.

Three-tier solution

We use three tiers to cover 48 languages and their writing systems.

Tier 1. spaCy: 25 languages

For languages that have strong, well-tested models. This tier includes English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian and Greek.

Tier 2. Stanza: complex languages

Stanford Stanza handles Arabic, Chinese, Japanese and Korean. It runs word segmentation and root analysis before NER.

Tier 3. XLM-RoBERTa: low-resource languages

For languages without dedicated models. Thai, Vietnamese, Hindi, Bengali, Hebrew, Turkish and Farsi go here. It also handles mixed-language text.
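The tier assignment can be pictured as a simple lookup. The sketch below is an assumption about how such routing might look, not the production code. It lists only the languages named above (Tier 1 covers 25 in total, but only ten are named), and the ISO 639-1 codes are our mapping of those names.

```python
# Languages named in the text, as ISO 639-1 codes (the mapping is an assumption).
TIER_1_SPACY = {"en", "de", "fr", "es", "it", "pt", "nl", "pl", "ru", "el"}
TIER_2_STANZA = {"ar", "zh", "ja", "ko"}
TIER_3_XLMR = {"th", "vi", "hi", "bn", "he", "tr", "fa"}


def pick_tier(lang: str) -> str:
    """Return the engine name for a language code, or raise if it is not listed."""
    code = lang.strip().lower()
    if code in TIER_1_SPACY:
        return "spacy"
    if code in TIER_2_STANZA:
        return "stanza"
    if code in TIER_3_XLMR:
        return "xlm-roberta"
    raise ValueError(f"no tier configured for language {lang!r}")


for code in ("en", "ar", "th"):
    print(code, "->", pick_tier(code))
```

Raising on unknown codes, rather than silently falling back to one engine, keeps an unsupported language from being processed with the wrong model.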

Right-to-left and BiDi

Right-to-left text needs extra processing steps.

Our pipeline:

  1. Normalizes the text to logical order.
  2. Runs NER on that order.
  3. Maps the entity positions back to the display order.

Prefixes are stripped before NER and added back afterwards.

"محمد": the name alone
"لمحمد": "to Muhammad" (with a prefix)
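The prefix handling comes down to offset bookkeeping: the name is matched on the stripped form, and the span is reported against the original token. Here is a minimal sketch under loud assumptions: a small set of known names stands in for the NER model, only one-letter prefixes are handled, and `find_name` is a hypothetical helper.

```python
# Common one-letter Arabic prefixes (a partial, illustrative list).
PROCLITICS = ("ل", "ب", "و", "ف", "ك")


def find_name(token: str, known_names: set[str]):
    """Return (start, end) of a known name inside the token, or None."""
    if token in known_names:
        return 0, len(token)
    # Strip one prefix letter, match the rest, then shift the span by one.
    if token.startswith(PROCLITICS) and token[1:] in known_names:
        return 1, len(token)
    return None


NAMES = {"محمد"}  # stand-in for a real NER model
print(find_name("محمد", NAMES))   # (0, 4)
print(find_name("لمحمد", NAMES))  # (1, 5)
```

With the span (1, 5), an anonymizer can replace the name and leave the prefix letter in place.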

Code-switching

Real documents often mix languages within a single line.

"El meeting con John es at 3pm"
"我今天跟John去shopping"

Our pipeline splits the text by language. It runs the right model on each part. Then it merges the results with offset mapping.
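The split-and-merge idea can be sketched with script detection from the standard `unicodedata` module. This is a simplification and an assumption about the approach: it splits by script, not by language, so it separates Chinese from English but cannot separate Spanish from English, which needs a language-identification model. All function names are hypothetical.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Coarse script label taken from the Unicode character name."""
    if not ch.isalpha():
        return "OTHER"
    name = unicodedata.name(ch, "")
    for script in ("CJK", "ARABIC", "LATIN"):
        if name.startswith(script):
            return script
    return "OTHER"


def split_by_script(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, script) runs; spaces, digits and punctuation extend the current run."""
    runs = []
    for i, ch in enumerate(text):
        script = script_of(ch)
        if runs and (script == "OTHER" or script == runs[-1][2]):
            start, _, label = runs[-1]
            runs[-1] = (start, i + 1, label)
        elif runs and runs[-1][2] == "OTHER":
            # A leading neutral run adopts the first real script it meets.
            runs[-1] = (runs[-1][0], i + 1, script)
        else:
            runs.append((i, i + 1, script))
    return runs


def to_document_offsets(run_start: int, local_spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Shift spans found inside one run back to whole-document offsets."""
    return [(run_start + s, run_start + e) for s, e in local_spans]


text = "我今天跟John去shopping"
for start, end, script in split_by_script(text):
    print(script, repr(text[start:end]))
```

If a model run on the second segment reports a person at local span (0, 4), `to_document_offsets(4, [(0, 4)])` places it at (4, 8) in the full string, which is "John".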

Benchmark results

Results from benchmark tests on mixed-language data.

| Scenario | F1 |
|----------|----|
| English only | 91% |
| German only | 88% |
| Arabic only | 79% |
| Chinese only | 81% |
| English-Arabic mixed | 83% |
| English-Chinese mixed | 84% |
| English-German mixed | 89% |

Configuration notes

The desktop app detects the language of each document automatically. For mixed-language files, it processes each segment with the right model. No manual step is required.

Specify the language in the API when you know it:

{
  "text": "محمد بن عبد الله",
  "language": "ar"
}

Use auto-detection when you don't:

{
  "text": "محمد بن عبد الله",
  "language": "auto"
}
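Both request bodies can be produced by one small helper. The sketch below only builds the JSON shown above from the `text` and `language` fields. The endpoint URL, authentication and response format are not given in this article, so they are left out, and `build_payload` is a hypothetical name.

```python
import json
from typing import Optional


def build_payload(text: str, language: Optional[str] = None) -> str:
    """Serialize a request body; fall back to "auto" when the language is unknown."""
    body = {"text": text, "language": language or "auto"}
    # ensure_ascii=False keeps Arabic or Chinese text readable in logs.
    return json.dumps(body, ensure_ascii=False)


print(build_payload("محمد بن عبد الله", "ar"))
print(build_payload("محمد بن عبد الله"))
```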

Custom patterns must cover locale-specific digits:

# Latin employee ID
EMP-[0-9]{6}

# Arabic employee ID (includes Arabic-Indic digits)
موظف-[٠-٩0-9]{6}
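The two patterns can be tried out with Python's `re` module. This is a sketch for checking the patterns and says nothing about how the product's pattern engine works. The sample IDs are made up, and the Arabic-Indic digit range is written with `\u` escapes so the order of the range stays unambiguous in editors that reorder right-to-left text.

```python
import re

PREFIX = "موظف"  # "employee"

LATIN_ID = re.compile(r"EMP-[0-9]{6}")
# \u0660-\u0669 are the Arabic-Indic digits ٠ to ٩.
ARABIC_ID = re.compile(PREFIX + r"-[\u0660-\u06690-9]{6}")

arabic_digits = "\u0660\u0660\u0664\u0662\u0661\u0667"  # ٠٠٤٢١٧

samples = [
    "EMP-004217",
    PREFIX + "-" + arabic_digits,
    PREFIX + "-004217",
    "EMP-" + arabic_digits,
    "EMP-42",
]
for sample in samples:
    matched = bool(LATIN_ID.fullmatch(sample) or ARABIC_ID.fullmatch(sample))
    print(sample, matched)
```

In Python, `\d` in a `str` pattern already matches any Unicode decimal digit, Arabic-Indic digits included. The explicit classes above are stricter: the Latin pattern rejects Arabic-Indic digits, and the Arabic pattern accepts both kinds.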

See the full entity list. For API configuration, see the API reference page. Our GDPR compliance guide covers how the detection gap affects data protection.


anonym.legal uses a three-tier NER stack (spaCy, Stanza and XLM-RoBERTa) to cover 48 languages with consistent PII detection.

About this page

We update this page when our platform or the law changes.

Read our founder note for how we work.

Each change shows up in the timestamp at the top.

We follow these rules

  • GDPR (EU 2016/679).
  • ISO/IEC 27001:2022.
  • NIS2 (EU 2022/2555).
  • HIPAA safe harbor under 45 CFR § 164.514(b)(2).

Our promise

We do not sell your data.

We do not train models on your text.

We store your files in Germany.

You can delete your account at any time.

You own your work.

Where we run

Our servers live in Falkenstein, Germany.

We use Hetzner. They hold ISO 27001 certification.

All data stays in the EU.

Backups run every day.

Need help?

Email support@anonym.legal.

We reply within one business day.

How we test

We run a full check suite on every release.

Each surface gets its own sweep script and report.

Human reviewers spot-check the output each week.

We track recall and precision on a labelled set.

Bad runs block the deploy.

What we never do

  • We never sell your information to third parties.
  • We never train models on what you upload.
  • We never keep your work after you delete it.
  • We never share keys with any outside firm.
  • We never run ads inside the product.

Plans in plain words

We sell credits, not seats.

One credit covers one short job.

Long jobs use a few credits each.

You can top up at any time.

Unused credits roll over each month.

Read the plans page for current rates.

Who built this

A small team of engineers and lawyers built this.

We ship from Europe and work in the open.

Our founder note spells out why we started.

How the parts fit

A browser add-on cleans text inside Chrome.

A Word plug-in handles drafts in Office.

A small desktop tool works on whole folders.

An agent protocol link feeds large models safely.

All four share one core engine and one rule set.

Words from our team

We started this work after a lunch about cookies.

One friend kept getting odd ads on her phone.

We asked why a court file leaked through a draft.

We sketched the first build on a napkin that week.

By month three we had a tiny demo for a friend.

She used it on her first case the next day.

Common questions we hear

Can the tool read scanned PDFs? Yes, with OCR.

Does it work on long files? Yes, in small chunks.

Can I roll my own rule set? Yes, save it as a preset.

Does it run offline? The desktop build runs offline.

Do you keep my files? No, the cloud build wipes after each run.

Will it learn from my work? No, we never train on inputs.

A short tour of the workflow

Upload a file or paste a snippet of prose.

Pick the entities you want gone from the draft.

Choose a method: replace, mask, hash, encrypt, or redact.

Press run and watch the side panel show each hit.

Skim the result and tweak any rule that misfired.

Save the cleaned file or send it to a teammate.