By · Last updated 2026-06-05


Hungary NAIH: GDPR Compliance for TAJ Social Security Numbers and Tax IDs

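Hungarian NER accuracy is only 67%, below the EU average of 82%. That is the finding of NAIH's 2024 assessment. This article examines the weighted TAJ-szám check-digit algorithm and the detection blind spots for the adóazonosító jel.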

June 5, 2026 · 7 min read
Hungary NAIH · TAJ-szám detection · Hungarian NER · Hungarian GDPR compliance · AI DPIA

Hungary NAIH: TAJ-szám and GDPR Technical Compliance Requirements

2026 update

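Hungary's data protection authority is the NAIH (Nemzeti Adatvédelmi és Információszabadság Hatóság, the National Authority for Data Protection and Freedom of Information). Its 2024 report puts Hungarian NER accuracy at just 67%, against an EU average of 82%. The gap creates real compliance risk: tools built around English or German miss many Hungarian identifiers.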

Why Hungarian NER Accuracy Is Low

Three features of Hungarian break standard NLP models.

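**Agglutination:** Hungarian attaches suffixes to word stems, so the same name takes several forms in running text. "Kovács Péter" as the subject becomes "Kovács Péternek" in the dative ("to/for Péter"). An NER model must map every variant to the same person.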
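A minimal sketch of how a matcher might fold inflected forms back onto a name it already knows, for example from a structured HR record. The suffix table and the function names `base_name` and `resolve_person` are hypothetical. They cover only a few case endings for consonant-final names, so treat this as an illustration. Production NER needs a real Hungarian morphological analyser.

```python
from typing import Iterable, List, Optional

# Illustrative, deliberately incomplete set of Hungarian case suffixes for
# consonant-final names. Not handled: vowel lengthening (Anna -> Annának),
# instrumental assimilation (Péter -> Péterrel), digraph doubling
# (Kovács -> Kováccsal). A real system needs a morphological analyser.
CASE_SUFFIXES = frozenset({
    "nak", "nek",           # dative: "Péternek" = "to/for Péter"
    "t", "ot", "et", "öt",  # accusative: "Pétert", "Kovácsot"
    "ról", "ről",           # delative: "about ..."
    "tól", "től",           # ablative: "from ..."
    "hoz", "hez", "höz",    # allative: "towards ..."
})


def base_name(token: str, known_names: Iterable[str]) -> Optional[str]:
    """Map a possibly inflected token to one of the known names, or None."""
    # Check longer names first so a short name cannot shadow a longer one.
    for name in sorted(known_names, key=len, reverse=True):
        if token == name:
            return name
        if token.startswith(name) and token[len(name):] in CASE_SUFFIXES:
            return name
    return None


def resolve_person(tokens: List[str], known_names: Iterable[str]) -> List[str]:
    """Replace each token with its base name; unknown tokens pass through."""
    names = list(known_names)
    return [base_name(t, names) or t for t in tokens]


print(resolve_person(["Kovács", "Péternek"], {"Kovács", "Péter"}))
# ['Kovács', 'Péter']
```

The sketch matches only against known names and does not strip suffixes blindly. Blind stripping would let the accusative "-t" rule cut a name such as "Margit" down to "Margi".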

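**Name order:** Hungarian puts the family name first and the given name second. Most NLP models assume the given name comes first, and this reversal causes missed detections.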
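One way to sidestep the ordering problem is to parse each name according to the convention of its source before comparing records. The sketch below assumes exactly two name parts and a per-record order flag. `to_family_given` and the `"hu"`/`"en"` labels are hypothetical.

```python
from typing import Optional, Tuple


def to_family_given(full_name: str, order: str) -> Optional[Tuple[str, str]]:
    """Split a two-part name into (family, given).

    order="hu": family name first, as in Hungarian ("Kovács Péter").
    order="en": given name first, as most NLP pipelines assume ("Péter Kovács").
    Names with more or fewer than two parts (e.g. a "dr." prefix or two
    given names) return None; they need rules not sketched here.
    """
    parts = full_name.split()
    if len(parts) != 2:
        return None
    if order == "hu":
        family, given = parts
    elif order == "en":
        given, family = parts
    else:
        raise ValueError(f"unknown name order: {order!r}")
    return family, given


# Both records refer to the same person once the order is made explicit.
hu_record = to_family_given("Kovács Péter", "hu")
en_record = to_family_given("Péter Kovács", "en")
print(hu_record == en_record)  # True
```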

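**Special characters:** Hungarian uses ő and ű (with a double acute accent), which are separate letters from the umlauted ö and ü that German uses. Mixed Windows-1250 and UTF-8 encodings also cause recognition failures.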
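Python's built-in `cp1250` codec can support a best-effort repair for the most common mix-up, which is UTF-8 bytes read as Windows-1250. The sketch assumes a single mis-decoding step, in which "ő" shows up as "Ĺ‘". The function name `repair_hungarian_text` is hypothetical. NFC normalisation is included because "ő" can also arrive as "o" followed by a combining double acute accent (U+030B).

```python
import unicodedata


def repair_hungarian_text(text: str) -> str:
    """Best-effort fix for UTF-8 text that was mis-decoded as Windows-1250.

    Assumes at most one mis-decoding step, e.g. "ő" shown as "Ĺ‘".
    Mojibake containing bytes that cp1250 leaves undefined (typically from
    "Á" or "Ő") usually cannot be recovered this way.
    """
    try:
        # Undo the mis-decode: back to the original bytes, then read as UTF-8.
        repaired = text.encode("cp1250").decode("utf-8")
    except UnicodeError:
        # Not representable in cp1250, or not valid UTF-8 afterwards:
        # treat the text as already correctly decoded.
        repaired = text
    # Compose "o" + U+030B (combining double acute) into the single code
    # point "ő", so later pattern matching sees one canonical form.
    return unicodedata.normalize("NFC", repaired)


garbled = "Győr".encode("utf-8").decode("cp1250")
print(garbled, "->", repair_hungarian_text(garbled))
```

Correctly decoded Hungarian text normally fails the round trip and comes back unchanged apart from normalisation. That is why the function can run on every input rather than only on suspect files.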

These three factors are the main reasons for the accuracy gap in NAIH's 2024 report.

TAJ-szám: Hungary's Social Security Number

The TAJ-szám (Társadalombiztosítási Azonosító Jel, social insurance identifier) is a 9-digit number that appears in healthcare, payroll, social benefit, and pension records.

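**Check algorithm:** multiply digits 1 to 8 by the weights 3, 7, 3, 7, 3, 7, 3, 7, sum the products, and take the result modulo 10. The remainder must equal the ninth digit, which is the check digit.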
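Here is the same check as a formula, with d_1 to d_9 as the digits from left to right. The worked example uses the made-up digits 12345678 and is not a real TAJ number.

```latex
d_9 = \Big( \sum_{i=1}^{8} w_i \, d_i \Big) \bmod 10,
\qquad (w_1,\dots,w_8) = (3,7,3,7,3,7,3,7)

% Worked example, digits 1 2 3 4 5 6 7 8:
3\cdot1 + 7\cdot2 + 3\cdot3 + 7\cdot4 + 3\cdot5 + 7\cdot6 + 3\cdot7 + 7\cdot8
  = 3 + 14 + 9 + 28 + 15 + 42 + 21 + 56 = 188,
\qquad 188 \bmod 10 = 8 \;\Rightarrow\; 123456788
```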

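This scheme is specific to the TAJ-szám. It is not the Luhn algorithm that identifiers in many other countries use, so a generic Luhn validator cannot be reused for it.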

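According to NAIH's 2024 report, general-purpose tools detect TAJ-szám with only 61% accuracy. The 9-digit format looks like many other numbers in Hungarian documents, so tools that skip the checksum step produce large numbers of false positives and missed detections.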
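A minimal sketch of checksum-gated detection follows. It assumes TAJ numbers appear either as nine contiguous digits or as three groups of three separated by a space or hyphen. `find_taj` and `taj_is_valid` are hypothetical names, and the weights follow the algorithm described above.

```python
import re
from typing import List

# Assumption: 9 contiguous digits, or 3 groups of 3 separated by a space or
# hyphen. re.ASCII keeps \d to 0-9; lookarounds reject longer digit runs.
TAJ_PATTERN = re.compile(r"(?<!\d)(\d{3})[ -]?(\d{3})[ -]?(\d{3})(?!\d)", re.ASCII)

TAJ_WEIGHTS = (3, 7, 3, 7, 3, 7, 3, 7)


def taj_is_valid(digits: str) -> bool:
    """Digits 1-8 times 3,7,3,7,..., summed, mod 10 must equal digit 9."""
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    total = sum(w * int(d) for w, d in zip(TAJ_WEIGHTS, digits[:8]))
    return total % 10 == int(digits[8])


def find_taj(text: str) -> List[str]:
    """Return checksum-valid TAJ candidates with separators removed."""
    hits = []
    for m in TAJ_PATTERN.finditer(text):
        digits = "".join(m.groups())
        if taj_is_valid(digits):
            hits.append(digits)
    return hits


print(find_taj("TAJ: 123 456 788, számla: 123456789"))
# ['123456788']
```

The check acts as a filter, not as proof. Roughly one in ten random 9-digit strings still passes a single mod-10 check digit, so context such as a nearby "TAJ" label remains useful for ranking hits.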

Adóazonosító Jel: Hungary's Tax ID

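The adóazonosító jel is a 10-digit personal tax identification number whose first digit is always 8. It appears in employment records, tax returns, and financial documents.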

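**Check algorithm:** multiply each of the first nine digits by its position (1 to 9), sum the products, and take the result modulo 11. The remainder is the tenth (check) digit. A remainder of 10 cannot be written as a single digit, so numbers that would produce it are not issued.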
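Here is the check as a formula. It assumes the position-weighted modulo-11 scheme for the personal tax ID, with d_1 to d_10 as the digits from left to right and d_1 = 8. The digits in the worked example are made up.

```latex
d_{10} = \Big( \sum_{i=1}^{9} i \cdot d_i \Big) \bmod 11,
\qquad d_1 = 8

% Worked example, digits 8 1 2 3 4 5 6 7 8:
1\cdot8 + 2\cdot1 + 3\cdot2 + 4\cdot3 + 5\cdot4 + 6\cdot5 + 7\cdot6 + 8\cdot7 + 9\cdot8
  = 8 + 2 + 6 + 12 + 20 + 30 + 42 + 56 + 72 = 248,
\qquad 248 \bmod 11 = 6 \;\Rightarrow\; 8123456786
```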

NAIH enforcement cases show that these numbers are often missed in HR files when tools are configured with another language as the baseline.
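A sketch of a scanner for HR and finance text follows. It assumes the tax ID is written as ten contiguous digits and applies the position-weighted modulo-11 check (each of the first nine digits times its position). `find_tax_ids` and `ado_is_valid` are hypothetical names. The scanner relies only on digit structure, so it does not depend on the surrounding text being in any particular language.

```python
import re
from typing import List

# Assumption: the tax ID is written as ten contiguous digits starting with 8,
# not embedded in a longer run of digits.
ADO_PATTERN = re.compile(r"(?<![0-9])8[0-9]{9}(?![0-9])")


def ado_is_valid(number: str) -> bool:
    """Position-weighted mod-11 check on a 10-digit personal tax ID."""
    if not re.fullmatch(r"8[0-9]{9}", number):
        return False
    # Digit i (1-based) is weighted by i for the first nine digits.
    total = sum(pos * int(d) for pos, d in enumerate(number[:9], start=1))
    # A remainder of 10 never equals a single digit, so such numbers fail here.
    return total % 11 == int(number[9])


def find_tax_ids(text: str) -> List[str]:
    """Return candidates in text that pass the check-digit test."""
    return [m.group(0) for m in ADO_PATTERN.finditer(text) if ado_is_valid(m.group(0))]


print(find_tax_ids("Adóazonosító jel: 8123456786; invalid: 8123456789"))
# ['8123456786']
```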

For a comparison of tax IDs across EU member states, see the EU national tax ID guide.

NAIH DPIA Requirements for AI Systems

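NAIH's 2024 guidance requires a data protection impact assessment (DPIA) to be completed before any AI system processes personal data. This is stricter than the general GDPR test, because Article 35 requires a DPIA only where processing is likely to result in a high risk to individuals' rights and freedoms. The DPIA must cover: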

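  1. Data flows: training data, inputs, and outputs
  2. Legal basis: documented for every processing activity
  3. Language accuracy: languages below the EU average must be addressed separately
  4. Human review: a mechanism for checking automated decisions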

The DPIA must be updated at least once a year and again whenever the system is retrained.

For teams deploying AI tools on Hungarian data, the order is fixed: complete the DPIA first, then deploy.
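Teams that automate releases could encode this ordering as a deployment gate. The sketch below is hypothetical and rests on three assumptions. The section keys mirror the four areas listed above. "Annual" is read as a 365-day maximum age. The DPIA is treated as stale whenever the deployed model version differs from the one it assessed, which reflects the reading that a retrain triggers an update.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Set

# Hypothetical keys for the four areas the DPIA must cover.
REQUIRED_SECTIONS = ("data_flows", "legal_basis", "language_accuracy", "human_review")

# Assumption: "updated annually" is enforced as a 365-day maximum age.
MAX_DPIA_AGE = timedelta(days=365)


@dataclass
class DPIARecord:
    completed_sections: Set[str] = field(default_factory=set)
    last_updated: Optional[date] = None
    assessed_model_version: Optional[str] = None


def deployment_blockers(dpia: DPIARecord, model_version: str, today: date) -> List[str]:
    """Return every reason deployment must wait; an empty list means go."""
    blockers = [
        f"missing DPIA section: {s}"
        for s in REQUIRED_SECTIONS
        if s not in dpia.completed_sections
    ]
    if dpia.last_updated is None or today - dpia.last_updated > MAX_DPIA_AGE:
        blockers.append("DPIA not updated within the last year")
    # Assumption: any change of model version means the model was retrained.
    if dpia.assessed_model_version != model_version:
        blockers.append("model retrained since the DPIA was last updated")
    return blockers
```

Returning every blocker instead of stopping at the first one lets a CI log show all open gaps in a single run.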

Minimum Technical Controls

Three controls form the baseline for NAIH compliance:

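  1. TAJ-szám detection with the weighted mod-10 check: pattern matching alone is not enough
  2. adóazonosító jel detection with check-digit validation: critical in HR and finance
  3. Hungarian NER that handles agglutination: must cope with ő, ű, and their encoding variants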

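For a comparison of how Central European data protection authorities set technical requirements, see the German BfDI guide. For similar language-gap problems in the region, see the Czech ÚOOÚ guide.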


Ready to protect your data?

Start anonymizing PII across 48 languages with 285 entity types.

About this page

We update this page when our platform or the law changes.

Read our founder note for how we work.

Each change shows up in the timestamp at the top.


We follow these rules

  • GDPR (EU 2016/679).
  • ISO/IEC 27001:2022.
  • NIS2 (EU 2022/2555).
  • HIPAA safe harbor under 45 CFR § 164.514(b)(2).

Our promise

We do not sell your data.

We do not train models on your text.

We store your files in Germany.

You can delete your account at any time.

You own your work.

Where we run

Our servers live in Falkenstein, Germany.

We use Hetzner. They hold ISO 27001 certification.

All data stays in the EU.

Backups run every day.

Need help?

Email support@anonym.legal.

We reply within one business day.

How we test

We run a full check suite on every release.

Each surface gets its own sweep script and report.

Human reviewers spot-check the output each week.

We track recall and precision on a labelled set.

Bad runs block the deploy.

What we never do

  • We never sell your information to third parties.
  • We never train models on what you upload.
  • We never keep your work after you delete it.
  • We never share keys with any outside firm.
  • We never run ads inside the product.

Plans in plain words

We sell credits, not seats.

One credit covers one short job.

Long jobs use a few credits each.

You can top up at any time.

Unused credits roll over each month.

Read the plans page for current rates.

Who built this

A small team of engineers and lawyers built this.

We ship from Europe and work in the open.

Our founder note spells out why we started.


How the parts fit

A browser add-on cleans text inside Chrome.

A Word plug-in handles drafts in Office.

A small desktop tool works on whole folders.

An agent protocol link feeds large models safely.

All four share one core engine and one rule set.

Words from our team

We started this work after a lunch about cookies.

One friend kept getting odd ads on her phone.

We asked why a court file leaked through a draft.

We sketched the first build on a napkin that week.

By month three we had a tiny demo for a friend.

She used it on her first case the next day.

Common questions we hear

Can the tool read scanned PDFs? Yes, with OCR.

Does it work on long files? Yes, in small chunks.

Can I roll my own rule set? Yes, save it as a preset.

Does it run offline? The desktop build runs offline.

Do you keep my files? No, the cloud build wipes after each run.

Will it learn from my work? No, we never train on inputs.

A short tour of the workflow

Upload a file or paste a snippet of prose.

Pick the entities you want gone from the draft.

Choose a method: replace, mask, hash, encrypt, or redact.

Press run and watch the side panel show each hit.

Skim the result and tweak any rule that misfired.

Save the cleaned file or send it to a teammate.