NAIH Hungary: TAJ GDPR Technical Guide

A technical guide to detecting and protecting the TAJ (social security number) under Hungary's NAIH.

George Curta · March 6, 2026 · 7 min read

Hungary NAIH · TAJ-szám detection · Hungarian NER · Hungarian GDPR compliance · AI DPIA

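NAIH Hungary: TAJ-szám and GDPR Technical Requirements

Updated for 2026

Hungary's data protection authority is the NAIH. According to its 2024 report, named entity recognition (NER) accuracy for Hungarian is only 67%, against an EU average of 82%. That gap creates real risk: tools built for English or German tend to miss Hungarian identifiers.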

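Why Hungarian NER Accuracy Is Low

Three features of the language trip up standard NLP models.

Agglutination: Hungarian expresses meaning by adding suffixes to a root, so the same name takes many forms in running text. "Kovács Péter" is nominative, while "Kovács Péternek" is dative. A NER model has to link every form to the same person.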
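To make the linking problem concrete, here is a deliberately naive sketch that strips a case suffix from the last token of a name so that inflected forms map to one key. The suffix list and the function names `strip_case_suffix` and `normalize_name` are hypothetical and incomplete. Real Hungarian morphology (vowel harmony, assimilated forms such as "Péterrel", stem changes such as "Anna" becoming "Annának") needs a proper morphological analyzer, so treat this as an illustration rather than a working normalizer.

```python
# Hypothetical, deliberately incomplete list of Hungarian case suffixes.
CASE_SUFFIXES = (
    "nak", "nek",          # dative
    "ban", "ben",          # inessive
    "hoz", "hez", "höz",   # allative
    "tól", "től",          # ablative
    "ról", "ről",          # delative
    "nál", "nél",          # adessive
    "ból", "ből",          # elative
)


def strip_case_suffix(token: str, min_stem: int = 3) -> str:
    """Remove one known case suffix if the remaining stem is long enough."""
    lowered = token.lower()
    for suffix in CASE_SUFFIXES:
        if lowered.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def normalize_name(surface: str) -> str:
    """Strip a case suffix from the last token only.

    In a Hungarian full name (family name first), the case ending
    attaches to the final token, the given name.
    """
    parts = surface.split()
    if not parts:
        return surface
    parts[-1] = strip_case_suffix(parts[-1])
    return " ".join(parts)
```

With this sketch, `normalize_name("Kovács Péternek")` and `normalize_name("Kovács Péter")` both yield the same key, which is the minimum a detector needs in order to treat the two mentions as one person.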

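Name order: Hungarian puts the family name first. Many NLP models assume the given name comes first, and the reversed order causes missed detections.

Special characters: Hungarian uses ő and ű, which carry a double acute accent and are not the same characters as the German umlauts. Documents that mix Windows-1250 and UTF-8 encodings cause further failures.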
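The encoding problem can be sketched as follows. Byte 0xF5 is ő in Windows-1250 but õ in Latin-1, so text decoded with the wrong codec silently changes names. The helper below, with the hypothetical name `decode_hungarian`, assumes input is either UTF-8 or Windows-1250 and normalizes the result to NFC so that a decomposed "o" plus combining double acute compares equal to the precomposed ő. It is a heuristic, not a general encoding detector.

```python
import unicodedata


def decode_hungarian(raw: bytes) -> str:
    """Decode bytes assumed to be UTF-8 or Windows-1250, then normalize to NFC."""
    try:
        # Strict UTF-8 first: legacy bytes such as 0xF5 are invalid in UTF-8
        # and raise an error instead of decoding to the wrong character.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback for legacy Hungarian documents.
        text = raw.decode("cp1250", errors="replace")
    # NFC merges "o" + U+030B (combining double acute) into U+0151 (ő).
    return unicodedata.normalize("NFC", text)
```

Running the detector on the output of a step like this, rather than on raw bytes decoded with a single fixed codec, keeps "Erdős" spelled the same way regardless of which system exported the file.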

Together, these three factors account for most of the accuracy gap in the NAIH 2024 report.

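TAJ-szám: Hungary's Social Security Number

The TAJ-szám (Társadalombiztosítási Azonosító Jel) is a 9-digit number. It appears in medical records, payslips, social security files, and pension accounts.

Check digit: multiply digits 1 to 8 by the weights 3, 7, 3, 7, 3, 7, 3, 7 and sum the products. The remainder after dividing by 10 is the check digit, which is the ninth digit.

This algorithm is specific to Hungary. It is not the Luhn algorithm used in other countries.

According to the NAIH 2024 report, generic tools detect the TAJ-szám with only 61% accuracy. The 9-digit format looks like many of the reference numbers found in Hungarian documents, so without the checksum step false positives are common.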
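The checksum described above can be sketched as a filter on top of a plain 9-digit pattern match. The names `is_valid_taj` and `find_taj_numbers` are hypothetical. The sketch assumes the number is written as nine contiguous ASCII digits, so grouped forms such as "123 456 788" would need an extra normalization step.

```python
import re

# Weights for digits 1 to 8, as described in the text.
TAJ_WEIGHTS = (3, 7, 3, 7, 3, 7, 3, 7)

# Exactly nine ASCII digits, not embedded in a longer run of digits.
TAJ_CANDIDATE = re.compile(r"(?<![0-9])[0-9]{9}(?![0-9])")


def is_valid_taj(candidate: str) -> bool:
    """Return True if the 9-digit string passes the weighted modulo-10 check."""
    if not re.fullmatch(r"[0-9]{9}", candidate):
        return False
    digits = [int(ch) for ch in candidate]
    # zip stops after eight weights, so the check digit is not included in the sum.
    total = sum(weight * digit for weight, digit in zip(TAJ_WEIGHTS, digits))
    return total % 10 == digits[8]


def find_taj_numbers(text: str) -> list[str]:
    """Return 9-digit candidates in the text that also pass the checksum."""
    return [m.group() for m in TAJ_CANDIDATE.finditer(text) if is_valid_taj(m.group())]
```

For the digits 12345678 the weighted sum is 188, so the check digit is 8: `is_valid_taj("123456788")` passes while `is_valid_taj("123456789")` does not. A random 9-digit reference number passes a modulo-10 check about one time in ten, so the checksum cuts false positives sharply but context (nearby labels such as "TAJ") still helps.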

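Adóazonosító Jel: Hungary's Taxpayer Number

The adóazonosító jel is a 10-digit personal taxpayer identification number. The first digit is always 8. It is used in employment records, tax returns, and financial documents.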

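Check digit: multiply each of the first nine digits by its position, that is, by the weights 1, 2, 3, 4, 5, 6, 7, 8, 9, and sum the products. The remainder after dividing by 11 is the check digit, which is the tenth digit. A remainder of 10 cannot be written as a single digit, so numbers that would produce it are not issued.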
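A minimal validator sketch for the 10-digit adóazonosító jel follows. It assumes the position-weighted modulo-11 check, in which each of the first nine digits is multiplied by its position from 1 to 9 and the remainder of the sum divided by 11 must equal the tenth digit. The function name `is_valid_adoazonosito` is hypothetical, and the sample value in the note below is only an illustration that satisfies the checksum.

```python
import re


def is_valid_adoazonosito(candidate: str) -> bool:
    """Return True if the string is 10 digits, starts with 8, and passes the mod-11 check."""
    if not re.fullmatch(r"8[0-9]{9}", candidate):
        return False
    digits = [int(ch) for ch in candidate]
    # Positions 1..9 serve as the weights for the first nine digits.
    total = sum(position * digit for position, digit in enumerate(digits[:9], start=1))
    # A remainder of 10 can never equal a single digit, so such inputs are rejected.
    return total % 11 == digits[9]
```

For the sample value 8071592153 the weighted sum is 179, and 179 mod 11 is 3, which matches the final digit. Changing the last digit or the leading 8 makes the check fail.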

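NAIH enforcement cases show that this number is often missed in HR documents when the tool in use was configured for other languages.

For a comparison of these numbers across member states, see the guide to EU taxpayer numbers by country.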

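NAIH DPIA Requirements for AI Systems

The NAIH 2024 guidance requires a completed DPIA before any AI system that processes personal data is deployed. This is stricter than the general risk-based test in the GDPR. The DPIA must cover:

Data flows: training data, inputs, and outputs
Legal basis: documented for each processing activity
Language accuracy: required for languages that fall below the EU average
Human review: a mechanism for checking automated decisions

The DPIA must be updated every year and whenever the system is retrained.

For teams deploying AI tools on Hungarian-language data, the order is fixed: DPIA first, deployment second.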

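Minimum Technical Controls

Three controls form the baseline for NAIH compliance:

TAJ-szám detection (modulo-10 checksum): pattern matching alone is not enough
Adóazonosító jel detection (checksum validation): especially important for HR and finance documents
Hungarian NER (agglutination-aware): must handle ő, ű, and encoding variants

For a comparison of how Central European DPAs set technical requirements, see the BfDI Germany guide. For a similar language gap, see the Czech ÚOOÚ guide.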


