Updated for 2026

Hungary's data protection authority is NAIH (Nemzeti Adatvédelmi és Információszabadság Hatóság). Its 2024 report found that named entity recognition (NER) accuracy for Hungarian is only 67%, against an EU average of 82%. That gap creates real risk: tools built for English or German miss Hungarian identifiers at high rates.

Why Hungarian NER Scores Low

Three features of Hungarian break standard NLP models.

Agglutination: Hungarian attaches case suffixes to the end of a word, so the same name takes many surface forms. "Kovács Péter" in subject position becomes "Kovács Péternek" in the dative and "Kovács Pétert" as a direct object. A NER model must link all those forms to one person.
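One way to picture the linking step is a gazetteer lookup that strips a few common case endings. This is a minimal sketch, not a morphological analyser: the suffix list is partial, and `KNOWN_GIVEN_NAMES`, `base_form`, and `normalize_name` are hypothetical names introduced only for illustration.

```python
# A partial list of Hungarian case endings (dative, inessive, ablative,
# delative, allative, adessive, instrumental, sublative, accusative).
CASE_SUFFIXES = (
    "nak", "nek", "ban", "ben", "tól", "től", "ról", "ről",
    "hoz", "hez", "höz", "nál", "nél", "val", "vel", "ra", "re", "t",
)

# Hypothetical gazetteer of base forms; a real one would be far larger.
KNOWN_GIVEN_NAMES = {"Péter", "László", "Gábor"}


def base_form(token: str):
    """Return the gazetteer entry for a token, stripping one case suffix if needed."""
    if token in KNOWN_GIVEN_NAMES:
        return token
    # Try longer suffixes first so "nek" is tested before "t".
    for suffix in sorted(CASE_SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            if stem in KNOWN_GIVEN_NAMES:
                return stem
    return None


def normalize_name(name: str) -> str:
    """Map an inflected 'Family Given+suffix' mention back to its base form."""
    tokens = name.split()
    if not tokens:
        return name
    stem = base_form(tokens[-1])
    # Only the last token carries the case ending in a Hungarian name phrase.
    return " ".join(tokens[:-1] + [stem]) if stem else name


print(normalize_name("Kovács Péternek"))  # Kovács Péter
print(normalize_name("Kovács Pétert"))    # Kovács Péter
```

The sketch does not handle stems that change under suffixation (for example a final vowel that lengthens) or assimilated endings such as "Péterrel". Those cases are the reason a Hungarian-aware morphological component is needed rather than plain suffix stripping.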

Name order: Hungarian puts the family name first. Most NLP models expect given name first. That reversal causes missed detections.

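Special characters: Hungarian uses ő and ű, which carry a double acute accent. They are not the same letters as the German umlauts ö and ü, and they do not exist in Western European code pages. Mixed encoding (Windows-1250 vs UTF-8) also causes failures.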
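The encoding failure is easy to reproduce. The sketch below shows what happens when Windows-1250 bytes are read with the Western European code page, and one possible way to decode input defensively before it reaches a detector. The function name `decode_hungarian` and the UTF-8-then-Windows-1250 fallback order are assumptions for illustration, not a documented requirement.

```python
import unicodedata


def decode_hungarian(raw: bytes) -> str:
    """Decode bytes as UTF-8, fall back to Windows-1250, then normalise to NFC."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy Hungarian exports are often Windows-1250. A few byte values
        # are undefined in that code page, so replace them rather than raise.
        text = raw.decode("cp1250", errors="replace")
    # NFC folds "o" + combining double acute (U+030B) into the single letter ő.
    return unicodedata.normalize("NFC", text)


# Reading Windows-1250 bytes as Windows-1252 turns ő into õ (and ű into û).
garbled = "Erdős".encode("cp1250").decode("cp1252")
print(garbled)  # Erdõs

print(decode_hungarian("Erdős".encode("cp1250")))  # Erdős
print(decode_hungarian("Erdős".encode("utf-8")))   # Erdős
```

Normalising to one Unicode form matters because a name typed with a combining accent and the same name stored as a precomposed letter would otherwise fail an exact-match lookup.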

These three factors explain most of the accuracy gap in NAIH's 2024 report.

TAJ-Szám: Hungary's Social Security Number

The TAJ-szám (Társadalombiztosítási Azonosító Jel) is a 9-digit social security number. It appears in healthcare, payroll, social benefits, and pension records.

Checksum: Multiply digits 1 to 8 by the alternating weights 3, 7, 3, 7, 3, 7, 3, 7. Add the results. Take the sum modulo 10. The result must equal the ninth digit, which is the check digit.

This algorithm is unique to Hungary. It is not the same as the Luhn algorithm used in other countries.
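The TAJ-szám checksum can be sketched in a few lines. The function names are hypothetical, the sample values are made up for illustration, and accepting spaces or hyphens between digit groups is an assumption about how the number is written in documents.

```python
import re

TAJ_WEIGHTS = (3, 7, 3, 7, 3, 7, 3, 7)


def taj_check_digit(first_eight: str) -> int:
    """Weighted sum of the first eight digits, modulo 10."""
    return sum(int(d) * w for d, w in zip(first_eight, TAJ_WEIGHTS)) % 10


def is_valid_taj(value: str) -> bool:
    """True if the value has nine digits and the ninth matches the checksum."""
    digits = re.sub(r"[ -]", "", value)  # tolerate "123 456 788" or "123-456-788"
    if not re.fullmatch(r"\d{9}", digits):
        return False
    return taj_check_digit(digits[:8]) == int(digits[8])


# Worked example: 1*3 + 2*7 + 3*3 + 4*7 + 5*3 + 6*7 + 7*3 + 8*7 = 188, 188 % 10 = 8
print(taj_check_digit("12345678"))  # 8
print(is_valid_taj("123 456 788"))  # True
print(is_valid_taj("123456789"))    # False
```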

Generic tools detect TAJ-szám at only 61% accuracy, per the NAIH 2024 report. The 9-digit format looks like many other numbers in Hungarian documents. Without the checksum step, tools flag false positives and miss real ones.
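To illustrate why the checksum step matters, the sketch below scans text for 9-digit candidates and keeps only those that pass the checksum. The regular expression, the function name `find_taj`, and the sample text are assumptions for illustration; real documents may use other separators.

```python
import re

TAJ_WEIGHTS = (3, 7, 3, 7, 3, 7, 3, 7)

# Nine digits, optionally grouped in threes, not embedded in a longer digit run.
TAJ_CANDIDATE = re.compile(r"(?<!\d)\d{3}[ -]?\d{3}[ -]?\d{3}(?!\d)")


def find_taj(text: str):
    """Return (start, end, digits) for every candidate that passes the checksum."""
    hits = []
    for match in TAJ_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        expected = sum(int(d) * w for d, w in zip(digits[:8], TAJ_WEIGHTS)) % 10
        if expected == int(digits[8]):
            hits.append((match.start(), match.end(), digits))
    return hits


sample = "TAJ: 123 456 788, rendelés: 123456789"
print([digits for _, _, digits in find_taj(sample)])  # ['123456788']
```

The checksum removes most look-alike numbers but not all of them: about one in ten arbitrary 9-digit strings still passes, so nearby context such as a "TAJ" label remains a useful second signal.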

Adóazonosító Jel: Hungary's Tax ID

The adóazonosító jel is a 10-digit personal tax number. The first digit is always 8. It appears in employment records, tax filings, and financial documents.

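Checksum: Take digits 1 to 9. Multiply each digit by its position, so the weights are 1, 2, 3, 4, 5, 6, 7, 8, 9. Add the results. Take the sum modulo 11. The remainder must equal the tenth digit, which is the check digit. A remainder of 10 cannot be written as a single digit, so numbers that would produce it are not issued.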
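A minimal sketch of tax ID validation follows. It assumes the check-digit rule as set out in Act XX of 1996: each of the first nine digits is multiplied by its position (1 to 9), the products are summed, and the remainder modulo 11 must equal the tenth digit. Verify the rule against the current statute before relying on it. The function name and the sample value are illustrative only.

```python
import re


def is_valid_adoazonosito(value: str) -> bool:
    """True for a 10-digit string that starts with 8 and passes the mod-11 check."""
    if not re.fullmatch(r"8\d{9}", value):
        return False
    digits = [int(c) for c in value]
    # Positional weights 1..9 over the first nine digits.
    remainder = sum(pos * d for pos, d in enumerate(digits[:9], start=1)) % 11
    # A remainder of 10 can never equal a single digit, so it is rejected here too.
    return remainder == digits[9]


# Worked example: 8*1 + 0*2 + 7*3 + 1*4 + 5*5 + 9*6 + 2*7 + 1*8 + 5*9 = 179, 179 % 11 = 3
print(is_valid_adoazonosito("8071592153"))  # True
print(is_valid_adoazonosito("8071592154"))  # False
print(is_valid_adoazonosito("7071592153"))  # False, does not start with 8
```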

NAIH enforcement cases show this number is often missed in HR documents when tools are set up for other languages.

See our EU national tax ID guide for how these numbers compare across member states.

NAIH's DPIA Requirement for AI Systems

NAIH's 2024 guidance requires a completed DPIA before any AI system processes personal data. This is stricter than the general GDPR test, which requires a DPIA only where processing is likely to result in a high risk. The DPIA must cover:

Data flows — training data, inputs, and outputs
Legal basis — documented for each activity
Language accuracy — required for languages below the EU average
Human review — a way to check automated decisions

The DPIA must be updated each year when the system is retrained.

For teams deploying AI tools on Hungarian data, the order is fixed: DPIA first, then deployment.

Minimum Technical Controls

Three controls form the baseline for NAIH compliance:

TAJ-szám detection with modulo-10 checksum — pattern matching alone is not enough
Adóazonosító jel detection with checksum validation — critical for HR and finance
Hungarian NER with agglutination support — must handle ő, ű, and encoding variants

See our BfDI Germany guide to compare how Central European DPAs set technical requirements. For a similar language gap in Central Europe, see our Czech ÚOOÚ guide.

When This Approach Has Limits

Pairing Hungary's modulo-10 TAJ-szám checksum and the adóazonosító jel check with agglutination-aware NER is the right baseline, and that part of the approach is sound. Three limits are still worth stating plainly.

Multilingual detection accuracy varies, and Hungarian sits below the EU average. NAIH's own figures put Hungarian NER at 67 percent against an 82 percent EU average, and generic tools find the TAJ-szám at only 61 percent. Agglutination produces many suffixed forms of one name, family-name-first order confuses models that expect the reverse, and ő and ű are stored as different bytes in Windows-1250 and UTF-8. Each factor lowers recall, and the residual false-negative rate caps how much any masking step that follows can protect. The only honest number is one measured on held-out Hungarian documents, not a vendor default.
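Measuring on held-out documents can be as simple as comparing the detector's spans with hand-labelled ones. The sketch below assumes exact span matching and uses made-up spans; the `Span` layout and the function name are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (document id, start offset, end offset)


def recall_and_precision(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match recall and precision of predicted spans against gold labels."""
    true_positives = len(gold & predicted)
    recall = true_positives / len(gold) if gold else 1.0
    precision = true_positives / len(predicted) if predicted else 1.0
    return recall, precision


# Made-up labels: the detector misses one identifier and raises one false alarm.
gold = {("doc1", 5, 16), ("doc1", 40, 50), ("doc2", 0, 12)}
predicted = {("doc1", 5, 16), ("doc2", 0, 12), ("doc2", 30, 39)}

recall, precision = recall_and_precision(gold, predicted)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.67 precision=0.67
```

Recall is the figure that bounds the masking step: every missed span in the held-out set is an identifier that would have stayed in the output.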

The TAJ-szám and adóazonosító jel formats need configuration and held-out testing. The unique modulo-10 weights for the TAJ-szám and the separate weighting and leading-8 rule for the adóazonosító jel are specific to Hungary and are not the Luhn algorithm used elsewhere. Both 9-digit and 10-digit formats look like ordinary reference numbers in HR and finance documents, so without the correct checksum a tool flags false positives and drops real ones. Configure each rule explicitly and confirm it against data you set aside, not against assumptions carried over from other countries.

The tool supports compliance but does not constitute it. NAIH requires a completed DPIA before any AI system processes personal data, updated each year on retraining, with documented data flows, legal basis, and a human-review path. A detector cannot produce that DPIA or stand in for the human review NAIH mandates over automated decisions. Accurate detection is one technical measure; the controller still owns the DPIA, the legal basis, and accountability for the whole posture the authority audits.
