Last updated 2026-06-05


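Japan's My Number: A Guide to the Verhoeff Algorithm and APPI Compliance

63% of general-purpose tools fail to detect My Number (マイナンバー) in Japanese-language documents. My Number uses the Verhoeff algorithm, arguably the most complex national ID check-digit scheme in Asia. This article walks through the technical points that matter for compliance.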

June 5, 2026 · 8 min read
Japan PPC, My Number Verhoeff, Japanese language NER, APPI compliance, Japanese PII

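Japan's My Number: APPI and Verhoeff Validation

Japan's Personal Information Protection Commission (PPC) issued 45 enforcement decisions in 2024 and, in the same year, published Japan's first AI privacy guidance. PPC research found that 63% of general-purpose NLP tools fail to detect My Number (マイナンバー) in Japanese-language documents. If your team processes data about residents of Japan, that gap is a direct APPI compliance risk.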

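What My Number is

Japan assigns every resident a unique 12-digit identifier, the My Number (個人番号). It is part of the My Number system (マイナンバー制度), which covers tax, pensions, health insurance and disaster response. Under APPI the identifier is sensitive data, and collecting or sharing it requires a legal basis.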

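The Verhoeff validation problem

My Number uses the Verhoeff algorithm to generate its check digit. Verhoeff is a mathematical scheme that catches every single-digit error and every swap of two adjacent digits. It needs three lookup tables to run, which makes hand calculation impractical, so in practice it has to be implemented in code.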
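The three lookup tables mentioned above can be sketched in a few lines. This is a minimal illustration of the generic Verhoeff scheme, not a vetted My Number validator: the function names are hypothetical, and it assumes the check digit is the last digit of the string. How My Number orders its digits and positions its check digit is not stated in this article, so confirm those details against the official check-digit specification before running this on real values.

```python
# Generic Verhoeff check digit (sketch). Assumption: check digit is the LAST digit.

# D: multiplication table of the dihedral group D5.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# P: position-dependent permutation table (repeats every 8 positions).
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# INV: inverse of each element in D5.
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> int:
    """Return the digit to append to `payload` (digits only)."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        # Positions are shifted by one because the check digit will occupy position 0.
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return INV[c]


def verhoeff_is_valid(number: str) -> bool:
    """Return True if `number` (payload plus trailing check digit) passes the check."""
    if not (number.isascii() and number.isdigit()):
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0
```

A quick way to see the error-detection property: `verhoeff_is_valid("2363")` passes, while the adjacent swap `"2336"` and the single-digit change `"2364"` both fail.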

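This matters for two reasons. First, Japan's 12-digit format looks like many other codes: invoice reference numbers, document IDs and date strings all share the same shape, so without Verhoeff validation a tool will flag the wrong values as My Number. Second, most tools do not use Verhoeff. They use a simpler mod 10 or mod 11 check, and those methods do not work here.

PPC research found that 63% of tools either skip validation or use a simpler method, so both problems occur together: false positives and false negatives.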
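One way to keep both failure modes in check is to separate candidate extraction from checksum validation. The sketch below finds 12-digit runs and keeps only those that pass a caller-supplied validator. The function name, the optional space or hyphen between four-digit groups, and the NFKC folding of full-width digits are all assumptions made for illustration; they are not taken from any PPC specification.

```python
import re
import unicodedata
from typing import Callable, List

# 12 ASCII digits, optionally grouped 4-4-4 with a space or hyphen (assumed layout),
# and not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<![0-9])[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}(?![0-9])")


def find_candidates(text: str, is_valid: Callable[[str], bool]) -> List[str]:
    """Return 12-digit strings found in `text` that pass `is_valid`."""
    # NFKC folds full-width digits such as "１２３" to ASCII "123".
    normalized = unicodedata.normalize("NFKC", text)
    hits = []
    for match in CANDIDATE.finditer(normalized):
        digits = re.sub(r"[ -]", "", match.group())
        # Without this checksum step, every 12-digit invoice or document ID is a hit.
        if is_valid(digits):
            hits.append(digits)
    return hits
```

Passing the checksum function as a parameter keeps the pattern matching reusable and makes it easy to test the extractor with a stand-in validator.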

The Luhn algorithm used for credit cards is comparatively simple. My Number does not use Luhn, so tools built around Luhn do not work in this scenario.
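For contrast, here is a minimal sketch of the Luhn check. It is a single weighted digit sum modulo 10 with no lookup tables, which is why a Luhn-based detector cannot stand in for a different check-digit scheme. The function name is hypothetical.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if `number` (digits only) passes the Luhn mod 10 check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            # Double every second digit from the right; subtracting 9
            # is the same as summing the two digits of the product.
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```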

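Three writing systems, one name

Japanese text uses three writing systems side by side, and a tool has to support all of them.

**Hiragana (ひらがな):** used for grammar and native words; 46 basic characters.

**Katakana (カタカナ):** used for loanwords and foreign names; 46 basic characters. Names of foreign nationals in Japan are written in katakana.

**Kanji (漢字):** used for nouns and personal names; roughly 2,000 characters are in common use.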

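The same person's name can appear in four forms: kanji (田中太郎), hiragana (たなかたろう), katakana (タナカ タロウ) and romaji (Tanaka Taro). A tool has to match all four, or it will miss most of that person's records.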
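The kana half of this matching problem can be handled with plain Unicode operations. The sketch below folds hiragana, full-width katakana and half-width katakana to one comparison key. It deliberately does not cover kanji or romaji: mapping those to a reading needs a dictionary or a morphological analyzer, which is outside this example. The function names are hypothetical.

```python
import unicodedata


def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana (U+3041..U+3096) to the matching katakana code points."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def kana_key(name: str) -> str:
    """Build a comparison key that is the same for hiragana and katakana spellings."""
    # NFKC turns half-width katakana (ﾀﾅｶ) into full-width (タナカ)
    # and the ideographic space (U+3000) into a regular space.
    folded = unicodedata.normalize("NFKC", name)
    # Drop all whitespace so "タナカ タロウ" and "タナカタロウ" compare equal.
    folded = "".join(folded.split())
    return hiragana_to_katakana(folded)
```

With this key, `たなかたろう`, `タナカ タロウ` and `ﾀﾅｶ ﾀﾛｳ` all compare equal, while `田中太郎` passes through unchanged and still needs a reading lookup.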

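Other Japanese IDs to detect

**Driver's licence (運転免許証番号):** 12 digits. The first two digits identify the issuing prefecture: Tokyo is 30 and Osaka is 62. This lets a tool check whether the value is valid for that region.

**Passport (旅券番号):** two letters followed by seven digits, in ICAO format. Japan uses specific letter pairs.

**Health insurance card (健康保険証記号番号):** a symbol plus a number, with a format that varies by insurer. National Health Insurance (国民健康保険) and the Japan Health Insurance Association scheme (協会けんぽ) use different formats.

**Residence card (在留カード番号):** issued to foreign residents by the Ministry of Justice. The format is two letters, eight digits, two letters.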
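The fixed-shape formats above translate directly into patterns. This is a sketch only: the function name and labels are hypothetical, the prefix table is intentionally partial (only the Osaka code quoted in this article) and should be filled from an official list, and the passport pattern does not restrict the letter pair because the article does not list the pairs Japan uses. Health insurance numbers are left out because their format varies by insurer.

```python
import re
from typing import Optional

PASSPORT = re.compile(r"[A-Z]{2}[0-9]{7}")               # two letters + seven digits
RESIDENCE_CARD = re.compile(r"[A-Z]{2}[0-9]{8}[A-Z]{2}")  # 2 letters, 8 digits, 2 letters
DRIVERS_LICENCE = re.compile(r"[0-9]{12}")                # 12 digits

# Partial, illustrative table of licence prefixes; extend from an official source.
LICENCE_PREFIXES = {"62": "Osaka"}


def classify_id(value: str) -> Optional[str]:
    """Return a label for `value` if it matches one of the known shapes, else None."""
    v = value.strip().upper()
    if PASSPORT.fullmatch(v):
        return "passport"
    if RESIDENCE_CARD.fullmatch(v):
        return "residence_card"
    if DRIVERS_LICENCE.fullmatch(v):
        # A bare 12-digit value is ambiguous (it has the same shape as My Number),
        # so only label it a licence when the prefix is a known issuer code.
        return "drivers_licence" if v[:2] in LICENCE_PREFIXES else None
    return None
```

Shape matching alone produces candidates, not confirmed identifiers; surrounding context such as a nearby label is still needed to reduce false positives.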

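APPI's anonymization requirements

APPI sets a strict standard for anonymized data, which it calls "anonymously processed information" (匿名加工情報). In one key respect it asks for more than GDPR: a third party must be able to verify that the anonymization is technically irreversible.

To comply, an organization must:

  1. Remove all direct identifiers, including My Number.
  2. Address every combination of quasi-identifiers.
  3. Apply k-anonymity or a comparable method.
  4. Publish a summary of the steps taken.
  5. Never attempt to re-identify the data.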
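Steps 2 and 3 can be checked mechanically once the quasi-identifier columns are chosen. The sketch below measures the smallest group of records that share the same quasi-identifier values, which is the k in k-anonymity. The function names and the column names in the usage note are hypothetical, and the right value of k is a policy decision this example does not settle.

```python
from collections import Counter
from typing import Dict, List, Sequence


def smallest_group(records: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest set of records sharing one quasi-identifier combination."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    # An empty dataset has no groups; report 0 so it never passes a k >= 1 check.
    return min(counts.values()) if counts else 0


def is_k_anonymous(records: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination appears at least k times."""
    return smallest_group(records, quasi_identifiers) >= k
```

For example, with columns such as `age_band` and `prefecture`, a dataset where one combination appears only once has a smallest group of 1 and fails any check with k of 2 or more, so that record would need further generalization or suppression.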

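The PPC's 2024 AI guidance adds one specific rule: if anonymized data is used to train an AI model, that model must not be used to re-identify individuals. This directly prohibits model inversion attacks against training sets governed by APPI.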

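Meeting the PPC standard takes four capabilities: Verhoeff validation for My Number, Japanese NER with ja_core_news (including correct tokenization), name matching across kanji, kana and romaji, and prefecture-code checks for driver's licences.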

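India's Aadhaar number also requires Verhoeff validation; see the India DPDPA technical compliance guide for details. For cross-border identifier detection, see the guide to detecting EU national tax IDs under GDPR.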

