By · Last updated 2026-06-05


Japan's My Number: Verhoeff and APPI

63% of common tools fail to detect My Number in Japanese documents. My Number uses the Verhoeff algorithm, the most complex national ID checksum in Asia.

June 5, 2026 · 8 min read
Tags: Japan PPC, My Number Verhoeff, Japanese language NER, APPI compliance, Japanese PII

Japan's My Number: APPI and Verhoeff Validation

Japan's Personal Information Protection Commission (PPC) issued 45 enforcement decisions in 2024. It also published Japan's first AI privacy guidance. A PPC study found that 63% of common NLP tools fail to detect My Number (マイナンバー) in Japanese files. If your team handles data on residents of Japan, that gap is a direct APPI risk.

What My Number Is

Japan assigns every resident a unique 12-digit identifier. This is My Number, part of the Individual Number System (マイナンバー制度). It covers tax, pensions, health insurance, and disaster response. The identifier is sensitive data under APPI. You need a legal basis to collect or share it.

The Verhoeff Check Problem

My Number uses the Verhoeff algorithm for its check digit. Verhoeff is a checksum scheme that catches every single-digit error. It also catches every error in which two adjacent digits are swapped. It relies on three lookup tables. Working it out by hand is impractical, so it needs code.
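The three lookup tables mentioned above can be sketched as follows. This is a generic Verhoeff implementation, not code taken from an official My Number specification. It assumes, as this article states, that the last of the 12 digits is a Verhoeff check digit over the first 11; confirm that rule against the official specification before relying on it. The function names are hypothetical.

```python
# Generic Verhoeff checksum sketch. Table names D, P, INV are the
# conventional ones; function names are illustrative only.

# D: multiplication table of the dihedral group D5.
D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)

# P: position-dependent permutation table (period 8).
P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)

# INV: inverse of each element in D5.
INV = (0, 4, 3, 2, 1, 5, 6, 7, 8, 9)


def verhoeff_check_digit(payload: str) -> int:
    """Return the check digit to append to a string of ASCII digits."""
    c = 0
    # Process right to left; the check digit will occupy position 0.
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return INV[c]


def verhoeff_is_valid(number: str) -> bool:
    """Return True if the final digit is a valid Verhoeff check digit."""
    if not (number.isascii() and number.isdigit()):
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0
```

A detector would call `verhoeff_is_valid` only on strings that already have the expected length, since the checksum itself accepts any number of digits.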

This matters for two reasons. First, Japan's 12-digit format looks like many other codes. Invoice references, document IDs, and date sequences all share the same shape. Without the Verhoeff check, a tool will flag the wrong values. Second, many tools do not implement Verhoeff. They use a simpler modulo-10 or modulo-11 check. Those do not work here.
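A minimal sketch of the two-stage approach this implies: first find every 12-digit run, then keep only the runs that pass a checksum. The checksum is passed in as a function so the sketch stays independent of any one algorithm. The separators allowed between digit groups and the NFKC step (which folds full-width digits such as １２３ into ASCII) are assumptions about how numbers appear in real documents, and the names are hypothetical.

```python
import re
import unicodedata
from typing import Callable, List

# Twelve ASCII digits, optionally grouped 4-4-4 by a space or hyphen,
# and not embedded in a longer digit run.
CANDIDATE = re.compile(
    r"(?<![0-9])[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}(?![0-9])"
)


def find_candidates(text: str, is_valid: Callable[[str], bool]) -> List[str]:
    """Return 12-digit strings from text that pass the supplied check."""
    # NFKC maps full-width digits, spaces, and hyphens to ASCII forms.
    normalized = unicodedata.normalize("NFKC", text)
    hits = []
    for match in CANDIDATE.finditer(normalized):
        digits = re.sub(r"[ -]", "", match.group())
        if is_valid(digits):
            hits.append(digits)
    return hits
```

With a checker that accepts everything, invoice numbers and date-like sequences are returned alongside real identifiers, which is the false-positive problem described above. A strict checker removes most of them.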

The PPC study found that 63% of tools either skip the check or use a simpler method. Both problems then occur at once: false positives and false negatives.

The Luhn algorithm, used for credit cards, is simpler. My Number does not use Luhn. Tools built around Luhn will not work.

Three Scripts, One Name

Japanese text uses three writing systems at once. A tool must handle all three.

Hiragana (ひらがな): Used for grammar and native words. 46 basic characters.

Katakana (カタカナ): Used for foreign words and names. 46 basic characters. Foreign names in Japan appear in this script.

Kanji (漢字): Logographic characters for nouns and names. About 2,000 are in common use.

One person's name can appear in four forms: Kanji (田中太郎), Hiragana (たなかたろう), Katakana (タナカ タロウ), and Romaji (Tanaka Taro). A tool must match all four forms. If it misses one, it misses many of that person's records.
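Part of this matching can be done with the standard library alone. The sketch below folds Hiragana, Katakana, and half-width Katakana spellings into a single Katakana key. It does not cover Kanji or Romaji: those need a reading dictionary or a transliteration library, which is outside this example. The function name `kana_key` is hypothetical.

```python
import unicodedata

# Hiragana U+3041..U+3096 maps onto Katakana U+30A1..U+30F6
# by a fixed code point offset.
_HIRAGANA_START = 0x3041
_HIRAGANA_END = 0x3096
_KANA_OFFSET = 0x60


def kana_key(name: str) -> str:
    """Return a comparison key: Katakana only, no whitespace."""
    # NFKC turns half-width Katakana and full-width spaces into
    # their standard forms.
    text = unicodedata.normalize("NFKC", name)
    out = []
    for ch in text:
        if ch.isspace():
            continue
        cp = ord(ch)
        if _HIRAGANA_START <= cp <= _HIRAGANA_END:
            ch = chr(cp + _KANA_OFFSET)
        out.append(ch)
    return "".join(out)
```

Two records whose name fields produce the same key can then be treated as candidates for the same person. Kanji input passes through unchanged, so a Kanji spelling will only match another Kanji spelling.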

Other Japanese Identifiers to Detect

Driver's license (運転免許証番号): 12 digits. The first two digits indicate the prefecture that first issued the license. Tokyo is 30. Osaka is 62. This lets a tool check whether a value is plausible for that prefecture.

Passport (旅券番号): Two letters followed by seven digits. ICAO format. Japan uses specific letter pairs.

Health insurance card (健康保険証記号番号): A symbol plus a number. The format depends on the insurer. National Health Insurance (国民健康保険) and the Japan Health Insurance Association (協会けんぽ) use different formats.

Residence card (在留カード番号): For foreign residents. Two letters, eight digits, two letters. The Ministry of Justice issues this card.
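The fixed-shape identifiers above can be sketched as regular expressions. This is an illustration of the formats as described in this article, not an official grammar. The patterns use explicit ASCII lookarounds instead of `\b`, because Python treats Kanji and Kana as word characters and `\b` would miss a number written directly after Japanese text. The prefecture table is deliberately partial and the sample values in the usage note are invented.

```python
import re
from typing import List, Optional, Tuple

# Boundaries that ignore neighbouring Japanese characters.
_START = r"(?<![A-Za-z0-9])"
_END = r"(?![A-Za-z0-9])"

PATTERNS = {
    "drivers_license": re.compile(_START + r"[0-9]{12}" + _END),
    "passport": re.compile(_START + r"[A-Z]{2}[0-9]{7}" + _END),
    "residence_card": re.compile(_START + r"[A-Z]{2}[0-9]{8}[A-Z]{2}" + _END),
}

# Partial, illustrative table; fill it from the official code list.
KNOWN_PREFECTURE_CODES = {"62": "Osaka"}


def scan(text: str) -> List[Tuple[str, str]]:
    """Return (label, value) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def license_prefecture(number: str) -> Optional[str]:
    """Return the prefecture for a license number, or None if unknown."""
    return KNOWN_PREFECTURE_CODES.get(number[:2])
```

For example, `scan("旅券番号TK1234567")` reports a passport hit even though the value follows Kanji with no space. A 12-digit license number has the same shape as a My Number, so a real detector would need context words or a checksum to tell them apart.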

APPI's Anonymization Rule

APPI sets a strict standard for anonymized data, called anonymously processed information (匿名加工情報). It goes further than GDPR in one key area. Anonymization must be verifiable by a third party, not just asserted on technical grounds.

To comply, an organization must:

  1. Remove all direct identifiers, including My Number.
  2. Address all combinations of quasi-identifiers.
  3. Apply k-anonymity or a comparable technique.
  4. Publish a summary of the steps taken.
  5. Never attempt to re-identify the data.
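Step 3 can be checked mechanically. The sketch below tests whether every combination of quasi-identifier values occurs at least k times in a dataset. The field names and the choice of k in the usage note are assumptions for illustration; this article does not state a required value of k.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def group_sizes(
    records: Iterable[Dict[str, str]],
    quasi_identifiers: Sequence[str],
) -> Counter:
    """Count records per combination of quasi-identifier values."""
    return Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )


def is_k_anonymous(
    records: Iterable[Dict[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """True if every quasi-identifier combination covers >= k records."""
    counts = group_sizes(records, quasi_identifiers)
    # An empty dataset is treated as trivially k-anonymous.
    return all(size >= k for size in counts.values())


def violating_groups(
    records: Iterable[Dict[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> Dict[Tuple[str, ...], int]:
    """Return the combinations that fall below k, with their sizes."""
    counts = group_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in counts.items() if n < k}
```

Calling `violating_groups(rows, ["age_band", "prefecture"], 3)` on a hypothetical dataset lists the groups that still need generalization or suppression before release.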

The PPC's 2024 AI guidance adds a specific rule. If you train an AI model on anonymized data, you may not use that model to re-identify people. This is a direct ban on model inversion attacks against APPI training sets.

To meet PPC standards, you need four things. First, Verhoeff validation for My Number detection. Second, Japanese NER using ja_core_news with suitable tokenization. Third, name matching across Kanji, Kana, and Romaji. Fourth, prefecture code checks for driver's licenses.

India uses Aadhaar, which also requires Verhoeff validation. The technical compliance guide for India's DPDPA covers that in depth. For multi-country identifier detection, see national tax ID detection in the EU under GDPR.
