Punycode is an encoding (defined in RFC 3492) that converts Unicode domain names — with accents, non-Latin scripts or emoji — into the plain ASCII that the DNS can handle. The result is a label prefixed with xn--. It is the machinery that makes internationalized domain names work without changing the DNS itself.
The internet’s naming system was born speaking only basic English letters. Punycode is the elegant workaround that lets the whole world use its own scripts in domain names — a translation layer hiding in plain sight behind those cryptic xn-- strings.
What is Punycode?
Punycode is a way of writing Unicode characters using only the characters a domain name is allowed to contain: the letters a–z, the digits 0–9, and the hyphen. It is a reversible encoding — software can turn a Unicode label into Punycode and back again with no loss — so a name like café.com can travel through the ASCII-only DNS and still display correctly to a human.
In short: Punycode is an encoding (RFC 3492) that represents Unicode using the restricted ASCII set allowed in domain names. Internationalized labels become an ASCII form prefixed with xn--.
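The reversibility described above can be checked with a short sketch. It assumes Python's built-in "punycode" codec, which works on a single label and does not add the xn-- prefix; the variable names are only illustrative.

```python
# Round-trip one label through Python's built-in "punycode" codec.
# The codec handles a bare label; it does not add or expect "xn--".
label = "caf\u00e9"  # "café", written with an escape so é is a single code point

encoded = label.encode("punycode").decode("ascii")
decoded = encoded.encode("ascii").decode("punycode")

print(encoded)           # caf-dma
print(decoded == label)  # True
```

Prefixing the encoded label with xn-- gives xn--caf-dma, the form that appears in the DNS.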
Why does Punycode exist?
When the DNS was designed, host names were limited to a narrow ASCII set — the so-called LDH rule (Letters, Digits, Hyphen). That was fine for English but excluded most of the world’s languages. Rather than rebuild the global DNS to be Unicode-aware end to end, engineers chose a smarter path: keep the DNS exactly as it is, and encode non-ASCII names into the permitted characters at the edges. Punycode is that encoding, and the wider framework that uses it is called IDNA (Internationalized Domain Names for Applications).
How does it work?
The process happens automatically in your browser:
- You enter a Unicode domain, e.g. münchen.de.
- The browser splits it into labels and encodes each non-ASCII label with Punycode, producing an xn-- form such as xn--mnchen-3ya.de.
- That ASCII form is what is actually sent to the DNS to be resolved.
- For display, the browser decodes it back to the friendly Unicode name.
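The steps above can be sketched roughly as follows. This is a simplified illustration, not how any particular browser implements it: the helper names to_ascii and to_unicode are hypothetical, and the only preparation applied is NFC normalization plus lowercasing, whereas real IDNA processing involves further mapping and validation rules.

```python
import unicodedata

ACE_PREFIX = "xn--"  # marks a Punycode-encoded label


def to_ascii(hostname: str) -> str:
    """Encode each non-ASCII label with Punycode; leave ASCII labels as-is."""
    prepared = unicodedata.normalize("NFC", hostname).lower()  # simplified prep
    labels = []
    for label in prepared.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode(hostname: str) -> str:
    """Decode xn-- labels back to Unicode for display."""
    labels = []
    for label in hostname.split("."):
        if label.lower().startswith(ACE_PREFIX):
            body = label[len(ACE_PREFIX):]
            labels.append(body.encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


ascii_name = to_ascii("m\u00fcnchen.de")  # "münchen.de"
print(ascii_name)                          # xn--mnchen-3ya.de
print(to_unicode(ascii_name) == "m\u00fcnchen.de")  # True
```

Only the label containing ü is rewritten; the ASCII label de passes through unchanged.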
Punycode copies the label's basic ASCII characters first, adds a hyphen as a delimiter, and then appends compact data describing which Unicode characters to insert and where. That is how münchen becomes mnchen-3ya, and why the encoded strings look scrambled but are perfectly deterministic.
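To make that layout visible, the sketch below splits an encoded label at its last hyphen. It again assumes Python's built-in "punycode" codec; the variable names are illustrative.

```python
label = "m\u00fcnchen"  # "münchen"
encoded = label.encode("punycode").decode("ascii")

# The last hyphen separates the copied ASCII characters from the
# compact data that places the non-ASCII ones.
basic, delimiter, extended = encoded.rpartition("-")

print(basic)     # mnchen  (the ASCII characters, in their original order)
print(extended)  # 3ya     (data that puts ü back after the first character)

# A label with no ASCII characters has no basic part and no delimiter.
no_basic = "\u65e5\u672c\u8a9e".encode("punycode").decode("ascii")  # "日本語"
print("-" in no_basic)  # False
```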
Punycode examples
| Human-readable (Unicode) | Punycode (ASCII) form |
|---|---|
| münchen.de | xn--mnchen-3ya.de |
| café.fr | xn--caf-dma.fr |
| A name in non-Latin script | xn--… (always starts xn--) |
The constant is the xn-- prefix: whenever you see it at the start of a label, you are looking at the encoded form of an internationalized name. For the broader topic of non-Latin domains, see internationalized domain names.
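Spotting the prefix in code is a one-line check per label. The helper name encoded_labels below is hypothetical, and the check is deliberately minimal: it only looks at the prefix and does not verify that the rest of the label is valid Punycode.

```python
def encoded_labels(hostname: str) -> list[str]:
    """Return the labels of hostname that carry the xn-- prefix."""
    return [
        label
        for label in hostname.split(".")
        if label.lower().startswith("xn--")  # DNS names are case-insensitive
    ]


print(encoded_labels("www.xn--mnchen-3ya.de"))  # ['xn--mnchen-3ya']
print(encoded_labels("www.example.com"))        # []
```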
The security catch: homograph attacks
Punycode enables a genuine risk worth knowing about. Many characters in other scripts look identical to Latin letters — a Cyrillic “а” can be visually indistinguishable from a Latin “a.” A bad actor can register a name that looks like a famous brand but is encoded from different underlying characters, then use it for phishing. This is a homograph (or homoglyph) attack.
How browsers protect you
Modern browsers detect suspicious mixed-script labels and display the raw xn-- Punycode instead of the lookalike Unicode, so a fake “аpple.com” shows its true encoded form. If a familiar site suddenly appears as xn--…, treat it with suspicion.
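A crude version of a mixed-script check can be sketched as below. This is an assumption-laden illustration, not the policy any browser actually uses: it guesses a character's script from the first word of its Unicode name, and the helper names scripts and looks_mixed are hypothetical. Real browser policies are considerably more elaborate.

```python
import unicodedata


def scripts(label: str) -> set[str]:
    """Rough script guess: the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed(label: str) -> bool:
    """Flag labels whose letters appear to come from more than one script."""
    return len(scripts(label)) > 1


real = "apple"
fake = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A, not Latin "a"

print(real == fake)        # False, although the two look alike on screen
print(looks_mixed(real))   # False
print(looks_mixed(fake))   # True

# The raw encoded form that could be shown instead of the lookalike:
print("xn--" + fake.encode("punycode").decode("ascii"))
```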
Why Punycode matters
Punycode is the unsung enabler of a multilingual internet: it is what lets billions of people use domains in their own scripts while the underlying DNS stays simple and unchanged. Recognizing the xn-- prefix also makes you a safer browser, because it is a visible flag that a name is internationalized — and occasionally, that something is trying to impersonate a site you trust.
★ Key takeaways
- Punycode (RFC 3492) encodes Unicode domains into DNS-safe ASCII.
- Encoded labels always start with xn--.
- It lets the world use non-Latin scripts without changing the DNS itself.
- It can enable homograph lookalike attacks; browsers show xn-- to warn you.
Frequently asked questions
What is Punycode?
Punycode is an encoding that represents Unicode characters — accents, non-Latin scripts, even emoji — using only the ASCII letters, digits and hyphen that the DNS allows. It is how a domain written in, say, Arabic or with an accented é is stored and transmitted as plain ASCII.
What does the xn-- prefix mean?
The xn-- prefix marks an ASCII-compatible encoding of a Unicode label — it tells software “the rest of this label is Punycode, decode it back to Unicode for display.” So xn-- at the start of a label always signals an internationalized name.
Why is Punycode needed?
The DNS was designed to handle only a limited ASCII set — letters a–z, digits 0–9 and the hyphen. To allow domains in other scripts without rebuilding the DNS, Punycode encodes those characters into that permitted set, so existing infrastructure keeps working unchanged.
Do I ever type Punycode myself?
Almost never. You type the human-readable Unicode name and your browser converts it to Punycode behind the scenes. You mainly see the raw xn-- form in tools, certificates or when a browser shows it for security reasons.
Is Punycode a security risk?
It can enable “homograph” tricks, where characters from other scripts look identical to Latin letters and are used to mimic real domains. Browsers defend against this by showing the xn-- form when a name mixes scripts suspiciously, so users can spot the deception.
Can I convert Punycode back to readable text?
Yes — Punycode is fully reversible. Any xn-- label can be decoded back to its original Unicode characters with no loss, which is exactly how your browser turns the encoded form into the friendly name it displays. Online converters and developer tools can do the same translation in either direction.
Sources & further reading
- RFC 3492 — Punycode (the encoding specification)
- RFC 5890 — Internationalized Domain Names for Applications (IDNA)
- ICANN — Internationalized Domain Names
- Related: internationalized domains, what is DNS, domain hacks