`unicode/utf8`


Work with UTF-8-encoded strings at the rune level: counts, encoding, decoding, validation.

Counting and validating

len(s) returns the number of bytes in s, not the number of runes. Use utf8.RuneCountInString when you need the rune count.

s := "héllo"
fmt.Println(len(s))                          // 6 — bytes
fmt.Println(utf8.RuneCountInString(s))       // 5 — runes

Output

6
5
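As a further sketch, the program below puts three measurements side by side. The helper name runeCounts is hypothetical and not part of the package. Converting to []rune also yields the rune count, but it allocates a new slice. The second call shows a detail worth knowing: each invalid byte is counted as one rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeCounts (hypothetical helper) returns the byte length of s, its
// rune count via utf8, and the length of a []rune conversion, which
// also counts runes but allocates a new slice.
func runeCounts(s string) (bytes, runes, converted int) {
	return len(s), utf8.RuneCountInString(s), len([]rune(s))
}

func main() {
	b, r, c := runeCounts("héllo")
	fmt.Println(b, r, c) // 6 5 5

	// Each invalid byte counts as one rune (it decodes as U+FFFD).
	b, r, c = runeCounts("a\xffb")
	fmt.Println(b, r, c) // 3 3 3
}
```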

utf8.ValidString reports whether a string consists entirely of valid UTF-8-encoded runes.

utf8.ValidString("hello")       // true
utf8.ValidString("a\xffb")      // false
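Here is a minimal sketch of what to do once validation fails. It uses utf8.Valid (the []byte counterpart of ValidString) and strings.ToValidUTF8 (Go 1.13+), which replaces each run of invalid bytes with a replacement string. The sanitize helper and the choice of "?" as the replacement are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitize (hypothetical helper) replaces each run of invalid UTF-8
// bytes in s with replacement; valid strings are returned unchanged.
func sanitize(s, replacement string) string {
	if utf8.ValidString(s) {
		return s // fast path: nothing to fix
	}
	return strings.ToValidUTF8(s, replacement)
}

func main() {
	// utf8.Valid checks a byte slice instead of a string.
	fmt.Println(utf8.Valid([]byte("héllo"))) // true
	fmt.Println(utf8.Valid([]byte{0xff}))    // false

	fmt.Println(sanitize("a\xffb", "?")) // a?b
}
```

Consecutive invalid bytes collapse into a single replacement, so "a\xff\xfeb" also becomes "a?b".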

utf8.DecodeRuneInString returns the first rune in a string and its width in bytes. Use it for manual rune iteration; a for-range loop over a string does the same decoding for you.

s := "héllo"
for i := 0; i < len(s); {
    r, w := utf8.DecodeRuneInString(s[i:])
    fmt.Printf("%c at %d (%d bytes)\n", r, i, w)
    i += w
}

Output

h at 0 (1 bytes)
é at 1 (2 bytes)
l at 3 (1 bytes)
l at 4 (1 bytes)
o at 5 (1 bytes)
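The loop above only sees valid input. This sketch shows how DecodeRuneInString behaves otherwise. An invalid byte decodes as (utf8.RuneError, 1), so the loop skips one byte and continues. An empty string decodes as (utf8.RuneError, 0). The decodeAll helper is hypothetical. Checking the width alongside RuneError matters because a correctly encoded U+FFFD in the input decodes with width 3.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll (hypothetical helper) walks s with DecodeRuneInString and
// records each rune and width, counting steps that hit invalid bytes.
func decodeAll(s string) (runes []rune, widths []int, invalid int) {
	for i := 0; i < len(s); {
		r, w := utf8.DecodeRuneInString(s[i:])
		// (RuneError, 1) means an invalid byte; a real U+FFFD has width 3.
		if r == utf8.RuneError && w == 1 {
			invalid++
		}
		runes = append(runes, r)
		widths = append(widths, w)
		i += w
	}
	return runes, widths, invalid
}

func main() {
	runes, widths, invalid := decodeAll("a\xffb")
	fmt.Printf("%U %v %d\n", runes, widths, invalid)
	// [U+0061 U+FFFD U+0062] [1 1 1] 1

	r, w := utf8.DecodeRuneInString("")
	fmt.Println(r == utf8.RuneError, w) // true 0
}
```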

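utf8.EncodeRune writes the UTF-8 encoding of a rune into a byte slice, which must be large enough, and returns the number of bytes written. Four bytes (utf8.UTFMax) always suffice.

buf := make([]byte, 4)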
n := utf8.EncodeRune(buf, '🎉')
fmt.Printf("%d bytes: % x\n", n, buf[:n])

Output

4 bytes: f0 9f 8e 89
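Two related helpers avoid sizing a buffer by hand. utf8.RuneLen reports how many bytes a rune needs. utf8.AppendRune (Go 1.18+) grows a slice as needed. The sketch below assumes Go 1.18 or later. The encodeAll helper is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeAll (hypothetical helper) appends the UTF-8 encoding of each
// rune to a byte slice; AppendRune grows the slice, so no fixed-size
// buffer is needed.
func encodeAll(runes []rune) []byte {
	var out []byte
	for _, r := range runes {
		out = utf8.AppendRune(out, r)
	}
	return out
}

func main() {
	// RuneLen reports how many bytes EncodeRune would write.
	// Prints h: 1, é: 2, €: 3, 🎉: 4, one per line.
	for _, r := range []rune{'h', 'é', '€', '🎉'} {
		fmt.Printf("%c: %d\n", r, utf8.RuneLen(r))
	}

	fmt.Printf("% x\n", encodeAll([]rune{'h', '🎉'})) // 68 f0 9f 8e 89
}
```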

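A plain for-range loop over a string already decodes UTF-8, so the utf8 package isn't needed. The index i is the byte offset where each rune starts, which is why it jumps from 1 to 3 after the two-byte é.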

for i, r := range "héllo" {
    fmt.Printf("%d: %c\n", i, r)
}

Output

0: h
1: é
3: l
4: l
5: o
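To check that a range loop and a manual DecodeRuneInString loop visit the same (offset, rune) pairs, the sketch below collects both and compares them. That includes invalid input: both yield U+FFFD and advance a single byte. The step type and the viaRange and viaDecode helpers are hypothetical names used only for this comparison.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// step (hypothetical) records one iteration: byte offset and rune.
type step struct {
	i int
	r rune
}

// viaRange collects (offset, rune) pairs with a for-range loop.
func viaRange(s string) []step {
	var out []step
	for i, r := range s {
		out = append(out, step{i, r})
	}
	return out
}

// viaDecode collects the same pairs with utf8.DecodeRuneInString.
func viaDecode(s string) []step {
	var out []step
	for i := 0; i < len(s); {
		r, w := utf8.DecodeRuneInString(s[i:])
		out = append(out, step{i, r})
		i += w
	}
	return out
}

func main() {
	for _, s := range []string{"héllo", "a\xffb"} {
		fmt.Println(viaRange(s))
		fmt.Println(viaDecode(s))
	}
}
```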