🧽 Remove non-alphanumeric characters from a string in Go

May 19, 2022 strings regex


To remove all non-alphanumeric characters from a string in Go, a straightforward way is to use a regular expression that matches those characters and replace every match with an empty string. Such a regex is defined as the negation of the set of allowed characters. For the English alphabet, it would be:

[^a-zA-Z0-9 ]+

In this regular expression, the caret placed right after the opening square bracket, [^, negates the character class, so the class matches any character that is not a lowercase or uppercase letter in the range a-z, a digit in the range 0-9, or a space. The trailing + extends each match to a run of one or more such characters. “Not a letter, not a number, not a space” is our definition of a non-alphanumeric character.
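As a quick check of what this character class covers, here is a minimal sketch that tests single characters against the pattern. The names asciiNonAlnum and wouldBeRemoved are hypothetical, chosen only for this illustration.

```go
package main

import (
    "fmt"
    "regexp"
)

// asciiNonAlnum is the pattern from the text: one or more characters that are
// not an ASCII letter, an ASCII digit, or a space.
var asciiNonAlnum = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)

// wouldBeRemoved is a hypothetical helper. It reports whether s contains at
// least one character that the pattern matches.
func wouldBeRemoved(s string) bool {
    return asciiNonAlnum.MatchString(s)
}

func main() {
    for _, s := range []string{"a", "Z", "7", " ", "@", ".", "_"} {
        fmt.Printf("%q matched: %t\n", s, wouldBeRemoved(s))
    }
}
```

The letters, the digit, and the space report false, while "@", "." and "_" report true.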

However, this regular expression does not work properly for alphabets other than English, because it treats every letter outside a-z and A-Z as non-alphanumeric. In that case, instead of manually defining the ranges of letters and numbers, we can use the Unicode categories \p{L} for letters and \p{N} for numbers:

[^\p{L}\p{N} ]+

This regular expression matches any character that is not a Unicode letter, a Unicode number, or a space.
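One detail worth knowing is that \p{N} covers every Unicode number, not only decimal digits, so characters such as ½ are kept as well. The sketch below compares the pattern from the text with a stricter variant based on \p{Nd} (decimal digits only). The stricter variant and the function names are assumptions added for illustration, not part of the original example.

```go
package main

import (
    "fmt"
    "regexp"
)

var (
    // unicodeNonAlnum is the pattern from the text.
    unicodeNonAlnum = regexp.MustCompile(`[^\p{L}\p{N} ]+`)
    // decimalOnly is a hypothetical stricter variant: \p{Nd} matches
    // decimal digits only, so fractions and similar numbers are removed.
    decimalOnly = regexp.MustCompile(`[^\p{L}\p{Nd} ]+`)
)

// keepLettersAndNumbers removes everything except letters, numbers and spaces.
func keepLettersAndNumbers(s string) string {
    return unicodeNonAlnum.ReplaceAllString(s, "")
}

// keepLettersAndDigits removes everything except letters, decimal digits and spaces.
func keepLettersAndDigits(s string) string {
    return decimalOnly.ReplaceAllString(s, "")
}

func main() {
    s := "\u0105\u0666\u00bd!" // ą, Arabic-Indic digit six, ½, !
    fmt.Printf("%q\n", keepLettersAndNumbers(s)) // keeps ą, the digit and ½
    fmt.Printf("%q\n", keepLettersAndDigits(s))  // keeps ą and the digit only
}
```

Which of the two is appropriate depends on whether characters like ½ should count as alphanumeric in your data.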

To compare these regular expressions and the results of functions that remove non-alphanumeric characters, see the following two examples.

Remove all non-alphanumeric characters for English alphabet strings

This is a classic example of removing non-alphanumeric characters from a string. First, we compile the regular expression that matches any character other than an English letter, a digit, or a space. Then, we use the Regexp.ReplaceAllString() method to replace every match with the empty string "". Look at the output and notice that this method also removes the non-English letters (ـا, ą) and the non-ASCII digit (٦).

package main

import (
    "fmt"
    "regexp"
)

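// nonAlphanumericRegex matches runs of characters that are not
// English letters, digits, or spaces. It is compiled once at startup.
var nonAlphanumericRegex = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)

// clearString returns str with every match of the pattern removed.
func clearString(str string) string {
    return nonAlphanumericRegex.ReplaceAllString(str, "")
}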

func main() {
    str := "Test@%String#321gosamples.dev ـا ą ٦"
    fmt.Println(clearString(str))
}

Output:

TestString321gosamplesdev
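Because the pattern keeps spaces, the three spaces that separated the removed characters are still present at the end of this result, even though they are invisible in the printed output. The following sketch makes them visible with the %q verb and shows one possible way to drop them with strings.TrimSpace. The clearAndTrim function is a hypothetical variant of clearString, not part of the original example.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

var nonAlphanumericRegex = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)

// clearAndTrim is a hypothetical variant of clearString: it removes the
// non-alphanumeric characters and then trims leading and trailing whitespace.
func clearAndTrim(str string) string {
    return strings.TrimSpace(nonAlphanumericRegex.ReplaceAllString(str, ""))
}

func main() {
    str := "Test@%String#321gosamples.dev ـا ą ٦"
    // %q quotes the string, so trailing spaces become visible.
    fmt.Printf("%q\n", nonAlphanumericRegex.ReplaceAllString(str, ""))
    fmt.Printf("%q\n", clearAndTrim(str))
}
```

Note that strings.TrimSpace only affects the ends of the string; spaces between words are left as they are.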

Remove all non-alphanumeric characters for non-English alphabet strings

This example works like the previous one, but by using a regular expression with Unicode categories, we also accept letters and numbers from alphabets other than English, such as Arabic.

package main

import (
    "fmt"
    "regexp"
)

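// nonAlphanumericRegex matches runs of characters that are not
// Unicode letters (\p{L}), Unicode numbers (\p{N}), or spaces.
var nonAlphanumericRegex = regexp.MustCompile(`[^\p{L}\p{N} ]+`)

// clearString returns str with every match of the pattern removed.
func clearString(str string) string {
    return nonAlphanumericRegex.ReplaceAllString(str, "")
}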

func main() {
    str := "Test@%String#321gosamples.dev ـا ą ٦"
    fmt.Println(clearString(str))
}

Output:

TestString321gosamplesdev ـا ą ٦
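For comparison, the same filtering can also be sketched without a regular expression, using strings.Map together with unicode.IsLetter and unicode.IsNumber, which correspond to the categories L and N. This is an alternative technique swapped in for illustration, and clearStringMap is a hypothetical name. For the sample string it should produce the same result as the regex version above.

```go
package main

import (
    "fmt"
    "strings"
    "unicode"
)

// clearStringMap is a hypothetical regex-free counterpart of clearString.
// It keeps Unicode letters, Unicode numbers and spaces, and drops the rest.
func clearStringMap(str string) string {
    return strings.Map(func(r rune) rune {
        if unicode.IsLetter(r) || unicode.IsNumber(r) || r == ' ' {
            return r
        }
        return -1 // a negative value tells strings.Map to drop the character
    }, str)
}

func main() {
    str := "Test@%String#321gosamples.dev ـا ą ٦"
    fmt.Println(clearStringMap(str))
}
```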
