🧽 Remove non-alphanumeric characters from a string in Go

To remove all non-alphanumeric characters from a string in Go, it is best to use a regular expression that matches them and replace every match with an empty string. Such a regex is defined as the negation of the set of allowed characters. For the English alphabet, it is:

[^a-zA-Z0-9 ]+

In this regular expression, the caret ^ placed right after the opening square bracket ([^) negates the character class: the class matches any character that is not a lowercase letter a-z, an uppercase letter A-Z, a digit 0-9, or a space. The trailing + makes the pattern match a run of one or more such characters, so each run is replaced in a single step. “Not a letter, not a digit, not a space” is our definition of a non-alphanumeric character.

However, this regular expression does not work properly with alphabets other than English: it treats letters such as ą, and digits from other scripts, as non-alphanumeric and removes them. In that case, instead of defining the ranges of letters and digits manually, we can use Unicode character classes in the regex: \p{L} for letters and \p{N} for numbers:

[^\p{L}\p{N} ]+

This regular expression matches any character that is not a Unicode letter, a Unicode number, or a space.
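If you prefer to avoid regular expressions, roughly the same filter can be sketched with strings.Map and the unicode package, where unicode.IsLetter checks the Unicode category L and unicode.IsNumber checks the category N. This is a swapped-in, regex-free technique rather than the method this article uses, and keepLettersDigitsSpaces is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepLettersDigitsSpaces is a hypothetical, regex-free sketch that keeps
// the same characters that [^\p{L}\p{N} ]+ would leave behind:
// Unicode letters (category L), Unicode numbers (category N), and spaces.
func keepLettersDigitsSpaces(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsNumber(r) || r == ' ' {
			return r // keep the rune unchanged
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Println(keepLettersDigitsSpaces("Test@%String#321gosamples.dev ـا ą ٦"))
}
```

A regex keeps the rule declarative and easy to change, while strings.Map avoids compiling a pattern; both work rune by rune on UTF-8 input.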

The following two examples compare both regular expressions by applying them to the same input string.

Remove all non-alphanumeric characters for English alphabet strings

This is a classic example of removing non-alphanumeric characters from a string. First, we compile a regular expression that matches any character other than an English letter, a digit, or a space. Then, we use the Regexp.ReplaceAllString() method to replace every match with the empty string "". Notice in the output that this approach also removes non-English letters (ـا, ą) and non-ASCII digits (٦).

package main

import (
    "fmt"
    "regexp"
)

// Matches runs of one or more characters that are not ASCII letters, digits, or spaces.
var nonAlphanumericRegex = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)

func clearString(str string) string {
    return nonAlphanumericRegex.ReplaceAllString(str, "")
}

func main() {
    str := "Test@%String#321gosamples.dev ـا ą ٦"
    fmt.Println(clearString(str))
}

Output:

TestString321gosamplesdev

Remove all non-alphanumeric characters for non-English alphabet strings

This example works like the previous one, but the regular expression uses Unicode character classes, so it also keeps letters and numbers from alphabets other than English, such as the Arabic ـا and ٦ or the Polish ą.

package main

import (
    "fmt"
    "regexp"
)

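// Matches runs of one or more characters that are not Unicode letters (\p{L}), Unicode numbers (\p{N}), or spaces.
var nonAlphanumericRegex = regexp.MustCompile(`[^\p{L}\p{N} ]+`)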

func clearString(str string) string {
    return nonAlphanumericRegex.ReplaceAllString(str, "")
}

func main() {
    str := "Test@%String#321gosamples.dev ـا ą ٦"
    fmt.Println(clearString(str))
}

Output:

TestString321gosamplesdev ـا ą ٦
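Keep in mind that \p{N} covers every Unicode number, not only decimal digits, so characters such as the superscript ² (U+00B2, category No) are kept. Also, \p{L} does not include combining marks (\p{M}), so an accent written as a separate combining character, as in a decomposed é ("e" followed by U+0301), is removed. The sketch below compares the article's pattern with two hypothetical variants, [^\p{L}\p{Nd} ]+ and [^\p{L}\p{M}\p{N} ]+; which one fits depends on your input.

```go
package main

import (
	"fmt"
	"regexp"
)

// The Unicode-aware pattern from the example above.
var nonAlphanumericRegex = regexp.MustCompile(`[^\p{L}\p{N} ]+`)

// Hypothetical variants for illustration only:
// digitsOnlyRegex accepts only decimal digits (category Nd) as numbers, and
// withMarksRegex additionally keeps combining marks (category M).
var digitsOnlyRegex = regexp.MustCompile(`[^\p{L}\p{Nd} ]+`)
var withMarksRegex = regexp.MustCompile(`[^\p{L}\p{M}\p{N} ]+`)

func main() {
	// "x\u00b2" is x followed by SUPERSCRIPT TWO; "e\u0301" is a decomposed é.
	for _, s := range []string{"x\u00b2", "e\u0301"} {
		fmt.Printf("%q %q %q\n",
			nonAlphanumericRegex.ReplaceAllString(s, ""),
			digitsOnlyRegex.ReplaceAllString(s, ""),
			withMarksRegex.ReplaceAllString(s, ""))
	}
}
```

If your input may contain decomposed characters, normalizing it to a composed form first is another option, but that requires a package outside the standard library.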
