To remove all non-alphanumeric characters from a string in Go, a straightforward approach is to use a regular expression that matches non-alphanumeric characters and replace every match with an empty string. Such a regex is defined as the negation of the set of allowed characters. For the English alphabet, it would be:
[^a-zA-Z0-9 ]+
In this regular expression, the caret placed right after the opening square bracket ([^) negates the character class, so the class matches any character other than a lowercase or uppercase letter in the range a-z, a digit, or a space character. The trailing + makes the pattern match one or more such characters in a row. “Not a letter, not a number, not a space” is our definition of a non-alphanumeric character.
However, when working with alphabets other than English, such a regular expression will not work properly, because it treats every letter and digit outside the a-z, A-Z, and 0-9 ranges as non-alphanumeric. In that case, instead of manually defining the ranges of letters and numbers, we can use the Unicode categories \p{L} for letters and \p{N} for numbers:
[^\p{L}\p{N} ]+
This regular expression matches any character that is not a Unicode letter, a Unicode number, or a space character.
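To see the difference between the two character classes before looking at full strings, here is a minimal sketch that tests a few one-character strings against both patterns. The variable names asciiOnly and unicodeAware are illustrative and not part of the examples in this article; a result of true means the pattern treats the character as non-alphanumeric.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns come from the text above; the variable names are illustrative.
var (
	asciiOnly    = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)
	unicodeAware = regexp.MustCompile(`[^\p{L}\p{N} ]+`)
)

func main() {
	// true means "this pattern would remove the character".
	for _, s := range []string{"a", "7", " ", "#", "ą", "٦"} {
		fmt.Printf("%q ascii=%t unicode=%t\n",
			s, asciiOnly.MatchString(s), unicodeAware.MatchString(s))
	}
}
```

Only the non-English letter and the Arabic-Indic digit should be classified differently by the two patterns; the ASCII letter, digit, space, and # get the same verdict from both.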
To compare these regular expressions and the results of functions that remove non-alphanumeric characters, see the following two examples.
Remove all non-alphanumeric characters for English alphabet strings
This is a classic example of removing non-alphanumeric characters from a string. First, we compile our regular expression that matches any character other than an English letter, number, or space. Then, we use the Regexp.ReplaceAllString() method to replace the matched non-alphanumeric characters with the empty string "". Look at the output and notice that this method also removes the non-English letters (ـا, ą) and the non-ASCII number (٦).
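package main

import (
	"fmt"
	"regexp"
)

// Matches every run of characters other than English letters, ASCII digits, and spaces.
var nonAlphanumericRegex = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)

// clearString removes every match of nonAlphanumericRegex from str.
func clearString(str string) string {
	return nonAlphanumericRegex.ReplaceAllString(str, "")
}

func main() {
	str := "Test@%String#321gosamples.dev ـا ą ٦"
	fmt.Println(clearString(str))
}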
Output:
TestString321gosamplesdev
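One detail is easy to miss in the printed output: the pattern keeps space characters, so the spaces that separated the removed non-English tokens remain at the end of the result. The sketch below prints the result with the %q verb to make them visible. The normalizeSpaces helper is hypothetical and only one possible way to tidy the result, assuming that collapsing repeated whitespace is acceptable for your input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var nonAlphanumericRegex = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)

// clearString is the same function as in the example above.
func clearString(str string) string {
	return nonAlphanumericRegex.ReplaceAllString(str, "")
}

// normalizeSpaces is a hypothetical helper: it trims both ends and
// collapses every run of whitespace into a single space.
func normalizeSpaces(str string) string {
	return strings.Join(strings.Fields(str), " ")
}

func main() {
	cleaned := clearString("Test@%String#321gosamples.dev ـا ą ٦")
	fmt.Printf("%q\n", cleaned)                  // quoted, so trailing spaces are visible
	fmt.Printf("%q\n", normalizeSpaces(cleaned)) // trailing spaces removed
}
```

If the spaces should not survive at all, dropping the space from the character class ([^a-zA-Z0-9]+) removes them together with the other characters.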
Remove all non-alphanumeric characters for non-English alphabet strings
This example works like the previous one, but by using a regular expression with Unicode categories, we also accept letters and numbers from alphabets other than English, such as Arabic.
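package main

import (
	"fmt"
	"regexp"
)

// Matches every run of characters other than Unicode letters, Unicode numbers, and spaces.
var nonAlphanumericRegex = regexp.MustCompile(`[^\p{L}\p{N} ]+`)

// clearString removes every match of nonAlphanumericRegex from str.
func clearString(str string) string {
	return nonAlphanumericRegex.ReplaceAllString(str, "")
}

func main() {
	str := "Test@%String#321gosamples.dev ـا ą ٦"
	fmt.Println(clearString(str))
}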
Output:
TestString321gosamplesdev ـا ą ٦
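The same Unicode-aware filtering can also be written without a regular expression. The following is a minimal sketch, not the method used in the examples above: it relies on strings.Map, which drops a character when the mapping function returns a negative value, and on unicode.IsLetter and unicode.IsNumber, which test the L and N categories. The function name clearStringMap is made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// clearStringMap (hypothetical name) keeps Unicode letters, Unicode numbers,
// and the space character, and drops everything else.
func clearStringMap(str string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsNumber(r) || r == ' ' {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, str)
}

func main() {
	str := "Test@%String#321gosamples.dev ـا ą ٦"
	fmt.Println(clearStringMap(str))
}
```

Whether this variant is preferable to the regex depends on the use case; the regex keeps the set of allowed characters in one readable pattern, while the mapping function makes it easy to add custom per-character rules.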