When working with external files or user input, it is often a good idea to remove invisible characters that can cause problems. These characters are “non-printable”: they do not produce a visible mark when printed and fall under the Other or Separator category in the Unicode standard. Examples of non-printable characters are:
- Whitespace characters (except the ASCII space character)
- Tabs
- Line breaks
- Carriage returns
- Control characters
To remove non-printable characters from a string in Go, iterate over the string and check whether each rune is printable using the unicode.IsPrint() function. If it is not, skip the rune; otherwise, append it to the new string.
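The loop described above could be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name removeNonPrintableLoop and the sample input are assumptions made for this example, and strings.Builder is just one reasonable way to assemble the result.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removeNonPrintableLoop is a hypothetical helper name used for illustration.
// It keeps only the runes for which unicode.IsPrint reports true.
func removeNonPrintableLoop(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // pre-size the buffer for the common case
	for _, r := range s {
		if unicode.IsPrint(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	input := "Hello,\tWorld!\r\n" // sample input, chosen for illustration
	fmt.Printf("%q\n", removeNonPrintableLoop(input))
}
```

Ranging over a string yields runes rather than bytes, so multi-byte characters are checked as a whole.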
Instead of iterating and manually building a new string in a for loop, you can use strings.Map(), which returns a copy of the string with all characters modified according to the mapping function. The best part is that a character is dropped if the mapping function returns a negative value for it. So, we can return -1 for a non-printable character and the unmodified rune if unicode.IsPrint() returns true. See the following example:
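A sketch of the strings.Map() approach might look like this. The helper name removeNonPrintable and the input string are assumptions: the input is the word "behind" with a no-break space (U+00A0), a zero width space (U+200B), and a NUL byte mixed in, chosen so that the byte lengths before and after cleaning are 12 and 6.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removeNonPrintable is a hypothetical helper name used for illustration.
// Returning -1 from the mapping function makes strings.Map drop the rune.
func removeNonPrintable(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsPrint(r) {
			return r
		}
		return -1
	}, s)
}

func main() {
	// Assumed input: "behind" with U+00A0, U+200B, and a NUL byte inserted.
	text := "b\u00a0e\u200bhi\x00nd"
	fmt.Println(text)
	fmt.Println(len(text)) // length in bytes, not runes

	fmt.Println("---")

	clean := removeNonPrintable(text)
	fmt.Println(clean)
	fmt.Println(len(clean))
}
```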
Output
b eβhind
12
---
behind
6
The unicode.IsPrint() function returns true for:
- letters
- marks
- numbers
- punctuation
- symbols
- the ASCII space character
There is also a function, unicode.IsGraphic(), that works almost the same, except that it also returns true for all space characters in the Zs category of the Unicode standard.