When working with external files or user input, it is often a good idea to remove invisible characters that can cause problems. These characters are “non-printable”: they have no visible glyph of their own and fall under the Other or Separator categories of the Unicode standard. Examples of non-printable characters include:
- Whitespace characters (except the ASCII space character)
- Tabs
- Line breaks
- Carriage returns
- Control characters
To remove non-printable characters from a string in Go, iterate over the string and check whether each rune is printable using the unicode.IsPrint() function. If it is not, skip the rune; otherwise, append it to the new string.
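The manual loop described above could be sketched roughly as follows. The helper name removeNonPrintableLoop and the sample input are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removeNonPrintableLoop is a hypothetical helper name used for illustration.
// It builds a new string that contains only the printable runes of s.
func removeNonPrintableLoop(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // the result is never longer than the input
	for _, r := range s {
		if unicode.IsPrint(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// The tab, carriage return, and line feed are control characters,
	// so unicode.IsPrint reports false for them and they are skipped.
	fmt.Printf("%q\n", removeNonPrintableLoop("go\tlang\r\n")) // prints "golang"
}
```

Using strings.Builder avoids allocating a new string on every append, which repeated concatenation with += would do.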
Instead of iterating and manually building a new string in a for loop, you can use strings.Map(), which returns a copy of the string with all characters modified according to the mapping function. Conveniently, a character is dropped when the mapping function returns a negative value for it. So, we can return -1 for a non-printable character and the unmodified rune when unicode.IsPrint() returns true. See the following example:
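A minimal sketch of the strings.Map() approach might look like this. The helper name removeNonPrintable and the input string are assumptions made for illustration: the input places an EN QUAD (U+2000) and a ZERO WIDTH SPACE (U+200B) inside the word "behind", which is not necessarily the input of the original example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removeNonPrintable is a hypothetical helper name used for illustration.
// It returns a copy of s without the runes that unicode.IsPrint rejects.
func removeNonPrintable(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsPrint(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	// Assumed input: U+2000 (EN QUAD) and U+200B (ZERO WIDTH SPACE)
	// each take 3 bytes in UTF-8, so the string is 12 bytes long.
	text := "b\u2000e\u200bhind"
	fmt.Println(text)
	fmt.Println(len(text)) // len counts bytes, not runes
	fmt.Println("---")

	cleaned := removeNonPrintable(text)
	fmt.Println(cleaned)
	fmt.Println(len(cleaned))
}
```

Note that len() reports the length in bytes, so the difference between the two lengths reflects the UTF-8 size of the removed characters, not their count.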
Output
b ehind
12
---
behind
6
The unicode.IsPrint() function returns true for:
- letters
- marks
- numbers
- punctuation
- symbols
- the ASCII space character
There is also the unicode.IsGraphic() function, which works almost the same way, except that it also returns true for every space character in the Zs category of the Unicode standard.
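To make the difference concrete, the following sketch compares the two functions on a few sample runes. The helper name classify and the choice of runes are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify is a hypothetical helper name used for illustration.
// It reports what unicode.IsPrint and unicode.IsGraphic say about r.
func classify(r rune) (printable, graphic bool) {
	return unicode.IsPrint(r), unicode.IsGraphic(r)
}

func main() {
	samples := []rune{
		' ',      // ASCII space: printable and graphic
		'\u00a0', // NO-BREAK SPACE (Zs): graphic, but not printable
		'\u2003', // EM SPACE (Zs): graphic, but not printable
		'\t',     // tab (control character): neither
		'\u200b', // ZERO WIDTH SPACE (format character): neither
	}
	for _, r := range samples {
		printable, graphic := classify(r)
		fmt.Printf("%U IsPrint=%t IsGraphic=%t\n", r, printable, graphic)
	}
}
```

If Unicode spaces such as the no-break space should survive the cleanup, swapping unicode.IsPrint() for unicode.IsGraphic() in the mapping function is one way to keep them.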
