String Character Class and Emoji Filter
Filter out emoji and other classes of characters from a string using built-in regular expressions. This component can be used as a replacement for my original String Emoji Filter.
v1 - 17 September 2022 @takbb Brian Bates
v1.1 23 September 2022 - bug fix - "Connector Punctuation" was removing spaces
For examples of the different "Unicode Character Classes" see
https://www.compart.com/en/unicode/category
This uses a Java Snippet with a Java regex replaceAll call and built-in regular expressions to filter characters from a string based on predefined classes.
Choose the class or classes of characters to be filtered from the list provided. The filter converts the selected class names into regex character classes and then removes the matching characters using a Java Snippet.
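The approach described above can be sketched roughly as follows. This is a minimal illustration and not the component's actual Java Snippet: the class name CharClassFilter, the CLASSES lookup map and the method names are hypothetical, and only three of the listed classes are included. It assumes the selected class names are combined into one bracketed character class and removed with String.replaceAll.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not the component's real code.
class CharClassFilter {

    // Assumed lookup from display name to regex category (subset of the list in this description).
    static final Map<String, String> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("Symbols - Emoji and Other", "\\p{So}");
        CLASSES.put("Symbols - Currency", "\\p{Sc}");
        CLASSES.put("Dash Punctuation", "\\p{Pd}");
    }

    // Combine the selected class names into a single character class, e.g. [\p{So}\p{Sc}].
    static String buildPattern(List<String> selected) {
        StringBuilder sb = new StringBuilder("[");
        for (String name : selected) {
            String cls = CLASSES.get(name);
            if (cls == null) {
                throw new IllegalArgumentException("Unknown class: " + name);
            }
            sb.append(cls);
        }
        return sb.append("]").toString();
    }

    // Remove every character that belongs to any of the selected classes.
    static String filter(String input, List<String> selected) {
        if (selected.isEmpty()) {
            return input; // nothing selected, nothing to remove
        }
        return input.replaceAll(buildPattern(selected), "");
    }
}
```

For example, CharClassFilter.filter("Price: 5\u20AC \uD83D\uDE00 ok", List.of("Symbols - Emoji and Other", "Symbols - Currency")) removes the euro sign and the emoji but leaves the surrounding spaces in place.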
Additional character classes and/or regex patterns may be added over time. Please let me know if specific character classes don't appear to work.
This is based on a subset of the character classes described here: https://www.regular-expressions.info/unicode.html in the section "Unicode Categories"
The categories currently implemented are listed below with their Unicode equivalents. Please see regex documentation on the internet for descriptions of those categories. As this component uses Java, it is the Java implementation of these regex patterns that is used.
Unassigned Characters \p{Cn}
Formatting Indicators \p{Cf}
Control Characters \p{C}\p{Cc}
Half of UTF-16 Surrogate pair \p{Cs}
Codepoints Reserved for Private Use \p{Co}
Punctuation - Other \p{Po}
Symbols \p{S}
Symbols - Emoji and Other \p{So}
Symbols - Currency \p{Sc}
Symbols - Modifiers \p{Sk}
Mathematical Symbols \p{Sm}
Letters \p{L}
Letters - Upper Case \p{Lu}
Letters - Lower case \p{Ll}
Numbers \p{N}
Character Marks \p{M}
Non Spacing Marks \p{Mn}
Enclosing Marks \p{Me}
Separators \p{Z}
Space Separator \p{Zs}
Line Separator \p{Zl}
Paragraph Separators \p{Zp}
Other Numbers (e.g. superscript digits) \p{No}
Punctuation \p{P}
Dash Punctuation \p{Pd}
Connector Punctuation \p{Pc}
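To get a feel for what individual categories match, a small standalone check such as the one below may help. It is only a sketch: the class CategoryDemo and its strip method are hypothetical names, and it applies one category at a time with String.replaceAll rather than reproducing the component's logic. Note that Connector Punctuation (\p{Pc}) matches characters such as the underscore, while spaces belong to Space Separator (\p{Zs}).

```java
// Hypothetical demo class, for illustration only.
class CategoryDemo {

    // Remove every character in the given Unicode category, e.g. "\\p{Pc}".
    static String strip(String input, String category) {
        return input.replaceAll(category, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("snake_case name", "\\p{Pc}")); // snakecase name
        System.out.println(strip("a b", "\\p{Zs}"));             // ab
        System.out.println(strip("Ab1", "\\p{Lu}"));             // b1
        System.out.println(strip("x\u00B2 + 1", "\\p{No}"));     // x + 1
        System.out.println(strip("1+1=2", "\\p{Sm}"));           // 112
    }
}
```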
Please contact @takbb on the forum if you have suggestions for improvements or additional useful filter-classes
- Type: Table
  Data Table In: The data table to be filtered