Generates a bit vector for each row of the input table. The bit vectors can be created from multiple numerical or string columns, or from a single column: a string column containing the bit positions to set, a hexadecimal or binary string, or a collection column. To configure the node, first select the source type, i.e. whether the bit vectors should be created from multiple string/numerical columns or from a single string/collection column. The dialog elements that apply to the selected option are then enabled.
Bit vectors from multiple columns
When multiple columns are used, the bit positions in the resulting bit vector correspond to the positions of the selected columns in the input table. For example, if the second and third columns of the input table are selected and the first column is omitted, the bit vector of each row has length 2. The first bit is set if the value of the second column matches the selected criterion; likewise, the second bit is set if the value of the third column matches the selected criterion. The columns to consider can be specified in the multiple column selection section. The enforce exclusion/inclusion option configures how the node handles previously unknown columns: if enforce exclusion is selected, all unknown columns are automatically added to the include list, whereas if enforce inclusion is selected, all unknown columns are added to the exclude list. The columns to include can also be defined by a pattern if the Wildcard/Regex Selection option is selected.
Multiple string columns
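As a rough illustration of how one bit per selected column can be derived from string values, the following Python sketch uses wildcard matching from the standard library. The function name `row_to_bits`, the dictionary row layout and the case-sensitive matching are assumptions made for the example; they are not taken from the node's implementation.

```python
from fnmatch import fnmatchcase


def row_to_bits(row, selected_columns, pattern, set_if_match=True):
    """Return one bit per selected column, in the order the columns are given.

    `row` is a dict mapping column names to values (hypothetical layout).
    A bit is 1 when the match result equals `set_if_match`.
    """
    return [
        int(fnmatchcase(str(row[col]), pattern) == set_if_match)
        for col in selected_columns
    ]


# The first column ("id") is omitted, so each vector has length 2.
row = {"id": "r1", "colour": "blue", "shape": "box"}

match_bits = row_to_bits(row, ["colour", "shape"], "b?x")
no_match_bits = row_to_bits(row, ["colour", "shape"], "b?x", set_if_match=False)
print(match_bits)     # [0, 1]
print(no_match_bits)  # [1, 0]
```

Here `?` matches exactly one character and `*` matches any sequence, including the empty one, so "box" matches `b?x` while "blue" does not.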
The bit for a column is set if the column value matches, or does not match, the specified pattern, depending on the "Set bit if pattern does match/does not match" option. The pattern may contain the wildcards '?' and '*', which match any single character and any sequence of characters (including none), respectively. It can also be a regular expression.
Multiple numeric columns
There are two options to determine whether the bit for the value in the corresponding column is set:
- either a global threshold is defined, in which case all values greater than or equal to the threshold are converted into set bits and all other bit positions remain 0, or
- a certain percentage of the mean of each column is used as a threshold, in which case all values greater than or equal to that percentage of the column mean are converted into set bits. For example, if the mean percentage is set to 50%, the mean of col1 is 2 and the mean of col2 is 8, then the bit for col1 is set if the value is greater than or equal to 1, and the bit for col2 is set if the value is greater than or equal to 4.
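The two numeric options can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, rows are plain tuples of numbers, and missing values are not handled. The sample data is chosen so that the column means are 2 and 8, as in the example above.

```python
from statistics import mean


def bits_global_threshold(rows, threshold):
    """Set a bit when the value is greater than or equal to one shared threshold."""
    return [[int(value >= threshold) for value in row] for row in rows]


def bits_mean_percentage(rows, percentage):
    """Set a bit when the value reaches `percentage` percent of its column mean."""
    columns = list(zip(*rows))  # transpose rows into columns
    thresholds = [mean(column) * percentage / 100 for column in columns]
    return [
        [int(value >= limit) for value, limit in zip(row, thresholds)]
        for row in rows
    ]


# col1 has mean 2 and col2 has mean 8, so 50% gives thresholds 1 and 4.
rows = [(0.5, 5), (1, 3), (4.5, 16)]

print(bits_mean_percentage(rows, 50))  # [[0, 1], [1, 0], [1, 1]]
print(bits_global_threshold(rows, 3))  # [[0, 1], [0, 1], [1, 1]]
```

Note that the value 1 in col1 sets its bit in the mean-percentage case because the comparison is inclusive.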
Bit vectors from a single column
When a single input column is used, only the selected column is parsed to generate the bit vectors.
Single string column
For string input, only the column containing the string is considered. The string is parsed and converted into a bit vector. Three input formats are valid:
- Hexadecimal strings: strings consisting only of the characters 0-9 and A-F (case-insensitive). The hexadecimal number is converted into a binary number, which is represented by the resulting bit vector.
- Binary strings: strings consisting only of 0s and 1s, which are parsed and converted into the corresponding bit vectors.
- ID strings: strings consisting of numbers separated by spaces, where each number refers to a position in the bit vector that should be set (a typical input format for association rule mining).
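The three string formats listed above could be parsed along the following lines in Python. This is only a sketch: the function names are hypothetical, and the bit ordering (hexadecimal and binary digits kept in written order, ID positions counted from zero starting at the left) as well as the vector length for ID strings (highest position plus one) are assumptions that may differ from what the node actually produces.

```python
def hex_to_bits(text):
    """Expand each hexadecimal digit into four binary digits, in written order."""
    return "".join(format(int(digit, 16), "04b") for digit in text)


def binary_to_bits(text):
    """Accept a string of 0s and 1s unchanged; reject anything else."""
    if set(text) - {"0", "1"}:
        raise ValueError(f"not a binary string: {text!r}")
    return text


def ids_to_bits(text, length=None):
    """Set the bits at the space-separated positions listed in `text`.

    Assumption: positions are zero-based, and without an explicit `length`
    the vector is just long enough to hold the highest position.
    """
    positions = [int(token) for token in text.split()]
    size = length if length is not None else max(positions, default=-1) + 1
    bits = [0] * size
    for position in positions:
        bits[position] = 1
    return bits


print(hex_to_bits("1f"))       # 00011111
print(binary_to_bits("0101"))  # 0101
print(ids_to_bits("0 3 4"))    # [1, 0, 0, 1, 1]
```

`int(digit, 16)` accepts both upper- and lowercase letters, which matches the case-insensitive handling of hexadecimal strings described above.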