Customer Distributions
The Customer Distributions node generates input data for a Market Simulation. It takes an optional Input Attribute List to create a set of Customer Distributions representing the Willingness To Pay (WTP) of Customers in the Market. Each row in the set of Output Customer Distributions corresponds to the part-worth value of a Feature, or the WTP of a Product, for a Virtual Customer.
The Input Attribute List can define the Distribution Type and Input Parameters of each Output Customer Distribution. If the Input Attribute List does not define them for an Output Customer Distribution, then the Distribution Type and Input Parameters from the Configuration Dialog are used. Unlike the similar Matrix Distributions node, the Output Customer Distributions from this node will not be correlated.
For example, if the user wishes to create a Normal (Gaussian) Customer Distribution, then the Mean and Standard Deviation (SD) are either set according to the Configuration Dialog or overridden by the 'A' column (corresponding to the Mean) and the 'B' column (corresponding to the SD) in the Input Attribute List.
Similarly, if the user wishes to create a Uniform Customer Distribution, then the Minimum Value and the Maximum Value are either set according to the Configuration Dialog or overridden by the 'A' column (now corresponding to the Minimum Value) and the 'B' column (now corresponding to the Maximum Value) in the Input Attribute List.
The Output Customer Distributions from this Customer Distributions node can become part of a Customer Willingness To Pay Matrix (WTP Matrix) for a set of Products. This WTP Matrix can then feed a downstream Market Simulation node or a Market Tuning node.
The Input Attribute List is optional. Missing values will be replaced by the defaults in the Configuration Dialog. If no input table is provided, then the Customer Distributions node will generate a single Customer Distribution with a Distribution Type and Input Parameters set according to the Configuration Dialog.
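The fallback behavior described above can be sketched as follows. This is only an illustration, not the node's implementation: the `DialogDefaults` class, its default values, and the function names are hypothetical, while the 'Type', 'A', and 'B' column names come from the Input Attribute List described on this page.

```python
from dataclasses import dataclass


@dataclass
class DialogDefaults:
    """Hypothetical stand-in for the Configuration Dialog settings."""
    dist_type: str = "Normal"
    a: float = 0.0   # assumed default for Input Parameter A
    b: float = 1.0   # assumed default for Input Parameter B


def resolve_row(row, defaults):
    """Fill missing 'Type', 'A', and 'B' values from the dialog defaults."""
    def pick(key, fallback):
        value = row.get(key)
        return fallback if value is None else value

    return {
        "Type": pick("Type", defaults.dist_type),
        "A": pick("A", defaults.a),
        "B": pick("B", defaults.b),
    }


def resolve_table(rows, defaults):
    """With no input table, produce a single distribution from the defaults."""
    if not rows:
        return [resolve_row({}, defaults)]
    return [resolve_row(row, defaults) for row in rows]
```

For instance, a row that sets only 'A' keeps that value and takes 'Type' and 'B' from the defaults.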
The available list of Distribution Types for the user to select from includes:
Normal (Gaussian): (Wikipedia) Generates a set of part-worth values for each Virtual Customer in the shape of a Normal (Gaussian) Distribution. The part-worth values can be drawn randomly or can have evenly changing gaps within a Normal Distribution of a given Mean and Standard Deviation (SD). The output values can be truncated by the Minimum and Maximum limits (if enabled). The Distribution can be sorted in Ascending, Descending, or Random order. Configuration parameters include:
- Mean (A): Any floating-point (double) value
- Standard Deviation (B): Any value greater than 0.0
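As a rough illustration of the Normal options, the sketch below draws values either randomly or at evenly spaced quantiles, then applies optional limits and a sort order. It is an assumption-laden sketch rather than the node's algorithm: the function name and arguments are hypothetical, "evenly changing gaps" is interpreted here as evenly spaced quantiles, and truncation is modeled as simple clamping.

```python
import random
from statistics import NormalDist


def normal_distribution(n, mean, sd, smooth=False,
                        minimum=None, maximum=None, sort="None", seed=None):
    rng = random.Random(seed)
    if smooth:
        # Assumed interpretation: one value per evenly spaced quantile.
        dist = NormalDist(mean, sd)
        values = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]
    else:
        values = [rng.gauss(mean, sd) for _ in range(n)]

    # Assumed interpretation of truncation: clamp values to the limits.
    if minimum is not None:
        values = [max(minimum, v) for v in values]
    if maximum is not None:
        values = [min(maximum, v) for v in values]

    if sort == "Ascending":
        values.sort()
    elif sort == "Descending":
        values.sort(reverse=True)
    elif sort == "Random":
        rng.shuffle(values)
    return values
```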
Linear: (Wikipedia) Generates a set of part-worth values for each Virtual Customer in the shape of a Uniform (Linear) Distribution. The part-worth values can be drawn randomly or can be evenly spaced between the Starting Value and the Ending Value, optionally truncated by Minimum and Maximum limits. The Distribution can be sorted in Ascending, Descending, or Random order. Configuration parameters include:
- Starting Value (A): Any floating-point (double) value (inclusive)
- Ending Value (B): Any floating-point (double) value (inclusive)
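The two Linear modes can be sketched as below. The function name and arguments are hypothetical; the sketch assumes that evenly spaced output includes both the Starting Value and the Ending Value, as the "(inclusive)" notes suggest.

```python
import random


def linear_distribution(n, start, end, smooth=False, seed=None):
    if smooth:
        if n == 1:
            return [start]
        # Fixed step size between consecutive data points.
        step = (end - start) / (n - 1)
        return [start + i * step for i in range(n)]
    # Otherwise draw each value uniformly at random between the two ends.
    rng = random.Random(seed)
    return [rng.uniform(start, end) for _ in range(n)]
```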
Asymptote End: (Wikipedia) Generates a set of part-worth values from an Exponential Function of the form [a * EXP(-b * CustomerID) + c]. The values selected from this Exponential Function will be between the Start value and 0.0 such that the beginning of the curve steeply declines but then rounds off and hugs the end value of 0.0. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Asymptote Start: (Wikipedia) Generates a set of part-worth values from an Exponential Function of the form [a * EXP(-b * CustomerID) + c]. The values selected from this Exponential Function will be between the Start value and 0.0 such that the curve initially hugs the Start value and then steeply declines towards 0.0. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Beta: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Beta Distribution with a user-specified Alpha and Beta:
- Alpha (A): Any value greater than 0.0
- Beta (B): Any value greater than 0.0
Binomial: (Wikipedia) Generates a set of random integer part-worth values for each Virtual Customer in the shape of a Binomial Distribution with a user-specified Number of Trials and Probability of Success. Note that the Bernoulli distribution is a special case of the binomial distribution where just a single trial is conducted (Trials = 1). Configuration parameters include:
- Trials (A): Number of Trials is any integer value greater than 0
- Probability (B): Probability of Success is any value between 0.0 and 1.0 (exclusive)
Cauchy: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Cauchy Distribution with a user-specified Median and Scale:
- Median (A): Any floating-point (double) value
- Scale (B): Any value greater than 0.0
Chi-Square: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Chi-Square Distribution with a user-specified 'Degrees of Freedom'. After the part-worth value is calculated, the fixed value from 'Input Parameter B' is added to shift the result:
- Degrees of Freedom (A): Any value greater than 0.0
- Then Add Fixed Value (B): Any floating-point value added after the random value is calculated
Exponential: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an Exponential Distribution with a user-specified Mean. After the part-worth value is calculated, the fixed value from 'Input Parameter B' is added to shift the result:
- Mean (A): Any value greater than 0.0
- Then Add Fixed Value (B): Any floating-point value added after the random value is calculated
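The "generate, then add a fixed value" pattern used by the Exponential entry (and by the Chi-Square and T entries) might look like the following. This is a minimal sketch with a hypothetical function name; it uses Python's standard generator rather than the Apache Commons Math routines the node refers to.

```python
import random


def shifted_exponential(n, mean, shift=0.0, seed=None):
    if mean <= 0.0:
        raise ValueError("Mean (A) must be greater than 0.0")
    rng = random.Random(seed)
    # expovariate takes the rate, which is the reciprocal of the mean.
    # The fixed value (B) is added after each random value is drawn.
    return [rng.expovariate(1.0 / mean) + shift for _ in range(n)]
```

Because an exponential draw is never negative, the fixed value acts as the floor of the shifted distribution.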
F: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an F Distribution with a user-specified 'Degrees of Freedom Numerator' and 'Degrees of Freedom Denominator':
- Degrees of Freedom Numerator (A): Any value greater than 0.0
- Degrees of Freedom Denominator (B): Any value greater than 0.0
Gamma: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Gamma Distribution with a user-specified Shape and Scale:
- Shape (A): Any value greater than 0.0
- Scale (B): Any value greater than 0.0
Inverse Gaussian: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an Inverse Gaussian Distribution with a user-specified Mu and Lambda. As Lambda tends to infinity, the Inverse Gaussian distribution becomes more like a Normal (Gaussian) distribution:
- Mu (A): The Mean having any value greater than 0.0
- Lambda (B): The Shape Parameter having any value greater than 0.0
Poisson: (Wikipedia) The Poisson Distribution can be used for modeling the number of times an event occurs in an interval of time or space. Generates a set of random part-worth values for each Virtual Customer in the shape of a Poisson Distribution with a user-specified Lambda (the Poisson Mean) and Entropy:
- Lambda (A): The Poisson Mean having any value greater than 0.0
- Entropy (B): The Convergence criterion for cumulative probabilities (set to 0.0 by default)
Quadratic: (Wikipedia) The Quadratic Distribution starts at the y-intersect, decreases (or increases) to touch the x-intersect once, then increases (or decreases) again. The Distribution follows the equation [y = a ( x^2 - b )] with only one x-intersection occurring at the minimum (or maximum) of the y-value. The Quadratic Distribution can be used to model the 'Cost To Make' (CTM) a Product where the Marginal Cost initially falls with increased production, but then starts to increase again as resources become scarce and operational inefficiencies are magnified. As the minimum value is fixed at 0.0 it may be necessary to shift the values in this Distribution before using it in a Market Simulation model.
- X-Intersection (A): The CustomerID row in the Output Distribution where the curve touches the X-Axis once (the X-Intersection cannot equal 0.0)
- Y-Intersection (B): The starting value of the Output Distribution where the curve intersects the Y-Axis (the Y-Intersection cannot equal 0.0)
Sawtooth: (Wikipedia) The Sawtooth wave distribution looks like the teeth of a plain-toothed saw. The raw (unsorted) Distribution starts at zero and ramps upwards towards the Distribution's Amplitude. It reaches the Amplitude after the Distribution's Period, then drops to zero and starts again. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
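A raw (unsorted) Sawtooth wave can be sketched as below. The function name is hypothetical and rows are assumed to be numbered from 0; in this sketch the ramp approaches the Amplitude and resets to zero exactly at each Period, which may differ in detail from the node's output.

```python
def sawtooth(n, amplitude, period):
    if period <= 0:
        raise ValueError("Period (B) must be greater than 0.0")
    # The fraction of the current period elapsed scales the Amplitude.
    return [amplitude * ((row % period) / period) for row in range(n)]
```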
Sigmoid: (Wikipedia) Has the characteristic horizontal 'S-shaped' curve and is part of the family of Logistic Functions of the form [a / (1 + EXP(-b * (row - Customers/2)))]. The values selected from this function will be between the Start value and 0.0 such that the beginning of the curve hugs the start value, then steepens, then the end of the curve hugs the end value of 0.0. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Simple Bimodal: (Wikipedia) Generates a simple Bimodal Distribution (a 'two-humped' Customer Distribution) from two Normal (Gaussian) Distributions. The user specifies the 'First Mean' and the 'Second Mean' with the Standard Deviation (SD) automatically calculated to be a quarter of the distance between the two Means. The user specifies:
- First Mean (A): Half of the Virtual Customers will be distributed around the 'First Mean'
- Second Mean (B): Half of the Virtual Customers will be distributed around the 'Second Mean'. The 'First Mean' cannot equal the 'Second Mean'.
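The Simple Bimodal rule (two Normal humps with the SD set to a quarter of the distance between the Means) can be sketched as follows. The function name is hypothetical, and placing the first half of the rows around the First Mean is an assumption about ordering only.

```python
import random


def simple_bimodal(n, first_mean, second_mean, seed=None):
    if first_mean == second_mean:
        raise ValueError("The First Mean cannot equal the Second Mean")
    # SD is a quarter of the distance between the two Means.
    sd = abs(second_mean - first_mean) / 4.0
    rng = random.Random(seed)
    half = n // 2
    values = [rng.gauss(first_mean, sd) for _ in range(half)]
    values += [rng.gauss(second_mean, sd) for _ in range(n - half)]
    return values
```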
Sinusoidal: (Wikipedia) The smooth periodic oscillation generated from the sine function rising and falling between 0.0 and the Amplitude. The raw (unsorted) Distribution starts rising at half-Amplitude and reaches the Amplitude after a quarter-Period. It then curves downward and reaches 0.0 after three-quarter-Periods. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
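One formula consistent with the Sinusoidal description is a sine wave shifted and scaled to run between 0.0 and the Amplitude. The sketch below assumes rows are numbered from 0 and uses a hypothetical function name; the node's exact formula is not stated on this page.

```python
import math


def sinusoidal(n, amplitude, period):
    if period <= 0:
        raise ValueError("Period (B) must be greater than 0.0")
    # Starts at half-Amplitude, peaks at a quarter-Period,
    # and bottoms out at three-quarter-Periods.
    return [
        (amplitude / 2.0) * (1.0 + math.sin(2.0 * math.pi * row / period))
        for row in range(n)
    ]
```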
Spike: (Wikipedia) Is a vertical 'S-shaped' curve that looks similar to a rotated Sigmoid function but is generated from a pair of Exponential Functions of the form [a * EXP(-b * CustomerID) + c]. The values selected from this Exponential Function will be between the Start value and 0.0 such that the beginning of the curve steeply declines, then rounds off, but then steeply declines again towards the end value of 0.0. Note that a sorted Normal Distribution will also generate a similar looking vertical S-shaped curve. Configuration parameters include:
- Start (A): Any value greater than 0.0
- Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.
Square: (Wikipedia) The Square wave distribution alternates at a steady frequency between the Amplitude and 0.0. The raw (unsorted) Distribution starts at the Amplitude and drops to zero after a half-Period. After the Distribution's Period, the wave is reset to its Amplitude and starts again. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
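The Square wave described above can be sketched with a simple position-in-period test. The function name is hypothetical and rows are assumed to be numbered from 0.

```python
def square_wave(n, amplitude, period):
    if period <= 0:
        raise ValueError("Period (B) must be greater than 0.0")
    # First half of each period sits at the Amplitude, second half at zero.
    return [
        amplitude if (row % period) < period / 2.0 else 0.0
        for row in range(n)
    ]
```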
T: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a T Distribution with a user-specified Degrees of Freedom. After the part-worth value is calculated, the fixed value from 'Input Parameter B' is added to shift the result:
- Degrees of Freedom (A): Any value greater than 0.0
- Then Add Fixed Value (B): Any floating-point value added after the random value is calculated
Triangle: (Wikipedia) The Triangle wave distribution rises and falls linearly between 0.0 and the Amplitude. The raw (unsorted) Distribution climbs steadily from half-Amplitude and reaches the Amplitude after a quarter-Period. It then falls steadily and reaches 0.0 after three-quarter-Periods. Configuration parameters include:
- Amplitude (A): The maximum height of the wave
- Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
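A Triangle wave matching the description (half-Amplitude at the start, Amplitude at a quarter-Period, zero at three-quarter-Periods) can be sketched as below. The function name and the zero-based row numbering are assumptions.

```python
def triangle_wave(n, amplitude, period):
    if period <= 0:
        raise ValueError("Period (B) must be greater than 0.0")
    values = []
    for row in range(n):
        # Shift by a quarter-Period so the wave starts at half-Amplitude.
        phase = (row / period + 0.25) % 1.0
        # Linear rise to the Amplitude at phase 0.5, then linear fall.
        values.append(amplitude * (1.0 - abs(2.0 * phase - 1.0)))
    return values
```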
Weibull: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Weibull Distribution with a user-specified Shape and Scale:
- Shape (A): Any value greater than 0.0
- Scale (B): Any value greater than 0.0
Note: technical details concerning how the data generation is performed can be found by referring to the Apache Commons Math Library.
More Help: Examples and sample workflows can be found at the Scientific Strategy website: www.scientificstrategy.com.
Input Ports
- Type: Data Input Attribute List: (optional) The set of additional Products, Features, or other Attributes to add to the Output Customer Distributions Matrix. The Input Attribute List should include the following columns:
- Product (string): (optional) Unique Product Name or Product ID. The Products listed in this column can be added to the Output Customer Distributions table if the user selects this as the 'Attribute to Customer Distribution Column' in the Configuration Dialog.
- Feature (string): (optional) Name of the Feature associated with the Product. The Features listed in this column can be added to the Output Customer Distributions table if the user selects this as the 'Attribute to Customer Distribution Column' in the Configuration Dialog. If the user wishes to add Customer Distributions named using a [Product].[Feature] format then this column will need to be manually added by the user upstream of the Input Attribute List.
- Type (string): (optional) The Distribution Type and the shape of the generated part-worth values in the Customer Distribution for the Attribute. Any Distribution Type listed in the Configuration Dialog can be used, including "Normal", "Uniform", "Exponential", and "Simple Bimodal". If this 'Type' is missing then the 'Default Distribution Type' (initially 'Normal' Distribution) from the Configuration Dialog will be used instead.
- A (double): (optional) The 'Input Parameter A' of the part-worth values to generate in the Customer Distribution for the Product, Feature, or Attribute. For a Normal Distribution, this 'A' value represents the Mean. If this 'A' value is missing then the default 'A' value from the Configuration Dialog will be used instead.
- B (double): (optional) The 'Input Parameter B' of the part-worth values to generate in the Customer Distribution for the Product Feature. For a Normal Distribution, this 'B' value represents the Standard Deviation (SD). If this 'B' value is missing then the default 'B' value from the Configuration Dialog will be used instead.
- Maximum (double): (optional) The ceiling Maximum of the part-worth values generated for the Product Feature Customer Distribution. If this 'Maximum' value is missing then the default 'Maximum' value from the Configuration Dialog, if enabled, will be used instead. Otherwise the part-worth values in the Customer Distribution will not be limited to a Maximum value.
- Minimum (double): (optional) The floor Minimum of the part-worth values generated for the Product Feature Customer Distribution. If this 'Minimum' value is missing then the default 'Minimum' value from the Configuration Dialog, if enabled, will be used instead. Otherwise the part-worth values in the Customer Distribution will not be limited to a Minimum value.
- Sort (string): (optional) The Sort order of each Output Customer Distribution can be set to either 'Random', 'Ascending', 'Descending', or 'None'. If this 'Sort' order is missing then the default 'Sort' order from the Configuration Dialog (initially set to 'None') will be used instead.
- Smooth (boolean): (optional) It is sometimes possible for the data points within a generated Customer Distribution to be smoothly distributed with an evenly changing Step Size. For example, when a 'Linear Distribution' is set to 'Smooth' the Step Size between data points is fixed. If this 'Smooth' column is missing then the default 'Smooth' CheckBox selection from the Configuration Dialog (initially unchecked for 'Randomly Distributed Data Points') will be used instead.
- Price (double): (optional) Price of the Product. This value will have no impact on the generation of the Output Customer Distributions, but may be conveniently passed downstream to a Market Simulation node.
- Cost (double): (optional) Cost of the Product or Feature. This value will have no impact on the generation of the Output Customer Distributions, but may be conveniently passed downstream to a Market Simulation node. The Cost cannot be negative.
- Quantity (integer): (optional) Quantity Sold of the Product. This value will have no impact on the generation of the Output Customer Distributions, but may be conveniently passed downstream to a Market Simulation node. The Input Quantity Sold would typically be compared against the Output Quantity Sold predicted by a Market Simulation node for testing and tuning.
- Type: Data Input Customer Distributions (double): (optional) A set of upstream Customer Distributions that will be appended before the newly generated Output Customer Distributions. If this optional table has been connected, then the user-defined 'Number of Customers' option in the Configuration Dialog will be ignored and the same number of rows as the Input Customer Distributions will be generated. All duplicate Customer Distribution column names will be replaced by the newly generated Customer Distributions but with the data sorted in the same order as the Distribution being replaced.
- Distribution01, Distribution02, etc (double): The set of upstream Customer Distributions to be appended to the newly generated Output Customer Distributions. If this 'Customer Distributions' node is going to replace an upstream Customer Distribution then the newly generated data will first be sorted into the same order as the original Customer Distribution being replaced (unless forced to be sorted in either Ascending, Descending, or Random order).
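Sorting newly generated data "into the same order" as the column it replaces can be read as rank matching: the row that held the smallest original value receives the smallest new value, and so on. The sketch below illustrates that reading with a hypothetical function name; how the node breaks ties is not stated here.

```python
def reorder_like(original, generated):
    if len(original) != len(generated):
        raise ValueError("Both columns must have the same number of rows")
    # Row indices of the original column, from smallest to largest value.
    order = sorted(range(len(original)), key=lambda i: original[i])
    result = [0.0] * len(original)
    # Hand out the new values in ascending order to those same rows.
    for row, value in zip(order, sorted(generated)):
        result[row] = value
    return result
```

The replaced column then keeps the same customer ranking as the upstream column, which preserves any ordering relationship with the other Input Customer Distributions.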
Output Ports
- Type: Data Output Attribute List: The set of Products, Features, or other Attributes added to the Output Customer Distributions Matrix. These Attributes are directly passed-through from the Input Attribute List as a convenience to downstream nodes. For example, the Input Attribute List can include details about the 'Price' of Products or 'Cost' of Features. In addition, the Output Attribute List will contain these columns:
- Attribute: The unique Product, Feature, or Attribute Name with a matching column in the Output Customer Distributions Matrix.
- Type: The Distribution Type and the shape of the generated part-worth values in the Customer Distribution for the Attribute.
- Mean: The Mean of the part-worth values in the Output Customer Distribution Matrix for the Product, Feature, or Attribute. The Mean is calculated after the Distribution Type is generated. In general, the relative difference of the Means between related Attributes reflects the primary degree of Vertical Differentiation between each - particularly between Normal Distributions.
- SD: The Standard Deviation (SD) of the part-worth values in the Output Customer Distribution Matrix for the Product Attribute. The SD is calculated after the Distribution Type is generated. A Product lacking Vertical Differentiation (that is, having a low Mean) can still attract Customers if it has a relatively high SD, or if it has Horizontal Differentiation (that is, its Customer Distribution is uncorrelated) relative to other Products.
- Type: Data Output Customer Distributions (double): The set of Customer Distributions for each unique Attribute found in the Input Attribute List, or just a single Customer Distribution if no upstream Input Attribute List has been connected. The total number of Virtual Available Customers is equal to the number of rows in the Output Customer Distributions Matrix.
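The 'Mean' and 'SD' columns of the Output Attribute List are summary statistics computed from each generated column. A minimal sketch follows, with a hypothetical function name and the assumption that SD is the sample standard deviation; the node may use the population form instead.

```python
from statistics import fmean, stdev


def summarize(columns):
    """Build one summary row per Customer Distribution column."""
    rows = []
    for name, values in columns.items():
        rows.append({
            "Attribute": name,
            "Mean": fmean(values),
            # Sample SD needs at least two rows; report 0.0 otherwise.
            "SD": stdev(values) if len(values) > 1 else 0.0,
        })
    return rows
```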
Extension
This node is part of the extension
Market Simulation nodes by Scientific Strategy for KNIME - Community Edition
v4.0.0