- Type: Workflow Port Object
  First Model
  The currently deployed model, captured with Integrated Deployment.
- Type: Workflow Port Object
  Second Model
  The new model, also captured with Integrated Deployment, to be compared with the original and optionally deployed.
- Type: Table
  Deployment data
  Deployment data with target/ground truth and timestamp:
  - A table of instances gathered after the deployment, to which the currently deployed model (first input) was previously applied.
  - The ground truth (also called target) column, collected afterwards, must also be available.
  - The timestamp must already be converted to the "Date&Time" column type or a similar type, as long as it contains a date and is not of String column type.
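As a rough illustration of the timestamp requirement, the sketch below parses string timestamps into date values and reports the gaps between consecutive dates, for instance in a preprocessing script that runs before the table reaches the Component. The function names `parse_timestamps` and `day_gaps`, the `"%Y-%m-%d"` format, and the sample values are assumptions made for illustration; they are not part of the Component.

```python
from datetime import date, datetime


def parse_timestamps(raw_values: list[str], fmt: str = "%Y-%m-%d") -> list[date]:
    """Convert string timestamps to date objects; fail loudly on bad rows."""
    parsed = []
    for row_index, raw in enumerate(raw_values):
        try:
            parsed.append(datetime.strptime(raw, fmt).date())
        except (TypeError, ValueError) as err:
            raise ValueError(
                f"row {row_index}: cannot parse {raw!r} with format {fmt!r}"
            ) from err
    return parsed


def day_gaps(dates: list[date]) -> list[int]:
    """Return the gaps, in days, between consecutive distinct dates."""
    unique = sorted(set(dates))
    return [(later - earlier).days for earlier, later in zip(unique, unique[1:])]


# Hypothetical sample values, for illustration only.
sample = ["2021-01-04", "2021-01-05", "2021-01-05", "2021-01-07"]
dates = parse_timestamps(sample)
gaps = day_gaps(dates)
```

Large differences between the values in `gaps` would suggest that the dates are unevenly spread over the sample.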
This Component generates a view that compares the performance of two models captured with Integrated Deployment. The Component displays the performance of the new model starting from the date chosen by the user in the configuration dialog, while the performance of the original model is displayed for all dates in the input data. In a deployment scenario, this Component compares the performance of the previously deployed model with that of the recently retrained model, using the chosen evaluation metric. The view works for machine learning classifiers with binary as well as multiclass targets.

The Component requires deployment data with timestamp (date) and target columns in order to show the performance over time. In the generated Interactive View, the performance metric is plotted against the time axis, and a trend line is plotted for the performance of each model.

The view provides a “Deploy” button. Based on the model performance, the user can decide whether the model provided at the second input port should be deployed. This deployment decision is available at the output of the Component as a flow variable. Connect the flow variable output to the workflow branch that deploys the model. That branch should execute only if the user checked the box in the view and applied the settings (Apply&Close, lower right corner).

CAPTURED MODEL REQUIREMENTS (Top and Middle Port)

We recommend using the “AutoML” components with this Component. All you need to do is connect the two components via the black Integrated Deployment port. You can also monitor custom-trained models with this Component. When providing models not trained by the “AutoML” components, you need to satisfy the following black-box requirements:
- The models should be captured with Integrated Deployment and have a single input and a single output of type Data.
- All feature columns have to be provided at the model input.
- Additional columns that are not features can also be provided at the model input.
- The model output should contain all the model input data (features and non-features) with the output prediction columns appended.
- The model output predictions should consist of one String type column and “n” Double type columns, where “n” is the number of classes in the target column.
- The String type prediction column should be named “Prediction ([T])”, where [T] is the name of your target column (e.g. “Prediction (Churn)”).
- The Double type prediction columns should be named “P ([T]=[C1])”, “P ([T]=[C2])”, …, “P ([T]=[Cn])”, where [Cn] is the name of the class whose probability the column predicts (e.g. “P (Churn=not churned)” and “P (Churn=churned)” in the binary case).

Additionally, if you are not using the AutoML component, you need to provide a flow variable called “target_column” of type String, containing the name of your ground truth/target column, at the model ports of the “Model Monitor View (Compare)” Component.

INPUT DEPLOYMENT TABLE REQUIREMENTS (Bottom Port)

- All feature columns that were used in the training of the captured models.
- A target column and a timestamp column. Each record's timestamp tracks the date on which the currently deployed model (first input) was applied to that data row. The timestamp should be of a “Date&Time” column type. “Time” and “String” types are not supported; use the “String to Date&Time” node to convert them. The timestamps should be roughly uniformly distributed across the sample: the gaps between dates for which samples are missing should be of roughly constant length.
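The naming convention for the prediction columns of a custom model can be checked before the model is captured. The following is a minimal sketch and not part of the Component: the helper names `expected_prediction_columns` and `missing_prediction_columns` and the sample column list are hypothetical, and the spacing in the generated names follows the examples given above (“Prediction (Churn)”, “P (Churn=churned)”), which is an assumption.

```python
def expected_prediction_columns(target: str, classes: list[str]) -> list[str]:
    """Build the prediction column names described above for one target."""
    names = [f"Prediction ({target})"]  # the single String prediction column
    names += [f"P ({target}={cls})" for cls in classes]  # one Double column per class
    return names


def missing_prediction_columns(
    output_columns: list[str], target: str, classes: list[str]
) -> list[str]:
    """Return the expected prediction columns that the model output lacks."""
    present = set(output_columns)
    return [
        name
        for name in expected_prediction_columns(target, classes)
        if name not in present
    ]


# Hypothetical model output columns, for illustration only.
output_columns = ["Age", "Contract", "Churn", "Prediction (Churn)", "P (Churn=churned)"]
missing = missing_prediction_columns(output_columns, "Churn", ["not churned", "churned"])
print(missing)  # ['P (Churn=not churned)']
```

An empty result means every expected prediction column is present; the check does not verify the column types.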
- Type: Flow Variable
  Scored deployment data
  Flow variable to activate the downstream workflow branch for the deployment update.
Created with KNIME Analytics Platform version 4.3.0