Dataset file: AB_NYC_2019.csv
This dataset is useful for analyzing Airbnb listings in terms of pricing, location, host activity, and availability.
The dataset contains information about Airbnb listings with the following attributes:
1. Row ID: Unique identifier for each row.
2. ID: Unique identifier for each listing.
3. Name: Name of the listing.
4. Host ID: Unique identifier for the host.
5. Host Name: Name of the host.
6. Neighbourhood Group: Broad area or borough (e.g., Manhattan, Brooklyn).
7. Neighbourhood: Specific neighbourhood within the borough.
8. Latitude: Latitude coordinate of the listing.
9. Longitude: Longitude coordinate of the listing.
10. Room Type: Type of room (e.g., Private room, Entire home/apt).
11. Price: Price per night.
12. Minimum Nights: Minimum number of nights required for booking.
13. Number of Reviews: Total number of reviews received.
14. Last Review: Date of the most recent review.
15. Reviews per Month: Average number of reviews per month.
16. Calculated Host Listings Count: Total number of listings the host has.
17. Availability 365: Number of days the listing is available in a year.
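As a rough illustration of how the attributes above map to typed values, the following Python sketch parses a few rows with the standard library. It assumes the file uses snake_case headers such as `neighbourhood_group` and `availability_365`, and that Row ID is added by the workflow rather than stored in the file. The two sample rows are made up for illustration, and `parse_listing` and `load_listings` are hypothetical helper names.

```python
import csv
import io

# Made-up sample rows for illustration only; the header names are an
# assumption about how AB_NYC_2019.csv labels the attributes listed above.
SAMPLE = """id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
1,Example loft,10,Alex,Brooklyn,Williamsburg,40.71,-73.95,Entire home/apt,150,2,12,2019-05-01,0.8,1,200
2,Example room,11,Sam,Manhattan,Harlem,40.81,-73.94,Private room,80,1,0,,,2,365
"""

INT_COLUMNS = (
    "id", "host_id", "price", "minimum_nights", "number_of_reviews",
    "calculated_host_listings_count", "availability_365",
)
FLOAT_COLUMNS = ("latitude", "longitude", "reviews_per_month")


def parse_listing(row):
    """Convert one raw CSV row (all strings) to typed values; empty cells become None."""
    parsed = dict(row)
    for col in INT_COLUMNS:
        parsed[col] = int(row[col]) if row[col] else None
    for col in FLOAT_COLUMNS:
        parsed[col] = float(row[col]) if row[col] else None
    # Listings without reviews have no last-review date.
    parsed["last_review"] = row["last_review"] or None
    return parsed


def load_listings(text):
    """Parse CSV text into a list of typed listing dictionaries."""
    return [parse_listing(r) for r in csv.DictReader(io.StringIO(text))]


listings = load_listings(SAMPLE)
```

Listings with no reviews leave Last Review and Reviews per Month empty in this sample, which is why the sketch maps empty cells to `None` instead of failing on conversion.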
1. CSV Reader
- Description: This node reads data from a CSV file.
- Purpose: To import the dataset into the workflow for further processing.
2. Partitioning
- Description: This node splits the dataset into training and test sets.
- Purpose: To create separate datasets for training the model and evaluating its performance.
3. Random Forest Learner (Regression)
- Description: This node trains a Random Forest regression model using the training dataset.
- Purpose: To create a predictive model based on the training data.
4. Random Forest Predictor (Regression)
- Description: This node applies the trained Random Forest model to the test dataset to make predictions.
- Purpose: To generate predictions on the test data using the trained model.
5. Numeric Scorer
- Description: This node evaluates the performance of the regression model by comparing the predicted values to the actual values.
- Purpose: To assess the accuracy and performance of the model.
6. ROC Curve (legacy)
- Description: This node generates a Receiver Operating Characteristic (ROC) curve to visualize the performance of the model.
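- Purpose: To provide a graphical representation of the model's performance in terms of true positive rate and false positive rate. Because ROC curves are defined for binary classification, a numeric target such as price would first have to be binned into two classes (for example, above or below a price threshold).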
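The partitioning and scoring steps in the node list above can be sketched in plain Python. This is a minimal illustration, not the nodes' actual implementation: the 80/20 split, the seed, and the function names `partition` and `numeric_scores` are assumptions, the prices are made up, and a mean-value baseline stands in for the trained Random Forest model, which is too long to sketch here.

```python
import math
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def numeric_scores(actual, predicted):
    """Return common regression error metrics for paired value lists."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {
        # R^2 is undefined when all actual values are identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Made-up nightly prices, for illustration only.
prices = [50, 80, 100, 120, 150, 200, 75, 90, 110, 300]
train, test = partition(prices)

# Baseline stand-in for the trained model: always predict the mean training price.
baseline = sum(train) / len(train)
scores = numeric_scores(test, [baseline] * len(test))
```

A baseline like this is also a useful reference point: a trained regression model is only worth keeping if its error metrics on the test set clearly beat those of the constant prediction.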
This workflow performs regression analysis with a Random Forest model, evaluates its performance, and visualizes the results. Contact me at guharaysree@gmail.com if you would like anything added or modified to fit your requirements.