This workflow demonstrates how machine learning, specifically logistic regression, can be used to predict whether a patient has heart disease. The goal is to distinguish patients with heart disease from those without, based on various health-related attributes.
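As a rough illustration of how logistic regression produces a yes/no prediction, the sketch below maps a weighted sum of a patient's attributes to a probability and then applies a cutoff. The function names, the `weights` and `bias` arguments, and the 0.5 threshold are assumptions made for this example; in a real workflow the weights would be learned from training data (for instance with a library such as scikit-learn), not written by hand.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in [0, 1].

    Written in two branches so that large negative scores do not overflow.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Estimated probability that target = 1 (disease) for one patient.

    `weights` and `bias` are hypothetical placeholders for learned parameters.
    """
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict(features, weights, bias, threshold=0.5):
    """Return 1 (disease) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0
```

The threshold is a design choice: lowering it flags more patients as positive, which trades extra false alarms for fewer missed cases.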
Dataset attributes:
age: Age of the patient (in years)
sex: Sex of the patient (1 = male, 0 = female)
chest pain type (cp): Type of chest pain (4 values: 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic)
resting blood pressure (trestbps): Resting blood pressure in mm Hg
serum cholesterol (chol): Serum cholesterol in mg/dl
fasting blood sugar (fbs): Whether fasting blood sugar exceeds 120 mg/dl (1 = true, 0 = false)
resting electrocardiographic results (restecg): Electrocardiographic results (values: 0 = normal, 1 = having ST-T wave abnormality, 2 = left ventricular hypertrophy)
maximum heart rate achieved (thalach): Maximum heart rate achieved during exercise
exercise-induced angina (exang): Whether angina was induced by exercise (1 = yes, 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope of peak exercise ST segment (slope): Slope of the peak exercise ST segment (1 = upsloping, 2 = flat, 3 = downsloping)
number of major vessels (ca): Number of major vessels (0-3) colored by fluoroscopy
thal: Thalassemia (0 = normal, 1 = fixed defect, 2 = reversible defect)
target (Heart Disease): Target variable indicating heart disease (1 = disease, 0 = no disease)
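Because several attributes are stored as integer codes, it can help to translate them back to readable labels when inspecting individual records. The sketch below is one possible way to do that, using only the codes listed above. The names `CODE_LABELS` and `decode_record` are hypothetical, and the record is assumed to be a plain dictionary keyed by the short column names.

```python
# Integer codes and their meanings, copied from the attribute list above.
CODE_LABELS = {
    "sex": {1: "male", 0: "female"},
    "cp": {
        1: "typical angina",
        2: "atypical angina",
        3: "non-anginal pain",
        4: "asymptomatic",
    },
    "fbs": {1: "true", 0: "false"},
    "restecg": {
        0: "normal",
        1: "ST-T wave abnormality",
        2: "left ventricular hypertrophy",
    },
    "exang": {1: "yes", 0: "no"},
    "slope": {1: "upsloping", 2: "flat", 3: "downsloping"},
    "thal": {0: "normal", 1: "fixed defect", 2: "reversible defect"},
    "target": {1: "disease", 0: "no disease"},
}


def decode_record(record: dict) -> dict:
    """Replace coded values with labels; pass other attributes through unchanged.

    Raises ValueError when a coded attribute holds a value outside its listed codes.
    """
    decoded = {}
    for name, value in record.items():
        labels = CODE_LABELS.get(name)
        if labels is None:
            decoded[name] = value  # numeric attribute such as age or chol
        elif value in labels:
            decoded[name] = labels[value]
        else:
            raise ValueError(f"unexpected code {value!r} for attribute {name!r}")
    return decoded
```

For example, `decode_record({"age": 54, "cp": 4, "target": 0})` would keep `age` as a number and turn the other two values into `"asymptomatic"` and `"no disease"`. Raising on unknown codes is a deliberate choice here, since a value outside the documented range usually signals a data-quality problem worth catching before training.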