
Overview
This project explores and classifies plant growth conditions based on several environmental and cultivation variables. It is structured into two main stages:
Data treatment and feature engineering
Machine learning classification
The dataset was sourced from CSV files containing details on plant type, sunlight exposure, soil preferences, and watering frequency. The final goal is to predict the most suitable conditions for plant growth using classification models.
Problem Statement
How can we classify plant growth conditions based on features like sunlight, soil type, water needs, and environmental factors? This classification can aid gardeners and agriculture professionals in making data-driven decisions for optimal plant cultivation.
Dataset Description
Source:
plants.csv
plants_model.csv
Key Features:
Sunlight exposure
Soil type
Watering frequency
Environmental context indicators
Type: Categorical and numeric attributes related to cultivation
Data Treatment
The data was cleaned and preprocessed using:
Handling of missing values
Standardization and formatting of categorical variables
Basic outlier detection
Encoding of string-based features into numeric types
A feature dictionary was also created to document the meaning of each column.
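The four cleaning steps listed above can be sketched on a toy table. This is a minimal illustration, not the project's actual notebook code: the column names (`sunlight`, `soil`, `water_per_week`), the sample values, the choice of mode/median imputation, and the IQR rule for outliers are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real columns in plants.csv may differ.
df = pd.DataFrame({
    "sunlight": ["Full Sun", " full sun", "Partial Shade", None, "FULL SUN"],
    "soil": ["Loamy", "sandy ", "Clay", "loamy", "Sandy"],
    "water_per_week": [2.0, 3.0, np.nan, 1.0, 40.0],
})

# 1. Standardize categorical text: trim whitespace and lowercase.
for col in ["sunlight", "soil"]:
    df[col] = df[col].str.strip().str.lower()

# 2. Handle missing values (assumed strategy: mode for categories,
#    median for numeric columns).
df["sunlight"] = df["sunlight"].fillna(df["sunlight"].mode()[0])
df["water_per_week"] = df["water_per_week"].fillna(df["water_per_week"].median())

# 3. Basic outlier detection with the 1.5 * IQR rule (flag, do not drop).
q1 = df["water_per_week"].quantile(0.25)
q3 = df["water_per_week"].quantile(0.75)
iqr = q3 - q1
in_range = df["water_per_week"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df["water_outlier"] = ~in_range

# 4. Encode string-based features as numeric indicator columns.
encoded = pd.get_dummies(df, columns=["sunlight", "soil"], dtype=int)

print(encoded.head())
```

Flagging outliers in a separate column rather than deleting rows keeps the decision reversible, which is useful while the feature dictionary is still being written.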
Feature Engineering
New variables were derived, such as:
Combined experience indicators based on environmental compatibility
Scaled indexes to standardize watering and sunlight attributes
These new features were critical for improving model interpretability and performance.
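As a rough sketch of what a scaled index could look like, the example below rescales two numeric attributes to the range 0 to 1 with scikit-learn's `MinMaxScaler`. The column names, the sample values, and the `resource_demand` combination are hypothetical; the source does not specify how the project's own indexes or combined indicators were computed.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numeric attributes; names and values are illustrative only.
df = pd.DataFrame({
    "sunlight_hours": [2.0, 4.0, 6.0, 8.0],
    "water_per_week": [1.0, 2.0, 3.0, 5.0],
})

# Rescale each attribute to [0, 1] so the two are directly comparable.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[["sunlight_hours", "water_per_week"]])
df["sunlight_index"] = scaled[:, 0]
df["water_index"] = scaled[:, 1]

# Assumed example of a combined indicator: the mean of the two indexes.
df["resource_demand"] = df[["sunlight_index", "water_index"]].mean(axis=1)

print(df)
```

When the scaler is used ahead of a classifier, it should be fitted on the training split only and then applied to the test split, so that test data does not leak into the scaling parameters.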
Modeling and Classification
Multiple classification algorithms were tested:
Decision Tree
Random Forest
K-Nearest Neighbors
Logistic Regression
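One way to compare these four algorithms on equal footing is cross-validated accuracy. The sketch below uses synthetic data from `make_classification` as a stand-in for the plant features, so the scores it prints say nothing about the project's real results; the hyperparameters shown are scikit-learn defaults or arbitrary choices, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded plant features and condition labels.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold accuracy per model. Scaling lives inside the pipeline so it
# is refit on each training fold and never sees the held-out fold.
scores = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy").mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Scaling matters mainly for K-Nearest Neighbors and Logistic Regression; the tree-based models are largely insensitive to it, but using one pipeline for all four keeps the comparison uniform.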
Clustering with KMeans
KMeans, an unsupervised clustering algorithm, was applied to identify groupings of plant conditions without using the class labels.
Process Highlights
Selection of number of clusters (k) based on the elbow method
Fitting the model to transformed features
Labeling data points according to cluster membership
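The three steps above can be sketched as follows. The data here comes from `make_blobs` and is purely illustrative, and the final choice of `k = 3` is an assumption that matches how the toy data was generated; on the real dataset, k would be read off the elbow in the inertia curve.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the plant feature matrix (3 true groups).
X_raw, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=42)

# Transform features first so no single attribute dominates the distances.
X = StandardScaler().fit_transform(X_raw)

# Elbow method: record inertia (within-cluster sum of squares) for each k.
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.2f}")

# Fit the final model with the chosen k (assumed to be 3 here) and
# label each data point with its cluster membership.
final_model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = final_model.fit_predict(X)
```

Plotting `inertias` against k with Matplotlib makes the elbow easier to see: the curve drops steeply up to the appropriate k and flattens afterwards.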
Conclusions
The model successfully classifies plant condition categories with a high accuracy rate.
Feature engineering based on sunlight and soil played a key role in model performance.
The workflow is modular and easily expandable to other similar datasets.
Tools and Libraries Used
Jupyter Notebook
Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn