Indoor Plants Studies

May 2025

Analyses

Clustering

Overview

This project explores and classifies plant growth conditions based on several environmental and cultivation variables. It is structured into two main stages:

  • Data treatment and feature engineering

  • Machine learning classification

The dataset was sourced from CSV files containing details on plant type, sunlight exposure, soil preferences, and watering frequency. The final goal is to predict the most suitable conditions for plant growth using classification models.

Problem Statement

How can we classify plant growth conditions based on features like sunlight, soil type, water needs, and environmental factors? This classification can aid gardeners and agriculture professionals in making data-driven decisions for optimal plant cultivation.

Dataset Description

  • Source: plants.csv and plants_model.csv

  • Key Features:

    • Sunlight exposure

    • Soil type

    • Watering frequency

    • Environmental context indicators

  • Type: Categorical and numeric attributes related to cultivation

Data Treatment

The data was cleaned and preprocessed in the following steps:

  • Handling of missing values

  • Standardization and formatting of categorical variables

  • Basic outlier detection

  • Encoding of string-based features into numeric types

A feature dictionary was also created to document the meaning of each column.
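
The treatment steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the column names (sunlight, soil_type, watering_days), the sample rows, the mode/median fill strategy, and the 1.5 * IQR outlier rule are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for plants.csv; column names are assumptions.
raw = pd.DataFrame({
    "sunlight": ["Full Sun", " partial shade", "full sun", None, "Shade "],
    "soil_type": ["Loamy", "sandy", "LOAMY", "clay", "Sandy"],
    "watering_days": [3.0, 7.0, np.nan, 5.0, 60.0],
})

def clean_plants(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Standardize categorical text: trim whitespace and lowercase.
    for col in ["sunlight", "soil_type"]:
        out[col] = out[col].str.strip().str.lower()
    # Missing values: most frequent value for the categorical column,
    # median for the numeric column (an assumed strategy).
    out["sunlight"] = out["sunlight"].fillna(out["sunlight"].mode()[0])
    out["watering_days"] = out["watering_days"].fillna(out["watering_days"].median())
    # Basic outlier detection: flag values outside 1.5 * IQR of the quartiles.
    q1, q3 = out["watering_days"].quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = out["watering_days"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out["watering_outlier"] = ~inside
    # Encode string-based features as integer category codes.
    for col in ["sunlight", "soil_type"]:
        out[col + "_code"] = out[col].astype("category").cat.codes
    return out

cleaned = clean_plants(raw)
```

Outliers are flagged here rather than dropped, so the decision to remove or cap them stays visible in a later step.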

Feature Engineering

New variables were derived, such as:

  • Combined experience indicators based on environmental compatibility

  • Scaled indexes to standardize watering and sunlight attributes

These new features were critical for improving model interpretability and performance.
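
One way such scaled indexes might be built is min-max rescaling to the [0, 1] range. The sketch below is illustrative only: the column names, the sample values, and the combined care_index (a plain mean of the two indexes) are hypothetical and are not taken from the project's feature dictionary.

```python
import pandas as pd

# Hypothetical numeric attributes; names and values are illustrative only.
plants = pd.DataFrame({
    "sunlight_hours": [2.0, 6.0, 8.0, 4.0],
    "watering_per_week": [1.0, 3.0, 2.0, 5.0],
})

def min_max_index(series: pd.Series) -> pd.Series:
    """Rescale a numeric column to the [0, 1] range."""
    span = series.max() - series.min()
    if span == 0:
        # A constant column carries no information; map it to 0.
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span

plants["sunlight_index"] = min_max_index(plants["sunlight_hours"])
plants["watering_index"] = min_max_index(plants["watering_per_week"])

# Assumed combined indicator: the mean of the two scaled indexes.
plants["care_index"] = plants[["sunlight_index", "watering_index"]].mean(axis=1)
```

Putting both attributes on the same scale keeps either one from dominating distance-based models such as K-Nearest Neighbors or KMeans.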

Modeling and Classification

Multiple classification algorithms were tested:

  • Decision Tree

  • Random Forest

  • K-Nearest Neighbors

  • Logistic Regression
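
A comparison of these four algorithms could look roughly like the sketch below. It runs on synthetic data from make_classification rather than plants_model.csv, and the hyperparameters, the 5-fold cross-validation, and the scaling pipelines are assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and condition labels.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Distance- and gradient-based models are wrapped with a scaler.
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Cross-validation is used instead of a single train/test split so that the ranking of models depends less on one particular partition of the data.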

Clustering with KMeans

The KMeans algorithm was applied to identify plant condition groupings.

Process Highlights

  • Selection of number of clusters (k) based on the elbow method

  • Fitting the model to transformed features

  • Labeling data points according to cluster membership
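
The process above can be sketched as follows. The data comes from make_blobs as a placeholder for the transformed plant features, and the range of k values, the scaling step, and the final choice of k = 3 are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the transformed plant features.
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record the within-cluster sum of squares (inertia) per k.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

# The elbow is normally read off a plot of inertia against k;
# k = 3 is assumed here because the synthetic data has three centers.
chosen_k = 3
kmeans = KMeans(n_clusters=chosen_k, n_init=10, random_state=0)

# Label each data point with its cluster membership.
labels = kmeans.fit_predict(X_scaled)
```

Plotting inertias with Matplotlib (k on the x-axis, inertia on the y-axis) shows the bend where adding more clusters stops reducing inertia by much.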

Conclusions

  • The model successfully classifies plant condition categories with a high accuracy rate.

  • Feature engineering based on sunlight and soil played a key role in model performance.

  • The workflow is modular and easily expandable to other similar datasets.

Tools and Libraries Used

  • Jupyter Notebook

  • Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn

Visualizations