Split Your Dataset for Training, Validation and Testing: Using split-folders

When training machine learning models, especially for image classification, it’s important to split your dataset into training, validation, and test sets.
Instead of writing manual scripts to shuffle and copy files, you can use a simple yet powerful Python library called split-folders.
What is split-folders?
split-folders is a Python utility that automatically splits a folder of images (or any files) into multiple subfolders, typically train, val, and test, using a given ratio.
Example folder before splitting:
Dataset/
├── with_mask/
└── without_mask/
After splitting:
splitted/
├── train/
│ ├── with_mask/
│ └── without_mask/
├── val/
│ ├── with_mask/
│ └── without_mask/
└── test/
  ├── with_mask/
  └── without_mask/
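Before splitting, it can help to confirm that the input folder really follows this one-subfolder-per-class layout. The following is a minimal sketch using only the standard library; the function name describe_dataset is made up for illustration and is not part of split-folders.

```python
from pathlib import Path


def describe_dataset(root):
    """Return {class_name: file_count} for each class subfolder of root."""
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count only regular files directly inside the class folder.
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


if __name__ == "__main__":
    # "Dataset" is the example input folder used in this article.
    if Path("Dataset").is_dir():
        for name, n in describe_dataset("Dataset").items():
            print(f"{name}: {n} files")
```

A class that reports zero files, or a class name you did not expect, is usually a sign that the folder layout needs fixing before you split.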
Installing split-folders
First, install it inside your virtual environment:
pip install split-folders
Ways to Split a Dataset with split-folders
Method 1: Python Script (Recommended)
You can use it directly inside a Python file.
For example, split_dataset.py:
import splitfolders

splitfolders.ratio(
    "Dataset",              # Input folder
    output="splitted",      # Output folder
    seed=42,                # For reproducibility
    ratio=(0.7, 0.2, 0.1),  # Train, Val, Test split
)
Then run:
python split_dataset.py
Why this is better:
Works inside Jupyter or scripts
Easy to adjust ratios and output names
Easier to reuse in automation or experiments
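After the script has run, you may want to check how many files ended up in each split. Here is a small sketch that counts files per split and per class with the standard library only; summarize_split is a hypothetical helper name, and the split folder names train, val, and test are taken from the layout shown earlier.

```python
from pathlib import Path


def summarize_split(output_dir, splits=("train", "val", "test")):
    """Return {split: {class_name: file_count}} for an already-split folder."""
    summary = {}
    for split in splits:
        split_dir = Path(output_dir) / split
        if not split_dir.is_dir():
            continue  # skip split folders that are not present
        summary[split] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary


if __name__ == "__main__":
    # "splitted" is the output folder name used in the example above.
    for split, per_class in summarize_split("splitted").items():
        print(split, per_class, "total:", sum(per_class.values()))
```

Comparing the totals against the ratio you asked for is a quick sanity check, especially for small classes where rounding has a visible effect.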
Method 2: Command Line Usage
If you prefer running it directly from the terminal, you can use:
splitfolders --output splitted --ratio .7 .2 .1 --seed 42 -- Dataset
Here:
--output splitted → the destination folder
--ratio .7 .2 .1 → 70% train, 20% val, 10% test
--seed 42 → ensures repeatability
-- Dataset → path to your dataset
Depending on the installed version, the same command may also be available under the name split_folders:
split_folders --output splitted --ratio .7 .2 .1 --seed 42 -- Dataset
You can also change the ratio and the output folder name. For example, this writes a 70% train, 10% val, 20% test split to checksplit:
splitfolders --output checksplit --ratio .7 .1 .2 --seed 42 -- Dataset
The CLI is great for quick, one-off splits, but it is less convenient when you want to reuse or track configurations later.
PRO TIP:
Always fix a random seed (like seed=42) to ensure you get the same split every time.
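To see why a fixed seed gives a repeatable split, here is a simplified sketch of a seeded shuffle followed by slicing. It illustrates the general idea only and is not split-folders' actual implementation; the function name split_names and the truncating rounding are assumptions made for this example.

```python
import random


def split_names(names, ratio=(0.7, 0.2, 0.1), seed=42):
    """Shuffle names with a fixed seed, then slice into train/val/test."""
    items = sorted(names)               # start from a stable order
    random.Random(seed).shuffle(items)  # same seed -> same shuffle
    n_train = int(ratio[0] * len(items))
    n_val = int(ratio[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to test
    return train, val, test


if __name__ == "__main__":
    files = [f"img_{i:03d}.jpg" for i in range(20)]
    first = split_names(files, seed=42)
    second = split_names(files, seed=42)
    print(first == second)  # same seed and same inputs give the same split
```

Note that the guarantee only holds while the set of input files stays the same: adding or removing files changes what gets shuffled, so the resulting split can change even with the same seed.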
Note:
split-folders does not modify or split annotation files (like JSON, CSV, or XML) automatically.
It only splits folders containing images, videos or data files organized by class names.
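If your labels live in a single annotation file, one option is to filter that file per split after the images have been copied. The sketch below assumes a hypothetical CSV with a filename column whose values match the image file names, and it assumes file names are unique across classes; the function name split_annotations and the output name annotations.csv are likewise illustrative.

```python
import csv
from pathlib import Path


def split_annotations(csv_path, output_dir, filename_col="filename"):
    """Write <split>/annotations.csv holding only rows for files in that split."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    written = {}
    for split_dir in sorted(p for p in Path(output_dir).iterdir() if p.is_dir()):
        # Collect the names of all files anywhere below this split folder.
        present = {p.name for p in split_dir.rglob("*") if p.is_file()}
        kept = [row for row in rows if row[filename_col] in present]
        with open(split_dir / "annotations.csv", "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(kept)
        written[split_dir.name] = len(kept)
    return written
```

Calling split_annotations("labels.csv", "splitted") would then leave one annotations.csv next to the class folders in each split. JSON or XML annotations would need their own parsing, but the same filter-by-file-name idea applies.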
