Classify
What it does
Categorizes CSV files into subdirectories based on their header rows. This is useful for organizing large collections of CSV files that share similar structures.
Python usage
from csvsmith.tools.classify import CSVClassifier

classifier = CSVClassifier(
    source_dir="raw_data",
    dest_dir="organized_data",
    auto=True,  # Automatically create categories based on headers
)
classifier.run()
CLI usage
csvsmith classify raw_data/ organized_data/ --auto
Behavior notes
- Modes:
  - strict: Matches headers exactly (including order).
  - relaxed: Ignores column order and case.
- Auto-categorization: If --auto is used, files with the same header structure are moved into the same generated subdirectory.
- Dry Run: Use --dry-run to see which files would be moved without actually moving them.
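
The matching rules above can be sketched in plain Python. This is a minimal illustration of the idea, not csvsmith's actual implementation: the helper names `header_key` and `plan_moves`, the `headers_<hash>` directory naming, and the UTF-8 encoding are all assumptions made for this example. It shows how a strict key keeps column order and case, how a relaxed key discards both, and how a dry run can list planned moves without touching any file.

```python
import csv
import hashlib
from pathlib import Path


def header_key(header, mode="strict"):
    """Return a hashable key identifying a header row (hypothetical helper)."""
    if mode == "strict":
        # Exact match: column names, their case, and their order all matter.
        return tuple(header)
    if mode == "relaxed":
        # Case-folded and sorted, so column order and case are ignored.
        return tuple(sorted(name.casefold() for name in header))
    raise ValueError(f"unknown mode: {mode!r}")


def plan_moves(source_dir, dest_dir, mode="strict"):
    """Return (source, destination) pairs without moving anything.

    Files whose headers produce the same key share a destination
    subdirectory. The directory name is an assumed scheme: a short
    hash of the header key.
    """
    moves = []
    for path in sorted(Path(source_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # An empty file has no header row; treat it as an empty header.
            header = next(csv.reader(handle), [])
        key = header_key(header, mode)
        digest = hashlib.sha1("\x1f".join(key).encode("utf-8")).hexdigest()[:8]
        moves.append((path, Path(dest_dir) / f"headers_{digest}" / path.name))
    return moves
```

Printing the result of `plan_moves("raw_data", "organized_data", mode="relaxed")` gives a preview comparable in spirit to a dry run: two files with headers `id,name` and `Name,ID` land in the same subdirectory under the relaxed key, but in different ones under the strict key.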