csvsmith documentation
csvsmith is a toolkit for cleaning, transforming, and managing CSV-oriented workflows.
It provides both a command-line interface (CLI) and a Python API for building reproducible data processing pipelines.
Example
csvsmith excel-to-csv input.xlsx -o output.csv
csvsmith dedupe data.csv -o clean.csv
csvsmith clean-currency-numeric '$1,234.56'
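As a rough illustration of the kind of normalization the `clean-currency-numeric` command name suggests, the sketch below converts a currency-formatted string into a number using only the Python standard library. It does not call csvsmith: `parse_currency` is a hypothetical helper, and the set of symbols it strips is an assumption, so csvsmith's actual behavior and output format may differ.

```python
from decimal import Decimal


def parse_currency(text: str) -> Decimal:
    """Hypothetical helper (not part of csvsmith): currency string -> Decimal."""
    # Assumption: a leading currency symbol and comma thousands separators
    # are the only decorations that need to be removed.
    cleaned = text.strip().lstrip("$€£").replace(",", "")
    return Decimal(cleaned)


print(parse_currency("$1,234.56"))  # prints 1234.56
```

`Decimal` is used here instead of `float` so that monetary values keep their exact decimal representation.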
Getting started
Tools
Python API
- Python API
- csvsmith package
  - CSVClassifier
  - DropRowsBySubstring
  - RelationResult
  - StringDistance
  - add_row_digest()
  - analyze_pair()
  - clean_numeric()
  - count_duplicates_sorted()
  - dedupe_with_report()
  - excel_to_csv()
  - find_duplicate_rows()
  - find_matches_in_csv()
  - move_by_suffix()
  - read_csv_rows()
  - save_csv()
  - strict_concat_rows()
  - write_csv_rows()
- csvsmith.tools
- csvsmith.utils