Python API
csvsmith also provides a Python API for integrating CSV cleaning and transformation logic into scripts, applications, and data-processing workflows.
Use the Python API when you want to:
call csvsmith functionality from Python code
reuse cleaning or matching logic programmatically
build repeatable pipelines without shell commands
—
Overview
The Python API complements the command-line interface (CLI).
Use the CLI for one-off tasks and shell workflows.
Use the Python API when you need direct integration in Python code.
For command-oriented usage, see CLI Reference.
For a detailed module reference, see the API reference.
—
Typical import style
Import from public modules whenever possible.
# Example: utility-style import from a public module
from csvsmith.utils.clean_numeric import parse_number

# Parse a numeric-like string that contains a thousands separator
value = parse_number("1,234")
—
Example
Clean a list of numeric-like values:
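from csvsmith.utils.clean_numeric import parse_number

# Mixed input: thousands separator, currency symbol, missing-value marker,
# surrounding whitespace, and a value that is already an int
values = ["1,200", "¥3,000", "N/A", " 42 ", 7]

# Apply the parser to each value in turn
cleaned = [parse_number(v) for v in values]
print(cleaned)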
Expected result:
[1200, 3000, None, 42, 7]
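In the result above, the unparseable input "N/A" comes back as None, so downstream code usually has to handle missing values before aggregating. The following is a minimal sketch of one way to do that. It uses the documented output as a literal list instead of calling csvsmith, and the variable names are illustrative only.

```python
# Literal copy of the expected result shown above
# (not produced by calling csvsmith in this sketch).
cleaned = [1200, 3000, None, 42, 7]

# Keep only the values that were parsed successfully.
valid = [v for v in cleaned if v is not None]

# Count how many inputs could not be parsed.
missing_count = len(cleaned) - len(valid)

# Aggregate over the parsed values only.
total = sum(valid)
print(total, missing_count)
```

Filtering on `is not None` rather than on truthiness keeps a legitimate `0` in the data.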
—
When to use the Python API
The Python API is a good fit when you want to:
preprocess values before writing CSV output
integrate csvsmith logic into a larger ETL or data-cleaning script
test data-processing behavior directly in Python
avoid shelling out to CLI commands from application code
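As one way to picture the first two items above, the sketch below cleans a single column and then writes the rows with the standard-library `csv` module. `clean_column` and `stand_in_parse` are hypothetical helpers written for this illustration and are not part of csvsmith. In a real script the `cleaner` argument could be csvsmith's `parse_number`, whose exact behavior should be confirmed in the API reference.

```python
import csv
import io


def stand_in_parse(value):
    """Hypothetical stand-in cleaner: handles integers with comma separators only."""
    text = str(value).strip().replace(",", "")
    try:
        return int(text)
    except ValueError:
        return None


def clean_column(rows, column, cleaner):
    """Return new row dicts with `column` passed through `cleaner` (input is not mutated)."""
    return [{**row, column: cleaner(row[column])} for row in rows]


rows = [
    {"item": "widget", "price": "1,200"},
    {"item": "gadget", "price": "N/A"},
]
cleaned_rows = clean_column(rows, "price", stand_in_parse)

# Write to an in-memory buffer; the csv module writes None as an empty field.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["item", "price"], lineterminator="\n")
writer.writeheader()
writer.writerows(cleaned_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing the cleaner in as an argument keeps the pipeline step independent of any one parsing function, which also makes it straightforward to exercise with plain assertions in a test.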
—
Relationship to tool pages
Some tool pages describe behavior that is also useful from Python.
The tool pages explain user-facing behavior, while the API reference documents modules, functions, and classes.
—
Reference
For full module-level documentation, see the API reference.
—
Notes
Prefer stable, public imports over internal implementation modules.
Keep CLI usage and Python usage documented separately.
If a function is primarily internal, document it in the API reference rather than this overview page.