Python API

In addition to its command-line interface, csvsmith provides a Python API for integrating CSV cleaning and transformation logic into scripts, applications, and data-processing workflows.

Use the Python API when you want to:

  • call csvsmith functionality from Python code

  • reuse cleaning or matching logic programmatically

  • build repeatable pipelines without shell commands

Overview

The Python API complements the command-line interface (CLI).

  • Use the CLI for one-off tasks and shell workflows.

  • Use the Python API when you need direct integration in Python code.

For command-oriented usage, see CLI Reference.

For detailed module reference, see the API reference.

Typical import style

Import from public modules whenever possible.

# Example: utility-style import
from csvsmith.utils.clean_numeric import parse_number

value = parse_number("1,234")

Example

Clean a list of numeric-like values:

from csvsmith.utils.clean_numeric import parse_number

values = ["1,200", "¥3,000", "N/A", " 42 ", 7]
cleaned = [parse_number(v) for v in values]

print(cleaned)

Expected result:

[1200, 3000, None, 42, 7]
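
In the expected result above, the unparseable input "N/A" comes back as None. A caller will often want to separate usable numbers from inputs that could not be parsed. The following is a minimal sketch of one way to do that. It does not call csvsmith: the two lists are copied from the example above, and split_parsed is a hypothetical helper name, not part of the csvsmith API.

```python
# Inputs and results copied from the example above; nothing here calls csvsmith.
values = ["1,200", "¥3,000", "N/A", " 42 ", 7]
cleaned = [1200, 3000, None, 42, 7]


def split_parsed(raw, parsed):
    """Return (kept, rejected).

    kept     -- parsed values that are not None
    rejected -- the raw inputs whose parsed value was None
    """
    kept, rejected = [], []
    for original, result in zip(raw, parsed):
        if result is None:
            rejected.append(original)
        else:
            kept.append(result)
    return kept, rejected


kept, rejected = split_parsed(values, cleaned)
print(kept)      # [1200, 3000, 42, 7]
print(rejected)  # ['N/A']
```

Keeping the rejected raw inputs, rather than silently dropping them, makes it easier to report or review data that failed cleaning.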

When to use the Python API

The Python API is a good fit when you want to:

  • preprocess values before writing CSV output

  • integrate csvsmith logic into a larger ETL or data-cleaning script

  • test data-processing behavior directly in Python

  • avoid shelling out to CLI commands from application code
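
The first two points above can be sketched as a small function that cleans one column and then writes CSV output using only the standard library. This is an illustration under stated assumptions, not csvsmith's own pipeline code: write_cleaned_csv and demo_clean are hypothetical names, and demo_clean is a simplified stand-in so the sketch runs without csvsmith installed. In real code you would pass a csvsmith function such as parse_number as the clean argument.

```python
import csv
import io


def write_cleaned_csv(rows, column, clean, out):
    """Apply `clean` to one column of each row, then write the rows as CSV.

    rows   -- iterable of dicts sharing the same keys
    column -- name of the column to clean
    clean  -- callable applied to each value in that column
    out    -- writable text file object
    """
    rows = list(rows)
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    for row in rows:
        cleaned_row = dict(row)  # copy so the caller's data is not mutated
        cleaned_row[column] = clean(row[column])
        writer.writerow(cleaned_row)


def demo_clean(value):
    # Hypothetical stand-in, NOT csvsmith's implementation: strips whitespace
    # and thousands separators, and returns None for anything non-numeric.
    text = str(value).strip().replace(",", "")
    return int(text) if text.isdigit() else None


buffer = io.StringIO()
write_cleaned_csv(
    [{"item": "desk", "price": "1,200"}, {"item": "lamp", "price": "N/A"}],
    "price",
    demo_clean,
    buffer,
)
print(buffer.getvalue())
```

Passing the cleaning function as an argument keeps the pipeline testable directly in Python: a test can supply a trivial callable and an in-memory buffer, with no shell command involved. Note that the csv module writes None as an empty field.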

Relationship to tool pages

Some tool pages describe behavior that is also useful from Python.

Examples:

The tool pages explain user-facing behavior, while the API reference documents modules, functions, and classes.

Reference

For full module-level documentation, see the API reference.

Notes

  • Prefer stable, public imports over internal implementation modules.

  • Keep CLI usage and Python usage documented separately.

  • If a function is primarily internal, document it in the API reference rather than this overview page.