Clean Numeric and Currency

What it does

Normalizes messy text strings and extracts the numeric values from them. It handles group (thousands) separators and varied decimal separators.

The clean-currency-numeric tool additionally handles currency symbols (such as $ and ¥).

Python usage

from csvsmith.utils.clean_numeric import clean_numeric, clean_currency_numeric

# Basic numeric cleaning
val = clean_numeric("1,200.50")  # Returns 1200.5

# Using localized separators (e.g., German style)
val = clean_numeric("1.200,50", sep=".", decimal=",")  # Returns 1200.5

# Cleaning values with currency symbols
val = clean_currency_numeric("¥5,000")  # Returns 5000.0

# Relaxed mode returns the original value if it can't be parsed
val = clean_numeric("Not a number", relaxed=True)  # Returns "Not a number"

CLI usage

To clean a standard numeric string:

csvsmith clean-numeric "1,200.50" --sep "," --decimal "."

To clean a value that includes a currency symbol:

csvsmith clean-currency-numeric "¥5,000" --sep "," --decimal "."

Note

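When a currency string starting with $ (e.g., "$1234.56") appears unquoted or in double quotes in a shell script, the shell tries to expand it as a variable: $1 is read as the first positional parameter, so the command receives a different value than the one you typed. Always use single quotes ('$1234.56') so the string is passed literally.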
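
When the command line is assembled from Python rather than typed by hand, the same quoting rule can be applied with the standard library. The following is a minimal sketch: it only builds and prints the command string and does not run csvsmith.

```python
import shlex

value = "$1234.56"

# shlex.quote wraps the value in single quotes because "$" is special to the shell.
quoted = shlex.quote(value)

# Build the command string; the shell will now see the literal text $1234.56.
command = f"csvsmith clean-currency-numeric {quoted}"
print(command)  # csvsmith clean-currency-numeric '$1234.56'
```

If the command is launched with subprocess.run and a list of arguments (no shell), the value reaches the program unchanged and no quoting is needed.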

Behavior notes

  • Group Separators: Commas, underscores, and non-breaking spaces are handled.

  • Negative Values: Supports leading minus signs or values enclosed in parentheses (e.g., (100) becomes -100).

  • Default Separators: Defaults to , for thousands and . for decimal.
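
The behavior notes above can be illustrated with a small standalone sketch. The function sketch_clean_numeric below is hypothetical and is not csvsmith's implementation; it only mirrors the documented behavior (group separators, parenthesized negatives, default separators, relaxed mode) under the assumption that group separators are simply removed before parsing.

```python
from typing import Union


def sketch_clean_numeric(
    text: str, sep: str = ",", decimal: str = ".", relaxed: bool = False
) -> Union[float, str]:
    """Hypothetical stand-in for illustration; not csvsmith's actual code."""
    s = text.strip()

    # Accounting style: a value wrapped in parentheses is negative.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1].strip()

    # Remove the configured group separator, underscores, and non-breaking spaces.
    for group_char in (sep, "_", "\u00a0"):
        if group_char != decimal:
            s = s.replace(group_char, "")

    # Normalize the decimal separator to "." so float() can parse it.
    if decimal != ".":
        s = s.replace(decimal, ".")

    try:
        value = float(s)
    except ValueError:
        if relaxed:
            return text  # relaxed mode: hand back the original input
        raise
    return -value if negative else value


print(sketch_clean_numeric("(100)"))                               # -100.0
print(sketch_clean_numeric("1.200,50", sep=".", decimal=","))      # 1200.5
print(sketch_clean_numeric("Not a number", relaxed=True))          # Not a number
```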