String Distance
===============

What it does
------------

Analyzes how similar two strings are and how they relate to each other. It classifies a pair as an exact match, a case-insensitive match, or a structural (whitespace-normalized) match, and reports numeric similarity scores for pairs that are close but not identical.

It implements two widely used algorithms for string similarity:

- **Damerau-Levenshtein Distance**: The minimum number of single-character edits (insertions, deletions, substitutions, and transpositions of adjacent characters) required to change one string into the other.
- **Jaro-Winkler Score**: A similarity score from 0.0 (nothing in common) to 1.0 (identical). It gives extra weight to a shared prefix, which makes it particularly effective for short strings such as names.
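
The two measures can be sketched in plain Python as follows. This is a minimal illustration and not csvsmith's own implementation: the function names ``osa_distance``, ``jaro``, and ``jaro_winkler`` are hypothetical, the edit distance uses the restricted "optimal string alignment" variant of Damerau-Levenshtein, and the prefix scale of 0.1 with a four-character prefix cap is the conventional Jaro-Winkler default, assumed here.

.. code-block:: python

   def osa_distance(a: str, b: str) -> int:
       """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
       rows, cols = len(a) + 1, len(b) + 1
       d = [[0] * cols for _ in range(rows)]
       for i in range(rows):
           d[i][0] = i  # delete every character of a[:i]
       for j in range(cols):
           d[0][j] = j  # insert every character of b[:j]
       for i in range(1, rows):
           for j in range(1, cols):
               cost = 0 if a[i - 1] == b[j - 1] else 1
               d[i][j] = min(
                   d[i - 1][j] + 1,         # deletion
                   d[i][j - 1] + 1,         # insertion
                   d[i - 1][j - 1] + cost,  # substitution or match
               )
               # Transposition of two adjacent characters.
               if (i > 1 and j > 1
                       and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                   d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
       return d[-1][-1]


   def jaro(a: str, b: str) -> float:
       """Jaro similarity in the range 0.0 to 1.0."""
       if a == b:
           return 1.0
       if not a or not b:
           return 0.0
       # Characters match only if they are equal and close enough in position.
       window = max(max(len(a), len(b)) // 2 - 1, 0)
       a_hit = [False] * len(a)
       b_hit = [False] * len(b)
       matches = 0
       for i, ch in enumerate(a):
           lo, hi = max(0, i - window), min(len(b), i + window + 1)
           for j in range(lo, hi):
               if not b_hit[j] and b[j] == ch:
                   a_hit[i] = b_hit[j] = True
                   matches += 1
                   break
       if matches == 0:
           return 0.0
       # Matched characters that appear in a different order, counted in pairs.
       a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
       b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
       transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
       return (matches / len(a) + matches / len(b)
               + (matches - transpositions) / matches) / 3


   def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
       """Jaro similarity boosted by the shared prefix (at most 4 characters)."""
       score = jaro(a, b)
       prefix = 0
       for x, y in zip(a[:4], b[:4]):
           if x != y:
               break
           prefix += 1
       return score + prefix * prefix_scale * (1 - score)

The restricted variant never edits the same substring twice, so it can report a larger distance than unrestricted Damerau-Levenshtein for some inputs (for example ``"ca"`` and ``"abc"``). Which variant csvsmith uses is not stated here.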

Python usage
------------

.. code-block:: python

   from csvsmith.utils.distance import analyze_pair

   # Compare two strings
   res = analyze_pair("Ames, IA", "Ames IA", ignore_case=True)

   print(f"Relation: {res.get_relation_string()}")
   print(f"Similarity: {res.similarity_percentage:.2f}%")
   print(f"Jaro-Winkler: {res.jaro_winkler_score:.4f}")

CLI usage
---------

.. code-block:: bash

   csvsmith string-distance "Apple Inc." "apple inc" --ignore-case

Behavior notes
--------------

- **Classifications**:

  - ``Identical``: Exact character-for-character match.
  - ``Case-Insensitive Match``: Matches if case is ignored.
  - ``Similar (Trimmed)``: Matches after removing leading/trailing whitespace.
  - ``Synonymous (No Spaces)``: Matches after removing ALL internal whitespace.
  - ``Different``: No structural match found.

- **Similarity Percentage**: A normalized score derived from the Damerau-Levenshtein distance relative to the length of the longer string.
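
The notes above can be sketched as a cascade of checks plus a normalization step. This is an illustration under stated assumptions, not the library's code: the helper names ``classify`` and ``similarity_percentage`` are hypothetical, the order of the checks is assumed to follow the list above, the whitespace checks are assumed to be case-sensitive, and the percentage formula ``100 * (1 - distance / longer_length)`` is one natural reading of "relative to the length of the longer string".

.. code-block:: python

   def classify(a: str, b: str) -> str:
       """Return the first classification that applies (assumed order)."""
       if a == b:
           return "Identical"
       if a.casefold() == b.casefold():
           return "Case-Insensitive Match"
       if a.strip() == b.strip():
           return "Similar (Trimmed)"
       # str.split() with no argument splits on any run of whitespace,
       # so joining the pieces removes all whitespace.
       if "".join(a.split()) == "".join(b.split()):
           return "Synonymous (No Spaces)"
       return "Different"


   def similarity_percentage(distance: int, a: str, b: str) -> float:
       """Normalize an edit distance by the longer string's length (assumed formula)."""
       longest = max(len(a), len(b))
       if longest == 0:
           return 100.0  # two empty strings are treated as identical
       return 100.0 * (1 - distance / longest)

Under these assumptions, ``"Ames, IA"`` and ``"Ames IA"`` differ by one deleted comma, so a distance of 1 over a longer length of 8 gives 87.5%.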