Ambect Solutions LLC v1.0 is live

After months of building, the Ambect API is publicly available. This post covers what we shipped in v1.0, how the normalization pipeline works, and what’s on the roadmap.

What we shipped

v1.0 ships five normalization endpoints under a single API key:

/v1/normalize/company — strips legal suffixes, expands acronyms, maps synonyms, and returns a stable canonical token sequence suitable for exact and fuzzy matching across 100+ countries.
/v1/normalize/address — parses and normalizes postal addresses using libpostal (trained on 1B+ addresses), with optional geocoding on Growth and Business plans.
/v1/normalize/phone — parses any phone number format and returns E.164, national, and international representations with line-type classification.
/v1/normalize/email — normalizes email addresses and URLs, flags disposable email domains.
/v1/normalize/identifier — validates and normalizes SSN, EIN, EU VAT, SIC codes, and stock tickers with auto-detection.

All endpoints return sub-5ms responses on the Growth and Business plans. The Starter plan includes company and email normalization.

How the company pipeline works

Entity deduplication breaks down at the suffix: “Acme LLC”, “Acme, L.L.C.”, and “ACME Limited Liability Company” are the same company, but a naive string comparison treats them as three different names. The pipeline handles this in nine deterministic stages:

Transliterate — convert Cyrillic, Arabic, CJK, and other non-Latin scripts to Latin equivalents using country-aware rules.
Lowercase — normalize case.
Punctuation — strip dots, commas, and special characters.
Whitespace — collapse and trim.
Legal suffix — detect and strip the suffix (LLC, Ltd, GmbH, S.A., …) using a database of 466 suffixes across 100+ countries, and record it separately in the response.
Acronym expansion — expand known acronyms to canonical long forms.
Stop words — remove noise tokens by entity type and country.
Synonym mapping — map 3,000+ synonyms to canonical tokens.
Sort tokens — sort remaining tokens so word order doesn’t affect matching.
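
To make the nine stages concrete, here is a minimal Python sketch of the same sequence. It is not the production implementation: the lookup tables are tiny stand-ins for the real suffix, acronym, and synonym databases, the transliteration stage is reduced to stripping diacritics, and the function name and the output keys (canonical, tokens, legal_suffix) are assumptions made for illustration.

```python
import re
import unicodedata

# Tiny illustrative tables (assumed); the real databases are far larger.
LEGAL_SUFFIXES = {
    ("limited", "liability", "company"): "LLC",
    ("llc",): "LLC",
    ("ltd",): "LTD",
    ("gmbh",): "GMBH",
}
ACRONYMS = {"intl": "international"}
STOP_WORDS = {"the", "of", "and"}
SYNONYMS = {"mfg": "manufacturing", "co": "company"}


def normalize_company(name: str) -> dict:
    # 1. Transliterate (stand-in): only strips diacritics, no script conversion.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # 2. Lowercase.
    text = text.lower()
    # 3. Punctuation: drop dots so "l.l.c." becomes "llc"; other marks become spaces.
    text = text.replace(".", "")
    text = re.sub(r"[^\w\s]", " ", text)
    # 4. Whitespace: split() collapses runs of whitespace and trims the ends.
    tokens = text.split()
    # 5. Legal suffix: try the longest trailing pattern first and record the match.
    suffix = None
    for pattern in sorted(LEGAL_SUFFIXES, key=len, reverse=True):
        n = len(pattern)
        if len(tokens) > n and tuple(tokens[-n:]) == pattern:
            suffix = LEGAL_SUFFIXES[pattern]
            tokens = tokens[:-n]
            break
    # 6. Acronym expansion.
    tokens = [ACRONYMS.get(t, t) for t in tokens]
    # 7. Stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 8. Synonym mapping.
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    # 9. Sort tokens so word order does not affect matching.
    tokens = sorted(tokens)
    return {"canonical": " ".join(tokens), "tokens": tokens, "legal_suffix": suffix}


variants = ["Acme LLC", "Acme, L.L.C.", "ACME Limited Liability Company"]
results = [normalize_company(v) for v in variants]
print(results[0])
print(all(r == results[0] for r in results))
```

In this sketch all three spellings reduce to the same canonical string with the suffix recorded separately, which is the property the exact-match comparison relies on.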

The output is a canonical string and a tokens array. Two records match when their canonical strings are identical. For fuzzy matching, the token arrays are ready for Jaccard similarity or embedding.
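
As a rough illustration of the fuzzy-matching side, Jaccard similarity over two token arrays is the size of their intersection divided by the size of their union. The helper below is a client-side sketch, not part of the API; the function name and the sample token lists are assumptions.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token lists, in the range 0.0 to 1.0."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        # Design choice: two empty names are never treated as a match.
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical token arrays for two records that differ by one token.
left = ["acme", "widget"]
right = ["acme", "manufacturing", "widget"]
print(round(jaccard(left, right), 3))
```

Here the two records share two of three distinct tokens, so the score is 2/3. Where to set the match threshold depends on your data and on how costly a false merge is.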

The legal suffix step is where most false negatives live. “GmbH” in Germany, “S.r.l.” in Italy, and “Pty Ltd” in Australia all signal the same entity structure — a private company with limited liability — but they look nothing alike to a string matcher. The suffix database now covers 466 entries with full metadata: liability type, public/private status, country list, and global equivalents. You can browse it at /glossary.
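
One way to picture that metadata is as a small record per suffix. The sketch below is only a guess at the shape: the class and field names are hypothetical, and the entries are hand-written examples rather than rows from the actual database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixEntry:
    # Field names are assumptions, modeled on the metadata listed above.
    suffix: str
    liability_type: str
    is_public: bool
    countries: tuple[str, ...]
    global_equivalents: tuple[str, ...] = ()


def same_structure(a: SuffixEntry, b: SuffixEntry) -> bool:
    """True when two suffixes describe the same kind of entity."""
    return (a.liability_type, a.is_public) == (b.liability_type, b.is_public)


# Hand-written illustrative entries.
GMBH = SuffixEntry("GmbH", "limited", False, ("DE",), ("S.r.l.", "Pty Ltd"))
SRL = SuffixEntry("S.r.l.", "limited", False, ("IT",), ("GmbH", "Pty Ltd"))
PTY = SuffixEntry("Pty Ltd", "limited", False, ("AU",), ("GmbH", "S.r.l."))
PLC = SuffixEntry("PLC", "limited", True, ("GB",))

print(same_structure(GMBH, SRL), same_structure(SRL, PTY), same_structure(GMBH, PLC))
```

With records like these, a matcher can treat “GmbH”, “S.r.l.”, and “Pty Ltd” as equivalent for deduplication while still keeping a public company suffix distinct.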

What’s next

A few things are already in progress for v1.1 and v1.2:

Batch normalization — send up to 1,000 records per request for bulk deduplication pipelines.
Transliteration tools — standalone endpoints for Cyrillic→Latin, Arabic→Latin, and other common data-cleaning tasks.
Confidence scores — per-field confidence signals so you can route low-confidence outputs to a human review queue.
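
If you want to prepare a bulk pipeline ahead of the batch endpoint, the client-side half is just chunking. The sketch below splits any iterable of records into groups of at most 1,000, the limit mentioned above (which could change before release). It makes no assumption about the eventual request format; the function name is hypothetical.

```python
from itertools import islice

MAX_BATCH = 1000  # taken from the roadmap item; the shipped limit may differ


def batches(records, size=MAX_BATCH):
    """Yield consecutive lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# 2,500 placeholder records split into three batches.
sizes = [len(batch) for batch in batches(range(2500))]
print(sizes)
```

Because the generator pulls from an iterator, it works on streams and database cursors as well as in-memory lists.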

If there’s a normalization problem you’re hitting that isn’t covered above, reach out. The roadmap is largely shaped by what users actually run into.

Get started free →