Menu Deduplication Pipeline
Menu data from multiple sources (POS systems, aggregators, manual entry) inevitably contains duplicates. "Chicken Biryani", "Chiken Biryani", and "Murgh Biryani" are all the same dish. This guide walks through the full dedup pipeline.
See the POST /dedup API reference for the full request/response schema.
How it works
- Load your menu data (CSV, database export, POS feed)
- Send all items to POST /dedup
- The API clusters duplicates and picks a canonical name per cluster
- Update your database with canonical references
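How step 4 looks depends on your own schema, so the following is only a minimal sketch. It assumes the response contains a `clusters` list whose entries have the `canonical` and `members` fields used in the full example on this page, and it turns that list into a lookup from each raw name to its canonical name. The helpers `build_canonical_map` and `canonical_name` and the sample data are hypothetical and not part of the API.

```python
def build_canonical_map(clusters):
    """Map every member name in every cluster to that cluster's canonical name."""
    mapping = {}
    for cluster in clusters:
        canonical = cluster["canonical"]
        for member in cluster["members"]:
            mapping[member] = canonical
    return mapping


def canonical_name(item, mapping):
    """Return the canonical name for an item; unclustered items keep their own name."""
    return mapping.get(item, item)


# Hypothetical response fragment, shaped like the `clusters` field of /dedup.
sample_clusters = [
    {
        "canonical": "Chicken Biryani",
        "members": ["Chicken Biryani", "Chiken Biryani", "Murgh Biryani"],
    },
]

canonical_map = build_canonical_map(sample_clusters)
```

With a mapping like this, the database update can be a single pass over your rows that stores `canonical_name(row_name, canonical_map)` as the canonical reference.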
Full example
import csv

import requests

API_KEY = "YOUR_KEY"
BASE = "https://dish-embed.latimal.com"
headers = {"X-API-Key": API_KEY, "Content-Type": "application/json"}

# Load menu from CSV (expects an "item_name" column)
with open("menu.csv", newline="") as f:
    items = [row["item_name"] for row in csv.DictReader(f)]

# Deduplicate
resp = requests.post(f"{BASE}/dedup", headers=headers, json={"items": items})
resp.raise_for_status()  # stop on HTTP errors instead of parsing an error body
data = resp.json()

print(f"Found {len(data['clusters'])} duplicate groups")
print(f"{data['duplicate_items']} excess items to remove")

for cluster in data["clusters"]:
    duplicates = [m for m in cluster["members"] if m != cluster["canonical"]]
    print(f"\nCanonical: {cluster['canonical']}")
    print(f"  Duplicates: {', '.join(duplicates)}")
Handling large menus
The /dedup endpoint accepts up to 2,000 items per request. For larger menus, chunk your data:
def dedup_chunked(items, chunk_size=2000):
    """Deduplicate items in chunks of at most chunk_size (the API limit is 2,000)."""
    all_clusters = []
    for i in range(0, len(items), chunk_size):
        chunk = items[i:i + chunk_size]
        resp = requests.post(f"{BASE}/dedup", headers=headers, json={"items": chunk})
        resp.raise_for_status()  # stop on HTTP errors
        all_clusters.extend(resp.json()["clusters"])
    return all_clusters
Each chunk is deduplicated independently, so duplicates that land in different chunks won't be caught. If you have more than 2,000 items, consider running /match on suspected pairs across chunk boundaries.
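One way to come up with suspected pairs is a cheap local pre-filter. The sketch below uses plain string similarity from Python's standard `difflib` module, which is a swapped-in heuristic and not the API's own matching: it will flag spelling variants such as "Chiken Biryani" but not synonyms such as "Murgh Biryani". The function name `suspected_cross_chunk_pairs` and the `min_ratio` cutoff are assumptions for illustration, and the /match request itself is left out because its schema is not covered in this guide.

```python
from difflib import SequenceMatcher
from itertools import combinations


def suspected_cross_chunk_pairs(chunks, min_ratio=0.8):
    """Return (name_a, name_b) pairs from different chunks that look alike.

    Only pairs that span two chunks are compared, because duplicates inside
    a single chunk are already handled by /dedup.
    """
    pairs = []
    for chunk_a, chunk_b in combinations(chunks, 2):
        for a in chunk_a:
            for b in chunk_b:
                ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
                if ratio >= min_ratio:
                    pairs.append((a, b))
    return pairs


# Hypothetical chunks; in practice these would be the slices sent to /dedup.
example_chunks = [
    ["Chicken Biryani", "Paneer Tikka"],
    ["Chiken Biryani", "Masala Dosa"],
]
example_pairs = suspected_cross_chunk_pairs(example_chunks)
```

The comparison is quadratic in the number of names, so it is cheaper to run it on the canonical names and unclustered items left after the first pass than on the raw menu.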
Threshold tuning
The default threshold of 0.85 works well for most menus. To change it, pass the cosine_threshold parameter:
- 0.80 - Aggressive dedup. Catches more duplicates but may merge similar-but-different items (e.g., "Latte" and "Mocha").
- 0.85 - Default. Good balance of precision and recall.
- 0.90 - Conservative. Only merges near-identical items. Use this if false merges are costly.
resp = requests.post(
    f"{BASE}/dedup",
    headers=headers,
    json={"items": items, "cosine_threshold": 0.80},
)
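Before committing to a lower threshold, it can help to see exactly which merges it adds. The sketch below assumes you have already run /dedup twice on the same items (for example at 0.90 and at 0.80) and kept both `clusters` lists; it returns the clusters from the aggressive run that the conservative run did not produce, which are the ones worth reviewing by hand. The function name `clusters_to_review` and the sample data are hypothetical.

```python
def clusters_to_review(conservative_clusters, aggressive_clusters):
    """Return aggressive-run clusters whose member set the conservative run did not produce."""
    seen = {frozenset(c["members"]) for c in conservative_clusters}
    return [c for c in aggressive_clusters if frozenset(c["members"]) not in seen]


# Hypothetical results for the same menu at two thresholds.
conservative = [
    {"canonical": "Latte", "members": ["Latte", "Caffe Latte"]},
    {"canonical": "Chicken Biryani", "members": ["Chicken Biryani", "Chiken Biryani"]},
]
aggressive = [
    {"canonical": "Latte", "members": ["Latte", "Caffe Latte", "Mocha"]},
    {"canonical": "Chicken Biryani", "members": ["Chicken Biryani", "Chiken Biryani"]},
]

to_review = clusters_to_review(conservative, aggressive)
```

In this sample, only the cluster that pulled in "Mocha" is flagged, which is the kind of similar-but-different merge the 0.80 setting can introduce.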
Tips
- Send raw menu text as-is. The API handles noise stripping and spelling normalization internally.
- The canonical name in each cluster is the cleanest, most complete form. Use it as your display name.
- Run dedup after every menu import, not just once. New data sources introduce new duplicates.
- Dietary conflicts (e.g., "Chicken Burger" vs "Veg Burger") are never merged, regardless of threshold.