What this tool does
JSONL Deduplicator removes repeated records from line-delimited JSON datasets before they flow into training, evaluation or batch-processing pipelines. It can compare full records or a single selected field, so you can choose whether duplicate detection is strict (the whole record must match) or task-focused (only the chosen field must match).
This is especially useful when data comes from merged exports, repeated collection jobs or iterative prompt-building workflows where near-identical records can quietly inflate the dataset.
- Deduplicate by full normalized record or by one key field.
- Normalize whitespace before matching repeated prompts and answers.
- Inspect which lines were removed and which earlier lines they matched.
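
The three behaviours listed above can be illustrated with a short Python sketch. This is not the tool's actual implementation. The names `normalize` and `dedupe_jsonl` are hypothetical, each line is assumed to hold one JSON object, and two choices are assumptions: keeping records that lack the selected field, and ignoring object key order when comparing full records.

```python
import json
import re


def normalize(value):
    """Collapse whitespace runs in strings, recursing into lists and dicts."""
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value).strip()
    if isinstance(value, list):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    return value


def dedupe_jsonl(lines, key_field=None):
    """Return (kept_lines, removed), where removed holds
    (removed_line_number, matched_earlier_line_number) pairs, 1-based."""
    seen = {}      # match key -> line number of its first occurrence
    kept = []
    removed = []
    for line_no, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # skip blank lines
        record = json.loads(raw)
        if key_field is not None and key_field not in record:
            # Assumption: a record without the selected field is kept as-is.
            kept.append(raw.rstrip("\n"))
            continue
        target = record if key_field is None else record[key_field]
        # sort_keys makes the comparison independent of object key order.
        match_key = json.dumps(normalize(target), sort_keys=True)
        if match_key in seen:
            removed.append((line_no, seen[match_key]))
        else:
            seen[match_key] = line_no
            kept.append(raw.rstrip("\n"))
    return kept, removed


sample = [
    '{"prompt": "Hello  world", "answer": "a"}',
    '{"answer": "a", "prompt": "Hello world"}',
    '{"prompt": "Hello world", "answer": "b"}',
]

# Strict mode: only line 2 duplicates line 1 once whitespace is normalized.
kept_full, removed_full = dedupe_jsonl(sample)

# Field mode: lines 2 and 3 both repeat the prompt first seen on line 1.
kept_prompt, removed_prompt = dedupe_jsonl(sample, key_field="prompt")
```

Keeping the first occurrence and recording the line it was first seen on is what makes the removal report possible: each removed line can be traced back to the earlier line it matched.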