In this paper, we present an automated pipeline for generating domain-specific synthetic datasets with diffusion models, addressing the distribution shift between the training data of pre-trained models and real-world deployment environments.
Our three-stage framework first synthesizes target objects within domain-specific backgrounds through controlled inpainting. The generated outputs are then validated via a multi-modal assessment that integrates object detection, aesthetic scoring, and vision–language alignment.
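As a concrete illustration of the first stage, the following minimal sketch performs controlled inpainting with an off-the-shelf diffusion model; the checkpoint name, file names, and prompt are placeholder assumptions rather than our exact configuration.

```python
# Minimal sketch of Stage 1 (controlled inpainting). The checkpoint,
# file names, and prompt are illustrative assumptions, not our exact setup.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",  # assumed public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

background = Image.open("parking_lot_cctv.png").convert("RGB")  # domain background
mask = Image.open("placement_mask.png").convert("L")            # white = synthesis region

image = pipe(
    prompt="a small fire burning on the floor of an underground parking lot",
    image=background,
    mask_image=mask,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("candidate_0001.png")
```

Masked inpainting confines generation to the intended placement region, so a scarce domain background is largely preserved and can be reused across many target objects.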
In the third stage, a user-preference classifier is employed to capture subjective selection criteria. Together, the three stages enable the efficient construction of high-quality, deployable datasets while reducing reliance on extensive real-world data collection.
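To ground the second stage, the sketch below scores each candidate with an off-the-shelf detector (Ultralytics YOLO), CLIP image–text similarity as the vision–language alignment signal, and a pluggable aesthetic predictor; all model names and thresholds are assumptions for illustration, not our exact configuration.

```python
# Sketch of the Stage 2 multi-modal assessment: a candidate is kept only if
# it clears detection, aesthetic, and vision-language alignment thresholds.
# Model choices and thresholds are illustrative assumptions.
from typing import Callable

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")  # assumed COCO-pretrained detector ("dog" is a COCO class)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def detection_score(image: Image.Image, target_class: str) -> float:
    # Highest detector confidence for the target class; 0.0 if it is not found.
    result = detector(image, verbose=False)[0]
    confidences = [
        float(box.conf) for box in result.boxes
        if result.names[int(box.cls)] == target_class
    ]
    return max(confidences, default=0.0)

def alignment_score(image: Image.Image, caption: str) -> float:
    # CLIP image-text cosine similarity as a vision-language alignment proxy.
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = clip(**inputs)
    image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((image_emb @ text_emb.T).item())

def passes_stage2(
    path: str,
    target_class: str,
    caption: str,
    aesthetic_score: Callable[[Image.Image], float],  # plug in any off-the-shelf predictor
) -> bool:
    image = Image.open(path).convert("RGB")
    return (
        detection_score(image, target_class) > 0.5   # assumed threshold
        and alignment_score(image, caption) > 0.25   # assumed threshold
        and aesthetic_score(image) > 5.0             # assumed 1-10 aesthetic scale
    )
```

One plausible realization of the third stage is then a lightweight classifier (e.g., logistic regression over the same CLIP image embeddings) fit on a small set of user-approved and user-rejected candidates.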
Our automated dataset generation pipeline has two prerequisites:
- Domain-specific background images: scenes that are intrinsically difficult to collect or tied to a specific site (e.g., underground parking lots, elevator CCTV views, and home security camera footage). Because these images form the backgrounds in which the model will operate after deployment, we assume that only a limited number can be obtained.
- Target object: the object to be synthesized within the given domain context. While these objects are typically common and could plausibly appear in the domain images, we focus on natural object–scene combinations that are scarce in publicly available datasets (e.g., a fire in an underground parking lot, a dog in an elevator CCTV view, a robot vacuum cleaner in home security footage); a minimal sketch of how backgrounds and targets pair into synthesis jobs follows this list.
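
Operationally, the two prerequisites reduce to a list of (background, object, prompt) jobs handed to the synthesis stage. The directory layout, target list, and prompt template below are hypothetical.

```python
# Sketch of how the two prerequisites combine into synthesis jobs.
# The directory layout, target list, and prompt template are hypothetical.
from itertools import product
from pathlib import Path

BACKGROUND_DIR = Path("backgrounds/underground_parking")   # hypothetical path
TARGET_OBJECTS = ["a small fire", "a stray dog"]           # hypothetical targets
PROMPT_TEMPLATE = "{obj}, seen from a fixed CCTV camera in an underground parking lot"

jobs = [
    {"background": str(bg), "object": obj, "prompt": PROMPT_TEMPLATE.format(obj=obj)}
    for bg, obj in product(sorted(BACKGROUND_DIR.glob("*.png")), TARGET_OBJECTS)
]
```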

Overview of the proposed three-stage, diffusion-based dataset generation and auto-curation pipeline.

Success image that passes all three stages, demonstrating target object synthesis within a domain-specific background.

Negative images for which Stage 1 object synthesis failed, as indicated by low detection and aesthetic scores.

Discarded images rejected by annotators in Stage 2 for poor viewpoint or pose, despite successful object synthesis.