Industrial computer vision needs data before it can build trust, and trust before users will tolerate the imperfections that come with early data. GenAI promises to break that deadlock — but the domain gap between human-centric generative models and featureless industrial parts runs deeper than expected. This review tests three GenAI strategies (personalization, augmentation, CAD synthesis) against MVIP, a 308-class dataset of used car components, and finds that the gap is linguistic as much as visual: the models know "rusty" as a descriptor of dogs, not generators.
Industrial AI requires data to be predictable. Users require predictability to trust AI. Trust is required before users will tolerate the uncertainty of early data collection. The cycle stalls before it starts — and when early models disappoint, the lost trust is almost impossible to recover.
Active learning can ramp up data incrementally, but the performance dip during ramp-up is itself the trust-killer. GenAI offers a different entry point: generate plausible training data before any production deployment, so the first version the user sees already has enough coverage to behave reasonably.
Each strategy occupies a different point on the fidelity–diversity tradeoff. None is universally better — each has a specific failure mode in the industrial context.
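Perfusion generates recognizable objects. Their fidelity is not sufficient for MVIP's fine-grained classification challenge, but it is sufficient for pretraining. The most important breakage is linguistic: descriptors like "rusty" stay tied to the human-centric contexts the model learned them in, not to industrial parts.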
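DA-Fusion's single parameter, diffusion intensity, produces two qualitatively different outputs that serve different training purposes. At 20% intensity the generated images stay close to their source and act as additional in-distribution (ID) data; at 50% they drift far enough to act as near-out-of-distribution (OoD) data. The split is clean and deliberate.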
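One way to see why a single intensity value splits the outputs this cleanly: in an SDEdit-style image-to-image pass, the real image is noised directly to an intermediate timestep and then denoised, and the intensity sets how far along the schedule that timestep lies. The sketch below assumes the intensity maps to the insertion timestep `t0 = strength * T` and assumes a DDPM linear β schedule (T = 1000, β from 1e-4 to 0.02); the real pipeline's schedule, the function names, and the denoising step (not shown) are assumptions, not DA-Fusion's code.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal retention alpha_bar_t under an assumed DDPM linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def partial_noise(x0, strength, alpha_bar, rng):
    """SDEdit-style insertion: noise x0 directly to timestep t0 = strength * T.

    Returns the noised sample and its signal coefficient sqrt(alpha_bar[t0]).
    A denoiser (not shown) would then run from t0 back to 0 to produce the augmentation.
    """
    T = len(alpha_bar)
    t0 = min(int(strength * T), T - 1)
    a = alpha_bar[t0]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, float(np.sqrt(a))

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
image = rng.uniform(-1.0, 1.0, size=(64, 64, 3))  # stand-in for a normalized photo of a part
for strength in (0.2, 0.5):
    x_t, signal = partial_noise(image, strength, alpha_bar, rng)
    corr = np.corrcoef(image.ravel(), x_t.ravel())[0, 1]
    print(f"strength={strength:.1f}  signal coef={signal:.3f}  corr with source={corr:.3f}")
```

With these assumed settings, roughly 0.8 of the source signal amplitude survives at 20% strength and under 0.3 at 50%, so the denoiser has far more freedom to invent content in the second case.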
Setup: ResNet18 trained from scratch on all 308 MVIP classes. The encoder is pretrained with supervised contrastive learning and then frozen; only the final classification layer is finetuned on labeled data. OoD data is evaluated for its effect on confidence rather than accuracy, because confidence calibration is the trust signal users see.
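The pretraining objective can be sketched as a supervised contrastive (SupCon) loss in its common form: for each anchor, average the log-softmax probability of its same-class positives against all other samples in the batch. This is a minimal NumPy sketch for one batch; the temperature `tau=0.1` and the toy 2-D embeddings are assumptions, and in the setup above the loss would run on ResNet18 embeddings inside an autodiff framework before the encoder is frozen.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss averaged over anchors.

    z: (N, D) embeddings, L2-normalized here. labels: (N,) integer class ids.
    Anchors without any same-class positive in the batch are skipped.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Log-softmax over all other samples in the batch (the anchor itself is excluded).
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    # Positives: same label, excluding the anchor.
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0
    mean_log_prob_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())

labels = np.array([0, 0, 1, 1])
aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(supcon_loss(aligned, labels))  # near 0: each class forms a tight, separated cluster
print(supcon_loss(mixed, labels))    # about 10 (= 1/tau): positives orthogonal, negatives aligned
```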
| Dataset | Top-1 Accuracy | Avg. ID Confidence | Avg. near-OoD Confidence | Conf. Δ (ID − near-OoD) |
|---|---|---|---|---|
| MVIP (baseline) | 71.4% | 69% | 62% | 7 pp |
| + ID Data (DA-Fusion 20%) | 77.5% | 76% | 65% | 11 pp |
| + OoD Data (DA-Fusion 50%) | 75.3% | 40% | 35% | 5 pp |
The OoD result is not a failure; it is a feature. Collapsing model confidence on near-OoD inputs (62% → 35%) is what makes the model safer to deploy: a user who sees low confidence defers to a human. ID confidence drops as well (69% → 40%), so the gap narrows to 5 pp. The confidence gap (Δ) between ID and near-OoD data is the operational signal: ID data widens it from 7 pp to 11 pp, giving users a clearer split between confident and uncertain predictions.
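The Δ column is plain arithmetic on the average confidences, and the deferral behaviour is a threshold on per-sample confidence. A minimal sketch, assuming "confidence" means the maximum softmax probability of each prediction (the section does not define it) and using a hypothetical 0.5 deferral threshold; `route` and its labels are illustrative names, not part of the original setup.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def confidence(logits):
    """Per-sample confidence, assumed here to be the maximum softmax probability."""
    return softmax(logits).max(axis=1)

def confidence_gap(id_logits, ood_logits):
    """Δ: mean ID confidence minus mean near-OoD confidence (probability units)."""
    return float(confidence(id_logits).mean() - confidence(ood_logits).mean())

def route(logits, threshold=0.5):
    """Hypothetical deferral rule: 'auto' if confident enough, else 'human'."""
    return np.where(confidence(logits) >= threshold, "auto", "human")

# The table's Δ column from its average confidences (%), in percentage points.
TABLE = {
    "MVIP (baseline)": (69, 62),
    "+ ID Data (DA-Fusion 20%)": (76, 65),
    "+ OoD Data (DA-Fusion 50%)": (40, 35),
}
for name, (id_conf, ood_conf) in TABLE.items():
    print(f"{name}: Δ = {id_conf - ood_conf} pp")

logits = np.array([[4.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(route(logits))  # ['auto' 'human']
```

A wider Δ makes any single threshold easier to place: more ID predictions land above it and more near-OoD predictions below it.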
NVDIFFRECMC reconstructs textured 3D models directly from image arrays. When it works, the results are photorealistic and simulation-ready. When it fails, it fails silently — producing geometry that looks plausible but is metrically wrong.
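Because these failures are silent, one inexpensive guard is to compare a reconstructed mesh against a known physical dimension before its renders enter a training set. A minimal sketch, assuming the part's reference size is known (for example from a measurement) and that the mesh vertices are already in the same units and in a canonical orientation; the function names, the bounding-box criterion, and the 5% tolerance are all hypothetical choices, not part of NVDIFFRECMC.

```python
import numpy as np

def extent_errors(vertices, reference_extents):
    """Relative error of axis-aligned bounding-box extents versus reference extents.

    vertices: (N, 3) vertex positions; reference_extents: (3,) expected size, same units.
    Both are sorted, so the check tolerates axis permutations but not arbitrary rotations.
    """
    extents = np.sort(vertices.max(axis=0) - vertices.min(axis=0))
    ref = np.sort(np.asarray(reference_extents, dtype=float))
    return np.abs(extents - ref) / ref

def is_metrically_plausible(vertices, reference_extents, tol=0.05):
    """True if every extent is within `tol` relative error of the reference."""
    return bool(np.all(extent_errors(vertices, reference_extents) <= tol))

box = np.array([[x, y, z] for x in (0.0, 0.2) for y in (0.0, 0.1) for z in (0.0, 0.05)])
reference = (0.2, 0.1, 0.05)  # hypothetical part size in metres
print(is_metrically_plausible(box, reference))        # True
print(is_metrically_plausible(box * 1.3, reference))  # False: same shape, 30% too large
```

A scaled copy passes any visual inspection, which is exactly the failure the paragraph describes; only a numeric comparison catches it.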
RUSTY IS NOT A DOG. TEACH IT.