The moat is the garment, not the model

Frontier image models are becoming a commodity. The durable asset is the garment-identity graph: every image we process, linked to the physical garment it depicts.

Frontier labs release new image models every few months, each one more capable and less expensive than the last. For a company that generates product imagery, that trend is welcome, because it lowers our costs with every release. It also makes clear where our advantage cannot come from. A model we reach through an API is a model our competitors can reach on the same terms, so the model itself will never be the thing that sets us apart.

Our advantage comes from the images that move through Wearly Studio. These are real garments from real catalogs, processed through a workflow that records what each garment is and which physical SKU it belongs to. That record is proprietary, it grows every day, and no general-purpose lab can reconstruct it from public data.

Today we are building a data layer over those images. Every image is classified by role: front view, back view, detail shot, on-model image, or flat lay. Brand labels are detected and read. Garments are separated from the model and the scene. Each image is then grouped under the one physical SKU it depicts. Lightweight classifiers handle the high-volume decisions and careful segmentation handles the precise ones. This is product work before it is research, because the annotation layer already improves generation quality and control in the studio.

The durable asset is narrower than the data layer as a whole. The masks are not the asset, because segmentation can be re-derived and any team can run a segmenter. The asset is the garment-identity graph: every image linked to the physical SKU it depicts and labeled with its role, with a human-verified core and attributes ready for captioning. That linkage exists only because the workflow produced it as a byproduct of real work. It cannot be scraped from the web, and it cannot be added back after the fact.
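
To make the shape of the graph concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a description of the production schema: the names Role, ImageNode, and GarmentGraph, the field names, and the sample SKUs are all hypothetical. The only things taken from the text above are the five image roles, the link from each image to one physical SKU, and the human-verified flag.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # The five image roles named in the text.
    FRONT = "front"
    BACK = "back"
    DETAIL = "detail"
    ON_MODEL = "on_model"
    FLAT_LAY = "flat_lay"


@dataclass(frozen=True)
class ImageNode:
    # Hypothetical record: one image, the SKU it depicts, and its role.
    image_id: str
    sku: str
    role: Role
    human_verified: bool = False


class GarmentGraph:
    """Hypothetical index from physical SKU to the images that depict it."""

    def __init__(self):
        self._by_sku = defaultdict(list)

    def add(self, node):
        self._by_sku[node.sku].append(node)

    def views(self, sku, verified_only=False):
        # All images of one garment, optionally only the human-verified core.
        nodes = self._by_sku.get(sku, [])
        return [n for n in nodes if n.human_verified or not verified_only]

    def multiview_skus(self, min_roles=2):
        # SKUs seen in at least `min_roles` distinct roles.
        return sorted(
            sku
            for sku, nodes in self._by_sku.items()
            if len({n.role for n in nodes}) >= min_roles
        )


# Made-up sample data, for illustration only.
graph = GarmentGraph()
graph.add(ImageNode("img-1", "SKU-001", Role.FRONT, human_verified=True))
graph.add(ImageNode("img-2", "SKU-001", Role.BACK))
graph.add(ImageNode("img-3", "SKU-001", Role.ON_MODEL))
graph.add(ImageNode("img-4", "SKU-002", Role.FLAT_LAY))

print(graph.multiview_skus())
print([n.image_id for n in graph.views("SKU-001", verified_only=True)])
```

The point of the sketch is that the valuable part is the sku field on every node. Roles and masks can be recomputed by any team with a classifier; the mapping from image to physical garment cannot.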

This points toward a specific kind of model, and it is not a better general image model. It is a narrow, domain-specialized one that is strong where general models are weak and where our data is decisive. The two cases that matter are garment identity, meaning the same physical garment held faithfully, and multiview consistency, meaning the same SKU kept coherent across poses, scenes, and try-on. The work goes into domain-specific structure rather than into matching frontier scale.

We intend to measure before we train. We are building a benchmark from the graph that scores every frontier release on garment identity. If general models close the gap on their own, we will keep buying inference and put the savings elsewhere. If the gap holds, we will train, and we expect it to hold because the data the task requires is the data we are uniquely positioned to own.
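
The buy-or-train rule above can be sketched as a small decision function. This is an assumption-laden illustration: the text does not specify how garment identity is scored, so the sketch assumes each release yields a list of per-garment scores in the range 0 to 1, and the names identity_gaps and decide, the target, the tolerance, and every number below are hypothetical placeholders rather than measured results.

```python
from statistics import mean


def identity_gaps(scores_by_release, target):
    # Gap per release: how far its mean identity score falls short of the target.
    return {
        release: target - mean(scores)
        for release, scores in scores_by_release.items()
    }


def decide(gaps, tolerance):
    # If the best general model is within tolerance of the target, keep buying.
    best_gap = min(gaps.values())
    if best_gap <= tolerance:
        return "keep buying inference"
    return "train a specialized model"


# Placeholder scores, not real benchmark results.
scores_by_release = {
    "general-model-a": [0.70, 0.80],
    "general-model-b": [0.85, 0.95],
}

gaps = identity_gaps(scores_by_release, target=0.95)
print(decide(gaps, tolerance=0.02))
```

Keying the decision to the best release rather than the average reflects the logic in the text: it takes only one general model closing the gap to make training unnecessary.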

Product and research draw on the same loop. The workflow processes garments, the garments grow the graph, the graph trains models, and better models improve the workflow. The data is the advantage at both ends of that loop, which is why we treat it as the core of the research program rather than a side effect of the product.