elidukes.xyz


usdq

A deliberately low-level, JSON-native CLI for authoring OpenUSD from the terminal.

pip install usdq (on PyPI)

01 Sim2real

Cheap, scriptable 3D scenes are the substrate for training and testing policies in simulation before they touch hardware.

unlock: robot policies
cost: ↓ data

02 Built environment

Construction and digital twins live in USD. Terminal authoring lets agents assemble and edit them at a fraction of the time cost.

input: BIM / CAD
output: USD twin

03 Open judging

A thin tool over USD turns spatial reasoning into a library useful for on-policy RL.

probe: spatial
intent: model, not tool

What it is

usdq is a JSON-native CLI for authoring OpenUSD from the terminal. It gives an agent a typed verb surface over USD (author a prim, set an attribute, build a material, bind it, render) with structured JSON in and structured JSON out, exit codes, and no DCC app in the loop. The core bet is that agents should not author complex scene files through loose prose; they should call typed operations, read structured results, and iterate against a real asset graph.
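To make the "structured JSON in, structured JSON out, exit codes" loop concrete, here is a minimal agent-side sketch. It does not use usdq's real verbs or flags: FAKE_TOOL is a hypothetical stand-in program, and call_tool and the request fields ("path", "verb") are names invented for illustration. The only assumption carried over from the text is the shape of the contract, which is one JSON request in, one JSON result out, and a non-zero exit code on failure.

```python
import json
import subprocess
import sys

# Stand-in for a JSON-native CLI (NOT usdq itself): reads one JSON request on
# stdin, writes one JSON result on stdout, and exits non-zero on failure.
FAKE_TOOL = r"""
import json, sys
req = json.load(sys.stdin)
if "path" not in req:
    json.dump({"ok": False, "error": "missing 'path'"}, sys.stdout)
    sys.exit(2)
json.dump({"ok": True, "authored": req["path"]}, sys.stdout)
"""


def call_tool(argv, request):
    """Send a JSON request to a CLI and return (exit_code, parsed_result)."""
    proc = subprocess.run(
        argv,
        input=json.dumps(request),
        capture_output=True,
        text=True,
    )
    # Parse stdout only if the tool wrote something; otherwise report None.
    result = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, result


argv = [sys.executable, "-c", FAKE_TOOL]
print(call_tool(argv, {"path": "/World/Table"}))  # success: exit code 0
print(call_tool(argv, {"verb": "oops"}))          # failure: exit code 2
```

The point of the pattern is that the agent branches on the exit code and on typed fields in the result, instead of scraping free-form text.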

Why it's needed

Coding agents are improving fast, and they are already fluent in exactly this shape of work: call a tool, read its output, correct, repeat. Once 3D asset and scene authoring moves into the terminal, the time cost of producing OpenUSD collapses. That matters because USD is the substrate underneath the use cases that count: sim2real (training and validating policies in simulation), construction and the built environment, and digital twins.

Why OpenUSD

OpenUSD is the emerging standard for 3D interoperability and scene composition, used by NVIDIA, Meta, Apple, Pixar, Autodesk, and many others. It has become the common substrate that robotics and sim2real, production animation and VFX, and asset authoring for game engines are all converging on, with more domains following. Its layering and composition model is what lets independently authored assets, references, and edits stack into one coherent scene. It also makes git-style version control practical and preserves authored intent, both of which are valuable for training agents. Working in the terminal lets usdq build on other infrastructure such as the Harbor Framework, and lets coding agents gain aptitude in scene authoring over time.

Deliberately low level

usdq is intentionally dumb about design. It reduces the mental cost of authoring OpenUSD (the bookkeeping of prims, layers, references, materials, UVs, and composition) and stops there. It does not ship a furniture catalog, a layout solver, or a "make a tasteful room" macro, and it never will. That's the entire point: by keeping the tool low-level, what gets exercised is the model's own spatial and design reasoning, not the tool's. usdq exists to let us judge models on that capability, not to wrap it away behind helpers that would launder the result. The trade-off is that the assets agents currently ship are often imprecise, though the problem shrinks quickly once an asset library is available.

How it works

The surface is 58 verbs in three tiers: plumbing primitives that map onto USD, sugar verbs that bundle canonical patterns (make-pbr-material, make-mesh-box, bind-material, set-transform), and introspection verbs (explain, ls, bbox, verbs) so an agent can discover the surface and read the stage back. usdq-render drives three back ends: usdrecord (Storm), Blender Cycles, and a Three.js web preview.

The speed comes from a long-running daemon over Unix sockets that keeps USD's Python bindings (pxr) resident instead of paying their ~250 ms import on every call. On a 15-operation agent workflow (the realistic case) that's about 4.5× faster than the naive approach of spawning a fresh pxr script per op: 0.90 s vs 4.08 s (≈16.7 vs 3.7 ops/sec). The gap widens as the number of operations N grows, because the import is paid once rather than on every call.
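The arithmetic behind those figures, and the reason the gap grows with N, can be checked with a short sketch. The first part uses only the numbers reported above. The second part is an idealized cost model, not a measurement: OP_S is a hypothetical per-operation cost chosen for illustration, and the model ignores process-spawn and socket overhead, so it will not reproduce the measured timings exactly.

```python
# Figures reported in the text for a 15-operation workflow.
N_OPS = 15
DAEMON_S = 0.90
NAIVE_S = 4.08

print(round(NAIVE_S / DAEMON_S, 2))  # 4.53, i.e. "about 4.5x"
print(round(N_OPS / DAEMON_S, 1))    # 16.7 ops/sec
print(round(N_OPS / NAIVE_S, 1))     # 3.7 ops/sec

IMPORT_S = 0.25  # from the text: ~250 ms pxr import
OP_S = 0.05      # ASSUMPTION: hypothetical per-op cost, not a measured value


def naive_seconds(n_ops, import_s=IMPORT_S, op_s=OP_S):
    """Fresh process per op: the import is paid on every call."""
    return n_ops * (import_s + op_s)


def daemon_seconds(n_ops, import_s=IMPORT_S, op_s=OP_S):
    """Resident daemon: the import is paid once, then only per-op work."""
    return import_s + n_ops * op_s


def speedup(n_ops, import_s=IMPORT_S, op_s=OP_S):
    """Ratio of naive to daemon time for the same number of operations."""
    return naive_seconds(n_ops, import_s, op_s) / daemon_seconds(n_ops, import_s, op_s)


# The ratio rises with N and approaches (IMPORT_S + OP_S) / OP_S.
for n in (1, 15, 100, 1000):
    print(n, round(speedup(n), 2))
```

Under this model the speedup is bounded by the ratio of total per-call cost to per-op work, so the cheaper each operation is relative to the import, the more the daemon pays off.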

Scenes agents authored through usdq

Every one of these was authored by a coding agent, entirely through the verb surface, and each is tagged below with the agent that produced it.

You'll note that one-shotting these often creates asset-composition issues. There's a lot of RL still to be done here on spatial data for agents.

Future work and benchmarking

A low-level authoring surface is the thing that makes spatial evals tractable. The directions I want to push toward:

Blueprint-Bench 2.0, but for OpenUSD. Instead of grading flat blueprint reading, judge whether a model can author a coherent scene in USD, and score the result for spatial awareness, scale, and adjacency.
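As one illustration of what a gradable spatial check could look like, here is a minimal sketch that flags interpenetrating objects from axis-aligned bounding boxes. This is not an existing usdq feature or a proposed benchmark metric: the function names, the box format, and the tolerance are all hypothetical, and the boxes are hand-written values standing in for whatever a bounding-box query would return.

```python
from itertools import combinations


def overlap_volume(a, b):
    """Intersection volume of two boxes given as ((minx, miny, minz), (maxx, maxy, maxz))."""
    vol = 1.0
    for axis in range(3):
        lo = max(a[0][axis], b[0][axis])
        hi = min(a[1][axis], b[1][axis])
        if hi <= lo:
            # Separated (or merely touching) on this axis: no shared volume.
            return 0.0
        vol *= hi - lo
    return vol


def interpenetrations(boxes, tol=1e-6):
    """Return name pairs whose boxes overlap by more than tol cubic units."""
    return [
        (n1, n2)
        for (n1, b1), (n2, b2) in combinations(sorted(boxes.items()), 2)
        if overlap_volume(b1, b2) > tol
    ]


# Hypothetical scene: the chair cuts into the table, the lamp stands clear.
scene = {
    "table": ((0.0, 0.0, 0.0), (2.0, 1.0, 1.0)),
    "chair": ((1.5, 0.0, 0.5), (2.5, 1.0, 1.5)),
    "lamp": ((3.0, 0.0, 0.0), (3.5, 2.0, 0.5)),
}
print(interpenetrations(scene))  # [('chair', 'table')]
```

A real scorer would need more than this (scale against real-world units, adjacency, support relations), but even a coarse check like this turns one class of composition error into a count.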

SimReady asset creation. Can a model produce assets that are actually usable downstream (correct units, pivots, UVs, materials, collision), not just renders that look fine from one angle? SimReady is an NVIDIA standard with a verifiable framework that is amenable to grading. If coding agents make SimReady assets and Isaac Sim scenes far faster and cheaper to create, the cost curve of sim2real may drop quickly enough that robotics foundation models get cheap, Internet-scale data collection.

Scene composition over libraries. Given an asset library, a BIM model, or a set of references, can a model compose a valid, well-organized scene? That's the task that connects usdq to construction and digital-twin work.

What is not done

Still early (v0.1.0). The surface needs sharper diagnostics, fuller variant-authoring workflows, a diff verb for opinion-level comparison, and, most of all, the benchmarks above, so "did the model get the space right" becomes a number instead of a judgment call.

There's clearly a ton of work left to enhance models' spatial awareness, layout, and design, but all of it looks hill-climbable, with vast implications for physical-world design, robotics, digital twins, sim2real, and more. Add in the advances in neural rendering, and there's plenty to build with OpenUSD as the substrate.
