"Session Has No Input Names": The Anatomy of a Truncated Model Download
ONNX Runtime doesn't throw when you hand it a half-downloaded model. It builds a broken session that fails cryptically three tools later. The failure chain and the two checks that make it fail at the source.
One of the most-reported, longest-lived errors in our background remover was BiRefNet: session has no input names. It looks like a model problem. It's actually a delivery problem — and tracing it taught us a general lesson about how silent corruption propagates.
The failure chain
Here's what actually happens:
- The model file fetch is interrupted — a flaky connection, a 503 on the route, a cache entry that was written partially. You now have a truncated buffer of bytes.
- You hand those bytes to
InferenceSession.create(). ONNX Runtime does not throw. It builds a session object from whatever it got. That session'sinputNamesis an empty array[]. - Nothing notices. The broken session gets cached as if it were fine.
- Later — sometimes in a different tool entirely — code tries to feed an input to
session.inputNames[0], readsundefined, and crashes with a message that has nothing to do with the real cause: a download that dropped.
The error surfaces far from where it was born. That distance is what made it feel unfixable for ~50 update rounds: people debugged the tool that crashed instead of the download that failed.
The fix: validate at the boundary
The cure is to refuse a bad buffer the moment it arrives, before it can become a poisoned cached session:
- Byte-length validation. Compare the received byte count against the response's
content-length. If they don't match, the download was truncated — throw immediately with "got X of Y bytes," not a crypticundefinedlater. (A missingcontent-lengthheader is its own tell, and it also breaks download progress bars — so we ensure it's present.) - Non-empty I/O assertion. Right after
InferenceSession.create(), assertinputNames.length > 0andoutputNames.length > 0. If either is empty, the model is corrupt or incomplete — release it, evict the bad cache entry, and throw a clear typed error telling the user to reload to re-download.
That last step matters: without evicting the cache, a corrupt session gets reused forever, and "reload the page" doesn't help because the broken bytes are still on disk. Evict-then-fail is what makes a retry actually re-fetch.
We verified the delivery, too
Before assuming the code was the whole story, we audited the live model URLs from the browser with our real origin. Every model file returned HTTP 200 with a correct content-length, proper CORS, and range support. So the network layer was healthy — meaning the surviving occurrences were exactly the transient-truncation and cache-poisoning cases the two checks above are built to catch.
The general lesson
Silent corruption is worse than a crash because it travels. A library that "helpfully" doesn't throw on bad input hands you a broken object that detonates later, somewhere unrelated. The defense is to validate at the boundary where the data enters — byte count and structural assertions — so the failure happens at the source, names its real cause, and cleans up after itself.