API-First Is Backwards. Go Data- and Index-First

“API-first” has quietly become the default rallying cry of modern e-commerce. It shows up in pitch decks, platform marketing, and architecture diagrams as if it were self-evidently correct. I’ve spent the better part of two decades building commerce platforms, and I’ve come to think it’s backwards — especially for the one asset that actually determines whether a store can scale: product data.

Let me explain why, and what I think the right starting point is.

An API is a view, not a foundation

The core problem with “API-first” is that it inverts the dependency chain. An API is a view over data — a contract that exposes some shape of something that already exists underneath. When you make that contract your foundation, you’re building a house starting with the windows.

Everything you can ever do is now bounded by someone else’s schema. Their field types. Their size limits. Their rate limits. The data they decided is worth modeling, and — just as important — the data they decided you’re not allowed to put in. At small scale you don’t feel the walls. At real scale, those walls are exactly the thing strangling you: the attribute you can’t add, the nested structure that won’t serialize, the ranking signal you can’t store, the enrichment that has nowhere to live.

You didn’t choose those constraints. You inherited them, permanently, the day you made the API the first thing.

What API-first actually costs you for product data

Product data is the worst possible thing to gate behind someone else’s API, because product data is messy, deep, and specific to your business. It wants to be nested. It wants custom attributes nobody anticipated. It wants fitment tables, compatibility graphs, enrichment layers, computed fields, and ranking signals that exist nowhere in a generic platform’s data model.

API-first platforms answer that ambition with proprietary gates: type restrictions, payload size caps, “metafields” that bolt on awkwardly, and feeds that quietly reject the records that don’t fit their mold. You spend your engineering time negotiating with their model instead of building yours.

To be fair — and I want to be fair — API-first isn’t always wrong. If a platform is doing genuinely hard work for you (fraud, tax, payments, compliance), consuming their API is exactly right. The mistake is treating “API-first” as an architectural philosophy rather than what it really is: a convenient integration tactic for the parts you’ve chosen not to own.

Index-first: own the model, generate the API

When we designed the search-commerce-next approach, we started from the opposite end. We built it around the data, on an Elasticsearch-based document model, with the explicit goal of near-limitless flexibility in how that data gets extended and used.

That single decision changes everything downstream:

You can store complex data — nested objects, arbitrary attributes, computed fields, enrichment layers — without asking permission from a vendor’s schema.
You can prioritize and rank products the way your business actually works, because boosting and relevance live in your index, not in someone else’s black box.
You can extend the model the day you need to, not the day a roadmap allows it.
You can generate an API on top of your own data almost instantly. This is the part that quietly upends the whole “API-first” premise.

An API over data you own is a disposable artifact. You can regenerate it, reshape it, version it, and throw it away. An API you depend on from someone else is a ceiling you can’t move.
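To make the "disposable artifact" idea concrete, here is a minimal sketch in Python, assuming the product lives as a plain nested document you control. Every name in it (PRODUCT, API_VIEWS, pick, render) is hypothetical and invented for illustration; a real setup would read from the index and sit behind a web framework of your choice.

```python
from typing import Any

# Hypothetical product document, shaped however the business needs.
PRODUCT: dict[str, Any] = {
    "sku": "TIRE-001",
    "name": "All-Season Touring Tire",
    "price": {"amount": 189.99, "currency": "USD"},
    "specs": {"load_index": 113, "speed_rating": "T"},
    "internal": {"margin_pct": 31.5},  # stored, but never listed in a view
}


def pick(doc: dict[str, Any], paths: list[str]) -> dict[str, Any]:
    """Project a document onto dotted field paths, preserving nesting."""
    out: dict[str, Any] = {}
    for path in paths:
        src, dst = doc, out
        keys = path.split(".")
        for key in keys[:-1]:
            src = src[key]
            dst = dst.setdefault(key, {})
        dst[keys[-1]] = src[keys[-1]]
    return out


# Each API "version" is only a list of fields over the same stored document.
# Adding, reshaping, or deleting a version never touches the data itself.
API_VIEWS: dict[str, list[str]] = {
    "v1": ["sku", "name", "price.amount"],
    "v2": ["sku", "name", "price", "specs.load_index", "specs.speed_rating"],
}


def render(doc: dict[str, Any], version: str) -> dict[str, Any]:
    """Return the response body for one API version."""
    return pick(doc, API_VIEWS[version])
```

Calling render(PRODUCT, "v1") and render(PRODUCT, "v2") returns two different contracts over one document. The design choice worth noticing is the direction of the dependency: the views are derived from the model, so retiring one is a one-line change.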

AI-readiness is a property of your data, not a feature you buy

Here’s where index-first stops being an architectural preference and becomes a competitive one.

AI-readiness is not a feature you bolt onto a store. It’s a property of how your data is structured. If your product catalog lives behind a proprietary API with type and size limits, you can’t cleanly feed it into an embedding model, a semantic search index, or an LLM’s context window. You’re constantly fighting the format.

Make it concrete. Take an automotive catalog, where a single tire isn’t really a single record — it’s a record wrapped in structure. It carries fitment (the years, makes, and models it fits), engineering specs (load index, speed rating, section width, aspect ratio, revolutions per mile), and the relationships that connect all of it. In a generic platform model, most of that either doesn’t fit or gets flattened into string-soup metafields you then have to reparse. In a document model you own, it lives natively — nested, queryable, and intact.
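As a rough sketch of that contrast, here is what such a tire might look like as a nested document, next to the same fitment flattened into a metafield-style string. All values and field names (section_width_mm, revs_per_mile, year_from, and so on) are invented for illustration, and the mapping is one plausible Elasticsearch layout, not a description of any particular production index.

```python
# Illustrative values only; not real fitment or spec data.
TIRE_DOC = {
    "sku": "TIRE-001",
    "name": "All-Season Touring Tire",
    "specs": {
        "load_index": 113,
        "speed_rating": "T",
        "section_width_mm": 265,
        "aspect_ratio": 60,
        "revs_per_mile": 660,
    },
    "fitment": [
        {"make": "Jeep", "model": "Grand Cherokee", "year_from": 2016, "year_to": 2021},
        {"make": "Dodge", "model": "Durango", "year_from": 2014, "year_to": 2020},
    ],
}

# One plausible mapping. The "nested" type keeps each fitment entry's make,
# model, and years matched together instead of cross-matching between entries.
TIRE_MAPPING = {
    "mappings": {
        "properties": {
            "sku": {"type": "keyword"},
            "name": {"type": "text"},
            "specs": {
                "properties": {
                    "load_index": {"type": "integer"},
                    "speed_rating": {"type": "keyword"},
                    "section_width_mm": {"type": "integer"},
                    "aspect_ratio": {"type": "integer"},
                    "revs_per_mile": {"type": "integer"},
                }
            },
            "fitment": {
                "type": "nested",
                "properties": {
                    "make": {"type": "keyword"},
                    "model": {"type": "keyword"},
                    "year_from": {"type": "integer"},
                    "year_to": {"type": "integer"},
                },
            },
        }
    }
}

# The same fitment squeezed into a single string field (hypothetical format).
FLATTENED_FITMENT = "Jeep|Grand Cherokee|2016-2021;Dodge|Durango|2014-2020"


def parse_flattened_fitment(raw: str) -> list[dict]:
    """Reparse the string back into structured entries before it is usable."""
    entries = []
    for chunk in raw.split(";"):
        make, model, years = chunk.split("|")
        year_from, year_to = years.split("-")
        entries.append(
            {"make": make, "model": model,
             "year_from": int(year_from), "year_to": int(year_to)}
        )
    return entries
```

The parser is the cost of flattening: every consumer of the string has to carry a copy of it, and a model name containing a pipe or semicolon breaks it. The nested version needs no parser at all.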

Now watch what that same structure does for AI. A shopper asks, “what fits my 2020 Jeep Grand Cherokee, and which set rides quietest?” — and the data needed to answer it is already there, already shaped, already retrievable. The fitment graph scopes the candidates, the spec fields rank them, and the enrichment text gives a chat agent something true to say. That’s a retrieval pipeline with no data-wrangling step, because the wrangling was the foundation.
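Those three steps (scope by fitment, rank by a spec, hand enrichment text to the agent) can be sketched in a few lines of plain Python. This is an in-memory stand-in, not an index query; the catalog entries, the noise_db field, and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fitment:
    make: str
    model: str
    year_from: int
    year_to: int


@dataclass(frozen=True)
class Tire:
    sku: str
    name: str
    noise_db: float   # hypothetical ride-noise spec; lower is quieter
    enrichment: str   # hypothetical descriptive text for the chat agent
    fitment: tuple[Fitment, ...]


JEEP_GC = Fitment("Jeep", "Grand Cherokee", 2016, 2021)

# Invented catalog entries, for illustration only.
CATALOG = [
    Tire("TIRE-001", "Touring A", 70.0, "Tuned for highway comfort.", (JEEP_GC,)),
    Tire("TIRE-002", "All-Terrain B", 74.5, "Aggressive tread for trails.", (JEEP_GC,)),
    Tire("TIRE-003", "Touring C", 68.0, "Quiet touring tire.",
         (Fitment("Dodge", "Durango", 2014, 2020),)),
    Tire("TIRE-004", "Highway D", 69.0, "Low road noise on pavement.", (JEEP_GC,)),
]


def fits(tire: Tire, year: int, make: str, model: str) -> bool:
    """True if any single fitment entry matches make, model, and year."""
    return any(
        f.make == make and f.model == model and f.year_from <= year <= f.year_to
        for f in tire.fitment
    )


def quietest_fitting(catalog: list[Tire], year: int, make: str, model: str,
                     limit: int = 3) -> list[Tire]:
    """Scope candidates by fitment, then rank them by the noise spec."""
    candidates = [t for t in catalog if fits(t, year, make, model)]
    return sorted(candidates, key=lambda t: t.noise_db)[:limit]


def build_context(tires: list[Tire]) -> str:
    """Assemble grounded text a chat agent could quote from."""
    return "\n".join(
        f"{t.name} ({t.sku}): {t.noise_db} dB. {t.enrichment}" for t in tires
    )
```

For the question in the text, quietest_fitting(CATALOG, 2020, "Jeep", "Grand Cherokee") scopes out the Durango-only tire and orders the rest by noise, and build_context turns the result into text for the agent. Against an Elasticsearch index, the scoping step would typically be a nested query on the fitment path and the ranking step a sort or scoring function on the spec field.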

And it’s the same data model the whole way down. One intelligent index powers search, merchandising, recommendations, RAG for chat agents, and whatever interface comes next. You don’t build an “AI layer” — you point new experiences at the data you already own. There’s a lot more to say here about RAG patterns, agent design, and what one shared index really unlocks once you stop scattering your data across a dozen systems — that’s its own post, coming soon. The point for now is simpler: you don’t make an index-first store AI-ready as a project. It simply is, because the foundation was the data model the whole time.

The headless realization

This brings me to an observation I keep coming back to. A “headless storefront” only needs someone else’s API — Shopify, BigCommerce, take your pick — if that platform is actually doing something for you. I’m naming names to be concrete, but this isn’t really about Shopify or anyone else specifically; it holds for any commerce tool you’ve placed between yourself and your own data.

So ask the question honestly: what is the platform doing for you that justifies making its API your foundation?

In the agentic-coding era, building your own API layer is no longer the heavy lift it once was. Spinning up endpoints over a data model you control is fast, cheap, and increasingly trivial. Which means most of the dependency is habit, not necessity. The reasons to lean on a platform’s API have collapsed to the parts that are genuinely hard and regulated.

The one that survives scrutiny is checkout. Payments, PCI scope, fraud, and the order lifecycle are hard for good reasons, and reinventing them carelessly is a great way to lose money. That’s the one place where “let someone else’s API do the work” still makes sense.

Unless, of course, you’ve already solved that part too.

The principle

Strip away the slogan and it comes down to a single distinction:

API-first asks: how do I integrate? Index-first asks: what do I own?

An API is a contract. Data is an asset. You don’t build a company at scale on someone else’s contract — you build it on the model you own, and you generate whatever contracts you need on top of it.

Start with the index. The API will take care of itself.