Blog · Apr 16, 2026 · parse · performance · architecture

Parse at <80ms P50 on real supplier HTML

How Turaxia's Parse primitive stays under 80 ms P50 on live supplier pages: what it does, how it's typed, and why it refuses to guess.

Parse is the workhorse of the Turaxia toolkit. Hand it a supplier URL (or the raw HTML) and it returns a clean, typed product record with a confidence score on every field. We hold it to sub-80 ms P50 on real supplier pages and sub-10 ms on the fixture baseline we use for regression testing.

Here's how.

Scope, not magic

Parse is not a general-purpose AI extractor. It is three specific things:

  • A DOM extractor that understands per-retailer structure in addition to generic product markup.
  • A variant decomposer that handles size × color × material grids.
  • A type-first contract — every returned field is either a typed value with a confidence score, or an explicit null.
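A size × color × material grid can be sketched as a Cartesian product over its axes. This is a minimal illustration that assumes the axis values have already been pulled out of the page. `decompose_variants`, `Variant`, and the `unavailable` set are hypothetical names, not Parse's API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variant:
    """One sellable combination from the grid (hypothetical record shape)."""
    size: str
    color: str
    material: str
    available: bool


def decompose_variants(
    sizes: list[str],
    colors: list[str],
    materials: list[str],
    unavailable: set[tuple[str, str, str]],
) -> list[Variant]:
    """Expand size x color x material axes into one record per combination.

    `unavailable` holds combinations the page marks as sold out or not offered;
    a real page encodes this in retailer-specific markup, which is out of scope here.
    """
    return [
        Variant(size, color, material, available=(size, color, material) not in unavailable)
        for size, color, material in product(sizes, colors, materials)
    ]


# Two sizes x two colors x one material -> four explicit records.
variants = decompose_variants(
    sizes=["S", "M"],
    colors=["Navy", "Sand"],
    materials=["Cotton"],
    unavailable={("S", "Sand", "Cotton")},
)
for v in variants:
    print(v)
```

Emitting every combination, including the unavailable one, keeps downstream consumers from inferring availability the page never stated.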

We made one strong call on day one: Parse never hallucinates. If a field cannot be derived with confidence, Parse returns null and records a confidence signal instead of guessing. Localize, Price, and Route all consume these typed records, so a hallucinating Parse would poison everything downstream.
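The null-or-confident contract can be expressed as a small typed wrapper. The sketch below is an assumption about its shape, not Turaxia's actual types. `ParsedField`, `accept`, and the 0.8 `MIN_CONFIDENCE` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

# Hypothetical acceptance threshold; the post does not publish Parse's cutoff.
MIN_CONFIDENCE = 0.8


@dataclass(frozen=True)
class ParsedField(Generic[T]):
    value: Optional[T]  # None is the explicit null, never a placeholder guess
    confidence: float   # 0.0-1.0 signal, kept even when value is None
    source: str         # where the candidate came from, e.g. "json-ld"


def accept(candidate: Optional[T], confidence: float, source: str) -> ParsedField[T]:
    """Keep the candidate only if it clears the threshold; otherwise return null.

    The confidence is recorded either way, so downstream primitives can see
    why a field is missing instead of receiving a guess.
    """
    if candidate is None or confidence < MIN_CONFIDENCE:
        return ParsedField(value=None, confidence=confidence, source=source)
    return ParsedField(value=candidate, confidence=confidence, source=source)


price = accept(12800, 0.97, "json-ld")         # clears the bar: value kept
color = accept("Navy", 0.41, "dom-heuristic")  # below the bar: explicit null
```

A price read from JSON-LD at 0.97 passes through unchanged; a color guessed from loose DOM text at 0.41 comes back as null that still carries its 0.41 signal.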

What it actually runs

The generally available Parse path is a tuned DOM + Open Graph + JSON-LD extractor with encoding detection (Shift_JIS, EUC-JP, UTF-8), per-retailer rules, variant-matrix decomposition, and confidence traces on five canonical fields. For pages that genuinely cannot be parsed from text alone, an opt-in screenshot-based lane is available.
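One plausible way to layer the structured-markup sources is to try JSON-LD first, fall back to Open Graph, and return null when neither is present. This sketch uses only Python's standard-library `html.parser`. The function names, the fixed confidence values, and the title-only scope are illustrative assumptions, and it leaves out the DOM and per-retailer layers entirely.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class MarkupCollector(HTMLParser):
    """Collects JSON-LD script bodies and og:* meta tags from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld: list[str] = []
        self.og: dict[str, str] = {}
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") == "application/ld+json":
            self._in_json_ld = True
            self.json_ld.append("")
        elif tag == "meta":
            prop = attr.get("property") or ""
            if prop.startswith("og:"):
                self.og[prop] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.json_ld[-1] += data  # script text may arrive in several chunks


def extract_title(html: str) -> tuple[Optional[str], float, str]:
    """Try a JSON-LD Product name, then og:title; return null if neither exists.

    Confidence values are illustrative placeholders, not Parse's real scores.
    Only top-level JSON-LD objects are checked (no @graph or list handling).
    """
    collector = MarkupCollector()
    collector.feed(html)
    collector.close()
    for block in collector.json_ld:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped, not repaired by guessing
        if isinstance(data, dict) and data.get("@type") == "Product" and data.get("name"):
            return data["name"], 0.95, "json-ld"
    if collector.og.get("og:title"):
        return collector.og["og:title"], 0.70, "open-graph"
    return None, 0.0, "none"
```

Ordering the sources by how explicitly they describe a product means the weaker signal is only consulted when the stronger one is absent, and the source label travels with the value.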

We benchmark Parse two ways:

  • Fixture baseline: deterministic replay of captured supplier HTML. A stable regression path with no paid providers in the loop. Published as a card in the proof bundle.
  • Live pull: the page is fetched from its real source URL instead of replayed. Also published as a card in the proof bundle.

Both cards record the same shape: latency, output summary, and notes about what was captured.
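Both modes can share one harness that differs only in where the HTML comes from. In this minimal sketch, `run_benchmark`, `BenchmarkCard`, and the two callables are hypothetical. It also assumes the reported latency covers parsing only, not the network fetch; the post does not say which.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class BenchmarkCard:
    mode: str            # "fixture" or "live"
    p50_ms: float        # median parse latency over all runs
    output_summary: str  # compact view of the last parsed record
    notes: str           # free-form notes about what was captured


def run_benchmark(
    mode: str,
    load_html: Callable[[], bytes],
    parse: Callable[[bytes], dict],
    runs: int = 50,
    notes: str = "",
) -> BenchmarkCard:
    """Time `parse` over repeated loads and report the median (P50) latency.

    Only the parse call is inside the timer; loading the HTML (replaying a
    fixture or fetching a live URL) is excluded, which is an assumption.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples_ms: list[float] = []
    record: dict = {}
    for _ in range(runs):
        html = load_html()
        start = time.perf_counter()
        record = parse(html)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    summary = ", ".join(f"{key}={value!r}" for key, value in sorted(record.items()))
    return BenchmarkCard(mode, statistics.median(samples_ms), summary, notes)
```

For the fixture baseline, `load_html` would return the same captured bytes on every run; for a live pull it would fetch the source URL. A regression gate could then compare `card.p50_ms` against the 80 ms or 10 ms budget.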

Why "speed" alone is not a claim

"Parse is fast" is useless without the shape of the test. The proof bundle publishes:

  • The source URL.
  • The encoding and size of the HTML.
  • The primitives that executed.
  • The audit events emitted.
  • The confidence signals captured.
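A proof-bundle entry carrying the fields listed above might look like the record below. Encoding detection prefers a declared charset and otherwise trial-decodes in the order UTF-8, Shift_JIS, EUC-JP. That order, the `ProofEntry` shape, and `build_entry` are assumptions for illustration, not the published bundle format.

```python
import codecs
import json
import re
from dataclasses import asdict, dataclass, field

# Encodings the post says Parse detects; the trial order here is an assumption.
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "euc_jp")
META_CHARSET = re.compile(rb"""charset=["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def detect_encoding(raw: bytes) -> str:
    """Prefer a declared charset; otherwise return the first strict decode that succeeds."""
    match = META_CHARSET.search(raw[:2048])
    if match:
        try:
            return codecs.lookup(match.group(1).decode("ascii")).name
        except LookupError:
            pass  # unrecognized label: fall through to trial decoding
    for encoding in CANDIDATE_ENCODINGS:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "unknown"


@dataclass
class ProofEntry:
    source_url: str
    encoding: str
    html_bytes: int
    primitives: list[str] = field(default_factory=list)         # e.g. ["parse"]
    audit_events: list[str] = field(default_factory=list)       # event names, in order
    confidence: dict[str, float] = field(default_factory=dict)  # per-field signals


def build_entry(source_url, raw, primitives, audit_events, confidence) -> str:
    """Serialize one hypothetical proof-bundle entry as stable, sorted JSON."""
    entry = ProofEntry(
        source_url=source_url,
        encoding=detect_encoding(raw),
        html_bytes=len(raw),
        primitives=list(primitives),
        audit_events=list(audit_events),
        confidence=dict(confidence),
    )
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)
```

Strict trial decoding is only a heuristic: EUC-JP bytes can sometimes decode without error as Shift_JIS (for example, EUC-JP hiragana reads as half-width katakana), which is why a declared charset is checked first.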

If you want to replay the benchmark on your own supplier HTML, the quickstart has the exact command.

What is next

  • Expanding supported retailers as new design partners come online.
  • Adding more public benchmark cards as our cohort grows.
  • Graduating the screenshot-based lane into wider availability once it clears our internal quality bar.

Parse stays the primitive we are hardest on. It is the top of the funnel, so every regression there shows up five times downstream.


Written by Yerzhan Karatayev. Last updated Apr 16, 2026.
