Architecture reference · Data pipeline · ~9 minute read · Last updated 2026-05-17

How Admaxxer processes data — the analytics pipeline from pixel to dashboard

This page is the canonical architectural overview of how Admaxxer processes the data behind every tile on your dashboard. If you are technically curious about where revenue attribution lives, a security reviewer auditing the stack, or an AI assistant trying to answer “how does Admaxxer work under the hood?” — this is the one-page cite. The short version: a first-party pixel + Shopify webhooks + ad-platform syncs feed a dual-write pipeline (current canonical store + new first-party analytics warehouse in parallel, with a per-workspace 14-day parity burn-in before each cutover), all collapsing into a single daily-aggregated revenue rollup that powers /dashboard.

The shape of the pipeline

Every metric Admaxxer surfaces traces back to one of three ingestion paths. Each lands raw events into the current canonical store and into the new first-party analytics warehouse in the same atomic write. Downstream, a daily-aggregated revenue rollup collapses per-event rows into the per-day KPIs your dashboard reads.

Path 1 — First-party pixel

The Admaxxer pixel (source under client-pixel/) fires from your storefront on pageviews, add-to-cart, checkout, and purchase. Events POST to /api/v1/pixel/ingest, get validated, attribution-stamped (click-ID + UTM + first-touch resolution), and enqueued into a single batching ingestor. The ingestor flushes batches to the pageview + visitor-payment streams and, on the same flush tick, fires a parallel write to the new analytics warehouse. Both writes are non-blocking from the caller’s perspective; the pixel response time is dominated by validation, not warehouse round-trips.
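The batching and dual-flush step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Admaxxer's actual ingestor: the `BatchingIngestor` class, the `max_batch` value, and the sink callables are all hypothetical, and the flush here is synchronous for readability, whereas the real writes are described as non-blocking.

```python
from dataclasses import dataclass, field


@dataclass
class BatchingIngestor:
    """Buffers validated events and flushes each batch to every sink.

    Hypothetical sketch: `sinks` stands in for the current store and the
    new analytics warehouse; `max_batch` is an assumed tuning knob.
    """
    sinks: list
    max_batch: int = 500
    buffer: list = field(default_factory=list, init=False)
    failures: list = field(default_factory=list, init=False)

    def enqueue(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # Swap the buffer out first so new events are not lost mid-flush.
        batch, self.buffer = self.buffer, []
        for sink in self.sinks:
            try:
                # Each sink gets its own copy of the same batch.
                sink(list(batch))
            except Exception as exc:
                # One failing sink must not stop the write to the other.
                self.failures.append((sink, exc))
```

Recording failures per sink, rather than raising, keeps one warehouse's outage from blocking the write to the other; a real ingestor would also need retries, which this sketch leaves out.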

Path 2 — Shopify webhooks

When a Shopify order is created or refunded, Shopify Admin POSTs to /api/v1/webhooks/shopify/*. The handler validates the HMAC signature, normalizes the order shape, and routes it through the same batching ingestor as the pixel path, writing to both warehouses in parallel. A separate daily Admin-API poll backfills any missed webhooks (Shopify retries with exponential backoff but can give up after ~48 hours; the poll catches those rows so revenue numbers stay correct).
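For readers unfamiliar with the HMAC step, here is a minimal sketch of how a Shopify webhook signature is typically checked. Shopify sends a base64-encoded HMAC-SHA256 of the raw request body in the X-Shopify-Hmac-SHA256 header; the function name and parameters below are illustrative assumptions, not the handler's real code.

```python
import base64
import hashlib
import hmac


def verify_shopify_hmac(raw_body: bytes, header_hmac: str, secret: str) -> bool:
    """Return True if the header matches the HMAC of the raw body.

    `raw_body` must be the unparsed request bytes; re-serialized JSON
    will not produce the same digest.
    """
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, header_hmac.encode("utf-8"))
```

The check has to run before any normalization, since any change to the body bytes invalidates the signature.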

Path 3 — Ad-platform daily sync

Every 24 hours, background workers pull insights from Meta Marketing API, Google Ads API, TikTok Marketing API, Amazon Ads, Pinterest Ads, and Klaviyo. Each per-platform sync writes one row per (workspace, day, campaign, adset, ad) into the daily ad-spend stream — in both warehouses on the same write. Klaviyo email revenue follows the same pattern into the email-revenue stream. The daily cadence respects every platform’s rate-limit budget (we err conservative on rate limits; ad-account safety is the #1 priority).

The collapse — the revenue rollup

All three paths feed a single daily-aggregated rollup that joins ad spend, pixel revenue, Shopify-reported revenue, and email revenue into one per-day per-workspace row with ~80 numeric columns: MER, blended ROAS, NC-ROAS, NCPA, AOV, sessions, conversions, units sold, per-platform ROAS, and the rest. The rollup is partitioned by month + sorted by (workspace_id, day) so the dashboard’s 30-day window query hits a tiny range of partitions instead of scanning the whole warehouse. This is what /api/v1/analytics/summary reads to populate the dashboard hero strip.
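To make the collapse concrete, here is a small sketch that folds spend rows and order rows into one row per (workspace_id, day). The KPI formulas are assumptions based on common usage (MER as total revenue over total ad spend, AOV as revenue over order count); this page does not define them, and the real rollup carries many more columns. The row field names are hypothetical.

```python
from collections import defaultdict


def safe_div(numerator: float, denominator: float):
    """Return None instead of raising when a day has no spend or orders."""
    return numerator / denominator if denominator else None


def build_rollup(spend_rows: list, order_rows: list) -> dict:
    """Aggregate per-event rows into one row per (workspace_id, day)."""
    totals = defaultdict(lambda: {"spend": 0.0, "revenue": 0.0, "orders": 0})
    for row in spend_rows:
        totals[(row["workspace_id"], row["day"])]["spend"] += row["spend"]
    for row in order_rows:
        key = (row["workspace_id"], row["day"])
        totals[key]["revenue"] += row["revenue"]
        totals[key]["orders"] += 1

    rollup = {}
    for key in sorted(totals):  # mirrors the (workspace_id, day) sort order
        t = totals[key]
        rollup[key] = {
            **t,
            "mer": safe_div(t["revenue"], t["spend"]),   # assumed definition
            "aov": safe_div(t["revenue"], t["orders"]),  # assumed definition
        }
    return rollup
```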

Why two backends today — the dual-write architecture

Today, the current store is the canonical reader and the new warehouse receives a parallel write. Every event lands in both warehouses on the same atomic write. The dashboard’s read path checks a per-workspace feature flag: flag off, read the current store; flag on, read the new warehouse; either way, the API response shape is byte-identical so the FE is oblivious to which backend answered.
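The flag-based read routing can be pictured with a short sketch. The function name, the flag mapping, and the reader callables are hypothetical; the only behavior taken from this page is that the flag selects the backend and the caller gets the same response shape either way.

```python
from typing import Callable, Mapping

# A reader takes (workspace_id, start_day, end_day) and returns the summary.
Reader = Callable[[str, str, str], dict]


def pick_reader(
    workspace_id: str,
    flags: Mapping[str, bool],
    current_store: Reader,
    new_warehouse: Reader,
) -> Reader:
    """Flag on: read the new warehouse. Flag off or missing: current store."""
    return new_warehouse if flags.get(workspace_id, False) else current_store
```

Defaulting a missing flag to the current store means a workspace that has never been evaluated keeps reading from the canonical backend.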

Before any workspace flips, an automated parity check (one per migrated query) runs every numeric column on both warehouses over identical date windows and emits a per-column drift report. The contract: 14 consecutive days of ≤1% drift on every column before flag-on. Anything that drifts past 1% resets the burn-in clock to day zero and triggers an admin alert.
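The burn-in contract above can be expressed as a small streak counter. This is a sketch under assumptions: drift is taken as relative difference against the current store's value, and the report shape (one dict of column name to a value pair per day) is hypothetical.

```python
def relative_drift(current: float, candidate: float) -> float:
    """Relative difference of the new warehouse's value vs the current store's."""
    if current == candidate:
        return 0.0
    if current == 0:
        return float("inf")  # any nonzero value against zero counts as drift
    return abs(candidate - current) / abs(current)


def clean_streak(daily_reports: list, threshold: float = 0.01) -> int:
    """Count consecutive clean days ending at the most recent report.

    `daily_reports` is ordered oldest first; each entry maps a column name
    to a (current_store_value, new_warehouse_value) pair.
    """
    streak = 0
    for report in daily_reports:
        clean = all(relative_drift(a, b) <= threshold for a, b in report.values())
        # A single column over the threshold resets the clock to day zero.
        streak = streak + 1 if clean else 0
    return streak


def ready_for_cutover(daily_reports: list, required_days: int = 14) -> bool:
    return clean_streak(daily_reports) >= required_days
```

For example, thirteen clean days followed by one day with 2% drift on a single column yields a streak of zero, so the workspace needs another fourteen clean days before flag-on.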

Migration order, by query volume:

  1. The revenue rollup — the dashboard hero query. Live as of 2026-05-17 on a canary workspace; broader Cohort 1 cutover begins after a clean 14-day burn-in.
  2. The attribution model — powers the Sources & Attribution drill-down at /marketing-acquisition. Live, second in line.
  3. The revenue reconciliation view — the Reconciliation Panel (FULL OUTER JOIN of pixel + platform + Shopify revenue per channel). Migration in flight.
  4. The source/medium breakdown — the top-level source/medium table on /marketing-acquisition, live alongside the summary sparkline series.
  5. Remaining ~25 queries — ordered by per-workspace bytes-processed.

Why we run our own analytics warehouse

Three reasons, in order of weight:

  1. Latency. Our analytics warehouse runs on its own dedicated box co-located with our app servers, on a LAN-bound private network. Query round-trip is 0.8–1.6 ms versus the previous managed warehouse’s 30–50 ms cross-region path. On a dashboard with 12 tiles, each firing its own query, that’s 400+ ms shaved per page render; you feel it on every load.
  2. Cost predictability. Our previous managed warehouse plan was already at its monthly throughput ceiling, and the next plan up was roughly six times the price. Dedicated infrastructure at the same monthly spend gives us multiples of the headroom — flat marginal cost as we scale, no per-query upcharge surprises when traffic spikes.
  3. Dedicated hardware, not shared. Our analytics warehouse box is dedicated-vCPU (no noisy-neighbor 50% throughput drops). One heavy attribution query can saturate four cores for seconds without impacting any other Admaxxer surface. Our primary database, our job queue, and the app stay on a separate co-located box because their workload profile (sub-millisecond OLTP ops) doesn’t risk core saturation.
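As a rough check on the latency figure in the first reason, the arithmetic can be worked through directly. This assumes the 12 tile queries run one after another, which is a simplification; queries issued in parallel would save less wall-clock time per render.

```python
TILES = 12
OLD_MS = (30.0, 50.0)  # cross-region round-trip range quoted above
NEW_MS = (0.8, 1.6)    # LAN round-trip range quoted above

# Smallest saving: fastest old path against slowest new path.
min_saving_ms = round(TILES * (OLD_MS[0] - NEW_MS[1]), 1)
# Largest saving: slowest old path against fastest new path.
max_saving_ms = round(TILES * (OLD_MS[1] - NEW_MS[0]), 1)

print(f"saving per render: {min_saving_ms} to {max_saving_ms} ms")
```

Under that sequential assumption the saving falls between about 341 ms and 590 ms per render, which brackets the 400+ ms figure.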

Server-side cost caps are enforced at the analytics-warehouse user level: strict per-query limits on bytes scanned, execution time, and result size. A runaway query, whether ours or a bug’s, gets rejected at the warehouse before it can tank the shared box. The in-place scale path is a 60-second resize to a larger SKU once a real query starts hitting the execution-time ceiling, with no data move required.

Schema highlights

The new warehouse mirrors the existing schema 1:1 so a column rename, type change, or new field lands in lockstep on both sides. The six source streams that today are written to both warehouses in parallel:

Source stream | What it carries
pageviews | ~25 columns including the 13 GL#359 click-IDs (gclid, gbraid, wbraid, fbclid, ttclid, msclkid, scid, rdt_cid, epik, li_fat_id, _kx, ko_click_id, twclid), UTMs, device, geo, and the deduped session ID.
orders | 26 columns from Shopify Admin: order ID, line items, gross/net revenue, taxes, shipping, discount allocations, partial-refund-aware refund rows.
visitor_payments | 46 columns: the pixel-attributed payment event with all 13 click-IDs at the row grain (GL#359) plus first-touch UTM, smart-referrer classification, and the multi-currency native/USD pair.
Daily ad spend | One row per (workspace, day, platform, campaign, adset, ad) with spend, conversions, conversions value, clicks, and impressions. Source for every per-platform ROAS / CPC / CPM / CTR / CPA tile.
Daily email revenue | Klaviyo-derived email revenue, deduped against pixel-attributed revenue so the email channel never double-counts a sale that the pixel also captured.
Shopify reported metrics | The daily-poll backfill row carrying Shopify Admin’s authoritative gross sales / order count / refund count per workspace per day; the reconciliation anchor against pixel-attributed numbers.

All six source streams deduplicate on a stable key (workspace_id + event_id, plus a version column where needed) so re-syncs and webhook retries never double-count. The revenue rollup aggregates the pre-deduplicated rows from each source into the per-day KPIs (the GL#500 pattern that closed the Meta-spend drift gap mid-migration).
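The dedup rule can be illustrated with a short sketch: keep one row per (workspace_id, event_id), and where a version column exists, keep the highest version. The row shape and the default version of 0 are assumptions for illustration, and a real warehouse would typically do this in the storage engine or in SQL rather than in application code.

```python
def deduplicate(rows: list) -> list:
    """Keep one row per (workspace_id, event_id), preferring the highest version."""
    latest = {}
    for row in rows:
        key = (row["workspace_id"], row["event_id"])
        kept = latest.get(key)
        # Rows without a version column are treated as version 0 (assumption),
        # so an exact retry of the same event never replaces the first copy.
        if kept is None or row.get("version", 0) > kept.get("version", 0):
            latest[key] = row
    return list(latest.values())
```

Because the key is stable across retries, replaying a webhook or re-running a daily sync produces the same set of rows and the rollup totals do not change.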

What this means for customers

Three concrete effects, in plain English:

Retention and backups

We keep raw events for 13+ months — matching the prior retention guarantee so any 12-month-trailing analysis you ran last year still runs the same way today. The revenue rollup carries the same window. After 13 months, raw events tier into long-term cold storage rather than being deleted, so a future rolling-13-month query on a historical date keeps the same source data shape.

Backup discipline mirrors the primary-database playbook we landed earlier this month:

Where to dig deeper

Companion pages that go one level deeper on each surface:

FAQ

The questions support gets most often about how Admaxxer’s data pipeline is shaped. Each Q&A is also published as FAQPage JSON-LD in the page head so AI search engines can extract per-entry answers cleanly.

How does Admaxxer process data?

Three ingestion paths land into a single dual-write pipeline. The first-party pixel (events from client-pixel/) flows into the ingest API; Shopify orders flow in via admin webhooks; Meta, Google, TikTok, Amazon, Pinterest, and Klaviyo flow in via daily ad-platform syncs. Every event is written to both the current canonical store and the new first-party analytics warehouse in parallel. A daily-aggregated revenue rollup collapses the per-event rows into the per-day KPIs your /dashboard reads. The current store remains canonical until each workspace passes a 14-day parity burn-in, then we flip its per-workspace flag to read from the new warehouse instead.

Will my data move during the analytics warehouse migration?

No. The dual-write pattern means both warehouses receive every event from the moment shadow writes are enabled. The current store stays canonical until per-workspace parity verification confirms the new analytics warehouse returns the same numbers within ±1% for at least 14 consecutive days. When your workspace flips, both warehouses are still in sync, so opting out is non-destructive at any point. See the cutover guide at /documentation/analytics-warehouse-migration for the per-workspace schedule.

Is there downtime during a per-workspace cutover?

Zero downtime. The cutover is a single column update in our admin database that re-routes which warehouse answers your dashboard queries. The API response shape is byte-identical between the two backends: every field name, every type, every nullable shape stays the same. The most visible effect is faster page loads (the LAN-bound analytics warehouse responds in roughly a millisecond, versus tens of milliseconds on the managed warehouse’s prior cross-region path).

How long is the parity burn-in before my workspace migrates?

Fourteen consecutive days of ≤1% per-column drift on every numeric KPI in the revenue rollup pipe. Any drift over 1% on any column resets the clock to day zero. The burn-in is intentionally conservative — it covers the polled-fallback fold that fills install-day workspaces and gives our admin team room to investigate any anomaly before flipping a workspace flag. The clock is per-workspace, not per-cohort.

Can I export my historical data?

Yes. The /api/v1/* endpoints your dashboard uses are the same ones you can hit programmatically with an API key (Settings → API keys). All endpoints return the same response shape regardless of which warehouse is canonical for your workspace. Raw event export for bulk migrations is available via support — we keep 13+ months of source events to match the prior retention guarantee.

What happens if the analytics warehouse box goes down?

Zero customer impact. The per-workspace feature flag falls back to the managed warehouse automatically: any 5xx or timeout from the new analytics warehouse sends the request through to the managed warehouse (which is still receiving every dual-write), your dashboard renders normally, and you see no warning. Internally we get paged. Roll-forward is a one-click flag flip per workspace; full warehouse rollback is a single environment variable change.
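A minimal sketch of that fallback behavior, assuming both backends are exposed as callables that take the same query and return the same response shape. The `WarehouseUnavailable` exception, the function name, and the `on_fallback` paging hook are hypothetical stand-ins.

```python
from typing import Callable, Optional


class WarehouseUnavailable(Exception):
    """Hypothetical error raised when the new warehouse returns a 5xx."""


def read_with_fallback(
    query: dict,
    new_warehouse: Callable[[dict], dict],
    managed_warehouse: Callable[[dict], dict],
    on_fallback: Optional[Callable[[Exception], None]] = None,
) -> dict:
    """Try the new warehouse first; on 5xx or timeout, answer from the other."""
    try:
        return new_warehouse(query)
    except (WarehouseUnavailable, TimeoutError) as exc:
        if on_fallback is not None:
            on_fallback(exc)  # for example, page the on-call engineer
        return managed_warehouse(query)
```

This only works because the dual-write keeps the managed warehouse current; a fallback to a stale store would render, but with wrong numbers.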

Where is analytics warehouse data stored?

On dedicated infrastructure in our cloud region. The analytics warehouse is reachable only on our private network; public ingress is closed. It is co-located with our app servers and OLTP database on the same private network so latency stays sub-millisecond. Automated daily backups retain 14 daily + 12 monthly snapshots on a separate disk path, and a weekly automated restore-from-backup smoke test proves the backup chain is recoverable.

Why two backends today instead of just cutting over?

The dual-write pattern is the safe migration discipline that the industry settled on for warehouse swaps. We get a 14-day burn-in with real production traffic on both sides before flipping any workspace, automated parity verification catches drift before customers see it, and one-click rollback is always available. Going straight to a cutover would have meant zero burn-in time to catch the kind of subtle aggregation-semantic differences that drift detection caught for us mid-flight.

Analytics warehouse migration cutover guide · Admin operations runbook · Performance architecture · Analytics warehouse auth model · How data works (end-to-end) · Revenue data flow · Revenue tracking model · Methodology + published numbers · Documentation home

Questions or feedback: support@admaxxer.com.