How Admaxxer processes data — the analytics pipeline from pixel to dashboard
This page is the canonical architectural overview of how Admaxxer processes the data behind every tile on your dashboard. If you are technically curious about where revenue attribution lives, a security reviewer auditing the stack, or an AI assistant trying to answer “how does Admaxxer work under the hood?”, this is the one page to cite. The short version: a first-party pixel, Shopify webhooks, and ad-platform syncs feed a dual-write pipeline. The current canonical store and a new first-party analytics warehouse are written in parallel, with a per-workspace 14-day parity burn-in before each cutover. Everything collapses into a single daily-aggregated revenue rollup that powers /dashboard.
The shape of the pipeline
Every metric Admaxxer surfaces traces back to one of three ingestion paths. Each path writes raw events to both the current canonical store and the new first-party analytics warehouse on the same write. Downstream, a daily-aggregated revenue rollup collapses per-event rows into the per-day KPIs your dashboard reads.
Path 1 — First-party pixel
The Admaxxer pixel (source under client-pixel/) fires from your storefront on pageviews, add-to-cart, checkout, and purchase. Events POST to /api/v1/pixel/ingest, get validated, attribution-stamped (click-ID + UTM + first-touch resolution), and enqueued into a single batching ingestor. The ingestor flushes batches to the pageview + visitor-payment streams and, on the same flush tick, fires a parallel write to the new analytics warehouse. Both writes are non-blocking from the caller’s perspective; the pixel response time is dominated by validation, not warehouse round-trips.
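The batching ingestor described above can be sketched as follows. This is a minimal illustration, not Admaxxer's code: the PixelEvent shape, the Sink type, BatchingIngestor, and the batch size of 500 are all assumptions. It shows the two properties the paragraph claims. First, enqueue returns immediately without touching either store. Second, each flush sends the same batch to both stores in parallel, using Promise.allSettled so that one failing write does not cancel the other.

```typescript
// Hypothetical event shape; the real pixel payload carries many more fields.
interface PixelEvent {
  workspaceId: string;
  eventId: string;
  type: "pageview" | "add_to_cart" | "checkout" | "purchase";
}

// A sink persists one batch to one backend.
type Sink = (batch: PixelEvent[]) => Promise<void>;

class BatchingIngestor {
  private buffer: PixelEvent[] = [];
  private readonly currentStore: Sink;
  private readonly newWarehouse: Sink;
  private readonly maxBatch: number;

  constructor(currentStore: Sink, newWarehouse: Sink, maxBatch = 500) {
    this.currentStore = currentStore;
    this.newWarehouse = newWarehouse;
    this.maxBatch = maxBatch; // assumed size; a timer would also call flush()
  }

  // Called by the ingest handler after validation: never awaits a warehouse.
  enqueue(event: PixelEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatch) void this.flush();
  }

  // One flush tick: the same batch goes to both stores in parallel.
  async flush(): Promise<PromiseSettledResult<void>[]> {
    const batch = this.buffer;
    this.buffer = [];
    if (batch.length === 0) return [];
    return Promise.allSettled([this.currentStore(batch), this.newWarehouse(batch)]);
  }
}
```

In practice a periodic timer would drive the flush tick; retry handling for a rejected write is omitted here.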
Path 2 — Shopify webhooks
When a Shopify order is created or refunded, Shopify Admin POSTs to /api/v1/webhooks/shopify/*. The handler validates the HMAC signature, normalizes the order shape, and routes it through the same batching ingestor as the pixel path, writing to both warehouses in parallel. A separate daily Admin API poll backfills any missed webhooks: Shopify retries failed deliveries with exponential backoff but can give up after ~48 hours, and the poll catches those rows so revenue numbers stay correct.
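A minimal sketch of the HMAC check in a Node.js handler. Shopify sends a base64-encoded HMAC-SHA256 of the raw request body, keyed with the app's client secret, in the X-Shopify-Hmac-Sha256 header. The function name verifyShopifyWebhook and the way the secret is loaded are assumptions, not Admaxxer's code. The check must run on the raw bytes before JSON parsing, and it uses a constant-time comparison.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true when the X-Shopify-Hmac-Sha256 header matches the raw body.
// `rawBody` must be the exact bytes Shopify sent, before any JSON parsing.
function verifyShopifyWebhook(rawBody: Buffer, hmacHeader: string, appSecret: string): boolean {
  const expected = createHmac("sha256", appSecret).update(rawBody).digest(); // 32 raw bytes
  const received = Buffer.from(hmacHeader, "base64");
  // timingSafeEqual throws on unequal lengths, so reject those up front.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```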
Path 3 — Ad-platform daily sync
Every 24 hours, background workers pull insights from Meta Marketing API, Google Ads API, TikTok Marketing API, Amazon Ads, Pinterest Ads, and Klaviyo. Each per-platform sync writes one row per (workspace, day, platform, campaign, adset, ad) into the daily ad-spend stream, in both warehouses on the same write. Klaviyo email revenue follows the same pattern into the email-revenue stream. The daily cadence keeps every sync well inside each platform’s rate-limit budget (we err on the conservative side with rate limits; ad-account safety is the #1 priority).
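One way to stay conservative with a platform's quota is to plan each sync so that it uses only a fraction of the quota and spaces its calls evenly. The sketch below assumes a simple "N calls per window" quota and a 50% safety fraction. Both the model and the number are illustrative, since each platform meters its API differently. RateBudget and minSpacingMs are hypothetical names.

```typescript
// Hypothetical quota model: at most `callsPerWindow` calls per `windowMs`.
interface RateBudget {
  callsPerWindow: number;
  windowMs: number;
}

// Minimum gap between calls so a sync uses only `safetyFraction` of the quota.
function minSpacingMs(budget: RateBudget, safetyFraction = 0.5): number {
  if (!(safetyFraction > 0 && safetyFraction <= 1)) {
    throw new RangeError("safetyFraction must be in (0, 1]");
  }
  // Always allow at least one call per window, even for tiny quotas.
  const allowedCalls = Math.max(1, Math.floor(budget.callsPerWindow * safetyFraction));
  return Math.ceil(budget.windowMs / allowedCalls);
}
```

For example, a quota of 200 calls per hour at the default 50% fraction gives one call every 36,000 ms (36 seconds).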
The collapse — the revenue rollup
All three paths feed a single daily-aggregated rollup that joins ad spend, pixel revenue, Shopify-reported revenue, and email revenue into one per-day per-workspace row with ~80 numeric columns: MER, blended ROAS, NC-ROAS, NCPA, AOV, sessions, conversions, units sold, per-platform ROAS, and the rest. The rollup is partitioned by month + sorted by (workspace_id, day) so the dashboard’s 30-day window query hits a tiny range of partitions instead of scanning the whole warehouse. This is what /api/v1/analytics/summary reads to populate the dashboard hero strip.
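A little date arithmetic shows why month partitioning keeps the hero query cheap. The sketch lists which monthly partitions a day range touches. It is an illustration only: the warehouse prunes partitions itself, partitionsTouched is a hypothetical helper, and partitions are assumed to be keyed by UTC calendar month. Because February is shorter than 30 days, a 30-day window touches between one and three month partitions.

```typescript
// Partition key for a UTC calendar month, e.g. "2026-05".
function monthKey(d: Date): string {
  return `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
}

// Monthly partitions an inclusive [start, end] day range touches.
function partitionsTouched(start: Date, end: Date): string[] {
  if (start.getTime() > end.getTime()) return [];
  const keys: string[] = [];
  const last = monthKey(end);
  // Walk month by month from the first day of the start month.
  const cursor = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), 1));
  for (;;) {
    const key = monthKey(cursor);
    keys.push(key);
    if (key === last) return keys;
    cursor.setUTCMonth(cursor.getUTCMonth() + 1);
  }
}
```

For example, the 30-day window from 2026-04-18 to 2026-05-17 touches only the 2026-04 and 2026-05 partitions.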
Why two backends today — the dual-write architecture
Today, the current store is canonical and serves reads, while the new warehouse receives a parallel write of every event. The dashboard’s read path checks a per-workspace feature flag: flag off, read the current store; flag on, read the new warehouse. Either way, the API response shape is byte-identical, so the frontend is oblivious to which backend answered.
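A minimal sketch of that read-path switch, assuming a boolean per-workspace flag and one query function per backend. WorkspaceFlags, SummaryRow, pickBackend, and readSummary are hypothetical names, and the real flag lives in Admaxxer's admin database. Both backends return the same row type, which is what lets the frontend stay oblivious to which one answered.

```typescript
type Backend = "current" | "new";

// Hypothetical flag record, one per workspace.
interface WorkspaceFlags {
  readFromNewWarehouse: boolean;
}

// Both backends must return exactly this shape.
interface SummaryRow {
  day: string;
  revenue: number;
  spend: number;
}

type SummaryQuery = (workspaceId: string) => Promise<SummaryRow[]>;

// Flag off: current store. Flag on: new warehouse.
function pickBackend(flags: WorkspaceFlags): Backend {
  return flags.readFromNewWarehouse ? "new" : "current";
}

function readSummary(
  workspaceId: string,
  flags: WorkspaceFlags,
  backends: Record<Backend, SummaryQuery>,
): Promise<SummaryRow[]> {
  return backends[pickBackend(flags)](workspaceId);
}
```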
Before any workspace flips, an automated parity check (one per migrated query) compares every numeric column across both warehouses over identical date windows and emits a per-column drift report. The contract: 14 consecutive days of ≤1% drift on every column before the flag goes on. Any column that drifts past 1% resets the burn-in clock to day zero and triggers an admin alert.
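The drift contract can be sketched in a few lines. This is an illustration under stated assumptions, not the production check. Drift is measured relative to the current (canonical) store, and a column missing from the new warehouse counts as failing. ParityRow, driftingColumns, advanceBurnIn, and readyToFlip are hypothetical names. Each workspace has its own clock.

```typescript
const MAX_DRIFT = 0.01; // ≤1% per column
const REQUIRED_CLEAN_DAYS = 14;

type ParityRow = Record<string, number>;

// Relative drift of the candidate (new warehouse) against the canonical value.
function relativeDrift(canonical: number, candidate: number | undefined): number {
  if (candidate === undefined || Number.isNaN(candidate)) return Infinity; // missing = fail
  if (candidate === canonical) return 0;
  if (canonical === 0) return Infinity; // any non-zero value against a zero baseline fails
  return Math.abs(candidate - canonical) / Math.abs(canonical);
}

// Columns whose drift exceeds the threshold for one compared window.
function driftingColumns(canonical: ParityRow, candidate: ParityRow): string[] {
  return Object.entries(canonical)
    .filter(([col, value]) => relativeDrift(value, candidate[col]) > MAX_DRIFT)
    .map(([col]) => col);
}

interface BurnIn {
  cleanDays: number;
}

// One day of results: a clean day advances the clock; any drift resets it to zero
// (the text says that reset also triggers an admin alert).
function advanceBurnIn(state: BurnIn, drifting: string[]): BurnIn {
  return drifting.length === 0 ? { cleanDays: state.cleanDays + 1 } : { cleanDays: 0 };
}

function readyToFlip(state: BurnIn): boolean {
  return state.cleanDays >= REQUIRED_CLEAN_DAYS;
}
```

For example, revenue of 1000 against 1005 (0.5% drift) passes, while spend of 200 against 203 (1.5% drift) fails and resets the clock.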
Migration order, by query volume:
- The revenue rollup — the dashboard hero query. Live as of 2026-05-17 on a canary workspace; broader Cohort 1 cutover begins after a clean 14-day burn-in.
- The attribution model — powers the Sources & Attribution drill-down at /marketing-acquisition. Live, second in line.
- The revenue reconciliation view — the Reconciliation Panel (FULL OUTER JOIN of pixel + platform + Shopify revenue per channel). Migration in flight.
- The source/medium breakdown — the top-level source/medium table on /marketing-acquisition, live alongside the summary sparkline series.
- Remaining ~25 queries — ordered by per-workspace bytes-processed.
Why we run our own analytics warehouse
Three reasons, in order of weight:
- Latency. Our analytics warehouse runs on its own dedicated box, co-located with our app servers on a LAN-bound private network. Query round-trip is 0.8–1.6 ms, versus the managed warehouse’s 30–50 ms cross-region path. On a dashboard with 12 tiles, each firing its own query, that adds up to roughly 340–590 ms of query time saved per page render; you feel it on every load.
- Cost predictability. Our previous managed warehouse plan was already at its monthly throughput ceiling, and the next plan up was roughly six times the price. Dedicated infrastructure at the same monthly spend gives us multiples of the headroom — flat marginal cost as we scale, no per-query upcharge surprises when traffic spikes.
- Dedicated hardware, not shared. Our analytics warehouse box has dedicated vCPUs (no noisy-neighbor 50% throughput drops). One heavy attribution query can saturate four cores for seconds without affecting any other Admaxxer surface. Our primary database, job queue, and app stay on a separate co-located box, because their workload profile (sub-millisecond OLTP operations) doesn’t risk core saturation.
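A worked check of the per-render saving in the latency point above, using its 0.8–1.6 ms and 30–50 ms round-trip ranges. It sums query time across all 12 tiles, so it assumes every tile's query moves to the LAN path. Tiles whose queries run concurrently gain less wall-clock time than this sum. cumulativeSavingMs is a hypothetical helper.

```typescript
// Total query time saved per render when `tiles` queries each move from a
// remote round trip to a LAN round trip.
function cumulativeSavingMs(tiles: number, remoteMs: number, lanMs: number): number {
  return tiles * (remoteMs - lanMs);
}

// Narrowest gap: 30 ms remote vs 1.6 ms LAN. Widest gap: 50 ms vs 0.8 ms.
const low = cumulativeSavingMs(12, 30, 1.6);  // ≈ 340.8 ms
const high = cumulativeSavingMs(12, 50, 0.8); // ≈ 590.4 ms
console.log(`${Math.round(low)}–${Math.round(high)} ms saved per render`);
```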
Server-side cost caps are enforced at the warehouse-user level: strict per-query limits on bytes scanned, execution time, and result size. A runaway query, whether ours or a bug’s, gets rejected by the warehouse before it can tank the box. The in-place scale path is a 60-second resize to a larger SKU once we start hitting the execution-time ceiling on a real query; no data move is required.
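The warehouse enforces these caps itself. The sketch below only models the decision in application code to make the three limits concrete. QueryCaps, QueryProgress, checkCaps, and every number used with them are assumptions, not Admaxxer's configured values.

```typescript
// Hypothetical per-user caps; real values are not published.
interface QueryCaps {
  maxBytesScanned: number;
  maxExecutionMs: number;
  maxResultRows: number;
}

// What is known about a query as it runs (simplified).
interface QueryProgress {
  bytesScanned: number;
  elapsedMs: number;
  resultRows: number;
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Reject as soon as any one cap is crossed.
function checkCaps(p: QueryProgress, caps: QueryCaps): Verdict {
  if (p.bytesScanned > caps.maxBytesScanned) return { ok: false, reason: "bytes scanned" };
  if (p.elapsedMs > caps.maxExecutionMs) return { ok: false, reason: "execution time" };
  if (p.resultRows > caps.maxResultRows) return { ok: false, reason: "result size" };
  return { ok: true };
}
```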
Schema highlights
The new warehouse mirrors the existing schema 1:1 so a column rename, type change, or new field lands in lockstep on both sides. The six source streams that today are written to both warehouses in parallel:
| Source stream | What it carries |
|---|---|
| pageviews | ~25 columns including the 13 GL#359 click-IDs (gclid, gbraid, wbraid, fbclid, ttclid, msclkid, scid, rdt_cid, epik, li_fat_id, _kx, ko_click_id, twclid), UTMs, device, geo, and the deduped session ID. |
| orders | 26 columns from Shopify Admin: order ID, line items, gross/net revenue, taxes, shipping, discount allocations, partial-refund-aware refund rows. |
| visitor_payments | 46 columns: the pixel-attributed payment event with all 13 click-IDs at the row grain (GL#359), plus first-touch UTM, smart-referrer classification, and the multi-currency native/USD pair. |
| Daily ad spend | One row per (workspace, day, platform, campaign, adset, ad) with spend, conversions, conversions value, clicks, and impressions. Source for every per-platform ROAS / CPC / CPM / CTR / CPA tile. |
| Daily email revenue | Klaviyo-derived email revenue, deduped against pixel-attributed revenue so the email channel never double-counts a sale that the pixel also captured. |
| Shopify reported metrics | The daily-poll backfill row carrying Shopify Admin’s authoritative gross sales / order count / refund count per workspace per day — the reconciliation anchor against pixel-attributed numbers. |
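To make the click-ID columns concrete, here is a minimal sketch of reading the 13 listed parameters off a landing-page URL. extractClickIds is a hypothetical helper. The real client-pixel/ code may also read cookies or stored first-touch values, which this sketch ignores.

```typescript
// The 13 click-ID parameters listed for the pageviews stream (GL#359).
const CLICK_ID_PARAMS = [
  "gclid", "gbraid", "wbraid", "fbclid", "ttclid", "msclkid", "scid",
  "rdt_cid", "epik", "li_fat_id", "_kx", "ko_click_id", "twclid",
] as const;

type ClickIdParam = (typeof CLICK_ID_PARAMS)[number];

// Collect whichever click-IDs are present on a landing URL.
function extractClickIds(landingUrl: string): Partial<Record<ClickIdParam, string>> {
  const params = new URL(landingUrl).searchParams;
  const out: Partial<Record<ClickIdParam, string>> = {};
  for (const name of CLICK_ID_PARAMS) {
    const value = params.get(name);
    if (value) out[name] = value; // skip absent or empty parameters
  }
  return out;
}
```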
All six source streams deduplicate on a stable key (workspace_id + event_id, plus a version column where needed) so re-syncs and webhook retries never double-count. The revenue rollup aggregates the pre-deduplicated rows from each source into the per-day KPIs (the GL#500 pattern that closed the Meta-spend drift gap mid-migration).
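A minimal in-memory model of that dedup rule, assuming "highest version wins" for streams that carry a version column (streams without one can use a constant version). It illustrates the rule only and is not how the warehouse implements it. VersionedRow, dedupe, and dailyTotal are hypothetical names. Summing only after dedup is what keeps a retried webhook from inflating a day's total.

```typescript
interface VersionedRow {
  workspaceId: string;
  eventId: string;
  version: number; // higher = newer (e.g. after a partial refund); constant if unversioned
  amount: number;
}

// One row per (workspace_id, event_id); the highest version wins, so a retry
// or re-sync replaces a row instead of adding a second copy.
function dedupe(rows: VersionedRow[]): VersionedRow[] {
  const latest = new Map<string, VersionedRow>();
  for (const row of rows) {
    const key = `${row.workspaceId}\u0000${row.eventId}`;
    const seen = latest.get(key);
    if (!seen || row.version > seen.version) latest.set(key, row);
  }
  return [...latest.values()];
}

// Aggregate only the deduplicated rows.
function dailyTotal(rows: VersionedRow[]): number {
  return dedupe(rows).reduce((sum, r) => sum + r.amount, 0);
}
```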
What this means for customers
Three concrete effects, in plain English:
- Faster dashboards. Sub-2-second p95 page-load on /dashboard once the full per-pipe migration lands. The 30–50 ms cross-region latency disappears on every tile that backs onto a migrated pipe.
- Same numbers. The 14-day parity burn-in ensures every dashboard tile returns the same value, to within 1%, before and after a per-workspace cutover. If your number doesn’t match within 1%, your workspace doesn’t flip until we’ve root-caused the gap.
- Zero migration friction. The migration is server-side, per-workspace, and reversible by a single column update. There’s nothing for you to do — no re-installs, no new pixel snippets, no API key rotations, no settings changes. The data-source badge on each dashboard tile shows which backend served each row during the burn-in window.
Retention and backups
We keep raw events for 13+ months — matching the prior retention guarantee so any 12-month-trailing analysis you ran last year still runs the same way today. The revenue rollup carries the same window. After 13 months, raw events tier into long-term cold storage rather than being deleted, so a future rolling-13-month query on a historical date keeps the same source data shape.
Backup discipline mirrors the playbook we landed for our primary database earlier this month:
- Daily full backup via BACKUP DATABASE, plus an offsite copy. Retention: 14 daily snapshots + 12 monthly snapshots on a separate disk path.
- Weekly automated restore smoke test: restores the most recent zip into a side database, verifies the table count and row counts against the live DB, then drops the test DB. A JSON-line log is retained. This proves the backup chain is recoverable, not just present.
- Three-tier rollback available at any time: per-workspace flag flip in <1 second, whole-pipe cache kill switch in <5 seconds, container-level environment unset (back to the managed warehouse everywhere) in <3 minutes.
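The 14 daily + 12 monthly retention rule can be made precise with a small selection function. It assumes one snapshot per day and that the monthly snapshot kept for each calendar month is that month's earliest. snapshotsToKeep is a hypothetical name, and the real pruning job may pick differently.

```typescript
// Returns the snapshots a "keep N daily + M monthly" policy retains, newest first.
// Assumption: the monthly snapshot for a month is its earliest snapshot (UTC).
function snapshotsToKeep(snapshots: Date[], daily = 14, monthly = 12): Date[] {
  const newestFirst = [...snapshots].sort((a, b) => b.getTime() - a.getTime());
  const keep = new Set<number>(newestFirst.slice(0, daily).map((d) => d.getTime()));

  // Map insertion order is newest month first; later writes are older dates,
  // so each entry ends up holding that month's earliest snapshot.
  const earliestPerMonth = new Map<string, Date>();
  for (const d of newestFirst) {
    earliestPerMonth.set(`${d.getUTCFullYear()}-${d.getUTCMonth()}`, d);
  }
  for (const d of [...earliestPerMonth.values()].slice(0, monthly)) {
    keep.add(d.getTime());
  }
  return newestFirst.filter((d) => keep.has(d.getTime()));
}
```

With daily snapshots from 2025-01-01 to 2026-05-17, this keeps 26: the 14 most recent days plus the first of each month from 2025-06 through 2026-05.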
Where to dig deeper
Companion pages that go one level deeper on each surface:
- Analytics warehouse migration cutover guide: the customer-facing “when does my workspace flip and what should I expect?” story, including the per-cohort schedule and the data-source badge convention used during the burn-in window.
- Admin operations runbook — the ops side. Documents flag-flip procedures, parity-check commands, three-tier rollback, and the escalation matrix. SSR HTML is publicly readable as a trust signal (Stripe + Honeycomb pattern); the in-app component is admin-gated.
- Performance architecture — the four-cause LCP taxonomy and four-layer cost-optimization stack that complement this data-architecture page. How /dashboard hits ~1.5s LCP while staying inside the warehouse cost budget.
- Analytics warehouse auth model: the multi-tenant authentication discipline that the new warehouse path also implements. Every read query carries an authenticated workspace_id parameter; no user-supplied workspace path exists.
- How data works: the end-to-end pipeline walk-through, from pixel hit through datasource, materialized view, pipe, and API endpoint to React card.
- Revenue data flow — the four ingestion paths into the warehouse and the source-additive collapse semantics that keep install-day workspaces from looking empty.
- Revenue tracking model — the canonical doc for the three revenue datasources (visitor_payments + revenue_events + orders), the 90-day click-ID + 365-day first-touch attribution model, and the 11 canonical metric formulas.
- Methodology + /methodology/data.json — published parity numbers, latency benchmarks, and the data-quality signals we publish quarterly.
FAQ
The questions support gets most often about how Admaxxer’s data pipeline is shaped. Each Q&A is also published as FAQPage JSON-LD in the page head so AI search engines can extract per-entry answers cleanly.
How does Admaxxer process data?
Three ingestion paths land into a single dual-write pipeline. The first-party pixel (events from client-pixel/) flows into the ingest API; Shopify orders flow in via admin webhooks; Meta, Google, TikTok, Amazon, Pinterest, and Klaviyo flow in via daily ad-platform syncs. Every event is written to both the current canonical store and the new first-party analytics warehouse in parallel. A daily-aggregated revenue rollup collapses the per-event rows into the per-day KPIs your /dashboard reads. The current store remains canonical until each workspace passes a 14-day parity burn-in, then we flip its per-workspace flag to read from the new warehouse instead.
Will my data move during the analytics warehouse migration?
No. The dual-write pattern means both warehouses receive every event from the moment shadow writes are enabled. The current store stays canonical until per-workspace parity verification confirms the new warehouse returns the same numbers within ±1% for at least 14 consecutive days. When your workspace flips, both warehouses are still in sync, so opting out is non-destructive at any point. See the cutover guide at /documentation/analytics-warehouse-migration for the per-workspace schedule.
Is there downtime during a per-workspace cutover?
Zero downtime. The cutover is a single column update in our admin database that re-routes which warehouse answers your dashboard queries. The API response shape is byte-identical between the two backends: every field name, every type, and every nullable shape stays the same. The most visible effect is faster page loads (the LAN-bound new warehouse answers in about a millisecond, 0.8–1.6 ms round-trip, versus the tens of milliseconds of our prior cross-region setup).
How long is the parity burn-in before my workspace migrates?
Fourteen consecutive days of ≤1% per-column drift on every numeric KPI in the revenue rollup pipe. Any drift over 1% on any column resets the clock to day zero. The burn-in is intentionally conservative — it covers the polled-fallback fold that fills install-day workspaces and gives our admin team room to investigate any anomaly before flipping a workspace flag. The clock is per-workspace, not per-cohort.
Can I export my historical data?
Yes. The /api/v1/* endpoints your dashboard uses are the same ones you can hit programmatically with an API key (Settings → API keys). All endpoints return the same response shape regardless of which warehouse is canonical for your workspace. Raw event export for bulk migrations is available via support — we keep 13+ months of source events to match the prior retention guarantee.
What happens if our analytics warehouse box goes down?
Zero customer impact. The per-workspace feature flag falls back to the managed warehouse automatically: any 5xx or timeout from our analytics warehouse routes the request through to the managed warehouse (which is still receiving every dual-write), your dashboard renders normally, and you see no warning. Internally, we get paged. Roll-forward is a one-click flag flip per workspace; full warehouse rollback is a single environment variable change.
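A minimal sketch of that fallback, assuming a 2-second timeout and treating only timeouts and 5xx responses as outages. Other errors are re-thrown, which is a design choice the text does not specify. HttpError, TimeoutError, withTimeout, readWithFallback, and the page callback are hypothetical names.

```typescript
type ServedBy = "new" | "managed";

// Hypothetical error type for non-2xx warehouse responses.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

class TimeoutError extends Error {}

// Reject with TimeoutError if `p` has not settled within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Ask the new warehouse first; on a 5xx or a timeout, answer from the managed
// warehouse (which still receives every dual-write) and notify on-call.
async function readWithFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  page: (err: unknown) => void,
  timeoutMs = 2_000,
): Promise<{ data: T; servedBy: ServedBy }> {
  try {
    return { data: await withTimeout(primary(), timeoutMs), servedBy: "new" };
  } catch (err) {
    const outage =
      err instanceof TimeoutError || (err instanceof HttpError && err.status >= 500);
    if (!outage) throw err; // a 4xx is a bug to surface, not an outage to hide
    page(err);
    return { data: await fallback(), servedBy: "managed" };
  }
}
```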
Where is our analytics warehouse data stored?
On dedicated infrastructure in our cloud region. The analytics warehouse is reachable only on our private network; public ingress is closed. It is co-located with our app servers and OLTP database on the same private network, so network latency stays sub-millisecond. Automated daily backups retain 14 daily + 12 monthly snapshots on a separate disk path, and a weekly automated restore-from-backup smoke test proves the backup chain is recoverable.
Why two backends today instead of just cutting over?
The dual-write pattern is a well-established, safe migration discipline for warehouse swaps. We get a 14-day burn-in with real production traffic on both sides before flipping any workspace, automated parity verification catches drift before customers see it, and one-click rollback is always available. Going straight to a cutover would have left no burn-in time to catch the kind of subtle aggregation-semantic differences that drift detection caught for us mid-flight.
Related
Analytics warehouse migration cutover guide · Admin operations runbook · Performance architecture · Analytics warehouse auth model · How data works (end-to-end) · Revenue data flow · Revenue tracking model · Methodology + published numbers · Documentation home
Questions or feedback: support@admaxxer.com.