Introduction
Revenue is the first number a merchant looks at. Everything else — profit, ROAS, LTV — flows from it. If revenue is wrong, the entire analytics product is wrong. Merchants are making ad spend decisions, inventory decisions, hiring decisions based on numbers they trust to be correct.
At TrueProfit, we build a Shopify analytics app. Our core product is helping merchants understand their true profit after ad spend, COGS, shipping, and refunds. The centerpiece is the revenue figure shown on the dashboard: gross revenue, net revenue after refunds, revenue by channel.
For months, that number was slightly off. Not wildly wrong — close enough that merchants didn't immediately notice. But off. A few percentage points here, a discrepancy on a specific day there. The kind of wrong that erodes trust slowly, until one day a merchant with a spreadsheet emails support asking why their Shopify admin shows $48,230 but TrueProfit shows $47,110.
This is the story of how I found it, fixed it, and what I learned about building systems that handle financial data.
The Revenue Pipeline
TrueProfit's revenue data flows through several layers. Shopify sends webhooks when orders are created, updated, or refunded. We process these in reportfns — serverless Lambda functions responsible for aggregating order data into daily/weekly/monthly report summaries stored in MongoDB.
The reportfns Lambda receives order events and incrementally updates a daily report document. When an order comes in: add to gross revenue. When a refund arrives: subtract from revenue, add to refunds. When an order is edited: apply the delta.
Simple in theory. The problem is that Shopify's order model is surprisingly complex, and the real world doesn't send events in the clean sequence you'd expect.
The Problem
When we started investigating, the first thing we did was build a comparison tool: pull the last 30 days of orders directly from the Shopify Admin API for a set of test shops, calculate what revenue should be, and compare against what our database stored.
The results were uncomfortable. ~6% of daily revenue figures were wrong — some over-counted, some under-counted. The errors weren't random noise. They clustered around specific patterns: shops with lots of refunds, shops using multi-currency, shops that frequently edit orders after creation.
Edge Cases Everywhere
Shopify's order model has more states than you'd think:
- Partial refunds: a merchant refunds one line item out of five. Shopify fires a refunds/create event, but the refund object carries line item refunds, shipping refunds, and adjustments as separate fields, so it is easy to double-count or miss parts.
- Order edits: Shopify added orders/edited webhooks relatively recently. Older webhook handlers don't know about this event type and silently drop it. The order's total changes but your database doesn't.
- Multi-currency: Shopify presents prices in the shop's currency and the customer's currency. Refunds are issued in the customer's currency. If you convert at different times (order creation vs. refund creation), you get phantom currency gains/losses.
- Timezone mismatches: Shopify timestamps are UTC. Merchants think about revenue in their local timezone. An order at 11:30pm UTC on Tuesday is already Wednesday in Vietnam but still Tuesday in New York. If you bucket by Shopify's UTC date, you'll disagree with the merchant's Shopify admin view, which shows local time.
- Test orders: Shopify marks some orders as test orders. They shouldn't count toward revenue. It is easy to include them by accident if you don't check the test field.
- Cancelled orders with refunds: if an order is cancelled and fully refunded, should it appear in revenue at all? Shopify says yes (gross revenue minus refunds = 0). Some merchants expect it not to appear at all.
Accumulated Patches
The original reportfns code was written to handle the happy path: order comes in, add revenue. As edge cases were discovered over time, patches were applied on top. The result was something like this:
func handleOrderEvent(ctx context.Context, event OrderEvent) error {
order := event.Order
// Original logic
revenue := order.TotalPrice
// Patch #1 (3 months later): subtract refunds
if order.TotalRefunded > 0 {
revenue -= order.TotalRefunded
}
// Patch #2 (5 months later): handle currency
// TODO: this is wrong for multi-currency, fix later
if order.Currency != shopCurrency {
revenue = convertCurrency(revenue, order.Currency, shopCurrency)
}
// Patch #3 (7 months later): someone noticed test orders
if order.Test {
return nil
}
// Patch #4 (8 months later): cancelled orders
if order.CancelledAt != nil && order.TotalRefunded >= order.TotalPrice {
// skip? or not? unclear...
// return nil
}
// Update the daily report
return upsertDailyReport(ctx, shopID, orderDate(order), revenue)
}
Notice the commented-out return, the TODO that never got fixed, and the test-order check that only runs after the refund and currency logic has already executed, so the order of the guards records when each bug was discovered rather than any deliberate design. This code had been touched by multiple engineers over 18 months. Nobody held a full mental model of what it actually did.
Worse: each patch was applied as an incremental delta handler. There was no mechanism to go back and recalculate old data when a bug was fixed. Every fix only applied going forward. Old reports stayed wrong.
Investigation
Tracing the Discrepancy
I started by writing a diagnostic tool that compared our stored revenue against a freshly calculated figure for a given shop and date range. The approach:
- Fetch all orders for the shop directly from Shopify Admin API (paginated)
- Apply our revenue calculation rules to those raw orders
- Compare against what MongoDB stored
- Log discrepancies with the specific order IDs involved
type AuditResult struct {
ShopID string
Date time.Time
StoredRevenue float64
ActualRevenue float64
Delta float64
DeltaPct float64
DiscrepantOrders []string
}
func auditShopRevenue(ctx context.Context, shopID string, from, to time.Time) ([]AuditResult, error) {
// Fetch raw orders from Shopify (source of truth)
orders, err := shopifyClient.GetOrdersInRange(ctx, shopID, from, to)
if err != nil {
return nil, fmt.Errorf("fetch orders for %s: %w", shopID, err)
}
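// Group by local date (using shop's timezone)
shop, err := shopRepo.Get(ctx, shopID)
if err != nil {
return nil, fmt.Errorf("get shop %s: %w", shopID, err)
}
// A failed lookup returns a nil location, and Time.In(nil) panics, so check it
loc, err := time.LoadLocation(shop.Timezone)
if err != nil {
return nil, fmt.Errorf("load timezone %s: %w", shop.Timezone, err)
}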
byDate := make(map[string][]ShopifyOrder)
for _, o := range orders {
localDate := o.CreatedAt.In(loc).Format("2006-01-02")
byDate[localDate] = append(byDate[localDate], o)
}
var results []AuditResult
for dateStr, dayOrders := range byDate {
actual := calculateRevenueFromOrders(dayOrders, shop)
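stored, err := reportRepo.GetDailyRevenue(ctx, shopID, dateStr)
if err != nil {
// A read failure must not be mistaken for "stored revenue is zero"
return nil, fmt.Errorf("get stored revenue for %s on %s: %w", shopID, dateStr, err)
}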
delta := actual - stored
if math.Abs(delta) > 0.01 { // ignore float rounding < 1 cent
results = append(results, AuditResult{
ShopID: shopID,
Date: parseDate(dateStr),
StoredRevenue: stored,
ActualRevenue: actual,
Delta: delta,
DeltaPct: delta / actual * 100,
})
}
}
return results, nil
}
Root Causes
Running the audit across 50 shops over 90 days surfaced four distinct root causes:
| Root Cause | Frequency | Impact |
|---|---|---|
| Timezone bucketing (UTC vs. shop local time) | All shops | Revenue moved across day boundaries, net zero but days wrong |
| Partial refund double-counting | ~40% of shops | Refunds subtracted twice when refunds/create event arrived after orders/updated |
| Multi-currency conversion timing | ~25% of shops | Order converted at creation rate, refund converted at different rate |
| Order edit events dropped | ~15% of shops | orders/edited webhook not registered; post-creation order changes lost |
The timezone issue was the most pervasive. Every shop was affected, but in a way that averaged out over the month (orders shift between days rather than disappearing). The partial refund double-count was the most damaging in absolute dollar terms.
The Fix
The Source of Truth Approach
The key insight was this: incremental event processing is the wrong model for financial data.
Incremental updates work fine when each event carries the full delta and events arrive in order and are never duplicated. Shopify's webhook delivery satisfies none of these properties. Webhooks can arrive out of order. They can be retried (duplicate delivery). A single order state change can generate multiple overlapping events. And there's no guaranteed ordering between orders/updated and refunds/create for the same refund.
The fix was to change the model entirely: instead of applying deltas, recalculate from the raw order state.
When any order event arrives for a shop on a given day, we don't try to figure out what changed. We:
- Fetch the current state of all orders for that shop on that day from Shopify
- Calculate revenue from scratch using a pure function
- Write the result, overwriting whatever was there before
This is idempotent. It's safe to re-run. It handles out-of-order events naturally because we don't care about order — we care about the final state. And it will automatically correct any previous miscalculation.
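To make the difference concrete, here is a small self-contained sketch. The event and orderState types are simplified stand-ins invented for this illustration, not the real webhook or order types. It replays a retried refund webhook through a delta accumulator and through a recompute-from-state function.

```go
package main

import "fmt"

// event is a simplified stand-in for one webhook delivery (hypothetical type).
type event struct {
	OrderID string
	Kind    string  // "order" or "refund"
	Amount  float64 // order total or refund amount
}

// applyDeltas mimics the incremental model: every delivery mutates a running total.
func applyDeltas(events []event) float64 {
	var net float64
	for _, e := range events {
		switch e.Kind {
		case "order":
			net += e.Amount
		case "refund":
			net -= e.Amount
		}
	}
	return net
}

// orderState is the current state of one order as the source of truth reports it.
type orderState struct {
	Total   float64
	Refunds []float64
}

// recompute ignores the event stream and derives net revenue from current state.
func recompute(orders map[string]orderState) float64 {
	var net float64
	for _, o := range orders {
		net += o.Total
		for _, r := range o.Refunds {
			net -= r
		}
	}
	return net
}

func main() {
	// One order, one refund, and a retried delivery of the same refund webhook.
	deliveries := []event{
		{OrderID: "A", Kind: "order", Amount: 100},
		{OrderID: "A", Kind: "refund", Amount: 30},
		{OrderID: "A", Kind: "refund", Amount: 30}, // duplicate delivery
	}
	fmt.Println("delta model:    ", applyDeltas(deliveries))

	current := map[string]orderState{"A": {Total: 100, Refunds: []float64{30}}}
	for range deliveries {
		// Recompute once per delivery; the answer never changes.
		fmt.Println("recompute model:", recompute(current))
	}
}
```

The delta model ends at 40 because the refund is subtracted twice, while the recompute model reports 70 on every delivery. The trade-off is the extra read against the source of truth each time.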
Implementation
The core of the new implementation is a pure calculateRevenue function that takes a slice of Shopify orders and returns the revenue breakdown for that day. No database calls. No side effects. Easy to test.
// RevenueResult is the daily revenue breakdown calculated from raw orders.
// All values are in the shop's currency.
type RevenueResult struct {
GrossRevenue float64
TotalRefunds float64
NetRevenue float64
OrderCount int
RefundCount int
}
// CalculateRevenue computes revenue from a slice of raw Shopify orders.
// Pure function: no I/O, no side effects, safe to call multiple times.
func CalculateRevenue(orders []ShopifyOrder, shopCurrency string) RevenueResult {
var result RevenueResult
for _, order := range orders {
// Skip test orders — they don't represent real revenue
if order.Test {
continue
}
// Skip orders that are fully cancelled AND fully refunded — they net to zero
// and pollute order counts without representing real business activity
if order.CancelledAt != nil && isFullyRefunded(order) {
continue
}
// Convert order total to shop currency at the exchange rate
// recorded at the time of order creation.
// CRITICAL: always use order.PresentmentCurrency + order.ExchangeRate
// never re-fetch current exchange rates — that introduces drift.
grossInShopCurrency := toShopCurrency(order.TotalPrice, order.PresentmentCurrency, order.ExchangeRate, shopCurrency)
result.GrossRevenue += grossInShopCurrency
result.OrderCount++
// Calculate refunds: sum all refund transactions
// DO NOT use order.TotalRefunded — it can be stale on webhook payloads.
// Walk order.Refunds[] directly.
for _, refund := range order.Refunds {
refundAmt := refundAmount(refund)
refundInShopCurrency := toShopCurrency(refundAmt, order.PresentmentCurrency, order.ExchangeRate, shopCurrency)
result.TotalRefunds += refundInShopCurrency
result.RefundCount++
}
}
result.NetRevenue = result.GrossRevenue - result.TotalRefunds
return result
}
// isFullyRefunded returns true if the order's refunds cover its entire price.
func isFullyRefunded(order ShopifyOrder) bool {
var totalRefunded float64
for _, r := range order.Refunds {
totalRefunded += refundAmount(r)
}
// Use a small epsilon to handle float precision
return totalRefunded >= order.TotalPrice-0.01
}
// refundAmount returns the amount refunded, in the order's presentment currency.
// It sums the refund's successful "refund" transactions rather than rebuilding
// the total from line items, shipping refunds, and adjustments.
func refundAmount(refund ShopifyRefund) float64 {
var total float64
for _, t := range refund.Transactions {
if t.Kind == "refund" && t.Status == "success" {
total += t.Amount
}
}
return total
}
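The listing above calls toShopCurrency without showing it. The following is a minimal sketch of what such a helper might look like, not the actual implementation. It assumes the rate means "1 unit of presentment currency equals rate units of shop currency", which matches the multi-currency test case in this post (100 USD at 0.92 becomes 92 EUR). If the stored rate points the other way, the multiplication becomes a division.

```go
package main

import "fmt"

// toShopCurrency converts amount from the presentment currency into the shop
// currency using the rate captured when the order was created.
//
// Assumption: rate is a presentment-to-shop multiplier. This sketch does not
// validate the rate; a zero rate silently yields zero, so a production version
// would probably want to surface a missing rate as an error instead.
func toShopCurrency(amount float64, presentmentCurrency string, rate float64, shopCurrency string) float64 {
	if presentmentCurrency == shopCurrency {
		return amount // same currency: nothing to convert
	}
	return amount * rate
}

func main() {
	// Order and refund both use the order-time rate, so they cannot drift apart.
	const orderRate = 0.92
	gross := toShopCurrency(100, "USD", orderRate, "EUR")
	refund := toShopCurrency(30, "USD", orderRate, "EUR")
	fmt.Printf("gross %.2f EUR, refund %.2f EUR, net %.2f EUR\n", gross, refund, gross-refund)
}
```

Keeping the helper free of any rate lookup is the point: the only rate it can use is the one passed in from the order.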
The handler becomes much simpler. It doesn't need to understand what changed in the event — it just triggers a recalculation:
func handleOrderEvent(ctx context.Context, event OrderEvent) error {
shop, err := shopRepo.Get(ctx, event.ShopID)
if err != nil {
return fmt.Errorf("get shop %s: %w", event.ShopID, err)
}
// Determine the affected date in the shop's local timezone
loc, err := time.LoadLocation(shop.Timezone)
if err != nil {
return fmt.Errorf("load timezone %s: %w", shop.Timezone, err)
}
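// Build local midnight from calendar fields. Truncate(24 * time.Hour) works on
// absolute time and rounds to UTC midnight, which is the wrong day boundary
// for any shop outside UTC.
y, m, d := event.OccurredAt.In(loc).Date()
affectedDate := time.Date(y, m, d, 0, 0, 0, 0, loc)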
// Fetch current order state from Shopify (source of truth)
orders, err := shopifyClient.GetOrdersForDate(ctx, event.ShopID, affectedDate, loc)
if err != nil {
return fmt.Errorf("fetch orders for %s on %s: %w", event.ShopID, affectedDate.Format("2006-01-02"), err)
}
// Calculate revenue from scratch — pure, idempotent
rev := CalculateRevenue(orders, shop.Currency)
// Upsert the daily report — overwrite, don't accumulate
return reportRepo.UpsertDailyRevenue(ctx, UpsertRevenueParams{
ShopID: event.ShopID,
Date: affectedDate,
GrossRevenue: rev.GrossRevenue,
TotalRefunds: rev.TotalRefunds,
NetRevenue: rev.NetRevenue,
OrderCount: rev.OrderCount,
RefundCount: rev.RefundCount,
RecalcAt: time.Now(),
})
}
Notice what's gone: no conditionals on event type, no delta logic, no if order.Test scattered around, no TODO comments. The handler is 30 lines. The complexity lives in CalculateRevenue which is a pure function with unit tests covering every edge case.
Edge Case Handling
Because we now have a pure function with full control over the calculation, edge cases become explicit and testable:
func TestCalculateRevenue(t *testing.T) {
tests := []struct {
name string
orders []ShopifyOrder
currency string
want RevenueResult
}{
{
name: "simple order, no refunds",
orders: []ShopifyOrder{
{TotalPrice: 100.0, PresentmentCurrency: "USD", ExchangeRate: 1.0},
},
currency: "USD",
want: RevenueResult{GrossRevenue: 100.0, NetRevenue: 100.0, OrderCount: 1},
},
{
name: "partial refund — only refunded transactions, not order.TotalRefunded",
orders: []ShopifyOrder{
{
TotalPrice: 100.0,
PresentmentCurrency: "USD",
ExchangeRate: 1.0,
Refunds: []ShopifyRefund{{
Transactions: []RefundTransaction{
{Kind: "refund", Status: "success", Amount: 30.0},
},
}},
},
},
currency: "USD",
want: RevenueResult{GrossRevenue: 100.0, TotalRefunds: 30.0, NetRevenue: 70.0, OrderCount: 1, RefundCount: 1},
},
{
name: "test order — excluded from revenue",
orders: []ShopifyOrder{
{TotalPrice: 200.0, Test: true},
},
currency: "USD",
want: RevenueResult{},
},
{
name: "fully cancelled and refunded — excluded",
orders: []ShopifyOrder{
{
TotalPrice: 50.0,
CancelledAt: ptr(time.Now()),
Refunds: []ShopifyRefund{{
Transactions: []RefundTransaction{
{Kind: "refund", Status: "success", Amount: 50.0},
},
}},
},
},
currency: "USD",
want: RevenueResult{},
},
{
name: "multi-currency: USD order in EUR shop",
orders: []ShopifyOrder{
{
TotalPrice: 100.0,
PresentmentCurrency: "USD",
ExchangeRate: 0.92, // 1 USD = 0.92 EUR at order time
},
},
currency: "EUR",
want: RevenueResult{GrossRevenue: 92.0, NetRevenue: 92.0, OrderCount: 1},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := CalculateRevenue(tt.orders, tt.currency)
if !almostEqual(got.GrossRevenue, tt.want.GrossRevenue) {
t.Errorf("GrossRevenue: got %.2f, want %.2f", got.GrossRevenue, tt.want.GrossRevenue)
}
// ... other field checks
})
}
}
These tests are the real value. Before the rewrite, there were zero unit tests for revenue calculation. Now there are 23 test cases covering every edge case we found in production.
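The test table above relies on two small helpers, almostEqual and ptr, that are not shown. The following is one plausible sketch of them, not the project's actual code. The half-cent tolerance is an assumption chosen for this illustration, and ptr uses generics, so it needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"math"
)

// almostEqual reports whether a and b differ by less than half a cent.
// The tolerance is an assumption for this sketch; it is loose enough to absorb
// float64 noise and tight enough to catch a one-cent error.
func almostEqual(a, b float64) bool {
	return math.Abs(a-b) < 0.005
}

// ptr returns a pointer to v, which makes optional fields such as CancelledAt
// easy to populate inside a table-driven test.
func ptr[T any](v T) *T {
	return &v
}

func main() {
	sum := 0.0
	for i := 0; i < 10; i++ {
		sum += 0.1 // accumulates float error; sum is not exactly 1.0
	}
	fmt.Println(sum == 1.0, almostEqual(sum, 1.0))
	fmt.Println(almostEqual(70.00, 70.01))
	fmt.Println(*ptr(42))
}
```

Comparing with a tolerance keeps the tests stable when amounts pass through exchange-rate multiplication, while still failing on a genuine one-cent discrepancy.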
The Reconciliation Job
Design
After the fix was deployed and the numbers started looking correct, I wanted a safety net. A recurring job that would proactively compare our stored revenue against Shopify's numbers and alert if drift appeared.
The design was straightforward: a daily Lambda that runs on a cron, picks a sample of active shops, fetches their last 7 days of orders from Shopify, recalculates revenue, and compares against our database. Any discrepancy above 0.1% triggers a Slack alert with the shop ID and affected dates.
type ReconciliationJob struct {
shopRepo ShopRepository
reportRepo ReportRepository
shopifyClient ShopifyClient
alerter Alerter
}
func (j *ReconciliationJob) Run(ctx context.Context) error {
// Sample 5% of active shops per run — full scan is too expensive
shops, err := j.shopRepo.GetActiveSample(ctx, 0.05)
if err != nil {
return fmt.Errorf("sample active shops: %w", err)
}
var alerts []DiscrepancyAlert
for _, shop := range shops {
discrepancies, err := j.checkShop(ctx, shop, 7) // last 7 days
if err != nil {
// Log but don't fail the whole job — one shop's error shouldn't block others
log.Error("reconciliation check failed", "shop_id", shop.ID, "err", err)
continue
}
alerts = append(alerts, discrepancies...)
}
if len(alerts) > 0 {
return j.alerter.Send(ctx, alerts)
}
return nil
}
func (j *ReconciliationJob) checkShop(ctx context.Context, shop Shop, days int) ([]DiscrepancyAlert, error) {
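loc, err := time.LoadLocation(shop.Timezone)
if err != nil {
return nil, fmt.Errorf("load timezone %s: %w", shop.Timezone, err)
}
now := time.Now().In(loc)
var alerts []DiscrepancyAlert
for d := 0; d < days; d++ {
// Build local midnight from calendar fields; Truncate(24 * time.Hour)
// would round to UTC midnight instead.
day := now.AddDate(0, 0, -d)
date := time.Date(day.Year(), day.Month(), day.Day(), 0, 0, 0, 0, loc)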
orders, err := j.shopifyClient.GetOrdersForDate(ctx, shop.ID, date, loc)
if err != nil {
return nil, fmt.Errorf("fetch orders for day -%d: %w", d, err)
}
actual := CalculateRevenue(orders, shop.Currency)
stored, err := j.reportRepo.GetDailyRevenue(ctx, shop.ID, date)
if err != nil {
return nil, fmt.Errorf("get stored revenue: %w", err)
}
if discrepancyPct(actual.NetRevenue, stored.NetRevenue) > 0.001 {
alerts = append(alerts, DiscrepancyAlert{
ShopID: shop.ID,
Date: date,
Actual: actual.NetRevenue,
Stored: stored.NetRevenue,
DeltaPct: discrepancyPct(actual.NetRevenue, stored.NetRevenue),
})
}
}
return alerts, nil
}
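The job above compares discrepancyPct against 0.001 without showing the helper. The following is a hedged sketch of one way it could be written. The choice to treat "nothing actually sold but something stored" as an infinite discrepancy is an assumption made for this illustration. Despite the name, the value returned is a fraction (0.001 means 0.1%), which is what the comparison in checkShop implies.

```go
package main

import (
	"fmt"
	"math"
)

// discrepancyPct returns the relative difference between actual and stored
// revenue as a fraction of actual (0.001 == 0.1%).
func discrepancyPct(actual, stored float64) float64 {
	diff := math.Abs(actual - stored)
	if diff == 0 {
		return 0 // identical values, including the 0 vs 0 case
	}
	base := math.Abs(actual)
	if base == 0 {
		// Assumption: stored revenue on a day with no real revenue is
		// always worth alerting on, so report it as infinitely large.
		return math.Inf(1)
	}
	return diff / base
}

func main() {
	fmt.Println(discrepancyPct(1000, 1000)) // no drift
	fmt.Println(discrepancyPct(1000, 990))  // 1% drift
	fmt.Println(discrepancyPct(0, 25) > 0.001)
}
```

Guarding the zero cases explicitly avoids a NaN from 0/0, which would compare false against any threshold and hide the alert.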
Why It Got Rejected
I was fairly proud of this. Clean code, proper sampling, alerting on drift. I opened the PR and waited for approval.
It got rejected. Not because of a technical flaw. Not because of operational trade-offs. The tech lead simply didn't agree with the approach — he believed the existing incremental logic could be patched further, and that a full reconciliation was overkill.
I explained the context. The incremental approach had accumulated 18 months of patches — each one fixing a symptom while introducing new edge cases. The root problem was architectural: you can't reliably compute financial totals from deltas when the deltas themselves are unreliable (missed webhooks, out-of-order events, partial data).
The response was firm: rejected. I closed the PR and moved on.
What Happened Next
Over the next two weeks, the tech lead attempted to fix the revenue accuracy himself using the incremental patching approach. What followed was... educational.
The public status page tells the story. Over those two weeks:
- 13 production releases deployed trying to fix cascading issues
- Ghost/phantom records inflated merchant sales by $5K–$20K per shop
- 357,000 historical records had to be backfilled due to missing refund timestamps
- Duplicate report entries created by race conditions in the patched sync logic
- Discount allocation gaps introduced by yet another incremental patch
Each patch fixed one symptom and introduced another. The exact failure mode I had warned about.
After two weeks of firefighting, the team quietly adopted the same approach I had proposed: recalculate from source of truth. Fetch raw order data from Shopify, compute totals from scratch, replace the stored values. The status page even mentions "a comprehensive redesign leveraging Shopify's Agreements API" — which is essentially what my PR did, with a different API surface.
I'm not bitter about it. This happens in engineering teams more often than we like to admit. Sometimes the decision-maker doesn't have full context. Sometimes ego gets in the way of evaluating a solution on its technical merits. The important thing is: the right solution won in the end. It just took two extra weeks of production pain to get there.
Results
After deploying the source-of-truth rewrite and running the audit tool across all shops to backfill corrected figures:
| Metric | Before | After |
|---|---|---|
| Revenue accuracy (vs. Shopify admin) | ~94% | 100%* |
| Days with discrepancy >0.1% | ~6% of shop-days | <0.03% (float precision) |
| Partial refund double-counting | Affected 40% of shops | Eliminated |
| Multi-currency drift | Unpredictable | Zero (fixed-rate at order time) |
| Order edit events handled | No | Yes (recalc on any event) |
| Unit test coverage on revenue calc | 0 tests | 23 test cases |
| Code complexity (revenue handler) | ~120 LOC, 6 patches | 30 LOC handler + pure function |
*100% accuracy assumes Shopify webhooks are delivered correctly and MongoDB is consistent. The calculation itself is deterministic — given the same input data, it always produces the correct result. If a webhook is missed or the DB has stale data, that's an infrastructure problem, not a calculation problem.
The merchant support tickets about revenue discrepancies dropped to zero within two weeks of the fix. That's the real signal.
Key Takeaways
Incremental delta-based updates are fragile for financial data. Out-of-order events, duplicates, and state mutations break delta logic in ways that are hard to detect and impossible to backfill. Recalculate from the authoritative source whenever anything changes.
Yes, fetching all orders for a day on every event is more expensive than applying a delta. But it's safe to retry, safe to run twice, and automatically self-corrects. The cost difference is small. The correctness difference is everything.
A pure function with no I/O is the only way to confidently unit test financial logic. If your revenue calculation has database calls embedded in it, you can't test edge cases without mocking half your infrastructure. Extract the math into a pure function first.
UTC timestamps and merchant local time diverge at day boundaries. Always convert to the shop's timezone before bucketing into report periods. Never assume UTC. This is the most common and most silent revenue bucketing bug.
Shopify's order.TotalRefunded can be stale on webhook payloads — it reflects the state at webhook delivery time, not the true current state. Walk the order.Refunds[] array and sum transactions yourself. Same applies to order.TotalPrice after an order edit.
A bug fix that only applies to new events leaves months of wrong data in the database. Build the backfill into the fix from day one. Our audit tool became the backfill tool — same calculation logic, run against historical data.
My source-of-truth approach was rejected. Two weeks and 13 production releases later, the team adopted the same approach. Don't let ego — yours or someone else's — delay the correct fix. Defend your solution with data, and if you're overruled, let reality do the convincing.
Timeline
CalculateRevenue function written with 23 unit tests. New handler: 30 LOC. All four root causes handled.