Choosing a Billing Platform for Craftology¶
Executive Summary¶
Choosing how billing works in this project is not just a technical decision—it quietly shapes how the product behaves under pressure, how predictable costs feel to users, and how confidently the business can grow. AI workloads don’t follow neat patterns. A single action can trigger multiple expensive operations, sometimes in parallel, sometimes unpredictably. That makes billing less of a back-office concern and more of a core part of the system’s behavior.
Two platforms were considered as anchors for this layer: Flexprice and Metronome. They approach the problem from different directions. Flexprice feels closer to the product itself: it allows the team to shape how usage is measured and priced while the system is still evolving. Metronome, by contrast, brings structure and financial discipline. It is built for a stage where pricing is no longer a moving target and the focus shifts toward accuracy, reporting, and trust in the numbers.
The key insight is that this is not a binary choice. Trying to impose a fully structured billing system too early would slow the team down and lock in assumptions that are not yet stable. At the same time, leaving everything flexible forever would eventually create risk, both financial and operational. The right approach is to let the billing system grow alongside the product.
In practical terms, that means starting with Flexprice as a flexible pricing layer while keeping strict control of usage inside the application through a dedicated quota service. This ensures that decisions about whether a request should run are always made instantly and reliably. As the product matures and revenue becomes more predictable, Metronome can be introduced to take over the financial side, handling invoicing, reporting, and auditability without disrupting how the product itself operates.
This layered approach keeps the system responsive today while laying a clear path toward a more structured and reliable billing foundation tomorrow.
Context: Why this decision matters¶
AI billing is fundamentally different from traditional SaaS:
- Costs are variable and multi-dimensional (tokens, images, seconds, credits)
- Requests can fan out into multiple expensive operations
- Usage is often unpredictable and bursty
This means billing must support:
- Real-time control (prevent overspending)
- Flexible pricing models (per model, per modality)
- Accurate reconciliation with vendors
Candidate Platform Overview¶
We've explored three contemporary billing and monetization platforms, commonly seen as alternatives in the usage-based pricing or SaaS- and AI-native ecosystems.
- Lago is an open‑source billing engine that you can host yourself and deeply customize.
  - It supports usage metering, plans, coupons, add‑ons, one‑time charges, and prepaid credits with automated invoicing and revenue analytics.
  - It fits teams that treat billing as part of their core infrastructure and want to avoid vendor lock‑in or heavy SaaS‑style billing tools.
- Flexprice is known for highly flexible, credit‑based, and hybrid pricing with strong product‑team control.
  - It focuses on real‑time metering, credit wallets, entitlements enforcement, and fast pricing experiments (changing plans and rules without schema migrations).
  - It plugs into Stripe and other stacks and offers multi‑currency, tax, and modular components for composable billing.
- Metronome is best suited to high‑volume usage‑based or hybrid billing with deep metering and customer‑facing dashboards.
  - It focuses on event‑based metering, credits, minimums, overages, and real‑time in‑product spend visibility for customers.
  - Finance‑friendly features include centralized rate cards, simple price updates, and integration hooks into payment and CRM systems.
  - Metronome has been acquired by Stripe and is being integrated into Stripe’s billing and monetization stack.
Quick decision‑oriented table¶
| Factor | Lago | Flexprice | Metronome |
|---|---|---|---|
| Ownership / control | Open‑source, self‑hostable | Cloud‑native, highly configurable | Cloud‑native, API‑first |
| Best‑suited for | Engineering‑owned billing infra | AI‑native, fast‑pricing‑experiments | High‑volume usage‑based SaaS |
| Pricing flexibility | Strong plans + usage + credits | Very flexible credits, hybrid models | Tiered, per‑unit, credits, minimums |
| Customer‑facing views | Analytics and basic dashboards | Limited, more product‑/engineer‑focused | Strong in‑product spend dashboards |
| Vendor lock‑in concern | Low (self‑hosted option) | Medium (cloud‑native) | Medium (cloud‑native) |
Lago for AI‑usage billing¶
Lago is an open, code‑owned metering and billing layer that sits on top of your existing payment stack (e.g., Stripe). It supports flexible plans, allowances, add‑ons, and prepaid credits. This makes it well‑suited for AI startups experimenting with usage‑based models, where you need to track high‑volume events, enforce quotas, and generate accurate invoices and revenue analytics.
Because Lago is open‑source and self‑hostable, engineering teams retain full control over billing logic and can deeply customize it for AI‑specific workflows, while still reusing their current payment processor and accounting systems.
Key limitations¶
- Less product‑team‑friendly: Lago is oriented toward engineers, so pricing changes, bundles, and entitlements often require more code or configuration.
- Lighter AI‑native tooling: Lago leaves much of AI‑specific logic to custom implementation.
- Fewer out‑of‑the‑box UX and finance features: Lago often requires you to build or integrate user-oriented dashboards yourself.
- More self‑service overhead: Lago’s open‑source, self‑hostable model avoids lock‑in but shifts responsibility for reliability, scaling, and deep payment integrations onto the product team.
Engineering effort for hierarchical quotas and real‑time billing¶
- Event ingestion and hierarchy modeling: Developers design the event schema (e.g., org_id, team_id, user_id, model_type) and send the events to Lago via API/webhook‑style calls; they then define plans and add‑ons per account tier to represent hierarchical quotas.
- Quota enforcement: Lago exposes usage and overages; developers call its API or consume webhooks on each request to check limits and gate the API (or rely on client‑side logic in a gateway).
- Operational load: The project team owns deployment, scaling, and observability of Lago and any backing data store (e.g., ClickHouse). This is a substantial burden, but it is the price of full control.
Engineering effort: medium to high.
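To make the hierarchy modeling and gating steps above more concrete, here is a minimal in-memory sketch of a hierarchical quota check. It is not Lago's API: the `UsageEvent` and `HierarchicalQuota` names, the scope-key format, and the `units` field are all hypothetical, and a real deployment would keep counters in a shared store rather than in process memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsageEvent:
    """One metered action, tagged with the hierarchy it belongs to (hypothetical schema)."""
    org_id: str
    team_id: str
    user_id: str
    model_type: str
    units: int  # e.g. tokens consumed


@dataclass
class HierarchicalQuota:
    """Tracks usage per scope and rejects an event if any level would overflow."""
    limits: dict[str, int]  # scope key -> allowed units; missing key means "no limit"
    used: dict[str, int] = field(default_factory=dict)

    @staticmethod
    def scopes(event: UsageEvent) -> list[str]:
        # Broadest scope first: org, then team, then user.
        return [
            f"org:{event.org_id}",
            f"team:{event.org_id}/{event.team_id}",
            f"user:{event.org_id}/{event.team_id}/{event.user_id}",
        ]

    def try_consume(self, event: UsageEvent) -> bool:
        keys = self.scopes(event)
        # Check every level before mutating, so a rejection leaves no partial charge.
        for key in keys:
            limit = self.limits.get(key)
            if limit is not None and self.used.get(key, 0) + event.units > limit:
                return False
        for key in keys:
            self.used[key] = self.used.get(key, 0) + event.units
        return True


quota = HierarchicalQuota(limits={"org:acme": 1000, "team:acme/design": 300})
event = UsageEvent("acme", "design", "u1", "image-gen", 200)
print(quota.try_consume(event))  # True: 200 fits under both limits
print(quota.try_consume(event))  # False: the team would reach 400 > 300
```

Checking all levels before incrementing any of them keeps the counters consistent when a lower-level limit rejects the request.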
Flexprice for AI‑usage billing¶
Flexprice is purpose‑built for AI‑usage billing, handling high‑velocity events like API calls, while tying usage directly to pricing and entitlements. It ingests usage streams in real time, applies plan‑specific rules (rates, caps, thresholds), and generates accurate invoices.
Flexprice also supports credit wallets and hybrid models, such as prepaid credits, recurring fees, and per‑event pricing tailored to AI workloads. Its developer‑first design links metering, rating, and invoicing into a single layer, so product teams can experiment with credit‑based pricing and in‑product usage dashboards while keeping billing and feature access synchronized.
Flexprice is optimized around Stripe as the primary payment backbone.
Key limitations¶
- Cloud‑first, less self‑hosting flexibility: While Flexprice is open‑source, its default story is more cloud‑native and managed‑service oriented.
- Flexprice focuses more on credits, contracts, and AI‑product workflows, so at extreme scale it may require implementing your own pipelines or integrations.
Engineering effort for hierarchical quotas and real‑time billing¶
- Event‑driven metering: the app sends usage events (tokens, GPU‑seconds, API calls) to Flexprice, which applies pricing and quota rules immediately using its pricing engine and credit‑based wallet model.
- Quota enforcement: Flexprice offers SDKs and APIs to check remaining credits and limits synchronously, so the app's API or gateway can block or throttle before exceeding soft or hard quotas.
- Operational load: Developers integrate Flexprice as a managed service (cloud‑first), relying on its ingestion pipeline (e.g., Kafka‑like streaming, idempotent processing) and data warehouse integrations. They can focus mainly on schema design and feature‑level checks.
Engineering effort: low to medium.
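The synchronous "check credits before running" pattern described above can be sketched as a reserve-then-settle wallet. This is an illustrative in-memory model only; it does not use the Flexprice SDK, and the `CreditWallet` class and its method names are assumptions made for the example.

```python
import threading


class CreditWallet:
    """In-memory prepaid credit wallet with reserve/settle semantics (hypothetical)."""

    def __init__(self, balance: int) -> None:
        self._balance = balance  # credits still available to reserve
        self._reserved = 0       # credits held for in-flight requests
        self._lock = threading.Lock()

    def reserve(self, estimate: int) -> bool:
        """Hold credits before an expensive action; False means block the request."""
        with self._lock:
            if estimate > self._balance:
                return False
            self._balance -= estimate
            self._reserved += estimate
            return True

    def settle(self, estimate: int, actual: int) -> None:
        """Release the hold and charge the actual cost.

        If actual exceeds the estimate, the balance can go negative; a real
        system would need a policy for that case.
        """
        with self._lock:
            self._reserved -= estimate
            self._balance += estimate - actual

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


wallet = CreditWallet(balance=100)
print(wallet.reserve(60))  # True: 60 credits are held
print(wallet.reserve(60))  # False: only 40 remain, so the second request is blocked
wallet.settle(estimate=60, actual=45)
print(wallet.balance)      # 55
```

Reserving an estimate up front matters for AI workloads because the true cost (for example, output tokens) is only known after the call completes.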
Metronome for AI‑usage billing¶
Metronome is a usage‑based billing platform where you charge per‑token, per‑API‑call, per‑compute‑unit, or similar consumption metrics. It ingests high‑volume usage events in real time, applies flexible pricing rules (tiered, hybrid, credits, minimums, overages), and automatically generates accurate invoices tied to actual usage.
Metronome is designed around contract‑aware billing, so it can handle complex enterprise deals, custom pricing agreements, and multi‑account hierarchies. It also gives real‑time visibility into credit usage and customer‑health metrics, which helps product and finance teams align pricing directly with value delivered.
Key limitations¶
- Less flexible data handling: Metronome is optimized for structured, pre‑aggregated usage data, so deeply custom or multi‑dimensional event schemas often need external transformation logic.
- Limited self‑hosting and lock‑in: Metronome is a cloud‑first service heavily integrated with Stripe, so you have less infra control than with other considered platforms.
- Fewer out‑of‑the‑box global features: Metronome offers less built‑in support for multi‑currency, regional pricing, and tax‑compliance tooling, often pushing that complexity back into your stack, while competitors increasingly bundle those capabilities more tightly.
Engineering effort for hierarchical quotas and real‑time billing¶
- Event streaming and hierarchy: The app pushes usage events into Metronome’s streaming pipeline (e.g., over Kafka), tagged with org/team/project identifiers; Metronome aggregates these into billable metrics and ties them to hierarchical contracts.
- Quota and spend limits: Metronome computes spend and usage in real time; the app either reacts to its webhooks/alerts or queries its API to enforce quotas (e.g., hard‑stop at a monthly budget), often with auxiliary logic in the app's gateway or orchestration layer.
- Operational load: Metronome itself runs on a managed data‑streaming backbone; the app team mostly maintains the event‑production layer and its own UI/alerting, rather than a billing‑engine cluster.
Engineering effort: medium, skewed toward event‑pipeline design and contract modeling.
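The "auxiliary logic in the app's gateway" mentioned above might look like the following sketch: a small gate that records the latest spend alert per customer and answers allow/throttle/block locally. The alert fields (`customer_id`, `spend`, `budget`) and the 80%/100% thresholds are assumptions for illustration, not Metronome's webhook schema.

```python
from dataclasses import dataclass, field


@dataclass
class SpendGate:
    """Caches the last known spend status so the gateway can decide without a remote call."""
    soft_ratio: float = 0.8  # assumed threshold: start throttling at 80% of budget
    hard_ratio: float = 1.0  # assumed threshold: hard stop at 100% of budget
    _status: dict[str, str] = field(default_factory=dict)

    def on_spend_alert(self, customer_id: str, spend: float, budget: float) -> str:
        """Handle an alert (e.g. from a webhook) and update the cached decision."""
        if budget <= 0:
            raise ValueError("budget must be positive")
        ratio = spend / budget
        if ratio >= self.hard_ratio:
            status = "block"
        elif ratio >= self.soft_ratio:
            status = "throttle"
        else:
            status = "allow"
        self._status[customer_id] = status
        return status

    def decision(self, customer_id: str) -> str:
        # Customers with no alert on record are allowed by default.
        return self._status.get(customer_id, "allow")


gate = SpendGate()
gate.on_spend_alert("cust_1", spend=850.0, budget=1000.0)
print(gate.decision("cust_1"))  # throttle
gate.on_spend_alert("cust_1", spend=1000.0, budget=1000.0)
print(gate.decision("cust_1"))  # block
print(gate.decision("cust_2"))  # allow
```

Because alerts arrive asynchronously, a gate like this can lag behind actual spend, which is one reason to keep a separate in-application quota service for hard limits.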
Scalability and Integration¶
Lago¶
Lago offers a usage‑based, self‑hosted billing platform with Business and Enterprise tiers that scale with event volume and billing complexity. Lago lets you experiment with complex pricing (tiers, overages, custom enterprise plans) without rebuilding the core billing engine.
- Start: Create an account on Lago, define your first product, metering events, and a simple plan (e.g., base subscription + per‑token pricing). Use the hosted version initially or deploy Lago in your cloud/on‑prem environment for full control.
- Integrate:
  - Instrument your API or services to emit usage events (e.g., org_id, team_id, usage_type, value) to Lago’s API or webhook.
  - Configure plans, quotas, and add‑ons in Lago; then call its API or consume webhooks on each request to check limits and enforce hierarchical quotas at the gateway or application layer.
  - Sync invoices and revenue data with your chosen payment processor (e.g., Stripe) or accounting stack via Lago’s integrations or exports.
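As a rough illustration of the instrumentation step, the sketch below builds a usage event payload with a deterministic transaction ID so that retried requests can be deduplicated downstream. The payload field names and the `build_usage_event` helper are assumptions for this example and should be checked against the vendor's event API before use.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_usage_event(
    request_id: str,
    org_id: str,
    team_id: str,
    usage_type: str,
    value: int,
    occurred_at: Optional[datetime] = None,
) -> dict:
    """Build a usage event whose ID is stable across retries (hypothetical schema)."""
    occurred_at = occurred_at or datetime.now(timezone.utc)
    # uuid5 is deterministic: the same org and request always yield the same ID.
    transaction_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"usage:{org_id}:{request_id}"))
    return {
        "transaction_id": transaction_id,
        "timestamp": int(occurred_at.timestamp()),
        "code": usage_type,
        "properties": {"org_id": org_id, "team_id": team_id, "value": value},
    }


first = build_usage_event("req-42", "acme", "design", "image_generation", 3)
retry = build_usage_event("req-42", "acme", "design", "image_generation", 3)
print(first["transaction_id"] == retry["transaction_id"])  # True
print(json.dumps(first, indent=2))
```

Deriving the ID from the request rather than generating a random one means a retry after a timeout cannot double-bill, provided the receiving side deduplicates on that ID.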
Flexprice¶
Flexprice positions itself as monetization infrastructure for AI‑native and SaaS companies, supporting usage‑based, credit‑based, and hybrid pricing at scale. It exposes usage‑based, per‑unit, and tiered plans via its product catalog and scales by handling millions of events per second over stream‑like ingestion, with credit‑based wallets and real‑time rating built‑in. Flexprice is optimized for dynamic pricing experiments and global operations, including multi‑currency, multiple entities, and complex enterprise contracts.
- Start: Sign up for Flexprice, define your “units” (billing metrics), and create initial plans or bundles in the Flexprice dashboard or via API.
- Integrate:
  - Send usage events from your API or workflow engine to Flexprice’s ingestion endpoints (push‑based or stream‑oriented APIs).
  - Use Flexprice’s SDKs or APIs to check remaining credits and limits synchronously before executing high‑cost AI actions, implementing quota enforcement in your gateway or service layer.
  - Connect Flexprice with your payment processor (e.g., Stripe) and accounting tools; Flexprice automatically translates usage into invoices and generates records for reconciliation and reporting.
Metronome¶
Metronome is designed to ingest millions of events per second through a streaming‑first pipeline, while keeping contract‑aware pricing, discounts, and minimums in a single source of truth. After its acquisition by Stripe, Metronome is tightly integrated into Stripe’s billing ecosystem, making it particularly attractive for companies that want scalable, Stripe‑backed usage‑based monetization.
- Start: Sign up for Metronome, define your event schema (e.g., org_id, team_id, model_type) and map them to meters and pricing indices; then create standard or custom plans for your customers.
- Integrate:
  - Stream AI‑usage events into Metronome’s ingestion layer (e.g., via Kafka‑style topics or HTTP‑based endpoints), tagged with hierarchical identifiers.
  - Use Metronome webhooks, alerts, or API reads to react when spend or usage approaches budget or quota limits; plug these signals into your API gateway or orchestration layer to throttle or block requests.
  - Let Metronome feed Stripe (or another processor) for invoicing and revenue reporting, while keeping your own dashboards and admin UIs synchronized via Metronome’s queries and event streams.
Total Cost of Ownership¶
Lago¶
Commercial / managed‑cloud costs¶
Lago’s Perform (cloud) plan is about $599/month and includes up to $100 k in monthly billing volume; beyond that you pay roughly 0.75% of revenue. At scale (e.g., $1 M/month in billing), this fee alone can reach ~$6–7.5 k/month above the base fee.
Self‑hosted Lago is free from vendor‑platform fees, but you must pay for your own infrastructure, staffing, and reliability.
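The cloud fee arithmetic above can be checked with a short calculation. This sketch assumes the 0.75% rate applies only to billing volume above the included $100 k; if it applied to the full volume instead, the variable part at $1 M/month would be $7.5 k. The function name and defaults are illustrative, taken from the figures quoted in this section.

```python
def lago_cloud_fee(
    monthly_billing_usd: float,
    base_fee: float = 599.0,       # quoted base price per month
    included: float = 100_000.0,   # billing volume covered by the base fee
    rate: float = 0.0075,          # roughly 0.75% on volume beyond the included amount
) -> float:
    """Estimate the monthly platform fee under the stated assumptions."""
    overage = max(0.0, monthly_billing_usd - included)
    return base_fee + overage * rate


print(f"{lago_cloud_fee(100_000):,.2f}")    # 599.00
print(f"{lago_cloud_fee(1_000_000):,.2f}")  # 7,349.00 (599 base + 6,750 variable)
```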
Engineering and infra¶
Higher if you self‑host; you must run and scale databases, queues, and billing‑engine nodes, plus build and maintain custom integrations and quota‑enforcement layers.
Lower if you start with Lago’s managed cloud, but you still need nontrivial engineering work for event‑based metering and API‑level gating.
Flexprice¶
Commercial / managed‑cloud costs¶
Flexprice is positioned as an open‑source but cloud‑managed usage‑based platform focused on AI and SaaS, with pricing typically tied to event volume and feature depth (not a simple %‑of‑revenue model like Lago’s cloud plan).
Exact numbers are often negotiated, but Flexprice tends to avoid aggressive “revenue tax” fees, instead charging fixed or usage‑based tiers plus optional enterprise add‑ons.
Engineering and infra TCO¶
Flexprice is designed to reduce engineering overhead: you mainly define your event schema, connect ingestion (API or stream), and use SDKs or APIs to enforce quotas and credits in your gateway.
Because it is built for real‑time billing and entitlements, you trade some infra ownership for lower ongoing development and operational load versus a self‑hosted Lago.
Metronome¶
Commercial / managed‑cloud costs¶
Metronome is enterprise‑oriented and typically sold via custom annual contracts rather than a simple public‑tier grid. Estimates from practitioners put Metronome in the ~$10 k+ per year range for mid‑sized usage‑based SaaS, depending on volume and features.
It does not usually charge a %‑of‑revenue fee but instead uses fixed‑fee, tiered pricing based on usage events and features.
Engineering and infra TCO¶
Metronome is heavy on event‑streaming and contract‑aware billing, so you do need engineering effort to build and maintain the ingestion pipeline (e.g., Kafka / HTTP streams) and to wire its outputs into your API and billing stack.
However, you don’t run a billing engine cluster yourself; Metronome handles the high‑volume billing logic, so your infra TCO is lower than self‑hosted Lago but still not zero.
TCO Estimations¶
Assumptions¶
- Billing volume: 10 M events / month (~0.3–0.4 M per day, roughly 4 events/sec on average).
- Team: 2 backend engineers.
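As a sanity check on the volume assumption, the following snippet converts a monthly event count into average daily and per-second rates. It assumes a 30-day month and a perfectly even distribution; since AI usage is bursty, peak rates will be well above this average.

```python
SECONDS_PER_DAY = 86_400


def average_rates(events_per_month: float, days_per_month: int = 30) -> tuple[float, float]:
    """Return (events per day, events per second) averaged over the month."""
    per_day = events_per_month / days_per_month
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


per_day, per_second = average_rates(10_000_000)
print(f"{per_day:,.0f} events/day")     # 333,333 events/day
print(f"{per_second:.2f} events/sec")   # 3.86 events/sec
```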
Lago TCO (cloud‑first scenario)¶
- Vendor cost
  - Relies on Lago’s cloud plan (self‑hosted would push more cost onto your infra): Perform‑tier‑like offering roughly $600–$1.2 k/month at your volume, plus a small %‑of‑revenue component only if you grow much larger.
    For more information, review Lago Pricing Plans.
- Engineering / infra cost
  - You still must:
    - Instrument your API to emit events and call Lago’s API.
    - Build gateway‑level quota checks (e.g., “fail fast” on over‑quota).
    - Manage error handling, retries, and observability for billing flows.
  - With 2 engineers, this is roughly 0.2–0.5 FTE ongoing.
  - Infra cost (if not fully self‑hosting Lago): mostly your own app‑side queues/db; assume $0.5–2 k/month extra.
Flexprice TCO¶
- Vendor cost
  - Flexprice is typically self‑serve or light‑quote‑based with tiers starting around $300–$800/month for small‑to‑medium volumes, and higher usage tiers that scale with event volume.
    For more information, review Flexprice Pricing Plans.
  - At 10 M events/month, assume ≈ $1.5–3 k/month once you pass the starter tier.
- Engineering / infra cost
  - Flexprice greatly reduces billing‑engine work because it provides SDKs and APIs for real‑time credits and entitlements. You mainly send events and plug quota checks into your gateway.
  - With 2 engineers, this is closer to 0.1–0.3 FTE.
  - Infra: little extra beyond existing API‑side stack; assume $0.5–1.5 k/month.
Metronome TCO (enterprise‑style)¶
- Vendor cost
  - Metronome positions itself as an enterprise‑grade billing platform with quote‑based pricing and no public self‑serve grid.
  - For mid‑sized SaaS at the proposed scale (millions of events, AI‑style usage), ballpark is often ~$10–20 k/month (≈ $120–240 k/year) once you’re out of trial.
    For more information, review Best usage based billing software: In-depth reviews.
- Engineering / infra cost
  - You still must:
    - Build a streaming or HTTP‑based ingestion pipeline (e.g., Kafka‑style or batch‑stream) into Metronome.
    - Wire quota, spend alerts, and throttling into your API gateway.
  - However, Metronome owns the billing‑engine core, so you avoid running a billing DB/metrics cluster.
  - With 2 engineers, assume 0.3–0.6 FTE for ingestion and integration, ≈ $6–12 k/month.
  - Infra is mainly your own stream‑ingestion and compute; ≈ $1–3 k/month.
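The three estimates can be rolled up into comparable monthly ranges. The sketch below uses the vendor, FTE, and infra figures quoted in the scenarios above; the one added assumption is a fully loaded cost of about $20 k per FTE-month, which is implied by the Metronome line (0.3–0.6 FTE ≈ $6–12 k/month) and applied here to all three platforms. All values are in thousands of USD per month.

```python
FTE_COST_K = 20.0  # assumption: ~$20 k per FTE-month, implied by 0.3-0.6 FTE = $6-12 k

# (low, high) ranges; vendor and infra in $k/month, fte as a fraction of one engineer.
ESTIMATES = {
    "Lago (cloud)": {"vendor": (0.6, 1.2), "fte": (0.2, 0.5), "infra": (0.5, 2.0)},
    "Flexprice": {"vendor": (1.5, 3.0), "fte": (0.1, 0.3), "infra": (0.5, 1.5)},
    "Metronome": {"vendor": (10.0, 20.0), "fte": (0.3, 0.6), "infra": (1.0, 3.0)},
}


def monthly_tco(estimate: dict) -> tuple[float, float]:
    """Sum vendor, engineering, and infra costs into a (low, high) range in $k/month."""
    low = estimate["vendor"][0] + estimate["fte"][0] * FTE_COST_K + estimate["infra"][0]
    high = estimate["vendor"][1] + estimate["fte"][1] * FTE_COST_K + estimate["infra"][1]
    return round(low, 1), round(high, 1)


for name, estimate in ESTIMATES.items():
    low, high = monthly_tco(estimate)
    print(f"{name}: ${low}-{high} k/month")
# Lago (cloud): $5.1-13.2 k/month
# Flexprice: $4.0-10.5 k/month
# Metronome: $17.0-35.0 k/month
```

Under these assumptions engineering time dominates the Lago and Flexprice totals, while the vendor fee dominates for Metronome.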