Initial Setup for Vercel AI Gateway API Keys — Avoiding AI Bankruptcy with Spend Quota and Auto-reload

When you start using Vercel AI Gateway, the first things you usually touch are API key issuance, then the Auto-reload and Spend Quota settings. Accept the defaults without thinking, and unexpected traffic or a runaway agent stuck in an infinite loop can rack up a bill many times what you anticipated. The popular term for this failure mode is “AI bankruptcy.”

This post covers the minimum set of values to lock in before you do anything else. For everything beyond this floor, work through the official Vercel AI Gateway docs.

Keep Auto-reload Off at First to Build a Sense of Cost

Auto-reload automatically tops up your AI Gateway balance to a target amount whenever it drops below a threshold. In steady-state production it is almost mandatory, but at the very beginning, turn it off.

The reason is simple: you want a gut feel for which action costs how much before you automate the refill. If Auto-reload is on from day one, the balance silently regenerates while you burn tokens, and you never develop a sense of cost. On manual top-up, the signal “wait, I refilled this last week and it is already gone” hits you directly, and it becomes much easier to identify which workloads are expensive.

A sensible minimum configuration looks like this:

Auto-reload: OFF (for the first few weeks to months)
When Balance Falls Below: 5 USD (as a future threshold once you switch Auto-reload on)
Recharge To Target Balance: 25 USD (same)
Maximum Monthly Spend: 50 USD (the monthly ceiling, applies once Auto-reload is on)

Once you have a feel for it, switch Auto-reload on and let Maximum Monthly Spend be the cap that prevents an unbounded month. Maximum Monthly Spend is an account-wide budget ceiling, distinct from the per-key Spend Quota covered below.
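To make the interaction between these four values concrete, here is a minimal sketch that simulates one month of usage. It is a simplified model, not Vercel's billing logic: it assumes a top-up fires as soon as the balance drops below the threshold, that Maximum Monthly Spend caps the total topped up in the month, and that requests stop once the balance cannot cover a day's usage. The names AutoReloadConfig and simulateMonth are made up for illustration.

```typescript
// Hypothetical model of the settings listed above. Vercel's real billing
// behaviour (timing, partial top-ups, grace periods) may differ.
interface AutoReloadConfig {
  enabled: boolean;
  thresholdUsd: number;       // "When Balance Falls Below"
  targetUsd: number;          // "Recharge To Target Balance"
  maxMonthlySpendUsd: number; // "Maximum Monthly Spend"
}

interface MonthResult {
  reloadedUsd: number;         // total topped up automatically during the month
  endingBalanceUsd: number;    // balance left when the simulation ends
  stoppedOnDay: number | null; // first day whose usage could not be covered
}

function simulateMonth(
  config: AutoReloadConfig,
  startingBalanceUsd: number,
  dailySpendUsd: number[],
): MonthResult {
  let balance = startingBalanceUsd;
  let reloaded = 0;
  let stoppedOnDay: number | null = null;
  let day = 0;

  for (const spend of dailySpendUsd) {
    day += 1;
    if (spend > balance) {
      // Assumption: usage is refused once the balance cannot cover it.
      stoppedOnDay = day;
      break;
    }
    balance -= spend;

    if (config.enabled && balance < config.thresholdUsd) {
      // Top up to the target, but never past the monthly ceiling.
      const wanted = config.targetUsd - balance;
      const allowed = config.maxMonthlySpendUsd - reloaded;
      const topUp = Math.min(wanted, allowed);
      if (topUp > 0) {
        balance += topUp;
        reloaded += topUp;
      }
    }
  }

  return { reloadedUsd: reloaded, endingBalanceUsd: balance, stoppedOnDay };
}

// Values from the list above, with a flat 4 USD of usage per day.
const settings: AutoReloadConfig = {
  enabled: true,
  thresholdUsd: 5,
  targetUsd: 25,
  maxMonthlySpendUsd: 50,
};
const flatUsage = Array.from({ length: 30 }, () => 4);

console.log(simulateMonth({ ...settings, enabled: false }, 25, flatUsage));
console.log(simulateMonth(settings, 25, flatUsage));
```

Under this model, a starting balance of 25 USD with 4 USD of usage per day runs dry on day 7 when Auto-reload is off. With Auto-reload on, the 50 USD ceiling is reached and usage stops on day 19. The manual case gives you the earlier and louder signal, which is the point of starting with Auto-reload off.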

Issue Three Separate API Keys: production, preview, development

Resist the urge to issue a single API key and reuse it across all environments.

Vercel standardizes on three environments out of the box: production, preview, and development (see Vercel Environments). Mirror that on the AI Gateway side by issuing one key per environment, named for example:

your-app-production
your-app-preview
your-app-development

This split means that if a preview-environment agent goes haywire and hammers the API, your production key stays untouched. Per-key Spend Quota, covered next, also benefits from this split because you can tighten preview, tighten development even further, and leave production sized for real user traffic.

It also makes rotation safer: you can rotate the production key on its own, or trial a new key on preview before promoting the change, without one operation forcing the other.
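On the application side, the per-environment split can be enforced with a small startup check. The sketch below relies on VERCEL_ENV, the system environment variable Vercel sets to the current environment. It assumes the key is stored under AI_GATEWAY_API_KEY with a different value per environment, so adjust the name if your project uses another one. The helper names are hypothetical.

```typescript
type VercelEnv = "production" | "preview" | "development";

const ENVIRONMENTS: readonly VercelEnv[] = ["production", "preview", "development"];

// Builds the key label used when issuing keys, e.g. "your-app-preview".
function keyNameFor(appName: string, environment: VercelEnv): string {
  return `${appName}-${environment}`;
}

// Reads VERCEL_ENV and falls back to "development" when it is unset
// (assumption: an unset value means a plain local run).
function resolveEnvironment(env: Record<string, string | undefined>): VercelEnv {
  const raw = env["VERCEL_ENV"] ?? "development";
  const match = ENVIRONMENTS.find((candidate) => candidate === raw);
  if (match === undefined) {
    throw new Error(`Unexpected VERCEL_ENV value: ${raw}`);
  }
  return match;
}

// Fails fast at startup if the key for the current environment is missing.
function requireGatewayKey(
  appName: string,
  env: Record<string, string | undefined>,
): { keyName: string; apiKey: string } {
  const environment = resolveEnvironment(env);
  const keyName = keyNameFor(appName, environment);
  const apiKey = env["AI_GATEWAY_API_KEY"]; // assumed variable name
  if (apiKey === undefined || apiKey === "") {
    throw new Error(`AI_GATEWAY_API_KEY is not set; expected the ${keyName} key`);
  }
  return { keyName, apiKey };
}

// In a Node.js runtime this would typically be called as:
//   const { keyName } = requireGatewayKey("your-app", process.env);
```

Logging keyName (never apiKey) at startup makes it easy to confirm that a deployment picked up the key intended for its environment.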

Mark production and preview Keys as Sensitive in Vercel env

When you store these keys in Vercel environment variables, the rule is straightforward.

production key → Mark as Sensitive
preview key → Mark as Sensitive
development key → Vercel does not allow the Sensitive flag on development-environment variables, so there is no Sensitive choice to make here (local handling covered below)

Vercel’s Sensitive Environment Variables cannot be read back as plaintext from the dashboard once stored. Anyone joining the project later cannot snoop the raw value through the UI, so for production credentials this should be the default rather than the exception.

Without the Sensitive flag, the value is visible to anyone with dashboard access. An AI Gateway API key, the moment it leaks, becomes an open faucet someone else can drain on your bill. For production and preview, there is no real choice to make: mark them Sensitive.

Since the development key cannot be marked Sensitive, local protection is what matters. vercel env pull writes to .env.local by default, so confirm that the file is covered by .gitignore immediately after pulling. This is a step worth turning into muscle memory.
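The authoritative check is git itself: running git check-ignore -v .env.local prints the matching rule when the file is ignored. If you want the same guard inside a script or test, the sketch below is a rough approximation that scans the text of a root-level .gitignore. It handles only comments, blank lines, "!" negation, a leading "/", and "*" wildcards, so treat a negative result as a prompt to double-check with git. The function name is hypothetical.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Approximates gitignore matching for a file that sits at the repository root.
// Later rules override earlier ones, mirroring how git evaluates the file.
function isIgnoredAtRoot(gitignoreText: string, fileName: string): boolean {
  let ignored = false;

  for (const rawLine of gitignoreText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;

    const negated = line.startsWith("!");
    let pattern = negated ? line.slice(1) : line;
    if (pattern.startsWith("/")) pattern = pattern.slice(1);

    // Directory-only and nested-path patterns are out of scope for this sketch.
    if (pattern.endsWith("/") || pattern.includes("/")) continue;

    // Translate "*" into "any run of characters except a slash".
    const source = pattern.split("*").map(escapeRegExp).join("[^/]*");
    if (new RegExp(`^${source}$`).test(fileName)) {
      ignored = !negated;
    }
  }

  return ignored;
}

const sampleGitignore = ["# local env files", ".env*.local", "node_modules/"].join("\n");
console.log(isIgnoredAtRoot(sampleGitignore, ".env.local"));
```

A check like this fits naturally into a pre-commit hook or a CI step that fails when .env.local is not covered.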

Enable Spend Quota and Set Quota Refresh to Daily

Spend Quota, set per API key, is one of those features you enable almost unconditionally. If Auto-reload is the account-wide faucet, Spend Quota is the per-key choke valve behind it.

A reasonable starting combination:

Spend Quota: ON
Spend Quota ($): 100 (example for production; preview and development should be lower)
Quota Refresh: Daily

The critical choice here is setting Quota Refresh to Daily. With Weekly or Monthly, a runaway agent that exhausts the budget in a single burst can leave you waiting up to a full week or a full month for the quota to come back. With Daily, the worst case is that you lose today’s quota and the limit resets tomorrow. That same-day containment and next-day recovery is the real value of the Daily setting.
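The trade-off between refresh intervals comes down to two numbers: how long a key can stay exhausted, and how much it can spend in a month if it hits the quota in every period. The arithmetic below is a sketch that assumes the quota amount applies in full to each refresh period and uses a 30-day month. The type and function names are made up for illustration.

```typescript
type QuotaRefresh = "Daily" | "Weekly" | "Monthly";

// Length of each refresh period in days (30-day month assumed).
const REFRESH_DAYS: Record<QuotaRefresh, number> = {
  Daily: 1,
  Weekly: 7,
  Monthly: 30,
};

// Longest wait for the quota to come back if it is exhausted right after a reset.
function worstCaseWaitDays(refresh: QuotaRefresh): number {
  return REFRESH_DAYS[refresh];
}

// Upper bound on monthly spend for one key if the quota is used up in every period.
function maxMonthlyExposureUsd(quotaUsd: number, refresh: QuotaRefresh): number {
  const periodsPerMonth = Math.ceil(30 / REFRESH_DAYS[refresh]);
  return quotaUsd * periodsPerMonth;
}

const intervals: QuotaRefresh[] = ["Daily", "Weekly", "Monthly"];
for (const refresh of intervals) {
  console.log(refresh, worstCaseWaitDays(refresh), maxMonthlyExposureUsd(100, refresh));
}
```

Note the flip side: under these assumptions a 100 USD Daily quota allows up to 3000 USD in a month if the key hits its limit every single day. Daily buys fast recovery, so size the amount as a daily budget rather than copying a monthly figure, and rely on the account-wide Maximum Monthly Spend for the monthly ceiling.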

The right per-environment budget depends on the size of your application, but a stepped allocation is a good mental model to start from:

Environment	Typical workload	Spend Quota ($) guideline
production	Live user traffic	Derived from projected DAU and per-request cost
preview	Per-PR deployments, QA	1/5 to 1/10 of production
development	Local development, agent experiments	1/5 to 1/10 of preview

It looks counterintuitive for development to have the smallest quota, but infinite-loop bugs in agent code almost always fire during local development. A tight limit is safest where the bugs actually occur, while production needs room for real user traffic.
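The stepped allocation in the table can be turned into a small calculation. The sketch below derives a daily production quota from projected DAU and per-request cost, then steps preview and development down by a divisor in the 5 to 10 range. The headroom multiplier, the 1 USD floor, and all names are assumptions added for illustration, and the input figures are placeholders rather than measurements.

```typescript
interface QuotaPlan {
  production: number;  // daily Spend Quota in USD
  preview: number;
  development: number;
}

function planDailyQuotas(
  dailyActiveUsers: number,
  requestsPerUserPerDay: number,
  costPerRequestUsd: number,
  headroom = 2,    // hypothetical safety multiplier over projected spend
  stepDivisor = 5, // the table suggests dividing by 5 to 10 at each step
): QuotaPlan {
  const projected = dailyActiveUsers * requestsPerUserPerDay * costPerRequestUsd;

  // Round to whole dollars; keep at least 1 USD so a key is still usable.
  const production = Math.max(1, Math.round(projected * headroom));
  const preview = Math.max(1, Math.round(production / stepDivisor));
  const development = Math.max(1, Math.round(preview / stepDivisor));

  return { production, preview, development };
}

// Placeholder inputs: 1000 DAU, 5 requests per user per day, 0.01 USD per request.
console.log(planDailyQuotas(1000, 5, 0.01));
```

With these placeholder inputs the plan comes out at 100 USD for production, 20 USD for preview, and 4 USD for development, which lines up with the 100 USD production example above. Replace the inputs with your own projections before relying on the result.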

Wrap-up — Make Sure You Notice When Things Break

Combining the above, the initial setup boils down to:

Keep Auto-reload off and use manual top-ups to build a feel for API cost
Issue three keys: production, preview, development
In Vercel env, mark the production and preview keys as Sensitive
Enable Spend Quota and lock Quota Refresh to Daily
Allocate budget in steps, with development tightest of all

AI Gateway is a convenient abstraction over many model providers behind a single API (see the Vercel AI Gateway overview for the full surface area). The convenience cuts both ways: the same one-click ergonomics that route cost-optimized inference everywhere can also route a runaway budget everywhere. Spinning up the fully-armed setup before you have a feel for cost amplifies the blast radius of the first incident.

For the first few weeks, the combination of Auto-reload off plus Daily Spend Quota optimizes for one thing: you notice on the same day that something broke. That is the right level of caution for the entry point, and the takeaway from this setup pass.

Model selection, rate limiting, and observability can all be layered in next by working through the Vercel AI Gateway docs one section at a time.

That’s all from the field, setting up AI Gateway API keys with cost guardrails in place.