---
title: "AI Gateway"
description: "Your bot works until you wake up to a $500 bill. Learn what AI Gateway actually provides (and what it doesn't), then implement production-grade cost controls. Reject expensive requests before they burn money."
canonical_url: "https://vercel.com/academy/slack-agents/ai-gateway"
md_url: "https://vercel.com/academy/slack-agents/ai-gateway.md"
docset_id: "vercel-academy"
doc_version: "1.0"
last_updated: "2026-04-09T11:08:41.623Z"
content_type: "lesson"
course: "slack-agents"
course_title: "Slack Agents on Vercel with the AI SDK"
prerequisites: []
---

<agent-instructions>
Vercel Academy — structured learning, not reference docs.
Lessons are sequenced.
Adapt commands to the human's actual environment (OS, package manager, shell, editor) — detect from project context or ask, don't assume.
The lesson shows one path; if the human's project diverges, adapt concepts to their setup.
Preserve the learning goal over literal steps.
Quizzes are pedagogical — engage, don't spoil.
Quiz answers are included for your reference.
</agent-instructions>

# AI Gateway

**Prevent runaway AI costs with pre-flight checks and token budgeting**

**After this lesson, you'll:**

- Reject requests costing >$0.10 BEFORE making them (watch logs block expensive queries)
- See exact cost estimates: "Request would cost $0.1234" in logs
- Switch to cheaper models automatically based on your budget thresholds

```bash
# Cost-aware request handling in logs:
[INFO] Pre-flight cost check {
  inputTokens: 2341,
  estimatedOutputTokens: 500,
  estimatedCost: 0.0007,
  model: 'gpt-4o-mini',
  status: 'approved'
}

[INFO] AI request completed {
  model: 'gpt-4o-mini',
  actualCost: 0.0006436,
  estimatedCost: 0.0007,
  estimationAccuracy: '92%'
}

# When requests are too expensive:
[WARN] Request rejected - cost limit exceeded {
  estimatedCost: 0.1234,
  limit: 0.10,
  inputTokens: 15234,
  message: 'Request too large - break into smaller questions'
}
```

**What you'll build:** Pre-flight cost checks + token estimation + automatic model switching + manual spend review.

Your bot crashes at 3 AM with "rate limit exceeded." By morning, you've burned through $500 in retries because nobody was watching. The solution isn't magic metrics—it's **preventing expensive requests before they happen**. This lesson teaches you to build guardrails that save money while you sleep.

## Outcome

Implement cost controls that prevent expensive AI requests before they burn money, using token estimation and pre-flight checks.

## Understanding AI Gateway

AI Gateway provides:

- A unified API endpoint routing to 100+ models across providers
- Automatic provider fallback when primary fails
- Request/response passthrough with no markup on BYOK pricing
- A dashboard showing: Requests by Model, TTFT, Token Counts, Spend (manual viewing only)

**What this means for production:**
You can't query Gateway metrics from code. But you CAN control costs by estimating token counts and rejecting expensive requests BEFORE making them. The AI SDK's `usage` field gives you actual costs after each request—store those yourself if you need historical tracking.

**The production pattern:** Pre-flight checks + your own tracking > hoping for magic metrics.

## Fast Track

1. Build token estimation for pre-flight cost checks
2. Reject requests exceeding cost thresholds BEFORE making them
3. Log actual vs estimated costs to tune your estimates
4. Review Gateway dashboard manually for spend trends

## Hands-On Exercise 4.5

Implement production cost controls without relying on non-existent metrics APIs:

**Requirements:**

1. **Token Estimation** for pre-flight cost checks:
   - Estimate input tokens from message array
   - Assume conservative output token count
   - Calculate cost based on model pricing

2. **Request Rejection** based on estimated cost:
   - Set per-request cost limit (e.g., $0.10)
   - Reject before making expensive API calls
   - Return user-friendly error messages

3. **Model Switching** based on manual budget thresholds:
   - Environment variable for "cheap mode" toggle
   - Use gpt-3.5-turbo when over budget
   - Log when switching occurs

4. **Cost Logging** for post-request analysis:
   - Log estimated vs actual costs
   - Track accuracy of estimates over time
   - Use for manual dashboard correlation

**Implementation hints:**

- Open the AI settings via team selector: `https://vercel.com/d?to=/[team]/~/ai/api-keys`, then navigate to the AI Gateway section for your project
- Gateway dashboard shows: Requests by Model, TTFT, Token Counts, Spend (no API access)
- January 2025 pricing: gpt-4o-mini $0.00015 input, $0.0006 output per 1K tokens
- AI SDK returns `usage` field with actual token counts after request completes

## Try It

1. **Test normal request with cost logging:**
   - Ask the bot a simple question in a short thread
   - Check logs for pre-flight and post-request cost tracking:
   ```
   [INFO] Pre-flight cost check {
     inputTokens: 2341,
     estimatedOutputTokens: 500,
     estimatedCost: 0.0007,
     model: 'gpt-4o-mini',
     status: 'approved'
   }

   [INFO] AI request completed {
     model: 'gpt-4o-mini',
     usage: {
       promptTokens: 2341,
       completionTokens: 487,
       totalTokens: 2828
     },
     actualCost: 0.0006436,
     estimatedCost: 0.0007,
     estimationAccuracy: '92%'
   }
   ```

2. **Test cost rejection with large context:**
   - Create a thread with 50+ messages
   - Ask a question that would include all context
   - Watch the pre-flight check reject it BEFORE making the request:
   ```
   [WARN] Request rejected - cost limit exceeded {
     estimatedCost: 0.1234,
     limit: 0.10,
     inputTokens: 15234,
     estimatedOutputTokens: 500,
     model: 'gpt-4o-mini'
   }
   ```
   - Bot responds: "Your request is too large. Please break it into smaller questions. (Estimated cost: $0.1234)"

3. **Test model switching with budget threshold:**
   - Set `FORCE_CHEAP_MODEL=true` in `.env` to simulate over-budget state
   - Ask a question
   - Watch logs show gpt-3.5-turbo selection:
   ```
   [WARN] Budget threshold triggered - using cheap model {
     reason: 'FORCE_CHEAP_MODEL environment variable set',
     selectedModel: 'gpt-3.5-turbo',
     normalModel: 'gpt-4o-mini'
   }
   ```

4. **Review Gateway dashboard manually:**
   - Navigate to `https://vercel.com/[team]/[project]/ai-gateway`
   - Check "Requests by Model" chart
   - Review "Spend" over time
   - Compare dashboard spend with your logged costs
   - Note: You're viewing this manually—no API access exists

**Note: Gateway Dashboard Access**

The Gateway dashboard is read-only and manual. You can't query these metrics from code, but you can review them periodically to:

- Verify your cost estimates are accurate
- Spot unusual spending patterns
- Compare different models' actual costs
- Track TTFT trends over time
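One way to act on that periodic review: turn the (estimated, actual) cost pairs from your logs into a correction factor for the 4-characters-per-token heuristic. This is a sketch — the helper name and the sample pairs are hypothetical, but the math is just the mean of actual/estimated ratios:

```typescript
// Given (estimatedCost, actualCost) pairs pulled from your logs,
// suggest a multiplier for tuning the characters-per-token heuristic.
function estimateCorrectionFactor(pairs: Array<[number, number]>): number {
  const ratios = pairs.map(([estimated, actual]) => actual / estimated);
  return ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
}

// Example: estimates running consistently high → factor below 1.
const factor = estimateCorrectionFactor([
  [0.0007, 0.00064],
  [0.0012, 0.00112],
]);
console.log(factor.toFixed(2));
```

If the factor drifts far from 1.0 over a week of logs, adjust your token estimate (or switch to a real tokenizer) rather than widening the cost limit.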

## Commit

```bash
git add -A
git commit -m "feat(cost-control): add pre-flight checks and token budgeting for AI requests"
```

## Done-When

- [ ] Pre-flight cost checks reject expensive requests before making them
- [ ] Token estimation calculates cost from message array
- [ ] Model switching based on budget threshold (environment variable)
- [ ] Actual vs estimated costs logged for accuracy tracking
- [ ] User-friendly error messages when requests rejected

## Solution

Create `/slack-agent/server/lib/ai/cost-control.ts`:

```typescript title="/slack-agent/server/lib/ai/cost-control.ts"
import type { ModelMessage } from "ai";

/**
 * Estimate tokens for messages array using rough character-to-token ratio
 * Use this for pre-flight cost estimation
 *
 * Note: This is an approximation. Actual tokens depend on tokenizer.
 * Rule of thumb: 1 token ≈ 4 characters for English text.
 * In production, use a real tokenizer like OpenAI's tiktoken for more accurate estimates:
 * https://github.com/openai/tiktoken
 */
export function estimateTokens(messages: ModelMessage[]): number {
  return messages.reduce((sum, msg) => {
    const content = typeof msg.content === 'string'
      ? msg.content
      : JSON.stringify(msg.content);
    // Rough estimate: 1 token per 4 characters
    return sum + Math.ceil(content.length / 4);
  }, 0);
}

/**
 * Calculate request cost before making it
 * Use for budget decisions and request rejection
 *
 * January 2025 pricing (per 1K tokens):
 * - gpt-4o-mini: $0.00015 input, $0.0006 output
 * - gpt-3.5-turbo: $0.0005 input, $0.0015 output
 * - gpt-4o: $0.0025 input, $0.01 output
 */
export function estimateRequestCost(
  inputTokens: number,
  outputTokens: number,
  model: string
): number {
  const costs: Record<string, { input: number; output: number }> = {
    'gpt-4o-mini': { input: 0.00015, output: 0.0006 },
    'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
    'gpt-4o': { input: 0.0025, output: 0.01 },
  };

  const modelCost = costs[model] || costs['gpt-3.5-turbo'];
  return (
    (inputTokens / 1000) * modelCost.input +
    (outputTokens / 1000) * modelCost.output
  );
}

/**
 * Calculate actual cost from AI SDK usage response
 * Use to compare estimated vs actual and tune your estimates
 */
export function calculateActualCost(
  promptTokens: number,
  completionTokens: number,
  model: string
): number {
  return estimateRequestCost(promptTokens, completionTokens, model);
}

/**
 * Pre-flight check: Should we reject this request due to cost?
 * Reject obviously expensive requests BEFORE they burn money
 */
export function shouldRejectRequest(estimatedCost: number): {
  reject: boolean;
  reason?: string;
} {
  const MAX_REQUEST_COST = 0.10; // $0.10 per request cap

  if (estimatedCost > MAX_REQUEST_COST) {
    return {
      reject: true,
      reason: `Request would cost $${estimatedCost.toFixed(4)}, exceeds $${MAX_REQUEST_COST} limit`
    };
  }

  return { reject: false };
}

/**
 * Check if we should use cheap mode based on environment variable
 * In production, set this based on your own budget tracking
 */
export function shouldUseCheapModel(): boolean {
  return process.env.FORCE_CHEAP_MODEL === 'true';
}
```
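Before wiring these helpers into the bot, you can sanity-check the math standalone. This sketch re-declares simplified inline versions (`estimateTokens` over plain strings instead of `ModelMessage[]`) so it runs with `npx tsx`, using the January 2025 gpt-4o-mini pricing from the comments above:

```typescript
// Simplified inline copies of the two helpers, for a standalone check.
function estimateTokens(texts: string[]): number {
  // Rough estimate: 1 token per 4 characters
  return texts.reduce((sum, t) => sum + Math.ceil(t.length / 4), 0);
}

function estimateRequestCost(inputTokens: number, outputTokens: number): number {
  // gpt-4o-mini: $0.00015 input, $0.0006 output per 1K tokens
  return (inputTokens / 1000) * 0.00015 + (outputTokens / 1000) * 0.0006;
}

const inputTokens = estimateTokens(["What does our deploy script do?"]);
const cost = estimateRequestCost(inputTokens, 500);
console.log({ inputTokens, cost: cost.toFixed(6) });
```

Note that for short questions the conservative 500-token output estimate dominates the cost — input tokens only matter once threads grow long.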

Update `/slack-agent/server/lib/ai/respond-to-message.ts`:

```typescript title="/slack-agent/server/lib/ai/respond-to-message.ts" {4-9,21-55}
import type { KnownEventFromType } from "@slack/bolt";
import { generateText, type ModelMessage, stepCountIs } from "ai";
import { app } from "~/app";
import {
  calculateActualCost,
  estimateRequestCost,
  estimateTokens,
  shouldRejectRequest,
  shouldUseCheapModel,
} from "./cost-control";
import { simulateFailure, withRetry } from "./retry-wrapper";
// ... rest of imports ...

export const respondToMessage = async ({
  messages,
  event,
  channel,
  thread_ts,
  botId,
  correlation,
}: RespondToMessageOptions) => {
  // Pre-flight cost estimation - catch expensive requests BEFORE they cost money
  const inputTokens = estimateTokens(messages);
  const estimatedOutputTokens = 500; // Conservative estimate for response
  const primaryModel = "gpt-4o-mini";
  const estimatedCost = estimateRequestCost(
    inputTokens,
    estimatedOutputTokens,
    primaryModel
  );

  app.logger.info("Pre-flight cost check", {
    ...correlation,
    inputTokens,
    estimatedOutputTokens,
    estimatedCost,
    model: primaryModel,
    status: estimatedCost > 0.10 ? "rejected" : "approved",
  });

  // Reject expensive requests BEFORE making them
  const rejectCheck = shouldRejectRequest(estimatedCost);
  if (rejectCheck.reject) {
    app.logger.warn("Request rejected - cost limit exceeded", {
      ...correlation,
      estimatedCost,
      limit: 0.1,
      inputTokens,
      estimatedOutputTokens,
      model: primaryModel,
    });

    // Return user-friendly error instead of making expensive request
    return `Your request is too large. Please break it into smaller questions or provide less context. (Estimated cost: $${estimatedCost.toFixed(4)})`;
  }

  // Model selection with budget awareness
  // In production, set FORCE_CHEAP_MODEL based on your own budget tracking
  const useCheapMode = shouldUseCheapModel();
  const models = useCheapMode
    ? ["gpt-3.5-turbo"] // Force cheap model only
    : ["gpt-4o-mini", "gpt-3.5-turbo"];

  if (useCheapMode) {
    app.logger.warn("Budget threshold triggered - using cheap model", {
      ...correlation,
      reason: "FORCE_CHEAP_MODEL environment variable set",
      selectedModel: "gpt-3.5-turbo",
      normalModel: "gpt-4o-mini",
    });
  }

  let lastError: unknown;

  for (const currentModel of models) {
    try {
      const result = await withRetry(
        async () => {
          app.logger.info("Attempting AI request", {
            ...correlation,
            model: currentModel,
            inputTokens,
            estimatedCost,
          });

          return await generateText({
            model: currentModel,
            system: `You are Slack Agent, a helpful assistant in Slack.
            // ... rest of system prompt ...
            `,
            messages,
            // ... rest of config ...
          });
        },
        {
          maxRetries: 3,
          initialDelayMs: 1000,
        }
      );

      // Log actual cost after request completes
      const actualCost = calculateActualCost(
        result.usage.promptTokens,
        result.usage.completionTokens,
        currentModel
      );

      const estimationAccuracy = ((actualCost / estimatedCost) * 100).toFixed(0);

      app.logger.info("AI request completed", {
        ...correlation,
        model: currentModel,
        usage: {
          promptTokens: result.usage.promptTokens,
          completionTokens: result.usage.completionTokens,
          totalTokens: result.usage.totalTokens,
        },
        actualCost,
        estimatedCost,
        estimationAccuracy: `${estimationAccuracy}%`,
      });

      return result.text;
    } catch (error) {
      // ... existing error handling ...
      lastError = error;
    }
  }

  throw lastError;
};
```

**Screenshot placeholders:**

[TODO: Add screenshot of Gateway dashboard showing "Requests by Model" chart]
[TODO: Add screenshot of Gateway dashboard showing "Spend" over time]
[TODO: Add screenshot of Gateway dashboard showing "TTFT" metrics]

**Note: Why No /gateway-status Command?**

Earlier versions of AI Gateway documentation suggested programmatic metrics access. As of January 2025, Gateway provides a **manual dashboard only**—no API exists to query metrics from your code.

Building a `/gateway-status` command that shows real-time metrics would require:

1. Storing `usage` data from every AI SDK response in your database
2. Aggregating that data yourself
3. Calculating costs based on stored token counts

This is a valid production pattern, but it's **your own tracking system**, not Gateway API integration. The lesson focuses on the more immediate value: preventing expensive requests before they happen.

**Side Quest: Build Your Own Usage Tracking System**
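If you take this on, here's a starting shape: an in-memory aggregator with the interface you'd eventually back with a database table. All names here are hypothetical — in production, `record` would insert a row and `spendSince` would be a SQL aggregate over it:

```typescript
// Hypothetical in-memory usage store — swap for a database table in production.
type UsageRecord = {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  at: number; // epoch ms
};

class UsageTracker {
  private records: UsageRecord[] = [];

  // Call this with the AI SDK `usage` data after each completed request.
  record(r: Omit<UsageRecord, "at">) {
    this.records.push({ ...r, at: Date.now() });
  }

  // Total spend since a given timestamp, optionally filtered by model.
  spendSince(since: number, model?: string): number {
    return this.records
      .filter((r) => r.at >= since && (!model || r.model === model))
      .reduce((sum, r) => sum + r.cost, 0);
  }
}

const tracker = new UsageTracker();
tracker.record({ model: "gpt-4o-mini", inputTokens: 2341, outputTokens: 487, cost: 0.0006436 });
tracker.record({ model: "gpt-4o-mini", inputTokens: 1200, outputTokens: 300, cost: 0.00036 });
console.log(tracker.spendSince(0).toFixed(6));
```

With spend queryable, `shouldUseCheapModel()` can compare `spendSince(startOfDay)` against a daily budget instead of reading an environment variable — which is exactly the "your own budget tracking" the solution comments point at.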

## Building on Previous Lessons

This lesson applies cost awareness to everything we've built:

- **From [Repository Flyover](./repository-flyover)**: You saw how the bot fetches context—now you'll optimize token usage for those contexts
- **From [system prompts](./system-prompts-shape-behavior)**: System prompts stay the same across budget-aware model selection
- **From [status communication](./status-communication)**: Status updates can inform users when using cheaper models
- **From [error handling](./error-handling-and-resilience)**: Retry logic combined with cost checks prevents retry storms burning money
- **From [Bolt Middleware](./bolt-nitro-middleware-and-logging)**: Correlation-style logging makes pre-flight and post-request cost logs queryable in production
- **Production reality**: You can't query Gateway metrics, but you CAN prevent expensive requests—the more valuable pattern

## What's Next

Section 5 covers deployment to Vercel and production operations—taking your cost-controlled bot live in [Deploy to Vercel](./deploy-to-vercel).


---

[Full course index](/academy/llms.txt) · [Sitemap](/academy/sitemap.md)
