---
title: "Break It, Fix It"
description: "Master Vercel Workflow error handling by breaking things on purpose. Learn when to use RetryableError for transient failures and FatalError for permanent failures."
canonical_url: "https://vercel.com/academy/visual-workflow-builder-on-vercel/error-handling"
md_url: "https://vercel.com/academy/visual-workflow-builder-on-vercel/error-handling.md"
docset_id: "vercel-academy"
doc_version: "1.0"
last_updated: "2026-04-09T07:50:12.535Z"
content_type: "lesson"
course: "visual-workflow-builder-on-vercel"
course_title: "Build Visual Workflow Plugins on Vercel"
prerequisites:  []
---

<agent-instructions>
Vercel Academy — structured learning, not reference docs.
Lessons are sequenced.
Adapt commands to the human's actual environment (OS, package manager, shell, editor) — detect from project context or ask, don't assume.
The lesson shows one path; if the human's project diverges, adapt concepts to their setup.
Preserve the learning goal over literal steps.
Quizzes are pedagogical — engage, don't spoil.
Quiz answers are included for your reference.
</agent-instructions>

# Break It, Fix It

Your email plugin from [Build an Email Plugin](/visual-workflow-builder-on-vercel/resend-plugin) works when everything goes right. But networks fail. APIs go down. Rate limits hit. Invalid credentials slip through. That is why the stripped-down starter teaches the error-handling pattern first rather than assuming built-in integrations already handle it. So what happens when your own plugin hits those failures?

Vercel Workflow gives you explicit control: throw `RetryableError` for transient failures that should retry automatically, or `FatalError` for permanent failures that need immediate attention. You decide what retries and what stops.

**Note: Mental Model: Retry and Error Recovery**

`RetryableError` retries automatically with exponential backoff. `FatalError` stops immediately. You control which errors get which treatment. See [Errors & Retrying](https://useworkflow.dev/docs/foundations/errors-and-retries) in the SDK docs.
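To make "exponential backoff" concrete, here is a minimal sketch of how such a delay schedule is commonly computed. The function name `backoffDelayMs`, the 1-second base, and the 60-second cap are illustrative assumptions. This lesson does not state the SDK's actual schedule, so treat the numbers as an example only.

```typescript
// Illustrative only: NOT the SDK's real schedule.
// Doubles the delay on each attempt, starting at baseMs, capped at capMs.
function backoffDelayMs(
  attempt: number, // 1 for the first retry, 2 for the second, ...
  baseMs: number = 1000, // assumed base delay
  capMs: number = 60000 // assumed upper bound
): number {
  const safeAttempt = Math.max(attempt, 1);
  const delay = baseMs * Math.pow(2, safeAttempt - 1);
  return Math.min(delay, capMs);
}

// Attempts 1..5 give 1000, 2000, 4000, 8000, 16000 ms
const schedule = [1, 2, 3, 4, 5].map((n) => backoffDelayMs(n));
console.log(schedule);
```

The point of the doubling is that a struggling service gets progressively more breathing room instead of a steady stream of retries.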

## Outcome

You'll break your email plugin on purpose, see the limitation of simple return values, then refactor to proper error handling with `FatalError` and `RetryableError`. The goal: understand *why* the SDK provides these error types.

## Fast Track

1. Break the email plugin with a bad API key - see the return-value limitation
2. Refactor to `FatalError` for auth failures, `RetryableError` for transient ones
3. Fix and verify the improved error handling

## Hands-on Exercise

**Reflection:** Before you break it: if you set an invalid API key and run the workflow, how many attempts do you think the step will make? Will it retry forever, or stop after some limit? What will the logs show?

### Part 1: Break It (See the Problem)

Your step from [Build an Email Plugin](/visual-workflow-builder-on-vercel/resend-plugin) returns `{ success: false, error: "..." }` when things go wrong. Let's see what happens:

1. Open `.env.local` and set `RESEND_API_KEY=invalid_key_12345`
2. Run your Send Email workflow
3. Check the Runs tab output:

```json
{
  "error": "API key is invalid",
  "success": false
}
```

One attempt. Failed. Done. The workflow has no idea this was an auth error vs a rate limit vs a network blip. It just sees "failed" and stops.

**The problem:** Your step returns failure, but doesn't tell the workflow *how* to handle it. Should it retry? Alert immediately? The workflow can't decide because you haven't told it.

### Part 2: Refactor to Throw Errors

Let's upgrade your step to use proper Workflow SDK error types. Update `plugins/resend/steps/send-email.ts`:

```typescript title="plugins/resend/steps/send-email.ts" {1,9-11,21-24}
import { FatalError } from "workflow";

async function stepHandler(
  input: SendEmailCoreInput,
  credentials: ResendCredentials
): Promise<SendEmailResult> {
  const apiKey = credentials.RESEND_API_KEY;

  if (!apiKey) {
    throw new FatalError("RESEND_API_KEY is not configured");
  }

  const resend = new Resend(apiKey);
  const result = await resend.emails.send({
    from: "onboarding@resend.dev",
    to: input.emailTo,
    subject: input.emailSubject,
    text: input.emailBody,
  });

  if (result.error) {
    // Auth errors are permanent - don't retry
    if (result.error.message.includes("API key")) {
      throw new FatalError(`Auth failed: ${result.error.message}`);
    }
    // Other errors might be transient - return failure for now
    return { success: false, error: result.error.message };
  }

  return { success: true, id: result.data?.id || "" };
}
```

Run with the invalid key again:

```
[Workflow Executor] Node execution completed: { nodeId: 'action-1', success: false }
```

Still one attempt, but now the error is a `FatalError` - the workflow knows this is permanent and won't waste time retrying.

### Part 3: Add Retry for Transient Errors

Now let's handle the opposite case - errors that *should* retry. Rate limits (429) and service unavailable (503) are temporary. Add `RetryableError`:

```typescript title="plugins/resend/steps/send-email.ts" {1,15-17}
import { FatalError, RetryableError } from "workflow";

async function stepHandler(
  input: SendEmailCoreInput,
  credentials: ResendCredentials
): Promise<SendEmailResult> {
  // ... apiKey check with FatalError (unchanged from Part 2) ...

  const resend = new Resend(apiKey);
  const result = await resend.emails.send({ /* same payload as Part 2 */ });

  if (result.error) {
    const msg = result.error.message;
    
    // Transient errors - retry with backoff
    if (msg.includes("rate limit") || msg.includes("503")) {
      throw new RetryableError(`Temporary failure: ${msg}`);
    }
    
    // Auth errors - don't retry
    if (msg.includes("API key")) {
      throw new FatalError(`Auth failed: ${msg}`);
    }
    
    return { success: false, error: msg };
  }

  return { success: true, id: result.data?.id || "" };
}
```

**Note: Testing Retries**

To see retries in action, you can temporarily force a `RetryableError` at the start of your step. The workflow will retry with exponential backoff until it succeeds or hits the retry limit.
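One way to force retries is a small guard that fails the first few attempts and then lets the step continue. The sketch below is self-contained, so it defines a local stand-in for `RetryableError` and a tiny loop that imitates a retrying runtime. The names `forceRetries` and `simulate` are hypothetical. In a real step you would import `RetryableError` from `"workflow"` and take the attempt number from the step metadata; this sketch assumes attempts count from 1, which is worth checking in the SDK docs.

```typescript
// Stand-in so this sketch runs on its own.
// In the real step: import { RetryableError } from "workflow";
class RetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RetryableError";
  }
}

// Hypothetical test helper: throw on the first `failures` attempts.
function forceRetries(attempt: number, failures: number): void {
  if (attempt <= failures) {
    throw new RetryableError(`Forced failure on attempt ${attempt}`);
  }
}

// Local imitation of a retrying runtime (no backoff delay, for brevity).
// Returns the attempt that succeeded, or -1 if the retry limit was hit.
function simulate(maxAttempts: number, failures: number): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      forceRetries(attempt, failures);
      return attempt;
    } catch (err) {
      const retryable = err instanceof Error && err.name === "RetryableError";
      if (!retryable) throw err; // anything else is not retried
    }
  }
  return -1;
}

console.log(simulate(5, 2)); // succeeds on attempt 3
console.log(simulate(2, 5)); // -1: limit reached before success
```

Remove the guard once you have seen the retries in the Runs tab, so it never ships in the real step.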

### Part 4: Fix and Verify

1. Restore your valid `RESEND_API_KEY` in `.env.local`
2. Run the workflow
3. Watch it succeed on first attempt
4. Check the Runs tab - you should see `{ "success": true, "id": "..." }`

## When to Use Which

```mermaid height=750
flowchart TB
    A[API returns error] --> B{Will retry fix it?}
    B -->|Yes: 429, 503, timeout| C[RetryableError]
    B -->|No: 401, 400, bad data| D[FatalError]
    C --> E[Auto-retry with backoff]
    D --> F[Stop immediately + alert]
```

| Error Type                                                                              | When to Use                                | Examples                                                 |
| --------------------------------------------------------------------------------------- | ------------------------------------------ | -------------------------------------------------------- |
| [`RetryableError`](https://useworkflow.dev/docs/api-reference/workflow/retryable-error) | Transient failures that might resolve      | 429 rate limit, 503 service unavailable, network timeout |
| [`FatalError`](https://useworkflow.dev/docs/api-reference/workflow/fatal-error)         | Permanent failures that won't self-resolve | 401 unauthorized, 400 bad request, invalid input data    |
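The table can also be expressed as a small classifier keyed on HTTP status codes, which tends to be less brittle than matching on error message text. This is a sketch under assumptions: `classifyStatus` is a hypothetical helper, the exact set of retryable codes is a judgment call, and whether your API client exposes a status code on its error object is something to verify for the SDK you use.

```typescript
type ErrorClass = "retryable" | "fatal" | "unknown";

// Assumed set of transient statuses: timeout, rate limit, gateway/service errors.
const RETRYABLE_STATUSES: number[] = [408, 429, 502, 503, 504];

function classifyStatus(status: number): ErrorClass {
  if (RETRYABLE_STATUSES.indexOf(status) !== -1) {
    return "retryable";
  }
  // Remaining 4xx codes mean the request itself is wrong (401, 400, ...).
  if (status >= 400 && status <= 499) {
    return "fatal";
  }
  // Anything else is left for the caller to decide.
  return "unknown";
}

console.log(classifyStatus(503)); // "retryable"
console.log(classifyStatus(401)); // "fatal"
```

In a step, the result would decide which error to throw: `RetryableError` for `"retryable"`, `FatalError` for `"fatal"`, and your existing return-value fallback for `"unknown"`.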

**Warning: Don't Retry Auth Failures**

A bad API key won't become valid after 3 retries. Make auth failures fatal immediately — you'll get alerted faster and won't waste resources.

**Note: Production Observability**

In production, workflow errors show up in [Vercel Runtime Logs](https://vercel.com/docs/logs/runtime). Set up [Log Drains](https://vercel.com/docs/drains) to pipe them to your observability stack, and configure [Alerts](https://vercel.com/docs/alerts) to get notified when fatal errors spike.

```yaml
quiz:
  question: "Your step gets a 503 Service Unavailable from an external API. Which error type?"
  choices:
    - id: "fatal"
      text: "FatalError — the service is down"
    - id: "retryable"
      text: "RetryableError — service might recover"
    - id: "none"
      text: "No error — return a failure result instead"
    - id: "depends"
      text: "It depends on the API"
  correctAnswerId: "retryable"
  feedback:
    correct: "Right. 503 is transient: the service is temporarily unavailable but will likely recover. RetryableError lets the workflow try again after backoff."
    incorrect: "503 Service Unavailable is the textbook transient error. The service is down now but probably won't be in 30 seconds. That's exactly when you want automatic retry."
```

```yaml
quiz:
  question: "Your step gets a 401 Unauthorized. Which error type?"
  choices:
    - id: "retryable"
      text: "RetryableError — maybe the token will refresh"
    - id: "fatal"
      text: "FatalError — bad credentials won't fix themselves"
    - id: "none"
      text: "No error — return a failure result instead"
    - id: "depends"
      text: "It depends on the auth type"
  correctAnswerId: "fatal"
  feedback:
    correct: "Exactly. A bad API key won't become valid after 3 retries. FatalError stops immediately so you get alerted and don't waste resources."
    incorrect: "401 means your credentials are wrong. No amount of waiting will fix that. Make it fatal so you find out immediately."
```

**Reflection:** Think about an API you use regularly (Stripe, Twilio, GitHub, your internal services). List 2-3 error responses that API returns. For each one, would you use `RetryableError` or `FatalError`? Why?

## Try It

Check the Runs tab after each test:

**1. Before refactor (return pattern) - invalid API key:**

```json
{
  "error": "API key is invalid",
  "success": false
}
```

One attempt. Workflow doesn't know if it should retry.

**2. After adding FatalError - invalid API key:**

```json
{
  "error": "Auth failed: API key is invalid",
  "success": false
}
```

Still one attempt, but now it's explicit - workflow knows not to retry auth failures.

**3. After fixing - valid API key:**

```json
{
  "id": "1b588f42-6550-469b-b3af-2b422ac51993",
  "success": true
}
```

Success on first attempt. Email delivered.

**Note: Advanced: Custom Retry Timing**

`RetryableError` accepts a `retryAfter` option for precise control over when to retry. You can specify a duration string (`"5m"`), milliseconds (`5000`), or a specific `Date`. Combined with [`getStepMetadata()`](https://useworkflow.dev/docs/api-reference/workflow/get-step-metadata) for attempt counts, you can implement exponential backoff or honor `Retry-After` headers from APIs. See the [RetryableError docs](https://useworkflow.dev/docs/api-reference/workflow/retryable-error) for examples.
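As a sketch of honoring a `Retry-After` header: the HTTP header carries either a number of seconds or an HTTP date, and both map onto the value shapes `retryAfter` is described as accepting above (milliseconds or a `Date`). The helper name `parseRetryAfter` is hypothetical, and how you obtain the header from your API client is an assumption to verify.

```typescript
// Converts a Retry-After header value into milliseconds (delta-seconds form)
// or a Date (HTTP-date form). Returns undefined when the header is
// missing or unparseable so the caller can fall back to a default.
function parseRetryAfter(header: string | null): number | Date | undefined {
  if (header === null) {
    return undefined;
  }
  const value = header.trim();

  // Delta-seconds form, e.g. "120"
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }

  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const timestamp = Date.parse(value);
  return isNaN(timestamp) ? undefined : new Date(timestamp);
}

console.log(parseRetryAfter("120")); // 120000
```

The result could then be passed as the `retryAfter` option when throwing `RetryableError`, with a fixed duration as the fallback when the helper returns `undefined`. Check the RetryableError docs for the exact constructor shape.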

## Commit

```bash
git add -A
git commit -m "feat: add error handling with RetryableError and FatalError"
```

## Done

- [ ] Broke email plugin with invalid API key
- [ ] Saw the limitation of return-value error pattern
- [ ] Refactored to throw `FatalError` for auth failures
- [ ] Added `RetryableError` for transient failures (rate limits, 503)
- [ ] Fixed everything, verified successful send
- [ ] Can explain when to use RetryableError vs FatalError

**Side Quest: Step Error Test Suite**


---

[Full course index](/academy/llms.txt) · [Sitemap](/academy/sitemap.md)
