Prelude

Building a first AI agent the hard way is a rite of passage. Raw API calls, manual conversation state, hand-rolled tool execution loops, retry logic scattered across three files. It works, in the same way that a house built without blueprints works. It stands up. It also leaks.

When Anthropic released the Agent SDK, that same agent was rewritten in an afternoon. Not because the SDK is magic, but because it handles the parts that every agent needs (the agentic loop, tool execution, conversation management) and lets you focus on the parts that make your agent unique.

This guide builds a complete agent from scratch. Not a weather-checking demo. A project management agent that creates tasks, checks status, assigns work, sends notifications, and handles the edge cases that real applications encounter. By the end, you will have a working agent, an understanding of the SDK's architecture, and patterns you can apply to any agent you build.

The Problem

Every agent shares the same core challenge. You need to give an LLM the ability to take actions in the real world, while keeping control over what those actions are, how they execute, and what happens when they fail.

The raw approach means writing the agentic loop yourself. Send a message to the API. Check if the response contains tool calls. If it does, parse them, execute them, send the results back, and check again. Handle errors at each step. Manage conversation history. Track token usage. Implement timeouts. Add logging.

That loop is not complex in principle. It is tedious in practice. And the tedium creates bugs. A missed error handler. A conversation history that grows without bound. A tool call parser that breaks on edge cases.
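To make the tedium concrete, here is a condensed sketch of the shape of a hand-rolled loop. The message format mirrors the Anthropic Messages API, but the API call and tool execution are passed in as plain functions (send_message and execute_tool are hypothetical stand-ins) so the loop's structure is visible on its own.

```python
def run_agent_loop(send_message, execute_tool, user_message, max_turns=10):
    """Drive the loop by hand: send, check for tool calls, execute, repeat."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = send_message(messages)  # one API round trip
        messages.append({"role": "assistant", "content": response["content"]})
        tool_calls = [b for b in response["content"] if b["type"] == "tool_use"]
        if not tool_calls:  # no tool calls means a final text answer
            return next(b["text"] for b in response["content"]
                        if b["type"] == "text")
        results = []
        for call in tool_calls:  # execute each tool and collect results
            try:
                output = execute_tool(call["name"], call["input"])
            except Exception as exc:  # errors must go back to the model too
                output = f"Tool error: {exc}"
            results.append({
                "type": "tool_result",
                "tool_use_id": call["id"],
                "content": output,
            })
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Agent did not finish within max_turns")
```

Every branch here is a place for a bug to hide, and this sketch omits retries, logging, and token accounting entirely.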

The Agent SDK eliminates the tedium. It gives you a tested, maintained implementation of the agentic loop, and it lets you define the interesting parts (the tools, the instructions, the guardrails) as simple Python functions and configuration.

Here is how.

The Journey: Build Your AI Agent Step by Step

What We Are Building

The agent we are building is called TaskBot. It manages a simple project board with these capabilities.

It can create tasks with a title, description, priority, and assignee. It can list tasks filtered by status or assignee. It can update task status. It can send notifications when tasks change. And it can provide summaries of project progress.

This is a realistic use case. It touches databases, external APIs, and business logic. It requires multi-turn conversations where the user asks follow-up questions. And it needs guardrails to prevent misuse.

The full code is at the end of this guide. We will build it piece by piece so you understand each decision.

Setting Up the Project

Start by creating a project directory and installing the SDK.

mkdir taskbot && cd taskbot
python -m venv venv
source venv/bin/activate
pip install openai-agents anthropic

The Agent SDK is distributed as the openai-agents package, which supports multiple model providers including Claude through the Anthropic integration. You will also need an Anthropic API key.

export ANTHROPIC_API_KEY="your-key-here"

Create the project structure.

taskbot/
  agent.py          # Agent definition
  tools.py          # Tool functions
  guardrails.py     # Input/output validation
  models.py         # Data models
  database.py       # Storage layer
  main.py           # Entry point

This structure separates concerns. Tools are pure functions that interact with the database. The agent definition is configuration. Guardrails are validation logic. The entry point ties everything together.

Defining the Data Models

Before writing tools, define what you are working with. This example uses Python dataclasses for simplicity, but Pydantic models work well here too.

# models.py
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class TaskStatus(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    REVIEW = "review"
    DONE = "done"

class Priority(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class Task:
    id: str
    title: str
    description: str
    status: TaskStatus = TaskStatus.TODO
    priority: Priority = Priority.MEDIUM
    assignee: Optional[str] = None
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)

Building a Simple Storage Layer

For this tutorial, an in-memory database keeps things simple. In production, you would swap this for PostgreSQL, SQLite, or whatever your application uses. The agent does not care about the storage implementation. It only interacts with tools.

# database.py
from datetime import datetime
from models import Task, TaskStatus
from typing import Optional
import uuid

_tasks: dict[str, Task] = {}

def create_task(title: str, description: str,
                priority: str = "medium",
                assignee: Optional[str] = None) -> Task:
    task_id = str(uuid.uuid4())[:8]
    task = Task(
        id=task_id,
        title=title,
        description=description,
        priority=priority,
        assignee=assignee
    )
    _tasks[task_id] = task
    return task

def get_task(task_id: str) -> Optional[Task]:
    return _tasks.get(task_id)

def list_tasks(status: Optional[str] = None,
               assignee: Optional[str] = None) -> list[Task]:
    tasks = list(_tasks.values())
    if status:
        tasks = [t for t in tasks if t.status.value == status]
    if assignee:
        tasks = [t for t in tasks if t.assignee == assignee]
    return tasks

def update_task_status(task_id: str, new_status: str) -> Optional[Task]:
    task = _tasks.get(task_id)
    if task:
        task.status = TaskStatus(new_status)
        task.updated_at = datetime.now()
    return task

Defining Tools

Tools are where the agent meets the real world. Each tool is a Python function decorated with @function_tool. The function's name becomes the tool's name. The docstring becomes the tool's description, which Claude reads to understand when and how to use the tool. Type hints become the parameter schema.

# tools.py
from agents import function_tool
from database import create_task, get_task, list_tasks, update_task_status
from typing import Optional

@function_tool
def create_new_task(title: str, description: str,
                    priority: str = "medium",
                    assignee: Optional[str] = None) -> str:
    """Create a new task on the project board.

    Args:
        title: Short title for the task
        description: Detailed description of what needs to be done
        priority: One of 'low', 'medium', 'high', or 'critical'
        assignee: Name of the person to assign the task to
    """
    task = create_task(title, description, priority, assignee)
    return (
        f"Created task {task.id}: '{task.title}' "
        f"(priority: {priority}, assignee: {assignee or 'unassigned'})"
    )

@function_tool
def list_project_tasks(status: Optional[str] = None,
                       assignee: Optional[str] = None) -> str:
    """List tasks on the project board, optionally filtered.

    Args:
        status: Filter by status ('todo', 'in_progress', 'review', 'done')
        assignee: Filter by assignee name
    """
    tasks = list_tasks(status, assignee)
    if not tasks:
        return "No tasks found matching the criteria."

    lines = []
    for t in tasks:
        lines.append(
            f"[{t.id}] {t.title} | "
            f"Status: {t.status.value} | "
            f"Priority: {t.priority} | "
            f"Assignee: {t.assignee or 'unassigned'}"
        )
    return "\n".join(lines)

@function_tool
def update_status(task_id: str, new_status: str) -> str:
    """Update the status of an existing task.

    Args:
        task_id: The ID of the task to update
        new_status: New status ('todo', 'in_progress', 'review', 'done')
    """
    task = update_task_status(task_id, new_status)
    if task:
        return f"Updated task {task_id} to status '{new_status}'."
    return f"Task {task_id} not found."

@function_tool
def get_task_details(task_id: str) -> str:
    """Get full details of a specific task.

    Args:
        task_id: The ID of the task to look up
    """
    task = get_task(task_id)
    if not task:
        return f"Task {task_id} not found."

    return (
        f"Task {task.id}\n"
        f"Title: {task.title}\n"
        f"Description: {task.description}\n"
        f"Status: {task.status.value}\n"
        f"Priority: {task.priority}\n"
        f"Assignee: {task.assignee or 'unassigned'}\n"
        f"Created: {task.created_at.isoformat()}\n"
        f"Updated: {task.updated_at.isoformat()}"
    )

@function_tool
def send_notification(recipient: str, message: str) -> str:
    """Send a notification to a team member.

    Args:
        recipient: Name of the person to notify
        message: The notification message
    """
    # In production, this would call an email/Slack/Teams API
    print(f"[NOTIFICATION to {recipient}]: {message}")
    return f"Notification sent to {recipient}."

Notice that each tool returns a string. The SDK sends this string back to Claude as the tool result. Claude uses it to formulate its response. Clear, informative return values make Claude's responses better.

The docstrings matter enormously. Claude reads them to decide which tool to use and how to call it. A vague docstring produces vague tool usage. A precise docstring with documented parameters produces precise tool calls.
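As an illustration (hypothetical set_status variants, not part of TaskBot), compare a vague docstring with a precise one:

```python
# Vague: Claude has to guess what values new_status accepts.
def set_status(task_id: str, new_status: str) -> str:
    """Update a task."""
    ...


# Precise: the valid values are spelled out, so Claude can pass them exactly.
def set_status_precise(task_id: str, new_status: str) -> str:
    """Update the status of an existing task.

    Args:
        task_id: The ID of the task to update
        new_status: One of 'todo', 'in_progress', 'review', 'done'
    """
    ...
```

With the first version, Claude might invent a status like 'completed'; with the second, it has the exact vocabulary to use.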

Creating the Agent

With tools defined, the agent itself is straightforward.

# agent.py
from agents import Agent
from tools import (
    create_new_task,
    list_project_tasks,
    update_status,
    get_task_details,
    send_notification
)

taskbot = Agent(
    name="TaskBot",
    model="claude-sonnet-4-6",
    instructions="""You are TaskBot, a project management assistant.
    You help teams manage their tasks and stay organised.

    Guidelines:
    - When creating tasks, always confirm the details with the user first
    - Use 'medium' priority unless the user specifies otherwise
    - When updating task status, notify the assignee
    - Provide concise summaries when listing tasks
    - If a task ID is not found, suggest listing all tasks
    - Always be helpful but never create tasks without explicit user request""",
    tools=[
        create_new_task,
        list_project_tasks,
        update_status,
        get_task_details,
        send_notification
    ]
)

The instructions field is your system prompt. It shapes how the agent behaves across all conversations. Write it like you are briefing a new team member. Be specific about what the agent should and should not do.

The model field determines which Claude model powers the agent. Use claude-sonnet-4-6 for a good balance of speed and capability. Use claude-opus-4-6 for tasks that require deeper reasoning. Use claude-haiku-4-6 for simple, high-volume tasks where speed matters most.

Understanding the Agentic Loop

When you call Runner.run_sync(taskbot, "Create a task for the API migration"), this is what happens internally.

First, the SDK sends your message to Claude along with the system instructions and the tool definitions. Claude receives everything it needs to understand who it is, what it can do, and what the user wants.

Second, Claude responds. If it decides to call a tool, the response contains a tool use block with the tool name and parameters. If it decides to respond directly, the response contains text.

Third, if there was a tool call, the SDK executes your tool function with the provided parameters. It captures the return value as a string.

Fourth, the SDK sends the tool result back to Claude. Claude now knows what happened and can decide to call another tool, ask a follow-up question, or provide a final response.

This loop continues until Claude produces a response with no tool calls. That response becomes the final_output of the run.

The key insight is that you never write this loop. The SDK handles it. Your job is to define the tools and instructions that shape what happens inside the loop.

Multi-Turn Conversations

A single Runner.run_sync() call handles one turn of conversation. For multi-turn interactions where the user asks follow-up questions, you need to maintain the conversation history.

# main.py
from agents import Runner
from agent import taskbot

async def chat():
    history = []

    while True:
        user_input = input("\nYou: ")
        if user_input.lower() in ("quit", "exit"):
            break

        # Add user message to history
        history.append({
            "role": "user",
            "content": user_input
        })

        result = await Runner.run(
            taskbot,
            history
        )

        # Add assistant response to history
        history.append({
            "role": "assistant",
            "content": result.final_output
        })

        print(f"\nTaskBot: {result.final_output}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(chat())

The conversation history is a list of message objects. Each message has a role (user or assistant) and content. The SDK sends the full history with each request, giving Claude the context of the entire conversation.

Be mindful of history length. Each message consumes tokens. For long-running sessions, you may need to summarise older messages or implement a sliding window that keeps only the most recent exchanges.
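A minimal sliding-window helper might look like this (a hypothetical trim_history function, not part of the SDK; it assumes the user/assistant message format shown above):

```python
def trim_history(history, max_messages=20):
    """Keep only the most recent messages so token usage stays bounded."""
    if len(history) <= max_messages:
        return history
    trimmed = history[-max_messages:]
    # Drop leading assistant messages so the window starts on a user turn,
    # preserving the alternation the API expects.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

Call it on the history before each run to cap the context you resend.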

Adding Guardrails

Guardrails are functions that validate inputs before the agent processes them and outputs before the agent returns them. They are your safety layer.

# guardrails.py
from agents import GuardrailFunctionOutput

BLOCKED_TERMS = [
    "delete all", "drop table", "remove everything",
    "fire everyone", "terminate all"
]

async def validate_input(ctx, agent, user_input):
    """Block potentially destructive or harmful requests."""
    lower_input = user_input.lower() if isinstance(user_input, str) else ""
    triggered = any(term in lower_input for term in BLOCKED_TERMS)

    return GuardrailFunctionOutput(
        output_info={
            "blocked_term_found": triggered,
            "input_length": len(lower_input)
        },
        tripwire_triggered=triggered
    )

async def validate_output(ctx, agent, output):
    """Ensure the agent does not leak internal details."""
    lower_output = output.lower() if isinstance(output, str) else ""
    leaks_internals = any(
        term in lower_output
        for term in ["api_key", "database_url", "internal_secret"]
    )

    return GuardrailFunctionOutput(
        output_info={"leaks_internals": leaks_internals},
        tripwire_triggered=leaks_internals
    )

Now add the guardrails to the agent definition.

from guardrails import validate_input, validate_output
from agents import InputGuardrail, OutputGuardrail

taskbot = Agent(
    name="TaskBot",
    model="claude-sonnet-4-6",
    instructions="...",
    tools=[...],
    input_guardrails=[
        InputGuardrail(guardrail_function=validate_input)
    ],
    output_guardrails=[
        OutputGuardrail(guardrail_function=validate_output)
    ]
)

When a guardrail trips, the SDK raises an exception that you catch in your application code. The agent never sees the blocked input. The user gets a clear error message.

from agents.exceptions import InputGuardrailTripwireTriggered

try:
    result = await Runner.run(taskbot, user_input)
except InputGuardrailTripwireTriggered:
    print("That request was blocked by our safety policy.")

We recommend running guardrails on every agent in production. The overhead is minimal, typically a few milliseconds for pattern matching. The protection is significant. A guardrail that catches one destructive request has paid for itself permanently.

Error Handling

Tools fail. APIs time out. Databases go down. Your agent needs to handle these failures gracefully.

The simplest approach is to handle errors within the tool function itself.

@function_tool
def create_new_task(title: str, description: str,
                    priority: str = "medium",
                    assignee: Optional[str] = None) -> str:
    """Create a new task on the project board."""
    try:
        # Validate priority
        valid_priorities = ["low", "medium", "high", "critical"]
        if priority not in valid_priorities:
            return (
                f"Invalid priority '{priority}'. "
                f"Must be one of: {', '.join(valid_priorities)}"
            )

        task = create_task(title, description, priority, assignee)
        return (
            f"Created task {task.id}: '{task.title}' "
            f"(priority: {priority})"
        )
    except Exception as e:
        return f"Failed to create task: {str(e)}"

By returning error messages as strings rather than raising exceptions, you let Claude handle the failure conversationally. Claude sees "Failed to create task: database connection timeout" and can tell the user what happened, suggest a retry, or try an alternative approach.

For critical failures that should stop the agent entirely, raise an exception. The SDK will stop the agentic loop and propagate the error to your application code.

You should also bound the run at the runner level so an agent cannot loop indefinitely.

result = await Runner.run(
    taskbot,
    user_input,
    max_turns=10  # Stop after 10 tool call cycles
)

The max_turns parameter prevents infinite loops where the agent keeps calling tools without reaching a conclusion. Ten turns is a reasonable default for most agents. Increase it for agents that need to perform many sequential operations.

Agent Handoffs

Sometimes a single agent cannot handle everything. The Agent SDK supports handoffs, where one agent delegates to another for specialised tasks.

Imagine TaskBot needs to handle both project management and time tracking. Instead of cramming both into one agent, you create two specialised agents and let them hand off to each other.

from agents import Agent

time_tracker = Agent(
    name="TimeTracker",
    model="claude-sonnet-4-6",
    instructions="""You track time spent on tasks.
    You can log hours, view time reports, and calculate
    utilisation rates. For task management questions,
    hand off to the TaskBot agent.""",
    tools=[log_time, get_time_report, calculate_utilisation],
    handoffs=[]  # Will be set after taskbot is defined
)

taskbot = Agent(
    name="TaskBot",
    model="claude-sonnet-4-6",
    instructions="""You manage project tasks.
    For time tracking questions, hand off to the
    TimeTracker agent.""",
    tools=[
        create_new_task,
        list_project_tasks,
        update_status,
        get_task_details,
        send_notification
    ],
    handoffs=[time_tracker]
)

# Complete the circular reference
time_tracker.handoffs = [taskbot]

When a user asks TaskBot "How many hours did Sarah log this week?", TaskBot recognises this is a time tracking question and hands off to TimeTracker. TimeTracker handles the request with its specialised tools and returns the result.

This pattern keeps each agent focused. Focused agents are easier to test, easier to debug, and produce better results because their instructions and tools are not diluted by unrelated capabilities.

Observability and Monitoring

In production, you need to know what your agent is doing. The SDK provides hooks that let you observe every step of the agentic loop.

from agents import RunHooks
from datetime import datetime

class ProductionHooks(RunHooks):
    def __init__(self):
        self.tool_calls = []
        self.start_time = None
        self.total_tokens = 0

    async def on_agent_start(self, context, agent):
        self.start_time = datetime.now()
        print(f"[{self.start_time}] Agent '{agent.name}' started")

    async def on_tool_start(self, context, agent, tool):
        print(f"  [TOOL CALL] {tool.name}")

    async def on_tool_end(self, context, agent, tool, result):
        self.tool_calls.append({
            "tool": tool.name,
            "timestamp": datetime.now().isoformat(),
            "result_length": len(str(result))
        })
        print(f"  [TOOL DONE] {tool.name} ({len(str(result))} chars)")

    async def on_agent_end(self, context, agent, output):
        duration = (datetime.now() - self.start_time).total_seconds()
        print(f"[COMPLETE] {len(self.tool_calls)} tool calls "
              f"in {duration:.1f}s")

hooks = ProductionHooks()
hooks = ProductionHooks()
result = await Runner.run(
    taskbot,
    user_input,
    hooks=hooks
)

These hooks give you structured data about every agent execution. In production, send this data to a logging service for analysis. Common things to track include the number of tool calls per conversation, which tools are used most frequently, average execution time, error rates, and token consumption patterns.
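For example, the tool_calls list collected by ProductionHooks can be reduced to per-tool statistics before shipping it to your logging service (a hypothetical summarise_tool_calls helper):

```python
from collections import Counter


def summarise_tool_calls(tool_calls):
    """Reduce a list of {'tool', 'timestamp', 'result_length'} records
    to per-tool call counts and average result sizes."""
    counts = Counter(entry["tool"] for entry in tool_calls)
    totals = Counter()
    for entry in tool_calls:
        totals[entry["tool"]] += entry["result_length"]
    return {
        tool: {"calls": n, "avg_result_chars": totals[tool] / n}
        for tool, n in counts.items()
    }
```

A summary like this is far cheaper to store and query than the raw per-call records.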

If you are building agents that connect to external services through MCP servers, observability becomes even more important. MCP tool calls cross network boundaries, so you need to track latency and failures at each hop.

Production Patterns

Several patterns have proven essential in production agents.

Async execution. The SDK supports async natively. Use Runner.run() instead of Runner.run_sync() in production to avoid blocking your application's event loop.

result = await Runner.run(taskbot, user_input)

Rate limiting. If your agent handles multiple users, implement rate limiting to avoid API quota exhaustion.

import asyncio
from collections import defaultdict
from time import time

class RateLimiter:
    def __init__(self, max_requests_per_minute=20):
        self.max_rpm = max_requests_per_minute
        self.requests = defaultdict(list)

    async def check(self, user_id: str):
        now = time()
        # Clean old entries
        self.requests[user_id] = [
            t for t in self.requests[user_id]
            if now - t < 60
        ]
        if len(self.requests[user_id]) >= self.max_rpm:
            raise Exception("Rate limit exceeded. Please wait.")
        self.requests[user_id].append(now)

limiter = RateLimiter()

async def handle_request(user_id: str, message: str):
    await limiter.check(user_id)
    result = await Runner.run(taskbot, message)
    return result.final_output

Cost tracking. Every agent call consumes tokens. Track usage to avoid surprises on your bill.

class CostTracker(RunHooks):
    def __init__(self):
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    async def on_agent_end(self, context, agent, output):
        usage = getattr(context, 'usage', None)
        if usage:
            self.total_input_tokens += usage.input_tokens
            self.total_output_tokens += usage.output_tokens

        cost = (
            self.total_input_tokens * 0.003 / 1000 +
            self.total_output_tokens * 0.015 / 1000
        )
        print(f"Session cost so far: ${cost:.4f}")
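The arithmetic in CostTracker can be pulled into a small helper and sanity-checked. The per-1K rates here match the ones assumed above ($3 per million input tokens, $15 per million output); check current pricing before relying on them.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_per_1k=0.003, output_per_1k=0.015):
    """Estimated USD cost for one run at the assumed per-1K-token rates."""
    return (input_tokens * input_per_1k / 1000
            + output_tokens * output_per_1k / 1000)


# A run with 2,000 input and 500 output tokens:
# 2000 * 0.003 / 1000 + 500 * 0.015 / 1000 = 0.006 + 0.0075 = $0.0135
```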

Testing Your Agent

Testing agents requires two layers. Unit tests for individual tools and integration tests for the full agent loop.

Unit testing tools. Since tools are regular Python functions (wrapped with a decorator), you can test them directly.

# test_tools.py
import pytest
from database import create_task, list_tasks, _tasks
from models import TaskStatus

def setup_function():
    """Clear the database before each test."""
    _tasks.clear()

def test_create_task():
    task = create_task(
        "Fix login bug",
        "Users cannot log in with SSO",
        "high",
        "Sarah"
    )
    assert task.title == "Fix login bug"
    assert task.priority == "high"
    assert task.assignee == "Sarah"
    assert task.id is not None

def test_list_tasks_filter_by_status():
    create_task("Task 1", "Description", "medium")
    task2 = create_task("Task 2", "Description", "high")
    task2.status = TaskStatus.DONE

    todo_tasks = list_tasks(status="todo")
    assert len(todo_tasks) == 1
    assert todo_tasks[0].title == "Task 1"

def test_list_tasks_empty():
    tasks = list_tasks()
    assert tasks == []

Integration testing the agent. Use the SDK to run the agent against test inputs and verify the outputs.

# test_agent.py
import pytest
from agents import Runner
from agent import taskbot
from database import _tasks

@pytest.fixture(autouse=True)
def clear_db():
    _tasks.clear()
    yield
    _tasks.clear()

@pytest.mark.asyncio
async def test_agent_creates_task():
    result = await Runner.run(
        taskbot,
        "Create a high priority task called 'Deploy v2.0' "
        "about deploying the new version to production. "
        "No need to confirm, go ahead and create it."
    )
    assert len(_tasks) == 1
    task = list(_tasks.values())[0]
    assert "Deploy" in task.title

@pytest.mark.asyncio
async def test_agent_handles_unknown_task():
    result = await Runner.run(
        taskbot,
        "Show me details for task xyz123"
    )
    assert "not found" in result.final_output.lower()

@pytest.mark.asyncio
async def test_guardrail_blocks_destructive_input():
    from agents.exceptions import InputGuardrailTripwireTriggered

    with pytest.raises(InputGuardrailTripwireTriggered):
        await Runner.run(taskbot, "Delete all tasks immediately")

Integration tests are slower because they make API calls, and the async tests require the pytest-asyncio plugin. Run them in a separate test suite and use environment variables to point them at a test API key with lower rate limits.

The Complete Working Example

Here is the full TaskBot agent in a single file, ready to run.

#!/usr/bin/env python3
"""TaskBot - A project management agent built with the Claude Agent SDK."""

import asyncio
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

from agents import (
    Agent,
    Runner,
    function_tool,
    InputGuardrail,
    GuardrailFunctionOutput,
    RunHooks,
)

# --- Data Models ---

@dataclass
class Task:
    id: str
    title: str
    description: str
    status: str = "todo"
    priority: str = "medium"
    assignee: Optional[str] = None
    created_at: str = field(
        default_factory=lambda: datetime.now().isoformat()
    )

# --- Database ---

tasks_db: dict[str, Task] = {}

# --- Tools ---

@function_tool
def create_task(title: str, description: str,
                priority: str = "medium",
                assignee: Optional[str] = None) -> str:
    """Create a new task on the project board.

    Args:
        title: Short title for the task
        description: What needs to be done
        priority: low, medium, high, or critical
        assignee: Person to assign the task to
    """
    valid = ["low", "medium", "high", "critical"]
    if priority not in valid:
        return f"Invalid priority. Must be one of: {', '.join(valid)}"

    task_id = str(uuid.uuid4())[:8]
    task = Task(
        id=task_id, title=title, description=description,
        priority=priority, assignee=assignee
    )
    tasks_db[task_id] = task
    return (
        f"Created task {task_id}: '{title}' "
        f"(priority: {priority}, "
        f"assignee: {assignee or 'unassigned'})"
    )

@function_tool
def list_tasks(status: Optional[str] = None,
               assignee: Optional[str] = None) -> str:
    """List tasks, optionally filtered by status or assignee.

    Args:
        status: Filter by todo, in_progress, review, or done
        assignee: Filter by assignee name
    """
    filtered = list(tasks_db.values())
    if status:
        filtered = [t for t in filtered if t.status == status]
    if assignee:
        filtered = [t for t in filtered if t.assignee == assignee]

    if not filtered:
        return "No tasks found."

    lines = []
    for t in filtered:
        lines.append(
            f"[{t.id}] {t.title} | {t.status} | "
            f"{t.priority} | {t.assignee or 'unassigned'}"
        )
    return "\n".join(lines)

@function_tool
def update_task(task_id: str, new_status: str) -> str:
    """Update the status of a task.

    Args:
        task_id: The task ID
        new_status: New status (todo, in_progress, review, done)
    """
    valid = ["todo", "in_progress", "review", "done"]
    if new_status not in valid:
        return f"Invalid status. Must be one of: {', '.join(valid)}"

    task = tasks_db.get(task_id)
    if not task:
        return f"Task {task_id} not found."

    old_status = task.status
    task.status = new_status
    return (
        f"Updated task {task_id} from '{old_status}' "
        f"to '{new_status}'."
    )

@function_tool
def get_task(task_id: str) -> str:
    """Get full details of a task.

    Args:
        task_id: The task ID to look up
    """
    task = tasks_db.get(task_id)
    if not task:
        return f"Task {task_id} not found."

    return (
        f"Task: {task.id}\n"
        f"Title: {task.title}\n"
        f"Description: {task.description}\n"
        f"Status: {task.status}\n"
        f"Priority: {task.priority}\n"
        f"Assignee: {task.assignee or 'unassigned'}\n"
        f"Created: {task.created_at}"
    )

@function_tool
def notify(recipient: str, message: str) -> str:
    """Send a notification to a team member.

    Args:
        recipient: Person to notify
        message: Notification message
    """
    print(f"  [NOTIFY {recipient}]: {message}")
    return f"Notification sent to {recipient}."

# --- Guardrails ---

async def check_input(ctx, agent, user_input):
    blocked = ["delete all", "remove everything", "drop table"]
    text = user_input.lower() if isinstance(user_input, str) else ""
    triggered = any(term in text for term in blocked)
    return GuardrailFunctionOutput(
        output_info={"blocked": triggered},
        tripwire_triggered=triggered
    )

# --- Hooks ---

class AgentLogger(RunHooks):
    async def on_tool_start(self, context, agent, tool):
        print(f"  > Calling {tool.name}...")

    async def on_tool_end(self, context, agent, tool, result):
        preview = str(result)[:80]
        print(f"  < {tool.name} returned: {preview}")

# --- Agent ---

taskbot = Agent(
    name="TaskBot",
    model="claude-sonnet-4-6",
    instructions="""You are TaskBot, a project management assistant.

    Rules:
    - Confirm details before creating tasks
    - Default to medium priority unless told otherwise
    - Notify assignees when their tasks change status
    - Be concise and helpful
    - Never create tasks without an explicit request""",
    tools=[create_task, list_tasks, update_task, get_task, notify],
    input_guardrails=[
        InputGuardrail(guardrail_function=check_input)
    ]
)

# --- Main ---

async def main():
    print("TaskBot ready. Type 'quit' to exit.\n")
    history = []
    hooks = AgentLogger()

    while True:
        user_input = input("You: ")
        if user_input.lower() in ("quit", "exit"):
            break

        history.append({"role": "user", "content": user_input})

        try:
            result = await Runner.run(
                taskbot, history,
                hooks=hooks, max_turns=10
            )
            response = result.final_output
            history.append(
                {"role": "assistant", "content": response}
            )
            print(f"\nTaskBot: {response}\n")

        except Exception as e:
            print(f"\nError: {e}\n")

if __name__ == "__main__":
    asyncio.run(main())

Save this as taskbot.py, set your ANTHROPIC_API_KEY, and run it with python taskbot.py. You will have a working project management agent that creates tasks, tracks status, sends notifications, and blocks destructive inputs.

The Lesson

Building an agent is not about the library. It is about the decisions you make when using it. Which tools to expose. What instructions to write. Where to put guardrails. How to handle failures.

The Agent SDK handles the mechanical parts (the agentic loop, tool execution, conversation management) so you can focus on these decisions. Every hour previously spent debugging a hand-rolled loop is now an hour spent improving tools and instructions.

The patterns in this guide transfer to any agent you build. The tools will be different. The domain will be different. But the structure remains the same: clear tool definitions with precise docstrings, thoughtful instructions, guardrails at the boundaries, observability throughout.

For a comparison of how the Agent SDK stacks up against other frameworks, our guide on Claude Agent SDK vs LangChain covers that in detail. And to extend your agents with external services, the guide on MCP servers and extensions shows how to connect agents to the broader tool ecosystem.

Conclusion

This guide started by describing a first agent built the hard way with raw API calls and manual loops. TaskBot is the same kind of agent, but built in an afternoon instead of a week. It has guardrails, observability, error handling, and multi-turn support.

It is testable. It is maintainable. And the code is readable enough that a new team member can understand it without a walkthrough.

The Agent SDK is not doing anything you could not do yourself. It is doing what you would do yourself, but tested, maintained, and improved by the team that builds Claude. That is the value proposition. Not magic. Leverage.

Start with a single tool. Get it working. Add a second. Add guardrails. Add observability.

Each step is small. Each step makes your agent more capable and more reliable. And every agent you build after the first one is faster, because the patterns are the same.

Build something. Ship it. Watch how your users interact with it. Then improve it. That is the loop that matters more than any agentic loop in any SDK.