
MCP Server for Web Scraping: Give Your AI Agent Real-Time Web Data

AI agents are only as useful as the data they can access. Large language models like Claude and GPT-4 have vast knowledge baked in during training, but they can't access live web data on their own. That's where MCP (Model Context Protocol) comes in, and web scraping is one of the most valuable capabilities you can give your agent through it.

What is the Model Context Protocol?

MCP is an open protocol introduced by Anthropic that standardizes how AI models connect to external data sources and tools. Think of it as USB-C for AI: a universal connector that lets any AI client talk to any data source through a consistent interface.

An MCP server exposes tools that an AI agent can call during a conversation. Instead of hard-coding API integrations, you configure an MCP server once and your agent discovers and uses its capabilities automatically.
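
To make discovery and invocation concrete, here is a minimal sketch of the two JSON-RPC 2.0 requests a client sends: tools/list to discover tools and tools/call to invoke one. The helper names (listToolsRequest, callToolRequest), the tool name, and its arguments are placeholders for illustration; real clients usually let an SDK build these messages.

```typescript
// Minimal shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Discovery: ask the server which tools it offers.
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Invocation: call one tool by name with JSON arguments.
function callToolRequest(
  id: number,
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Placeholder tool name and arguments, for illustration only.
const discover = listToolsRequest(1);
const invoke = callToolRequest(2, "extract_web_data", {
  url: "https://example.com",
  prompt: "List the page headings",
});

console.log(JSON.stringify(discover));
console.log(JSON.stringify(invoke, null, 2));
```

The client sends the first message once per session and the second each time the model decides a tool is needed.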

Why Web Scraping + MCP is Powerful

Web scraping as an MCP tool unlocks use cases that were previously impossible or required complex multi-step orchestration:

  • Real-time research: "Find the current price of [product] on Amazon and compare it with Walmart"
  • Competitive monitoring: "Check what features [competitor] just added to their pricing page"
  • Data enrichment: "Look up this company's website and extract their team size and funding stage"
  • Content analysis: "Read this blog post and summarize the key arguments"
  • Lead generation: "Extract contact routes from these 10 public company websites"

The key insight: your AI agent can request structured data from supported public pages without you writing a custom scraper for each site.

Building an MCP Server for Web Extraction

Here's the minimal structure of an MCP server that provides web scraping as a tool:

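// server.ts: MCP server with a web extraction tool
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server({
  name: "web-scraper",
  version: "1.0.0",
}, {
  capabilities: { tools: {} },
});

// Advertise the tool so clients can discover it.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [{
    name: "extract_web_data",
    description: "Extract structured data from public or authorised pages. " +
      "Uses supported fetch paths and returns clean structured data, " +
      "or a clear failure when the page is blocked.",
    inputSchema: {
      type: "object",
      properties: {
        url: { type: "string", description: "URL to extract data from" },
        prompt: { type: "string", description: "What data to extract" },
      },
      required: ["url", "prompt"],
    },
  }],
}));

// Handle tool calls by forwarding the request to the extraction API.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name !== "extract_web_data") {
    throw new Error(`Unknown tool: ${request.params.name}`);
  }
  const { url, prompt } = request.params.arguments as { url: string; prompt: string };
  const response = await fetch("https://hauntapi.com/v1/extract", {
    method: "POST",
    headers: {
      // Assumes the key is supplied via a HAUNT_API_KEY environment variable.
      "Authorization": `Bearer ${process.env.HAUNT_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, prompt }),
  });
  if (!response.ok) {
    // Report upstream failures to the agent instead of returning a broken payload.
    return {
      content: [{ type: "text", text: `Extraction failed: HTTP ${response.status}` }],
      isError: true,
    };
  }
  const data = await response.json();
  return { content: [{ type: "text", text: JSON.stringify(data) }] };
});

// Serve over stdio so a local MCP client can launch this process.
const transport = new StdioServerTransport();
await server.connect(transport);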

The server exposes a single extract_web_data tool that takes a URL and a natural language prompt describing what to extract. The AI agent calls this tool whenever it needs live web data.
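
Because the arguments come from a model, it may be worth validating them before any request is made. The following is a sketch under assumptions, not part of the Haunt API: parseExtractArgs and errorResult are hypothetical helpers, and the rule of accepting only http(s) URLs is one reasonable policy among several.

```typescript
// Shape of a tool result: text content plus an optional error flag.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface ExtractArgs {
  url: string;
  prompt: string;
}

// Validate untrusted tool arguments.
// Returns the parsed arguments, or an error message string on failure.
function parseExtractArgs(raw: unknown): ExtractArgs | string {
  if (typeof raw !== "object" || raw === null) {
    return "arguments must be an object";
  }
  const { url, prompt } = raw as Record<string, unknown>;
  if (typeof url !== "string" || typeof prompt !== "string") {
    return "url and prompt must both be strings";
  }
  if (prompt.trim() === "") {
    return "prompt must not be empty";
  }
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return "url is not a valid absolute URL";
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return "only http and https URLs are supported";
  }
  return { url: parsed.toString(), prompt: prompt.trim() };
}

// Wrap a message as a tool result flagged as an error.
function errorResult(message: string): ToolResult {
  return { content: [{ type: "text", text: message }], isError: true };
}

const good = parseExtractArgs({ url: "https://example.com/pricing", prompt: " plans " });
const bad = parseExtractArgs({ url: "ftp://example.com/file", prompt: "plans" });
console.log(good);
console.log(typeof bad === "string" ? errorResult(bad) : bad);
```

Returning a result with isError set, rather than throwing, lets the agent read the message and retry with corrected arguments.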

The Haunt API MCP Server (Ready to Use)

You don't have to build this from scratch. Haunt API ships a pre-built MCP server that handles all the complexity:

  • JavaScript rendering (no headless browser needed on your end)
  • Fallback fetch paths for difficult public pages where supported
  • AI-powered structured extraction using natural language prompts
  • JSON output that your agent can parse immediately

Install it via npm:

npm install -g @hauntapi/mcp-server

Or use it with npx:

npx @hauntapi/mcp-server --api-key haunt_xxx

Using It With Claude Desktop and Other Clients

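Add the Haunt MCP server to your Claude Desktop configuration file. On macOS it lives at ~/Library/Application Support/Claude/claude_desktop_config.json (the path is given here rather than as a comment because JSON does not allow comments):
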
{
  "mcpServers": {
    "haunt": {
      "command": "npx",
      "args": ["-y", "@hauntapi/mcp-server"],
      "env": {
        "HAUNT_API_KEY": "haunt_your_key_here"
      }
    }
  }
}
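
If the configuration file already lists other servers, the new entry has to be merged into mcpServers rather than replacing it. Here is a small sketch of that merge as a pure function. The addMcpServer helper and the "other-server" entry are hypothetical, and reading or writing the file is left out.

```typescript
// One server entry under "mcpServers".
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// The config may hold other top-level keys, so allow them through.
interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a copy of the config with one server added or replaced.
// The input object is not modified.
function addMcpServer(
  config: ClaudeDesktopConfig,
  name: string,
  entry: McpServerEntry,
): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

// Hypothetical existing config with one unrelated server.
const existing: ClaudeDesktopConfig = {
  mcpServers: {
    "other-server": { command: "node", args: ["other-server.js"] },
  },
};

const updated = addMcpServer(existing, "haunt", {
  command: "npx",
  args: ["-y", "@hauntapi/mcp-server"],
  env: { HAUNT_API_KEY: "haunt_your_key_here" },
});

console.log(JSON.stringify(updated, null, 2));
```

Claude Desktop typically reads this file at startup, so restart the app after saving the change.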

Once configured, Claude can extract structured data from public and authorised pages during your conversation. Just ask:

  • "What's on the front page of Hacker News right now?"
  • "Extract the pricing plans from stripe.com"
  • "Get the latest articles from this blog"

MCP vs Traditional Web Scraping for AI

Here's how MCP-based web scraping compares to the old approach:

  • Without MCP: Write a Python script → handle proxies → parse HTML → format as text → paste into chat → manually interpret results
  • With MCP: Ask your AI agent → it calls the scraping tool → gets structured data → reasons about it → gives you the answer
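
The "With MCP" flow can be sketched as a small loop: the model either asks for a tool or gives an answer, and the client runs the tool and feeds the result back. Everything here is a stand-in for illustration: stubModel replaces a real LLM, and the tool is a local function rather than a call to an MCP server.

```typescript
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

// One step of model output: either a tool call or a final answer.
type ModelStep =
  | { kind: "tool_call"; name: string; args: Record<string, unknown> }
  | { kind: "answer"; text: string };

// Stub model: requests the tool once, then answers from its output.
function stubModel(question: string, toolOutputs: string[]): ModelStep {
  if (toolOutputs.length === 0) {
    return {
      kind: "tool_call",
      name: "extract_web_data",
      args: { url: "https://example.com", prompt: question },
    };
  }
  return { kind: "answer", text: `Based on the page: ${toolOutputs[0]}` };
}

// Run the loop until the model answers or the turn limit is reached.
async function runAgent(
  question: string,
  tools: Record<string, ToolFn>,
): Promise<string> {
  const outputs: string[] = [];
  for (let turn = 0; turn < 5; turn++) {
    const step = stubModel(question, outputs);
    if (step.kind === "answer") return step.text;
    const tool = tools[step.name];
    if (!tool) throw new Error(`Unknown tool: ${step.name}`);
    outputs.push(await tool(step.args));
  }
  throw new Error("No answer after 5 turns");
}

// Stub tool that returns canned JSON instead of fetching a page.
const stubTools: Record<string, ToolFn> = {
  extract_web_data: async () => JSON.stringify({ title: "Example Domain" }),
};

runAgent("What is the page title?", stubTools).then((answer) => {
  console.log(answer);
});
```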

For ad-hoc research tasks, the MCP approach removes most of the manual steps, which makes your AI agent genuinely useful for real-time web data.

Get started with the Haunt MCP server in 60 seconds. Free tier includes 100 requests/month.
