
Integration Guide

ScraperCity + Pipedream

Build lead scraping automations on Pipedream with full code access. Pull Apollo contacts, Google Maps businesses, validate emails, find phone numbers, and route results to your CRM, spreadsheet, or cold email tool - all running in the cloud on a schedule.

What you can build

Pipedream workflows connect ScraperCity's B2B data API to any downstream tool. Every workflow runs serverlessly in the cloud - no infrastructure to manage. Here are the most common pipelines teams build.

Daily Apollo lead pipeline

Cron trigger fires every morning. ScraperCity scrapes Apollo for contacts matching your ICP filters. Pagination code step collects all pages. Leads route to HubSpot, Pipedrive, or a Google Sheet automatically.

Google Maps local business prospecting

Trigger on webhook or schedule. ScraperCity pulls Google Maps listings by keyword and city - including phone, email, reviews, and website. Send new businesses to your outreach tool or Airtable CRM.

Email validation before outreach

After any lead enrichment step, POST each address to ScraperCity's email validation API at $0.0036/email. Filter deliverable-only leads to a separate Google Sheet or CRM list. Reduce bounces before you hit send.

Lead enrichment pipeline

Receive a webhook from your sign-up form. Pass the company domain to ScraperCity's Email Finder or Website Finder. Append phone numbers with Mobile Finder. Push the enriched lead record back to your CRM automatically.

Shopify/WooCommerce store prospecting

Scrape e-commerce stores by niche using the Store Leads endpoint. Filter by platform, revenue signals, or technology. Route qualified merchants to a Slack notification channel and a CRM pipeline stage.

Sales prospecting automation with email finding

Trigger from a new row added to Google Sheets. For each company name + person name, call ScraperCity's Email Finder API. Write the discovered email back to the same row. A complete no-touch enrichment loop.

Setup

Follow these steps to connect ScraperCity to Pipedream. The full workflow takes about 10 minutes.

1. Get your ScraperCity API key

Log in to ScraperCity and go to app.scrapercity.com/dashboard/api-docs to copy your API key. Then in Pipedream, navigate to Settings > Environment Variables and create a new secret variable named SCRAPERCITY_API_KEY. Storing the key as an environment variable keeps it out of your workflow code and prevents accidental exposure in Pipedream's execution logs.

2. Create a Pipedream workflow with a trigger

Create a new workflow inside a Pipedream project. Choose the trigger type that matches your use case:

  • Cron Schedule - runs the scrape automatically on a fixed interval. Use the "Every" option (e.g. every 1 day) or a custom cron expression like 0 8 * * 1-5 to run at 8 AM on weekdays.
  • HTTP / Webhook - fires when you send a POST request to the workflow's unique URL. Useful for triggering a scrape from an external app, a form submission, or another automation tool.
  • Manual - click "Run Now" in the Inspector. Good for testing before enabling the schedule.

Note: Pipedream manages the servers for scheduled workflows, so there is no server or cron daemon to operate yourself.
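If you use the HTTP / Webhook trigger, downstream steps read the incoming payload from steps.trigger.event.body. A minimal sketch of normalizing that payload before scraping is shown below; the field names (keyword, city) and the default city are assumptions for illustration, not a documented schema:

```javascript
// Hypothetical sketch: normalize a webhook payload before scraping.
// The keyword/city field names are assumptions - adjust to your form's schema.
function normalizeTriggerBody(body) {
  const keyword = (body?.keyword ?? "").trim();
  const city = (body?.city ?? "").trim();
  if (!keyword) {
    // Fail fast so a malformed POST doesn't burn scraping credits
    throw new Error("Webhook payload is missing a keyword to scrape");
  }
  // Fall back to a default city so a partial payload still runs
  return { keyword, city: city || "New York" };
}

// Inside a Pipedream code step you would call:
//   const params = normalizeTriggerBody(steps.trigger.event.body);
```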

3. Add an HTTP request step for ScraperCity

Add a step and select HTTP Request (or use a Node.js code step with axios for more control). Configure the request as shown below. This example scrapes Apollo for Director-level contacts in SaaS companies with a verified email address.

Method: GET
URL: https://app.scrapercity.com/api/v1/database/leads
Headers:
  Authorization: Bearer YOUR_SCRAPERCITY_KEY
Query Parameters:
  title: Director of Sales
  industry: computer software
  hasEmail: true
  limit: 100
  page: 1

Replace YOUR_SCRAPERCITY_KEY with {{process.env.SCRAPERCITY_API_KEY}} in the HTTP Request step's header field, or reference process.env.SCRAPERCITY_API_KEY directly inside a code step. The Lead Database endpoint requires the $649/mo plan; all other scraper endpoints work on any plan.
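If you prefer a code step over the HTTP Request action, the same request can first be expressed as a plain config object and passed to axios. This is a sketch that mirrors the parameters listed above; the helper name is our own:

```javascript
// Build the axios request config for the Lead Database query shown above.
// In a real Pipedream code step, apiKey comes from process.env.SCRAPERCITY_API_KEY.
function buildLeadRequest(apiKey, page = 1) {
  return {
    method: "GET",
    url: "https://app.scrapercity.com/api/v1/database/leads",
    headers: { Authorization: `Bearer ${apiKey}` },
    params: {
      title: "Director of Sales",
      industry: "computer software",
      hasEmail: "true",
      limit: "100",
      page: String(page), // query params are sent as strings
    },
  };
}

// Usage in a code step:
//   const res = await axios(buildLeadRequest(process.env.SCRAPERCITY_API_KEY, 1));
```

Keeping the config in one function makes the pagination loop in the next step a thin wrapper that only varies the page argument.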

4. Handle pagination with a code step

The ScraperCity API paginates at 100 leads per page with a maximum of 100,000 leads per day. Add a Node.js code step to loop through all pages and return a flat array of leads for downstream steps:

import axios from "axios";

export default defineComponent({
  async run({ steps, $ }) {
    const allLeads = [];
    let page = 1;
    let totalPages = 1;

    // Request pages until the API's reported totalPages is reached
    do {
      const response = await axios.get(
        "https://app.scrapercity.com/api/v1/database/leads",
        {
          headers: {
            Authorization: `Bearer ${process.env.SCRAPERCITY_API_KEY}`,
          },
          params: {
            title: "Director of Sales",
            industry: "computer software",
            hasEmail: "true",
            limit: "100",
            page: String(page),
          },
        }
      );
      allLeads.push(...response.data.data);
      totalPages = response.data.pagination.totalPages;
      page++;
    } while (page <= totalPages);

    return allLeads;
  },
});

Using axios (rather than $.send.http()) is recommended here because you need to read the response body to extract pagination metadata and pass the full lead array to the next step. Pipedream makes any value you return from a code step available to all downstream steps via the steps object.

5. (Optional) Validate emails before routing

Add an optional email validation step after the pagination loop. For each lead returned, POST the email address to the ScraperCity Email Validator at $0.0036/email. This filters catch-all addresses and undeliverable contacts before they reach your CRM or outreach tool - keeping your sender reputation clean.

import axios from "axios";

export default defineComponent({
  async run({ steps, $ }) {
    const leads = steps.fetch_leads.$return_value; // from previous step
    const validatedLeads = [];

    for (const lead of leads) {
      if (!lead.email) continue;
      try {
        const res = await axios.post(
          "https://app.scrapercity.com/api/v1/email-validator",
          { email: lead.email },
          {
            headers: {
              Authorization: `Bearer ${process.env.SCRAPERCITY_API_KEY}`,
              "Content-Type": "application/json",
            },
          }
        );
        if (res.data.deliverable === true) {
          validatedLeads.push(lead);
        }
      } catch (err) {
        console.log(`Validation error for ${lead.email}`, err.message);
      }
    }

    return validatedLeads;
  },
});

6. Route leads to your destination

Add a destination step after your data is collected and optionally validated. Pipedream has pre-built actions for common destinations - click the + button and search by app name:

  • Google Sheets - use the "Add Single Row" action to append each lead. Map fields from the ScraperCity response (name, email, title, company, phone) to your sheet columns.
  • HubSpot - use "Create Contact" to push validated leads into your CRM pipeline with custom properties.
  • Pipedrive - create Persons and Deals from scraped contacts with a single action step.
  • Airtable - insert leads into a base for review and tagging before outreach.
  • Slack - post a summary message to a channel when a scrape completes, with lead count and a link to the results sheet.
  • Instantly / Smartlead / Lemlist - POST validated leads directly to your cold email tool's API using an HTTP Request step.
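Whichever destination you choose, it helps to flatten each lead into the shape that destination expects before the action step. The sketch below maps a lead object to a Google Sheets row; the field names (name, email, title, company, phone) are assumptions based on the fields listed above, so adjust them to the actual response shape:

```javascript
// Map a ScraperCity lead object to a flat row for the
// Google Sheets "Add Single Row" action.
// Field names are assumptions - verify against a real response
// in the Pipedream Inspector before wiring up the sheet.
function leadToRow(lead) {
  return [
    lead.name ?? "",
    lead.email ?? "",
    lead.title ?? "",
    lead.company ?? "",
    lead.phone ?? "",
  ];
}

// Usage in a code step, before the Sheets action:
//   return steps.validate_emails.$return_value.map(leadToRow);
```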

ScraperCity API endpoints for Pipedream

Every ScraperCity scraper is accessible via the same base URL (https://app.scrapercity.com/api/v1) with Bearer token authentication. The table below shows the endpoints most commonly used in Pipedream workflows.

| Endpoint | What it returns | Cost | Delivery |
| --- | --- | --- | --- |
| GET /apollo | B2B contacts from Apollo by title, industry, location | $0.0039/lead | 11-48+ hrs |
| GET /database/leads | 3M+ B2B contacts, instant query (requires $649/mo plan) | Included in plan | Instant |
| GET /google-maps | Local businesses with phone, email, reviews, website | $0.01/place | 5-30 min |
| POST /email-validator | Deliverability, MX records, catch-all detection | $0.0036/email | 1-10 min |
| POST /email-finder | Business email from name + company domain | $0.05/contact | 1-10 min |
| POST /mobile-finder | Phone numbers from LinkedIn URL or email | $0.25/input | 1-5 min |
| GET /store-leads | Shopify/WooCommerce stores with contacts | $0.0039/lead | Instant |
| GET /status/:runId | Poll the status of an async scrape job | Free | Instant |
| GET /download/:runId | Download CSV results for a completed scrape | Free | Instant |

All endpoints use Authorization: Bearer YOUR_API_KEY in the request header. Apollo scrapes are asynchronous and delivered in 11-48+ hours. For async scrapers, use the Status endpoint to poll job completion or configure a webhook at app.scrapercity.com/dashboard/webhooks to receive a POST notification when results are ready.

Handling async scrapes (Apollo and long-running jobs)

Some ScraperCity scrapers - most notably Apollo - are asynchronous. When you POST a scrape request, the API returns a runId immediately but results are not available for 11-48+ hours. Pipedream cron-triggered workflows have a maximum execution time, so you cannot poll inside a single workflow run for an async scrape. There are two reliable patterns for handling this:

Pattern 1 - Webhook notification (recommended)

Configure a webhook URL in ScraperCity's dashboard (app.scrapercity.com/dashboard/webhooks). Set the URL to a Pipedream HTTP-triggered workflow. When your scrape completes, ScraperCity POSTs the results to that URL and Pipedream fires the workflow automatically - no polling needed.

// Workflow A: Trigger the scrape (Cron trigger)
import axios from "axios";

export default defineComponent({
  async run({ steps, $ }) {
    const res = await axios.post(
      "https://app.scrapercity.com/api/v1/apollo",
      {
        title: "VP of Engineering",
        industry: "saas",
        limit: 500,
      },
      {
        headers: {
          Authorization: `Bearer ${process.env.SCRAPERCITY_API_KEY}`,
          "Content-Type": "application/json",
        },
      }
    );
    // Store the runId to track status if needed
    return { runId: res.data.runId };
  },
});

// Workflow B: Receive webhook when complete (HTTP trigger)
// ScraperCity POSTs results to this workflow's URL
// steps.trigger.event.body contains the lead data

Pattern 2 - Polling with Status endpoint

For shorter async scrapers (Google Maps, Email Finder - typically 1-30 minutes), you can use a second scheduled workflow that polls the GET /api/v1/status/:runId endpoint every few minutes. When the status returns completed, call the Download endpoint to retrieve results and route them to your destination.

import axios from "axios";

export default defineComponent({
  async run({ steps, $ }) {
    const runId = process.env.PENDING_RUN_ID; // store this after triggering

    const status = await axios.get(
      `https://app.scrapercity.com/api/v1/status/${runId}`,
      {
        headers: { Authorization: `Bearer ${process.env.SCRAPERCITY_API_KEY}` },
      }
    );

    if (status.data.status !== "completed") {
      return $.flow.exit("Scrape not ready yet - will retry on next cron tick");
    }

    const results = await axios.get(
      `https://app.scrapercity.com/api/v1/download/${runId}`,
      {
        headers: { Authorization: `Bearer ${process.env.SCRAPERCITY_API_KEY}` },
      }
    );

    return results.data;
  },
});

Troubleshooting common errors

These are the most common issues when integrating ScraperCity with Pipedream, and how to fix each one.

401 Unauthorized

Why it happens: The Authorization header is missing or the API key is wrong.

Fix: Confirm the header is set to Authorization: Bearer YOUR_KEY with no typos. In a code step, verify process.env.SCRAPERCITY_API_KEY returns the correct value by logging it once (then remove the log - do not leave API keys printing to Pipedream's Inspector logs).

429 Too Many Requests

Why it happens: You are sending requests faster than the allowed rate.

Fix: The ScraperCity Lead Database endpoint allows up to 100,000 leads per day at 100 per page. Add a short delay between page requests in your loop if you are paginating at very high speed. Pipedream itself rate-limits HTTP triggers to an average of 10 requests per second - use throttle controls in Workflow Settings if fanning out large batches.
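If you do need to pace or retry requests, a small delay/backoff helper can be dropped into the pagination loop from step 4. This is a generic sketch, not part of the ScraperCity API:

```javascript
// Exponential backoff schedule for retries: 500 ms, 1 s, 2 s, 4 s, ...
function backoffMs(attempt, baseMs = 500) {
  return baseMs * 2 ** attempt;
}

// Promise-based delay, usable with await inside a code step
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Inside the pagination loop you would add, for example:
//   await sleep(500);            // fixed pause between page requests
//   await sleep(backoffMs(n));   // growing pause after the nth retry
```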

Workflow timeout error (red in Inspector)

Why it happens: Your pagination loop takes longer than Pipedream's execution limit. Cron workflows default to 60 seconds.

Fix: Split the work across multiple workflow runs. Scrape one page range per cron tick, storing the current page in an external state store (e.g. a single-cell Google Sheet or a Pipedream data store). Alternatively, use the webhook pattern for async scrapers so no polling loop is needed inside a single run.
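The "one page range per cron tick" approach boils down to reading the last scraped page from your state store, computing the next window, and writing the new high-water mark back. The window math can be kept as a pure function (a sketch; the data-store calls in the comments are illustrative):

```javascript
// Compute the next window of pages to scrape on this cron tick.
// lastPage is read from external state (e.g. a Pipedream data store or
// a single-cell Google Sheet) before the run and written back after it.
function nextPageWindow(lastPage, pagesPerRun, totalPages) {
  const start = lastPage + 1;
  if (start > totalPages) return null; // nothing left to scrape
  const end = Math.min(start + pagesPerRun - 1, totalPages);
  return { start, end };
}

// Illustrative shape of a run (state-store calls are pseudocode):
//   const lastPage = (await readState("lastPage")) ?? 0;
//   const win = nextPageWindow(lastPage, 5, totalPages);
//   if (!win) return $.flow.exit("All pages scraped");
//   // ...scrape pages win.start..win.end...
//   await writeState("lastPage", win.end);
```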

Duplicate request blocked (ScraperCity 30-second dedup)

Why it happens: ScraperCity blocks identical requests made within 30 seconds to prevent accidental double charges.

Fix: This is expected behavior. If you are retrying a failed step, wait at least 30 seconds before resending. Vary at least one query parameter (e.g. page number) if you need to send multiple requests quickly.
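A retry step can encode the 30-second wait so a transient dedup rejection heals itself. This sketch assumes the dedup block surfaces as a request error; the wait is configurable so the pattern is testable with a short delay:

```javascript
// Retry a request once after ScraperCity's 30-second dedup window.
// Assumption: a dedup rejection surfaces as a thrown/rejected request error.
async function retryAfterDedup(requestFn, waitMs = 31000) {
  try {
    return await requestFn();
  } catch (err) {
    // Wait past the 30 s dedup window, then send the identical request once more
    await new Promise((resolve) => setTimeout(resolve, waitMs));
    return await requestFn();
  }
}

// Usage in a code step:
//   const res = await retryAfterDedup(() => axios(buildConfig()));
```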

process.env.SCRAPERCITY_API_KEY returns undefined

Why it happens: The environment variable is being referenced outside of the defineComponent export function, or it was not saved correctly.

Fix: Confirm the variable was saved at Settings > Environment Variables in Pipedream. Ensure your code references process.env inside the run function body. process.env returns undefined when called at the module level outside of defineComponent.

Empty data array returned

Why it happens: The filter parameters returned no matching contacts, or the scrape is still processing.

Fix: For async scrapers (Apollo), the data will not be available until the scrape completes (11-48+ hours). Check the run status using GET /api/v1/status/:runId. For synchronous scrapers, loosen your filter criteria - try removing one filter at a time to identify which constraint is too narrow.


Pipedream vs other automation platforms for ScraperCity

ScraperCity's API works with any HTTP-capable automation tool. Here is how the main options compare for lead scraping workflows specifically.

| Platform | Code steps | Pagination support | Hosting | Best for |
| --- | --- | --- | --- | --- |
| Pipedream | Node.js + Python | Full loop control | Cloud (managed) | Devs who want code + 2,000+ integrations |
| n8n | JavaScript function node | Full loop control | Self-hosted or cloud | Teams wanting self-hosted control + visual builder |
| Zapier | Code step (limited) | No native pagination | Cloud (managed) | No-code single-step triggers, simple routing |
| Make (Integromat) | Limited | Iterator module | Cloud (managed) | Visual scenario builder, moderate complexity |

For bulk lead scraping with pagination, data transformation, and conditional routing, Pipedream or n8n are the strongest choices. Both give you the code access needed to loop through ScraperCity's paginated API responses and handle async scrape jobs correctly.
