logo
API GuidesWeb Scraping

Web Scraping APIs

Web data extraction endpoints — meta tags, tech stack, pixels, full-page scrape, emails, sitemap, the flagship website_intelligence report, plus search and directory tools.

All web scraping endpoints require the x-api-key header. Most accept a cache parameter (default true) which returns cached results when available. Set cache_bust: true to force a fresh fetch.


Meta Tags

Extract Open Graph, Twitter Card, SEO meta tags, hreflang, favicon, and canonical tags from a URL.

Endpoint: POST /web_meta_tags Credits: 1 per call | Quota: 100,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_meta_tags \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://texau.com"}'
FieldTypeRequiredDefaultDescription
urlstringYesTarget URL (must start with http/https)
cachebooleanNotrueReturn cached result if available
cache_bustbooleanNofalseForce a fresh fetch

Response

{
  "title": "TexAu — Growth Automation",
  "description": "Automate your growth with TexAu.",
  "og": {
    "title": "TexAu",
    "description": "Automate your growth.",
    "image": "https://texau.com/og-image.png",
    "url": "https://texau.com",
    "type": "website"
  },
  "twitter": {
    "card": "summary_large_image",
    "title": "TexAu",
    "description": "Automate your growth."
  },
  "canonical": "https://texau.com/",
  "hreflang": [],
  "favicon": "https://texau.com/favicon.ico",
  "meta": {
    "viewport": "width=device-width, initial-scale=1",
    "robots": "index, follow"
  }
}

JSON-LD

Extract all JSON-LD structured data blocks from a page (Schema.org, breadcrumbs, FAQPage, HowTo, etc).

Endpoint: POST /web_json_ld Credits: 1 per call | Quota: 100,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_json_ld \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://texau.com"}'
FieldTypeRequiredDefaultDescription
urlstringYesTarget URL
cachebooleanNotrueReturn cached result if available
cache_bustbooleanNofalseForce a fresh fetch

Response

{
  "json_ld": [
    {
      "@context": "https://schema.org",
      "@type": "Organization",
      "name": "TexAu",
      "url": "https://texau.com",
      "logo": "https://texau.com/logo.png"
    },
    {
      "@context": "https://schema.org",
      "@type": "WebSite",
      "url": "https://texau.com",
      "potentialAction": {
        "@type": "SearchAction",
        "target": "https://texau.com/search?q={query}"
      }
    }
  ],
  "count": 2
}

Tracking Pixels

Detect analytics and tracking pixels on a page — Google Analytics, Facebook Pixel, LinkedIn Insight, HubSpot, Segment, Hotjar, and 30+ others.

Endpoint: POST /web_pixels Credits: 1 per call | Quota: 100,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_pixels \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://texau.com"}'
FieldTypeRequiredDefaultDescription
urlstringYesTarget URL
cachebooleanNotrueReturn cached result if available
cache_bustbooleanNofalseForce a fresh fetch

Response

{
  "pixels": {
    "google_analytics": {
      "detected": true,
      "ids": ["G-XXXXXXXXXX"]
    },
    "facebook_pixel": {
      "detected": true,
      "ids": ["123456789012345"]
    },
    "linkedin_insight": {
      "detected": false,
      "ids": []
    },
    "hubspot": {
      "detected": true,
      "ids": ["XXXXXXX"]
    },
    "hotjar": {
      "detected": false,
      "ids": []
    },
    "segment": {
      "detected": false,
      "ids": []
    }
  },
  "total_detected": 3
}

Full Page Scrape

Scrape a URL and return its content in HTML, Markdown, or as a list of links.

Endpoint: POST /web_scrape Credits: 1 per call | Quota: 50,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://texau.com",
    "formats": ["html", "markdown"],
    "fast_mode": false
  }'
FieldTypeRequiredDefaultDescription
urlstringYesTarget URL
formatsstring[]No["html","markdown"]Output formats: html, markdown, links
fast_modebooleanNofalseUse a faster scrape method (may reduce quality)
cachebooleanNotrueReturn cached result if available

Response

{
  "url": "https://texau.com",
  "formats": {
    "html": "<html>...</html>",
    "markdown": "# TexAu

Automate your growth...",
    "links": [
      "https://texau.com/pricing",
      "https://texau.com/features"
    ]
  },
  "metadata": {
    "title": "TexAu — Growth Automation",
    "status_code": 200,
    "content_type": "text/html"
  }
}

Extract all social media profile links from a website, grouped by platform.

Endpoint: POST /web_social_links Credits: 1 per call | Quota: 50,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_social_links \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://texau.com"}'
FieldTypeRequiredDefaultDescription
urlstringYesTarget URL
cachebooleanNotrueReturn cached result if available

Response

{
  "linkedin": ["https://www.linkedin.com/company/texau"],
  "twitter": ["https://twitter.com/texau"],
  "facebook": [],
  "instagram": [],
  "youtube": ["https://www.youtube.com/c/texau"],
  "github": ["https://github.com/texau"]
}

Tech Stack

Detect the technology stack of a website — CMS, frameworks, CDN, analytics, chat widgets, and more.

Endpoint: POST /web_tech_stack Credits: 1 per call | Quota: 50,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_tech_stack \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://texau.com"}'
FieldTypeRequiredDefaultDescription
urlstringYesTarget URL
cachebooleanNotrueReturn cached result if available

Response

{
  "technologies": [
    {
      "name": "React",
      "category": "JavaScript Framework",
      "confidence": 100,
      "version": null
    },
    {
      "name": "Next.js",
      "category": "JavaScript Framework",
      "confidence": 95,
      "version": "14.x"
    },
    {
      "name": "Cloudflare",
      "category": "CDN",
      "confidence": 100,
      "version": null
    }
  ],
  "count": 3
}

Web Emails

Crawl a website and extract all email addresses found across multiple pages.

Endpoint: POST /web_emails Credits: 2 per call | Quota: 25,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_emails \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://texau.com",
    "max_pages": 5
  }'
FieldTypeRequiredDefaultDescription
urlstringYesStarting URL for the crawl
max_pagesintegerNo5Maximum pages to crawl (1–50)
cachebooleanNotrueReturn cached result if available

Response

{
  "emails": [
    "[email protected]",
    "[email protected]"
  ],
  "count": 2,
  "pages_crawled": 5
}

Sitemap

Discover URLs from a website's sitemap — via sitemap.xml, robots.txt, or crawl discovery.

Endpoint: POST /web_sitemap Credits: 1 per call | Quota: 50,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/web_sitemap \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://texau.com"}'
FieldTypeRequiredDefaultDescription
urlstringYesDomain or sitemap URL
cachebooleanNotrueReturn cached result if available

Response

{
  "urls": [
    "https://texau.com/",
    "https://texau.com/pricing",
    "https://texau.com/features",
    "https://texau.com/blog"
  ],
  "count": 4,
  "source": "sitemap.xml"
}

Website Intelligence

Comprehensive website report combining all scraping modules plus DNS records, SSL certificate, HTTP headers, SEO score, and more.

Endpoint: POST /website_intelligence Credits: 5 per call | Quota: 25,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/website_intelligence \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://texau.com",
    "include": ["meta_tags", "tech_stack", "dns", "ssl"]
  }'
FieldTypeRequiredDefaultDescription
urlstringYesTarget URL
includestring[]NoAll modulesModules to include in the report
cachebooleanNotrueReturn cached result if available
cache_bustbooleanNofalseForce a fresh fetch

Available modules: meta_tags, tech_stack, emails, social_links, pixels, json_ld, dns, ssl, headers

Omit include to run all modules.

Response

{
  "url": "https://texau.com",
  "meta_tags": { "title": "TexAu", "description": "..." },
  "tech_stack": { "technologies": [...] },
  "emails": { "emails": [...] },
  "social_links": { "linkedin": [...] },
  "pixels": { "google_analytics": { "detected": true } },
  "json_ld": { "json_ld": [...] },
  "dns": {
    "a": ["104.21.0.1"],
    "mx": ["mail.texau.com"],
    "txt": ["v=spf1 include:..."]
  },
  "ssl": {
    "valid": true,
    "issuer": "Let's Encrypt",
    "expires_at": "2026-09-01",
    "days_remaining": 180
  },
  "headers": {
    "x-powered-by": null,
    "strict-transport-security": "max-age=31536000",
    "content-security-policy": "..."
  }
}

Query Bing web search and return structured results.

Endpoint: POST /search_bing Credits: 1 per call | Quota: 50,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/search_bing \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "TexAu growth automation",
    "num_results": 10,
    "page": 1
  }'
FieldTypeRequiredDefaultDescription
querystringYesSearch query
num_resultsintegerNo10Number of results (1–50)
pageintegerNo1Page number (1–20)
cachebooleanNotrueReturn cached result if available
cache_bustbooleanNofalseForce a fresh search

Response

{
  "query": "TexAu growth automation",
  "results": [
    {
      "title": "TexAu — Growth Automation Platform",
      "url": "https://texau.com",
      "description": "Automate your growth with TexAu.",
      "position": 1
    }
  ],
  "total_results": 10,
  "page": 1
}

Fetch Google Trends interest over time, interest by region, and related queries for a keyword.

Endpoint: POST /search_google_trends Credits: 1 per call | Quota: 25,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/search_google_trends \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "keyword": "growth hacking",
    "geo": "US",
    "timeframe": "today 12-m"
  }'
FieldTypeRequiredDefaultDescription
keywordstringYesKeyword to analyze
geostringNo"US"Country code (e.g. "US", "GB", "" for worldwide)
timeframestringNo"today 12-m"Time range in Google Trends format
cachebooleanNotrueReturn cached result if available
cache_bustbooleanNofalseForce fresh data

Common timeframe values: "today 1-m", "today 3-m", "today 12-m", "today 5-y", "all"

Response

{
  "keyword": "growth hacking",
  "geo": "US",
  "timeframe": "today 12-m",
  "interest_over_time": [
    { "date": "2025-03-01", "value": 72 },
    { "date": "2025-04-01", "value": 68 }
  ],
  "interest_by_region": [
    { "geoName": "California", "value": 100 },
    { "geoName": "New York", "value": 87 }
  ],
  "related_queries": {
    "top": [
      { "query": "growth hacking strategies", "value": 100 }
    ],
    "rising": [
      { "query": "ai growth hacking", "value": 5000 }
    ]
  }
}

Yellow Pages Directory

Search Yellow Pages for businesses by keyword and location.

Endpoint: POST /directory_yellowpages Credits: 1 per call | Quota: 50,000/month

Request

curl -X POST https://v3-api.texau.com/api/v1/directory_yellowpages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "software development",
    "location": "San Francisco, CA",
    "page": 1,
    "max_pages": 2
  }'
FieldTypeRequiredDefaultDescription
querystringYesBusiness category or keyword (1–300 chars)
locationstringYesCity, state, or zip code (1–200 chars)
pageintegerNo1Starting page (1–20)
max_pagesintegerNo1Number of pages to fetch (1–10)
cachebooleanNotrueReturn cached result if available

Response

{
  "query": "software development",
  "location": "San Francisco, CA",
  "results": [
    {
      "name": "Acme Software Inc.",
      "address": "123 Market St, San Francisco, CA 94105",
      "phone": "(415) 555-0100",
      "website": "https://acmesoftware.com",
      "categories": ["Software Development", "IT Services"],
      "rating": 4.5,
      "reviews_count": 32
    }
  ],
  "total_results": 45,
  "pages_fetched": 2
}

Slack Channel Members

Extract member lists from public Slack community channels.

Endpoint: POST /slack_channel_members Credits: 2 per call

Request

curl -X POST https://v3-api.texau.com/api/v1/slack_channel_members \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "channel_url": "https://community.slack.com/archives/C01234ABCDE"
  }'

Parameters

FieldTypeRequiredDescription
channel_urlstringYesFull Slack channel URL (public channels only)

Response

{
  "channel": "general",
  "members": [
    {
      "name": "Jane Smith",
      "title": "Engineering Lead",
      "username": "janesmith"
    }
  ],
  "total_members": 128
}