Web Scraping APIs
Web data extraction endpoints — meta tags, tech stack, pixels, full-page scrape, emails, sitemap, the flagship website_intelligence report, plus search and directory tools.
All web scraping endpoints require the x-api-key header. Most accept a cache parameter (default true) which returns cached results when available. Set cache_bust: true to force a fresh fetch.
Meta Tags
Extract Open Graph, Twitter Card, SEO meta tags, hreflang, favicon, and canonical tags from a URL.
Endpoint: POST /web_meta_tags
Credits: 1 per call | Quota: 100,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_meta_tags \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://texau.com"}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Target URL (must start with http/https) |
cache | boolean | No | true | Return cached result if available |
cache_bust | boolean | No | false | Force a fresh fetch |
Response
{
"title": "TexAu — Growth Automation",
"description": "Automate your growth with TexAu.",
"og": {
"title": "TexAu",
"description": "Automate your growth.",
"image": "https://texau.com/og-image.png",
"url": "https://texau.com",
"type": "website"
},
"twitter": {
"card": "summary_large_image",
"title": "TexAu",
"description": "Automate your growth."
},
"canonical": "https://texau.com/",
"hreflang": [],
"favicon": "https://texau.com/favicon.ico",
"meta": {
"viewport": "width=device-width, initial-scale=1",
"robots": "index, follow"
}
}
JSON-LD
Extract all JSON-LD structured data blocks from a page (Schema.org, breadcrumbs, FAQPage, HowTo, etc).
Endpoint: POST /web_json_ld
Credits: 1 per call | Quota: 100,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_json_ld \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://texau.com"}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Target URL |
cache | boolean | No | true | Return cached result if available |
cache_bust | boolean | No | false | Force a fresh fetch |
Response
{
"json_ld": [
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "TexAu",
"url": "https://texau.com",
"logo": "https://texau.com/logo.png"
},
{
"@context": "https://schema.org",
"@type": "WebSite",
"url": "https://texau.com",
"potentialAction": {
"@type": "SearchAction",
"target": "https://texau.com/search?q={query}"
}
}
],
"count": 2
}
Tracking Pixels
Detect analytics and tracking pixels on a page — Google Analytics, Facebook Pixel, LinkedIn Insight, HubSpot, Segment, Hotjar, and 30+ others.
Endpoint: POST /web_pixels
Credits: 1 per call | Quota: 100,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_pixels \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://texau.com"}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Target URL |
cache | boolean | No | true | Return cached result if available |
cache_bust | boolean | No | false | Force a fresh fetch |
Response
{
"pixels": {
"google_analytics": {
"detected": true,
"ids": ["G-XXXXXXXXXX"]
},
"facebook_pixel": {
"detected": true,
"ids": ["123456789012345"]
},
"linkedin_insight": {
"detected": false,
"ids": []
},
"hubspot": {
"detected": true,
"ids": ["XXXXXXX"]
},
"hotjar": {
"detected": false,
"ids": []
},
"segment": {
"detected": false,
"ids": []
}
},
"total_detected": 3
}
Full Page Scrape
Scrape a URL and return its content in HTML, Markdown, or as a list of links.
Endpoint: POST /web_scrape
Credits: 1 per call | Quota: 50,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_scrape \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://texau.com",
"formats": ["html", "markdown"],
"fast_mode": false
}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Target URL |
formats | string[] | No | ["html","markdown"] | Output formats: html, markdown, links |
fast_mode | boolean | No | false | Use a faster scrape method (may reduce quality) |
cache | boolean | No | true | Return cached result if available |
Response
{
"url": "https://texau.com",
"formats": {
"html": "<html>...</html>",
"markdown": "# TexAu
Automate your growth...",
"links": [
"https://texau.com/pricing",
"https://texau.com/features"
]
},
"metadata": {
"title": "TexAu — Growth Automation",
"status_code": 200,
"content_type": "text/html"
}
}
Social Links
Extract all social media profile links from a website, grouped by platform.
Endpoint: POST /web_social_links
Credits: 1 per call | Quota: 50,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_social_links \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://texau.com"}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Target URL |
cache | boolean | No | true | Return cached result if available |
Response
{
"linkedin": ["https://www.linkedin.com/company/texau"],
"twitter": ["https://twitter.com/texau"],
"facebook": [],
"instagram": [],
"youtube": ["https://www.youtube.com/c/texau"],
"github": ["https://github.com/texau"]
}
Tech Stack
Detect the technology stack of a website — CMS, frameworks, CDN, analytics, chat widgets, and more.
Endpoint: POST /web_tech_stack
Credits: 1 per call | Quota: 50,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_tech_stack \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://texau.com"}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Target URL |
cache | boolean | No | true | Return cached result if available |
Response
{
"technologies": [
{
"name": "React",
"category": "JavaScript Framework",
"confidence": 100,
"version": null
},
{
"name": "Next.js",
"category": "JavaScript Framework",
"confidence": 95,
"version": "14.x"
},
{
"name": "Cloudflare",
"category": "CDN",
"confidence": 100,
"version": null
}
],
"count": 3
}
Web Emails
Crawl a website and extract all email addresses found across multiple pages.
Endpoint: POST /web_emails
Credits: 2 per call | Quota: 25,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_emails \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://texau.com",
"max_pages": 5
}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Starting URL for the crawl |
max_pages | integer | No | 5 | Maximum pages to crawl (1–50) |
cache | boolean | No | true | Return cached result if available |
Response
{
"emails": [
"[email protected]",
"[email protected]"
],
"count": 2,
"pages_crawled": 5
}
Sitemap
Discover URLs from a website's sitemap — via sitemap.xml, robots.txt, or crawl discovery.
Endpoint: POST /web_sitemap
Credits: 1 per call | Quota: 50,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/web_sitemap \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://texau.com"}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Domain or sitemap URL |
cache | boolean | No | true | Return cached result if available |
Response
{
"urls": [
"https://texau.com/",
"https://texau.com/pricing",
"https://texau.com/features",
"https://texau.com/blog"
],
"count": 4,
"source": "sitemap.xml"
}
Website Intelligence
Comprehensive website report combining all scraping modules plus DNS records, SSL certificate, HTTP headers, SEO score, and more.
Endpoint: POST /website_intelligence
Credits: 5 per call | Quota: 25,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/website_intelligence \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://texau.com",
"include": ["meta_tags", "tech_stack", "dns", "ssl"]
}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | — | Target URL |
include | string[] | No | All modules | Modules to include in the report |
cache | boolean | No | true | Return cached result if available |
cache_bust | boolean | No | false | Force a fresh fetch |
Available modules: meta_tags, tech_stack, emails, social_links, pixels, json_ld, dns, ssl, headers
Omit include to run all modules.
Response
{
"url": "https://texau.com",
"meta_tags": { "title": "TexAu", "description": "..." },
"tech_stack": { "technologies": [...] },
"emails": { "emails": [...] },
"social_links": { "linkedin": [...] },
"pixels": { "google_analytics": { "detected": true } },
"json_ld": { "json_ld": [...] },
"dns": {
"a": ["104.21.0.1"],
"mx": ["mail.texau.com"],
"txt": ["v=spf1 include:..."]
},
"ssl": {
"valid": true,
"issuer": "Let's Encrypt",
"expires_at": "2026-09-01",
"days_remaining": 180
},
"headers": {
"x-powered-by": null,
"strict-transport-security": "max-age=31536000",
"content-security-policy": "..."
}
}
Bing Search
Query Bing web search and return structured results.
Endpoint: POST /search_bing
Credits: 1 per call | Quota: 50,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/search_bing \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "TexAu growth automation",
"num_results": 10,
"page": 1
}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
query | string | Yes | — | Search query |
num_results | integer | No | 10 | Number of results (1–50) |
page | integer | No | 1 | Page number (1–20) |
cache | boolean | No | true | Return cached result if available |
cache_bust | boolean | No | false | Force a fresh search |
Response
{
"query": "TexAu growth automation",
"results": [
{
"title": "TexAu — Growth Automation Platform",
"url": "https://texau.com",
"description": "Automate your growth with TexAu.",
"position": 1
}
],
"total_results": 10,
"page": 1
}
Google Trends
Fetch Google Trends interest over time, interest by region, and related queries for a keyword.
Endpoint: POST /search_google_trends
Credits: 1 per call | Quota: 25,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/search_google_trends \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"keyword": "growth hacking",
"geo": "US",
"timeframe": "today 12-m"
}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
keyword | string | Yes | — | Keyword to analyze |
geo | string | No | "US" | Country code (e.g. "US", "GB", "" for worldwide) |
timeframe | string | No | "today 12-m" | Time range in Google Trends format |
cache | boolean | No | true | Return cached result if available |
cache_bust | boolean | No | false | Force fresh data |
Common timeframe values: "today 1-m", "today 3-m", "today 12-m", "today 5-y", "all"
Response
{
"keyword": "growth hacking",
"geo": "US",
"timeframe": "today 12-m",
"interest_over_time": [
{ "date": "2025-03-01", "value": 72 },
{ "date": "2025-04-01", "value": 68 }
],
"interest_by_region": [
{ "geoName": "California", "value": 100 },
{ "geoName": "New York", "value": 87 }
],
"related_queries": {
"top": [
{ "query": "growth hacking strategies", "value": 100 }
],
"rising": [
{ "query": "ai growth hacking", "value": 5000 }
]
}
}
Yellow Pages Directory
Search Yellow Pages for businesses by keyword and location.
Endpoint: POST /directory_yellowpages
Credits: 1 per call | Quota: 50,000/month
Request
curl -X POST https://v3-api.texau.com/api/v1/directory_yellowpages \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "software development",
"location": "San Francisco, CA",
"page": 1,
"max_pages": 2
}'
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
query | string | Yes | — | Business category or keyword (1–300 chars) |
location | string | Yes | — | City, state, or zip code (1–200 chars) |
page | integer | No | 1 | Starting page (1–20) |
max_pages | integer | No | 1 | Number of pages to fetch (1–10) |
cache | boolean | No | true | Return cached result if available |
Response
{
"query": "software development",
"location": "San Francisco, CA",
"results": [
{
"name": "Acme Software Inc.",
"address": "123 Market St, San Francisco, CA 94105",
"phone": "(415) 555-0100",
"website": "https://acmesoftware.com",
"categories": ["Software Development", "IT Services"],
"rating": 4.5,
"reviews_count": 32
}
],
"total_results": 45,
"pages_fetched": 2
}
Slack Channel Members
Extract member lists from public Slack community channels.
Endpoint: POST /slack_channel_members
Credits: 2 per call
Request
curl -X POST https://v3-api.texau.com/api/v1/slack_channel_members \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"channel_url": "https://community.slack.com/archives/C01234ABCDE"
}'
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
channel_url | string | Yes | Full Slack channel URL (public channels only) |
Response
{
"channel": "general",
"members": [
{
"name": "Jane Smith",
"title": "Engineering Lead",
"username": "janesmith"
}
],
"total_members": 128
}
Last updated 2 weeks ago
Built with Documentation.AI