
Web Scraping

TendSocial includes web scraping capabilities for extracting content from URLs.

Use Cases

  1. Blog Import: Extract content from blog posts for repurposing
  2. Website Analysis: Analyze brand websites for AI context
  3. Link Preview: Generate rich previews for shared links
  4. Competitor Analysis: Extract public content for reference

API Endpoints

POST /api/scrape-url

Scrape content from a URL.

typescript
// Request
{
  url: string,           // Full URL to scrape
  extractType?: "article" | "metadata" | "full"
}

// Response
{
  title: string,
  description: string,
  content: string,       // Main text content
  images: string[],      // Image URLs found
  author?: string,
  publishDate?: string,
  favicon?: string,
  ogImage?: string,
  siteName?: string
}
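
As an illustration of how a client might call this endpoint, here is a minimal sketch. The JSON `Content-Type` header, the `baseUrl` parameter, the injected `fetchFn`, and the helper names are all assumptions rather than part of the documented API, and any authentication the endpoint requires is not shown.

```typescript
// Request/response shapes copied from the endpoint description above.
type ExtractType = "article" | "metadata" | "full";

interface ScrapeUrlRequest {
  url: string;
  extractType?: ExtractType;
}

interface ScrapeUrlResponse {
  title: string;
  description: string;
  content: string;
  images: string[];
  author?: string;
  publishDate?: string;
  favicon?: string;
  ogImage?: string;
  siteName?: string;
}

// Minimal fetch-like signature so the sketch does not depend on one runtime.
type FetchLike = (
  input: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the request options for POST /api/scrape-url (JSON body is an assumption).
function buildScrapeUrlInit(request: ScrapeUrlRequest) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  };
}

// Hypothetical client helper: sends the request and returns the parsed response.
async function scrapeUrl(
  fetchFn: FetchLike,
  baseUrl: string,
  request: ScrapeUrlRequest
): Promise<ScrapeUrlResponse> {
  const response = await fetchFn(`${baseUrl}/api/scrape-url`, buildScrapeUrlInit(request));
  if (!response.ok) {
    throw new Error(`scrape-url failed with HTTP ${response.status}`);
  }
  return (await response.json()) as ScrapeUrlResponse;
}
```

In a runtime with a global `fetch`, this could be called as `scrapeUrl(fetch, baseUrl, { url, extractType: "article" })`. Injecting the fetch function also makes the helper easy to exercise with a stub.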

POST /api/scrape-website

Analyze a website for brand context.

typescript
// Request
{ url: string }

// Response
{
  title: string,
  description: string,
  industry?: string,
  keywords: string[],
  socialLinks: { platform: string, url: string }[],
  colors?: string[],     // Extracted brand colors
  logoUrl?: string
}
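
Because several response fields are optional, consumers need to handle their absence. The sketch below shows one way to flatten an analysis into a short text summary. The `summarizeAnalysis` helper and its output format are hypothetical, not part of TendSocial.

```typescript
// Response shape copied from the endpoint description above.
interface WebsiteAnalysis {
  title: string;
  description: string;
  industry?: string;
  keywords: string[];
  socialLinks: { platform: string; url: string }[];
  colors?: string[];
  logoUrl?: string;
}

// Hypothetical helper: builds a plain-text summary, skipping absent or empty fields.
function summarizeAnalysis(a: WebsiteAnalysis): string {
  const lines = [
    `Brand: ${a.title}`,
    `About: ${a.description}`,
    `Industry: ${a.industry ?? "unknown"}`,
    `Keywords: ${a.keywords.join(", ")}`,
  ];
  if (a.socialLinks.length > 0) {
    lines.push(`Social: ${a.socialLinks.map((l) => `${l.platform} (${l.url})`).join(", ")}`);
  }
  if (a.colors && a.colors.length > 0) {
    lines.push(`Colors: ${a.colors.join(", ")}`);
  }
  return lines.join("\n");
}
```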

Technical Implementation

Scraping Library

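HTML is parsed with cheerio:

typescript
import * as cheerio from 'cheerio';

// Fetch the page and fail early on non-2xx responses (e.g. 403 from sites that block scrapers)
const response = await fetch(url);
if (!response.ok) {
  throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
}
const html = await response.text();
const $ = cheerio.load(html);

// Page title and main article text, with surrounding whitespace trimmed
const title = $('title').text().trim();
const content = $('article').text().trim();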

Rate Limiting

  • At most 10 scrapes per minute per company
  • Results are cached for 1 hour per URL
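
The per-company limit could be enforced along the following lines. This is a sketch that assumes a sliding 60-second window and in-memory storage. The source does not say whether the window is sliding or fixed, nor where counters are stored, and the class name is hypothetical.

```typescript
const MAX_SCRAPES_PER_MINUTE = 10; // limit stated above
const WINDOW_MS = 60_000;

// In-memory sliding-window limiter keyed by company ID (storage choice is an assumption).
class ScrapeRateLimiter {
  private readonly timestamps = new Map<string, number[]>();

  // Returns true and records the request if the company is under its limit.
  tryAcquire(companyId: string, now: number = Date.now()): boolean {
    const windowStart = now - WINDOW_MS;
    // Keep only requests that still fall inside the current window.
    const recent = (this.timestamps.get(companyId) ?? []).filter((t) => t > windowStart);
    if (recent.length >= MAX_SCRAPES_PER_MINUTE) {
      this.timestamps.set(companyId, recent);
      return false;
    }
    recent.push(now);
    this.timestamps.set(companyId, recent);
    return true;
  }
}
```

An in-memory map only works for a single process. A multi-instance deployment would need shared storage for the counters.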

Error Handling

Common failure modes:

  • Blocked: The site rejects scrapers (HTTP 403)
  • Timeout: The site does not respond within the 10-second limit
  • Invalid: The URL returns non-HTML content
  • Protected: The content is behind a paywall
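
One way to map a fetch outcome onto these categories is sketched below. The `FetchOutcome` shape and `classifyFailure` function are hypothetical, and treating a missing Content-Type as non-HTML is an assumption. The sketch does not attempt to detect paywalled content, since the source does not say how that is recognized.

```typescript
type ScrapeFailure = "blocked" | "timeout" | "invalid" | "protected";

// Hypothetical summary of what happened when the URL was fetched.
interface FetchOutcome {
  status?: number;      // HTTP status, if a response arrived
  contentType?: string; // Content-Type header value, if present
  timedOut: boolean;    // true if the 10s limit elapsed
}

// Returns a failure category, or null when the response looks scrapeable.
// "protected" is never returned here: paywall detection is out of scope.
function classifyFailure(outcome: FetchOutcome): ScrapeFailure | null {
  if (outcome.timedOut) return "timeout";
  if (outcome.status === 403) return "blocked";
  const type = (outcome.contentType ?? "").toLowerCase();
  const isHtml =
    type.indexOf("text/html") !== -1 || type.indexOf("application/xhtml+xml") !== -1;
  return isHtml ? null : "invalid";
}
```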

Security

  • URLs are validated before scraping
  • Private IPs (127.x, 10.x, etc.) are blocked
  • User-Agent is set to identify TendSocial
  • SSRF protections in place
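
The URL validation and private-IP blocking above might look roughly like the following. This is a simplified sketch, not TendSocial's actual checks: the function names are hypothetical, and the ranges beyond 127.x and 10.x (172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, 0.0.0.0/8) are assumptions about what "etc." covers. It inspects only literal IPv4 hosts. A complete SSRF defense would also need to check the addresses a hostname resolves to and handle IPv6.

```typescript
// Parses a dotted-quad IPv4 literal into its four octets, or returns null.
function parseIpv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const value = Number(part);
    if (value > 255) return null;
    octets.push(value);
  }
  return octets;
}

// True for loopback, RFC 1918 private, link-local, and "this network" ranges.
function isPrivateIpv4(host: string): boolean {
  const octets = parseIpv4(host);
  if (octets === null) return false;
  const [a, b] = octets;
  return (
    a === 127 ||
    a === 10 ||
    a === 0 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Accepts only http(s) URLs whose host is not localhost or a private IPv4 literal.
function isUrlAllowed(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  if (parsed.hostname === "localhost") return false;
  return !isPrivateIpv4(parsed.hostname);
}
```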

Database Schema

prisma
model ScrapeCache {
  id        String   @id
  url       String   @unique
  data      Json
  scrapedAt DateTime
  expiresAt DateTime
}
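
To show how `scrapedAt` and `expiresAt` could relate to the 1-hour cache lifetime noted under Rate Limiting, here is a small sketch using plain objects rather than the Prisma client. The helper names are hypothetical, and the caller supplies the `id` because the model declares no default for it.

```typescript
const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour, per the Rate Limiting section

// Plain-object mirror of the ScrapeCache model above.
interface ScrapeCacheRow {
  id: string;
  url: string;
  data: unknown;
  scrapedAt: Date;
  expiresAt: Date;
}

// Builds a cache row whose expiry is one TTL after the scrape time.
function buildCacheRow(
  id: string,
  url: string,
  data: unknown,
  now: Date = new Date()
): ScrapeCacheRow {
  return {
    id,
    url,
    data,
    scrapedAt: now,
    expiresAt: new Date(now.getTime() + CACHE_TTL_MS),
  };
}

// A row can be served only while the current time is before expiresAt.
function isFresh(row: ScrapeCacheRow, now: Date = new Date()): boolean {
  return now.getTime() < row.expiresAt.getTime();
}
```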
