OpenClaw Skill v1.0.1

Deep Scraper

by opsun

Performs deep scraping of complex sites like YouTube using containerized Crawlee, extracting validated, ad-free transcripts and content as JSON output.

Latest Changelog

Included Dockerfile in the package

Tags

latest: 1.0.1

Skill Documentation

# Skill: deep-scraper

## Overview
A tool for deep web scraping. It runs Crawlee (Playwright) inside a Docker container to get past the protections on complex websites such as YouTube and X/Twitter, and returns "interception-level" raw data.

## Requirements
1.  **Docker**: Must be installed and running on the host machine.
2.  **Image**: Build the environment with the tag `clawd-crawlee`.
    *   Build command: `docker build -t clawd-crawlee skills/deep-scraper/`

## Integration Guide
Copy the `skills/deep-scraper` directory into your `skills/` folder. Keep the Dockerfile inside the skill directory so that the skill stays self-contained and can be built from that path.

## Standard Interface (CLI)
```bash
docker run -t --rm -v "$(pwd)/skills/deep-scraper/assets:/usr/src/app/assets" clawd-crawlee node assets/main_handler.js [TARGET_URL]
```
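When the scraper is driven from Node instead of a shell, the same invocation can be assembled as an argument array, which avoids shell quoting issues around the URL. The following is a minimal sketch and not part of the skill: `buildDockerArgs` and `runDeepScraper` are hypothetical helper names, and the 16 MB output buffer is an arbitrary assumption chosen to leave room for long transcripts.

```javascript
// Hypothetical helper: mirrors the documented `docker run` command as an argv array.
function buildDockerArgs(targetUrl, projectRoot) {
  // Reject anything that is not an absolute http(s) URL before it reaches the container.
  const url = new URL(targetUrl); // throws TypeError on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  const assetsDir = `${projectRoot}/skills/deep-scraper/assets`;
  return [
    'run', '-t', '--rm',
    '-v', `${assetsDir}:/usr/src/app/assets`,
    'clawd-crawlee',
    'node', 'assets/main_handler.js',
    url.href,
  ];
}

// Hypothetical runner: resolves with whatever the container wrote to stdout.
function runDeepScraper(targetUrl, projectRoot = process.cwd()) {
  const { execFile } = require('node:child_process');
  // Assumption: 16 MB is enough for the JSON output; the default limit is smaller.
  const options = { maxBuffer: 16 * 1024 * 1024 };
  return new Promise((resolve, reject) => {
    execFile('docker', buildDockerArgs(targetUrl, projectRoot), options, (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}
```

Because `execFile` passes each argument to Docker directly, a URL containing `&` or `?` needs no escaping, unlike in the shell form.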

## Output Specification (JSON)
The scraping results are printed to stdout as a JSON string:
- `status`: SUCCESS | PARTIAL | ERROR
- `type`: TRANSCRIPT | DESCRIPTION | GENERIC
- `videoId`: (For YouTube) The validated Video ID.
- `data`: The core text content or transcript.
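A caller may want to check the output against this specification before using it. The sketch below is one way to do that, under two assumptions the specification does not confirm: that stdout contains only the JSON document (no log lines), and that an `ERROR` result may omit `type`. `parseScrapeResult` is a hypothetical name.

```javascript
// Allowed values, taken from the output specification above.
const STATUSES = ['SUCCESS', 'PARTIAL', 'ERROR'];
const TYPES = ['TRANSCRIPT', 'DESCRIPTION', 'GENERIC'];

// Hypothetical helper: parse the scraper's stdout and check the documented fields.
function parseScrapeResult(stdout) {
  // JSON.parse throws SyntaxError if stdout is not a single JSON document.
  const result = JSON.parse(stdout.trim());
  if (!STATUSES.includes(result.status)) {
    throw new Error(`Unexpected status: ${result.status}`);
  }
  // Assumption: `type` is only guaranteed when the scrape did not fail outright.
  if (result.status !== 'ERROR' && !TYPES.includes(result.type)) {
    throw new Error(`Unexpected type: ${result.type}`);
  }
  return result;
}
```

A `PARTIAL` result passes this check, so callers that need complete transcripts should still inspect `status` themselves.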

## Core Rules
1.  **ID Validation**: All YouTube tasks MUST verify the Video ID to prevent cache contamination.
2.  **Privacy**: Scraping password-protected content or non-public personal information is strictly forbidden.
3.  **Alpha-Focused**: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
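
Rule 1 can also be checked on the caller's side by comparing the ID in the requested URL with the `videoId` in the result. How the skill performs its own validation is not documented here, so this is only an illustrative sketch. It assumes that video IDs are 11 characters from the URL-safe alphabet and that only `watch?v=` and `youtu.be/` URL shapes need handling. `extractVideoId` and `assertSameVideo` are hypothetical names.

```javascript
// Assumption: video IDs are 11 characters drawn from A-Z, a-z, 0-9, "_" and "-".
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Returns the video ID found in the URL, or null if none is recognizable.
function extractVideoId(targetUrl) {
  const url = new URL(targetUrl);
  const host = url.hostname.replace(/^www\./, '');
  let candidate = null;
  if (host === 'youtu.be') {
    candidate = url.pathname.slice(1); // short links carry the ID in the path
  } else if (host === 'youtube.com' || host === 'm.youtube.com') {
    candidate = url.searchParams.get('v'); // watch pages carry it in ?v=
  }
  return candidate !== null && VIDEO_ID_PATTERN.test(candidate) ? candidate : null;
}

// Throws if the result does not belong to the video that was requested.
function assertSameVideo(targetUrl, result) {
  const expected = extractVideoId(targetUrl);
  if (expected === null) {
    throw new Error('URL does not contain a recognizable video ID');
  }
  if (result.videoId !== expected) {
    throw new Error(`Video ID mismatch: requested ${expected}, got ${result.videoId}`);
  }
  return result;
}
```

Rejecting a mismatched result, rather than logging it, keeps a stale or cached transcript from being attributed to the wrong video.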