---
name: tech-news-digest
description: Generate tech news digests with unified source model, quality scoring, and multi-format output. Six-source data collection from RSS feeds, Twitter/X KOLs, GitHub releases, GitHub Trending, Reddit, and web search. Pipeline-based scripts with retry mechanisms and deduplication. Supports Discord, email, and markdown templates.
version: "3.14.0"
homepage: https://github.com/draco-agent/tech-news-digest
source: https://github.com/draco-agent/tech-news-digest
metadata:
  openclaw:
    requires:
      bins: ["python3"]
      optionalBins: ["mail", "msmtp", "gog", "gh", "openssl", "weasyprint"]
    env:
      - name: TWITTER_API_BACKEND
        required: false
        description: "Twitter API backend: 'official', 'twitterapiio', or 'auto' (default: auto)"
      - name: X_BEARER_TOKEN
        required: false
        description: Twitter/X API bearer token for KOL monitoring (official backend)
      - name: TWITTERAPI_IO_KEY
        required: false
        description: twitterapi.io API key for KOL monitoring (twitterapiio backend)
      - name: TAVILY_API_KEY
        required: false
        description: Tavily Search API key (alternative to Brave)
      - name: WEB_SEARCH_BACKEND
        required: false
        description: "Web search backend: auto (default), brave, or tavily"
      - name: BRAVE_API_KEYS
        required: false
        description: Brave Search API keys (comma-separated for rotation)
      - name: BRAVE_API_KEY
        required: false
        description: Brave Search API key (single key fallback)
      - name: GITHUB_TOKEN
        required: false
        description: GitHub token for higher API rate limits (auto-generated from GitHub App if not set)
      - name: GH_APP_ID
        required: false
        description: GitHub App ID for automatic installation token generation
      - name: GH_APP_INSTALL_ID
        required: false
        description: GitHub App Installation ID for automatic token generation
      - name: GH_APP_KEY_FILE
        required: false
        description: Path to GitHub App private key PEM file
    tools:
      - python3: Required. Runs data collection and merge scripts.
      - mail: Optional. msmtp-based mail command for email delivery (preferred).
      - gog: Optional. Gmail CLI for email delivery (fallback if mail not available).
    files:
      read:
        - config/defaults/: Default source and topic configurations
        - references/: Prompt templates and output templates
        - scripts/: Python pipeline scripts
        - <workspace>/archive/tech-news-digest/: Previous digests for dedup
      write:
        - /tmp/td-*.json: Temporary pipeline intermediate outputs
        - /tmp/td-email.html: Temporary email HTML body
        - /tmp/td-digest.pdf: Generated PDF digest
        - <workspace>/archive/tech-news-digest/: Saved digest archives
---
# Tech News Digest
Automated tech news digest system with unified data source model, quality scoring pipeline, and template-based output generation.
## Quick Start
1. **Configuration Setup**: Default configs are in `config/defaults/`. Copy to workspace for customization:
```bash
mkdir -p workspace/config
cp config/defaults/sources.json workspace/config/tech-news-digest-sources.json
cp config/defaults/topics.json workspace/config/tech-news-digest-topics.json
```
2. **Environment Variables**:
- `TWITTERAPI_IO_KEY` - twitterapi.io API key (optional, preferred)
- `X_BEARER_TOKEN` - Twitter/X official API bearer token (optional, fallback)
- `TAVILY_API_KEY` - Tavily Search API key, alternative to Brave (optional)
- `WEB_SEARCH_BACKEND` - Web search backend: auto|brave|tavily (optional, default: auto)
- `BRAVE_API_KEYS` - Brave Search API keys, comma-separated for rotation (optional)
- `BRAVE_API_KEY` - Single Brave key fallback (optional)
- `GITHUB_TOKEN` - GitHub personal access token (optional, improves rate limits)
3. **Generate Digest**:
```bash
# Unified pipeline (recommended) — runs all 6 sources in parallel + merge
python3 scripts/run-pipeline.py \
--defaults config/defaults \
--config workspace/config \
--hours 48 --freshness pd \
--archive-dir workspace/archive/tech-news-digest/ \
--output /tmp/td-merged.json --verbose --force
```
4. **Use Templates**: Apply the Discord, email, markdown, or PDF templates from `references/` to the merged output
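Once the merged JSON exists, a template step can shortlist items before rendering. A minimal sketch, assuming a shape like `{"items": [{"title": ..., "score": ...}]}` (the field names are guesses based on the pipeline description, not a documented schema):
```python
import json

def top_items(merged: dict, min_score: int = 5, limit: int = 10) -> list:
    """Return the highest-scored items from a merged pipeline output.

    Assumes items carry a numeric "score" from the quality-scoring step;
    treat the field names as a sketch, not the real schema.
    """
    items = merged.get("items", [])
    kept = [it for it in items if it.get("score", 0) >= min_score]
    kept.sort(key=lambda it: it.get("score", 0), reverse=True)
    return kept[:limit]

# Usage against the pipeline output produced above:
#   merged = json.load(open("/tmp/td-merged.json"))
#   for it in top_items(merged):
#       print(it.get("title"))
```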
## Configuration Files
### `sources.json` - Unified Data Sources
```json
{
"sources": [
{
"id": "openai-rss",
"type": "rss",
"name": "OpenAI Blog",
"url": "https://openai.com/blog/rss.xml",
"enabled": true,
"priority": true,
"topics": ["llm", "ai-agent"],
"note": "Official OpenAI updates"
},
{
"id": "sama-twitter",
"type": "twitter",
"name": "Sam Altman",
"handle": "sama",
"enabled": true,
"priority": true,
"topics": ["llm", "frontier-tech"],
"note": "OpenAI CEO"
}
]
}
```
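The copy-to-workspace step from the Quick Start implies a defaults-with-override lookup. A sketch of that resolution (whole-file override as described; per-entry merging is not claimed here), filtering out `"enabled": false` sources:
```python
import json
from pathlib import Path

def load_sources(defaults_dir="config/defaults",
                 config_dir="workspace/config"):
    """Load sources, preferring the workspace copy over the defaults.

    Mirrors the copy-based customization from the Quick Start; the
    file names match those shown there.
    """
    override = Path(config_dir) / "tech-news-digest-sources.json"
    base = Path(defaults_dir) / "sources.json"
    path = override if override.exists() else base
    data = json.loads(path.read_text())
    # Skip sources explicitly disabled in the config.
    return [s for s in data.get("sources", []) if s.get("enabled", True)]
```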
### `topics.json` - Enhanced Topic Definitions
```json
{
"topics": [
{
"id": "llm",
"emoji": "🧠",
"label": "LLM / Large Models",
"description": "Large Language Models, foundation models, breakthroughs",
"search": {
"queries": ["LLM latest news", "large language model breakthroughs"],
"must_include": ["LLM", "large language model", "foundation model"],
"exclude": ["tutorial", "beginner guide"]
},
"display": {
"max_items": 8,
"style": "detailed"
}
}
]
}
```
## Scripts Pipeline
### `run-pipeline.py` - Unified Pipeline (Recommended)
```bash
python3 scripts/run-pipeline.py \
--defaults config/defaults [--config CONFIG_DIR] \
--hours 48 --freshness pd \
--archive-dir workspace/archive/tech-news-digest/ \
--output /tmp/td-merged.json --verbose --force
```
- **Features**: Runs all 6 fetch steps in parallel, then merges + deduplicates + scores
- **Output**: Final merged JSON ready for report generation (~30s total)
- **Metadata**: Saves per-step timing and counts to `*.meta.json`
- **GitHub Auth**: Auto-generates GitHub App token if `$GITHUB_TOKEN` not set
- **Fallback**: If this fails, run individual scripts below
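The fan-out-then-merge behaviour above can be sketched as follows. The script names are the ones documented here; the orchestration details (worker count, flag handling, error policy) are assumptions, not the pipeline's actual code:
```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_parallel(cmds, max_workers=6):
    """Run each fetch command concurrently and collect exit codes,
    so one failed source does not abort the whole pipeline."""
    def run_step(cmd):
        return subprocess.run(cmd, capture_output=True).returncode
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_step, cmds))

# The six fetch steps named in this document (flags trimmed for the sketch).
FETCH_CMDS = [
    ["python3", "scripts/fetch-rss.py", "--output", "/tmp/td-rss.json"],
    ["python3", "scripts/fetch-twitter.py", "--output", "/tmp/td-twitter.json"],
    ["python3", "scripts/fetch-web.py", "--output", "/tmp/td-web.json"],
    ["python3", "scripts/fetch-github.py", "--output", "/tmp/td-github.json"],
    ["python3", "scripts/fetch-github.py", "--trending", "--output", "/tmp/td-trending.json"],
    ["python3", "scripts/fetch-reddit.py", "--output", "/tmp/td-reddit.json"],
]
```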
### Individual Scripts (Fallback)
#### `fetch-rss.py` - RSS Feed Fetcher
```bash
python3 scripts/fetch-rss.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--verbose]
```
- Parallel fetching (10 workers), retry with backoff, feedparser + regex fallback
- Timeout: 30s per feed, ETag/Last-Modified caching
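The retry-with-backoff behaviour mentioned above follows a standard pattern; a generic sketch (not the script's actual code), with the delay schedule and retry count as assumptions:
```python
import time

def with_backoff(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff
    (base_delay, 2*base_delay, 4*base_delay, ...). Re-raises after the
    final attempt so callers still see hard failures."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```
The `sleep` parameter is injectable so tests (or a rate-limit-aware caller) can observe or replace the delays.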
#### `fetch-twitter.py` - Twitter/X KOL Monitor
```bash
python3 scripts/fetch-twitter.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--backend auto|official|twitterapiio]
```
- Backend auto-detection: uses twitterapi.io if `TWITTERAPI_IO_KEY` set, else official X API v2 if `X_BEARER_TOKEN` set
- Rate limit handling, engagement metrics, retry with backoff
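The backend auto-detection order described above can be sketched as (how the script actually resolves it may differ in detail):
```python
import os

def pick_twitter_backend(env=os.environ):
    """Resolve the Twitter backend: honor an explicit override, then
    prefer twitterapi.io, then the official X API v2 bearer token."""
    forced = env.get("TWITTER_API_BACKEND", "auto")
    if forced in ("official", "twitterapiio"):
        return forced
    if env.get("TWITTERAPI_IO_KEY"):
        return "twitterapiio"
    if env.get("X_BEARER_TOKEN"):
        return "official"
    return None  # no credentials: skip the Twitter step
```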
#### `fetch-web.py` - Web Search Engine
```bash
python3 scripts/fetch-web.py [--defaults DIR] [--config DIR] [--freshness pd] [--output FILE]
```
- Auto-detects Brave API rate limit: paid plans → parallel queries, free → sequential
- Without API: generates search interface for agents
#### `fetch-github.py` - GitHub Releases Monitor
```bash
python3 scripts/fetch-github.py [--defaults DIR] [--config DIR] [--hours 168] [--output FILE]
```
- Parallel fetching (10 workers), 30s timeout
- Auth priority: `$GITHUB_TOKEN` → GitHub App auto-generate → `gh` CLI → unauthenticated (60 req/hr)
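The auth priority chain can be sketched as below; the GitHub App token exchange (a signed JWT traded for an installation token) is left as an unimplemented placeholder, and the exact checks the script performs are assumptions:
```python
import os
import shutil
import subprocess

def generate_app_token(env):
    """Placeholder: sign a JWT with the key at $GH_APP_KEY_FILE and
    exchange it for an installation token. Not implemented in this sketch."""
    raise NotImplementedError

def resolve_github_token(env=os.environ):
    """Resolve a token in the documented priority order:
    $GITHUB_TOKEN -> GitHub App -> `gh` CLI -> unauthenticated (None)."""
    if env.get("GITHUB_TOKEN"):
        return env["GITHUB_TOKEN"]
    if all(env.get(k) for k in ("GH_APP_ID", "GH_APP_INSTALL_ID", "GH_APP_KEY_FILE")):
        return generate_app_token(env)
    if shutil.which("gh"):
        out = subprocess.run(["gh", "auth", "token"], capture_output=True, text=True)
        if out.returncode == 0 and out.stdout.strip():
            return out.stdout.strip()
    return None  # unauthenticated: 60 requests/hour
```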
#### `fetch-github.py --trending` - GitHub Trending Repos
```bash
python3 scripts/fetch-github.py --trending [--hours 48] [--output FILE] [--verbose]
```
- Searches GitHub API for trending repos across 4 topics (LLM, AI Agent, Crypto, Frontier Tech)
- Quality scoring: base 5 + daily_stars_est / 10, max 15
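The scoring formula above is small enough to state directly:
```python
def trending_score(daily_stars_est: float) -> float:
    """Quality score for a trending repo, per the formula above:
    base 5 + daily_stars_est / 10, capped at 15."""
    return min(5 + daily_stars_est / 10, 15)
```
So a repo gaining ~50 stars/day scores 10, and anything past ~100 stars/day hits the cap of 15.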
#### `fetch-reddit.py` - Reddit Posts Fetcher
```bash
python3 scripts/fetch-reddit.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE]
```
- Parallel fetching (4 workers), public JSON API (no auth required)
- 13 subreddits with score filtering
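Reddit's public listing endpoint (`https://www.reddit.com/r/<sub>/new.json`) returns posts nested under `data.children[].data`. A sketch of extracting and score-filtering them (the threshold value is an assumption; the skill's config defines the real ones):
```python
def parse_listing(listing: dict, min_score: int = 0) -> list:
    """Extract post dicts from Reddit's public listing shape
    {"data": {"children": [{"data": {...}}, ...]}} and drop posts
    below the score threshold."""
    children = listing.get("data", {}).get("children", [])
    posts = [c.get("data", {}) for c in children]
    return [p for p in posts if p.get("score", 0) >= min_score]
```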
#### `enrich-articles.py` - Article Full-Text Enrichment
```bash
python3 scripts/enrich-articles.py --input m
```