# --- AI-Powered Curation Walkthrough for breaking-into-blockchain ---
# Step 1: Define the GitHub Actions workflow
# File: .github/workflows/auto-curate.yml
#
# name: Auto-Curate Blockchain Career Resources
# on:
# schedule:
# - cron: '0 9 * * 1' # Every Monday at 9 AM UTC
# workflow_dispatch: # Allow manual triggers for testing
#
# jobs:
# curate:
# # NOTE: create-pull-request needs write access to push the branch and open the PR:
# # permissions:
# #   contents: write
# #   pull-requests: write
# runs-on: ubuntu-latest
# steps:
# - uses: actions/checkout@v4
#
# - name: Set up Python
# uses: actions/setup-python@v5
# with:
# python-version: '3.11'
#
# - name: Install dependencies
# run: pip install -r scripts/requirements.txt
# # requirements.txt includes: openai, anthropic, httpx, beautifulsoup4,
# # feedparser, python-dateutil
#
# - name: Run curation script
# env:
# ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
# OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
# run: python scripts/curate.py
#
# - name: Create draft PR if changes exist
# uses: peter-evans/create-pull-request@v6
# with:
# title: '🤖 Auto-curated resources — ${{ github.run_id }}'
# body: 'AI-discovered and summarized Korean blockchain career resources. Please review before merging.'
# branch: auto-curate/${{ github.run_id }}
# draft: true
# labels: auto-curated
#
# link-check:
# runs-on: ubuntu-latest
# steps:
# - uses: actions/checkout@v4
# - name: Check for broken links in README.md
# id: lychee  # required so the next step can read steps.lychee.outputs.exit_code
# uses: lycheeverse/lychee-action@v1
# with:
# args: '--verbose --no-progress README.md'
# fail: false
# - name: Open issue for broken links
# if: steps.lychee.outputs.exit_code != 0
# uses: peter-evans/create-issue-from-file@v5
# with:
# title: '🔗 Broken links detected in README.md'
# content-filepath: ./lychee/out.md
# labels: broken-link
# -----------------------------------------------------------------
# Step 2: The core curation Python script
# File: scripts/curate.py
# -----------------------------------------------------------------
# import httpx
# import feedparser
# import json
# import re
# from pathlib import Path
# from bs4 import BeautifulSoup
# from anthropic import Anthropic
#
# # --- Configuration ---
# KEYWORDS = ["블록체인 취업", "블록체인 이력서", "크립토 면접", "웹3 커리어",
# "web3 career korea", "블록체인 개발자 후기", "크립토 채용"]
#
# # Define sources: Naver blog search API, Velog RSS, specific Twitter/X accounts, etc.
# NAVER_SEARCH_URL = "https://openapi.naver.com/v1/search/blog.json"
# VELOG_RSS_FEEDS = [
# "https://v2.velog.io/rss/@blockchain-career",
# # Add more known Korean blockchain career bloggers
# ]
#
# # Section mapping matches the existing README.md structure
# SECTIONS = {
# "일반": "General blockchain career advice and ecosystem overviews",
# "이력서": "Resume writing tips, portfolio examples, LinkedIn optimization",
# "취업/면접 후기": "Job search experiences and interview retrospectives",
# "채용 공고 사이트": "Job boards and hiring platforms",
# }
#
# client = Anthropic()
#
# def crawl_sources() -> list[dict]:
# """Crawl configured Korean blockchain sources for candidate URLs."""
# candidates = []
# # Example: crawl Velog RSS feeds
# for feed_url in VELOG_RSS_FEEDS:
# feed = feedparser.parse(feed_url)
# for entry in feed.entries[:20]: # Last 20 entries
# candidates.append({
# "url": entry.link,
# "title": entry.title,
# "snippet": BeautifulSoup(entry.get("summary", ""), "html.parser").get_text()[:500]
# })
# # Similarly crawl Naver blog search, Brunch, Twitter/X API, etc.
# # ... (additional source crawlers)
# return candidates
#
#
# def classify_and_summarize(candidate: dict) -> dict | None:
# """Use Claude to determine relevance, classify section, and summarize in Korean."""
# prompt = f"""You are a Korean-language content curator for a blockchain career resource list.
#
# Evaluate this content:
# Title: {candidate['title']}
# URL: {candidate['url']}
# Snippet: {candidate['snippet']}
#
# Tasks:
# 1. Is this relevant to blockchain/web3 careers in Korea? (yes/no)
# 2. If yes, classify into one of these sections: {list(SECTIONS.keys())}
# 3. Write a 1-2 sentence Korean summary suitable for a curated README list.
#
# Return JSON: {{"relevant": bool, "section": str, "summary_ko": str}}
# Return ONLY the JSON, no other text."""
#
# response = client.messages.create(
# model="claude-sonnet-4-20250514",
# max_tokens=300,
# messages=[{"role": "user", "content": prompt}]
# )
# result = json.loads(response.content[0].text)  # NOTE(review): may raise if the model wraps the JSON in prose or code fences, despite the "ONLY the JSON" instruction — consider a try/except with a regex fallback
# if result.get("relevant"):
# return {
# "url": candidate["url"],
# "title": candidate["title"],
# "section": result["section"],
# "summary_ko": result["summary_ko"]
# }
# return None
#
#
# def load_existing_urls(readme_path: Path) -> set[str]:
# """Parse README.md and extract all existing URLs to avoid duplicates."""
# content = readme_path.read_text(encoding="utf-8")
# return set(re.findall(r'https?://[^\s\)]+', content))
#
#
# def inject_into_readme(readme_path: Path, new_resources: list[dict]):
# """Insert new resources into the appropriate sections of README.md."""
# content = readme_path.read_text(encoding="utf-8")
# for resource in new_resources:
# section_header = f"## {resource['section']}"
# entry_line = f"- [{resource['title']}]({resource['url']}) - {resource['summary_ko']}"
# # Find the section and append the new entry after the header
# if section_header in content:
# content = content.replace(
# section_header,
# f"{section_header}\n{entry_line}",
# 1
# )
# readme_path.write_text(content, encoding="utf-8")
#
#
# def main():
# readme_path = Path("README.md")
# existing_urls = load_existing_urls(readme_path)
# candidates = crawl_sources()
#
# # Filter out URLs we already have
# new_candidates = [c for c in candidates if c["url"] not in existing_urls]
# print(f"Found {len(new_candidates)} new candidate URLs to evaluate.")
#
# approved_resources = []
# for candidate in new_candidates:
# result = classify_and_summarize(candidate)
# if result:
# approved_resources.append(result)
# print(f" ✅ {result['section']}: {result['title']}")
#
# if approved_resources:
# inject_into_readme(readme_path, approved_resources)
# print(f"\n📝 Added {len(approved_resources)} new resources to README.md")
# else:
# print("\nNo new relevant resources found this cycle.")
#
# if __name__ == "__main__":
# main()
# -----------------------------------------------------------------
# Step 3: Test locally before pushing
# -----------------------------------------------------------------
# $ export ANTHROPIC_API_KEY=sk-ant-...
# $ python scripts/curate.py
# Found 14 new candidate URLs to evaluate.
# ✅ 취업/면접 후기: 크립토 스타트업 면접 회고록 — 3개월간의 여정
# ✅ 이력서: 블록체인 개발자 포트폴리오 작성법 2026
# ✅ 일반: 한국 웹3 생태계 취업 현황과 전망
# 📝 Added 3 new resources to README.md
#
# Then review the diff, push to a branch, and validate the draft PR.
# The cron-triggered workflow will do all of this automatically going forward.