Documentation Added:
- ARCHITECTURE_DECISIONS.md: Explains why systemd over k8s (TikTok display requirements)
- DEPLOYMENT_CHECKLIST.md: Step-by-step deployment procedures
- ROLLBACK_PROCEDURES.md: Emergency rollback and recovery procedures
- test_production_deployment.py: Automated deployment verification script

Key Documentation Highlights:
- Detailed explanation of containerization limitations with browser automation
- Complete deployment checklist with pre/post verification steps
- Rollback scenarios with recovery time objectives
- Emergency contact templates and backup procedures
- Automated test script for production readiness

17 of 25 tasks completed (68% done)
Remaining work focuses on spec compliance and testing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Architecture Decisions
Why Systemd Instead of Kubernetes/Docker
Decision
We deploy to production with systemd services rather than the Kubernetes/Docker containerization called for in the original specification.
Context
The original specification called for:
- Docker containerization with multi-stage builds
- Kubernetes deployment with CronJobs
- Running on a Kubernetes cluster control plane node
Problem
TikTok scraping using the Scrapling library requires:
- Display Server Access: Scrapling uses a real browser (Chromium) for JavaScript rendering
- X11/Wayland Session: Browser automation needs GUI environment variables (DISPLAY, XAUTHORITY)
- GPU Acceleration: Optional but improves performance for browser rendering
- Session Persistence: Browser cookies and local storage for authentication
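The display requirements above can be checked before the scraper starts. The following is a minimal sketch, assuming an X11 session (a Wayland session would use WAYLAND_DISPLAY instead); the function name display_problems is hypothetical and not part of the project or of Scrapling.

```python
import os
from pathlib import Path
from typing import Mapping


def display_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with the X11 environment (empty if none)."""
    problems = []

    # The browser cannot open a window without a target display.
    if not env.get("DISPLAY", ""):
        problems.append("DISPLAY is not set")

    # XAUTHORITY must point at an existing cookie file for the X server
    # to accept the connection.
    xauth = env.get("XAUTHORITY", "")
    if not xauth:
        problems.append("XAUTHORITY is not set")
    elif not Path(xauth).is_file():
        problems.append(f"XAUTHORITY file not found: {xauth}")

    return problems


if __name__ == "__main__":
    for problem in display_problems(os.environ):
        print(f"display check failed: {problem}")
```

Running a check like this at startup turns a silent browser launch failure into an explicit message in the service log.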
Why Containers Don't Work
Technical Limitations
- No Native Display Server: Containers don't have built-in X11/Wayland support
- Complex Workarounds:
  - X11 forwarding requires mounting the /tmp/.X11-unix socket
  - Needs host network mode for display access
  - Requires privileged mode for GPU access
  - Security implications of running privileged containers
- Environment Variables:
  - DISPLAY and XAUTHORITY are host-specific
  - Change between reboots
  - Difficult to manage in container orchestration
- Browser Automation Issues:
  - Headless mode doesn't work for all TikTok features
  - Virtual displays (Xvfb) are unreliable for modern web apps
  - WebGL and video playback issues in virtual displays
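Because DISPLAY can change between reboots, one option is to discover it at runtime instead of hard-coding it. The sketch below assumes the conventional X11 layout, where display :N is served by the socket /tmp/.X11-unix/XN; the helper name discover_displays is hypothetical.

```python
import re
from pathlib import Path

# X servers name their Unix sockets X0, X1, ... for displays :0, :1, ...
_SOCKET_NAME = re.compile(r"^X(\d+)$")


def discover_displays(socket_dir: str = "/tmp/.X11-unix") -> list[str]:
    """List DISPLAY values (":0", ":1", ...) for the X sockets in socket_dir."""
    root = Path(socket_dir)
    if not root.is_dir():
        return []

    numbers = []
    for entry in root.iterdir():
        match = _SOCKET_NAME.match(entry.name)
        if match:
            numbers.append(int(match.group(1)))

    # Sort numerically so ":10" comes after ":2".
    return [f":{n}" for n in sorted(numbers)]
```

On a host this is a directory listing; inside a container the same lookup only works after the socket directory has been mounted in, which is the workaround described above.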
Systemd Advantages
- Native Environment Access:
  - Direct access to host display server
  - Can read environment variables from user session
  - No abstraction layer complications
- Simpler Configuration:
  - Single service file vs Dockerfile + k8s manifests
  - Easy to debug and troubleshoot
  - Native logging with journald
- Resource Management:
  - CPU and memory limits via systemd
  - Automatic restart on failure
  - Built-in timer units for scheduling
- Production Ready:
  - Battle-tested for system services
  - Excellent integration with Linux systems
  - No additional overhead
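The resource limits, restart policy, and timer scheduling listed above map onto a handful of systemd directives (MemoryMax=, CPUQuota=, Restart=, OnCalendar=). As an illustration only, the sketch below renders a service and timer pair from Python. The unit name, command path, limits, and schedule are assumptions for the example, not the project's actual configuration.

```python
def render_units(name: str, exec_start: str, on_calendar: str = "hourly") -> dict[str, str]:
    """Return {filename: contents} for a service unit and its matching timer unit."""
    service = "\n".join([
        "[Unit]",
        f"Description={name} scraper",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        'Environment="DISPLAY=:0"',
        "Restart=on-failure",   # automatic restart on failure
        "RestartSec=30",
        "MemoryMax=2G",         # memory limit (example value)
        "CPUQuota=50%",         # CPU limit (example value)
        "",
    ])
    timer = "\n".join([
        "[Unit]",
        f"Description=Run {name} on a schedule",
        "",
        "[Timer]",
        f"OnCalendar={on_calendar}",  # replaces the Kubernetes CronJob schedule
        "Persistent=true",            # catch up on runs missed while powered off
        "",
        "[Install]",
        "WantedBy=timers.target",
        "",
    ])
    return {f"{name}.service": service, f"{name}.timer": timer}


if __name__ == "__main__":
    # Hypothetical name and path, for illustration only.
    units = render_units("tiktok-scraper", "/usr/bin/python3 /opt/scraper/scraper.py")
    for filename, contents in units.items():
        print(f"# {filename}\n{contents}")
```

A timer activates the service with the same base name by default, so the pair stands in for the CronJob from the original specification.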
Implementation
# systemd service can access display directly
[Service]
Environment="DISPLAY=:0"
Environment="XAUTHORITY=/run/user/1000/.Xauthority"
vs
# Docker requires complex workarounds
FROM python:3.11
# Need to install X11 libraries
RUN apt-get update && apt-get install -y xvfb x11vnc
# Run virtual display (unreliable)
CMD xvfb-run -a python scraper.py
Trade-offs
Lost Benefits of Containerization:
- Platform independence
- Easy scaling across nodes
- Isolated dependencies
- Reproducible builds
Gained Benefits:
- Simpler deployment
- Direct hardware access
- Lower overhead
- Easier debugging
- Native browser automation
Alternatives Considered
- Selenium Grid: Too complex for single-node deployment
- Puppeteer in Docker: Still requires display server workarounds
- Headless Chrome: Doesn't work reliably with TikTok
- API-only approach: TikTok has no public API
Conclusion
For this specific use case where:
- Browser automation with display access is required
- Single node deployment is sufficient
- Simplicity and reliability are priorities
Systemd provides a more appropriate solution than containerization.
Future Considerations
If containerization becomes necessary:
- Consider separating TikTok scraper as standalone service
- Use container for non-browser scrapers only
- Investigate newer solutions like playwright-docker
- Re-evaluate when TikTok provides official API
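If the split between browser and non-browser scrapers is pursued, the routing decision could be as simple as a flag on each scraper. This is a rough sketch under that assumption; the Scraper type, the needs_display flag, and the scraper names are hypothetical and do not come from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scraper:
    name: str
    needs_display: bool  # True when the scraper drives a real browser window


def plan_deployment(scrapers: list[Scraper]) -> dict[str, list[str]]:
    """Group scraper names by deployment target: systemd host service or container."""
    plan: dict[str, list[str]] = {"systemd": [], "container": []}
    for scraper in scrapers:
        target = "systemd" if scraper.needs_display else "container"
        plan[target].append(scraper.name)
    return plan


if __name__ == "__main__":
    # Hypothetical scraper names, for illustration only.
    print(plan_deployment([Scraper("tiktok", True), Scraper("rss", False)]))
```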
Decision Date: 2024-12-18
Decision Makers: Development Team
Status: Implemented