Documentation Added:
- ARCHITECTURE_DECISIONS.md: Explains why systemd over k8s (TikTok display requirements)
- DEPLOYMENT_CHECKLIST.md: Step-by-step deployment procedures
- ROLLBACK_PROCEDURES.md: Emergency rollback and recovery procedures
- test_production_deployment.py: Automated deployment verification script

Key Documentation Highlights:
- Detailed explanation of containerization limitations with browser automation
- Complete deployment checklist with pre/post verification steps
- Rollback scenarios with recovery time objectives
- Emergency contact templates and backup procedures
- Automated test script for production readiness

17 of 25 tasks completed (68% done)
Remaining work focuses on spec compliance and testing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Architecture Decisions

## Why Systemd Instead of Kubernetes/Docker

### Decision

We chose to use systemd services for production deployment instead of the originally specified Kubernetes/Docker containerization.

### Context

The original specification called for:

- Docker containerization with multi-stage builds
- Kubernetes deployment with CronJobs
- Running on a Kubernetes cluster control plane node

### Problem

TikTok scraping using the Scrapling library requires:

1. **Display Server Access**: Scrapling uses a real browser (Chromium) for JavaScript rendering
2. **X11/Wayland Session**: Browser automation needs GUI environment variables (`DISPLAY`, `XAUTHORITY`)
3. **GPU Acceleration**: Optional but improves performance for browser rendering
4. **Session Persistence**: Browser cookies and local storage for authentication

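
The display requirements above can be checked before the browser is launched. The following is a minimal sketch, not the project's actual code: the function name `check_display_env` is hypothetical, and it only covers the two X11 variables named in this document (`DISPLAY` and `XAUTHORITY`), not a Wayland session.

```python
import os
from typing import List, Mapping


def check_display_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the check passed.

    Hypothetical helper: it verifies only the variables this document names.
    """
    problems: List[str] = []

    # The browser cannot open a window without knowing which display to use.
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set")

    # The X authority file holds the cookie that grants access to the display.
    xauthority = env.get("XAUTHORITY", "")
    if not xauthority:
        problems.append("XAUTHORITY is not set")
    elif not os.access(xauthority, os.R_OK):
        problems.append(f"XAUTHORITY file is not readable: {xauthority}")

    return problems
```

A scraper entry point could call `check_display_env(os.environ)` and exit with a clear message when the list is non-empty, rather than failing later inside the browser.
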
### Why Containers Don't Work

#### Technical Limitations

1. **No Native Display Server**: Containers don't have built-in X11/Wayland support

2. **Complex Workarounds**:
   - X11 forwarding requires mounting `/tmp/.X11-unix` socket
   - Needs host network mode for display access
   - Requires privileged mode for GPU access
   - Security implications of running privileged containers

3. **Environment Variables**:
   - `DISPLAY` and `XAUTHORITY` are host-specific
   - Change between reboots
   - Difficult to manage in container orchestration

4. **Browser Automation Issues**:
   - Headless mode doesn't work for all TikTok features
   - Virtual displays (Xvfb) are unreliable for modern web apps
   - WebGL and video playback issues in virtual displays

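
To make the workarounds listed above concrete, the sketch below assembles the `docker run` command line they imply. It is an illustration only: the function name `docker_run_args` and the image name in the usage note are hypothetical, and the flag set simply mirrors the list above rather than a tested configuration.

```python
from typing import List

# Directory that holds the host's X11 sockets, as named in the list above.
X11_SOCKET_DIR = "/tmp/.X11-unix"


def docker_run_args(image: str, display: str, xauthority: str) -> List[str]:
    """Build the argv for a container that borrows the host's X11 display."""
    return [
        "docker", "run", "--rm",
        # Host network mode, listed above as needed for display access.
        "--network", "host",
        # Privileged mode for GPU access; this is the main security concern.
        "--privileged",
        # Share the X11 socket directory with the container.
        "--volume", f"{X11_SOCKET_DIR}:{X11_SOCKET_DIR}",
        # Share the X authority file read-only so the browser can authenticate.
        "--volume", f"{xauthority}:{xauthority}:ro",
        # Host-specific values that must be re-resolved whenever they change.
        "--env", f"DISPLAY={display}",
        "--env", f"XAUTHORITY={xauthority}",
        image,
    ]
```

For example, `docker_run_args("tiktok-scraper:latest", ":0", "/run/user/1000/.Xauthority")` yields a command with four host-coupled settings. Each of them would have to be templated per node in an orchestrator, which is the management burden described above.
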
### Systemd Advantages

1. **Native Environment Access**:
   - Direct access to host display server
   - Can read environment variables from user session
   - No abstraction layer complications

2. **Simpler Configuration**:
   - Single service file vs Dockerfile + k8s manifests
   - Easy to debug and troubleshoot
   - Native logging with journald

3. **Resource Management**:
   - CPU and memory limits via systemd
   - Automatic restart on failure
   - Built-in timer units for scheduling

4. **Production Ready**:
   - Battle-tested for system services
   - Excellent integration with Linux systems
   - No additional overhead

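
As a rough illustration of the resource-management points above, the sketch below renders a service unit and a matching timer unit as text. The function names, the default limits (`2G`, `100%`), the restart delay, and the paths in the usage note are assumptions for the example, not values taken from the actual deployment. The directives themselves (`MemoryMax=`, `CPUQuota=`, `Restart=`, `RestartSec=`, `OnCalendar=`, `Persistent=`) are standard systemd options.

```python
from typing import List


def render_service(exec_start: str, user: str, display: str = ":0",
                   xauthority: str = "/run/user/1000/.Xauthority",
                   memory_max: str = "2G", cpu_quota: str = "100%") -> str:
    """Render a service unit with display access, limits, and restart policy."""
    lines: List[str] = [
        "[Unit]",
        "Description=Scraper service (illustrative)",
        "",
        "[Service]",
        f"User={user}",
        # Display access, as in the Implementation section of this document.
        f'Environment="DISPLAY={display}"',
        f'Environment="XAUTHORITY={xauthority}"',
        f"ExecStart={exec_start}",
        # Automatic restart on failure, with an assumed 30-second delay.
        "Restart=on-failure",
        "RestartSec=30",
        # CPU and memory limits enforced by systemd through cgroups.
        f"MemoryMax={memory_max}",
        f"CPUQuota={cpu_quota}",
    ]
    return "\n".join(lines) + "\n"


def render_timer(on_calendar: str) -> str:
    """Render a timer unit that starts the same-named service on a schedule."""
    lines: List[str] = [
        "[Unit]",
        "Description=Scraper schedule (illustrative)",
        "",
        "[Timer]",
        f"OnCalendar={on_calendar}",
        # Run a missed job after boot if the machine was off at the scheduled time.
        "Persistent=true",
        "",
        "[Install]",
        "WantedBy=timers.target",
    ]
    return "\n".join(lines) + "\n"
```

Under these assumptions, `render_service("/usr/bin/python3 /opt/scraper/scraper.py", "scraper")` and `render_timer("hourly")` would produce the two files that take the place of a Dockerfile, a CronJob manifest, and the associated resource specifications.
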
### Implementation

```ini
# systemd service can access display directly
[Service]
Environment="DISPLAY=:0"
Environment="XAUTHORITY=/run/user/1000/.Xauthority"
```

vs

```dockerfile
# Docker requires complex workarounds
FROM python:3.11
# Need to install X11 libraries (refresh the package index first, install non-interactively)
RUN apt-get update && apt-get install -y xvfb x11vnc
# Run virtual display (unreliable)
CMD ["xvfb-run", "-a", "python", "scraper.py"]
```

### Trade-offs

**Lost Benefits of Containerization:**
- Platform independence
- Easy scaling across nodes
- Isolated dependencies
- Reproducible builds

**Gained Benefits:**
- Simpler deployment
- Direct hardware access
- Lower overhead
- Easier debugging
- Native browser automation

### Alternatives Considered

1. **Selenium Grid**: Too complex for single-node deployment
2. **Puppeteer in Docker**: Still requires display server workarounds
3. **Headless Chrome**: Doesn't work reliably with TikTok
4. **API-only approach**: TikTok has no public API

### Conclusion

For this specific use case, where:
- Browser automation with display access is required
- Single-node deployment is sufficient
- Simplicity and reliability are priorities

systemd is a more appropriate solution than containerization.

### Future Considerations

If containerization becomes necessary:
1. Consider separating TikTok scraper as standalone service
2. Use container for non-browser scrapers only
3. Investigate newer solutions like playwright-docker
4. Re-evaluate when TikTok provides official API

---

*Decision Date: 2024-12-18*
*Decision Makers: Development Team*
*Status: Implemented*