Documentation Added:
- ARCHITECTURE_DECISIONS.md: Explains why systemd over k8s (TikTok display requirements)
- DEPLOYMENT_CHECKLIST.md: Step-by-step deployment procedures
- ROLLBACK_PROCEDURES.md: Emergency rollback and recovery procedures
- test_production_deployment.py: Automated deployment verification script

Key Documentation Highlights:
- Detailed explanation of containerization limitations with browser automation
- Complete deployment checklist with pre/post verification steps
- Rollback scenarios with recovery time objectives
- Emergency contact templates and backup procedures
- Automated test script for production readiness

17 of 25 tasks completed (68% done)
Remaining work focuses on spec compliance and testing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

126 lines, No EOL, 3.7 KiB, Markdown

# Architecture Decisions

## Why Systemd Instead of Kubernetes/Docker

### Decision
We use systemd services for production deployment instead of the Kubernetes/Docker containerization called for in the original specification.

### Context
The original specification called for:
- Docker containerization with multi-stage builds
- Kubernetes deployment with CronJobs
- Running on a Kubernetes cluster control plane node

### Problem
TikTok scraping with the Scrapling library requires:
1. **Display Server Access**: Scrapling drives a real browser (Chromium) to render JavaScript
2. **X11/Wayland Session**: Browser automation needs the GUI session's environment variables (`DISPLAY`, `XAUTHORITY`)
3. **GPU Acceleration**: Optional, but improves browser rendering performance
4. **Session Persistence**: Browser cookies and local storage must persist between runs for authentication

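The display requirements above can be checked before a scrape starts. The following is a minimal sketch, not part of the existing codebase: the function name `check_display_env` is hypothetical, and it only verifies that `DISPLAY` and `XAUTHORITY` are set and that the authority file exists. It does not prove the X server is reachable.

```python
import os
from pathlib import Path


def check_display_env(env=None):
    """Return a list of problems with the display environment.

    An empty list means DISPLAY and XAUTHORITY look usable.
    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []

    # The browser needs to know which X display to connect to.
    if not env.get("DISPLAY", ""):
        problems.append("DISPLAY is not set")

    # The X authority file holds the cookie used to authenticate to the display.
    xauthority = env.get("XAUTHORITY", "")
    if not xauthority:
        problems.append("XAUTHORITY is not set")
    elif not Path(xauthority).is_file():
        problems.append(f"XAUTHORITY file not found: {xauthority}")

    return problems


if __name__ == "__main__":
    for problem in check_display_env():
        print(f"display preflight: {problem}")
```

Running a check like this first would let a service fail fast with a clear message instead of a browser launch error.
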
### Why Containers Don't Work

#### Technical Limitations
1. **No Native Display Server**: Containers don't have built-in X11/Wayland support
2. **Complex Workarounds**:
   - X11 forwarding requires mounting the host's `/tmp/.X11-unix` socket into the container
   - May also need host network mode, depending on how the display is exposed
   - GPU access requires device passthrough or privileged mode
   - Privileged containers weaken isolation and carry security risks

3. **Environment Variables**:
   - `DISPLAY` and `XAUTHORITY` are host-specific
   - Their values can change between reboots
   - They are difficult to manage in container orchestration

4. **Browser Automation Issues**:
   - Headless mode doesn't work for all TikTok features
   - Virtual displays (Xvfb) are unreliable for modern web apps
   - WebGL and video playback have issues in virtual displays

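To show how much host-specific state the workarounds above push into the container invocation, here is a rough sketch that assembles a `docker run` command line. The function name `docker_run_args` and the image name used in the usage example are hypothetical. The flags (`--rm`, `-v`, `-e`, `--device`) are standard Docker CLI options, but this is an illustration of the approach that was rejected, not a tested deployment recipe.

```python
def docker_run_args(image, display=":0",
                    xauthority="/run/user/1000/.Xauthority", use_gpu=False):
    """Build a `docker run` argv that shares the host X11 display."""
    args = [
        "docker", "run", "--rm",
        # Share the host's X11 Unix socket with the container.
        "-v", "/tmp/.X11-unix:/tmp/.X11-unix",
        # Mount the host's X authority file read-only at the same path.
        "-v", f"{xauthority}:{xauthority}:ro",
        # Both values are host-specific and must be supplied at run time.
        "-e", f"DISPLAY={display}",
        "-e", f"XAUTHORITY={xauthority}",
    ]
    if use_gpu:
        # Pass through the DRM devices rather than using --privileged.
        args += ["--device", "/dev/dri"]
    args.append(image)
    return args


if __name__ == "__main__":
    # "tiktok-scraper:latest" is a made-up image name for illustration.
    print(" ".join(docker_run_args("tiktok-scraper:latest", use_gpu=True)))
```

Every argument except the image name depends on the state of the host session, which is what makes this hard to express in a Kubernetes manifest.
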
### Systemd Advantages

1. **Native Environment Access**:
   - Direct access to the host display server
   - Can read environment variables from the user session
   - No container abstraction layer between the browser and the display

2. **Simpler Configuration**:
   - A single service file instead of a Dockerfile plus k8s manifests
   - Easy to debug and troubleshoot
   - Native logging with journald

3. **Resource Management**:
   - CPU and memory limits via systemd
   - Automatic restart on failure
   - Built-in timer units for scheduling

4. **Production Ready**:
   - Battle-tested for system services
   - Integrates well with Linux systems
   - No container runtime overhead

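The resource management points above map onto a handful of unit directives. The sketch below renders a service unit and a matching timer unit as text. The function name `render_units`, the unit name, the `ExecStart` path, and all limit values are assumptions for illustration. The directives themselves (`Restart=`, `RestartSec=`, `CPUQuota=`, `MemoryMax=`, `OnCalendar=`, `Persistent=`) are standard systemd options.

```python
SERVICE_TEMPLATE = """\
[Unit]
Description={name} (hypothetical example unit)

[Service]
Environment="DISPLAY={display}"
Environment="XAUTHORITY={xauthority}"
ExecStart={exec_start}
# Automatic restart on failure
Restart=on-failure
RestartSec=30
# CPU and memory limits
CPUQuota={cpu_quota}
MemoryMax={memory_max}
"""

TIMER_TEMPLATE = """\
[Unit]
Description=Schedule for {name}

[Timer]
OnCalendar={on_calendar}
Persistent=true

[Install]
WantedBy=timers.target
"""


def render_units(name, exec_start, display=":0",
                 xauthority="/run/user/1000/.Xauthority",
                 cpu_quota="50%", memory_max="1G", on_calendar="hourly"):
    """Return {filename: unit text} for a service and its timer."""
    service = SERVICE_TEMPLATE.format(
        name=name, display=display, xauthority=xauthority,
        exec_start=exec_start, cpu_quota=cpu_quota, memory_max=memory_max,
    )
    timer = TIMER_TEMPLATE.format(name=name, on_calendar=on_calendar)
    return {f"{name}.service": service, f"{name}.timer": timer}


if __name__ == "__main__":
    # The unit name and script path are placeholders, not the real layout.
    units = render_units("tiktok-scraper",
                         "/usr/bin/python3 /opt/scraper/scraper.py")
    for filename, text in units.items():
        print(f"# {filename}\n{text}")
```

The timer unit takes the place of the Kubernetes CronJob from the original specification, and the limits are applied by systemd without a container runtime.
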
### Implementation

```ini
# systemd service can access display directly
[Service]
Environment="DISPLAY=:0"
Environment="XAUTHORITY=/run/user/1000/.Xauthority"
```

vs

```dockerfile
# Docker requires complex workarounds
FROM python:3.11
# Need to install X11 libraries; update the package index first and
# pass -y so the build does not stop at the confirmation prompt
RUN apt-get update && apt-get install -y xvfb x11vnc
# Run virtual display (unreliable)
CMD ["xvfb-run", "-a", "python", "scraper.py"]
```

### Trade-offs

**Lost Benefits of Containerization:**
- Platform independence
- Easy scaling across nodes
- Isolated dependencies
- Reproducible builds

**Gained Benefits:**
- Simpler deployment
- Direct hardware access
- Lower overhead
- Easier debugging
- Native browser automation

### Alternatives Considered

1. **Selenium Grid**: Too complex for single-node deployment
2. **Puppeteer in Docker**: Still requires display server workarounds
3. **Headless Chrome**: Doesn't work reliably with TikTok
4. **API-only approach**: TikTok has no public API

### Conclusion

For this specific use case, where:
- Browser automation with display access is required
- Single-node deployment is sufficient
- Simplicity and reliability are priorities

Systemd provides a more appropriate solution than containerization.

### Future Considerations

If containerization becomes necessary:
1. Consider separating the TikTok scraper into a standalone service
2. Use containers for the non-browser scrapers only
3. Investigate newer solutions like playwright-docker
4. Re-evaluate when TikTok provides an official API

---

*Decision Date: 2024-12-18*
*Decision Makers: Development Team*
*Status: Implemented*