# Architecture Decisions

## Why Systemd Instead of Kubernetes/Docker

### Decision

We chose systemd services for production deployment instead of the originally specified Kubernetes/Docker containerization.

### Context

The original specification called for:

- Docker containerization with multi-stage builds
- Kubernetes deployment with CronJobs
- Running on a Kubernetes cluster control plane node

### Problem

TikTok scraping using the Scrapling library requires:

1. **Display Server Access**: Scrapling uses a real browser (Chromium) for JavaScript rendering
2. **X11/Wayland Session**: Browser automation needs GUI environment variables (DISPLAY, XAUTHORITY)
3. **GPU Acceleration**: Optional, but improves browser rendering performance
4. **Session Persistence**: Browser cookies and local storage for authentication

### Why Containers Don't Work

#### Technical Limitations

1. **No Native Display Server**: Containers don't have built-in X11/Wayland support
2. **Complex Workarounds**:
   - X11 forwarding requires mounting the `/tmp/.X11-unix` socket
   - Needs host network mode for display access
   - Requires privileged mode for GPU access
   - Running privileged containers has security implications
3. **Environment Variables**:
   - DISPLAY and XAUTHORITY are host-specific
   - They can change between reboots
   - They are difficult to manage in container orchestration
4. **Browser Automation Issues**:
   - Headless mode doesn't work for all TikTok features
   - Virtual displays (Xvfb) are unreliable for modern web apps
   - WebGL and video playback have issues in virtual displays

### Systemd Advantages

1. **Native Environment Access**:
   - Direct access to the host display server
   - Can read environment variables from the user session
   - No abstraction layer complications
2. **Simpler Configuration**:
   - A single service file vs Dockerfile + k8s manifests
   - Easy to debug and troubleshoot
   - Native logging with journald
3. **Resource Management**:
   - CPU and memory limits via systemd
   - Automatic restart on failure
   - Built-in timer units for scheduling
4. **Production Ready**:
   - Battle-tested for system services
   - Excellent integration with Linux systems
   - No additional container runtime overhead

### Implementation

```ini
# A systemd service can access the display directly
# (host-specific values taken from the user session)
[Service]
Environment="DISPLAY=:0"
Environment="XAUTHORITY=/run/user/1000/.Xauthority"
```

vs

```dockerfile
# Docker requires complex workarounds
FROM python:3.11
# Need to install X11 libraries (update the package index first, install non-interactively)
RUN apt-get update && apt-get install -y xvfb x11vnc
# Run under a virtual display (unreliable)
CMD ["xvfb-run", "-a", "python", "scraper.py"]
```

### Trade-offs

**Lost Benefits of Containerization:**

- Platform independence
- Easy scaling across nodes
- Isolated dependencies
- Reproducible builds

**Gained Benefits:**

- Simpler deployment
- Direct hardware access
- Lower overhead
- Easier debugging
- Native browser automation

### Alternatives Considered

1. **Selenium Grid**: Too complex for single-node deployment
2. **Puppeteer in Docker**: Still requires display server workarounds
3. **Headless Chrome**: Doesn't work reliably with TikTok
4. **API-only approach**: TikTok has no public API

### Conclusion

For this specific use case, where:

- Browser automation with display access is required
- Single-node deployment is sufficient
- Simplicity and reliability are priorities

systemd provides a more appropriate solution than containerization.

### Future Considerations

If containerization becomes necessary:

1. Consider separating the TikTok scraper into a standalone service
2. Use containers for non-browser scrapers only
3. Investigate newer solutions like playwright-docker
4. Re-evaluate if TikTok provides an official API

---

*Decision Date: 2024-12-18*
*Decision Makers: Development Team*
*Status: Implemented*
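
### Appendix: Display Environment Check

The Implementation section hard-codes DISPLAY and XAUTHORITY in the service file, while the Technical Limitations list notes that these values are host-specific and can change between reboots. A small preflight check can make a stale value fail loudly instead of surfacing later as a browser launch error. The following is a minimal sketch, not part of the project or of Scrapling: `check_display_env` is a hypothetical helper name, and it assumes an X11 session whose DISPLAY value looks like `:0`, `:0.0`, or `host:10.0`.

```python
import os
import re
from typing import List, Mapping, Optional

# Assumption: X11 display strings look like ":0", ":0.0", or "host:10.0".
_DISPLAY_RE = re.compile(r"^[\w.\-]*:\d+(\.\d+)?$")


def check_display_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    problems = []

    display = env.get("DISPLAY", "")
    if not display:
        problems.append("DISPLAY is not set")
    elif not _DISPLAY_RE.match(display):
        problems.append(f"DISPLAY has unexpected format: {display!r}")

    xauth = env.get("XAUTHORITY", "")
    if not xauth:
        problems.append("XAUTHORITY is not set")
    elif not os.path.isfile(xauth):
        # A stale path is the typical symptom after a reboot or re-login.
        problems.append(f"XAUTHORITY file does not exist: {xauth}")

    return problems
```

One way to use it, under the same assumptions, is to call it at the top of the scraper entry point and exit with a non-zero status when the returned list is non-empty, so that the run is marked as failed and the service's configured restart policy applies.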