# HVAC Know It All - Monitoring System

This directory contains the monitoring and alerting system for the HVAC Know It All Content Aggregation System.

## Components

### 1. Monitoring Script (`setup_monitoring.py`)

- Collects system metrics (CPU, memory, disk, network)
- Monitors application metrics (scraper status, data sizes, log files)
- Checks for alert conditions
- Generates health reports
- Cleans up old metric files

### 2. Dashboard Generator (`dashboard_generator.py`)

- Creates an HTML dashboard showing current system status
- Shows resource usage trends with charts
- Displays scraper performance metrics
- Lists recent alerts and system health
- Auto-refreshes every 5 minutes

### 3. Systemd Services

- `hvac-monitoring.service`: Runs monitoring and dashboard generation
- `hvac-monitoring.timer`: Executes monitoring every 15 minutes

## Installation

1. **Install dependencies:**
   ```bash
   sudo apt update
   sudo apt install python3-psutil
   ```

2. **Install systemd services:**
   ```bash
   sudo cp systemd/hvac-monitoring.* /etc/systemd/system/
   sudo systemctl daemon-reload
   sudo systemctl enable hvac-monitoring.timer
   sudo systemctl start hvac-monitoring.timer
   ```

3. **Verify monitoring is running:**
   ```bash
   sudo systemctl status hvac-monitoring.timer
   sudo journalctl -u hvac-monitoring -f
   ```

## Directory Structure

```
monitoring/
├── setup_monitoring.py          # Main monitoring script
├── dashboard_generator.py       # HTML dashboard generator
├── README.md                    # This file
├── metrics/                     # JSON metric files (auto-created)
│   ├── system_YYYYMMDD_HHMMSS.json
│   ├── application_YYYYMMDD_HHMMSS.json
│   └── health_report_YYYYMMDD_HHMMSS.json
├── alerts/                      # Alert files (auto-created)
│   └── alerts_YYYYMMDD_HHMMSS.json
└── dashboard/                   # HTML dashboard files (auto-created)
    ├── index.html               # Current dashboard
    └── dashboard_YYYYMMDD_HHMMSS.html  # Timestamped backups
```

## Monitoring Metrics

### System Metrics

- **CPU Usage**: Percentage utilization
- **Memory Usage**: Percentage of RAM used
- **Disk Usage**: Percentage of disk space used
- **Network I/O**: Bytes sent/received, packets
- **System Uptime**: Hours since last boot
- **Load Average**: System load (Linux only)

### Application Metrics

- **Scraper Status**: Last update time, item counts, state
- **Data Directory Sizes**: Markdown, media, archives
- **Log File Status**: Size, last modified time
- **State File Analysis**: Last IDs, update timestamps

## Alert Conditions

### Critical Alerts

- CPU usage > 80%
- Memory usage > 85%
- Disk usage > 90%

### Warning Alerts

- Scraper hasn't updated in > 24 hours
- Log files > 100MB
- Application errors detected

### Error Alerts

- Monitoring system failures
- File access errors
- Configuration issues

## Dashboard Features

### Health Overview

- Overall system status (HEALTHY/WARNING/CRITICAL)
- Resource usage gauges
- Alert summary counts

### Trend Charts

- CPU, memory, disk usage over time
- Scraper item collection trends
- Historical performance data

### Real-time Status

- Current scraper status table
- Recent alert history
- Last update timestamps

### Auto-refresh

- Dashboard updates every 5 minutes
- Manual refresh available
- Responsive design for mobile/desktop

## Usage

### Manual Monitoring

```bash
# Run monitoring check
python3 /opt/hvac-kia-content/monitoring/setup_monitoring.py

# Generate dashboard
python3 /opt/hvac-kia-content/monitoring/dashboard_generator.py

# View dashboard
firefox file:///opt/hvac-kia-content/monitoring/dashboard/index.html
```

### Check Recent Metrics

```bash
# List the latest health report
ls -la /opt/hvac-kia-content/monitoring/metrics/health_report_*.json | tail -1

# List the five most recent alert files
ls -la /opt/hvac-kia-content/monitoring/alerts/alerts_*.json | tail -5
```

### Monitor Logs

```bash
# Follow monitoring logs
sudo journalctl -u hvac-monitoring -f

# View timer status
sudo systemctl list-timers hvac-monitoring.timer
```

## Troubleshooting

### Common Issues

1. **Permission Errors**
   ```bash
   sudo chown -R hvac:hvac /opt/hvac-kia-content/monitoring/
   sudo chmod +x /opt/hvac-kia-content/monitoring/*.py
   ```

2. **Missing Dependencies**
   ```bash
   # json is part of the Python standard library, so only psutil needs installing
   sudo apt install python3-psutil
   ```

3. **Service Not Running**
   ```bash
   sudo systemctl status hvac-monitoring.timer
   sudo systemctl restart hvac-monitoring.timer
   ```

4. **Dashboard Not Updating**
   ```bash
   # Check if files are being generated
   ls -la /opt/hvac-kia-content/monitoring/metrics/

   # Manually run dashboard generator
   python3 /opt/hvac-kia-content/monitoring/dashboard_generator.py
   ```

### Log Analysis

```bash
# Check for errors in monitoring
sudo journalctl -u hvac-monitoring --since "1 hour ago"

# Monitor system resources
htop

# Check disk space
df -h /opt/hvac-kia-content/
```

## Integration

### Web Server Setup (Optional)

To serve the dashboard via HTTP:

```bash
# Install nginx
sudo apt install nginx

# Create site config
sudo tee /etc/nginx/sites-available/hvac-monitoring << EOF
server {
    listen 8080;
    root /opt/hvac-kia-content/monitoring/dashboard;
    index index.html;

    location / {
        try_files \$uri \$uri/ =404;
    }
}
EOF

# Enable site
sudo ln -s /etc/nginx/sites-available/hvac-monitoring /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```

Access dashboard at: `http://your-server:8080`

### Email Alerts (Optional)

To enable email alerts for critical issues:

```bash
# Install mail utilities
sudo apt install mailutils

# Configure in monitoring script
export ALERT_EMAIL="admin@yourdomain.com"
export SMTP_SERVER="smtp.yourdomain.com"
```

## Customization

### Adding New Metrics

Edit `setup_monitoring.py` and add to `collect_application_metrics()`:

```python
def collect_application_metrics(self):
    # ... existing code ...

    # Add custom metric
    metrics['custom'] = {
        'your_metric': calculate_your_metric(),
        'another_metric': get_another_value()
    }
```

### Modifying Alert Thresholds

Edit alert conditions in `check_alerts()`:

```python
# Change CPU threshold
if sys.get('cpu_percent', 0) > 90:  # Changed from 80% to 90%
    ...  # existing alert handling stays as-is

# Add new alert
if custom_condition():
    alerts.append({
        'type': 'WARNING',
        'component': 'custom',
        'message': 'Custom alert condition met'
    })
```

### Dashboard Styling

Modify the CSS in `generate_html_dashboard()` to customize appearance.
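
### Example: Table-Driven Thresholds

If several thresholds need tuning at once, one option is to keep them in a single table instead of editing individual `if` statements. The following is a minimal sketch under stated assumptions, not the actual `check_alerts()` implementation: `CRITICAL_THRESHOLDS` and `check_resource_alerts()` are hypothetical names, and the `memory_percent` and `disk_percent` keys are guesses modeled on the `cpu_percent` key shown above. The limits mirror the Critical Alerts list in this README, and the alert dictionaries follow the `type`/`component`/`message` shape used in the custom alert example.

```python
from typing import Dict, List

# Hypothetical threshold table; the limits mirror the Critical Alerts list.
CRITICAL_THRESHOLDS: Dict[str, float] = {
    "cpu_percent": 80.0,
    "memory_percent": 85.0,  # key name is an assumption
    "disk_percent": 90.0,    # key name is an assumption
}


def check_resource_alerts(
    sys_metrics: Dict[str, float],
    thresholds: Dict[str, float] = CRITICAL_THRESHOLDS,
) -> List[dict]:
    """Return one CRITICAL alert for each metric strictly above its limit."""
    alerts = []
    for key, limit in thresholds.items():
        value = sys_metrics.get(key, 0)  # a missing metric counts as 0
        if value > limit:
            alerts.append({
                "type": "CRITICAL",
                "component": "system",
                "message": f"{key} at {value:.1f}% exceeds {limit:.0f}%",
            })
    return alerts


if __name__ == "__main__":
    sample = {"cpu_percent": 91.5, "memory_percent": 40.0, "disk_percent": 90.0}
    for alert in check_resource_alerts(sample):
        print(alert["type"], alert["message"])
```

With this layout, raising the CPU limit from 80% to 90% becomes a one-line change to the table, and a caller can pass its own `thresholds` dictionary without touching the comparison logic.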
## Security Considerations

- Monitoring runs with limited user privileges
- No network services exposed by default
- File permissions restrict access to monitoring data
- Systemd security features enabled (PrivateTmp, ProtectSystem, etc.)
- Dashboard contains no sensitive information

## Performance Impact

- Monitoring runs every 15 minutes (configurable)
- Low CPU/memory overhead (< 1% during execution)
- Automatic cleanup of old metric files (7-day retention)
- Dashboard generation is lightweight (< 1MB files)
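
### Example: Retention Cleanup

The 7-day retention mentioned above could be implemented along the following lines. This is a minimal sketch, not the cleanup code that ships in `setup_monitoring.py`: the function name `cleanup_old_metrics()` is hypothetical, and using file modification time as the age criterion is an assumption (the real script may parse the `YYYYMMDD_HHMMSS` timestamp in the filename instead). The demo at the bottom runs against a temporary directory so that no real metric files are touched.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 7  # retention period stated in this README
SECONDS_PER_DAY = 86400


def cleanup_old_metrics(
    metrics_dir,
    retention_days: int = RETENTION_DAYS,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete *.json files older than retention_days; return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    removed = []
    for path in sorted(Path(metrics_dir).glob("*.json")):
        # Age is judged by modification time (an assumption, see above).
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old_file = Path(tmp) / "system_old.json"
        new_file = Path(tmp) / "system_new.json"
        old_file.write_text("{}")
        new_file.write_text("{}")

        # Backdate one file by ten days so it falls outside the retention window.
        ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
        os.utime(old_file, (ten_days_ago, ten_days_ago))

        print([p.name for p in cleanup_old_metrics(tmp)])
```

Passing `now` explicitly makes the function easy to test without waiting for files to age, and restricting the glob to `*.json` leaves the HTML dashboard backups untouched.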