HVAC Know It All - Monitoring System
This directory contains the monitoring and alerting system for the HVAC Know It All Content Aggregation System.
Components
1. Monitoring Script (setup_monitoring.py)
- Collects system metrics (CPU, memory, disk, network)
- Monitors application metrics (scraper status, data sizes, log files)
- Checks for alert conditions
- Generates health reports
- Cleans up old metric files
2. Dashboard Generator (dashboard_generator.py)
- Creates HTML dashboard with real-time system status
- Shows resource usage trends with charts
- Displays scraper performance metrics
- Lists recent alerts and system health
- Auto-refreshes every 5 minutes
3. Systemd Services
- hvac-monitoring.service: Runs monitoring and dashboard generation
- hvac-monitoring.timer: Executes monitoring every 15 minutes
Installation
1. Install dependencies:
   sudo apt update
   sudo apt install python3-psutil
2. Install systemd services:
   sudo cp systemd/hvac-monitoring.* /etc/systemd/system/
   sudo systemctl daemon-reload
   sudo systemctl enable hvac-monitoring.timer
   sudo systemctl start hvac-monitoring.timer
3. Verify monitoring is running:
   sudo systemctl status hvac-monitoring.timer
   sudo journalctl -u hvac-monitoring -f
Directory Structure
monitoring/
├── setup_monitoring.py # Main monitoring script
├── dashboard_generator.py # HTML dashboard generator
├── README.md # This file
├── metrics/ # JSON metric files (auto-created)
│ ├── system_YYYYMMDD_HHMMSS.json
│ ├── application_YYYYMMDD_HHMMSS.json
│ └── health_report_YYYYMMDD_HHMMSS.json
├── alerts/ # Alert files (auto-created)
│ └── alerts_YYYYMMDD_HHMMSS.json
└── dashboard/ # HTML dashboard files (auto-created)
├── index.html # Current dashboard
└── dashboard_YYYYMMDD_HHMMSS.html # Timestamped backups
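The timestamped file names in the tree above can be produced with a small helper. This is a minimal sketch, not the code in setup_monitoring.py; the function name `write_metric_file` and its parameters are assumptions made for illustration.

```python
import json
from datetime import datetime
from pathlib import Path


def write_metric_file(base_dir, prefix, payload, now=None):
    """Write payload to <base_dir>/<prefix>_YYYYMMDD_HHMMSS.json and return the path.

    Hypothetical helper: it only mirrors the naming scheme shown in the
    directory tree (system_*, application_*, health_report_*, alerts_*).
    """
    now = now or datetime.now()
    base_dir = Path(base_dir)
    base_dir.mkdir(parents=True, exist_ok=True)  # directories are auto-created
    path = base_dir / f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path
```

Because the timestamp is zero-padded and ordered from year down to second, sorting the file names alphabetically also sorts them chronologically.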
Monitoring Metrics
System Metrics
- CPU Usage: Percentage utilization
- Memory Usage: Percentage of RAM used
- Disk Usage: Percentage of disk space used
- Network I/O: Bytes sent/received, packets
- System Uptime: Hours since last boot
- Load Average: System load (Linux only)
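As a rough illustration of two of these metrics, the sketch below collects disk usage and load average using only the Python standard library. The monitoring script itself depends on psutil (see Installation), which also covers CPU, memory, and network counters; the function name and the dictionary keys here are assumptions, not the script's actual schema.

```python
import os
import shutil


def collect_basic_system_metrics(path="/"):
    """Collect a stdlib-only subset of the system metrics listed above."""
    usage = shutil.disk_usage(path)
    metrics = {
        "disk_percent": round(usage.used / usage.total * 100, 1),
        "disk_free_gb": round(usage.free / 1024**3, 2),
    }
    # Load average is only exposed on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            load1, load5, load15 = os.getloadavg()
            metrics["load_average"] = {"1min": load1, "5min": load5, "15min": load15}
        except OSError:
            pass  # load average could not be obtained on this host
    return metrics
```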
Application Metrics
- Scraper Status: Last update time, item counts, state
- Data Directory Sizes: Markdown, media, archives
- Log File Status: Size, last modified time
- State File Analysis: Last IDs, update timestamps
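Directory sizes and log file status can be gathered with plain file system calls. The following is a sketch under assumed names (`directory_size_bytes`, `log_file_status`) and an assumed result layout; the real script may structure its output differently.

```python
import time
from pathlib import Path


def directory_size_bytes(directory):
    """Sum the sizes of all regular files below directory."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file())


def log_file_status(log_path, now=None):
    """Return size and age of a log file, or None if it does not exist."""
    path = Path(log_path)
    if not path.exists():
        return None
    stat = path.stat()
    now = time.time() if now is None else now
    return {
        "size_mb": round(stat.st_size / (1024 * 1024), 2),
        "hours_since_modified": round((now - stat.st_mtime) / 3600, 2),
    }
```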
Alert Conditions
Critical Alerts
- CPU usage > 80%
- Memory usage > 85%
- Disk usage > 90%
Warning Alerts
- Scraper hasn't updated in > 24 hours
- Log files > 100MB
- Application errors detected
Error Alerts
- Monitoring system failures
- File access errors
- Configuration issues
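The critical and warning thresholds above could be evaluated along these lines. This is a sketch, not the actual check_alerts() implementation: the metric keys `memory_percent` and `disk_percent`, the parameter names, and the message wording are assumptions. The alert dictionary shape (`type`, `component`, `message`) follows the example in the Customization section.

```python
CRITICAL_THRESHOLDS = {"cpu_percent": 80, "memory_percent": 85, "disk_percent": 90}
SCRAPER_MAX_AGE_HOURS = 24
LOG_MAX_SIZE_MB = 100


def check_alerts(system_metrics, scraper_age_hours=None, log_size_mb=None):
    """Return a list of alert dicts for the thresholds documented above."""
    alerts = []
    for key, limit in CRITICAL_THRESHOLDS.items():
        value = system_metrics.get(key, 0)
        if value > limit:  # strictly greater than, as in "CPU usage > 80%"
            alerts.append({
                "type": "CRITICAL",
                "component": "system",
                "message": f"{key} is {value}% (threshold {limit}%)",
            })
    if scraper_age_hours is not None and scraper_age_hours > SCRAPER_MAX_AGE_HOURS:
        alerts.append({
            "type": "WARNING",
            "component": "scraper",
            "message": f"No update for {scraper_age_hours:.1f} hours",
        })
    if log_size_mb is not None and log_size_mb > LOG_MAX_SIZE_MB:
        alerts.append({
            "type": "WARNING",
            "component": "logs",
            "message": f"Log file is {log_size_mb:.1f} MB",
        })
    return alerts
```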
Dashboard Features
Health Overview
- Overall system status (HEALTHY/WARNING/CRITICAL)
- Resource usage gauges
- Alert summary counts
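One plausible way to derive the overall status and the summary counts from a list of alerts is sketched below. The mapping is an assumption: in particular, treating ERROR alerts as CRITICAL is a guess, and the function names are hypothetical.

```python
from collections import Counter


def overall_status(alerts):
    """Map a list of alert dicts to HEALTHY, WARNING, or CRITICAL."""
    types = {alert.get("type") for alert in alerts}
    if "CRITICAL" in types or "ERROR" in types:  # assumption: errors count as critical
        return "CRITICAL"
    if "WARNING" in types:
        return "WARNING"
    return "HEALTHY"


def alert_summary(alerts):
    """Count alerts per type for the summary panel."""
    return dict(Counter(alert.get("type", "UNKNOWN") for alert in alerts))
```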
Trend Charts
- CPU, memory, disk usage over time
- Scraper item collection trends
- Historical performance data
Real-time Status
- Current scraper status table
- Recent alert history
- Last update timestamps
Auto-refresh
- Dashboard updates every 5 minutes
- Manual refresh available
- Responsive design for mobile/desktop
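A static HTML page can refresh itself with a `meta` refresh tag, which is one simple way to get the 5-minute interval. Whether dashboard_generator.py uses this mechanism or JavaScript is not stated here, so treat the sketch and the name `render_dashboard_page` as assumptions.

```python
from html import escape

REFRESH_SECONDS = 5 * 60  # 5 minutes


def render_dashboard_page(title, body_html, refresh_seconds=REFRESH_SECONDS):
    """Wrap body_html in a minimal page that reloads itself periodically."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '  <meta charset="utf-8">\n'
        f'  <meta http-equiv="refresh" content="{int(refresh_seconds)}">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n</html>\n"
    )
```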
Usage
Manual Monitoring
# Run monitoring check
python3 /opt/hvac-kia-content/monitoring/setup_monitoring.py
# Generate dashboard
python3 /opt/hvac-kia-content/monitoring/dashboard_generator.py
# View dashboard
firefox file:///opt/hvac-kia-content/monitoring/dashboard/index.html
Check Recent Metrics
# View latest health report
ls -la /opt/hvac-kia-content/monitoring/metrics/health_report_*.json | tail -1
# View recent alerts
ls -la /opt/hvac-kia-content/monitoring/alerts/alerts_*.json | tail -5
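To read the newest report rather than just list it, a short Python helper can pick the last file by name and parse it. This is a sketch; `latest_json` is a hypothetical name, and the structure of the report is not assumed beyond it being valid JSON.

```python
import json
from pathlib import Path


def latest_json(directory, pattern):
    """Load the newest file matching pattern, or return None if there is none.

    File names embed YYYYMMDD_HHMMSS, so the alphabetically last name
    is also the most recent one.
    """
    files = sorted(Path(directory).glob(pattern))
    if not files:
        return None
    return json.loads(files[-1].read_text())
```

For example, `latest_json("/opt/hvac-kia-content/monitoring/metrics", "health_report_*.json")` would return the most recent health report as a dictionary.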
Monitor Logs
# Follow monitoring logs
sudo journalctl -u hvac-monitoring -f
# View timer status
sudo systemctl list-timers hvac-monitoring.timer
Troubleshooting
Common Issues
- Permission Errors
  sudo chown -R hvac:hvac /opt/hvac-kia-content/monitoring/
  sudo chmod +x /opt/hvac-kia-content/monitoring/*.py
- Missing Dependencies
  # json is part of the Python standard library; only psutil needs installing
  sudo apt install python3-psutil
- Service Not Running
  sudo systemctl status hvac-monitoring.timer
  sudo systemctl restart hvac-monitoring.timer
- Dashboard Not Updating
  # Check if files are being generated
  ls -la /opt/hvac-kia-content/monitoring/metrics/
  # Manually run dashboard generator
  python3 /opt/hvac-kia-content/monitoring/dashboard_generator.py
Log Analysis
# Check for errors in monitoring
sudo journalctl -u hvac-monitoring --since "1 hour ago"
# Monitor system resources
htop
# Check disk space
df -h /opt/hvac-kia-content/
Integration
Web Server Setup (Optional)
To serve the dashboard via HTTP:
# Install nginx
sudo apt install nginx
# Create site config
sudo tee /etc/nginx/sites-available/hvac-monitoring << EOF
server {
    listen 8080;
    root /opt/hvac-kia-content/monitoring/dashboard;
    index index.html;
    location / {
        try_files \$uri \$uri/ =404;
    }
}
EOF
# Enable site
sudo ln -s /etc/nginx/sites-available/hvac-monitoring /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
Access dashboard at: http://your-server:8080
Email Alerts (Optional)
To enable email alerts for critical issues:
# Install mail utilities
sudo apt install mailutils
# Configure in monitoring script
export ALERT_EMAIL="admin@yourdomain.com"
export SMTP_SERVER="smtp.yourdomain.com"
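How the script consumes these variables is not documented here. As one possible approach, the sketch below builds a message for critical alerts and sends it through the server named in `SMTP_SERVER`. The function names, the sender address, and the subject format are assumptions; authentication and TLS are left out.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert_email(alerts, recipient, sender="hvac-monitoring@localhost"):
    """Return an EmailMessage for the critical alerts, or None if there are none."""
    critical = [a for a in alerts if a.get("type") == "CRITICAL"]
    if not critical:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[HVAC monitoring] {len(critical)} critical alert(s)"
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = recipient
    msg.set_content("\n".join(f"- {a['component']}: {a['message']}" for a in critical))
    return msg


def send_alert_email(msg):
    """Send msg via the SMTP server from the environment (no auth, no TLS)."""
    server = os.environ.get("SMTP_SERVER", "localhost")
    with smtplib.SMTP(server) as smtp:
        smtp.send_message(msg)
```

A caller would typically read the recipient from `os.environ.get("ALERT_EMAIL")` and skip sending when it is unset.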
Customization
Adding New Metrics
Edit setup_monitoring.py and add to collect_application_metrics():
def collect_application_metrics(self):
    # ... existing code ...

    # Add custom metric
    metrics['custom'] = {
        'your_metric': calculate_your_metric(),
        'another_metric': get_another_value()
    }
Modifying Alert Thresholds
Edit alert conditions in check_alerts():
# Change CPU threshold
if sys.get('cpu_percent', 0) > 90:  # Changed from 80% to 90%
    ...  # existing alert body stays unchanged

# Add new alert
if custom_condition():
    alerts.append({
        'type': 'WARNING',
        'component': 'custom',
        'message': 'Custom alert condition met'
    })
Dashboard Styling
Modify the CSS in generate_html_dashboard() to customize appearance.
Security Considerations
- Monitoring runs with limited user privileges
- No network services exposed by default
- File permissions restrict access to monitoring data
- Systemd security features enabled (PrivateTmp, ProtectSystem, etc.)
- Dashboard contains no sensitive information
Performance Impact
- Monitoring runs every 15 minutes (configurable)
- Low CPU/memory overhead (< 1% during execution)
- Automatic cleanup of old metric files (7-day retention)
- Dashboard generation is lightweight (< 1MB files)
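The 7-day retention can be implemented by comparing each file's modification time against a cutoff. This is a minimal sketch of that idea, not the script's actual cleanup routine; `cleanup_old_files` and its parameters are hypothetical.

```python
import time
from pathlib import Path


def cleanup_old_files(directory, retention_days=7, pattern="*.json", now=None):
    """Delete files matching pattern that are older than retention_days.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```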