The Vision: Automated Security Auditing
Security professionals spend too much time translating technical scan logs into business reports. I built **AutoVulnScanner** to bridge this gap. This tool orchestrates **Nmap** for deep service discovery and leverages **GPT-4** to perform a "Heuristic Audit," delivering a formatted PDF report ready for stakeholders.
The Multi-Stage Pipeline
The project follows a modular architecture to ensure clean data flow between scanning and reporting:
- Reconnaissance (scanner.py): Running Nmap with `-sV -T4 --open` to capture version banners while ignoring closed ports for noise reduction.
- Knowledge Synthesis (analyzer.py): Feeding raw XML data into a specialized LLM prompt that acts as a 'Senior Cybersecurity Mentor'.
- Risk Quantization: Automatically mapping open ports to visual risk levels using a custom emoji-based scoring system (❗ to ❗❗❗❗❗).
- Dynamic Reporting (reporter.py): Utilizing the FPDF library to generate structured, multi-page PDF documents from real-time AI output.
Mistakes & Roadblocks (The Hard Way)
Building a high-performance auditor revealed complex issues in data formatting and environment management.
Unicode Encoding Crash: The PDF generator was crashing when trying to render the "❗" risk emojis from the AI output.
The Fix: Implemented a text pre-processor in `reporter.py` that replaces emojis with `[!]` and handles `latin-1` character replacement for stable PDF creation.
Token Window Overflow: Large Nmap XML outputs were exceeding the AI's context limit, causing truncated reports.
The Fix: Optimized the data extraction logic to only pass the ``, ``, and `` tags, reducing token consumption by 70%.
Data Loss on Failure: If the AI analysis failed, the raw scan data was being lost because it wasn't saved locally first.
The Fix: Refined `scanner.py` to automatically create a `/scans` directory and save XML data with unique timestamps before proceeding.
AI Hallucinations: GPT sometimes "guessed" vulnerabilities that were not supported by the scan banners.
The Fix: Hardened the System Prompt to strictly forbid guesswork and added a mandatory "Evidence" section for each finding.
Key Takeaways
- Hybrid Intelligence: I learned that combining traditional tools like Nmap with modern LLMs requires careful data sanitation.
- Clean Environment Management: Using `.env` files for API keys and checking for system-level dependencies (like Npcap) is essential for portability.
- Executive Visualization: Risk scoring shouldn't just be numbers; visual indicators like emojis drastically improve report readability.
- Modular Design: Separating the scanner, analyzer, and reporter makes the tool easier to debug and scale in the future.
The Final Result
A professional-grade command-line tool that performs a full security audit in under 3 minutes. It takes an IP as input and outputs a boardroom-ready PDF, complete with technical evidence, risk ratings, and remediation steps.