
AutoVulnScanner: AI-Driven Vulnerability Assessment

#CyberSecurity #Nmap #OpenAI #Python #Pentesting #Automation #AI #PDFReporting

The Vision: Automated Security Auditing

Security professionals spend too much time translating technical scan logs into business reports. I built **AutoVulnScanner** to bridge this gap. This tool orchestrates **Nmap** for deep service discovery and leverages **GPT-4** to perform a "Heuristic Audit," delivering a formatted PDF report ready for stakeholders.

The Multi-Stage Pipeline

The project follows a modular architecture to ensure clean data flow between scanning and reporting:

  • Reconnaissance (scanner.py): Running Nmap with `-sV -T4 --open` to capture service version banners while listing only open ports, which cuts noise.
  • Knowledge Synthesis (analyzer.py): Feeding the Nmap XML results into a specialized LLM prompt that casts the model as a 'Senior Cybersecurity Mentor'.
  • Risk Quantification: Automatically mapping open ports to visual risk levels with a custom emoji-based scoring system (❗ to ❗❗❗❗❗).
  • Dynamic Reporting (reporter.py): Using the FPDF library to generate structured, multi-page PDF documents directly from the AI's output.
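A minimal sketch of the reconnaissance and risk-scoring stages, assuming Nmap is installed and on the PATH. The helpers `build_nmap_command` and `risk_indicator` are hypothetical, and the port-to-score table uses illustrative values, not the ones AutoVulnScanner actually applies. The `-oX` flag is an addition here so Nmap writes the XML the analyzer consumes.

```python
# Hypothetical helpers illustrating scanner.py and the emoji risk scale.

RISK_EMOJI = "\u2757"  # the "heavy exclamation mark" emoji

# Illustrative severity table (1 = low, 5 = critical); not the project's real values.
PORT_RISK = {
    23: 5,    # Telnet: cleartext logins
    21: 4,    # FTP: cleartext, sometimes anonymous
    445: 4,   # SMB: frequent target of remote exploits
    3389: 4,  # RDP: exposed remote desktop
    22: 2,    # SSH: usually acceptable if patched
    80: 2,    # HTTP: unencrypted web traffic
    443: 1,   # HTTPS
}
DEFAULT_RISK = 3  # unknown services get a middle score until someone reviews them


def build_nmap_command(target: str, xml_path: str) -> list[str]:
    """Return the argv list for the reconnaissance scan."""
    return [
        "nmap",
        "-sV",            # probe open ports for service/version banners
        "-T4",            # "aggressive" timing template for faster scans
        "--open",         # only report open (or possibly open) ports
        "-oX", xml_path,  # write machine-readable XML for the analyzer
        target,
    ]


def risk_indicator(port: int) -> str:
    """Map a port number to a string of one to five risk emojis."""
    return RISK_EMOJI * PORT_RISK.get(port, DEFAULT_RISK)
```

Running the scan would then look like `subprocess.run(build_nmap_command("192.168.1.10", "scan.xml"), check=True)`. Passing an argument list instead of a shell string keeps the target value from being interpreted by a shell.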

Mistakes & Roadblocks (The Hard Way)

Building a high-performance auditor revealed complex issues in data formatting and environment management.

Unicode Encoding Crash: The PDF generator crashed when rendering the "❗" risk emojis from the AI output, because FPDF's built-in fonts only support the `latin-1` character set.
The Fix: Implemented a text pre-processor in `reporter.py` that replaces the emojis with `[!]` and substitutes any remaining non-`latin-1` characters, so PDF creation no longer fails.
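A pre-processor along those lines might look like the sketch below. The function name `sanitize_for_pdf` is hypothetical, and mapping curly quotes and dashes to ASCII is an extra step assumed here (LLM output often contains them), not something the post describes.

```python
RISK_EMOJI = "\u2757"          # the "heavy exclamation mark" emoji
VARIATION_SELECTOR = "\ufe0f"  # emoji-presentation selector that may follow it

# Assumed extra step: map common typographic characters to ASCII look-alikes
# so they survive instead of turning into "?".
TYPOGRAPHIC = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2022": "-",                  # bullet
})


def sanitize_for_pdf(text: str) -> str:
    """Make AI output safe for FPDF's built-in (latin-1) fonts."""
    text = text.replace(VARIATION_SELECTOR, "")
    text = text.replace(RISK_EMOJI, "[!]")
    text = text.translate(TYPOGRAPHIC)
    # Any character still outside latin-1 becomes "?" instead of raising an error.
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

The cleaned string can then be handed to FPDF's `multi_cell` as usual. The alternative is registering a Unicode TrueType font, which keeps the emojis only if that font actually contains them.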
Token Window Overflow: Large Nmap XML outputs were exceeding the AI's context limit, causing truncated reports.
The Fix: Optimized the data extraction logic to only pass the ``, ``, and `` tags, reducing token consumption by 70%.
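The sketch below shows one way to slim down Nmap XML with the standard library before it reaches the model. It assumes the useful data is each open `<port>` element with its `<state>` and `<service>` children, which matches Nmap's XML output format; it is not necessarily the exact filter the project uses. `extract_open_services` and `to_compact_lines` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def _host_address(host: ET.Element) -> str:
    """Prefer the IP address; Nmap may also list a MAC address for the host."""
    for addr in host.findall("address"):
        if addr.get("addrtype") in ("ipv4", "ipv6"):
            return addr.get("addr", "unknown")
    return "unknown"


def extract_open_services(xml_text: str) -> list[dict]:
    """Keep only host, port, protocol, and service/version fields from Nmap XML."""
    root = ET.fromstring(xml_text)
    findings = []
    for host in root.iter("host"):
        address = _host_address(host)
        for port in host.iter("port"):
            state = port.find("state")
            # --open can still report "open|filtered"; keep only confirmed open ports.
            if state is None or state.get("state") != "open":
                continue
            service = port.find("service")
            svc = service.attrib if service is not None else {}
            findings.append({
                "host": address,
                "port": int(port.get("portid")),
                "protocol": port.get("protocol", ""),
                "service": svc.get("name", "unknown"),
                "product": svc.get("product", ""),
                "version": svc.get("version", ""),
            })
    return findings


def to_compact_lines(findings: list[dict]) -> str:
    """One short line per service, e.g. '10.0.0.5 22/tcp ssh OpenSSH 8.9p1'."""
    lines = []
    for f in findings:
        parts = [f["host"], f"{f['port']}/{f['protocol']}", f["service"], f["product"], f["version"]]
        lines.append(" ".join(p for p in parts if p))
    return "\n".join(lines)
```

Sending compact lines instead of the full XML drops script output, timing data, and markup overhead, which is where most of the token savings in a setup like this would come from.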
Data Loss on Failure: If the AI analysis failed, the raw scan data was being lost because it wasn't saved locally first.
The Fix: Refined `scanner.py` to automatically create a `/scans` directory and save XML data with unique timestamps before proceeding.
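A minimal sketch of that save-first step, assuming a relative `scans` directory and a hypothetical `save_scan_xml` helper; the filename pattern is illustrative.

```python
from datetime import datetime
from pathlib import Path


def save_scan_xml(xml_text: str, target: str, scans_dir: str = "scans") -> Path:
    """Persist raw Nmap XML before any AI call, so a failed analysis loses nothing."""
    out_dir = Path(scans_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory on first run
    # Microseconds make name collisions very unlikely, even for back-to-back scans.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    # Replace dots/colons (IPv4/IPv6) so the target is safe inside a filename.
    safe_target = "".join(c if c.isalnum() else "_" for c in target)
    path = out_dir / f"scan_{safe_target}_{stamp}.xml"
    path.write_text(xml_text, encoding="utf-8")
    return path
```

Calling this before the analyzer runs means a failed or rate-limited API request leaves the XML on disk, ready for a retry without rescanning.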
AI Hallucinations: GPT sometimes "guessed" vulnerabilities that were not supported by the scan banners.
The Fix: Hardened the System Prompt to strictly forbid guesswork and added a mandatory "Evidence" section for each finding.
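The post does not publish its system prompt, so the wording below is a hypothetical example of the "no guessing, cite evidence" rule. The `unsupported_findings` check is an assumed complement to the prompt, not something the post describes: it presumes the model returns findings as dicts with an `evidence` key and flags any whose evidence does not appear verbatim in the scan data.

```python
# Hypothetical wording; the real system prompt is not shown in the post.
SYSTEM_PROMPT = (
    "You are a senior cybersecurity mentor reviewing Nmap service scan results.\n"
    "Report only weaknesses that the provided service banners directly support.\n"
    "Do not guess: if a version is missing, state that the risk cannot be assessed.\n"
    "Every finding must include an Evidence field quoting the exact banner text it relies on."
)


def unsupported_findings(findings: list[dict], scan_text: str) -> list[dict]:
    """Return findings whose evidence is empty or absent from the scan data.

    Assumes each finding is a dict with an "evidence" key.
    """
    flagged = []
    for finding in findings:
        evidence = (finding.get("evidence") or "").strip()
        if not evidence or evidence not in scan_text:
            flagged.append(finding)
    return flagged
```

A verbatim substring match is deliberately strict: it will also flag evidence the model paraphrased, which is usually the safer failure mode for a security report.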

Key Takeaways

  • Hybrid Intelligence: I learned that combining traditional tools like Nmap with modern LLMs requires careful data sanitation.
  • Clean Environment Management: Using `.env` files for API keys and checking for system-level dependencies (like Npcap) is essential for portability.
  • Executive Visualization: Risk scoring shouldn't just be numbers; visual indicators like emojis drastically improve report readability.
  • Modular Design: Separating the scanner, analyzer, and reporter makes the tool easier to debug and scale in the future.
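The environment checks from the takeaways above could be gathered into a start-up check like this sketch. Several details are assumptions: the `OPENAI_API_KEY` variable name (the OpenAI SDK's default), the optional use of the `python-dotenv` package, the hypothetical `preflight` function, and the Npcap location, which assumes a default install on Windows.

```python
import os
import platform
import shutil
from pathlib import Path

try:
    from dotenv import load_dotenv  # python-dotenv package, if installed
except ImportError:
    load_dotenv = None


def preflight(env=None, which=shutil.which, system=None) -> list[str]:
    """Return human-readable problems; an empty list means the tool is ready to scan."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # load KEY=value pairs from a .env file without overriding existing vars
        env = os.environ
    system = system or platform.system()
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (add it to .env)")
    if which("nmap") is None:
        problems.append("nmap binary not found on PATH")
    if system == "Windows":
        # Assumption: a default Npcap install places its DLLs under %SystemRoot%\System32\Npcap.
        npcap_dir = Path(env.get("SystemRoot", r"C:\Windows")) / "System32" / "Npcap"
        if not npcap_dir.is_dir():
            problems.append("Npcap does not appear to be installed")
    return problems
```

Taking `env`, `which`, and `system` as parameters keeps the check testable without touching the real machine.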

The Final Result

A professional-grade command-line tool that performs a full security audit in under 3 minutes. It takes an IP as input and outputs a boardroom-ready PDF, complete with technical evidence, risk ratings, and remediation steps.
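A hedged sketch of the IP-in, PDF-out interface using `argparse`. The `parse_args` function, the `--output` flag, and the `report.pdf` default are hypothetical; the post only says the tool takes an IP as input.

```python
import argparse
import ipaddress


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Scan a host with Nmap and write an AI-assisted PDF report."
    )
    # ipaddress.ip_address rejects malformed input; argparse turns the ValueError
    # into a clean usage error instead of a traceback.
    parser.add_argument("target", type=ipaddress.ip_address,
                        help="IPv4 or IPv6 address of the host to audit")
    parser.add_argument("-o", "--output", default="report.pdf",
                        help="where to write the PDF (hypothetical default)")
    return parser.parse_args(argv)
```

Validating the target as a literal IP before anything runs also keeps arbitrary strings out of the Nmap invocation.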