Key Projects
Network Validation Platform (Pre- and Post-Change Automation)
- Built a pre- and post-change validation automation system to assess change, upgrade, RMA, and refresh activities across Cisco NX-OS, IOS, and IOS XE devices, ensuring network resiliency and service continuity.
- Automated collection of canonical device snapshots (routing tables, ARP/ND, STP, interface states, error counters) and implemented subnet reachability checks with fan-out pings.
- Aggregated utilization metrics (interfaces, CPU, memory) pre- and post-change, enabling detection of regression trends at scale.
- Integrated AI-driven summarization to highlight diffs and anomalies (e.g., missing VLANs, neighbor changes, error growth) while preserving full evidence-backed reports.
- Reduced validation effort from weeks of manual comparison to a 10-minute automated review, cutting engineering costs and catching "quiet failures" that manual checks often miss.
- Expanded coverage beyond upgrades to include switch RMAs and refreshes, normalizing OS/version discrepancies and enabling reliable device-to-device comparisons.
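
The pre- and post-change comparison described above can be illustrated with a small set-difference sketch. This is not the platform's code: the snapshot layout (category name mapped to a set of entries) and the `diff_snapshots` helper are hypothetical, chosen only to show how missing VLANs or changed neighbors could surface in a report.

```python
from typing import Dict, List, Set

# Hypothetical snapshot layout: category name -> set of normalized entries.
Snapshot = Dict[str, Set[str]]


def diff_snapshots(pre: Snapshot, post: Snapshot) -> Dict[str, Dict[str, List[str]]]:
    """Return entries that disappeared or appeared after a change, per category."""
    report: Dict[str, Dict[str, List[str]]] = {}
    for category in sorted(set(pre) | set(post)):
        before = pre.get(category, set())
        after = post.get(category, set())
        missing = sorted(before - after)  # present before, gone after
        added = sorted(after - before)    # new after the change
        if missing or added:
            report[category] = {"missing": missing, "added": added}
    return report


# Illustrative data only; a real snapshot would come from parsed device output.
pre = {"vlans": {"10", "20", "30"}, "neighbors": {"sw1:Eth1/1", "sw2:Eth1/2"}}
post = {"vlans": {"10", "30"}, "neighbors": {"sw1:Eth1/1", "sw3:Eth1/2"}}
result = diff_snapshots(pre, post)
```

Categories with no differences are left out of the report, so a clean change yields an empty dictionary.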
Automated Network Device Certification Platform
- Designed and implemented a certification automation system for Cisco NX-OS, IOS, and IOS XE devices, replacing a manual, weeks-long contractor process with a repeatable workflow executed in minutes.
- Built a PDF ingestion pipeline to extract test details from multi-page documents, using prompt chaining and RAG techniques to normalize and reformat the data into structured Markdown.
- Automated test case generation by dynamically creating pytest scripts with PyShark and testbed YAML files, enabling scalable and repeatable validation.
- Integrated Allure-pytest reporting to deliver visual, versioned certification reports with full traceability, improving transparency and audit readiness for stakeholders.
- Collaborated with network teams to gather real-world test data and validate performance at scale, ensuring reliability across enterprise-grade device environments.
- Eliminated contractor costs, standardized certification quality, cut onboarding time from weeks to minutes, and delivered board-ready compliance reports.
AI-Driven Multi-Agent Workflow Platform for Network Operations
- Engineered a multi-agent architecture with LangGraph, where a supervisor agent orchestrates domain-specific sub-agents to handle monitoring, troubleshooting, and automation tasks.
- Developed an agentic workflow pipeline that enabled dynamic tool invocation and intelligent decision-making, reducing manual troubleshooting time by 40%.
- Automated repetitive network operations through AI-driven workflows, improving operational efficiency and reducing human error.
- Delivered a scalable full-stack chatbot solution (FastAPI, Kafka, MongoDB, React, TypeScript) that streamlined collaboration across teams and accelerated issue resolution.
Network Topology Mapping & Visibility Platform
- Built a network topology platform to provide real-time visibility of device interconnections, enabling teams to predict impact zones and mitigate risks during change management and troubleshooting.
- Designed Airflow DAGs to schedule daily data collection from multiple enterprise sources, ensuring up-to-date visibility into network device states.
- Developed a progressive data collection and mapping workflow (device-by-device ingestion) to avoid heavy data loads, improving pipeline efficiency and system performance.
- Integrated Neo4j graph database for storing and visualizing device relationships, empowering operations teams to quickly identify dependency chains and potential points of failure.
- Improved incident resolution speed and enhanced proactive change verification, reducing downtime risks and operational costs.
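
Impact-zone prediction over device interconnections can be sketched as a hop-limited breadth-first search. The example below uses an in-memory adjacency map as a stand-in for the graph database; the `impact_zone` helper and the device names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def impact_zone(links: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Return devices reachable from `start` within `max_hops` links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        device, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(device, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    seen.discard(start)  # the changed device itself is not part of its zone
    return seen


# Illustrative topology only.
links = {
    "core1": ["dist1", "dist2"],
    "dist1": ["core1", "access1"],
    "dist2": ["core1", "access2"],
    "access1": ["dist1"],
    "access2": ["dist2"],
}
one_hop = impact_zone(links, "dist1", max_hops=1)
two_hops = impact_zone(links, "dist1", max_hops=2)
```

In a graph database the same question would typically be a bounded variable-length path query; the in-memory version just makes the traversal explicit.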
Enterprise Data Platform MVP
- Architected and delivered a scalable data platform to empower enterprises with faster, data-driven decision-making.
- Designed and implemented ETL pipelines using Kafka and Airflow to extract and transform metrics from Postgres Enterprise Manager, Grafana, and Dynatrace, storing results in AWS S3 for downstream analytics.
- Built real-time monitoring and visualization dashboards in Grafana, reducing data access latency by 35% and enabling proactive system insights.
- Improved data reliability and operational visibility, allowing stakeholders to identify anomalies faster and optimize infrastructure performance.
Real-Time Chatbot with LLM RAG
- Developed a real-time chatbot platform to streamline daily network operations, including executing multi-command SSH tasks, validating system upgrades, and searching for devices across tools such as ExtraHop, MyOps, DataWarehouse, NetBrain, and Cisco ACI.
- Refactored a monolithic architecture into modular microservices, introducing Kafka-based event-driven communication to enhance scalability and system reliability.
- Implemented MongoDB caching to avoid redundant processing and event triggers, cutting operational costs by reducing repeated queries and resource usage.
- Integrated WebSocket-based notifications across platforms (Webex, Slack), enabling faster updates and improved collaboration during live network changes.
- Delivered a robust automation framework that reduced manual troubleshooting efforts, accelerated change validation, and improved operational efficiency for network teams.
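
The caching idea above, skipping events that were already processed recently, can be sketched with a small time-to-live cache. An in-memory dictionary stands in for the MongoDB collection here, and the `EventCache` class, its TTL, and the event fields are assumptions made for illustration.

```python
import hashlib
import json
import time
from typing import Callable, Dict


class EventCache:
    """Remember recently seen events so duplicates are skipped until they expire."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # event key -> expiry timestamp

    @staticmethod
    def _key(event: dict) -> str:
        # Stable fingerprint: identical payloads hash to the same key.
        payload = json.dumps(event, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def should_process(self, event: dict) -> bool:
        now = self._clock()
        key = self._key(event)
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # duplicate within the TTL window
        self._expiry[key] = now + self._ttl
        return True


# A controllable clock keeps the example deterministic.
fake_now = [0.0]
cache = EventCache(ttl_seconds=60, clock=lambda: fake_now[0])
event = {"device": "sw1", "command": "show version"}
first = cache.should_process(event)
second = cache.should_process(event)
fake_now[0] = 61.0
third = cache.should_process(event)
```

With a MongoDB backend, expiry of this kind is usually delegated to a TTL index rather than tracked in application code.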