Key Projects

Network Validation Platform (Pre- and Post-Change Automation)

  • Built a pre- and post-change validation automation system to assess change, upgrade, RMA, and refresh activities across Cisco NX-OS, IOS, and IOS XE devices, ensuring network resiliency and service continuity.
  • Automated collection of canonical device snapshots (routing tables, ARP/ND, STP, interface states, error counters) and implemented subnet reachability checks with fan-out pings.
  • Aggregated utilization metrics (interfaces, CPU, memory) pre- and post-change, enabling detection of regression trends at scale.
  • Integrated AI-driven summarization to highlight diffs and anomalies (e.g., missing VLANs, neighbor changes, error growth) while preserving full evidence-backed reports.
  • Reduced validation effort from weeks of manual comparison to a 10-minute automated review, cutting engineering costs and catching "quiet failures" that manual checks often miss.
  • Expanded coverage beyond upgrades to include switch RMAs and refreshes, normalizing OS/version discrepancies and enabling reliable device-to-device comparisons.
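
The pre/post comparison described above can be illustrated with a minimal sketch. It assumes each snapshot has already been parsed into a flat dictionary keyed by check name; that layout and the function names `diff_snapshots` and `error_growth` are hypothetical and are not taken from the platform itself.

```python
from typing import Any


def diff_snapshots(pre: dict[str, Any], post: dict[str, Any]) -> dict[str, Any]:
    """Compare two flat snapshots (hypothetical layout: check name -> value)."""
    # Entries present before the change but gone afterwards, e.g. a missing VLAN.
    missing = sorted(k for k in pre if k not in post)
    # Entries that only appear after the change.
    added = sorted(k for k in post if k not in pre)
    # Entries present in both snapshots whose value differs.
    changed = {
        k: (pre[k], post[k])
        for k in sorted(pre.keys() & post.keys())
        if pre[k] != post[k]
    }
    return {"missing": missing, "added": added, "changed": changed}


def error_growth(pre: dict[str, int], post: dict[str, int]) -> dict[str, int]:
    """Return only the error counters that increased, with their delta."""
    return {
        name: post[name] - pre.get(name, 0)
        for name in sorted(post)
        if post[name] > pre.get(name, 0)
    }
```

Keeping the raw `missing`, `added`, and `changed` sets separate from any summary text is one way to keep a report evidence-backed: a summary can be regenerated, while the diff itself stays unchanged.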

Automated Network Device Certification Platform

  • Designed and implemented a certification automation system for Cisco NX-OS, IOS, and IOS XE devices, replacing a manual, weeks-long contractor process with a repeatable workflow executed in minutes.
  • Built a PDF ingestion pipeline to extract test details from multi-page documents, using prompt chaining and retrieval-augmented generation (RAG) to normalize and reformat the data into structured Markdown.
  • Automated test case generation by dynamically creating pytest scripts with PyShark and testbed YAML files, enabling scalable and repeatable validation.
  • Integrated Allure-pytest reporting to deliver visual, versioned certification reports with full traceability, improving transparency and audit readiness for stakeholders.
  • Collaborated with network teams to gather real-world test data and validate performance at scale, ensuring reliability across enterprise-grade device environments.
  • Eliminated contractor costs, standardized certification quality, shortened onboarding from weeks to minutes, and delivered board-ready compliance reports.

AI-Driven Multi-Agent Workflow Platform for Network Operations

  • Engineered a multi-agent architecture with LangGraph, where a supervisor agent orchestrates domain-specific sub-agents to handle monitoring, troubleshooting, and automation tasks.
  • Developed an agentic workflow pipeline that enabled dynamic tool invocation and intelligent decision-making, reducing manual troubleshooting time by 40%.
  • Automated repetitive network operations through AI-driven workflows, improving operational efficiency and reducing human error.
  • Delivered a scalable full-stack chatbot solution (FastAPI, Kafka, MongoDB, React, TypeScript) that streamlined collaboration across teams and accelerated issue resolution.
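
The supervisor pattern mentioned above can be sketched in plain Python. This is not LangGraph code: simple keyword matching stands in for the LLM-driven routing decision, and the agent functions, keyword lists, and reply strings are all hypothetical.

```python
from typing import Callable

Agent = Callable[[str], str]


def monitoring_agent(request: str) -> str:
    return f"[monitoring] checking metrics for: {request}"


def troubleshooting_agent(request: str) -> str:
    return f"[troubleshooting] collecting evidence for: {request}"


def automation_agent(request: str) -> str:
    return f"[automation] preparing a workflow for: {request}"


# Ordered routing table: the first sub-agent whose keywords match wins.
ROUTES: list[tuple[tuple[str, ...], Agent]] = [
    (("utilization", "cpu", "memory", "alert"), monitoring_agent),
    (("why", "flapping", "down", "unreachable"), troubleshooting_agent),
    (("configure", "deploy", "push"), automation_agent),
]


def supervisor(request: str) -> str:
    """Route a request to one sub-agent, or escalate when nothing matches."""
    text = request.lower()
    for keywords, agent in ROUTES:
        if any(keyword in text for keyword in keywords):
            return agent(request)
    return "[supervisor] no matching sub-agent; escalating to a human"
```

In an agentic setup the routing step would typically be a model call that picks the next sub-agent or tool; the fixed table here only shows the shape of the hand-off.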

Network Topology Mapping & Visibility Platform

  • Built a network topology platform to provide real-time visibility of device interconnections, enabling teams to predict impact zones and mitigate risks during change management and troubleshooting.
  • Designed Airflow DAGs to schedule daily data collection from multiple enterprise sources, ensuring up-to-date visibility into network device states.
  • Developed a progressive data collection and mapping workflow (device-by-device ingestion) to avoid heavy data loads, improving pipeline efficiency and system performance.
  • Integrated Neo4j graph database for storing and visualizing device relationships, empowering operations teams to quickly identify dependency chains and potential points of failure.
  • Improved incident resolution speed and enhanced proactive change verification, reducing downtime risks and operational costs.
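
Predicting an impact zone amounts to a bounded graph traversal from the device being changed. The sketch below uses an in-memory adjacency map and breadth-first search rather than Neo4j; the device names and the `impact_zone` function are hypothetical.

```python
from collections import deque


def impact_zone(links: list[tuple[str, str]], start: str, max_hops: int) -> dict[str, int]:
    """Return devices within max_hops of start, mapped to their hop distance."""
    # Build an undirected adjacency map from the link list.
    adjacency: dict[str, set[str]] = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    del hops[start]  # The changed device itself is not part of its impact zone.
    return hops


links = [("core1", "agg1"), ("core1", "agg2"), ("agg1", "acc1"), ("acc1", "host1")]
zone = impact_zone(links, "core1", max_hops=2)
# zone == {"agg1": 1, "agg2": 1, "acc1": 2}
```

A graph database can express the same question as a variable-length path query, which avoids loading the full topology into application memory.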

Enterprise Data Platform MVP

  • Architected and delivered a scalable data platform to empower enterprises with faster, data-driven decision-making.
  • Designed and implemented ETL pipelines using Kafka and Airflow to extract and transform metrics from Postgres Enterprise Manager, Grafana, and Dynatrace, storing results in AWS S3 for downstream analytics.
  • Built real-time monitoring and visualization dashboards in Grafana, reducing data access latency by 35% and enabling proactive system insights.
  • Improved data reliability and operational visibility, allowing stakeholders to identify anomalies faster and optimize infrastructure performance.

Real-Time Chatbot with LLM RAG

  • Developed a real-time chatbot platform to streamline daily network operations: executing multi-command SSH tasks, validating system upgrades, and searching for devices across tools such as ExtraHop, MyOps, DataWarehouse, NetBrain, and Cisco ACI.
  • Refactored a monolithic architecture into modular microservices, introducing Kafka-based event-driven communication to enhance scalability and system reliability.
  • Implemented MongoDB caching to avoid redundant processing and event triggers, cutting operational costs by reducing repeated queries and resource usage.
  • Integrated WebSocket-based notifications across platforms (Webex, Slack), enabling faster updates and improved collaboration during live network changes.
  • Delivered a robust automation framework that reduced manual troubleshooting efforts, accelerated change validation, and improved operational efficiency for network teams.
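
The idea of caching results to avoid redundant processing can be sketched as follows. An in-memory dictionary stands in for MongoDB here, and the `TTLCache` class, its expiry policy, and the request shape are assumptions made for illustration only.

```python
import hashlib
import json
import time
from typing import Any, Callable


class TTLCache:
    """Cache results per request for a fixed time (in-memory stand-in for a database)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def _key(request: dict[str, Any]) -> str:
        # Sort keys so logically identical requests hash to the same cache key.
        payload = json.dumps(request, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def get_or_compute(self, request: dict[str, Any], compute: Callable[[dict[str, Any]], Any]) -> Any:
        key = self._key(request)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # Fresh hit: skip the expensive call.
        value = compute(request)
        self._store[key] = (now + self._ttl, value)
        return value
```

With a persistent store, a comparable effect could come from an expiry index on a timestamp field, so stale entries are dropped without application code.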