Building Network Automation: Start with the Foundation
After more than a decade working with network infrastructure and automation, I've learned that successful network automation isn't about deploying the fanciest tools or implementing complex orchestration from day one. It's about building a solid foundation and layering capabilities on top of it systematically.
The Core Principle
Every network automation architecture, regardless of its complexity, rests on one fundamental workflow: automatically applying configuration to network devices. This single capability is the cornerstone upon which everything else is built. Get this right, and you can gradually add inventory management, monitoring, validation, and sophisticated user interfaces. Get this wrong, and even the most advanced automation framework will struggle.
The Layered Architecture
Think of network automation as a stack of layers, each building upon the previous one:
┌─────────────────────────────────┐
│ User Interface Layer            │ ← Self-service portals, APIs
├─────────────────────────────────┤
│ Network Configuration Manager   │ ← Annet, Ansible, custom tools
├─────────────────────────────────┤
│ Monitoring & Validation         │ ← Prometheus, Grafana, checks
├─────────────────────────────────┤
│ Inventory & Source of Truth     │ ← NetBox, Racktables, Git repos
├─────────────────────────────────┤
│ Configuration Deployment        │ ← The foundation
└─────────────────────────────────┘
Let me break down each layer and explain how they work together.
Layer 1: Configuration Deployment - The Foundation
This is where it all begins. The basic workflow is deceptively simple:
- Generate configuration for a device
- Connect to the device
- Apply the configuration
- Verify it was applied successfully
But simple doesn't mean trivial. This layer needs to handle:
- Connection management: SSH, NETCONF, or REST APIs, depending on your vendor
- Error handling: What happens when a device is unreachable? When configuration fails to apply?
- Rollback capability: Can you safely revert changes if something goes wrong?
- Atomic operations: Ensuring configuration changes are applied completely or not at all
Once you have this workflow running reliably, everything else becomes possible. You can trigger it manually, on a schedule, or in response to events. You can test it against a single device or roll it out fleet-wide.
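To make the error handling and rollback requirements above concrete, here is a minimal sketch rather than a definitive implementation. The names `DeviceSession`, `with_retries`, `apply_with_rollback`, and `FakeSession` are hypothetical and introduced only for illustration. The in-memory fake stands in for a real SSH or NETCONF session, and restoring a saved copy of the configuration is only an approximation of true atomic commits, which depend on platform support.

```python
import time
from typing import Callable, List, Protocol


class DeviceSession(Protocol):
    """Hypothetical interface; a real one would wrap SSH, NETCONF, or REST."""

    def get_config(self) -> List[str]: ...
    def set_config(self, lines: List[str]) -> None: ...


def with_retries(action: Callable[[], None], attempts: int = 3, delay: float = 0.0) -> None:
    """Run action, retrying on ConnectionError up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            action()
            return
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)


def apply_with_rollback(session: DeviceSession, new_config: List[str]) -> bool:
    """Apply new_config; restore the previous config if anything goes wrong."""
    previous = session.get_config()
    try:
        with_retries(lambda: session.set_config(new_config))
        # Verify: the device must now report exactly what we intended
        if session.get_config() != new_config:
            raise RuntimeError("device state does not match intended config")
        return True
    except (ConnectionError, RuntimeError):
        # Best-effort rollback; a failure here propagates to the caller
        session.set_config(previous)
        return False


class FakeSession:
    """In-memory stand-in for a device, used only to exercise the sketch."""

    def __init__(self, config: List[str], failures: int = 0):
        self.config = list(config)
        self.failures = failures  # how many set_config calls fail before one works

    def get_config(self) -> List[str]:
        return list(self.config)

    def set_config(self, lines: List[str]) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("device unreachable")
        applied = []
        for line in lines:
            if line.startswith("invalid"):
                self.config = applied  # simulate a partially applied change
                raise RuntimeError(f"rejected: {line}")
            applied.append(line)
        self.config = applied
```

A transient connection failure is retried, while a rejected command leaves the fake device partially configured until the rollback restores the saved copy. On real hardware you would prefer native mechanisms such as commit-confirm or configuration replace where the platform offers them.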
Layer 2: Inventory & Source of Truth
With reliable configuration deployment in place, you need to know what to configure and where to apply it. This layer provides:
- Device inventory (what devices exist, their roles, locations)
- IP address management (IPAM)
- Network topology and connections
- Device metadata (vendor, model, software version)
Tools like NetBox excel here, providing a structured database that becomes your single source of truth. But you could start simpler - even a well-maintained YAML file or spreadsheet can work initially.
The key is that your configuration deployment layer can query this inventory to know which devices to configure and what their properties are.
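As a rough illustration of that query, the sketch below keeps the inventory as plain Python records. The `Device` fields, the sample hosts, and `select_devices` are all hypothetical. In practice the records would be loaded from a YAML file or fetched from NetBox rather than hard-coded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    name: str
    mgmt_ip: str
    role: str
    vendor: str
    site: str


# Hypothetical records; a real inventory would come from YAML or NetBox
INVENTORY = [
    Device("core-sw1", "192.0.2.1", "core", "cisco", "dc1"),
    Device("acc-sw1", "192.0.2.11", "access", "cisco", "dc1"),
    Device("acc-sw2", "192.0.2.12", "access", "juniper", "dc2"),
]


def select_devices(
    inventory: List[Device],
    role: Optional[str] = None,
    site: Optional[str] = None,
) -> List[Device]:
    """Return devices matching every filter that is not None."""
    return [
        d for d in inventory
        if (role is None or d.role == role) and (site is None or d.site == site)
    ]


# Example: every access switch, regardless of site
for device in select_devices(INVENTORY, role="access"):
    print(device.name, device.mgmt_ip, device.vendor)
```

Because the deployment layer only sees `Device` records, the backing store can change later without touching the deployment logic.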
Layer 3: Monitoring & Validation
Now that you're deploying configurations automatically, you need to know if things are working as expected:
- Pre-deployment validation: Does the generated configuration meet your standards? Will it break anything?
- Post-deployment verification: Did the change apply successfully? Is the device functioning correctly?
- Continuous monitoring: Is the network operating within expected parameters?
This layer feeds data back down to the configuration deployment layer. If monitoring detects drift between the intended configuration and the actual device state, it can trigger automated remediation.
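Drift detection can start as nothing more than a line-by-line comparison. The sketch below uses Python's standard `difflib` module. The function name `config_drift` and the sample configurations are assumptions for illustration, and a real check would first normalize the running configuration (timestamps, ordering, vendor defaults) before comparing.

```python
import difflib
from typing import List


def config_drift(intended: List[str], running: List[str]) -> List[str]:
    """Return unified-diff lines between intended and running config.

    An empty list means the device matches the intended state.
    """
    return list(
        difflib.unified_diff(
            intended,
            running,
            fromfile="intended",
            tofile="running",
            lineterm="",
        )
    )


# Hypothetical example: someone shut the uplink down by hand
intended = ["interface Gi0/1", " description Uplink to Core", " no shutdown"]
running = ["interface Gi0/1", " description Uplink to Core", " shutdown"]

drift = config_drift(intended, running)
if drift:
    print("\n".join(drift))  # hand this to alerting or a remediation job
```

The same function doubles as a post-deployment check: an empty result right after a push is the simplest evidence that the change applied as intended.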
Layer 4: Network Configuration Manager
This layer transforms your inventory data into actual device configurations. It's where you define:
- Configuration templates
- Device-specific logic and rules
- Compliance policies
- Multi-device coordination (ensuring VLANs match across switches, BGP peers are symmetric, etc.)
Tools like Annet, or custom solutions built with libraries like GnetCLI, Nornir, or Netmiko, fit here. The configuration manager queries your inventory, applies your business logic, generates configurations, and hands them to the deployment layer.
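Here is a small sketch of what this layer does, under stated assumptions. The standard library's `string.Template` stands in for a fuller templating engine. The template text, the record fields, and the helper names `render_interfaces` and `vlan_mismatches` are hypothetical, and none of this reflects how Annet or any other specific tool works internally.

```python
from string import Template
from typing import Dict, List

# Hypothetical access-port template; real templates are usually per vendor
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
    " no shutdown"
)


def render_interfaces(interfaces: List[dict]) -> List[str]:
    """Render one stanza per interface record and return a flat list of lines."""
    lines: List[str] = []
    for record in interfaces:
        lines.extend(INTERFACE_TEMPLATE.substitute(record).splitlines())
    return lines


def vlan_mismatches(vlans_by_switch: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Map each switch to the VLANs it lacks relative to the union of all switches."""
    all_vlans = set().union(*vlans_by_switch.values())
    mismatches = {}
    for switch, vlans in vlans_by_switch.items():
        missing = sorted(all_vlans - set(vlans))
        if missing:
            mismatches[switch] = missing
    return mismatches


# Example: generate commands for one port, then check VLAN consistency
commands = render_interfaces(
    [{"name": "GigabitEthernet0/1", "description": "Uplink to Core", "vlan": 10}]
)
print("\n".join(commands))
print(vlan_mismatches({"sw1": [10, 20], "sw2": [10]}))
```

The rendered list of lines is exactly the kind of input the deployment layer expects, which keeps the boundary between the two layers narrow.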
Layer 5: User Interface
The top layer makes automation accessible to your team:
- Self-service portals for common changes (add VLAN, provision new connection)
- APIs for integration with other systems
- Approval workflows for sensitive changes
- Audit logs and change tracking
This layer lets you democratize network changes while maintaining control and visibility.
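Approval workflows and audit logs do not need a web portal to get started. The sketch below models a change request as a plain object. `ChangeRequest`, its fields, and the self-approval rule are hypothetical choices for illustration, and `deploy` is any callable that accepts a device name and a list of commands.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class ChangeRequest:
    requester: str
    device: str
    commands: List[str]
    approved_by: Optional[str] = None
    audit_log: List[str] = field(default_factory=list)

    def _record(self, event: str) -> None:
        # Timestamp every event so the log can be used for change tracking
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.audit_log.append(f"{stamp} {event}")

    def approve(self, approver: str) -> None:
        if approver == self.requester:
            raise PermissionError("requesters cannot approve their own change")
        self.approved_by = approver
        self._record(f"approved by {approver}")

    def execute(self, deploy: Callable[[str, List[str]], object]) -> bool:
        """Run deploy only if the change was approved; log either outcome."""
        if self.approved_by is None:
            self._record("execution blocked: not approved")
            return False
        deploy(self.device, self.commands)
        self._record(f"deployed {len(self.commands)} command(s) to {self.device}")
        return True
```

A portal or API then becomes a thin wrapper that creates these objects, collects approvals, and passes the real deployment function to `execute`.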
Why Start with the Foundation?
You might be tempted to start with the user interface or pick a comprehensive tool that promises to do everything. I recommend against this approach. Here's why:
Complexity compounds: If your basic deployment workflow is unreliable, every layer built on top inherits that unreliability.
Understanding grows: By building from the bottom up, you understand each layer's requirements and can make informed decisions about tooling.
Flexibility matters: A solid foundation lets you swap out tools at higher layers without rebuilding everything. Started with a simple YAML inventory? You can migrate to NetBox later without changing your deployment logic.
Quick wins: Getting that first automated configuration push working is incredibly motivating and demonstrates value immediately.
Getting Started: A Practical Example
Here's a minimal Python script that demonstrates the foundational workflow:
from netmiko import ConnectHandler


def deploy_config(device_ip, config_commands):
    """
    Deploy configuration to a network device.
    Returns: (success: bool, message: str)
    """
    device = {
        'device_type': 'cisco_ios',
        'ip': device_ip,
        'username': 'admin',
        'password': 'password',  # Use proper secrets management!
    }
    try:
        # Connect to the device; the context manager disconnects on exit,
        # even if a command fails partway through
        with ConnectHandler(**device) as connection:
            # Apply configuration
            output = connection.send_config_set(config_commands)
            # Save the running configuration so it survives a reload
            connection.save_config()
        return True, f"Configuration applied successfully:\n{output}"
    except Exception as e:
        return False, f"Failed to apply configuration: {e}"


# Example usage
commands = [
    'interface GigabitEthernet0/1',
    'description Uplink to Core',
    'no shutdown',
]
success, message = deploy_config('192.168.1.1', commands)
print(message)
This is only about 30 lines of code, but it implements the core workflow: connect, apply, save. From here, you can:
- Add error handling and retry logic
- Implement dry-run mode to preview changes
- Add logging and audit trails
- Integrate with inventory systems
- Build templating for configuration generation
- Add pre/post-deployment checks
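As one example of these extensions, a dry-run mode with basic logging can wrap the deployment function without changing it. This is a sketch under assumptions: `run_change` and the logger name are hypothetical, and `deploy` is expected to have the same signature and return shape as `deploy_config` above (an IP address and a command list in, a `(success, message)` tuple out).

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("deploy")

# Same shape as deploy_config: (device_ip, commands) -> (success, message)
DeployFn = Callable[[str, List[str]], Tuple[bool, str]]


def run_change(
    device_ip: str,
    commands: List[str],
    deploy: DeployFn,
    dry_run: bool = True,
) -> Tuple[bool, str]:
    """Preview the change when dry_run is set; otherwise delegate to deploy."""
    if dry_run:
        preview = "\n".join(commands)
        logger.info("dry run for %s, %d command(s)", device_ip, len(commands))
        return True, f"[dry run] would send to {device_ip}:\n{preview}"

    logger.info("deploying %d command(s) to %s", len(commands), device_ip)
    success, message = deploy(device_ip, commands)
    if not success:
        logger.error("deployment to %s failed: %s", device_ip, message)
    return success, message
```

Defaulting `dry_run` to `True` is a deliberate choice here: touching a device requires an explicit `dry_run=False`, so a forgotten argument previews the change instead of pushing it.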
The Path Forward
Network automation is a journey, not a destination. You don't need to build the entire stack before seeing value. Start with reliable configuration deployment. Once that's solid, add inventory management. Then monitoring. Then a configuration manager. Finally, wrap it all in a user interface.
Each layer adds capability, but they all depend on that foundational workflow working reliably every single time.
In future posts, I'll dive deeper into each layer, sharing practical examples and lessons learned from implementing these systems at scale. We'll explore tools like Annet for configuration management, NetBox for inventory, and how to build monitoring that actually helps you prevent problems rather than just alerting you when they occur.
But it all starts here: with one configuration, deployed automatically, to one device.
What's your experience with network automation? Are you just starting out, or have you built complex automation systems? I'd love to hear about your approach in the comments.