From switchboards to spine‑leaf, from ARP to BGP — with field‑ready troubleshooting at every layer.
Welcome. I write about networks for a living; this is the friendly version I wish more people had stumbled into early on. If you have fuzzy areas or blind spots, you’re in the right place. We’ll keep jargon minimal, diagrams mental, and the troubleshooting practical.
1870s–1930s: Telephony emerges: local exchanges, manual switchboards, then electromechanical step‑by‑step and crossbar switches. The idea: circuits — a dedicated path during a call.
1960s–1970s: Packet switching is proposed (Baran, Davies). ARPANET links UCLA, SRI, Utah, UCSB. Circuits are reliable, but packets are flexible and resilient.
1980s: TCP/IP standardizes across ARPANET (1983). Ethernet becomes the dominant LAN. The OSI model appears (a teaching model), but the pragmatic TCP/IP stack wins deployment.
1990s: The NSFNET backbone is decommissioned (1995) and the commercial Internet takes over. Interconnection concentrates at the exchange points of the era: the NSF‑designated “NAPs” (Network Access Points) plus hubs like FIX‑West/East, MAE‑East/West, and the CIX. Today we call these IXPs (Internet Exchange Points). The web arrives; BGP4 becomes the Internet’s inter‑domain routing glue.
2000s–present: Content delivery networks (CDNs), massive data centers, merchant‑silicon Ethernet switches, and Clos spine‑leaf fabrics replace tall, bespoke hierarchies. Cloud scales by repetition, not snowflakes.
There’s no rule that a provider “must connect to three NAPs.” That phrasing came from 1990s interconnect policy and procurement checklists. In practice, networks interconnect at many IXPs and private interconnects based on cost, performance, and geography — not a magic number.
Imagine two hosts: SanDiego and Bangor. Early on you could rent a private line, but one cut and you’re dark. The Internet’s genius was to forward packets hop‑by‑hop until they find a working path. Inside data centers, the most economical way to scale that forwarding is a Clos (spine‑leaf) fabric: every leaf (top‑of‑rack) switch uplinks to every spine, so any two servers are at most leaf–spine–leaf apart, and you add capacity by adding spines.
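To see why the economics work, run the numbers on a hypothetical leaf: 48 × 25 GbE server ports (1,200 Gb/s of host capacity) over 6 × 100 GbE uplinks (600 Gb/s toward the spines) is a 2:1 oversubscription ratio; adding spines and uplinks moves the same design toward 1:1 without redesigning anything.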
Modern switches often run merchant ASICs (for example Broadcom’s Trident/Tomahawk families or Intel’s Tofino) sold under many brands. A Network Operating System (NOS) provides BGP/OSPF/IS‑IS, telemetry, and automation hooks. Interop has improved; lock‑in is less absolute than it was.
At Internet scale we speak of autonomous systems (ASes) — networks under one administrative policy — stitched together with BGP. Interconnection happens at IXPs (shared public peering fabrics), over private interconnects (PNIs), and through paid transit providers.
Peering is usually a settlement‑free swap of traffic between networks of roughly equal value; transit is paid. CDNs and hyperscalers peer broadly to reduce latency and cost.
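You can see some of this from a shell. A quick sketch (exact `mtr`/`traceroute` flags vary by build; the ASN and prefix below are just illustrative lookups):

```bash
# which ASes does a path cross?
mtr -z -c 10 -r example.com         # -z adds AS numbers to the report
traceroute -A example.com           # Linux traceroute: AS lookups via routing registries

# who is behind an ASN, and who registers routes for a prefix?
whois AS15169                       # ASN record (Google, as an example)
whois -h whois.radb.net 8.8.8.0/24  # route objects in the IRR
```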
The OSI 7‑layer model is a teaching aid; the deployed Internet stack is simpler, but OSI gives us a shared vocabulary:
| Layer | What to know | Troubleshooting tools |
|---|---|---|
| 1 Physical | Bits on a wire/fiber; power; optics; RF. | `ethtool`, link LEDs, SFP diagnostics, cable testers; Wi‑Fi analyzers. |
| 2 Data Link | Ethernet, MAC addresses, VLANs, STP. | `ip link`, `brctl`/`bridge`, `tcpdump -e`, switch MAC tables. |
| 3 Network | IP addressing, routing, ARP/ND. | `ip addr`, `ip route`, `arp`/`ip neigh`, `ping`, `traceroute`/`mtr`. |
| 4 Transport | TCP/UDP, ports, congestion control, MTU. | `ss`/`netstat`, `iperf3`, `tracepath`, TCP dumps, `ping` DF tests. |
| 5–7 | Sessions, TLS, HTTP/DNS/SSH, apps. | `curl`, `dig`/`nslookup`, `openssl s_client`, `ssh -v`, browser devtools. |
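To make the table concrete, here is one way to walk those layers against a single target from a Linux host (a sketch; `203.0.113.10` and `example.com` are documentation placeholders, and your interface names will differ):

```bash
# walk the layers, bottom to top
ip -br link                                   # L1/L2: is the interface up? carrier present?
ip -br addr && ip route get 203.0.113.10      # L3: do we have an address and a route?
ping -c 3 203.0.113.10                        # L3: basic reachability
ss -tn dst 203.0.113.10                       # L4: existing TCP connections to the target
curl -sv https://example.com/ -o /dev/null    # L5–7: DNS + TLS + HTTP in one shot
```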
The field toolkit, roughly by job:

- Interfaces: `ip link`, `ip addr`, `nmcli` (NetworkManager), `ethtool` for speed/duplex.
- Reachability and path: `ping` (ICMP), `traceroute`/`mtr` (path), `tracepath` (PMTU).
- DNS: `dig` (`dig +trace`, `dig @resolver`), `resolvectl`.
- Sockets and firewalls: `ss -ltnp`, `lsof -i`, host firewalls (`ufw`, `nft`).
- Packet capture: `tcpdump` (CLI), Wireshark (GUI). Start narrow: host/port/proto.
- Throughput: `iperf3` (end‑to‑end), check duplex and MTU first.
- HTTP/TLS: `curl -v`, `curl --resolve` (DNS override), `openssl s_client` (TLS handshake).
- Port checks: `nc` (netcat), `telnet` (legacy), `nmap` (scan carefully).
- ARP and layer 2: `ip neigh`, `arping`, switch CAM tables.
- Ownership: `whois` for ASNs and prefixes.

A quick path‑MTU check with the ping method:

```bash
# find max payload before fragmentation (Linux)
ping -M do -s 1472 8.8.8.8   # 1472 + 28 (IP + ICMP headers) = 1500
# if this fails but a smaller payload works (e.g., -s 1464), something en route
# has an MTU below 1500; 1464 + 28 = 1492 is classic PPPoE
```
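If you would rather let the kernel do the arithmetic, `tracepath` reports the path MTU it discovers hop by hop (a small sketch; the hostname is a placeholder):

```bash
# PMTU discovery along the path, no root needed; -n skips reverse DNS
tracepath -n example.com
```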
When something breaks, walk up the stack:

1. Link: `ip link` shows state UP; `ethtool` shows speed/duplex.
2. Addressing and routing: `ip addr`, `ip route`.
3. Gateway: `ping` your gateway; if ARP fails, check switch port/VLAN, `ip neigh`, `tcpdump -e arp`.
4. DNS: `dig example.com`, then `dig @resolver example.com`, then `dig +trace`.
5. Path: `traceroute`/`mtr` to the target and to a known good (e.g., 1.1.1.1). Compare.
6. Service: `ss -ltnp` on the server; ACL/firewall rules; cloud SGs.
7. Throughput: `iperf3` client↔server. If slow, check duplex, MTU, CPU offload, and queue drops.
8. Capture: `tcpdump` with a narrow filter. Confirm, don’t assume.

ARP maps IP→MAC on IPv4 LANs. When host A wants to send to host B on the same subnet, it broadcasts “Who has 10.0.0.42?” and B replies “10.0.0.42 is at 00:11:22:33:44:55.” To leave the subnet, a host ARPs for its gateway’s MAC the same way (and routers do the same for their next hops). Common failures: wrong VLAN or a stale cache.
```bash
# watch ARP while you try to reach the gateway
sudo tcpdump -n -e 'arp or icmp'
# what the kernel currently has cached (REACHABLE/STALE/FAILED)
ip neigh show
# ask for the gateway's MAC directly (adjust interface and address)
sudo arping -I eth0 10.0.0.1
```
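A useful split: if `arping` gets replies but `ping` to the same address does not, layer 2 is fine and the problem sits above it (host firewall, routing, the far end); if `arping` gets nothing, stay at layer 2 and look at the port, VLAN, and cabling.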
Names resolve to IPs through recursive resolvers. Separate “can I reach the resolver?” from “can the resolver answer for this name?”
```bash
# does your resolver respond?
dig @192.0.2.53 example.com
# follow the chain yourself
dig +trace example.com
# HTTPS reachability with DNS override (SNI/Host)
curl -sv --resolve example.com:443:203.0.113.10 https://example.com/
```
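When reading `dig` output, the `status:` field matters as much as the answer section: NOERROR with an empty answer means the name exists but has no record of that type, NXDOMAIN means the name does not exist, and SERVFAIL usually points at the resolver itself or at DNSSEC validation trouble.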
The Internet grew not because every path was optimal, but because any one path being broken didn’t matter. Prefer simple, repeated designs; let redundancy and good telemetry carry the weight.
Last updated 2025‑09‑22.