Hanging network in XEN with bridging
This article describes a strange network problem that can occur when bridging is used in a XEN environment.
Symptoms
- A DomU is no longer reachable from the network.
- The problem persists after restarting the DomU (regardless of method: reboot, shutdown+create, destroy+create).
- Possibly: the DomU still has a virtual NIC, but when pings are sent from the DomU via its console, only ARP requests are seen on the virtual NIC in Dom0, and no replies.
- Possibly: there are log entries such as:
xenbr1: port 4(domu_eth0) entering disabled state.
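To look for such entries for one particular interface, the kernel log can be filtered. The following is a minimal sketch, assuming the message format shown above; the function name bridge_port_events is hypothetical, and domu_eth0 stands for the name of your DomU's NIC in Dom0.

```shell
# Print bridge port state-change messages for one interface.
# Reads log text on stdin; $1 is the interface name as seen in Dom0.
# Matches the fixed string "(<interface>) entering ", as in the entry above.
bridge_port_events() {
    grep -F "($1) entering "
}

# Example use (run in Dom0):
#   dmesg | bridge_port_events domu_eth0
```

The function returns a non-zero status when no matching entry is found, so it can also be used in a condition.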
Cure
There are two possibilities. First: the DomU has lost its NIC on the Dom0 side. In that case, re-add it with
#> brctl addif xenbr0 domu_eth0
(xenbr0 = your bridge, domu_eth0 = the NIC of the DomU, which may look like vif99.1)
If the NIC was not removed from the bridge, or the solution above doesn't work, first check:
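Whether the NIC is still attached to the bridge can be checked before re-adding it. The following is a minimal sketch, assuming the usual column layout of the brctl show output (one header line, one row per bridge, and continuation rows that contain only an interface name). The function name bridge_has_if is hypothetical.

```shell
# Succeed if interface $2 is listed under bridge $1.
# Reads the output of "brctl show" on stdin.
bridge_has_if() {
    awk -v br="$1" -v ifc="$2" '
        NR == 1 { next }        # skip the column header
        NF > 1  { cur = $1 }    # a row that names a bridge starts a new group
        # NF == 3 is a bridge row without any interface, so skip those
        cur == br && NF != 3 && $NF == ifc { found = 1 }
        END { exit !found }
    '
}

# Example use (run in Dom0): re-add the NIC only if it is missing.
#   brctl show | bridge_has_if xenbr0 domu_eth0 || brctl addif xenbr0 domu_eth0
```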
brctl showstp xenbr0
This should output something like this:
domu_eth0 (4)
 port id                8004                    state                  learning
 designated root        8000.0030487e5b0d       path cost                   100
 designated bridge      8000.0030487e5b0d       message age timer          0.00
 designated port        8004                    forward delay timer 12582588.14
 designated cost           0                    hold timer                 0.00
 flags
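When several ports have to be checked, the state field can be extracted from this output with a small helper. This is a minimal sketch, assuming the showstp layout shown above, where a line with the port name and its number in parentheses is followed by a line beginning with "port id". The function name port_state is hypothetical.

```shell
# Print the STP state of port $1.
# Reads the output of "brctl showstp <bridge>" on stdin.
port_state() {
    awk -v ifc="$1" '
        # Port header line, e.g. "domu_eth0 (4)"
        $1 == ifc && $2 ~ /^\([0-9]+\)$/ { in_port = 1; next }
        # Within that port, the "port id" line also carries the state
        in_port && $1 == "port" && $2 == "id" {
            for (i = 1; i < NF; i++)
                if ($i == "state") { print $(i + 1); exit }
        }
    '
}

# Example use (run in Dom0):
#   brctl showstp xenbr0 | port_state domu_eth0
```

Nothing is printed if the port does not appear in the output.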
The important information is the value learning for state. If this is the case, the following solution will probably work. You have to reset the forward delay timer with:
brctl delif xenbr0 domu_eth0
brctl setfd xenbr0 0
brctl addif xenbr0 domu_eth0
This removes the NIC from the bridge, sets the timer to zero, and then re-adds the NIC to the bridge. Afterwards the showstp command should return something like this:
domu_eth0 (4)
 port id                8004                    state                forwarding
 designated root        8000.0030487e5b0d       path cost                   100
 designated bridge      8000.0030487e5b0d       message age timer          0.00
 designated port        8004                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.28
 flags
The important part: forwarding (again).
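If the problem comes back regularly, the repair can be wrapped in a small shell function. This is a minimal sketch of the same sequence of brctl calls described above and nothing more; the function name reset_bridge_port is hypothetical, and it has to be run as root in Dom0 like the manual commands.

```shell
# Detach the port, set the bridge forward delay to zero, re-attach the port.
# $1 is the bridge (e.g. xenbr0), $2 the DomU's NIC in Dom0 (e.g. domu_eth0).
# Stops at the first brctl call that fails.
reset_bridge_port() {
    if [ "$#" -ne 2 ]; then
        echo "usage: reset_bridge_port BRIDGE INTERFACE" >&2
        return 2
    fi
    brctl delif "$1" "$2" &&
    brctl setfd "$1" 0 &&
    brctl addif "$1" "$2"
}

# Example use:
#   reset_bridge_port xenbr0 domu_eth0
#   brctl showstp xenbr0
```

Note that the forward delay is set for the whole bridge, not just for the one port, exactly as with the manual commands.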
I don't have any clue why the NICs detach in the first place, but this works for me to repair it.