-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.14
-
None
-
Critical
-
None
-
CNF Network Sprint 270
-
1
-
False
-
-
N/A
-
Release Note Not Required
-
Done
-
-
-
-
-
This is a clone of issue OCPBUGS-55346. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37212. The following is the description of the original issue:
—
Description of problem:
On pod deletion, cleanup intermittently takes too long, so the replacement pod's multus interface fails IPv6 duplicate address detection (DAD). Sample reproduction:

- Worker 14 begins to remove the pod at 14:21:14:
Jul 17 14:21:14 worker14 kubenswrapper[9796]: I0717 14:21:14.904545 9796 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[NAMESPACE/POD]
- Worker 19 begins to add the pod at 14:21:14:
Jul 17 14:21:14 worker19 kubenswrapper[9438]: I0717 14:21:14.952931 9438 kubelet.go:2425] "SyncLoop ADD" source="api" pods=[NAMESPACE/POD]
- Worker 19 tries adding the network to the pod at 14:21:15:
Jul 17 14:21:15 worker19 crio[9376]: time="2024-07-17 14:21:15.294568336Z" level=info msg="Adding pod NAMESPACE/POD to CNI network \"multus-cni-network\" (type=multus-shim)"
- But it hits a DAD failure at 14:21:17:
Jul 17 14:21:17 worker19 kernel: IPv6: eth1: IPv6 duplicate address <IPv6_ADDRESS> used by <MAC> detected!
- Worker 14 has not finished tearing down the original pod and related netns until 14:21:38:
Jul 17 14:21:37 worker14 crio[9601]: time="2024-07-17 14:21:37.789184337Z" level=info msg="Got pod network &{Name:<POD> Namespace:<NAMESPACE> ID:a36d6da2c26fb668b3d9a665544ae25629377656b180bd3db2b4e199c59f9793 UID:9b7db4ae-b0bc-4987-ac57-35d3c42afdb3 NetNS:/var/run/netns/9b37d0a3-61c9-4b57-b5ea-51e1964b58c0 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Jul 17 14:21:37 worker14 crio[9601]: time="2024-07-17 14:21:37.789403797Z" level=info msg="Deleting pod <POD> from CNI network \"multus-cni-network\" (type=multus-shim)"
Jul 17 14:21:38 worker14 kubenswrapper[9796]: I0717 14:21:38.924580 9796 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[NAMESPACE/POD]
Jul 17 14:21:38 worker14 kubenswrapper[9796]: I0717 14:21:38.936882 9796 kubelet.go:2435] "SyncLoop REMOVE" source="api" pods=[NAMESPACE/POD]

This is clearly a timing issue: the replacement pod tries to assign the IPv6 address before the original pod's network has been cleaned up.
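The kernel DAD failures above can be pulled out of journal output programmatically. The following is a minimal illustrative sketch (the `dad_events` helper is not part of this report) that extracts the interface, duplicate address, and conflicting MAC from kernel log lines matching the format shown above:

```python
import re

# Matches kernel lines like:
# "... kernel: IPv6: eth1: IPv6 duplicate address <addr> used by <mac> detected!"
DAD_RE = re.compile(
    r"kernel: IPv6: (?P<iface>\S+): IPv6 duplicate address (?P<addr>\S+) "
    r"used by (?P<mac>\S+) detected!"
)

def dad_events(journal_text):
    """Return (interface, address, mac) tuples for each kernel DAD failure line."""
    return [m.group("iface", "addr", "mac") for m in DAD_RE.finditer(journal_text)]
```

Feeding it the output of `journalctl -k` on the new worker would list every DAD failure, which helps correlate the failure timestamp against the teardown logs on the old worker.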
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Intermittent, but can be reliably reproduced.
Steps to Reproduce:
- Delete a pod.
- Wait for the pod to be rescheduled, then log on to the new worker.
- Determine the network namespace:
$ for ns in $(ip netns | awk '{print $1}'); do ip netns exec $ns ip a | grep -iq 'IP'; if [ $? == 0 ]; then echo $ns; fi; done
- Validate that eth1 is in the tentative+dadfailed state:
$ ip netns exec <NS> ip a
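The final validation step can also be scripted. Below is a minimal sketch (the `find_dadfailed` helper is illustrative, not part of the reproduction) that scans `ip a`-style output, as produced by `ip netns exec <NS> ip a`, for addresses stuck in the dadfailed state:

```python
import re

def find_dadfailed(ip_a_output):
    """Return (interface, address) pairs whose IPv6 address failed DAD."""
    results = []
    current_iface = None
    for line in ip_a_output.splitlines():
        # Interface header lines look like "6: eth1@if24: <...>"
        m = re.match(r"^\d+:\s+([^:@]+)", line)
        if m:
            current_iface = m.group(1)
            continue
        # Address lines carry the "dadfailed" flag when DAD has failed
        if "dadfailed" in line:
            am = re.search(r"inet6\s+(\S+)", line)
            if am and current_iface:
                results.append((current_iface, am.group(1)))
    return results
```

An empty result means no interface in that namespace is in the dadfailed state; on an affected pod it would return eth1 with the duplicate global address.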
Actual results:
6: eth1@if24: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9000 qdisc noqueue state UNKNOWN
    link/ether 88:e9:a4:71:62:5c brd ff:ff:ff:ff:ff:ff
    inet6 IPv6_ADDRESS/64 scope global tentative dadfailed <--- FAILED
       valid_lft forever preferred_lft forever
    inet6 fe80::88e9:a400:371:625c/64 scope link
       valid_lft forever preferred_lft forever
Expected results:
No IPv6 DAD failure.
Additional info:
Note: This was not seen in the impacted cluster until it was upgraded to 4.14, so this might be a regression or a new bug.
- blocks
-
OCPBUGS-55648 pod deletion doesn't occur fast enough resulting in new pod multus interface failing ipv6 duplicate address detection
-
- Closed
-
- clones
-
OCPBUGS-55346 pod deletion doesn't occur fast enough resulting in new pod multus interface failing ipv6 duplicate address detection
-
- Closed
-
- is blocked by
-
OCPBUGS-55346 pod deletion doesn't occur fast enough resulting in new pod multus interface failing ipv6 duplicate address detection
-
- Closed
-
- is cloned by
-
OCPBUGS-55648 pod deletion doesn't occur fast enough resulting in new pod multus interface failing ipv6 duplicate address detection
-
- Closed
-
- links to
-
RHBA-2025:4723 OpenShift Container Platform 4.17.z bug fix update