OCPBUGS-55622

pod deletion doesn't occur fast enough, resulting in the new pod's multus interface failing IPv6 duplicate address detection


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • Fix Version: 4.17.z
    • Affects Version: 4.14
    • Component: Networking / multus
    • Severity: Critical
    • Sprint: CNF Network Sprint 270
    • Release Note Type: Release Note Not Required

      This is a clone of issue OCPBUGS-55346. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-37212. The following is the description of the original issue:

      Description of problem:

      On pod deletion, cleanup intermittently takes too long, resulting in the replacement pod's multus interface failing IPv6 duplicate address detection (DAD).
      
      Sample reproduction:
      - Worker 14 begins to remove the pod at 14:21:14:
      
      Jul 17 14:21:14 worker14 kubenswrapper[9796]: I0717 14:21:14.904545    9796 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[NAMESPACE/POD]
      
      - Worker 19 begins to add the pod at 14:21:14:
      
      Jul 17 14:21:14 worker19 kubenswrapper[9438]: I0717 14:21:14.952931    9438 kubelet.go:2425] "SyncLoop ADD" source="api" pods=[NAMESPACE/POD]
      
      - Worker 19 tries adding the network to the pod at Jul 17 14:21:15:
      
      Jul 17 14:21:15 worker19 crio[9376]: time="2024-07-17 14:21:15.294568336Z" level=info msg="Adding pod NAMESPACE/POD to CNI network \"multus-cni-network\" (type=multus-shim)"
      
      - But the new interface fails IPv6 DAD at 14:21:17:
      
      Jul 17 14:21:17 worker19 kernel: IPv6: eth1: IPv6 duplicate address <IPv6_ADDRESS> used by <MAC> detected!
      
      - Worker 14 does not finish tearing down the original pod and the related netns until 14:21:37:
      
      Jul 17 14:21:37 worker14 crio[9601]: time="2024-07-17 14:21:37.789184337Z" level=info msg="Got pod network &{Name:<POD> Namespace:<NAMESPACE> ID:a36d6da2c26fb668b3d9a665544ae25629377656b180bd3db2b4e199c59f9793 UID:9b7db4ae-b0bc-4987-ac57-35d3c42afdb3 NetNS:/var/run/netns/9b37d0a3-61c9-4b57-b5ea-51e1964b58c0 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
      Jul 17 14:21:37 worker14 crio[9601]: time="2024-07-17 14:21:37.789403797Z" level=info msg="Deleting pod <POD> from CNI network \"multus-cni-network\" (type=multus-shim)"
      Jul 17 14:21:38 worker14 kubenswrapper[9796]: I0717 14:21:38.924580    9796 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[NAMESPACE/POD]
      Jul 17 14:21:38 worker14 kubenswrapper[9796]: I0717 14:21:38.936882    9796 kubelet.go:2435] "SyncLoop REMOVE" source="api" pods=[NAMESPACE/POD]
      
      This is a timing issue: the replacement pod tries to assign the IPv6 address before the original pod's network has been cleaned up.
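
      To confirm this ordering on a live reproduction, the same markers can be pulled from the journal on both nodes and interleaved by timestamp. The following is a minimal sketch, assuming SSH access as the core user; the node names, NAMESPACE/POD placeholder, and time window are taken from the logs above (oc debug node/<node> works equally well):

      #!/usr/bin/env bash
      # Sketch: collect the kubelet SyncLoop events, the CRI-O multus add/delete
      # messages, and the kernel DAD failures from both nodes, then sort by time.
      OLD_NODE=worker14          # node tearing down the original pod
      NEW_NODE=worker19          # node starting the replacement pod
      POD_REF="NAMESPACE/POD"    # placeholder for the real namespace/pod

      for node in "$OLD_NODE" "$NEW_NODE"; do
          ssh "core@${node}" \
              "sudo journalctl --since '2024-07-17 14:21:00' --until '2024-07-17 14:22:00' --no-pager" |
              grep -E "SyncLoop (ADD|DELETE|REMOVE).*${POD_REF}|multus-cni-network|IPv6 duplicate address"
      done | sort -k3,3    # field 3 of the default journalctl output is HH:MM:SS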
      
      

      Version-Release number of selected component (if applicable):

          4.14

      How reproducible:

          Intermittent, but can be reliably reproduced.

      Steps to Reproduce:

      - Delete a pod.
      - Wait for the pod to be rescheduled, then log on to the new worker node.
      - Determine the pod's network namespace ('IP' is a placeholder for the pod's IPv6 address; a consolidated sketch follows this list):
      - - $ for ns in $(ip netns | awk '{print $1}'); do ip netns exec $ns ip a | grep -iq 'IP'; if [ $? == 0 ]; then echo $ns; fi; done
      - Validate that eth1 is in the tentative+dadfailed state:
      - - $ ip netns exec <NS> ip a
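
      The checks above can be wrapped into a single script. This is a minimal sketch, assuming it runs as root on the worker hosting the replacement pod; the POD_IP argument stands in for the pod's (redacted) IPv6 address, and eth1 is the multus-attached interface from this report:

      #!/usr/bin/env bash
      # Sketch: locate the pod's network namespace and report whether eth1
      # failed IPv6 duplicate address detection. Run on the new worker node.
      POD_IP="${1:?usage: $0 <pod-ipv6-address>}"

      for ns in $(ip netns | awk '{print $1}'); do
          # Find the namespace that carries the pod's address...
          if ip netns exec "$ns" ip addr | grep -q "$POD_IP"; then
              echo "pod netns: $ns"
              # ...and check whether any IPv6 address on eth1 is flagged dadfailed.
              if ip netns exec "$ns" ip -6 addr show dev eth1 | grep -q dadfailed; then
                  echo "eth1 in $ns is tentative/dadfailed (issue reproduced)"
              else
                  echo "eth1 in $ns passed DAD"
              fi
          fi
      done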

      Actual results:

       6: eth1@if24: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9000 qdisc noqueue state UNKNOWN
           link/ether 88:e9:a4:71:62:5c brd ff:ff:ff:ff:ff:ff
           inet6 IPv6_ADDRESS/64 scope global tentative dadfailed    <--- FAILED
              valid_lft forever preferred_lft forever
           inet6 fe80::88e9:a400:371:625c/64 scope link
              valid_lft forever preferred_lft forever

      Expected results:

       No IPv6 DAD failure.

      Additional info:

          Note: This was not seen on the impacted cluster until it was upgraded to 4.14, so this may be a regression or a newly introduced bug.

       

              Assignee: Peng Liu (pliurh)
              Reporter: OpenShift Prow Bot (openshift-crt-jira-prow)
              QA Contact: Weibin Liang