OpenShift Request For Enhancement / RFE-7188

Node drain reporting improvements

      1. Proposed title of this feature request

      Node drain reporting improvements

      2. What is the nature and description of the request?

      This is a request to improve observability around worker node drains and their status.

      What we're looking for:

      • a metric that reports node drain progress
        • on failure, the pods blocking the drain and the reason for each (e.g. pod abc, blocked by a PDB)
      • a status on NodePools that reflects node drain failures, the offending pods, and the reason
      • [optional] a status on the CAPI Machine that reflects node drain failures, the offending pods, and the reason
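
      To make the request concrete, here is a minimal sketch of the information such a status or metric could carry. Every name in it (BlockedPod, DrainStatus, ConditionMessage, MetricLabels, and the label keys) is hypothetical and chosen for illustration only; none of it is an existing HyperShift or CAPI API, and the real design may look quite different.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// BlockedPod is a hypothetical record of one pod that prevented a drain.
type BlockedPod struct {
	Namespace string
	Name      string
	Reason    string // e.g. "PodDisruptionBudget"
}

// DrainStatus is a hypothetical summary of one node's drain attempt.
type DrainStatus struct {
	NodeName string
	Blocked  []BlockedPod
}

// Failed reports whether any pod is still blocking the drain.
func (d DrainStatus) Failed() bool { return len(d.Blocked) > 0 }

// ConditionMessage renders a human-readable message that a NodePool or
// Machine status condition could carry (assumed format).
func (d DrainStatus) ConditionMessage() string {
	if !d.Failed() {
		return fmt.Sprintf("node %s drained successfully", d.NodeName)
	}
	parts := make([]string, 0, len(d.Blocked))
	for _, p := range d.Blocked {
		parts = append(parts, fmt.Sprintf("%s/%s (%s)", p.Namespace, p.Name, p.Reason))
	}
	sort.Strings(parts) // deterministic output regardless of discovery order
	return fmt.Sprintf("node %s drain blocked by %d pod(s): %s",
		d.NodeName, len(parts), strings.Join(parts, ", "))
}

// MetricLabels returns one label set per blocking pod, the shape a
// per-pod "drain blocked" gauge could use (label keys are assumptions).
func (d DrainStatus) MetricLabels() []map[string]string {
	out := make([]map[string]string, 0, len(d.Blocked))
	for _, p := range d.Blocked {
		out = append(out, map[string]string{
			"node":      d.NodeName,
			"namespace": p.Namespace,
			"pod":       p.Name,
			"reason":    p.Reason,
		})
	}
	return out
}

func main() {
	s := DrainStatus{
		NodeName: "worker-0",
		Blocked: []BlockedPod{
			{Namespace: "shop", Name: "abc", Reason: "PodDisruptionBudget"},
		},
	}
	fmt.Println(s.ConditionMessage())
	fmt.Println(s.MetricLabels())
}
```

      The same record feeds both outputs: the message is what a customer would read on the NodePool, and the label sets are what an SRE alert could match on to notify the owner of the offending workload.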

      3. Why does the customer need this? (List the business requirements here)

      Managed services (ROSA/ARO HCP) don't expose node drain failures to the customer; the failures are currently visible only in CAPI logs. The managed services SRE team is looking for a metric that lets them automate customer notifications when a node drain fails because of the customer's workload, and for a status on NodePools as a general UX improvement.

      4. List any affected packages or components.

      hypershift-operator, CAPI

              racedoro@redhat.com Ramon Acedo
              cbusse.openshift Claudio Busse