How to Cancel a Stuck Nutanix Task from the Command Line

Yesterday, I initiated an AHV host restart task. After reaching 24%, the task got stuck. I waited for an hour, but it didn't move, so I tried to find it in the task list using:

acli task.list

but the output didn't include the task I was looking for. I finally found a solution using the Ergon task manager service. Ergon is the Nutanix task manager; it is responsible for starting and killing running or stuck tasks, automatically or manually if needed.

Below are the steps to kill the running/stuck tasks.

Step 1: Log in to a Nutanix CVM using SSH

Step 2: Get the task list

nutanix@NTNX-16SM65330119-A-CVM:XXX.XXX.XXX.XXX:~$ ecli task.list include_completed=false

Output:

Task UUID                             Parent Task UUID  Component  Sequence-id  Type                        Status
6b2179c4-5459-474e-8521-637028e1418b                    Genesis    11           Hypervisor rolling restart  kRunning
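If the list is long, a small filter can pick the running tasks out of the ecli output. The sketch below is a hypothetical helper (running_task_uuids is my own name, not a Nutanix tool). It assumes the output format shown above, with whitespace-separated columns, the Task UUID first and the Status last. It only reads the listing, so it changes nothing on the cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Nutanix tool): print the Task UUID of every task whose
# Status column is kRunning. Assumes whitespace-separated columns with the Task UUID
# first and the Status last, as in the sample output above. Reads stdin only.
running_task_uuids() {
  # Keep lines whose last field is kRunning and whose first field looks like a
  # 36-character lowercase UUID; this skips the header and any separator lines.
  awk '$NF == "kRunning" && length($1) == 36 && $1 ~ /^[0-9a-f-]+$/ { print $1 }'
}

# Usage on the CVM (assumed):
#   ecli task.list include_completed=false | running_task_uuids
```

Several tasks can be in kRunning at the same time, so check the Component and Type columns of each match before doing anything with it.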

Step 3: Kill the desired task from the list

ergon_update_task --task_uuid='<Task UUID>' --task_status=aborted

Replace <Task UUID> with the Task UUID from Step 2, so the command becomes:

nutanix@NTNX-16SM65330119-A-CVM:XXX.XXX.XXX.XXX:~$ ergon_update_task --task_uuid='6b2179c4-5459-474e-8521-637028e1418b' --task_status=aborted
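Two things can go wrong when copying this command from a web page. Typographic dashes and curly quotes (– and ’) are not the ASCII hyphens and straight quotes the shell expects, and a mistyped UUID targets the wrong task. The sketch below is a hypothetical dry-run helper (print_abort_cmd is my own name, not part of Nutanix). It assumes lowercase UUIDs in the 8-4-4-4-12 form shown in Step 2. It checks that form and prints the Step 3 command instead of running it, so the result can be reviewed, ideally with Nutanix Support, before anything irreversible happens.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper (not a Nutanix tool): validate a Task UUID and PRINT the
# ergon_update_task command from Step 3 instead of executing it.
print_abort_cmd() {
  local uuid=$1
  # Assumed format: lowercase hex in 8-4-4-4-12 groups, e.g. 6b2179c4-5459-474e-8521-637028e1418b
  local re='^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
  if [[ ! $uuid =~ $re ]]; then
    echo "not a task UUID: '$uuid'" >&2
    return 1
  fi
  # ASCII double hyphens and straight quotes only; en dashes or curly quotes break the flags.
  printf "ergon_update_task --task_uuid='%s' --task_status=aborted\n" "$uuid"
}
```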

Hopefully this helps in case the task is not listed by acli or progress_monitor_cli.

Attention: The commands above are irreversible, so use them carefully. If you are not confident, open a support case with Nutanix Support or your local Nutanix partner.

Update: I just received feedback from the Nutanix Support team that killing a task this way can lead to serious issues. Please do not do it yourself; if you need any assistance, reach out to Nutanix Support for the fix.

One thought on "How to Cancel a Stuck Nutanix Task from the Command Line"

  1. David Steele

    Hey there,

    Any chance that you could stress some caveats here? What you have done worked for you, but in general interrupting the LCM rolling restart process is unsafe. Just today we have a Nutanix customer calling in for assistance after reading this post and making their situation worse.

    Interrupting the underlying processes on a cluster is something that shouldn’t be done inside a production environment. If something is broken, please come to the Support team – we’re here to help, we want to know how it broke and we want to fix it for everyone. LCM doesn’t improve, and doesn’t catch the nuance involved with supporting a clustering technology across an expanding HCL, unless you and everyone else hold us to account.

    Regardless of the safety or danger of this post, I’d also like to suggest that a little more detail on your posts is in order: the date of posting and the environment in which the example was found (AOS/PC version, hypervisor version and underlying hardware, for example) would be great minimum benchmarks so that others can make a better decision on whether this is something that might apply to them. Tech solutions age, but I can’t tell if this post was from this month or a year ago.

    Happy to discuss further if you like, and thanks for your enthusiasm with all things Nutanix!

    Cheers,
    David Steele
    Nutanix Support
