Remediate
This feature builds on the explanation feature, but in addition the system uses AI to find relations between failures and derive the dependencies between them. As a result, it creates failure groups, each with a single root cause identified for all of the failed items inside.
For example, deployment `foo` fails check K8S801 (Deployment has insufficient replicas), but the failure actually stems from a problem with pod `foo-abcd-1234`, which fails check K8S106 (Find Pending Pods). The system should therefore put both failures into a single failure group.
Example
To start remediation, run:

```bash
unctl k8s -r
```
This kicks off the steps described in the previous sections: the system scan and the AI explanation of failures. The tool then starts creating failure groups, and you will see something like this in the terminal:
```
🔀 Looking for dependencies between failures...
❌ Created group for check <Deployment has insufficient replicas> and object <prometheus-server>
Title: prometheus-server CrashLoopBackOff issue
Summary: The prometheus-server pod is not able to start due to a CrashLoopBackOff error. The logs show that Prometheus crashes soon after it tries to initialize its
remote write feature, indicating a possible issue with the configuration of the remote write feature or underlying resource issues.
Objects: monitoring/prometheus-server, monitoring/prometheus-server-bc7cbd8c-jzzf5
❌ Created group for check <Deployment has insufficient replicas> and object <remote-tunnel-server>
Title: remote-tunnel-server Deployment Issue
Summary: The 'remote-tunnel-server' deployment in the 'onprem' namespace is not creating any replicas. This can be due to various issues like image pull back off,
incorrect permissions, resource limits, etc. Additional details may be found in the pod logs or events.
Objects: onprem/remote-tunnel-server
❌ Created group for check <Deployment has insufficient replicas> and object <slack-app>
Title: Deployment slack-app failure
Summary: The slack-app in namespace 'onprem' is failing to create replicas. No replicas are updated or available which results in pod unavailability. 'FailedCreate' and
'ProgressDeadlineExceeded' failures are also observed.
Objects: onprem/slack-app
❌ Created group for check <Deployment has insufficient replicas> and object <util>
Title: sbox-yura/util Pod crashing
Summary: Pod 'util-57cd568c54-6jk6s' in 'sbox-yura' namespace is in 'CrashLoopBackOff' state, indicating that the container in this Pod is failing to start successfully,
likely causing unavailability of 'util' deployment. This could be due to issues with the container itself such as application errors or configuration issues.
Unfortunately, the container logs are not provided, so further analysis is required
Objects: sbox-yura/util, sbox-yura/util-57cd568c54-6jk6s
❌ Created group for check <Deployment has insufficient replicas> and object <workflows>
Title: sbox-yura/workflows deployment issue
Summary: The workflows deployment in namespace sbox-yura is not available. This could potentially be due to issues with the readiness probe, environment variables, or
issues with the image. This issue can impact any services that are dependent on the workflows deployment.
Objects: sbox-yura/workflows
```
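The dependency search behind this output is AI-driven. The sketch below is purely illustrative of how such a step could be phrased against a chat-completion API; the client library, model name, prompt wording, and output format are all assumptions for the sake of the example, not unctl internals:

```python
import json
from openai import OpenAI  # assumption: any chat-completion client would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Failures collected by the scan step, reduced to the fields the model needs.
failures = [
    {"check": "K8S801", "object": "default/foo",
     "detail": "Deployment has insufficient replicas"},
    {"check": "K8S106", "object": "default/foo-abcd-1234",
     "detail": "Pod is Pending"},
]

# Ask the model to cluster failures that share a root cause.
prompt = (
    "Group the following Kubernetes check failures by shared root cause. "
    "Return JSON: a list of groups, each with 'title', 'summary', and "
    "'objects'.\n" + json.dumps(failures, indent=2)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: unctl's model is not documented here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```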
Once all the failures have been sorted into groups, you enter interactive mode.
