122. Cluster stuck in a "paused" state after the Disaster Recovery (DR) process

张开发
2026/4/12 1:17:55, 15-minute read


Situation

In certain cases, a downstream cluster may enter a broken state that requires a Disaster Recovery (DR) process to restore it to an active state. However, the DR process occasionally fails to complete and hangs indefinitely. When this happens, the cluster enters a "paused" state.

This condition can be verified by inspecting the clusters.cluster.x-k8s.io object in the fleet-default namespace of the local (upstream) cluster:

```
kubectl get a>
```

In the output, you will see the following field set to true:

```
spec:
  paused: true
```

Resolution

To recover the cluster from the paused state:

1. Edit the clusters.cluster.x-k8s.io object in the fleet-default namespace on the local (upstream) cluster:

```
kubectl edit a>
```

2. Locate the following field:

```
spec:
  paused: true
```

3. Change the value of paused to false, then save and exit the editor:

```
spec:
  paused: false
```

These steps instruct Rancher to unpause the cluster, allowing the restore process to continue. Once the cluster resumes activity, it is recommended to re-run the DR process to ensure the cluster is fully recovered.

For detailed guidance, refer to the official Rancher Manager Backup and Restore documentation for your specific distribution.

Cause

The issue typically occurs due to one of the following:

- An unexpected incident (e.g., a network interruption or an OS failure) that leaves the cluster in a broken state.
- A complete outage that renders all Control Plane nodes unavailable.

Additional Information

Environment

Rancher Server 2.7.6 and above
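The verification step described under Situation can also be scripted. The following is a minimal Python sketch. It assumes kubectl is installed and that its current context points at the local (upstream) cluster. The `-o json` output format and the helper names (`get_cluster_list`, `is_paused`, `paused_cluster_names`) are our own illustrative choices and are not part of the original procedure. The sketch lists every clusters.cluster.x-k8s.io object in fleet-default whose spec.paused is true.

```python
import json
import subprocess

# Assumption: kubectl is on PATH and its current context is the local
# (upstream) Rancher cluster. Resource and namespace come from the article;
# "-o json" is our choice so the output can be parsed.
RESOURCE = "clusters.cluster.x-k8s.io"
NAMESPACE = "fleet-default"


def get_cluster_list() -> dict:
    """Return all Cluster objects in fleet-default as a parsed kubectl List."""
    out = subprocess.run(
        ["kubectl", "get", RESOURCE, "-n", NAMESPACE, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def is_paused(cluster: dict) -> bool:
    """True when the object carries spec.paused: true (the stuck state)."""
    return cluster.get("spec", {}).get("paused") is True


def paused_cluster_names(cluster_list: dict) -> list[str]:
    """Names of all paused clusters in a `kubectl get ... -o json` List."""
    return [
        item["metadata"]["name"]
        for item in cluster_list.get("items", [])
        if is_paused(item)
    ]
```

Against the upstream cluster, calling `paused_cluster_names(get_cluster_list())` should return the names of clusters stuck in the paused state. An empty list means none are paused. The parsing is kept separate from the kubectl call so the check can be reused on saved output.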

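The field change from the Resolution steps can also be applied without an interactive editor. This sketch swaps `kubectl edit` for `kubectl patch` with a JSON merge patch (`--type merge`). That is a standard kubectl mechanism, but it is not the method described in the original article. The cluster name `my-cluster` and the helper names are hypothetical, and the sketch assumes the same kubectl context (the local/upstream cluster) as above. By default it only returns the command it would run.

```python
import json
import shlex
import subprocess

# Assumptions: kubectl is on PATH and its current context is the local
# (upstream) Rancher cluster. "kubectl patch --type merge" replaces the
# article's interactive "kubectl edit"; the field change is the same.
RESOURCE = "clusters.cluster.x-k8s.io"


def unpause_patch() -> str:
    """JSON merge patch that sets spec.paused to false."""
    return json.dumps({"spec": {"paused": False}}, separators=(",", ":"))


def unpause_command(cluster_name: str, namespace: str = "fleet-default") -> list[str]:
    """Build the kubectl argv that unpauses one Cluster object."""
    return [
        "kubectl", "patch", RESOURCE, cluster_name,
        "-n", namespace,
        "--type", "merge",
        "-p", unpause_patch(),
    ]


def unpause(cluster_name: str, dry_run: bool = True) -> str:
    """Return the shell-quoted command (dry run) or run it and return kubectl's stdout."""
    cmd = unpause_command(cluster_name)
    if dry_run:
        return shlex.join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

`unpause("my-cluster")` returns a shell-quoted command for review, and `unpause("my-cluster", dry_run=False)` executes it. The dry-run default is deliberate: Cluster API controllers skip a Cluster while spec.paused is true, so clearing the field lets reconciliation resume.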