Redis Cluster "CLUSTERDOWN Hash slot not served" error: a troubleshooting walkthrough

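Background: a data-center failure took down one node of our redis-cluster deployment. I brought the node back up, rejoined it to the cluster, and thought nothing more of it.

A little later a disk alert fired on one of the servers. I looked at the Tomcat logs on the application server and found them scrolling nonstop with a Redis error: CLUSTERDOWN Hash slot not served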

Alarmed, I immediately went to the Redis cluster to check whether the node I had just re-added was the problem.

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001>set testfxkj   6666

(error) CLUSTERDOWN Hash slot not served

192.168.207.251:7001>exit

The error points to a problem with the hash slots.
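For context on why a single SET can fail this way: Redis Cluster maps every key to one of 16384 hash slots using CRC16 of the key modulo 16384, and a write is refused when the key's slot has no owner. The sketch below reproduces that mapping in Python. The function name key_slot is made up for illustration; it relies on binascii.crc_hqx with an initial value of 0, which is the CRC16/XMODEM variant that the Redis Cluster specification describes.

```python
import binascii

SLOT_COUNT = 16384  # Redis Cluster has 16384 hash slots, numbered 0..16383


def key_slot(key: str) -> int:
    """Return the hash slot of a key: CRC16 (XMODEM) of the key modulo 16384."""
    data = key.encode("utf-8")
    # Hash tag rule: if the key contains "{...}" with a non-empty body,
    # only the body between the first "{" and the next "}" is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with initial value 0 computes CRC16/XMODEM (polynomial 0x1021).
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    # The test key used in this post; the printed slot is the one that
    # must be served by some master for the SET to succeed.
    print(key_slot("testfxkj"))
```

On a live node, CLUSTER KEYSLOT testfxkj should report the same number, which makes it easy to tell whether the failing key falls into the uncovered slot.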

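Exit redis-cli and inspect the cluster with redis-trib.rb check.

Note: this cluster runs Redis 3.2, so I use redis-trib.rb check.

            On Redis 5.0 and later, where the redis-trib.rb functionality has moved into redis-cli, use redis-cli --cluster check instead.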

[root@admin ~]# redis-trib.rb check 192.168.207.251:7001

The check reports: [ERR] Not all 16384 slots are covered by nodes

In other words, at least one of the 16384 hash slots (numbered 0 to 16383) is not assigned to any master.
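To make "not covered" concrete, here is a small sketch that takes the slot ranges printed for each master and lists the slots that nobody owns. The helper names parse_ranges and uncovered_slots are hypothetical; the three ranges in the example are the ones that appear for the masters in the fix output further down (0-5459, 5461-10922, 10923-16383).

```python
SLOT_COUNT = 16384  # slots are numbered 0..16383


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse a slot spec such as "0-5459" or "5461-10922,12000" into (start, end) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single slot like "12000" has no "-", so it is its own end.
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def uncovered_slots(specs: list[str]) -> list[int]:
    """Return every slot in 0..16383 that no spec covers."""
    covered = [False] * SLOT_COUNT
    for spec in specs:
        for lo, hi in parse_ranges(spec):
            for slot in range(lo, hi + 1):
                covered[slot] = True
    return [slot for slot, ok in enumerate(covered) if not ok]


if __name__ == "__main__":
    masters = ["0-5459", "5461-10922", "10923-16383"]
    print(uncovered_slots(masters))  # prints [5460]
```

A single missing slot is enough to put the whole cluster into the CLUSTERDOWN state under the default configuration, which is why every write was being rejected.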

Next, we try to repair the cluster with the officially recommended redis-trib.rb fix command (again, on Redis 5.0 and later use redis-cli --cluster fix).

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-5459 (5460 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
>>> Fixing slots coverage...
List of not covered slots: 5460
Slot 5460 has keys in 0 nodes: 
The folowing uncovered slots have no keys across the cluster:
5460
Fix these slots by covering with a random node? (type 'yes' to accept): yes
>>> Covering slot 6829 with 192.168.207.251:7001
/usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis/client.rb:114:in `call': ERR Slot 6829 is already busy (Redis::CommandError)
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:2646:in `block in method_missing'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:57:in `block in synchronize'
        from /usr/local/ruby/lib/ruby/2.2.0/monitor.rb:211:in `mon_synchronize'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:57:in `synchronize'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:2645:in `method_missing'
        from /usr/local/bin/redis-trib.rb:462:in `block in fix_slots_coverage'
        from /usr/local/bin/redis-trib.rb:459:in `each'
        from /usr/local/bin/redis-trib.rb:459:in `fix_slots_coverage'
        from /usr/local/bin/redis-trib.rb:398:in `check_slots_coverage'
        from /usr/local/bin/redis-trib.rb:361:in `check_cluster'
        from /usr/local/bin/redis-trib.rb:1139:in `fix_cluster_cmd'
        from /usr/local/bin/redis-trib.rb:1695:in `<main>'

The fix failed: the node reports that slot 6829 is already busy, meaning it still has that slot recorded as assigned.

Log in to the node and run CLUSTER DELSLOTS 6829, which tells that particular Redis Cluster node to forget which master is serving the given hash slot.
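Before deleting a slot it can help to see which node each member believes owns it, since "Nodes don't agree about configuration" means the views differ. The sketch below scans the text of one CLUSTER NODES reply for a slot. The function name slot_owners and the sample reply are made up for illustration (shortened node IDs, invented epochs and timestamps); only the field layout follows the CLUSTER NODES format, where slot entries start at the ninth field.

```python
def slot_owners(cluster_nodes_text: str, slot: int) -> list[str]:
    """Return the addresses of the nodes that claim `slot` in one CLUSTER NODES reply."""
    owners = []
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # replicas and masters without slots have no slot fields
        address = fields[1].split("@")[0]  # newer versions append "@cport"
        for entry in fields[8:]:
            if entry.startswith("["):
                continue  # "[slot->-id]" / "[slot-<-id]" mark migrations, not ownership
            lo, _, hi = entry.partition("-")
            if int(lo) <= slot <= int(hi or lo):
                owners.append(address)
                break
    return owners


# Hypothetical reply, NOT taken from the cluster in this post.
SAMPLE = """\
node-a 192.168.207.251:7001 myself,master - 0 0 1 connected 0-5459
node-b 192.168.207.160:7002 master - 0 1684200000000 2 connected 5461-10922
node-c 192.168.207.160:7001 master - 0 1684200000001 3 connected 10923-16383
node-d 192.168.207.159:7004 slave node-a 0 1684200000002 1 connected
"""

if __name__ == "__main__":
    print(slot_owners(SAMPLE, 6829))  # prints ['192.168.207.160:7002']
    print(slot_owners(SAMPLE, 5460))  # prints []
```

Running CLUSTER NODES on every node and comparing the results shows where the views diverge; CLUSTER DELSLOTS only changes the view of the node that receives it.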

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001> CLUSTER DELSLOTS 6829

OK

Run redis-trib.rb fix again to repair the cluster:

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-6829 (6829 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
>>> Fixing slots coverage...
List of not covered slots: 6829
Slot 5460 has keys in 0 nodes: 
The folowing uncovered slots have no keys across the cluster:
5460
Fix these slots by covering with a random node? (type 'yes' to accept): yes
>>> Covering slot 6829 with 192.168.207.251:7001

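Check the cluster once more (here via redis-trib.rb fix, which runs the same cluster check before attempting any repair):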

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-6829  (6829 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5460-10922 (5463 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

All 16384 hash slots are now assigned.

Now log in to the cluster again and test by writing a key:

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001>set testfxkj   6666

OK

The key was written successfully.

The Tomcat logs have also stopped printing CLUSTERDOWN Hash slot not served.

Source: https://www.cnaaa.net. When reposting, please credit the original: https://www.cnaaa.net/archives/8182

Author: 杰斯