redis 集群 CLUSTERDOWN Hash slot not served 报错解决思路

背景:  因机房故障,导致redis-cluster 集群中 有个节点 down 了 ,再从新拉起节点加入集群后,就没在意

过来小许,服务器磁盘告警来了,我看了下 应用服务器上的tomcat 的日志,不停的刷,日志内容 报redis 错误, CLUSTERDOWN Hash slot not served

redis 集群 CLUSTERDOWN Hash slot not served 报错解决思路

吓得我 立马 去 redis 集群上去 检查 是否是我的刚刚加入的节点有问题

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001>set testfxkj   6666

(error) CLUSTERDOWN Hash slot not served

192.168.207.251:7001>exit

提示hash 槽有问题 ;

退出redis集群 ,使用 redis-trib.rb check  对集群进行检测

备注: 我这里是redis 3.2 版本 ,使用 redis-trib.rb check 进行检测

            如果是 redis 3 版本以上 请使用 redis-cli –cluster check 命令

[root@admin ~]# redis-trib.rb check 192.168.207.251:7001

检测后 提示 如下图, [ERR] Not all 16384 slots are covered by nodes

redis 集群 CLUSTERDOWN Hash slot not served 报错解决思路

可以看到 16384号slots没有被分配

下一步,我们尝试使用 redis 官方推荐的 redis-trib.rb fix 命令 来修复集群(同样 redis 3版本以上的请使用redis-cli –cluster fix 进行修复  )

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-5459 (5460 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
>>> Fixing slots coverage...
List of not covered slots: 5460
Slot 5460 has keys in 0 nodes: 
The folowing uncovered slots have no keys across the cluster:
5460
Fix these slots by covering with a random node? (type 'yes' to accept): yes
>>> Covering slot 6829 with 192.168.207.251:7001
/usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis/client.rb:114:in `call': ERR Slot 6829 is already busy (Redis::CommandError)
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:2646:in `block in method_missing'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:57:in `block in synchronize'
        from /usr/local/ruby/lib/ruby/2.2.0/monitor.rb:211:in `mon_synchronize'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:57:in `synchronize'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:2645:in `method_missing'
        from /usr/local/bin/redis-trib.rb:462:in `block in fix_slots_coverage'
        from /usr/local/bin/redis-trib.rb:459:in `each'
        from /usr/local/bin/redis-trib.rb:459:in `fix_slots_coverage'
        from /usr/local/bin/redis-trib.rb:398:in `check_slots_coverage'
        from /usr/local/bin/redis-trib.rb:361:in `check_cluster'
        from /usr/local/bin/redis-trib.rb:1139:in `fix_cluster_cmd'
        from /usr/local/bin/redis-trib.rb:1695:in `<main>'

fix 修复失败,提示 slot 6829 正在使用

登录集群,CLUSTER DELSLOTS 6829  (使一个特定的Redis Cluster节点去忘记一个主节点正在负责的哈希槽)

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001> CLUSTER DELSLOTS 6829

OK

再次 使用 redis-trib.rb fix  去修复集群

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-6829 (6829 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
>>> Fixing slots coverage...
List of not covered slots: 6829
Slot 5460 has keys in 0 nodes: 
The folowing uncovered slots have no keys across the cluster:
5460
Fix these slots by covering with a random node? (type 'yes' to accept): yes
>>> Covering slot 6829 with 192.168.207.251:7001

再次check

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-6829  (6829 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5460-10922 (5463 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

发现 16384 个哈希槽全部分配好了

这时我们再登录集群测试写个KEY

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001>set testfxkj   6666

ok

发现 成功写入了KEY

同时也看了下  tomcat 得到日志 也不再刷 CLUSTERDOWN Hash slot not served

文章来源:https://www.cnaaa.net,转载请注明出处:https://www.cnaaa.net/archives/8182

(0)
杰斯的头像杰斯
上一篇 2023年5月15日 下午5:16
下一篇 2023年5月16日 下午5:20

相关推荐

  • Linux Centos 7.6修改ssh端口为49527,并添加防火墙例外,修改root密码, 设置禁ping,搭建FTP站点 ,修改yum源。

    1.修改ssh端口为49527,并添加防火墙例外 (1). 修改ssh配置文件  /etc/ssh/sshd_config,将端口号修改为49527.同时保留ssh默认的22端口,为了防止修改端口号失败以后,远程登录不上服务器,如图1所示: (2).修改firewall配置 默认情况下,防火墙在没有配置任何策略集情况下,是禁止所有ip地址和端口号同行的,因此…

    2022年7月18日
    94000
  • VMware虚拟机安装Ubuntu24.04教程

    (一)Ubuntu镜像文件下载(选择Ubuntu桌面版)1、Ubuntu官方网站(1)Ubuntu官网:https://ubuntu.com(2)Ubuntu官网中文站:https://cn.ubuntu.com(3)Ubuntu24.04桌面端官方下载:https://ubuntu.com/blog/ubuntu-desktop-24-04-noble-n…

    2024年5月31日
    37900
  • smokeping修改Ping间隔和Ping包数量

    1、复制 所有带*.dist 2、修改httpd.conf 配置文件 3、修改 /usr/local/smokeping/etc/config

    2022年11月26日
    55800
  • Windows下安装Nginx错误总结

    别问我为啥非要在Windows上按照Nginx,问的话,回答就是:有这个需求 1:CreateFile()“xxxxx” failed (3: The system cannot find the path specified) 产生原因:创建文件xxxx异常了。大多数情况就是因为:安装目录中存在中文或者是空格 比如凯哥的就是因为存在空格。凯哥第一…

    2024年5月11日
    16200
  • 微信扫码登录的技术实现思考

    简介: 微信扫码登录是很常见的技术,曾经在一次面试当中,面试官就曾问过微信扫码登录的实现思路,这次,以微信读书网页版扫码登录为例子,聊聊我对它技术实现思路一些思考。 微信扫码登录是很常见的技术,曾经在一次面试当中,面试官就曾问过微信扫码登录的实现思路,这次,以微信读书网页版扫码登录为例子,聊聊我对它技术实现思路一些思考。 以谷歌浏览器来做分析,打开F12,准…

    2023年11月10日
    28200

在线咨询: QQ交谈

邮件:712342017@qq.com

工作时间:周一至周五,8:30-17:30,节假日休息

关注微信