Troubleshooting the Redis Cluster error "CLUSTERDOWN Hash slot not served"

Background: a data-center outage took down one node of our redis-cluster. I restarted the node, rejoined it to the cluster, and thought nothing more of it.

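A little later a disk alert fired on a server. I looked at the Tomcat logs on the application server and found them growing nonstop, filled with a Redis error: CLUSTERDOWN Hash slot not served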


Alarmed, I went straight to the Redis cluster to check whether the node I had just re-added was the problem.

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001>set testfxkj   6666

(error) CLUSTERDOWN Hash slot not served

192.168.207.251:7001>exit

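The error points to a problem with the hash slots: the node I queried knows of no owner for the slot this key maps to.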
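One quick way to confirm that slot coverage is the issue is CLUSTER INFO, which reports cluster_state and cluster_slots_assigned. The snippet below is a minimal sketch and was not part of the original session; cluster_summary is a hypothetical helper name, and it assumes the usual field:value layout of CLUSTER INFO output.

```shell
# Summarize CLUSTER INFO text read from stdin.
# Prints: state=<value> assigned=<n> missing=<16384 - n>
cluster_summary() {
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { assigned = $2 }
    END { printf "state=%s assigned=%d missing=%d\n", state, assigned, 16384 - assigned }
  '
}

# Live usage (host and port taken from the session above):
#   redis-cli -h 192.168.207.251 -p 7001 cluster info | cluster_summary
# To see which slot a key maps to:
#   redis-cli -h 192.168.207.251 -p 7001 cluster keyslot testfxkj
```

A non-zero "missing" value means at least one of the 16384 slots has no owner from the point of view of the node that was queried.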

Exit redis-cli and check the cluster with redis-trib.rb check.

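Note: I am running Redis 3.2, so I use redis-trib.rb check.

      On Redis 5 or later, use redis-cli --cluster check instead (the --cluster option was added to redis-cli in Redis 5; it is not available in 3.x or 4.x).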
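Since the right tool depends on the Redis version, a small wrapper can pick the command. This is only a sketch: cluster_check_cmd is a hypothetical helper, and it assumes the version is passed in as a plain string such as 3.2 or 5.0.7.

```shell
# Print the cluster check command that matches a given Redis version.
# $1 = version string (for example "3.2"), $2 = host:port of any cluster node.
cluster_check_cmd() {
  major="${1%%.*}"                  # keep only the part before the first dot
  if [ "$major" -ge 5 ]; then
    echo "redis-cli --cluster check $2"
  else
    echo "redis-trib.rb check $2"
  fi
}

cluster_check_cmd 3.2 192.168.207.251:7001
```

The same choice applies to the fix subcommand: redis-trib.rb fix before Redis 5, redis-cli --cluster fix from Redis 5 on.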

[root@admin ~]# redis-trib.rb check 192.168.207.251:7001

The check reports the following error: [ERR] Not all 16384 slots are covered by nodes


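Slots are numbered 0 to 16383, so this message does not refer to a slot "16384". It means that at least one of the 16384 slots has no owner. In the node listing below, the master ranges 0-5459 and 5461-10922 leave slot 5460 uncovered.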
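To make the gap explicit, the slot ranges reported for the masters can be checked against the full 0..16383 space. The following is a minimal sketch; uncovered_slots is a hypothetical helper, and the ranges passed to it are the ones shown in the check output for the three masters.

```shell
# Print every slot in 0..16383 that is not covered by the given ranges.
# Each argument is either "start-end" or a single slot number.
uncovered_slots() {
  printf '%s\n' "$@" | awk -F- '
    NF == 0 { next }                               # ignore empty input
    { lo = $1 + 0
      hi = ($2 == "" ? $1 : $2) + 0                # a single slot has no "-end" part
      for (s = lo; s <= hi; s++) covered[s] = 1 }
    END { for (s = 0; s < 16384; s++) if (!(s in covered)) print s }
  '
}

uncovered_slots 0-5459 5461-10922 10923-16383
```

With those three ranges the only slot printed is 5460, which matches the "List of not covered slots" line in the fix output.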

Next, try to repair the cluster with the officially recommended redis-trib.rb fix command (again, on Redis 5 or later use redis-cli --cluster fix instead).

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-5459 (5460 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
>>> Fixing slots coverage...
List of not covered slots: 5460
Slot 5460 has keys in 0 nodes: 
The folowing uncovered slots have no keys across the cluster:
5460
Fix these slots by covering with a random node? (type 'yes' to accept): yes
>>> Covering slot 6829 with 192.168.207.251:7001
/usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis/client.rb:114:in `call': ERR Slot 6829 is already busy (Redis::CommandError)
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:2646:in `block in method_missing'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:57:in `block in synchronize'
        from /usr/local/ruby/lib/ruby/2.2.0/monitor.rb:211:in `mon_synchronize'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:57:in `synchronize'
        from /usr/local/ruby/lib/ruby/gems/2.2.0/gems/redis-3.2.2/lib/redis.rb:2645:in `method_missing'
        from /usr/local/bin/redis-trib.rb:462:in `block in fix_slots_coverage'
        from /usr/local/bin/redis-trib.rb:459:in `each'
        from /usr/local/bin/redis-trib.rb:459:in `fix_slots_coverage'
        from /usr/local/bin/redis-trib.rb:398:in `check_slots_coverage'
        from /usr/local/bin/redis-trib.rb:361:in `check_cluster'
        from /usr/local/bin/redis-trib.rb:1139:in `fix_cluster_cmd'
        from /usr/local/bin/redis-trib.rb:1695:in `<main>'

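The fix fails with "ERR Slot 6829 is already busy": the node being asked to take slot 6829 already has an owner recorded for it.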

Log in to the node and run CLUSTER DELSLOTS 6829. This command tells one specific Redis Cluster node to forget which master is serving the given hash slot.

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001> CLUSTER DELSLOTS 6829

OK
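The "already busy" error suggests that nodes disagree about who owns the slot. One way to inspect such a disagreement is to ask each node, via CLUSTER NODES, which address it has on record for that slot. This is a sketch rather than part of the original fix; slot_owner is a hypothetical helper, and it assumes the standard CLUSTER NODES line layout, where slot ranges start at the ninth field.

```shell
# Read CLUSTER NODES output on stdin and print the address of every
# node whose slot list contains the slot given as $1.
slot_owner() {
  awk -v slot="$1" '
    { for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue                   # skip migrating/importing markers
        n = split($i, r, "-")
        lo = r[1] + 0
        hi = (n > 1 ? r[2] : r[1]) + 0
        if (slot + 0 >= lo && slot + 0 <= hi) print $2
      } }
  '
}

# Live usage: compare the view of each master (addresses from the session above).
#   for node in 192.168.207.251:7001 192.168.207.160:7001 192.168.207.160:7002; do
#     echo "view from $node:"
#     redis-cli -h "${node%:*}" -p "${node#*:}" cluster nodes | slot_owner 6829
#   done
```

If the nodes print different owners, or some print none, their slot tables are out of sync, which is the situation CLUSTER DELSLOTS followed by another fix run is meant to clear up.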

Run redis-trib.rb fix again to repair the cluster.

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-6829 (6829 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
>>> Fixing slots coverage...
List of not covered slots: 6829
Slot 5460 has keys in 0 nodes: 
The folowing uncovered slots have no keys across the cluster:
5460
Fix these slots by covering with a random node? (type 'yes' to accept): yes
>>> Covering slot 6829 with 192.168.207.251:7001

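Check once more. Running fix again works for this, because it performs the same cluster check before attempting any repair.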

[root@admin ~]# redis-trib.rb fix 192.168.207.251:7001
>>> Performing Cluster Check (using node 192.168.207.251:7001)
M: 3d1808df8a82dd705a67e8fef26cb8c8482bd69a 192.168.207.251:7001
   slots:0-6829  (6829 slots) master
   1 additional replica(s)
S: 6fa8513ee71e0153e2e746fef2a3b1ce08348766 192.168.207.159:7003
   slots: (0 slots) slave
   replicates a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32
M: 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c 192.168.207.160:7001
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: d057d13bc64b9be9f68c0291700965b3751127c0 192.168.207.159:7005
   slots: (0 slots) slave
   replicates 36dc79859c644cb03e6c1b0ec4a2d8bf56cfa39c
M: a10cf2bdaf1c445c34d6ae447a2f79bc5d4e9c32 192.168.207.160:7002
   slots:5460-10922 (5463 slots) master
   1 additional replica(s)
S: a84cb46e811925ee48f0ff8c25e753be752fc45c 192.168.207.159:7004
   slots: (0 slots) slave
   replicates 3d1808df8a82dd705a67e8fef26cb8c8482bd69a
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

All 16384 hash slots are now assigned.

Now log in to the cluster again and try writing a key.

[root@admin ~]# redis-cli -c -h 192.168.207.251 -p 7001

192.168.207.251:7001>set testfxkj   6666

OK

The key is written successfully.

The Tomcat log has also stopped printing CLUSTERDOWN Hash slot not served.

Source: https://www.cnaaa.net. When reposting, please credit the original: https://www.cnaaa.net/archives/8182
