A Redis Transaction Failure, Part 1

May 4, 2023

Error message

While debugging during development, I ran into a failing Redis transaction. The error message was:

EXECABORT Transaction discarded because of previous errors.

Language: Go. Redis client: go-redis. Call style: TxPipeline, i.e. a pipelined transaction. Redis version: 4.0.

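Analysis

Suspecting the arguments passed to my Redis commands

First, the wrapper method had run successfully before, so the method itself was fine. I suspected the arguments instead, stepped through the call with breakpoints in the debugger, and found nothing wrong with them. That ruled this possibility out.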

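Switching to a plain pipeline

The error carries little useful information: it says the transaction failed because of earlier errors, but not what those errors were. Running the same commands through a plain pipeline produced this error instead: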

NOREPLICAS Not enough good replicas to write.

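Taken literally, the problem lies with the replicas that writes depend on. If that is the case, an ordinary write command should fail as well.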
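Both replies begin with an upper-case error code (EXECABORT, NOREPLICAS), so calling code can tell them apart by that prefix. The following is a minimal sketch using only the standard library. It assumes the error text is available as a string (for example from `err.Error()`), and the helper names `redisErrorCode` and `describe` are hypothetical, not part of go-redis.

```go
package main

import (
	"fmt"
	"strings"
)

// redisErrorCode returns the leading code of a Redis error reply, e.g.
// "NOREPLICAS" for "NOREPLICAS Not enough good replicas to write.".
// Hypothetical helper for illustration; not a go-redis function.
func redisErrorCode(msg string) string {
	msg = strings.TrimPrefix(msg, "-") // raw RESP error replies start with '-'
	return strings.SplitN(msg, " ", 2)[0]
}

// describe maps the two codes seen in this post to a short hint.
func describe(msg string) string {
	switch redisErrorCode(msg) {
	case "EXECABORT":
		return "a command in the transaction was rejected; check the individual commands"
	case "NOREPLICAS":
		return "the master has too few healthy replicas; check the replicas and min-replicas-to-write"
	default:
		return "unclassified error"
	}
}

func main() {
	fmt.Println(describe("EXECABORT Transaction discarded because of previous errors."))
	fmt.Println(describe("NOREPLICAS Not enough good replicas to write."))
}
```

Matching on the code rather than the full sentence keeps the check stable if the wording of the message differs between Redis versions.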

Testing a Redis command from the terminal

I connected to Redis from a terminal and ran:

set key1 value1

The connection was dropped right after the command ran.

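Resolution

After talking to the ops team, it turned out they had been making changes and the Redis replica had not been started. Once the replica was started, I tested again and everything worked.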

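Reproducing the problem

Searching online suggested that this error is related to two Redis configuration options:

min-replicas-to-write
min-replicas-max-lag

In short: if too few replicas are connected, or the replicas lag too far behind, the master refuses writes. (Before Redis 5.0 these options were named min-slaves-to-write and min-slaves-max-lag.) The relevant comment in redis.conf reads: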

# It is possible for a master to stop accepting writes if there are less than
# N replicas connected, having a lag less or equal than M seconds.
#
# The N replicas need to be in "online" state.
#
# The lag in seconds, that must be <= the specified value, is calculated from
# the last ping received from the replica, that is usually sent every second.
#
# This option does not GUARANTEE that N replicas will accept the write, but
# will limit the window of exposure for lost writes in case not enough replicas
# are available, to the specified number of seconds.
#
# For example to require at least 3 replicas with a lag <= 10 seconds use:
#
# min-replicas-to-write 3
# min-replicas-max-lag 10

On my local machine, uncommenting the first option, min-replicas-to-write, and restarting Redis was enough to reproduce the problem.
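The rule described in the redis.conf comment can be modeled in a few lines. This is a simplified sketch of the decision, not the Redis implementation: the `replica` type and the function names are hypothetical, and lag is reduced to a plain number of seconds. It also reflects the documented behavior that setting either option to 0 disables the check.

```go
package main

import "fmt"

// replica is a simplified, hypothetical view of one connected replica.
type replica struct {
	online     bool
	lagSeconds int // seconds since the master last heard from this replica
}

// goodReplicas counts replicas that are online and within the allowed lag.
func goodReplicas(replicas []replica, maxLag int) int {
	n := 0
	for _, r := range replicas {
		if r.online && r.lagSeconds <= maxLag {
			n++
		}
	}
	return n
}

// acceptsWrites models the master's decision for a write command.
// minToWrite and maxLag stand in for min-replicas-to-write and
// min-replicas-max-lag; a value of 0 for either disables the check.
func acceptsWrites(replicas []replica, minToWrite, maxLag int) bool {
	if minToWrite == 0 || maxLag == 0 {
		return true
	}
	return goodReplicas(replicas, maxLag) >= minToWrite
}

func main() {
	// No replica started, as in the incident: writes are refused.
	fmt.Println(acceptsWrites(nil, 3, 10)) // false

	// Three healthy replicas: writes are accepted.
	healthy := []replica{
		{online: true, lagSeconds: 1},
		{online: true, lagSeconds: 2},
		{online: true, lagSeconds: 0},
	}
	fmt.Println(acceptsWrites(healthy, 3, 10)) // true
}
```

With min-replicas-to-write set to 3 and no replicas attached, the first case applies, which matches the local reproduction.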

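Who actually returns "Transaction discarded because of previous errors"?

This message is returned by redis-server, not by the go-redis client. The message itself is defined in server.c in the Redis source, and it is sent from the execCommand function in multi.c. (The original post showed two screenshots here: the error definition and the call site.)

At the call site, execCommand checks whether the transaction needs to be aborted before running the queued commands. A command that is rejected while being queued, here the write refused with NOREPLICAS, marks the transaction as failed, and EXEC then replies with EXECABORT.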
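The interaction between the two errors can be illustrated with a small model. This is a sketch of the behavior as seen from outside, not a port of the Redis source: the `txClient` type, its fields, and its methods are hypothetical, and every queued command is treated as a write.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNoReplicas = errors.New("NOREPLICAS Not enough good replicas to write.")
	errExecAbort  = errors.New("EXECABORT Transaction discarded because of previous errors.")
)

// txClient is a hypothetical model of one client's transaction state.
type txClient struct {
	inMulti   bool
	dirtyExec bool     // set when a command was rejected inside MULTI
	queued    []string // commands accepted for later execution
}

// multi starts a transaction with a clean state.
func (c *txClient) multi() {
	c.inMulti, c.dirtyExec, c.queued = true, false, nil
}

// queue models sending a write command inside MULTI. If the master refuses
// writes, the command is rejected and the transaction is flagged.
func (c *txClient) queue(cmd string, masterAcceptsWrites bool) error {
	if !masterAcceptsWrites {
		if c.inMulti {
			c.dirtyExec = true
		}
		return errNoReplicas
	}
	c.queued = append(c.queued, cmd)
	return nil
}

// exec aborts a flagged transaction, otherwise returns the queued commands.
func (c *txClient) exec() ([]string, error) {
	queued, dirty := c.queued, c.dirtyExec
	c.inMulti, c.dirtyExec, c.queued = false, false, nil
	if dirty {
		return nil, errExecAbort
	}
	return queued, nil
}

func main() {
	c := &txClient{}
	c.multi()
	fmt.Println(c.queue("SET key1 value1", false)) // rejected with NOREPLICAS
	_, err := c.exec()
	fmt.Println(err) // the transaction is discarded with EXECABORT
}
```

In this model the specific reason (NOREPLICAS) is reported when the command is queued, while EXEC only reports that something went wrong earlier, which is consistent with the transaction hiding the underlying cause until the plain pipeline exposed it.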