clickhouse执行查询内存超出限制问题:Memory limit (total) exceeded

原创 2020-11-26 18:15 阅读(12639)次

今天用clickhouse执行分布式查询报错:

SQL 错误 [241]: ClickHouse exception, code: 241, host: 192.168.29.124, port: 18123; Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 15.44 GiB (attempt to
allocate chunk of 4294967296 bytes), maximum: 13.95 GiB (version 20.6.3.28 (official build))

查询的语句如下:

	  SELECT DISTINCT plate_number FROM 
	 (SELECT DISTINCT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	 ) AS t0 
	 INNER JOIN
	 (SELECT DISTINCT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	 ) AS t1 ON t0.plate_number = t1.plate_number 
	 INNER JOIN
	 (SELECT DISTINCT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	  ) AS t2 ON t1.plate_number = t2.plate_number 
	 INNER JOIN
	 (SELECT DISTINCT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	  ) AS t3 ON t2.plate_number = t3.plate_number 

以上语句的意思是要查询同时出现在不同时段的车牌,每个时段数据量在10万左右,在选择3组join查询时结果也1到2秒内查询出结果,但是选择4组时段join查询就报内存超出限制
也就是说3组子查询join的话,没问题,4组甚至更多组就会报出内存限制问题。
由于业务上只要查询出车牌,所以上面每组子查询都用了DISTINCT做去重,之所以用DISTINCT不用group by的原因是我不想把语句写太长,太复杂。
看了下clickhouse的DISTINCT与group by功能上是差不多的(实际上用起来并非如此),于是我在user.xml中设置如下:

            <max_bytes_before_external_group_by>7500000000</max_bytes_before_external_group_by>

也就是设置group by超过7.5g就让group by走磁盘查询避免内存超出,设置完重启,上面这个查询依然报内存超出限制。也就是说这个配置不会对DISTINCT有影响。
于是我只好把上面的子查询语句DISTINCT全部修改成group by:

	SELECT DISTINCT plate_number FROM 
	 (SELECT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	 group by e.plate_number) AS t0 
	 INNER JOIN
	 (SELECT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	 group by e.plate_number) AS t1 ON t0.plate_number = t1.plate_number 
	 INNER JOIN
	 (SELECT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	  group by e.plate_number) AS t2 ON t1.plate_number = t2.plate_number 
	 INNER JOIN
	 (SELECT e.plate_number FROM base_db.event_local AS e 
	 WHERE (e.snapshot_time >= '2020-11-18 00:00:00') AND (e.snapshot_time <= '2020-11-24 23:59:59') 
	  group by e.plate_number) AS t3 ON t2.plate_number = t3.plate_number 

终于,没有再报内存问题,而且这个查询只用了2秒。
总结一下:group by与DISTINCT虽说查询结果一样,但是在内存限制的计算中并不太一样,max_bytes_before_external_group_by参数配置也只会对group by有影响,另外我还发现

分布式查询中除响应节点外其他节点被分发的语句,以上面报错的语句为例(不是我修改后的),从t0这个子查询开始的DISTINCT关键字会被去掉,分发节点的查询被篡改,可能是ck自动优化化语句吧。所以基于内存的限制问题来说clickhouse的复杂查询能不用DISTINCT就不用,我这就掉坑里了。

以上问题解决办法就是将DISTINCT修改为group by,并且设置max_bytes_before_external_group_by参数就可以了

如果开发过程中有什么问题,或者想了解一些clickhouse的原理的同学,推荐朱凯老师这本书

链接地址点:ClickHouse原理解析与应用实践