
[fix](job) fix routine load task schedule stuck after create task fail #60143

Merged
dataroaring merged 1 commit into apache:master from sollhui:fix_rl_job_schedule_stuck on Jan 30, 2026

@sollhui
Contributor

@sollhui sollhui commented Jan 22, 2026

What problem does this PR solve?

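Fix routine load task scheduling getting stuck after task creation fails. In the log below, re-planning the task in KafkaTaskInfo.createRoutineLoadTask() throws an IllegalStateException, which propagates out of RoutineLoadTaskScheduler.scheduleOneTask() and is logged by process() with the message "Taking routine load task from queue has been interrupted":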

```
026-01-21 18:46:11,938 WARN (Routine load task scheduler|52) [RoutineLoadTaskScheduler.process():117] Taking routine load task from queue has been interrupted
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:499)
        at org.apache.doris.analysis.SlotRef.getTableName(SlotRef.java:356)
        at org.apache.doris.rewrite.ExtractCommonFactorsRule.rewriteOrToIn(ExtractCommonFactorsRule.java:536)
        at org.apache.doris.rewrite.ExtractCommonFactorsRule.makeCompoundRemaining(ExtractCommonFactorsRule.java:459)
        at org.apache.doris.rewrite.ExtractCommonFactorsRule.extractCommonFactors(ExtractCommonFactorsRule.java:205)
        at org.apache.doris.rewrite.ExtractCommonFactorsRule.apply(ExtractCommonFactorsRule.java:80)
        at org.apache.doris.rewrite.ExprRewriter.applyRuleOnce(ExprRewriter.java:178)
        at org.apache.doris.rewrite.ExprRewriter.rewrite(ExprRewriter.java:171)
        at org.apache.doris.planner.FileLoadScanNode.initWhereExpr(FileLoadScanNode.java:171)
        at org.apache.doris.planner.FileLoadScanNode.initAndSetPrecedingFilter(FileLoadScanNode.java:144)
        at org.apache.doris.planner.FileLoadScanNode.initParamCreateContexts(FileLoadScanNode.java:134)
        at org.apache.doris.planner.FileLoadScanNode.init(FileLoadScanNode.java:125)
        at org.apache.doris.planner.StreamLoadPlanner.plan(StreamLoadPlanner.java:307)
        at org.apache.doris.planner.StreamLoadPlanner.plan(StreamLoadPlanner.java:116)
        at org.apache.doris.load.routineload.RoutineLoadJob.plan(RoutineLoadJob.java:1032)
        at org.apache.doris.load.routineload.KafkaTaskInfo.rePlan(KafkaTaskInfo.java:136)
        at org.apache.doris.load.routineload.KafkaTaskInfo.createRoutineLoadTask(KafkaTaskInfo.java:99)
        at org.apache.doris.load.routineload.RoutineLoadTaskScheduler.scheduleOneTask(RoutineLoadTaskScheduler.java:193)
        at org.apache.doris.load.routineload.RoutineLoadTaskScheduler.process(RoutineLoadTaskScheduler.java:115)
        at org.apache.doris.load.routineload.RoutineLoadTaskScheduler.runAfterCatalogReady(RoutineLoadTaskScheduler.java:84)
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
        at org.apache.doris.common.util.Daemon.run(Daemon.java:119)
```
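Judging from the trace, the exception is thrown while re-planning the task inside KafkaTaskInfo.createRoutineLoadTask(), escapes scheduleOneTask(), and is caught in process(), which reports it with the misleading "has been interrupted" message. Because the task had already been taken off the queue at that point, it is presumably never rescheduled, which would explain why the job gets stuck. The sketch below illustrates one general way to avoid that failure mode: handle creation errors per task, inside the scheduling loop, so a planning failure is recorded instead of silently dropping the task. It is a simplified, hypothetical model (TaskScheduler, Task, SimpleTask and processOnce are made-up names), not the Doris classes from the trace, and not necessarily what the merged commit does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical, simplified model of a routine load task scheduler loop.
 * All names here are illustrative; they are not the real Doris classes.
 */
public class TaskScheduler {

    /** A task whose creation (planning) step may throw, e.g. IllegalStateException. */
    public interface Task {
        String id();
        void createTask();
    }

    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    public final List<String> scheduled = new ArrayList<>();
    public final List<String> failed = new ArrayList<>();

    public void submit(Task task) {
        queue.add(task);
    }

    /**
     * Drains the queue once. Each task is created inside its own try/catch, so an
     * exception from one task is handled right here instead of escaping the loop
     * after the task has already been removed from the queue.
     */
    public void processOnce() {
        Task task;
        while ((task = queue.poll()) != null) {
            try {
                task.createTask();
                scheduled.add(task.id());
            } catch (RuntimeException e) {
                // Assumption: a real scheduler would also make the failure visible
                // (e.g. retry later or pause the job with an error message);
                // this sketch only records the failed task id.
                failed.add(task.id());
            }
        }
    }

    /** Test helper: a task that optionally fails during creation. */
    public record SimpleTask(String id, boolean fail) implements Task {
        @Override
        public void createTask() {
            if (fail) {
                throw new IllegalStateException("planning failed");
            }
        }
    }

    public static void main(String[] args) {
        TaskScheduler s = new TaskScheduler();
        s.submit(new SimpleTask("bad", true));
        s.submit(new SimpleTask("good", false));
        s.processOnce();
        System.out.println("scheduled=" + s.scheduled + " failed=" + s.failed);
    }
}
```

Running main prints `scheduled=[good] failed=[bad]`: the failing task is recorded and the next task is still scheduled. What to do with the failed task afterwards (retry, or pause the job with a visible error) is a policy decision this sketch deliberately leaves out.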

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 22, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Jan 22, 2026

run buildall

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from cf2b3b7 to 5e4dcc5 on January 22, 2026 09:38
@sollhui
Contributor Author

sollhui commented Jan 22, 2026

run buildall

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from 5e4dcc5 to ddfe49a on January 22, 2026 10:07
@sollhui
Contributor Author

sollhui commented Jan 22, 2026

run buildall

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from ddfe49a to d0efa7e on January 23, 2026 08:10
@sollhui
Contributor Author

sollhui commented Jan 23, 2026

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 7.69% (1/13) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 31039 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d0efa7e7d7c80c7143d5ea640f0c2c75edd4b041, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17663	4750	4595	4595
q2	1993	307	212	212
q3	10249	1324	739	739
q4	10197	817	293	293
q5	7502	2069	1879	1879
q6	196	173	143	143
q7	875	721	592	592
q8	9283	1419	1210	1210
q9	5210	4593	4589	4589
q10	6838	1691	1284	1284
q11	518	296	291	291
q12	386	392	226	226
q13	17777	3792	3131	3131
q14	239	230	220	220
q15	601	532	519	519
q16	661	645	580	580
q17	655	790	550	550
q18	6688	6505	6362	6362
q19	1684	996	621	621
q20	398	361	231	231
q21	2644	1960	1809	1809
q22	1049	999	963	963
Total cold run time: 103306 ms
Total hot run time: 31039 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	4880	4728	4744	4728
q2	330	396	320	320
q3	2165	2657	2277	2277
q4	1324	1785	1322	1322
q5	4053	3950	3991	3950
q6	214	172	130	130
q7	1902	1811	1996	1811
q8	2608	2527	2537	2527
q9	7257	7127	7104	7104
q10	2591	2792	2348	2348
q11	551	491	451	451
q12	685	747	628	628
q13	3657	4196	3480	3480
q14	298	318	291	291
q15	561	535	513	513
q16	623	695	676	676
q17	1145	1308	1348	1308
q18	8178	7818	7969	7818
q19	929	899	861	861
q20	2051	2134	1926	1926
q21	4674	4245	4105	4105
q22	1092	1027	996	996
Total cold run time: 51768 ms
Total hot run time: 49570 ms
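The benchmark reports do not label their columns. Checked against the first TPC-H round above, the totals are consistent with reading each row as cold run, hot run 1, hot run 2, and the faster of the two hot runs: the cold total is the sum of the first column and the hot total is the sum of the per-query minimum hot run. A small sketch that recomputes both Round 1 totals under that inferred layout (BenchTotals and its methods are made-up names, not part of the Doris benchmark tools):

```java
/**
 * Recomputes the summary lines of a benchmark table from its rows.
 * Assumed column layout (inferred from the data, not documented in the report):
 * row[0] = cold run, row[1] and row[2] = the two hot runs.
 */
public class BenchTotals {

    /** Sum of the cold-run column. */
    public static long coldTotal(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += row[0];
        }
        return sum;
    }

    /** Sum of the best (minimum) hot run for each query. */
    public static long hotTotal(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += Math.min(row[1], row[2]);
        }
        return sum;
    }

    /** TPC-H Round 1 on commit d0efa7e, q1..q22, in ms: cold, hot 1, hot 2. */
    public static final long[][] ROUND1 = {
        {17663, 4750, 4595}, {1993, 307, 212}, {10249, 1324, 739}, {10197, 817, 293},
        {7502, 2069, 1879}, {196, 173, 143}, {875, 721, 592}, {9283, 1419, 1210},
        {5210, 4593, 4589}, {6838, 1691, 1284}, {518, 296, 291}, {386, 392, 226},
        {17777, 3792, 3131}, {239, 230, 220}, {601, 532, 519}, {661, 645, 580},
        {655, 790, 550}, {6688, 6505, 6362}, {1684, 996, 621}, {398, 361, 231},
        {2644, 1960, 1809}, {1049, 999, 963},
    };

    public static void main(String[] args) {
        System.out.println("Total cold run time: " + coldTotal(ROUND1) + " ms");
        System.out.println("Total hot run time: " + hotTotal(ROUND1) + " ms");
    }
}
```

This prints 103306 ms and 31039 ms, the same totals the report shows for Round 1; the same reading also appears to hold for the TPC-DS and ClickBench tables, where ClickBench reports seconds and has no separate best-run column.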

@doris-robot

TPC-DS: Total hot run time: 172611 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d0efa7e7d7c80c7143d5ea640f0c2c75edd4b041, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query5	4390	631	509	509
query6	325	221	193	193
query7	4202	464	256	256
query8	347	243	230	230
query9	8731	2811	2878	2811
query10	437	332	268	268
query11	15313	15067	14878	14878
query12	168	119	114	114
query13	1247	457	389	389
query14	6225	3025	2836	2836
query14_1	2746	2658	2698	2658
query15	195	187	169	169
query16	977	477	451	451
query17	1081	662	547	547
query18	2441	429	319	319
query19	193	180	146	146
query20	126	113	110	110
query21	209	133	120	120
query22	4140	4274	3984	3984
query23	16076	15701	15301	15301
query23_1	15365	15470	15445	15445
query24	7105	1552	1149	1149
query24_1	1157	1161	1176	1161
query25	515	419	398	398
query26	1245	274	146	146
query27	2763	454	276	276
query28	4550	2147	2128	2128
query29	803	547	441	441
query30	314	243	202	202
query31	816	623	522	522
query32	83	77	72	72
query33	526	365	311	311
query34	883	891	539	539
query35	733	760	673	673
query36	867	901	837	837
query37	140	101	96	96
query38	2671	2703	2651	2651
query39	774	741	722	722
query39_1	731	716	695	695
query40	220	138	121	121
query41	72	67	70	67
query42	103	97	92	92
query43	436	433	420	420
query44	1385	756	749	749
query45	193	193	179	179
query46	830	949	578	578
query47	1325	1484	1321	1321
query48	331	327	243	243
query49	625	444	351	351
query50	691	281	219	219
query51	3849	3761	3722	3722
query52	92	93	87	87
query53	212	225	175	175
query54	295	283	293	283
query55	86	82	81	81
query56	309	313	306	306
query57	1036	1005	893	893
query58	283	270	278	270
query59	1994	2140	1971	1971
query60	342	349	336	336
query61	179	172	168	168
query62	409	358	305	305
query63	203	173	164	164
query64	5106	1222	894	894
query65	3837	3732	3721	3721
query66	1432	415	311	311
query67	15516	15645	15471	15471
query68	2456	1076	726	726
query69	390	308	275	275
query70	1006	940	941	940
query71	306	284	269	269
query72	5233	3216	3217	3216
query73	587	723	318	318
query74	8769	8762	8533	8533
query75	2302	2307	1883	1883
query76	2277	1057	650	650
query77	354	386	310	310
query78	9687	9889	9141	9141
query79	1077	921	580	580
query80	1288	520	439	439
query81	547	257	232	232
query82	1011	152	117	117
query83	329	266	245	245
query84	264	114	100	100
query85	881	461	414	414
query86	405	281	317	281
query87	2895	2900	2794	2794
query88	3515	2572	2569	2569
query89	323	256	243	243
query90	1965	178	174	174
query91	161	165	139	139
query92	77	70	70	70
query93	1192	1011	654	654
query94	630	318	294	294
query95	582	337	341	337
query96	636	511	228	228
query97	2355	2393	2391	2391
query98	212	208	202	202
query99	612	558	507	507
Total cold run time: 246332 ms
Total hot run time: 172611 ms

@doris-robot

ClickBench: Total hot run time: 26.86 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d0efa7e7d7c80c7143d5ea640f0c2c75edd4b041, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.06	0.05	0.05
query2	0.10	0.04	0.05
query3	0.25	0.08	0.08
query4	1.60	0.12	0.12
query5	0.28	0.26	0.25
query6	1.15	0.65	0.65
query7	0.02	0.02	0.02
query8	0.06	0.04	0.04
query9	0.57	0.50	0.50
query10	0.56	0.56	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.59	0.58	0.58
query14	0.96	0.96	0.93
query15	0.79	0.77	0.79
query16	0.39	0.39	0.42
query17	0.98	1.06	1.01
query18	0.22	0.24	0.21
query19	1.97	1.91	1.87
query20	0.02	0.01	0.02
query21	15.44	0.27	0.14
query22	5.45	0.04	0.04
query23	16.04	0.30	0.10
query24	2.45	0.41	0.30
query25	0.11	0.12	0.07
query26	0.15	0.13	0.12
query27	0.07	0.05	0.08
query28	4.61	1.06	0.88
query29	12.55	3.86	3.18
query30	0.27	0.15	0.12
query31	2.81	0.63	0.40
query32	3.24	0.55	0.45
query33	3.00	2.99	3.02
query34	16.16	5.05	4.47
query35	4.42	4.43	4.48
query36	0.67	0.50	0.48
query37	0.11	0.07	0.06
query38	0.07	0.04	0.03
query39	0.05	0.03	0.03
query40	0.16	0.14	0.14
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 98.86 s
Total hot run time: 26.86 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 76.92% (10/13) 🎉
Increment coverage report
Complete coverage report

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from d0efa7e to 6db4017 on January 26, 2026 07:10
@sollhui
Contributor Author

sollhui commented Jan 26, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 30909 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6db4017f3258605c600fd61b35c4b805b4a7e837, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17645	4912	4581	4581
q2	2093	322	217	217
q3	10161	1308	742	742
q4	10186	812	306	306
q5	7531	2026	1920	1920
q6	181	166	138	138
q7	869	711	573	573
q8	9291	1326	1219	1219
q9	4886	4763	4583	4583
q10	6750	1663	1268	1268
q11	533	295	292	292
q12	338	383	225	225
q13	17789	3840	3079	3079
q14	240	237	214	214
q15	607	539	523	523
q16	619	619	584	584
q17	638	791	516	516
q18	6649	6517	6240	6240
q19	1224	967	634	634
q20	397	349	233	233
q21	2735	2082	1881	1881
q22	1050	1004	941	941
Total cold run time: 102412 ms
Total hot run time: 30909 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	4727	4701	4662	4662
q2	308	421	321	321
q3	2137	2697	2236	2236
q4	1337	1748	1331	1331
q5	4092	4010	4016	4010
q6	212	168	130	130
q7	1921	1838	1709	1709
q8	2896	2459	2515	2459
q9	7199	7224	7290	7224
q10	2632	2733	2330	2330
q11	568	471	484	471
q12	739	784	635	635
q13	3651	4030	3349	3349
q14	312	329	319	319
q15	552	512	513	512
q16	667	680	628	628
q17	1188	1354	1446	1354
q18	8003	8022	8011	8011
q19	889	828	840	828
q20	1996	2106	1906	1906
q21	4796	4587	4213	4213
q22	1047	1003	959	959
Total cold run time: 51869 ms
Total hot run time: 49597 ms

@doris-robot

TPC-DS: Total hot run time: 171892 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6db4017f3258605c600fd61b35c4b805b4a7e837, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query5	4420	626	486	486
query6	327	209	189	189
query7	4224	468	262	262
query8	352	247	240	240
query9	8690	2847	2861	2847
query10	425	313	281	281
query11	15436	15149	14824	14824
query12	180	120	117	117
query13	1270	486	377	377
query14	6205	3053	2734	2734
query14_1	2642	2641	2620	2620
query15	195	184	172	172
query16	975	484	372	372
query17	1068	656	560	560
query18	2512	421	338	338
query19	192	171	149	149
query20	122	116	111	111
query21	211	145	118	118
query22	4101	4024	3957	3957
query23	16101	15594	15435	15435
query23_1	15387	15332	15302	15302
query24	7173	1537	1153	1153
query24_1	1165	1160	1182	1160
query25	514	428	373	373
query26	1235	262	158	158
query27	2766	437	274	274
query28	4585	2179	2165	2165
query29	756	513	430	430
query30	309	250	202	202
query31	798	616	555	555
query32	80	79	73	73
query33	521	345	300	300
query34	899	876	539	539
query35	710	752	662	662
query36	883	891	866	866
query37	134	92	88	88
query38	2708	2682	2650	2650
query39	756	745	733	733
query39_1	701	710	707	707
query40	222	133	117	117
query41	67	60	61	60
query42	93	93	95	93
query43	439	461	419	419
query44	1328	752	750	750
query45	188	191	178	178
query46	832	947	587	587
query47	1424	1410	1414	1410
query48	324	326	266	266
query49	616	443	356	356
query50	682	280	211	211
query51	3733	3791	3670	3670
query52	87	90	85	85
query53	203	225	171	171
query54	295	267	259	259
query55	84	80	82	80
query56	320	298	305	298
query57	1015	1037	950	950
query58	278	272	256	256
query59	2002	2068	2127	2068
query60	341	331	324	324
query61	170	170	170	170
query62	432	359	320	320
query63	196	167	175	167
query64	4970	1144	811	811
query65	3765	3667	3772	3667
query66	1421	416	327	327
query67	15587	15505	15453	15453
query68	2412	1070	710	710
query69	402	314	269	269
query70	998	943	910	910
query71	298	278	272	272
query72	5220	2985	3241	2985
query73	582	678	309	309
query74	8701	8634	8464	8464
query75	2291	2297	1922	1922
query76	2287	1045	645	645
query77	357	370	294	294
query78	9684	9838	9081	9081
query79	1046	905	605	605
query80	1152	528	446	446
query81	515	259	224	224
query82	1303	149	115	115
query83	359	258	243	243
query84	250	109	96	96
query85	933	456	434	434
query86	362	292	314	292
query87	2782	2838	2694	2694
query88	3514	2595	2571	2571
query89	302	263	233	233
query90	1839	169	155	155
query91	157	157	132	132
query92	80	70	68	68
query93	1042	999	650	650
query94	573	319	288	288
query95	582	389	315	315
query96	648	496	225	225
query97	2337	2399	2289	2289
query98	216	202	198	198
query99	624	579	488	488
Total cold run time: 245448 ms
Total hot run time: 171892 ms

@doris-robot

ClickBench: Total hot run time: 26.53 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6db4017f3258605c600fd61b35c4b805b4a7e837, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.11	0.11
query5	0.27	0.26	0.26
query6	1.16	0.65	0.64
query7	0.04	0.02	0.02
query8	0.05	0.04	0.05
query9	0.56	0.51	0.49
query10	0.54	0.55	0.54
query11	0.14	0.09	0.11
query12	0.15	0.11	0.10
query13	0.60	0.58	0.59
query14	0.96	0.95	0.93
query15	0.80	0.78	0.77
query16	0.39	0.39	0.39
query17	1.03	1.02	1.02
query18	0.26	0.21	0.20
query19	1.86	1.83	1.77
query20	0.02	0.01	0.02
query21	15.45	0.25	0.14
query22	5.30	0.05	0.05
query23	16.18	0.28	0.10
query24	1.40	0.69	0.17
query25	0.06	0.09	0.08
query26	0.14	0.14	0.13
query27	0.08	0.08	0.06
query28	4.59	1.07	0.88
query29	12.51	3.94	3.18
query30	0.28	0.14	0.12
query31	2.82	0.62	0.38
query32	3.24	0.56	0.46
query33	3.05	2.96	3.03
query34	16.14	5.07	4.44
query35	4.40	4.39	4.52
query36	0.64	0.50	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.17	0.14	0.13
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.04
Total cold run time: 97.7 s
Total hot run time: 26.53 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (13/13) 🎉
Increment coverage report
Complete coverage report

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from 6db4017 to 9632478 on January 26, 2026 14:33
@sollhui
Contributor Author

sollhui commented Jan 26, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 32762 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 963247820dba459d7030e156bb1ba5f37973f58f, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17633	5248	5061	5061
q2	2091	303	189	189
q3	10196	1306	741	741
q4	10210	853	316	316
q5	7533	2167	1890	1890
q6	197	179	156	156
q7	861	727	612	612
q8	9253	1357	1062	1062
q9	5217	4850	4843	4843
q10	6850	2022	1596	1596
q11	518	295	302	295
q12	352	376	227	227
q13	18686	4160	3432	3432
q14	250	241	220	220
q15	905	826	844	826
q16	695	689	627	627
q17	649	781	521	521
q18	6938	6685	6446	6446
q19	1504	988	616	616
q20	388	335	226	226
q21	2609	2044	1937	1937
q22	1054	1018	923	923
Total cold run time: 104589 ms
Total hot run time: 32762 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	5320	5347	5325	5325
q2	263	340	245	245
q3	2132	2624	2246	2246
q4	1387	1756	1319	1319
q5	4304	4245	5136	4245
q6	228	191	148	148
q7	2074	1958	1864	1864
q8	2492	2399	2427	2399
q9	7581	7418	7502	7418
q10	2802	3134	2606	2606
q11	549	478	437	437
q12	666	758	618	618
q13	3998	4264	3707	3707
q14	300	308	279	279
q15	849	844	855	844
q16	665	776	675	675
q17	1124	1324	1396	1324
q18	8257	7971	8121	7971
q19	872	866	843	843
q20	2093	2203	2066	2066
q21	4699	4222	4089	4089
q22	1073	1022	939	939
Total cold run time: 53728 ms
Total hot run time: 51607 ms

@doris-robot

ClickBench: Total hot run time: 28.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 963247820dba459d7030e156bb1ba5f37973f58f, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.05	0.05	0.05
query2	0.09	0.04	0.04
query3	0.25	0.09	0.09
query4	1.61	0.11	0.10
query5	0.27	0.26	0.26
query6	1.16	0.68	0.67
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.57	0.49	0.50
query10	0.54	0.54	0.55
query11	0.14	0.10	0.10
query12	0.14	0.10	0.11
query13	0.64	0.61	0.62
query14	1.05	1.05	1.08
query15	0.87	0.87	0.86
query16	0.39	0.40	0.40
query17	1.15	1.07	1.10
query18	0.23	0.21	0.22
query19	2.04	2.02	1.96
query20	0.02	0.01	0.01
query21	15.42	0.26	0.14
query22	4.96	0.06	0.05
query23	15.77	0.29	0.10
query24	2.07	0.31	0.63
query25	0.09	0.08	0.08
query26	0.15	0.13	0.13
query27	0.08	0.07	0.06
query28	4.34	1.16	0.96
query29	12.58	3.97	3.21
query30	0.28	0.13	0.12
query31	2.81	0.64	0.40
query32	3.23	0.60	0.49
query33	3.19	3.26	3.26
query34	16.41	5.36	4.71
query35	4.78	4.79	4.79
query36	0.64	0.50	0.50
query37	0.11	0.07	0.07
query38	0.08	0.04	0.03
query39	0.04	0.03	0.04
query40	0.18	0.17	0.16
query41	0.09	0.03	0.02
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 98.67 s
Total hot run time: 28.37 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (13/13) 🎉
Increment coverage report
Complete coverage report

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from 9632478 to 4ed75ff on January 27, 2026 03:06
@sollhui
Contributor Author

sollhui commented Jan 27, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 32473 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4ed75ff24309263ada1e9faa3f0b64be6e48bdfd, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17759	5254	5035	5035
q2	2054	302	211	211
q3	10220	1307	743	743
q4	10231	851	319	319
q5	7473	2178	1927	1927
q6	201	189	149	149
q7	898	756	605	605
q8	9270	1373	1115	1115
q9	5240	4934	4836	4836
q10	6825	1958	1560	1560
q11	523	291	266	266
q12	377	379	219	219
q13	17819	4019	3220	3220
q14	234	233	227	227
q15	908	805	821	805
q16	676	677	630	630
q17	627	827	465	465
q18	6705	6438	6470	6438
q19	1220	995	623	623
q20	372	337	236	236
q21	2881	1996	1868	1868
q22	1050	1044	976	976
Total cold run time: 103563 ms
Total hot run time: 32473 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	5316	5312	5285	5285
q2	255	330	247	247
q3	2147	2695	2291	2291
q4	1359	1767	1260	1260
q5	4379	4125	4611	4125
q6	255	216	167	167
q7	2180	1993	1859	1859
q8	2595	2510	2418	2418
q9	7689	7396	7484	7396
q10	2771	3082	2621	2621
q11	570	476	460	460
q12	677	768	586	586
q13	3921	4318	3635	3635
q14	313	340	316	316
q15	874	816	828	816
q16	694	725	711	711
q17	1122	1283	1408	1283
q18	8096	7986	8008	7986
q19	924	827	811	811
q20	1985	2053	1904	1904
q21	4481	4205	4097	4097
q22	1051	1027	961	961
Total cold run time: 53654 ms
Total hot run time: 51235 ms

@doris-robot

ClickBench: Total hot run time: 28.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4ed75ff24309263ada1e9faa3f0b64be6e48bdfd, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.06	0.06	0.05
query2	0.10	0.04	0.05
query3	0.26	0.09	0.08
query4	1.61	0.11	0.12
query5	0.27	0.24	0.26
query6	1.15	0.68	0.67
query7	0.03	0.03	0.02
query8	0.06	0.04	0.04
query9	0.57	0.50	0.49
query10	0.54	0.54	0.54
query11	0.15	0.09	0.09
query12	0.15	0.11	0.10
query13	0.63	0.62	0.62
query14	1.06	1.04	1.06
query15	0.88	0.87	0.87
query16	0.38	0.42	0.37
query17	1.10	1.12	1.09
query18	0.24	0.21	0.22
query19	2.07	1.93	1.96
query20	0.02	0.01	0.01
query21	15.42	0.25	0.14
query22	5.14	0.05	0.05
query23	15.83	0.28	0.11
query24	1.20	0.63	0.34
query25	0.11	0.09	0.07
query26	0.13	0.13	0.13
query27	0.08	0.04	0.06
query28	3.48	1.15	0.97
query29	12.57	3.98	3.15
query30	0.28	0.13	0.12
query31	2.82	0.64	0.40
query32	3.24	0.59	0.51
query33	3.16	3.16	3.22
query34	15.99	5.43	4.75
query35	4.76	4.82	4.77
query36	0.65	0.50	0.50
query37	0.11	0.07	0.07
query38	0.07	0.04	0.03
query39	0.04	0.03	0.03
query40	0.19	0.16	0.14
query41	0.09	0.03	0.02
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 96.78 s
Total hot run time: 28.17 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (13/13) 🎉
Increment coverage report
Complete coverage report

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from 4ed75ff to 4ba0344 on January 28, 2026 07:19
@sollhui
Contributor Author

sollhui commented Jan 28, 2026

run buildall

@sollhui sollhui force-pushed the fix_rl_job_schedule_stuck branch from 4ba0344 to 9be3f73 on January 28, 2026 08:13
@sollhui
Contributor Author

sollhui commented Jan 28, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 32196 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9be3f73f76271afac8f1d8d39f2b28aa94aee77f, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17680	5369	5078	5078
q2	2029	315	197	197
q3	10199	1319	737	737
q4	10224	870	313	313
q5	7574	2178	1908	1908
q6	227	180	149	149
q7	875	747	601	601
q8	9248	1332	1089	1089
q9	5183	4844	4814	4814
q10	6749	1948	1575	1575
q11	510	284	281	281
q12	328	379	227	227
q13	17790	4001	3215	3215
q14	240	239	228	228
q15	877	824	812	812
q16	665	722	617	617
q17	625	758	502	502
q18	6875	6608	7497	6608
q19	1277	1001	665	665
q20	429	367	265	265
q21	3033	2294	2022	2022
q22	376	318	293	293
Total cold run time: 103013 ms
Total hot run time: 32196 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	5659	5661	5523	5523
q2	267	333	260	260
q3	2350	2866	2458	2458
q4	1437	1866	1477	1477
q5	4776	4534	4602	4534
q6	231	179	137	137
q7	2033	2011	1771	1771
q8	2520	2384	2371	2371
q9	7589	7539	7428	7428
q10	2959	3065	2579	2579
q11	553	458	444	444
q12	637	744	615	615
q13	3760	4038	3226	3226
q14	269	288	259	259
q15	827	794	787	787
q16	637	674	629	629
q17	1066	1221	1290	1221
q18	7673	7587	7440	7440
q19	835	777	795	777
q20	1955	2076	1910	1910
q21	4487	4197	4056	4056
q22	581	547	498	498
Total cold run time: 53101 ms
Total hot run time: 50400 ms

@sollhui
Contributor Author

sollhui commented Jan 28, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 32126 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9be3f73f76271afac8f1d8d39f2b28aa94aee77f, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17601	5215	5033	5033
q2	2073	312	186	186
q3	10240	1279	764	764
q4	10203	830	316	316
q5	7509	2155	1936	1936
q6	190	180	152	152
q7	909	726	631	631
q8	9247	1396	1127	1127
q9	5069	4908	4760	4760
q10	6778	1935	1565	1565
q11	515	290	276	276
q12	337	378	221	221
q13	17762	4042	3197	3197
q14	240	235	224	224
q15	885	835	814	814
q16	677	687	635	635
q17	633	838	439	439
q18	6886	6419	7435	6419
q19	1338	1038	681	681
q20	418	370	249	249
q21	2977	2210	2203	2203
q22	377	316	298	298
Total cold run time: 102864 ms
Total hot run time: 32126 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	5553	5430	5580	5430
q2	278	335	279	279
q3	2397	2816	2518	2518
q4	1524	1927	1406	1406
q5	4581	4567	4661	4567
q6	226	181	139	139
q7	2020	1945	1829	1829
q8	2544	2469	2354	2354
q9	7570	7535	7333	7333
q10	2931	3029	2716	2716
q11	537	455	443	443
q12	648	707	624	624
q13	3795	4017	3249	3249
q14	270	289	267	267
q15	833	800	797	797
q16	629	679	650	650
q17	1101	1314	1331	1314
q18	7539	7316	7380	7316
q19	821	801	811	801
q20	1946	2055	1868	1868
q21	4549	4255	4159	4159
q22	582	547	546	546
Total cold run time: 52874 ms
Total hot run time: 50605 ms

@doris-robot

ClickBench: Total hot run time: 28.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9be3f73f76271afac8f1d8d39f2b28aa94aee77f, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.09	0.04	0.05
query3	0.25	0.09	0.08
query4	1.61	0.11	0.11
query5	0.27	0.25	0.25
query6	1.17	0.68	0.66
query7	0.03	0.03	0.03
query8	0.04	0.04	0.04
query9	0.56	0.50	0.50
query10	0.56	0.55	0.53
query11	0.15	0.09	0.09
query12	0.14	0.11	0.10
query13	0.63	0.61	0.63
query14	1.06	1.04	1.06
query15	0.88	0.86	0.86
query16	0.38	0.41	0.40
query17	1.17	1.16	1.13
query18	0.22	0.20	0.23
query19	2.12	2.00	2.08
query20	0.02	0.02	0.01
query21	15.44	0.26	0.15
query22	5.06	0.06	0.05
query23	15.87	0.28	0.10
query24	1.47	0.45	0.37
query25	0.10	0.09	0.08
query26	0.14	0.14	0.13
query27	0.07	0.05	0.06
query28	5.25	1.13	0.96
query29	12.62	3.93	3.18
query30	0.28	0.13	0.12
query31	2.83	0.62	0.41
query32	3.24	0.59	0.48
query33	3.18	3.27	3.29
query34	16.00	5.34	4.67
query35	4.79	4.81	4.78
query36	0.66	0.51	0.50
query37	0.10	0.07	0.06
query38	0.06	0.04	0.03
query39	0.04	0.03	0.03
query40	0.18	0.16	0.16
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 98.95 s
Total hot run time: 28.4 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (3/3) 🎉
Increment coverage report
Complete coverage report

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 29, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 650932c into apache:master Jan 30, 2026
29 of 30 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 30, 2026
#60143)

yiguolei pushed a commit that referenced this pull request Feb 2, 2026
…ate task fail #60143 (#60401)

Cherry-picked from #60143

Co-authored-by: hui lai <laihui@selectdb.com>
ybtsdst pushed a commit to ybtsdst/doris that referenced this pull request Feb 27, 2026
…ate task fail apache#60143 (apache#60401)

Cherry-picked from apache#60143

Co-authored-by: hui lai <laihui@selectdb.com>
@yiguolei yiguolei mentioned this pull request Mar 7, 2026

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.4-merged reviewed

7 participants