[fix](search) Use FE-provided analyzer key for multi-index columns in search() #60798

Merged
airborne12 merged 3 commits into apache:master from airborne12:fix-DORIS-24542-multi-analyzer-search-empty
Feb 25, 2026

@airborne12
Member

What problem does this PR solve?

Issue Number: close #DORIS-24542

Problem Summary:
When a column has multiple inverted indexes with different analyzers (e.g., one default untokenized index and one with the English parser), search() in Lucene/scalar mode returns empty results.

Root cause: In FieldReaderResolver::resolve(), select_best_reader() was always called with an empty analyzer key "", causing it to pick the wrong (untokenized) index for tokenized queries. Additionally, the EQUAL_QUERY → MATCH_ANY_QUERY upgrade was restricted to variant subcolumns only.

Fix:

  1. Extract analyzer_key from FE-provided index_properties before calling select_best_reader() and pass it through
  2. Remove the is_variant_sub restriction on the query type upgrade so regular columns with multiple indexes also get the correct FULLTEXT reader
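
The first fix step can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the Doris code: `IndexReader`, `select_reader()`, and `make_readers()` are hypothetical names, and the fallback rule (first reader when no key matches) is assumed purely for illustration. It shows why always passing `""` lands on the untokenized index, while passing the FE-provided key reaches the tokenized one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one inverted index on a column.
struct IndexReader {
    std::string name;          // index name, for illustration only
    std::string analyzer_key;  // "" models the default, untokenized index
};

// Simplified model of reader selection: prefer the reader whose analyzer key
// equals the requested key, otherwise fall back to the first reader.
// This is NOT the Doris implementation of select_best_reader().
const IndexReader* select_reader(const std::vector<IndexReader>& readers,
                                 const std::string& analyzer_key) {
    if (readers.empty()) {
        return nullptr;
    }
    for (const auto& reader : readers) {
        if (reader.analyzer_key == analyzer_key) {
            return &reader;
        }
    }
    return &readers.front();
}

// Two indexes on the same column, mirroring the scenario in the bug report.
std::vector<IndexReader> make_readers() {
    return {{"idx_default", ""}, {"idx_english", "english"}};
}
```

In this model, calling `select_reader(readers, "")` returns the untokenized `idx_default` entry, which is the behavior the PR describes as the bug, while `select_reader(readers, "english")` returns `idx_english`.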

Release note

Fix search() returning empty results when a column has multiple inverted indexes with different analyzers.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. search() now correctly uses the analyzer-based index when multiple inverted indexes exist on the same column.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@airborne12
Member Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 28833 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit abc5e4f2ca9c61e1f7887a56bc7f2b80bfbceacc, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17603	4484	4268	4268
q2	q3	10647	769	528	528
q4	4679	362	251	251
q5	7549	1183	1023	1023
q6	178	175	147	147
q7	770	835	680	680
q8	9293	1457	1329	1329
q9	4788	4709	4700	4700
q10	6919	1909	1628	1628
q11	467	256	235	235
q12	680	577	480	480
q13	17778	4255	3424	3424
q14	234	233	210	210
q15	931	791	793	791
q16	751	734	670	670
q17	720	841	454	454
q18	5883	5319	5265	5265
q19	1142	974	610	610
q20	542	493	385	385
q21	4597	1928	1498	1498
q22	377	321	257	257
Total cold run time: 96528 ms
Total hot run time: 28833 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4667	4680	4527	4527
q2	q3	1798	2272	1797	1797
q4	860	1222	762	762
q5	4051	4360	4308	4308
q6	180	171	145	145
q7	1768	1639	1548	1548
q8	2437	2706	2518	2518
q9	7531	7338	7397	7338
q10	2665	2872	2422	2422
q11	508	428	416	416
q12	501	582	427	427
q13	4190	4573	3581	3581
q14	287	305	267	267
q15	847	815	823	815
q16	711	803	752	752
q17	1159	1552	1425	1425
q18	7258	6832	6442	6442
q19	872	889	1050	889
q20	2098	2221	1973	1973
q21	3937	3617	3359	3359
q22	456	458	418	418
Total cold run time: 48781 ms
Total hot run time: 46129 ms

@doris-robot

TPC-DS: Total hot run time: 183482 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit abc5e4f2ca9c61e1f7887a56bc7f2b80bfbceacc, data reload: false

query5	4816	619	512	512
query6	326	209	195	195
query7	4210	462	271	271
query8	355	235	233	233
query9	8728	2788	2746	2746
query10	547	367	352	352
query11	16957	17345	17060	17060
query12	191	134	133	133
query13	1373	518	367	367
query14	6642	3206	3122	3122
query14_1	3049	2977	2883	2883
query15	214	191	198	191
query16	1044	503	457	457
query17	1556	741	614	614
query18	3922	449	344	344
query19	212	214	188	188
query20	138	135	132	132
query21	226	142	133	133
query22	5446	4799	4898	4799
query23	17296	16733	16529	16529
query23_1	16657	16568	16726	16568
query24	7309	1632	1233	1233
query24_1	1205	1238	1243	1238
query25	540	457	392	392
query26	1224	258	146	146
query27	2785	467	284	284
query28	4487	1863	1859	1859
query29	789	555	483	483
query30	316	242	209	209
query31	891	734	657	657
query32	87	78	68	68
query33	519	357	290	290
query34	915	940	555	555
query35	636	660	594	594
query36	1082	1124	940	940
query37	132	97	85	85
query38	2972	2971	2822	2822
query39	898	886	856	856
query39_1	829	818	831	818
query40	239	154	136	136
query41	69	65	66	65
query42	108	106	109	106
query43	377	390	363	363
query44	
query45	200	202	187	187
query46	885	981	607	607
query47	2099	2150	2057	2057
query48	332	320	236	236
query49	649	484	391	391
query50	680	285	218	218
query51	4091	4068	4030	4030
query52	107	112	98	98
query53	293	334	294	294
query54	315	275	266	266
query55	91	88	83	83
query56	335	325	326	325
query57	1351	1343	1257	1257
query58	310	285	281	281
query59	2680	2721	2586	2586
query60	351	347	321	321
query61	173	176	173	173
query62	639	586	552	552
query63	319	278	282	278
query64	5001	1376	1090	1090
query65	
query66	1463	446	352	352
query67	16582	16408	16465	16408
query68	
query69	412	308	298	298
query70	978	914	906	906
query71	349	312	292	292
query72	2769	2651	2387	2387
query73	532	545	330	330
query74	9967	9885	9770	9770
query75	2831	2750	2448	2448
query76	2317	1025	657	657
query77	341	381	318	318
query78	11275	11407	10655	10655
query79	2030	792	627	627
query80	1660	603	539	539
query81	555	290	244	244
query82	984	153	112	112
query83	337	264	247	247
query84	250	122	105	105
query85	910	477	434	434
query86	436	297	310	297
query87	3111	3124	3013	3013
query88	3603	2689	2670	2670
query89	433	373	347	347
query90	1994	179	174	174
query91	169	159	139	139
query92	80	75	70	70
query93	1015	819	519	519
query94	655	318	303	303
query95	590	331	375	331
query96	631	518	225	225
query97	2492	2453	2373	2373
query98	243	218	221	218
query99	1005	987	855	855
Total cold run time: 257655 ms
Total hot run time: 183482 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.50% (19547/37230)
Line Coverage 36.11% (182282/504760)
Region Coverage 32.46% (141460/435785)
Branch Coverage 33.43% (61330/183473)

zclllyybb previously approved these changes Feb 23, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Feb 23, 2026
@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.17% (26698/36488)
Line Coverage 56.45% (284246/503520)
Region Coverage 53.97% (237562/440169)
Branch Coverage 55.56% (102327/184177)

@airborne12
Member Author

run buildall

@github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Feb 23, 2026
@doris-robot

TPC-H: Total hot run time: 28665 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cc6d06af1bc85dba1d4959c98d73b21205fcbecb, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17617	4484	4314	4314
q2	q3	10652	772	516	516
q4	4681	359	242	242
q5	7564	1200	1033	1033
q6	170	174	145	145
q7	801	846	672	672
q8	9284	1466	1306	1306
q9	4881	4671	4662	4662
q10	6817	1865	1642	1642
q11	457	252	247	247
q12	683	563	468	468
q13	17778	4233	3424	3424
q14	234	234	222	222
q15	938	790	786	786
q16	750	721	669	669
q17	695	825	441	441
q18	6059	5361	5195	5195
q19	1120	963	597	597
q20	508	495	389	389
q21	4779	1992	1441	1441
q22	410	334	254	254
Total cold run time: 96878 ms
Total hot run time: 28665 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4571	4564	4609	4564
q2	q3	1818	2275	1785	1785
q4	860	1177	775	775
q5	4010	4522	4357	4357
q6	190	176	165	165
q7	1802	1646	1513	1513
q8	2519	2787	2560	2560
q9	7600	7654	7456	7456
q10	2757	2931	2399	2399
q11	506	437	415	415
q12	509	592	451	451
q13	4356	4539	3679	3679
q14	287	310	291	291
q15	851	804	829	804
q16	696	781	738	738
q17	1160	1581	1427	1427
q18	7260	6817	6634	6634
q19	855	857	853	853
q20	2038	2168	1993	1993
q21	3897	3499	3357	3357
q22	538	462	413	413
Total cold run time: 49080 ms
Total hot run time: 46629 ms

@airborne12 force-pushed the fix-DORIS-24542-multi-analyzer-search-empty branch from cc6d06a to dd2aef3 on February 23, 2026 14:14
@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28953 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dd2aef3693f9c312dbb917b83bb234d0a9c55a57, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17603	4599	4329	4329
q2	q3	10634	798	534	534
q4	4680	364	250	250
q5	7555	1210	1021	1021
q6	176	179	147	147
q7	812	846	671	671
q8	10015	1498	1368	1368
q9	6414	4774	4775	4774
q10	6859	1890	1636	1636
q11	483	256	256	256
q12	746	578	466	466
q13	17832	4236	3420	3420
q14	231	234	211	211
q15	949	795	799	795
q16	755	722	680	680
q17	736	872	427	427
q18	6077	5501	5280	5280
q19	1123	998	644	644
q20	514	502	390	390
q21	4645	1928	1396	1396
q22	338	288	258	258
Total cold run time: 99177 ms
Total hot run time: 28953 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4407	4328	4365	4328
q2	q3	1785	2191	1743	1743
q4	832	1159	739	739
q5	4038	4371	4337	4337
q6	182	174	140	140
q7	1727	1603	1498	1498
q8	2444	2654	2552	2552
q9	7247	7079	7062	7062
q10	2550	2715	2305	2305
q11	503	437	417	417
q12	483	580	438	438
q13	3789	4229	3419	3419
q14	269	279	261	261
q15	837	785	785	785
q16	681	750	693	693
q17	1116	1483	1251	1251
q18	7128	6655	6501	6501
q19	829	828	883	828
q20	2042	2088	1935	1935
q21	3979	3440	3292	3292
q22	466	454	389	389
Total cold run time: 47334 ms
Total hot run time: 44913 ms

@doris-robot

TPC-DS: Total hot run time: 182830 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dd2aef3693f9c312dbb917b83bb234d0a9c55a57, data reload: false

query5	4019	602	502	502
query6	325	214	213	213
query7	4192	456	260	260
query8	336	225	229	225
query9	8613	2763	2711	2711
query10	507	390	334	334
query11	16996	16808	16559	16559
query12	177	125	125	125
query13	1252	461	332	332
query14	4751	3217	2975	2975
query14_1	2836	2788	2789	2788
query15	196	198	175	175
query16	904	478	449	449
query17	879	722	609	609
query18	2290	447	347	347
query19	219	202	215	202
query20	129	126	125	125
query21	193	148	118	118
query22	5087	4994	4925	4925
query23	17214	16763	16514	16514
query23_1	16667	16674	16600	16600
query24	7107	1639	1268	1268
query24_1	1256	1214	1228	1214
query25	535	458	407	407
query26	1127	262	154	154
query27	2753	480	285	285
query28	4528	1872	1868	1868
query29	779	561	465	465
query30	310	257	215	215
query31	890	716	659	659
query32	83	73	70	70
query33	517	324	284	284
query34	929	914	571	571
query35	639	686	604	604
query36	1071	1100	1036	1036
query37	130	98	85	85
query38	2999	2901	2852	2852
query39	894	868	857	857
query39_1	842	833	830	830
query40	226	157	143	143
query41	66	61	61	61
query42	106	106	101	101
query43	374	379	346	346
query44	
query45	197	190	184	184
query46	880	979	612	612
query47	2099	2144	2056	2056
query48	340	328	226	226
query49	610	469	379	379
query50	731	283	220	220
query51	4131	4075	4045	4045
query52	106	107	98	98
query53	284	333	281	281
query54	288	263	276	263
query55	84	82	88	82
query56	311	302	291	291
query57	1393	1360	1265	1265
query58	290	268	270	268
query59	2565	2672	2552	2552
query60	337	328	309	309
query61	184	140	144	140
query62	605	584	540	540
query63	307	273	272	272
query64	4806	1263	987	987
query65	
query66	1411	449	346	346
query67	16439	16382	16219	16219
query68	
query69	411	319	285	285
query70	1007	961	962	961
query71	346	298	299	298
query72	2758	2657	2438	2438
query73	546	558	325	325
query74	9989	9972	9774	9774
query75	2842	2782	2441	2441
query76	2230	1045	678	678
query77	380	403	306	306
query78	11305	11437	10805	10805
query79	1069	790	633	633
query80	692	614	571	571
query81	507	274	247	247
query82	1265	153	113	113
query83	344	275	241	241
query84	295	125	106	106
query85	1029	496	433	433
query86	399	326	329	326
query87	3120	3108	2970	2970
query88	3606	2678	2636	2636
query89	421	372	347	347
query90	1645	178	172	172
query91	163	157	139	139
query92	83	77	75	75
query93	942	862	504	504
query94	467	310	313	310
query95	595	334	313	313
query96	649	531	239	239
query97	2513	2488	2410	2410
query98	223	214	216	214
query99	1011	990	933	933
Total cold run time: 246310 ms
Total hot run time: 182830 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 33.33% (3/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.50% (19547/37230)
Line Coverage 36.11% (182291/504763)
Region Coverage 32.49% (141584/435789)
Branch Coverage 33.44% (61353/183477)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.50% (26088/36488)
Line Coverage 54.24% (273092/503523)
Region Coverage 51.60% (227114/440173)
Branch Coverage 52.99% (97605/184181)

@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28816 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 63518c125939d213e51374875c00ac64fcabcf1b, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17588	4438	4282	4282
q2	q3	10660	765	519	519
q4	4677	366	248	248
q5	7537	1200	1017	1017
q6	167	171	145	145
q7	775	843	669	669
q8	9286	1470	1273	1273
q9	4808	4657	4713	4657
q10	6818	1845	1661	1661
q11	487	255	241	241
q12	712	562	462	462
q13	17805	4240	3417	3417
q14	235	229	213	213
q15	928	790	799	790
q16	753	726	678	678
q17	726	850	398	398
q18	5972	5324	5330	5324
q19	1132	967	611	611
q20	508	486	387	387
q21	4900	1982	1538	1538
q22	385	340	286	286
Total cold run time: 96859 ms
Total hot run time: 28816 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4621	4553	4545	4545
q2	q3	1795	2223	1756	1756
q4	943	1185	743	743
q5	4089	4476	4318	4318
q6	182	177	148	148
q7	1777	1691	1558	1558
q8	2483	2675	2539	2539
q9	7473	7592	7303	7303
q10	2627	2835	2393	2393
q11	506	478	502	478
q12	503	585	449	449
q13	3959	4641	3587	3587
q14	279	306	275	275
q15	891	811	823	811
q16	714	799	751	751
q17	1228	1547	1341	1341
q18	7222	6755	6706	6706
q19	828	821	908	821
q20	2080	2150	1994	1994
q21	3944	3433	3389	3389
q22	475	483	403	403
Total cold run time: 48619 ms
Total hot run time: 46308 ms

@doris-robot

TPC-DS: Total hot run time: 183559 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 63518c125939d213e51374875c00ac64fcabcf1b, data reload: false

query5	4338	628	516	516
query6	329	229	232	229
query7	4260	484	281	281
query8	341	246	241	241
query9	8707	2745	2741	2741
query10	525	374	324	324
query11	16997	17349	17232	17232
query12	200	137	130	130
query13	1594	480	376	376
query14	6407	3332	2994	2994
query14_1	3089	2864	2876	2864
query15	223	202	180	180
query16	1112	478	459	459
query17	1157	784	608	608
query18	2804	450	362	362
query19	206	213	195	195
query20	146	137	137	137
query21	214	142	120	120
query22	5074	5031	4940	4940
query23	17199	16898	16614	16614
query23_1	16736	16794	16777	16777
query24	7155	1614	1218	1218
query24_1	1187	1252	1229	1229
query25	547	454	389	389
query26	1231	254	151	151
query27	2788	462	287	287
query28	4490	1830	1844	1830
query29	772	552	476	476
query30	307	251	209	209
query31	886	722	650	650
query32	82	73	68	68
query33	512	338	294	294
query34	923	893	569	569
query35	625	673	602	602
query36	1094	1139	982	982
query37	127	88	82	82
query38	2927	2925	2924	2924
query39	889	909	856	856
query39_1	816	807	854	807
query40	227	151	134	134
query41	62	65	58	58
query42	110	99	102	99
query43	369	374	360	360
query44	
query45	200	187	181	181
query46	867	982	598	598
query47	2141	2163	2062	2062
query48	303	325	229	229
query49	624	465	398	398
query50	672	286	211	211
query51	4109	4120	4041	4041
query52	104	109	94	94
query53	287	334	278	278
query54	294	276	254	254
query55	96	83	81	81
query56	304	299	295	295
query57	1359	1348	1301	1301
query58	280	274	273	273
query59	2595	2673	2532	2532
query60	340	330	335	330
query61	152	141	146	141
query62	611	595	540	540
query63	305	272	277	272
query64	4907	1301	1095	1095
query65	
query66	1468	461	368	368
query67	16259	16364	16237	16237
query68	
query69	408	305	287	287
query70	981	978	905	905
query71	332	310	296	296
query72	2911	2836	2370	2370
query73	541	538	321	321
query74	9892	9954	9711	9711
query75	2829	2730	2447	2447
query76	2287	1024	667	667
query77	359	392	312	312
query78	11248	11429	10742	10742
query79	1166	745	591	591
query80	1292	622	517	517
query81	558	273	248	248
query82	982	151	115	115
query83	335	257	234	234
query84	253	119	104	104
query85	880	461	428	428
query86	411	302	289	289
query87	3104	3107	2969	2969
query88	3517	2669	2628	2628
query89	421	360	344	344
query90	1995	171	164	164
query91	162	155	131	131
query92	75	79	72	72
query93	984	860	506	506
query94	642	326	277	277
query95	590	338	324	324
query96	644	516	225	225
query97	2467	2551	2439	2439
query98	233	220	221	220
query99	1002	966	885	885
Total cold run time: 252774 ms
Total hot run time: 183559 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 40.00% (4/10) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.51% (19548/37230)
Line Coverage 36.11% (182272/504764)
Region Coverage 32.45% (141404/435795)
Branch Coverage 33.42% (61315/183483)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 40.00% (4/10) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.14% (20851/36488)
Line Coverage 40.20% (202433/503524)
Region Coverage 36.85% (162192/440179)
Branch Coverage 37.54% (69139/184187)

… multi-index columns

When a column has multiple inverted indexes with different analyzers
(e.g., one default untokenized and one with English parser), search()
in Lucene/scalar mode returned empty results because BE always passed
an empty analyzer key to select_best_reader(), causing it to pick the
wrong (untokenized) index for tokenized queries.

The fix:
1. Extract analyzer_key from FE-provided index_properties before
   calling select_best_reader() and pass it through.
2. Remove the is_variant_sub restriction on the EQUAL_QUERY to
   MATCH_ANY_QUERY upgrade, so regular columns with multiple indexes
   also get the correct FULLTEXT reader.

Fixes DORIS-24542
…x reader selection

The previous fix unconditionally upgraded all EQUAL_QUERY to MATCH_ANY_QUERY
in resolve(), which broke EXACT queries (they also map to EQUAL_QUERY but need
the untokenized STRING_TYPE reader).

Move the fix to build_leaf_query() where clause_type is known:
- TERM → override to MATCH_ANY_QUERY (selects FULLTEXT/tokenized reader)
- EXACT → keep EQUAL_QUERY (selects STRING_TYPE/untokenized reader)

For variant subcolumns, resolve() still uses FE-provided analyzer_key.
For regular columns with multiple indexes, query_type alone drives the
reader type preference in select_best_reader's select_for_text().

Fixes: DORIS-24542
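
The mapping described in this commit message could be sketched as follows. All type and function names here (`ClauseType`, `QueryType`, `ReaderType`, `query_type_for()`, `pick_reader()`) are simplified stand-ins rather than the real Doris definitions, and the fallback to whatever reader is available is an assumption modeled on the single-reader fast path mentioned in the PR.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins; the real Doris enums have more members.
enum class ClauseType { TERM, EXACT };
enum class QueryType { EQUAL_QUERY, MATCH_ANY_QUERY };
enum class ReaderType { STRING_TYPE, FULLTEXT };

// TERM is overridden to MATCH_ANY_QUERY; EXACT keeps EQUAL_QUERY.
QueryType query_type_for(ClauseType clause) {
    switch (clause) {
    case ClauseType::TERM:
        return QueryType::MATCH_ANY_QUERY;
    case ClauseType::EXACT:
        return QueryType::EQUAL_QUERY;
    }
    return QueryType::EQUAL_QUERY;  // not reached; keeps compilers quiet
}

// The query type alone drives the reader type preference.
ReaderType preferred_reader(QueryType query) {
    return query == QueryType::MATCH_ANY_QUERY ? ReaderType::FULLTEXT
                                               : ReaderType::STRING_TYPE;
}

// Pick the preferred reader if the column has one; otherwise use the first
// available reader (assumed fallback, e.g. a column with a single index).
ReaderType pick_reader(const std::vector<ReaderType>& available, QueryType query) {
    assert(!available.empty());
    const ReaderType wanted = preferred_reader(query);
    for (ReaderType reader : available) {
        if (reader == wanted) {
            return reader;
        }
    }
    return available.front();
}
```

With both reader types available, a TERM clause resolves to the FULLTEXT reader and an EXACT clause to the STRING_TYPE reader. With only an untokenized index, both clause types end up on STRING_TYPE in this model.
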
…multi-index reader selection

When a column has both tokenized and untokenized indexes, WILDCARD/PREFIX/REGEXP
queries selected the untokenized reader, causing patterns like "h*llo" to match
against full strings ("hello world") instead of individual tokens ("hello").

Extend the MATCH_ANY_QUERY override (already applied to TERM) to also cover
WILDCARD, PREFIX, and REGEXP clause types. Safe for single-index columns due to
select_best_reader()'s single-reader fast path.

Add untokenized-only index regression tests to verify no behavior change.
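
To see why the choice of reader matters for pattern queries, here is a minimal sketch comparing a wildcard match against the whole stored string with a match against individual tokens. The whitespace tokenizer and the `wildcard_match()` helper are illustrative assumptions, not the analyzers or matchers Doris uses.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Glob-style match: '*' matches any run of characters, '?' matches one.
bool wildcard_match(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0, mark = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;  // remember the star and first try matching nothing
            mark = t;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1;  // backtrack: let the star absorb one more character
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}

// Assumed tokenizer: split on whitespace only.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string token; in >> token;) {
        tokens.push_back(token);
    }
    return tokens;
}

// Untokenized index: the pattern must match the whole stored value.
bool matches_untokenized(const std::string& pattern, const std::string& value) {
    return wildcard_match(pattern, value);
}

// Tokenized index: the pattern may match any single token.
bool matches_tokenized(const std::string& pattern, const std::string& value) {
    for (const auto& token : tokenize(value)) {
        if (wildcard_match(pattern, token)) {
            return true;
        }
    }
    return false;
}
```

For the value `hello world`, the pattern `h*llo` fails against the full string but succeeds against the token `hello`, which is the difference the commit message describes.
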
@airborne12 force-pushed the fix-DORIS-24542-multi-analyzer-search-empty branch from 63518c1 to 9b3764b on February 24, 2026 16:00
@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28975 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9b3764bb7592f841d3ef02e079f03988714bb2f9, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17613	4661	4339	4339
q2	q3	10645	821	537	537
q4	4678	360	254	254
q5	7569	1210	1006	1006
q6	174	175	145	145
q7	811	862	666	666
q8	9297	1500	1341	1341
q9	4916	4808	4692	4692
q10	6763	1887	1672	1672
q11	462	256	256	256
q12	697	570	459	459
q13	17768	4254	3429	3429
q14	231	227	215	215
q15	935	797	798	797
q16	755	724	675	675
q17	758	849	439	439
q18	6327	5341	5287	5287
q19	1256	987	640	640
q20	518	488	397	397
q21	4981	2022	1458	1458
q22	418	358	271	271
Total cold run time: 97572 ms
Total hot run time: 28975 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4708	4585	4536	4536
q2	q3	1808	2242	1753	1753
q4	878	1185	766	766
q5	4047	4386	4332	4332
q6	182	174	150	150
q7	1817	1620	1517	1517
q8	2476	2825	2627	2627
q9	7465	7383	7318	7318
q10	2644	2829	2411	2411
q11	517	425	410	410
q12	492	586	445	445
q13	3922	4543	3608	3608
q14	293	310	280	280
q15	878	806	810	806
q16	762	758	698	698
q17	1208	1552	1325	1325
q18	6973	6805	6641	6641
q19	926	957	913	913
q20	2095	2292	1996	1996
q21	3964	3457	3334	3334
q22	512	477	419	419
Total cold run time: 48567 ms
Total hot run time: 46285 ms

@doris-robot

TPC-DS: Total hot run time: 183936 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9b3764bb7592f841d3ef02e079f03988714bb2f9, data reload: false

query5	4335	632	519	519
query6	335	232	205	205
query7	4219	473	266	266
query8	332	252	242	242
query9	8742	2734	2725	2725
query10	524	372	343	343
query11	17010	17396	17259	17259
query12	206	146	137	137
query13	1332	480	363	363
query14	6792	3438	3169	3169
query14_1	2913	2913	2817	2817
query15	206	200	191	191
query16	1026	525	402	402
query17	1118	754	640	640
query18	2882	462	352	352
query19	222	223	183	183
query20	134	128	133	128
query21	223	137	117	117
query22	4861	5153	5034	5034
query23	17259	16809	16584	16584
query23_1	16791	16780	16727	16727
query24	7237	1616	1241	1241
query24_1	1230	1239	1229	1229
query25	567	490	472	472
query26	1234	256	150	150
query27	2771	465	286	286
query28	4513	1827	1834	1827
query29	790	558	461	461
query30	315	241	209	209
query31	905	735	649	649
query32	81	70	71	70
query33	532	345	278	278
query34	919	898	565	565
query35	638	692	590	590
query36	1112	1115	971	971
query37	136	101	80	80
query38	3012	2939	2884	2884
query39	888	894	844	844
query39_1	830	828	829	828
query40	229	153	139	139
query41	65	61	60	60
query42	106	105	103	103
query43	362	379	348	348
query44	
query45	198	191	186	186
query46	884	989	606	606
query47	2122	2124	2094	2094
query48	307	310	230	230
query49	617	449	393	393
query50	674	278	216	216
query51	4111	4075	4015	4015
query52	106	107	97	97
query53	284	331	278	278
query54	299	283	260	260
query55	89	81	85	81
query56	301	318	294	294
query57	1329	1342	1289	1289
query58	283	312	271	271
query59	2585	2638	2483	2483
query60	341	332	327	327
query61	149	140	145	140
query62	617	586	536	536
query63	307	280	278	278
query64	4850	1298	984	984
query65	
query66	1403	459	361	361
query67	16391	16509	16219	16219
query68	
query69	413	309	287	287
query70	948	985	995	985
query71	327	314	298	298
query72	2770	2860	2398	2398
query73	536	542	332	332
query74	9994	9931	9799	9799
query75	2840	2749	2445	2445
query76	2289	1036	678	678
query77	366	401	294	294
query78	11288	11286	10731	10731
query79	3028	790	611	611
query80	1761	633	546	546
query81	599	280	240	240
query82	986	152	112	112
query83	327	266	238	238
query84	260	125	96	96
query85	908	501	430	430
query86	489	290	292	290
query87	3094	3135	2965	2965
query88	3584	2661	2632	2632
query89	421	379	341	341
query90	2039	169	166	166
query91	159	157	138	138
query92	85	75	71	71
query93	1777	820	497	497
query94	629	318	304	304
query95	582	340	321	321
query96	646	508	226	226
query97	2468	2479	2426	2426
query98	246	227	220	220
query99	1014	993	938	938
Total cold run time: 256446 ms
Total hot run time: 183936 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 40.00% (4/10) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.53% (19573/37263)
Line Coverage 36.13% (182517/505164)
Region Coverage 32.48% (141646/436153)
Branch Coverage 33.43% (61390/183652)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (10/10) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.48% (26089/36496)
Line Coverage 54.22% (273094/503664)
Region Coverage 51.52% (226837/440321)
Branch Coverage 52.98% (97612/184238)

Member

@eldenmoon eldenmoon left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Feb 25, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@zzzxl1993 zzzxl1993 left a comment

LGTM

@airborne12 airborne12 merged commit eff59f4 into apache:master Feb 25, 2026
30 of 33 checks passed
@airborne12 airborne12 deleted the fix-DORIS-24542-multi-analyzer-search-empty branch February 25, 2026 10:00
airborne12 added a commit to airborne12/apache-doris that referenced this pull request Mar 3, 2026
… search() (apache#60798)

airborne12 added a commit to airborne12/apache-doris that referenced this pull request Mar 4, 2026
…o branch-4.0

Squashed backport of the following master PRs:

- apache#59747 [fix](search) Make AND/OR/NOT operators case-sensitive in search DSL
- apache#60654 [refactor](search) Refactor SearchDslParser to single-phase ANTLR parsing and fix ES compatibility issues
- apache#60782 [fix](search) Upgrade query type for variant subcolumns with analyzer-based indexes
- apache#60784 [fix](search) Fix MATCH_ALL_DOCS query failing in multi-field search mode
- apache#60786 [feat](search) Support field-grouped query syntax field:(term1 OR term2)
- apache#60790 [fix](search) Add searcher cache reuse and DSL result cache for search() function
- apache#60793 [fix](search) Fix wildcard query on variant subcolumns returning empty results
- apache#60798 [fix](search) Use FE-provided analyzer key for multi-index columns in search()
- apache#60814 [fix](search) Fix implicit conjunction incorrectly modifying preceding term in lucene mode
- apache#60834 [test](search) Add regression test for wildcard query on variant subcolumns with multi-index
- apache#60873 [fix](search) fix MATCH_ALL_DOCS losing occur attribute in multi-field expansion
- apache#60891 [fix](search) inject MATCH_ALL_DOCS for multi-MUST_NOT queries in lucene mode
yiguolei pushed a commit that referenced this pull request Mar 4, 2026
… bug fixes (#61028)

### What problem does this PR solve?

Squashed backport of all search() function improvements and bug fixes
from master to branch-4.0.

This PR combines the following master PRs into a single backport:

| Master PR | Type | Description |
|-----------|------|-------------|
| #59747 | fix | Make AND/OR/NOT operators case-sensitive in search DSL |
| #60654 | refactor | Refactor SearchDslParser to single-phase ANTLR parsing and fix ES compatibility issues |
| #60782 | fix | Upgrade query type for variant subcolumns with analyzer-based indexes |
| #60784 | fix | Fix MATCH_ALL_DOCS query failing in multi-field search mode |
| #60786 | feat | Support field-grouped query syntax field:(term1 OR term2) |
| #60790 | fix | Add searcher cache reuse and DSL result cache for search() function |
| #60793 | fix | Fix wildcard query on variant subcolumns returning empty results |
| #60798 | fix | Use FE-provided analyzer key for multi-index columns in search() |
| #60814 | fix | Fix implicit conjunction incorrectly modifying preceding term in lucene mode |
| #60834 | test | Add regression test for wildcard query on variant subcolumns with multi-index |
| #60873 | fix | fix MATCH_ALL_DOCS losing occur attribute in multi-field expansion |
| #60891 | fix | inject MATCH_ALL_DOCS for multi-MUST_NOT queries in lucene mode |

### Release note

Backport search() function improvements including DSL parser
refactoring, multi-field search fixes, variant subcolumn support, query
caching, and field-grouped query syntax.

### Check List (For Author)

- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [x] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [ ] No.
    - [x] Yes. New search() function features and bug fixes backported from master.

- Does this need documentation?
    - [x] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label

Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.4-merged, reviewed


7 participants