branch-4.1: [fix](iceberg) Avoid dict reads on mixed-encoding position delete files #61759 #62036

Closed
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-61759-branch-4.1

Conversation


@github-actions github-actions bot commented Apr 2, 2026

Cherry-picked from #61759

@github-actions github-actions bot requested a review from yiguolei as a code owner April 2, 2026 06:00

Thearas commented Apr 2, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Apr 2, 2026
@dataroaring dataroaring reopened this Apr 2, 2026

Thearas commented Apr 2, 2026

run buildall

…es (#61759)

### What problem does this PR solve?

The Iceberg Parquet position delete reader currently treats the `file_path`
column as dictionary-encoded whenever the column chunk has a dictionary
page. That check is too loose: Parquet allows mixed encodings in the
same column chunk, so a chunk can contain both dictionary-encoded and
plain-encoded data pages.

When that happens, Doris builds a `ColumnDictI32` for `file_path`, but
the plain decoder later calls `insert_many_strings()`, which fails with:

`Method insert_many_strings is not supported for ColumnDictionary`

This PR fixes the issue by using dictionary-backed decoding for
Iceberg position delete `file_path` columns only when the Parquet
column chunk is fully dictionary encoded. Mixed-encoding chunks now fall
back to normal string columns.

It also adds BE unit coverage for:
- fully dictionary-encoded parquet metadata
- mixed dictionary/plain parquet metadata
- parquet metadata without `encoding_stats` but with non-dictionary
encodings
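
The rule described above can be sketched as a small, standalone model. This is not the Doris BE code (which is C++); it is an illustrative Python sketch under stated assumptions. The names `ColumnChunkMeta`, `PageEncodingStats`, and `is_fully_dictionary_encoded` are hypothetical, the enum values mirror the Parquet Thrift definitions, and the fallback used when `encoding_stats` is absent (accept only dictionary and level encodings) is an assumed conservative policy, not necessarily the one the patch implements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Encoding(Enum):
    # Subset of the Parquet Thrift Encoding enum.
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE = 3
    BIT_PACKED = 4
    RLE_DICTIONARY = 8


class PageType(Enum):
    # Parquet Thrift PageType enum.
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


@dataclass
class PageEncodingStats:
    page_type: PageType
    encoding: Encoding
    count: int


@dataclass
class ColumnChunkMeta:
    # Hypothetical stand-in for the column chunk metadata the reader inspects.
    encodings: List[Encoding]
    has_dictionary_page: bool = False
    encoding_stats: Optional[List[PageEncodingStats]] = None


DICT_ENCODINGS = {Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY}
LEVEL_ENCODINGS = {Encoding.RLE, Encoding.BIT_PACKED}
DATA_PAGE_TYPES = {PageType.DATA_PAGE, PageType.DATA_PAGE_V2}


def is_fully_dictionary_encoded(meta: ColumnChunkMeta) -> bool:
    """Return True only if every data page in the chunk is dictionary encoded."""
    # A dictionary page is necessary but, as the PR notes, not sufficient.
    if not meta.has_dictionary_page:
        return False

    if meta.encoding_stats:
        # Per-page statistics are available: inspect only non-empty data pages.
        data_pages = [
            s for s in meta.encoding_stats
            if s.page_type in DATA_PAGE_TYPES and s.count > 0
        ]
        return bool(data_pages) and all(
            s.encoding in DICT_ENCODINGS for s in data_pages
        )

    # Assumed fallback without encoding_stats: any encoding that is neither a
    # dictionary encoding nor a level encoding might belong to a data page,
    # so the chunk is treated as not fully dictionary encoded.
    has_dict = any(e in DICT_ENCODINGS for e in meta.encodings)
    return has_dict and all(
        e in DICT_ENCODINGS or e in LEVEL_ENCODINGS for e in meta.encodings
    )


fully_dict = ColumnChunkMeta(
    encodings=[Encoding.PLAIN, Encoding.RLE, Encoding.RLE_DICTIONARY],
    has_dictionary_page=True,
    encoding_stats=[
        PageEncodingStats(PageType.DICTIONARY_PAGE, Encoding.PLAIN, 1),
        PageEncodingStats(PageType.DATA_PAGE, Encoding.RLE_DICTIONARY, 3),
    ],
)
mixed = ColumnChunkMeta(
    encodings=[Encoding.PLAIN, Encoding.RLE, Encoding.RLE_DICTIONARY],
    has_dictionary_page=True,
    encoding_stats=[
        PageEncodingStats(PageType.DICTIONARY_PAGE, Encoding.PLAIN, 1),
        PageEncodingStats(PageType.DATA_PAGE, Encoding.RLE_DICTIONARY, 2),
        PageEncodingStats(PageType.DATA_PAGE, Encoding.PLAIN, 1),
    ],
)
no_stats_with_plain = ColumnChunkMeta(
    encodings=[Encoding.PLAIN_DICTIONARY, Encoding.PLAIN, Encoding.RLE],
    has_dictionary_page=True,
)
```

The three sample chunks correspond to the three test cases listed above: only `fully_dict` would take the dictionary-backed path, while `mixed` and `no_stats_with_plain` would fall back to a normal string column.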
@morningman morningman force-pushed the auto-pick-61759-branch-4.1 branch from 231f6ca to 45e59b3 Compare April 4, 2026 00:55
@morningman

run buildall

suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Apr 6, 2026
### What problem does this PR solve?

Issue Number: None
Related PR: apache#62036
Problem Summary: The auto-pick branch added BE tests that used the removed RuntimeState(TQueryOptions, TQueryGlobals) constructor, causing BE UT compilation to fail. This patch keeps the mixed-encoding position delete fix and adjusts the test to the current RuntimeState constructor.

### Release note

None

### Check List (For Author)

- Test: Not run (per request)
    - No need to test
- Behavior changed: No
- Does this need documentation: No
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Apr 6, 2026
### What problem does this PR solve?

Issue Number: None
Related PR: apache#62036
Problem Summary: The cherry-picked Iceberg mixed-encoding position delete fix included a BE unit test that still used the removed RuntimeState(TQueryOptions, TQueryGlobals) constructor. This follow-up updates the test to the current RuntimeState constructor.

### Release note

None

### Check List (For Author)

- Test: Not run (per request)
    - No need to test
- Behavior changed: No
- Does this need documentation: No
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Apr 6, 2026
### What problem does this PR solve?

Issue Number: None
Related PR: apache#62036
Problem Summary: The cherry-picked Iceberg mixed-encoding fix still had a BE unit test that used the removed RuntimeState(TQueryOptions, TQueryGlobals) constructor. This follow-up updates the test to the current RuntimeState constructor.

### Release note

None

### Check List (For Author)

- Test: Not run (per request)
    - No need to test
- Behavior changed: No
- Does this need documentation: No
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Apr 7, 2026
### What problem does this PR solve?

Issue Number: None
Related PR: apache#62036
Problem Summary: The Iceberg reader test used a parenthesized RuntimeState initialization that was parsed as a function declaration, causing BE test compilation to fail. This follow-up uses brace initialization so the test constructs a RuntimeState object correctly.

### Release note

None

### Check List (For Author)

- Test: Not run (per request)
    - No need to test
- Behavior changed: No
- Does this need documentation: No
@morningman morningman closed this Apr 7, 2026
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Apr 7, 2026
### What problem does this PR solve?

Issue Number: None
Related PR: apache#62036
Problem Summary: The replacement branch for the Iceberg mixed-encoding delete-file fix was missing the parquet test data file from the original master commit, causing the new BE unit tests to fail with file-not-found errors. This follow-up restores the original test asset.

### Release note

None

### Check List (For Author)

- Test: Not run (per request)
    - No need to test
- Behavior changed: No
- Does this need documentation: No
4 participants