[Feature](iceberg) Support schema change for complex types in Iceberg external tables #60169
Pull request overview
This PR adds support for safe schema evolution of complex types (STRUCT/ARRAY/MAP) in Iceberg external tables, primarily for ALTER TABLE ... MODIFY COLUMN on nested fields. It introduces per-type validation and update logic to enforce safe promotions, nullability constraints, and struct field append rules, and extends the generic complex-type compatibility checks to allow certain nested primitive promotions.
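The "struct field append rules" mentioned above can be illustrated with a small standalone check. This is a minimal sketch, assuming that existing fields must keep their names and positions and that new fields may only be appended at the end. `StructAppendCheck`, `Field`, and `validateAppendOnly` are hypothetical names, not the PR's code, and field types are deliberately not compared here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an "append-only" STRUCT rule; Field is a stand-in,
// not a Doris or Iceberg class.
final class StructAppendCheck {

    record Field(String name, String type) {}

    /** Returns null when the change is allowed, otherwise a user-facing error message. */
    static String validateAppendOnly(List<Field> oldFields, List<Field> newFields) {
        if (newFields.size() < oldFields.size()) {
            return "Cannot drop struct fields: expected at least " + oldFields.size()
                    + " fields, got " + newFields.size();
        }
        Set<String> seen = new HashSet<>();
        for (Field f : newFields) {
            if (!seen.add(f.name())) {
                return "Duplicate struct field name '" + f.name() + "'";
            }
        }
        // Existing fields must keep their names and positions; a rename and a
        // reorder both show up as a name mismatch at some position.
        for (int i = 0; i < oldFields.size(); i++) {
            String oldName = oldFields.get(i).name();
            String newName = newFields.get(i).name();
            if (!oldName.equals(newName)) {
                return "Cannot rename or reorder struct field '" + oldName
                        + "' (found '" + newName + "' at position " + i + ")";
            }
        }
        // Fields beyond oldFields.size() are treated as appended. Type changes on
        // existing fields are left to a separate promotion check.
        return null;
    }

    public static void main(String[] args) {
        List<Field> old = List.of(new Field("a", "INT"), new Field("b", "STRING"));
        System.out.println(validateAppendOnly(old,
                List.of(new Field("a", "BIGINT"), new Field("b", "STRING"), new Field("c", "DOUBLE"))));
        System.out.println(validateAppendOnly(old,
                List.of(new Field("a", "INT"), new Field("x", "STRING"))));
    }
}
```

Returning an error string instead of throwing keeps the sketch easy to test; a real implementation would more likely throw a user-facing exception.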
Changes:
- Refactor `IcebergMetadataOps.modifyColumn` to support complex-type columns: add complex-type validation (`validateForModifyComplexColumn`), recursive application of schema changes (`applyComplexTypeChange`, `applyStructChange`, `applyListChange`, `applyMapChange`), and tighter nullability/default constraints for complex columns.
- Extend `ColumnType.checkSupportSchemaChangeForComplexType` via a new helper, `checkSupportSchemaChangeForNestedPrimitive`, to allow safe nested primitive promotions (numeric widening and `VARCHAR` length increase) in complex types while still prohibiting incompatible changes.
- Keep the primitive-type modify path for Iceberg external tables, factoring out validation logic to work with `NestedField` instead of `Table` and preserving existing behaviors for primitive columns.
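The nested-primitive promotion rule can be modeled as a lookup table plus a `VARCHAR` length comparison. The sketch below is hypothetical: it encodes types as plain strings and takes the allowed widenings from the promotion list in the PR description, rather than using Doris's `Type` API or the actual `checkSupportSchemaChangeForNestedPrimitive` implementation.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of the nested-primitive promotion rule; types are plain
// strings such as "INT" or "VARCHAR(10)", not Doris Type objects.
final class NestedPromotionCheck {

    // Allowed numeric widenings, taken from the promotion list in the PR description.
    private static final Map<String, Set<String>> WIDENINGS = Map.of(
            "TINYINT", Set.of("SMALLINT", "INT", "BIGINT", "LARGEINT"),
            "SMALLINT", Set.of("INT", "BIGINT", "LARGEINT"),
            "INT", Set.of("BIGINT", "LARGEINT"),
            "BIGINT", Set.of("LARGEINT"),
            "FLOAT", Set.of("DOUBLE"));

    static boolean isSafePromotion(String oldType, String newType) {
        if (oldType.equals(newType)) {
            return true; // no change at this nesting level
        }
        Integer oldLen = varcharLength(oldType);
        Integer newLen = varcharLength(newType);
        if (oldLen != null && newLen != null) {
            return newLen > oldLen; // VARCHAR(n) -> VARCHAR(m) only when m > n
        }
        return WIDENINGS.getOrDefault(oldType, Set.of()).contains(newType);
    }

    /** Parses "VARCHAR(n)" and returns n, or null for any other type string. */
    private static Integer varcharLength(String type) {
        if (!type.startsWith("VARCHAR(") || !type.endsWith(")")) {
            return null;
        }
        return Integer.parseInt(type.substring("VARCHAR(".length(), type.length() - 1));
    }

    public static void main(String[] args) {
        String[][] cases = {
                {"INT", "BIGINT"}, {"TINYINT", "LARGEINT"}, {"FLOAT", "DOUBLE"},
                {"VARCHAR(10)", "VARCHAR(20)"}, {"BIGINT", "INT"},
                {"VARCHAR(20)", "VARCHAR(10)"}, {"INT", "DOUBLE"}};
        for (String[] c : cases) {
            System.out.println(c[0] + " -> " + c[1] + ": " + isSafePromotion(c[0], c[1]));
        }
    }
}
```

Narrowing (for example `BIGINT` to `INT`), shrinking a `VARCHAR`, and cross-family changes such as `INT` to `DOUBLE` all fall through to `false`.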
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergMetadataOps.java` | Adds complex-type aware `modifyColumn` handling for Iceberg external tables, including validation of complex schema changes and recursive application of nested type and nullability updates to the Iceberg schema. |
| `fe/fe-core/src/main/java/org/apache/doris/catalog/ColumnType.java` | Enhances complex-type schema-change compatibility checks to allow specific safe nested primitive promotions (numeric widening and `VARCHAR` growth) and wires these into the existing complex-type validation logic. |
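The recursive application of nested changes can be pictured as a parallel walk over the old and new types. The sketch below uses a hypothetical type model (`DType`, `Prim`, `Struct`, `Arr`, `MapT`) instead of the real Doris and Iceberg classes. It emits one entry per nested change using Iceberg's dotted names (`element` for list items, `value` for map values) and reports a kind mismatch as an explicit error rather than failing on a cast. Rejecting any map key change is an assumption made to keep the sketch conservative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type model; these are not Doris or Iceberg classes.
final class NestedChangeWalker {

    interface DType {}
    record Prim(String name) implements DType {}
    record Field(String name, DType type) {}
    record Struct(List<Field> fields) implements DType {}
    record Arr(DType element) implements DType {}
    record MapT(DType key, DType value) implements DType {}

    /**
     * Walks the old and new types in parallel and appends one entry per nested change.
     * Assumes struct field names were already validated (no drops, renames, or reorders).
     */
    static void walk(String path, DType oldT, DType newT, List<String> out) {
        if (oldT instanceof Prim op) {
            if (!(newT instanceof Prim np)) {
                throw new IllegalArgumentException("Cannot change primitive " + path + " to a complex type");
            }
            if (!op.name().equals(np.name())) {
                out.add(path + ": " + op.name() + " -> " + np.name());
            }
        } else if (oldT instanceof Struct os) {
            if (!(newT instanceof Struct ns)) {
                throw new IllegalArgumentException("Cannot modify STRUCT " + path + " to " + newT);
            }
            for (int i = 0; i < ns.fields().size(); i++) {
                Field nf = ns.fields().get(i);
                if (i < os.fields().size()) {
                    walk(path + "." + nf.name(), os.fields().get(i).type(), nf.type(), out);
                } else {
                    out.add(path + "." + nf.name() + ": add field");
                }
            }
        } else if (oldT instanceof Arr oa) {
            if (!(newT instanceof Arr na)) {
                throw new IllegalArgumentException("Cannot modify ARRAY " + path + " to " + newT);
            }
            walk(path + ".element", oa.element(), na.element(), out);
        } else if (oldT instanceof MapT om) {
            if (!(newT instanceof MapT nm)) {
                throw new IllegalArgumentException("Cannot modify MAP " + path + " to " + newT);
            }
            // Conservative assumption for this sketch: map keys must not change.
            if (!om.key().equals(nm.key())) {
                throw new IllegalArgumentException("Cannot change map key type of " + path);
            }
            walk(path + ".value", om.value(), nm.value(), out);
        } else {
            throw new IllegalArgumentException("Unsupported type at " + path);
        }
    }

    public static void main(String[] args) {
        DType oldCol = new Struct(List.of(
                new Field("a", new Prim("INT")),
                new Field("tags", new Arr(new Prim("VARCHAR(10)"))),
                new Field("m", new MapT(new Prim("STRING"), new Prim("FLOAT")))));
        DType newCol = new Struct(List.of(
                new Field("a", new Prim("BIGINT")),
                new Field("tags", new Arr(new Prim("VARCHAR(20)"))),
                new Field("m", new MapT(new Prim("STRING"), new Prim("DOUBLE"))),
                new Field("c", new Prim("DOUBLE"))));
        List<String> changes = new ArrayList<>();
        walk("col", oldCol, newCol, changes);
        changes.forEach(System.out::println);
    }
}
```

Each recorded path corresponds to one nested column that a schema-update call would have to touch, which is why the walk descends through `element` and `value` instead of replacing the whole top-level type.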
```java
        org.apache.doris.catalog.Type newDorisType) throws UserException {
    switch (oldIcebergType.typeId()) {
        case STRUCT:
            applyStructChange(updateSchema, path, oldIcebergType.asStructType(), (StructType) newDorisType);
            break;
        case LIST:
            applyListChange(updateSchema, path, (Types.ListType) oldIcebergType, (ArrayType) newDorisType);
            break;
        case MAP:
        // ...
```
applyComplexTypeChange decides how to cast newDorisType based on oldIcebergType.typeId(), but there is no validation that the new type has the same complex kind (STRUCT/ARRAY/MAP) as the existing column. For example, altering a STRUCT column to ARRAY (or a primitive column to a complex type) will pass validateForModifyComplexColumn but then hit a ClassCastException when casting newDorisType to StructType/ArrayType/MapType, resulting in an internal error instead of a user-facing validation error. It would be safer to explicitly check in validateForModifyComplexColumn (or here) that the old and new Doris types are both complex and of the same kind, and throw a UserException when they differ, before calling this method.
Suggested change:

```java
        org.apache.doris.catalog.Type newDorisType) throws UserException {
    // Ensure the new Doris type is complex before performing any casts based on the old Iceberg type.
    if (!newDorisType.isComplexType()) {
        throw new UserException("Modify column type to non-complex type is not supported: " + newDorisType);
    }
    switch (oldIcebergType.typeId()) {
        case STRUCT:
            if (!(newDorisType instanceof StructType)) {
                throw new UserException("Cannot modify STRUCT column to type: " + newDorisType);
            }
            applyStructChange(updateSchema, path, oldIcebergType.asStructType(), (StructType) newDorisType);
            break;
        case LIST:
            if (!(newDorisType instanceof ArrayType)) {
                throw new UserException("Cannot modify LIST/ARRAY column to type: " + newDorisType);
            }
            applyListChange(updateSchema, path, (Types.ListType) oldIcebergType, (ArrayType) newDorisType);
            break;
        case MAP:
            if (!(newDorisType instanceof MapType)) {
                throw new UserException("Cannot modify MAP column to type: " + newDorisType);
            }
            // ...
```
```java
if (column.getType().isComplexType()) {
    // Complex type processing branch
    validateForModifyComplexColumn(column, currentCol);
    applyComplexTypeChange(updateSchema, column.getName(), currentCol.type(), column.getType());
    if (column.isAllowNull()) {
        updateSchema.makeColumnOptional(column.getName());
    }
    if (!Objects.equals(currentCol.doc(), column.getComment())) {
        updateSchema.updateColumnDoc(column.getName(), column.getComment());
    }
} else {
    // ...
```
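The branch above only ever relaxes nullability via `makeColumnOptional`. A minimal sketch of the corresponding check, assuming the rule is that a nullable column may not be tightened to `NOT NULL` (existing data files may already contain nulls) while `NOT NULL` to nullable is allowed. `NullabilityRule`, `Action`, and `decide` are hypothetical names, not part of the PR.

```java
// Hypothetical sketch of a nullability rule for MODIFY COLUMN; not the PR's implementation.
final class NullabilityRule {

    enum Action { NONE, MAKE_OPTIONAL }

    /**
     * @param currentlyOptional whether the Iceberg column is optional today
     * @param requestedNullable whether the ALTER statement asks for a nullable column
     */
    static Action decide(String column, boolean currentlyOptional, boolean requestedNullable) {
        if (currentlyOptional && !requestedNullable) {
            // Tightening would make previously written null values invalid.
            throw new IllegalArgumentException(
                    "Cannot change nullable column '" + column + "' to NOT NULL");
        }
        if (!currentlyOptional && requestedNullable) {
            return Action.MAKE_OPTIONAL; // corresponds to updateSchema.makeColumnOptional(...)
        }
        return Action.NONE; // nullability unchanged
    }

    public static void main(String[] args) {
        System.out.println(decide("s", false, true));
        System.out.println(decide("s", true, true));
        try {
            decide("s", true, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```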
The new complex-type schema evolution logic in modifyColumn (STRUCT/ARRAY/MAP handling, nested promotions, nullability checks, etc.) isn’t covered by any tests in IcebergMetadataOpTest or other Iceberg tests, so regressions here would be hard to catch. Given that the rest of the Iceberg integration has unit tests, please consider adding tests that exercise successful and failing ALTER TABLE ... MODIFY COLUMN cases for complex types (e.g., struct field append, nested array element promotions, map value promotions, illegal field renames).

run buildall
TPC-H: Total hot run time: 32175 ms
ClickBench: Total hot run time: 28.5 s
FE Regression Coverage Report: Increment line coverage

run buildall
TPC-H: Total hot run time: 30899 ms
ClickBench: Total hot run time: 28.24 s
FE Regression Coverage Report: Increment line coverage
morningman left a comment
- Add unit tests (UT) covering those validation methods.
- For the regression test, insert new data after the schema change and check that both the old and the new data can be read correctly.
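The regression check requested above (old and new data both readable after the change) relies on Iceberg resolving columns by field ID rather than by position. The toy model below is a hypothetical illustration, not Doris's or Iceberg's reader: a row written before the change is projected onto the evolved schema, with `INT` values widened to `BIGINT`, `FLOAT` values widened to `DOUBLE`, and a column added later read as null. `EvolvedReadModel`, `ColumnDef`, and `project` are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of reading stored rows under an evolved schema.
final class EvolvedReadModel {

    record ColumnDef(int fieldId, String name, String type) {}

    /** Projects a stored row (field ID -> value) onto the current schema. */
    static Map<String, Object> project(Map<Integer, Object> storedRow, List<ColumnDef> schema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (ColumnDef col : schema) {
            // Lookup by field ID: a column added after the row was written yields null.
            Object v = storedRow.get(col.fieldId());
            if (v instanceof Integer i && col.type().equals("BIGINT")) {
                v = i.longValue(); // INT data read through a promoted BIGINT column
            } else if (v instanceof Float f && col.type().equals("DOUBLE")) {
                v = f.doubleValue(); // FLOAT data read through a promoted DOUBLE column
            }
            out.put(col.name(), v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<ColumnDef> evolved = List.of(
                new ColumnDef(1, "a", "BIGINT"),  // was INT
                new ColumnDef(2, "b", "DOUBLE"),  // was FLOAT
                new ColumnDef(3, "c", "STRING")); // appended by the schema change
        Map<Integer, Object> oldRow = Map.of(1, 7, 2, 1.5f);
        Map<Integer, Object> newRow = Map.of(1, 5_000_000_000L, 2, 2.25, 3, "new");
        System.out.println(project(oldRow, evolved));
        System.out.println(project(newRow, evolved));
    }
}
```

A regression test in this spirit would assert on both projections: promoted values from pre-change rows must come back widened, and the appended column must be null for old rows but populated for new ones.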

run buildall
TPC-H: Total hot run time: 30461 ms
TPC-DS: Total hot run time: 190512 ms
ClickBench: Total hot run time: 28.26 s
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage

run buildall

Force-pushed from 4cecf61 to 224005d.

run buildall
TPC-H: Total hot run time: 28922 ms
TPC-DS: Total hot run time: 183872 ms
FE UT Coverage Report: Increment line coverage

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Currently, Doris does not support `ALTER TABLE ... MODIFY COLUMN` for complex type columns (STRUCT/ARRAY/MAP) in Iceberg external tables. This PR adds support for safe schema evolution of complex types in Iceberg tables.

Supported operations:
Safe type promotions supported in nested types:
- `INT` → `BIGINT`, `LARGEINT`
- `TINYINT` → `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`
- `SMALLINT` → `INT`, `BIGINT`, `LARGEINT`
- `BIGINT` → `LARGEINT`
- `FLOAT` → `DOUBLE`
- `VARCHAR(n)` → `VARCHAR(m)` where m > n

Constraints:
Release note
Support ALTER TABLE MODIFY COLUMN for complex types (STRUCT/ARRAY/MAP) in Iceberg external tables, including appending struct fields and safe type promotions.
Check List (For Author)
Manual test steps: