Reduce per-file metadata overhead for wide-schema parquet scans #22829
adriangb wants to merge 4 commits
`DefaultFilesMetadataCache` recomputed `FileMetadata::memory_size()` — which walks the entire metadata structure — on every put, eviction, and remove. Store the size alongside each entry (`SizedCacheEntry`) and compute it once at insertion. For wide-schema files (large metadata) this removes repeated structural walks from the cache hot path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`apply_file_schema_type_coercions` always built a HashMap of every table field before checking whether any view/string coercion was actually required, then discarded it on the common no-op early return. Do a cheap flag-only first pass and only build the lookup map when a transformation is needed. Saves an allocation proportional to schema width per file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
run benchmark wide_schema
🤖 Criterion benchmark running (GKE). Comparing adrian/wide-schema-metadata-overhead (7245a4c) to 883c38e (merge-base).
Benchmark for this request failed.
run benchmark wide_schema
🤖 Benchmark running (GKE). Comparing adrian/wide-schema-metadata-overhead (d3a8010) to 883c38e (merge-base) using: wide_schema.
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
run benchmark wide_schema
🤖 Benchmark running (GKE). Comparing adrian/wide-schema-metadata-overhead (fc04273) to 883c38e (merge-base) using: wide_schema.
run benchmark wide_schema
🤖 Benchmark running (GKE). Comparing adrian/wide-schema-metadata-overhead (fc04273) to 883c38e (merge-base) using: wide_schema.
Benchmark for this request failed.
Which issue does this PR close?
Part of the wide-schema parquet read performance work in #21968.
Rationale for this change
Scanning parquet datasets with very wide schemas (hundreds/thousands of
columns) pays per-file CPU costs that scale with schema width even when a
query touches only a handful of columns. Two of those costs are pure
DataFusion-side overhead with no dependency on arrow-rs, so they can land
independently of the larger arrow-metadata caching work tracked in #21968
/ #21987:
1. `DefaultFilesMetadataCache` recomputed `FileMetadata::memory_size()`, which walks the entire metadata structure, on every put, eviction, and remove. For wide files the metadata is large, so this structural walk on the cache hot path is significant.
2. `apply_file_schema_type_coercions` always built a `HashMap` of every table field up front, even on the common path where no view/string coercion is needed and the map is immediately discarded.
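To make the first cost concrete, here is a minimal sketch of the "compute the size once, store it next to the entry" idea. It is not the actual `DefaultFilesMetadataCache` code: `Metadata`, `SizedCache`, and the `used_bytes` field are hypothetical stand-ins, and only the name `SizedCacheEntry` comes from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for cached file metadata.
struct Metadata {
    column_names: Vec<String>,
}

impl Metadata {
    /// O(schema width): the structural walk we do not want to repeat.
    fn memory_size(&self) -> usize {
        std::mem::size_of::<Self>()
            + self.column_names.iter().map(|c| c.len()).sum::<usize>()
    }
}

/// Entry paired with its size, computed once at insertion.
struct SizedCacheEntry {
    metadata: Metadata,
    size: usize,
}

/// Hypothetical cache that tracks total memory use from the stored sizes.
#[derive(Default)]
struct SizedCache {
    entries: HashMap<String, SizedCacheEntry>,
    used_bytes: usize,
}

impl SizedCache {
    fn put(&mut self, key: String, metadata: Metadata) {
        // The only walk of this entry's metadata.
        let size = metadata.memory_size();
        if let Some(old) = self.entries.insert(key, SizedCacheEntry { metadata, size }) {
            // Replaced entry: use its stored size instead of re-walking it.
            self.used_bytes -= old.size;
        }
        self.used_bytes += size;
    }

    fn remove(&mut self, key: &str) -> Option<Metadata> {
        let entry = self.entries.remove(key)?;
        // Accounting uses the stored size; no structural walk here either.
        self.used_bytes -= entry.size;
        Some(entry.metadata)
    }
}
```

In this sketch, every accounting update after insertion is a single integer add or subtract, regardless of how wide the file's schema is.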
What changes are included in this PR?
Two small, independent commits:
1. `memory_size` in `DefaultFilesMetadataCache`. Store each entry's size alongside it (`SizedCacheEntry`), computed once at insertion, so put/evict/remove no longer re-walk the metadata.
2. `apply_file_schema_type_coercions`. Do a cheap flag-only first pass over the table fields and only build the name→type `HashMap` when a transformation is actually required.
Are these changes tested?
Covered by existing tests: the `datafusion-execution` cache tests and the `schema_coercion` tests in `datafusion-datasource-parquet` pass. The changes preserve existing behavior; they only remove redundant work.
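As an illustration of why the flag-only first pass in the coercion change does not alter results, here is a simplified sketch. It is an assumption-laden model, not the real `apply_file_schema_type_coercions`: the `Ty` enum, `Field` struct, and `coerce_file_schema` function are hypothetical, and the real code operates on Arrow schemas and handles more cases.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified type tags standing in for Arrow data types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Utf8,
    Utf8View,
    Int64,
}

#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    ty: Ty,
}

/// Returns `None` when the file schema needs no coercion (the common case).
fn coerce_file_schema(table: &[Field], file: &[Field]) -> Option<Vec<Field>> {
    // Cheap first pass: compute a flag only, allocate nothing.
    let needs_view_coercion = table.iter().any(|f| f.ty == Ty::Utf8View);
    if !needs_view_coercion {
        return None;
    }

    // Slow path: build the name -> type lookup only now that it is needed.
    let table_types: HashMap<&str, Ty> =
        table.iter().map(|f| (f.name.as_str(), f.ty)).collect();

    let coerced = file
        .iter()
        .map(|f| match (table_types.get(f.name.as_str()).copied(), f.ty) {
            // The table expects a view type but the file stores a plain string.
            (Some(Ty::Utf8View), Ty::Utf8) => Field {
                name: f.name.clone(),
                ty: Ty::Utf8View,
            },
            _ => f.clone(),
        })
        .collect();
    Some(coerced)
}
```

The early return yields the same answer an eager version would reach after building and then discarding the map, so only the allocation disappears on the no-op path.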
Are there any user-facing changes?
No. `SizedCacheEntry` is an internal cache type; no public API changes.