Skip to content

Reduce per-file metadata overhead for wide-schema parquet scans#22829

Open
adriangb wants to merge 4 commits into
apache:mainfrom
pydantic:adrian/wide-schema-metadata-overhead
Open

Reduce per-file metadata overhead for wide-schema parquet scans#22829
adriangb wants to merge 4 commits into
apache:mainfrom
pydantic:adrian/wide-schema-metadata-overhead

Conversation

@adriangb

@adriangb adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Part of the wide-schema parquet read performance work in #21968.

Rationale for this change

Scanning parquet datasets with very wide schemas (hundreds/thousands of
columns) pays per-file CPU costs that scale with schema width even when a
query touches only a handful of columns. Two of those costs are pure
DataFusion-side overhead with no dependency on arrow-rs, so they can land
independently of the larger arrow-metadata caching work tracked in #21968
/ #21987:

  1. DefaultFilesMetadataCache recomputed FileMetadata::memory_size()
    which walks the entire metadata structure — on every put, eviction,
    and remove. For wide files the metadata is large, so this structural
    walk on the cache hot path is significant.
  2. apply_file_schema_type_coercions always built a HashMap of every
    table field up front, even on the common path where no view/string
    coercion is needed and the map is immediately discarded.

What changes are included in this PR?

Two small, independent commits:

  • Cache entry memory_size in DefaultFilesMetadataCache. Store each
    entry's size alongside it (SizedCacheEntry), computed once at
    insertion, so put/evict/remove no longer re-walk the metadata.
  • Skip the coercion lookup map when no coercion is needed. Do a cheap
    flag-only first pass over the table fields and only build the
    name→type HashMap when a transformation is actually required.

Are these changes tested?

Covered by existing tests — datafusion-execution cache tests and the
schema_coercion tests in datafusion-datasource-parquet pass. The
changes preserve existing behavior; they only remove redundant work.

Are there any user-facing changes?

No. SizedCacheEntry is an internal cache type; no public API changes.

adriangb and others added 2 commits June 8, 2026 14:20
`DefaultFilesMetadataCache` recomputed `FileMetadata::memory_size()` —
which walks the entire metadata structure — on every put, eviction, and
remove. Store the size alongside each entry (`SizedCacheEntry`) and
compute it once at insertion. For wide-schema files (large metadata)
this removes repeated structural walks from the cache hot path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`apply_file_schema_type_coercions` always built a HashMap of every table
field before checking whether any view/string coercion was actually
required, then discarded it on the common no-op early return. Do a cheap
flag-only first pass and only build the lookup map when a transformation
is needed. Saves an allocation proportional to schema width per file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adriangb

adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

run benchmark wide_schema

@adriangbot

Copy link
Copy Markdown

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4652688287-486-sw8pz 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (7245a4c) to 883c38e (merge-base) diff
BENCH_NAME=wide_schema
BENCH_COMMAND=cargo bench --features=parquet --bench wide_schema
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

Copy link
Copy Markdown

Benchmark for this request failed.

Last 20 lines of output:

Click to expand
    substr
    substr_index
    substring
    sum
    to_char
    to_hex
    to_local_time
    to_time
    to_timestamp
    topk_aggregate
    topk_repartition
    translate
    trim
    trunc
    unhex
    unions_to_filter
    upper
    uuid
    window_query_sql
    with_hashes

File an issue against this benchmark runner

Comment thread datafusion/execution/src/cache/file_metadata_cache.rs
@adriangb

adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

run benchmark wide_schema

@adriangbot

Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4652969993-490-s72qq 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (d3a8010) to 883c38e (merge-base) diff using: wide_schema
Results will be posted here when complete


File an issue against this benchmark runner

Comment thread datafusion/execution/src/cache/file_metadata_cache.rs
Comment thread datafusion/execution/src/cache/file_metadata_cache.rs Outdated
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
@adriangb

adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

run benchmark wide_schema

@adriangbot

Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4654257875-493-7jkxl 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (fc04273) to 883c38e (merge-base) diff using: wide_schema
Results will be posted here when complete


File an issue against this benchmark runner

@adriangb

adriangb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

run benchmark wide_schema

@adriangbot

Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4654541099-494-hglqf 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (fc04273) to 883c38e (merge-base) diff using: wide_schema
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

Copy link
Copy Markdown

Benchmark for this request failed.

Last 20 lines of output:

Click to expand
   Compiling datafusion-pruning v53.1.0 (/workspace/datafusion-base/datafusion/pruning)
   Compiling datafusion-datasource-json v53.1.0 (/workspace/datafusion-base/datafusion/datasource-json)
   Compiling datafusion-datasource-arrow v53.1.0 (/workspace/datafusion-base/datafusion/datasource-arrow)
   Compiling datafusion-datasource-csv v53.1.0 (/workspace/datafusion-base/datafusion/datasource-csv)
   Compiling datafusion-datasource-parquet v53.1.0 (/workspace/datafusion-base/datafusion/datasource-parquet)
   Compiling datafusion-physical-optimizer v53.1.0 (/workspace/datafusion-base/datafusion/physical-optimizer)
   Compiling datafusion-sql v53.1.0 (/workspace/datafusion-base/datafusion/sql)
   Compiling datafusion-functions-table v53.1.0 (/workspace/datafusion-base/datafusion/functions-table)
   Compiling datafusion-catalog-listing v53.1.0 (/workspace/datafusion-base/datafusion/catalog-listing)
   Compiling datafusion v53.1.0 (/workspace/datafusion-base/datafusion/core)
   Compiling datafusion-proto v53.1.0 (/workspace/datafusion-base/datafusion/proto)
   Compiling datafusion-benchmarks v53.1.0 (/workspace/datafusion-base/benchmarks)
    Finished `bench` profile [optimized] target(s) in 13m 43s
     Running benches/sql.rs (/workspace/datafusion-base/target/release/deps/sql-eec171b6eb33866e)
Gnuplot not found, using plotters backend

thread 'main' (27507) panicked at benchmarks/benches/sql.rs:125:22:
initialization failed: Plan("No files found at file:///workspace/datafusion-base/benchmarks/data/wide_schema/wide/. Cannot infer schema from an empty location; either add data files or declare an explicit schema for the table.")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: bench failed, to rerun pass `--bench sql`

File an issue against this benchmark runner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

datasource Changes to the datasource crate execution Related to the execution crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants