Reduce per-file metadata overhead for wide-schema parquet scans by adriangb · Pull Request #22829 · apache/datafusion

adriangb · 2026-06-08T19:22:37Z

Which issue does this PR close?

Part of the wide-schema parquet read performance work in #21968.

Part of Add benchmarks for queries against wide schema parquet files #21968.

Rationale for this change

Scanning parquet datasets with very wide schemas (hundreds/thousands of
columns) pays per-file CPU costs that scale with schema width even when a
query touches only a handful of columns. Two of those costs are pure
DataFusion-side overhead with no dependency on arrow-rs, so they can land
independently of the larger arrow-metadata caching work tracked in #21968
/ #21987:

DefaultFilesMetadataCache recomputed FileMetadata::memory_size() —
which walks the entire metadata structure — on every put, eviction,
and remove. For wide files the metadata is large, so this structural
walk on the cache hot path is significant.
apply_file_schema_type_coercions always built a HashMap of every
table field up front, even on the common path where no view/string
coercion is needed and the map is immediately discarded.

What changes are included in this PR?

Two small, independent commits:

Cache entry memory_size in DefaultFilesMetadataCache. Store each
entry's size alongside it (SizedCacheEntry), computed once at
insertion, so put/evict/remove no longer re-walk the metadata.
Skip the coercion lookup map when no coercion is needed. Do a cheap
flag-only first pass over the table fields and only build the
name→type HashMap when a transformation is actually required.

Are these changes tested?

Covered by existing tests — datafusion-execution cache tests and the
schema_coercion tests in datafusion-datasource-parquet pass. The
changes preserve existing behavior; they only remove redundant work.

Are there any user-facing changes?

No. SizedCacheEntry is an internal cache type; no public API changes.

`DefaultFilesMetadataCache` recomputed `FileMetadata::memory_size()` — which walks the entire metadata structure — on every put, eviction, and remove. Store the size alongside each entry (`SizedCacheEntry`) and compute it once at insertion. For wide-schema files (large metadata) this removes repeated structural walks from the cache hot path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`apply_file_schema_type_coercions` always built a HashMap of every table field before checking whether any view/string coercion was actually required, then discarded it on the common no-op early return. Do a cheap flag-only first pass and only build the lookup map when a transformation is needed. Saves an allocation proportional to schema width per file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

adriangb · 2026-06-08T19:31:31Z

run benchmark wide_schema

adriangbot · 2026-06-08T19:34:21Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4652688287-486-sw8pz 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (7245a4c) to 883c38e (merge-base) diff
BENCH_NAME=wide_schema
BENCH_COMMAND=cargo bench --features=parquet --bench wide_schema
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-06-08T19:34:26Z

Benchmark for this request failed.

Last 20 lines of output:

Click to expand

    substr
    substr_index
    substring
    sum
    to_char
    to_hex
    to_local_time
    to_time
    to_timestamp
    topk_aggregate
    topk_repartition
    translate
    trim
    trunc
    unhex
    unions_to_filter
    upper
    uuid
    window_query_sql
    with_hashes

File an issue against this benchmark runner

adriangb · 2026-06-08T20:02:50Z

run benchmark wide_schema

adriangbot · 2026-06-08T20:05:15Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4652969993-490-s72qq 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (d3a8010) to 883c38e (merge-base) diff using: wide_schema
Results will be posted here when complete

File an issue against this benchmark runner

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

adriangb · 2026-06-08T22:51:16Z

run benchmark wide_schema

adriangbot · 2026-06-08T22:54:34Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4654257875-493-7jkxl 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (fc04273) to 883c38e (merge-base) diff using: wide_schema
Results will be posted here when complete

File an issue against this benchmark runner

adriangb · 2026-06-08T23:42:06Z

run benchmark wide_schema

adriangbot · 2026-06-08T23:44:59Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4654541099-494-hglqf 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adrian/wide-schema-metadata-overhead (fc04273) to 883c38e (merge-base) diff using: wide_schema
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-06-09T00:15:30Z

Benchmark for this request failed.

Last 20 lines of output:

Click to expand

   Compiling datafusion-pruning v53.1.0 (/workspace/datafusion-base/datafusion/pruning)
   Compiling datafusion-datasource-json v53.1.0 (/workspace/datafusion-base/datafusion/datasource-json)
   Compiling datafusion-datasource-arrow v53.1.0 (/workspace/datafusion-base/datafusion/datasource-arrow)
   Compiling datafusion-datasource-csv v53.1.0 (/workspace/datafusion-base/datafusion/datasource-csv)
   Compiling datafusion-datasource-parquet v53.1.0 (/workspace/datafusion-base/datafusion/datasource-parquet)
   Compiling datafusion-physical-optimizer v53.1.0 (/workspace/datafusion-base/datafusion/physical-optimizer)
   Compiling datafusion-sql v53.1.0 (/workspace/datafusion-base/datafusion/sql)
   Compiling datafusion-functions-table v53.1.0 (/workspace/datafusion-base/datafusion/functions-table)
   Compiling datafusion-catalog-listing v53.1.0 (/workspace/datafusion-base/datafusion/catalog-listing)
   Compiling datafusion v53.1.0 (/workspace/datafusion-base/datafusion/core)
   Compiling datafusion-proto v53.1.0 (/workspace/datafusion-base/datafusion/proto)
   Compiling datafusion-benchmarks v53.1.0 (/workspace/datafusion-base/benchmarks)
    Finished `bench` profile [optimized] target(s) in 13m 43s
     Running benches/sql.rs (/workspace/datafusion-base/target/release/deps/sql-eec171b6eb33866e)
Gnuplot not found, using plotters backend

thread 'main' (27507) panicked at benchmarks/benches/sql.rs:125:22:
initialization failed: Plan("No files found at file:///workspace/datafusion-base/benchmarks/data/wide_schema/wide/. Cannot infer schema from an empty location; either add data files or declare an explicit schema for the table.")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: bench failed, to rerun pass `--bench sql`

File an issue against this benchmark runner

adriangb and others added 2 commits June 8, 2026 14:20

github-actions Bot added execution Related to the execution crate datasource Changes to the datasource crate labels Jun 8, 2026

This was referenced Jun 8, 2026

parquet: cache per-file ArrowReaderMetadata for wide-schema scans #22830

Draft

Add benchmarks for queries against wide schema parquet files #21968

Open

[do-not-merge] Wide-schema parquet read perf — visibility for #21968 #21987

Draft

This was referenced Jun 8, 2026

Add wide_schema to standard benchmark allowlist alamb/datafusion-benchmarking#15

Closed

Add wide_schema to standard benchmark allowlist adriangb/datafusion-benchmarking#10

Merged

martin-g reviewed Jun 8, 2026

View reviewed changes

Comment thread datafusion/execution/src/cache/file_metadata_cache.rs

Fix link in docstring

d3a8010

martin-g reviewed Jun 8, 2026

View reviewed changes

Comment thread datafusion/execution/src/cache/file_metadata_cache.rs

Comment thread datafusion/execution/src/cache/file_metadata_cache.rs Outdated

Apply suggestions from code review

fc04273

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce per-file metadata overhead for wide-schema parquet scans#22829

Reduce per-file metadata overhead for wide-schema parquet scans#22829
adriangb wants to merge 4 commits into
apache:mainfrom
pydantic:adrian/wide-schema-metadata-overhead

adriangb commented Jun 8, 2026

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

Uh oh!

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

adriangb commented Jun 8, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

Uh oh!

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

adriangb commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 8, 2026

Uh oh!

adriangbot commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants