Goal: improve MentisDB retrieval quality and scalability without breaking append-only semantics.
The four tracks are mostly parallelizable if we agree on shared interfaces first.
Do this first, then split work.
Define internal structs/enums, likely in src/search/ranked.rs or new src/search/pipeline.rs.
```rust
pub enum RetrievalRoute {
    Lexical,
    SemanticVector,
    ExplicitGraph,
    ImplicitGraph,
    PprGraph,
    PrfExpandedLexical,
    SummaryHierarchy,
}

pub struct RouteScore {
    pub route: RetrievalRoute,
    pub score: f32,
}

pub struct QueryIntent {
    pub temporal: bool,
    pub entity_focused: bool,
    pub agent_focused: bool,
    pub causal: bool,
    pub semantic: bool,
    pub summary_or_global: bool,
}
```

All new behavior should be gated behind fields in RankedSearchQuery / graph config first.
Example:
```json
{
  "graph": {
    "mode": "bidirectional",
    "algorithm": "bfs",
    "max_depth": 2
  }
}
```

Later allowed values:
algorithm: "bfs" | "ppr"
Initial default: current behavior.
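A minimal sketch of how the gate could default to current behavior. `GraphAlgorithm` and `GraphConfig` are hypothetical names chosen for illustration; only the `algorithm` and `max_depth` fields and the `"bfs"` / `"ppr"` values come from the JSON example.

```rust
/// Hypothetical selector for the graph expansion algorithm.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum GraphAlgorithm {
    /// Current bounded breadth-first expansion; stays the default.
    #[default]
    Bfs,
    /// Personalized PageRank; opt-in only.
    Ppr,
}

impl GraphAlgorithm {
    /// Parses the `algorithm` string; unknown values are rejected, not guessed.
    pub fn parse(value: &str) -> Option<Self> {
        match value {
            "bfs" => Some(Self::Bfs),
            "ppr" => Some(Self::Ppr),
            _ => None,
        }
    }
}

/// Hypothetical config mirroring two fields of the JSON example.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GraphConfig {
    pub algorithm: GraphAlgorithm,
    pub max_depth: usize,
}

impl Default for GraphConfig {
    fn default() -> Self {
        // max_depth = 2 is copied from the example, not a confirmed default.
        Self { algorithm: GraphAlgorithm::default(), max_depth: 2 }
    }
}
```

With this shape, a request that omits `algorithm` still takes the BFS path, so existing callers see no change.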
Create common integration fixtures for:
- explicit relations
- implicit auto edges
- temporal memories
- summaries with `Summarizes`
- entity-tagged thoughts
- vocabulary mismatch queries
Useful test file:
tests/search_pipeline_integration_tests.rs
Can run independently after Phase 0.
Replace or augment bounded BFS with Personalized PageRank over explicit + implicit graph edges.
- `src/search/graph.rs`
- `src/search/expansion.rs`
- `src/search/ranked.rs`
- `tests/search_pipeline_integration_tests.rs`
Build a weighted graph view:
| Edge Source | Weight |
|---|---|
| explicit `References` | 1.0 |
| explicit `Supports` | 1.1 |
| explicit `DerivedFrom` | 1.2 |
| explicit `Corrects` / `Invalidates` | query-dependent |
| implicit cosine edge | cosine score, e.g. 0.85..1.0 |
| temporal adjacency | optional low weight, e.g. 0.15 |
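The table could be encoded as a single lookup function. This is a sketch only: `EdgeKind` and `edge_weight` are hypothetical names, and the clamping of cosine scores and the caller-supplied weight for `Corrects` / `Invalidates` are assumptions about how "query-dependent" might be handled.

```rust
/// Edge kinds from the weight table (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EdgeKind {
    References,
    Supports,
    DerivedFrom,
    /// Corrects / Invalidates: the caller supplies the query-dependent weight.
    CorrectsOrInvalidates { query_weight: f32 },
    /// Implicit edge carrying its cosine similarity.
    ImplicitCosine { cosine: f32 },
    TemporalAdjacency,
}

/// Maps an edge kind to the non-negative weight used by graph scoring.
pub fn edge_weight(kind: EdgeKind) -> f32 {
    match kind {
        EdgeKind::References => 1.0,
        EdgeKind::Supports => 1.1,
        EdgeKind::DerivedFrom => 1.2,
        // Assumption: negative weights are floored at zero.
        EdgeKind::CorrectsOrInvalidates { query_weight } => query_weight.max(0.0),
        // Assumption: cosine scores are clamped into [0, 1].
        EdgeKind::ImplicitCosine { cosine } => cosine.clamp(0.0, 1.0),
        // The table's example low weight.
        EdgeKind::TemporalAdjacency => 0.15,
    }
}
```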
Add a PPR function:
```rust
use std::collections::HashMap;

pub struct PprConfig {
    pub damping: f32,
    pub max_iters: usize,
    pub tolerance: f32,
    pub max_nodes: usize,
    pub include_implicit_edges: bool,
}

pub struct PprResult {
    pub scores: HashMap<ThoughtLocator, f32>,
    pub seed_paths: HashMap<ThoughtLocator, Vec<GraphExpansionPath>>,
}
```

Algorithm:
- Seed vector from lexical/vector top results.
- Expand graph neighborhood up to `max_nodes`.
- Run power iteration.
- Return ranked graph scores.
- Merge with existing ranked scoring via current RRF/score fields.
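The power-iteration step can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: nodes are plain `usize` indices rather than `ThoughtLocator`, the neighborhood is assumed to be already truncated to `max_nodes`, edge weights are assumed non-negative, and mass from nodes without outgoing edges is returned to the seeds.

```rust
/// Runs personalized PageRank by power iteration over a weighted adjacency list.
/// `edges[u]` lists `(v, weight)` pairs; `seeds` is the unnormalized restart vector.
pub fn personalized_pagerank(
    edges: &[Vec<(usize, f32)>],
    seeds: &[f32],
    damping: f32,
    max_iters: usize,
    tolerance: f32,
) -> Vec<f32> {
    let n = edges.len();
    assert_eq!(seeds.len(), n, "one seed weight per node");
    let seed_sum: f32 = seeds.iter().sum();
    if n == 0 || seed_sum <= 0.0 {
        return vec![0.0; n];
    }
    // Normalize seeds into a probability distribution (the restart vector).
    let restart: Vec<f32> = seeds.iter().map(|s| s / seed_sum).collect();
    let out_weight: Vec<f32> = edges
        .iter()
        .map(|e| e.iter().map(|(_, w)| *w).sum::<f32>())
        .collect();

    let mut rank = restart.clone();
    for _ in 0..max_iters {
        let mut next = vec![0.0f32; n];
        let mut dangling = 0.0f32;
        for u in 0..n {
            if out_weight[u] <= 0.0 {
                dangling += rank[u];
                continue;
            }
            for &(v, w) in &edges[u] {
                next[v] += damping * rank[u] * w / out_weight[u];
            }
        }
        // Restart mass plus mass from dangling nodes flows back to the seeds.
        for v in 0..n {
            next[v] += (1.0 - damping + damping * dangling) * restart[v];
        }
        let delta: f32 = next.iter().zip(&rank).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tolerance {
            break;
        }
    }
    rank
}
```

Because the graph is a `Vec` rather than a hash map, iteration order is fixed and repeated runs return identical scores, which is the determinism property the tests call for.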
- PPR ranks a 2-hop relevant node above unrelated lexical match.
- Implicit cosine edges contribute to PPR.
- PPR respects `max_nodes`.
- PPR is deterministic across runs.
- PPR disabled keeps exact old BFS behavior.
- `cargo test ppr`
- `cargo test --test search_pipeline_integration_tests`
- Compare LoCoMo R@10 against BFS.
Commit:
feat(search): add personalized pagerank graph expansion
Can run independently after Phase 0.
Improve lexical recall by expanding queries using pseudo-relevance feedback from top lexical hits.
- `src/search/lexical.rs`
- `src/search/ranked.rs`
- `src/search/query_expansion.rs` (new)
- `tests/search_query_expansion_tests.rs`
Start with non-LLM PRF.
Pipeline:
- Run original lexical query.
- Take top `N` hits, e.g. 5.
- Extract high-IDF candidate terms.
- Remove stopwords, original query terms, too-common terms.
- Weight terms using Rocchio-like formula.
- Run expanded lexical query.
- Fuse original + expanded route via RRF.
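The term-selection steps of the pipeline could look like the sketch below. The function name, its parameters, and the scoring formula (mean term frequency in the feedback docs times IDF) are assumptions standing in for the "Rocchio-like" weighting; tokenization and the lexical index are out of scope.

```rust
use std::collections::{HashMap, HashSet};

/// Picks expansion terms from feedback documents, scored by
/// mean term frequency across feedback docs times IDF.
pub fn select_expansion_terms(
    query_terms: &[&str],
    feedback_docs: &[Vec<&str>],
    doc_freq: &HashMap<&str, usize>,
    total_docs: usize,
    min_idf: f32,
    max_terms: usize,
) -> Vec<(String, f32)> {
    if feedback_docs.is_empty() || total_docs == 0 {
        return Vec::new(); // no feedback docs means no expansion
    }
    let original: HashSet<&str> = query_terms.iter().copied().collect();
    let mut tf: HashMap<&str, f32> = HashMap::new();
    for doc in feedback_docs {
        for &term in doc {
            // Original query terms are never re-added as expansions.
            if !original.contains(term) {
                *tf.entry(term).or_insert(0.0) += 1.0;
            }
        }
    }
    let mut scored: Vec<(String, f32)> = tf
        .into_iter()
        .filter_map(|(term, count)| {
            let df = *doc_freq.get(term).unwrap_or(&1) as f32;
            let idf = (total_docs as f32 / df).ln();
            // Stopwords and other too-common terms fall below min_idf.
            (idf >= min_idf)
                .then(|| (term.to_string(), idf * count / feedback_docs.len() as f32))
        })
        .collect();
    // Score descending, then term ascending, so output is deterministic.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored.truncate(max_terms);
    scored
}
```

The returned terms form a second, separate lexical query, so the stored query is never mutated.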
Config:
```rust
pub struct PrfConfig {
    pub enabled: bool,
    pub feedback_docs: usize,
    pub expansion_terms: usize,
    pub min_idf: f32,
    pub original_weight: f32,
    pub expansion_weight: f32,
}
```

Expose later as:
```json
{
  "query_expansion": {
    "mode": "none",
    "feedback_docs": 5,
    "expansion_terms": 8
  }
}
```

Later allowed values:
mode: "none" | "prf"
- Do not mutate the stored query.
- Do not write expansion terms into thoughts.
- Keep expansion route visible in result metadata.
- Avoid expansion if top lexical hits are weak/noisy.
- Query `"trip cost"` expands to `"invoice vendor payment"` from feedback docs.
- Expanded route recovers a result missed by original lexical.
- Expansion disabled preserves exact old results.
- No feedback docs means no expansion.
- Very common terms are filtered.
- `cargo test query_expansion`
- LoCoMo smoke test first 200 queries.
- LongMemEval check for no R@5 regression.
Commit:
feat(search): add pseudo-relevance query expansion route
Can run mostly independently. Needs final integration with ranked search.
Create persistent summary thoughts that improve long-horizon retrieval without rewriting original memories.
- `src/lib.rs`
- `src/search/ranked.rs`
- `src/search/summary_index.rs` (new)
- `tests/hierarchical_summary_tests.rs`
- optional dashboard/TUI later
Use existing primitives:
- `ThoughtType::Summary`
- `ThoughtRelationKind::Summarizes`
- `refs`
- timestamps
- agent/session metadata
No destructive compaction.
Add API:
```rust
pub struct SummaryBuildConfig {
    pub window_size: usize,
    pub overlap: usize,
    pub by_session: bool,
    pub by_agent: bool,
    pub by_entity_type: bool,
}

pub fn build_summary_candidates(&self, config: SummaryBuildConfig) -> Vec<SummaryCandidate>;
```

Important: MentisDB core should not require an LLM, so split into two layers:
Core:
- selects ranges/clusters that need summaries
- returns candidate source thought IDs
- can create extractive summaries if needed
Daemon/API:
- optional LLM-generated summary content
- append as a normal `Summary` thought
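The core-layer range selection can be sketched as a pure function over thought indices. Only `window_size` and `overlap` come from `SummaryBuildConfig`; the function name, the use of index ranges in place of thought IDs, and the choice to drop a trailing partial window are assumptions.

```rust
use std::ops::Range;

/// Returns index windows of `window_size` thoughts, stepping by
/// `window_size - overlap`, skipping windows already covered by a summary.
pub fn candidate_windows(
    thought_count: usize,
    window_size: usize,
    overlap: usize,
    already_summarized: &[Range<usize>],
) -> Vec<Range<usize>> {
    if window_size == 0 || overlap >= window_size {
        return Vec::new(); // invalid config: the window would never advance
    }
    let step = window_size - overlap;
    let mut windows = Vec::new();
    let mut start = 0;
    // Assumption: a trailing partial window is left for a later run.
    while start + window_size <= thought_count {
        let window = start..start + window_size;
        let covered = already_summarized
            .iter()
            .any(|done| done.start <= window.start && window.end <= done.end);
        if !covered {
            windows.push(window);
        }
        start += step;
    }
    windows
}
```

The function only reads its inputs, so building candidates cannot mutate the chain, and re-running it with the ranges of appended summaries skips work that is already done.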
Level 0:
- raw thoughts
Level 1:
- session/topic summaries
Level 2:
- chain-level rolling summaries
Relations:
- summary -> raw thoughts via `Summarizes`
- summary -> previous summary via `ContinuesFrom` or `Summarizes`
When query looks broad/global:
- search summaries first
- expand down from summaries to summarized raw thoughts
- boost raw thoughts whose parent summary matched
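The expand-down-and-boost step might look like this sketch. Thought IDs are shown as `u64`, and the additive `boost * summary_score` formula is an assumption; neither is specified in the plan.

```rust
use std::collections::HashMap;

/// Adds `boost * summary_score` to each raw thought covered by a matching summary.
/// `matched_summaries` pairs a summary's score with the ids it summarizes.
pub fn boost_from_summaries(
    raw_scores: &mut HashMap<u64, f32>,
    matched_summaries: &[(f32, Vec<u64>)],
    boost: f32,
) {
    for (summary_score, children) in matched_summaries {
        for child in children {
            // Raw thoughts with no direct hit still enter via their parent summary.
            *raw_scores.entry(*child).or_insert(0.0) += boost * summary_score;
        }
    }
}
```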
- Building summary candidates does not mutate chain.
- Appended summaries preserve hash-chain integrity.
- Summary retrieval can find raw thought through `Summarizes`.
- Re-running candidate selection skips already summarized ranges.
- Works across sessions and entity types.
- `cargo test hierarchical_summary`
- Full integrity test after summary append.
- Benchmark global/query-summary questions separately.
Commit:
feat(memory): add append-only hierarchical summary candidates
Can run independently after Phase 0, but best integrated after A/B/C.
Route queries to the right retrieval signals before scoring.
- `src/search/query_intent.rs` (new)
- `src/search/ranked.rs`
- `src/server.rs` (only if API fields needed)
- `tests/query_intent_tests.rs`
Start deterministic, no LLM.
Heuristics:
| Query Pattern | Intent |
|---|---|
| contains date/time words | temporal |
| contains “who”, “which agent”, “by” | agent-focused |
| contains “why”, “because”, “caused” | causal |
| contains “summarize”, “overall”, “all about” | summary/global |
| contains known entity type/concept | entity-focused |
| short abstract query | semantic |
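A deterministic classifier for these heuristics could be sketched as below. The keyword lists beyond the words quoted in the table (for example `"yesterday"` or `"before"`), the three-word cutoff for "short", and the helper names are assumptions; entity detection is left out because it needs a lookup of known entity types.

```rust
/// Mirrors the `QueryIntent` struct from the shared contracts.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct QueryIntent {
    pub temporal: bool,
    pub entity_focused: bool,
    pub agent_focused: bool,
    pub causal: bool,
    pub semantic: bool,
    pub summary_or_global: bool,
}

fn has_word(words: &[&str], list: &[&str]) -> bool {
    words.iter().any(|w| list.contains(w))
}

fn has_phrase(text: &str, list: &[&str]) -> bool {
    list.iter().any(|p| text.contains(*p))
}

/// Keyword-based intent detection; no LLM involved.
pub fn classify_intent(query: &str) -> QueryIntent {
    let lower = query.to_lowercase();
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();

    let mut intent = QueryIntent {
        // Assumed sample of date/time words.
        temporal: has_word(&words, &["when", "yesterday", "today", "before", "after", "date"]),
        agent_focused: has_word(&words, &["who", "by"]) || has_phrase(&lower, &["which agent"]),
        causal: has_word(&words, &["why", "because", "caused"]),
        summary_or_global: has_word(&words, &["summarize", "overall"])
            || has_phrase(&lower, &["all about"]),
        // entity_focused stays false: it needs a known-entity lookup.
        ..QueryIntent::default()
    };
    let any_flag =
        intent.temporal || intent.agent_focused || intent.causal || intent.summary_or_global;
    // Assumption: "short abstract query" means at most three words and no other signal.
    intent.semantic = !any_flag && words.len() <= 3;
    intent
}
```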
Output:
```rust
pub struct QueryRoutingPlan {
    pub lexical_weight: f32,
    pub vector_weight: f32,
    pub graph_weight: f32,
    pub ppr_weight: f32,
    pub temporal_weight: f32,
    pub summary_weight: f32,
    pub enable_prf: bool,
}
```

In ranked search:
- Build `QueryIntent`.
- Build `QueryRoutingPlan`.
- Execute selected routes.
- Fuse route results.
- Include route metadata in response.
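Turning an intent into a routing plan could be sketched as below. Every numeric weight here is a placeholder, including the `legacy()` values, since the plan does not state the current weighting; the point is only that a disabled router returns the legacy plan unchanged.

```rust
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct QueryIntent {
    pub temporal: bool,
    pub entity_focused: bool,
    pub agent_focused: bool,
    pub causal: bool,
    pub semantic: bool,
    pub summary_or_global: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct QueryRoutingPlan {
    pub lexical_weight: f32,
    pub vector_weight: f32,
    pub graph_weight: f32,
    pub ppr_weight: f32,
    pub temporal_weight: f32,
    pub summary_weight: f32,
    pub enable_prf: bool,
}

impl QueryRoutingPlan {
    /// Placeholder for today's weighting; the real values are not specified.
    pub fn legacy() -> Self {
        Self {
            lexical_weight: 1.0,
            vector_weight: 1.0,
            graph_weight: 1.0,
            ppr_weight: 0.0,
            temporal_weight: 1.0,
            summary_weight: 0.0,
            enable_prf: false,
        }
    }
}

/// Adjusts the legacy plan by intent; returns it untouched when routing is off.
pub fn plan_for(intent: &QueryIntent, routing_enabled: bool) -> QueryRoutingPlan {
    let mut plan = QueryRoutingPlan::legacy();
    if !routing_enabled {
        return plan;
    }
    if intent.temporal {
        plan.temporal_weight *= 2.0;
    }
    if intent.causal {
        plan.graph_weight *= 1.5;
        plan.ppr_weight = 1.0;
    }
    if intent.semantic {
        plan.vector_weight *= 1.5;
        plan.enable_prf = true;
    }
    if intent.summary_or_global {
        plan.summary_weight = 2.0;
    }
    plan
}
```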
- “when did…” boosts temporal route.
- “who said…” searches agent metadata.
- “why did…” boosts causal/DerivedFrom/CausedBy edges.
- “summarize…” searches summaries first.
- Routing disabled keeps legacy weighting.
- `cargo test query_intent`
- LoCoMo category breakdown if labels available.
- Manual probes against existing chains.
Commit:
feat(search): add query-aware retrieval routing
After A-D land independently.
In src/search/ranked.rs:
Order:
- Parse query intent.
- Run lexical original.
- Optionally run PRF expanded lexical.
- Run vector route.
- Run graph route: BFS or PPR.
- Optionally run summary hierarchy route.
- Fuse with RRF.
- Add route score diagnostics.
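The fusion and diagnostics steps can be sketched with plain Reciprocal Rank Fusion. The function name, the `u64` ids, and the tuple-shaped output are assumptions; the per-route values it reports are RRF terms, not the score scale used by the existing ranked search.

```rust
use std::collections::HashMap;

/// Fuses ranked id lists with Reciprocal Rank Fusion and keeps each
/// route's contribution so it can be reported as a diagnostic.
/// Returns `(id, total, per-route contributions)` sorted best first.
pub fn rrf_fuse<'a>(
    routes: &[(&'a str, Vec<u64>)],
    k: f32,
) -> Vec<(u64, f32, Vec<(&'a str, f32)>)> {
    let mut fused: HashMap<u64, (f32, Vec<(&'a str, f32)>)> = HashMap::new();
    for (route, ranked_ids) in routes {
        for (rank, id) in ranked_ids.iter().enumerate() {
            // RRF term: 1 / (k + rank), with rank starting at 1.
            let contribution = 1.0 / (k + rank as f32 + 1.0);
            let entry = fused.entry(*id).or_insert((0.0, Vec::new()));
            entry.0 += contribution;
            entry.1.push((*route, contribution));
        }
    }
    let mut out: Vec<(u64, f32, Vec<(&'a str, f32)>)> = fused
        .into_iter()
        .map(|(id, (total, parts))| (id, total, parts))
        .collect();
    // Highest fused score first; ties broken by id for deterministic output.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

A call such as `rrf_fuse(&[("lexical", vec![1, 2]), ("vector", vec![2, 3])], 60.0)` ranks id 2 first because both routes returned it; `k = 60` is a commonly used constant for RRF, not a value taken from this plan.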
Expose per-result route contributions:
```json
{
  "score": {
    "total": 12.4,
    "lexical": 4.1,
    "vector": 3.2,
    "graph": 2.8,
    "ppr": 1.7,
    "prf": 0.6
  }
}
```

Env vars or API fields:
MENTISDB_GRAPH_ALGORITHM=bfs|ppr
MENTISDB_PRF_QUERY_EXPANSION=true|false
MENTISDB_QUERY_ROUTING=true|false
MENTISDB_SUMMARY_ROUTE=true|false
Prefer API fields first, env defaults second.
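The precedence rule can be expressed as one small resolver. `resolve_flag` is a hypothetical helper; it takes the environment value as a parameter so it stays pure and testable, and it assumes the flag falls back to `false` (legacy behavior) when neither source sets it.

```rust
/// Resolves a boolean feature flag: an explicit API field wins,
/// then the env value, then `false` so legacy behavior is the fallback.
pub fn resolve_flag(api_field: Option<bool>, env_value: Option<&str>) -> bool {
    api_field
        // Only the literal strings "true" and "false" parse; anything else is ignored.
        .or_else(|| env_value.and_then(|raw| raw.trim().parse::<bool>().ok()))
        .unwrap_or(false)
}
```

At a call site the env value could be read with `std::env::var("MENTISDB_PRF_QUERY_EXPANSION").ok()` and passed in via `as_deref()`.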
- LoCoMo first 200 queries
- LongMemEval 100-query subset if available
- Compare:
  - baseline current
  - PPR only
  - PRF only
  - PPR + PRF
  - all routes
- LoCoMo-10P R@10
- LongMemEval R@5/R@10/R@20
Ship if:
- LoCoMo-10P R@10 improves or holds within -0.5pp
- LongMemEval R@5 does not drop more than 2pp
- Latency increase acceptable, ideally under 25% for default settings
- New features can be disabled
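The ship criteria could be checked mechanically in a benchmark script. This sketch uses a hypothetical `BenchmarkDelta` struct with deltas in percentage points relative to the baseline, and it treats the "ideally under 25%" latency target as a hard limit, which is stricter than the wording above.

```rust
/// Benchmark deltas versus the current baseline (hypothetical struct).
pub struct BenchmarkDelta {
    /// LoCoMo-10P R@10 change in percentage points.
    pub locomo_r10_delta_pp: f32,
    /// LongMemEval R@5 change in percentage points.
    pub longmemeval_r5_delta_pp: f32,
    /// Latency increase in percent for default settings.
    pub latency_increase_pct: f32,
    pub features_can_be_disabled: bool,
}

/// Returns true when every ship criterion holds.
pub fn passes_release_gate(d: &BenchmarkDelta) -> bool {
    d.locomo_r10_delta_pp >= -0.5
        && d.longmemeval_r5_delta_pp >= -2.0
        // Assumption: the "ideally under 25%" target is enforced as a limit.
        && d.latency_increase_pct < 25.0
        && d.features_can_be_disabled
}
```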
Owns Workstream A.
Returns:
- implementation
- tests
- benchmark notes
Owns Workstream B.
Returns:
- implementation
- tests
- top expansion examples
Owns Workstream C.
Returns:
- candidate builder
- append-only summary flow
- tests
Owns Workstream D.
Returns:
- intent classifier
- routing weights
- tests
Owns:
- Phase 0 interfaces
- integration
- benchmarks
- docs/changelog
- final release gate
If we want maximum impact with minimum risk:
- Implement PRF query expansion first.
- Implement PPR graph expansion second.
- Benchmark both independently.
- Only then add query routing and summary hierarchy.
Reason: PRF and PPR directly target retrieval quality and can be evaluated fast. Summary hierarchy is valuable but has more product/API surface area.