[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
-
Updated
Jan 17, 2026 - Cuda
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
Efficient Infinite Context Transformers with Infini-attention Pytorch Implementation + QwenMoE Implementation + Training Script + 1M context keypass retrieval
The official PyTorch implementation for CascadedGaze: Efficiency in Global Context Extraction for Image Restoration, TMLR'24.
Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention".
Official repository for "SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space"
Pytorch implementation of "Compact Global Descriptor for Neural Networks" (CGD).
Implementation of: Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484)
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
Nonparametric Modern Hopfield Models
O(N) attention with a bounded inference KV cache. D4 Daubechies wavelet field + content-gated Q·K gather at dyadic offsets.
HiCI: Hierarchical Construction-Integration for Long-Context Attention
🤖 Build a customizable, reliable Discord bot with Sage, designed for flexibility to enhance your server's interaction and engagement.
Tabular foundation model experiments with learned context sampling and efficient attention
Minimal implementation of Samba by Microsoft in PyTorch
Run local AI models on your machine with a secure, Rust-based inference engine that keeps your data private and provides controlled system access.
Add a description, image, and links to the efficient-attention topic page so that developers can more easily learn about it.
To associate your repository with the efficient-attention topic, visit your repo's landing page and select "manage topics."