DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs

Thomas Wiers February 24, 2025

FlashMLA

FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for serving variable-length sequences.

Currently released:

BF16
Paged kvcache with block size of 64
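
To make the paged layout concrete, the sketch below shows how a logical token position could be resolved to a physical cache slot when each block holds 64 tokens. The block table contents and the helper names `blocks_needed` and `locate_token` are illustrative assumptions, not part of the FlashMLA API.

```python
BLOCK_SIZE = 64  # block size stated in the release notes above


def blocks_needed(seq_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of cache blocks required to hold seq_len tokens (ceiling division)."""
    return (seq_len + block_size - 1) // block_size


def locate_token(block_table: list[int], position: int,
                 block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a logical token position to (physical block id, offset inside the block)."""
    logical_block, offset = divmod(position, block_size)
    return block_table[logical_block], offset


# Hypothetical block table for one sequence of 130 tokens: three blocks,
# scattered across the physical cache.
block_table = [7, 2, 9]
assert blocks_needed(130) == len(block_table)
print(locate_token(block_table, 129))  # last token sits in the third logical block
```

Because each sequence only holds a list of block ids, sequences of very different lengths can share one physical cache without each reserving a maximum-length contiguous region.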

Quick start

Install

Benchmark

python tests/test_flash_mla.py

On an H800 SXM5 with CUDA 12.6, the kernel achieves up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations.
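
As a rough way to read these two figures together, the sketch below divides the reported compute peak by the reported bandwidth peak to estimate the arithmetic intensity at which a kernel would move from memory-bound to compute-bound. It is a simple roofline-style estimate built only from the numbers quoted above. The function names are illustrative, and the derived value is not a figure published by the FlashMLA authors.

```python
PEAK_BANDWIDTH_GBPS = 3000.0   # reported memory-bound throughput, GB/s
PEAK_COMPUTE_TFLOPS = 580.0    # reported compute-bound throughput, TFLOPS


def ridge_point(tflops: float, gbps: float) -> float:
    """FLOPs per byte at which the two reported ceilings intersect."""
    return (tflops * 1e12) / (gbps * 1e9)


def attainable_tflops(intensity: float) -> float:
    """Roofline estimate: the lower of the compute ceiling and bandwidth * intensity."""
    bandwidth_bound = PEAK_BANDWIDTH_GBPS * 1e9 * intensity / 1e12
    return min(PEAK_COMPUTE_TFLOPS, bandwidth_bound)


ridge = ridge_point(PEAK_COMPUTE_TFLOPS, PEAK_BANDWIDTH_GBPS)
print(f"ridge point: {ridge:.1f} FLOPs/byte")
```

Under this estimate, workloads below the ridge point are limited by how fast the KV cache can be read, and workloads above it are limited by tensor-core throughput.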

Usage

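from flash_mla import get_mla_metadata, flash_mla_with_kvcache

# Compute the scheduling metadata once, outside the layer loop; every layer reuses it.
tile_scheduler_metadata, num_splits = get_mla_metadata(cache_seqlens, s_q * h_q // h_kv, h_kv)

for i in range(num_layers):
    ...
    # Returns the attention output o_i and the softmax log-sum-exp lse_i for layer i.
    o_i, lse_i = flash_mla_with_kvcache(
        q_i, kvcache_i, block_table, cache_seqlens, dv,
        tile_scheduler_metadata, num_splits, causal=True,
    )
    ...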
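
The second argument to `get_mla_metadata` in the snippet above is the expression `s_q * h_q // h_kv`. The sketch below evaluates it for a few shapes, reading `s_q` as the number of query tokens per step, `h_q` as the number of query heads, and `h_kv` as the number of KV heads. That reading, the concrete values, and the helper name `q_rows_per_kv_head` are assumptions for illustration, not settings taken from this article.

```python
def q_rows_per_kv_head(s_q: int, h_q: int, h_kv: int) -> int:
    """Evaluate s_q * h_q // h_kv, the expression passed to get_mla_metadata above."""
    return s_q * h_q // h_kv


# Hypothetical shapes: (query tokens per step, query heads, KV heads).
for s_q, h_q, h_kv in [(1, 128, 1), (2, 128, 1), (1, 16, 2)]:
    rows = q_rows_per_kv_head(s_q, h_q, h_kv)
    print(f"s_q={s_q}, h_q={h_q}, h_kv={h_kv} -> {rows}")
```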

Requirements

Hopper GPUs
CUDA 12.3 and above
PyTorch 2.0 and above

Acknowledgement

FlashMLA is inspired by the FlashAttention 2 & 3 and CUTLASS projects.

Citation

@misc{flashmla2025,
      title={FlashMLA: Efficient MLA decoding kernel}, 
      author={Jiashi Li},
      year={2025},
      publisher = {GitHub},
      howpublished = {\url{https://github.com/deepseek-ai/FlashMLA}},
}
