CommunityNews

DeepSeek - starting this week we'll open-source 5 repos

We’re a tiny team @deepseek-ai pushing our limits in AGI exploration.

Starting this week, Feb 24, 2025, we’ll open-source 5 repos – one daily drop – not because we’ve made grand claims, but simply as developers sharing our small-but-sincere progress with full transparency.

These are humble building blocks of our online service: documented, deployed and battle-tested in production. No vaporware, just sincere code that moved our tiny yet ambitious dream forward.

Why? Because every line shared becomes collective momentum that accelerates the journey. Daily unlocks begin soon. No ivory towers - just pure garage-energy and community-driven innovation :wrench:

Stay tuned – let’s geek out in the open together.

DeepSeek-Open-Infra

Hello, DeepSeek Open Infra!

202502 Open-Source Week

Day 1 - FlashMLA

Efficient MLA Decoding Kernel for Hopper GPUs
Optimized for variable-length sequences, battle-tested in production

:link: FlashMLA GitHub Repo
:white_check_mark: BF16 support
:white_check_mark: Paged KV cache (block size 64)
:zap: Performance: 3000 GB/s memory-bound | BF16 580 TFLOPS compute-bound on H800
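
To make the "paged KV cache (block size 64)" bullet concrete, here is a minimal sketch of how a paged cache typically maps a token position to a physical block. It is not FlashMLA's code or API: `locate`, `blocks_needed`, and the example block table are hypothetical names and values chosen for illustration, and only the block size of 64 comes from the post.

```python
BLOCK_SIZE = 64  # block size quoted in the post

def blocks_needed(seq_len, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks needed to hold seq_len tokens."""
    return -(-seq_len // block_size)  # ceiling division

def locate(block_table, token_pos, block_size=BLOCK_SIZE):
    """Map a logical token position to (physical_block, offset_in_block).

    block_table[i] is the physical block that stores the i-th logical
    block of this sequence (hypothetical layout for illustration).
    """
    logical_block, offset = divmod(token_pos, block_size)
    return block_table[logical_block], offset

# Hypothetical block table for one 150-token sequence: its three logical
# blocks live in non-contiguous physical blocks 7, 2 and 9.
block_table = [7, 2, 9]
print(blocks_needed(150))        # 3
print(locate(block_table, 130))  # (9, 2)
```

Because each sequence only needs a list of block indices, sequences of different lengths can share one pool of fixed-size blocks, which is the usual motivation for paging when sequence lengths vary.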

Day 2 - DeepEP

Excited to introduce DeepEP - the first open-source expert-parallel (EP) communication library for MoE model training and inference.

:link: DeepEP GitHub Repo
:white_check_mark: Efficient and optimized all-to-all communication
:white_check_mark: Both intranode and internode support with NVLink and RDMA
:white_check_mark: High-throughput kernels for training and inference prefilling
:white_check_mark: Low-latency kernels for inference decoding
:white_check_mark: Native FP8 dispatch support
:white_check_mark: Flexible GPU resource control for computation-communication overlapping
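
The all-to-all pattern mentioned above can be sketched in plain Python as a single-process simulation. This is not DeepEP's API: `dispatch`, `combine`, the toy router, and the top-1 routing are all assumptions made for illustration, and real MoE routing usually sends each token to several experts over NVLink or RDMA.

```python
def dispatch(tokens_per_rank, expert_of, experts_per_rank):
    """Simulated all-to-all dispatch: route each token to the rank owning its expert.

    Returns inbox[dst_rank] = list of (src_rank, src_index, token).
    """
    world = len(tokens_per_rank)
    inbox = [[] for _ in range(world)]
    for src, tokens in enumerate(tokens_per_rank):
        for idx, tok in enumerate(tokens):
            dst = expert_of(tok) // experts_per_rank  # rank that hosts the expert
            inbox[dst].append((src, idx, tok))
    return inbox

def combine(outbox, sizes):
    """Simulated all-to-all combine: return expert outputs to their original slots."""
    result = [[None] * n for n in sizes]
    for items in outbox:
        for src, idx, value in items:
            result[src][idx] = value
    return result

# Two ranks, four experts (two per rank), toy top-1 router: expert = token % 4.
tokens = [[0, 5, 2], [7, 1]]
inbox = dispatch(tokens, lambda tok: tok % 4, experts_per_rank=2)

# Each rank applies its "expert" to what it received (toy: multiply by 10).
outbox = [[(s, i, t * 10) for s, i, t in items] for items in inbox]
restored = combine(outbox, [len(t) for t in tokens])
print(restored)  # [[0, 50, 20], [70, 10]]
```

The source rank and index travel with each token so that combine can restore the original order, whichever rank did the work.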

Day 3 - DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

:link: DeepGEMM GitHub Repo
:zap: Up to 1350+ FP8 TFLOPS on Hopper GPUs
:white_check_mark: No heavy dependency, as clean as a tutorial
:white_check_mark: Fully Just-In-Time compiled
:white_check_mark: Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
:white_check_mark: Supports dense layout and two MoE layouts

Day 4 - Optimized Parallelism Strategies

:white_check_mark: DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
:link: GitHub Repo

:white_check_mark: EPLB - an expert-parallel load balancer for V3/R1.
:link: GitHub Repo

:bar_chart: Analyze computation-communication overlap in V3/R1.
:link: GitHub Repo
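
As a rough illustration of what an expert-parallel load balancer has to solve, the sketch below places experts on GPUs with a greedy heuristic: the heaviest expert goes first, onto the GPU that is currently least loaded. This is a generic textbook heuristic, not the algorithm EPLB uses; `balance_experts` and the load numbers are hypothetical.

```python
import heapq

def balance_experts(loads, num_gpus):
    """Greedy placement: heaviest expert first, onto the least-loaded GPU.

    loads[e] is the (hypothetical) token load of expert e.
    Returns (placement, totals), both keyed by GPU index.
    """
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (current load, gpu)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    for expert, load in sorted(enumerate(loads), key=lambda p: -p[1]):
        total, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (total + load, gpu))
    totals = {gpu: sum(loads[e] for e in experts)
              for gpu, experts in placement.items()}
    return placement, totals

loads = [90, 10, 40, 30, 20, 50]  # made-up per-expert loads
placement, totals = balance_experts(loads, num_gpus=2)
print(placement)  # {0: [0, 3], 1: [5, 2, 4, 1]}
print(totals)     # {0: 120, 1: 120}
```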

Day 5 - 3FS, Thruster for All DeepSeek Data Access

Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

:zap: 6.6 TiB/s aggregate read throughput in a 180-node cluster
:zap: 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster
:zap: 40+ GiB/s peak throughput per client node for KVCache lookup
:dna: Disaggregated architecture with strong consistency semantics
:white_check_mark: Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1
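
For a sense of scale, the aggregate figures above can be converted to rough per-node rates. The calculation assumes throughput is spread evenly over the node counts quoted in the post, which the post itself does not state, and `per_node_gib_s` is a hypothetical helper.

```python
GIB_PER_TIB = 1024

def per_node_gib_s(aggregate_tib, nodes, seconds=1):
    """Average GiB/s per node, assuming an even split across nodes."""
    return aggregate_tib * GIB_PER_TIB / seconds / nodes

read = per_node_gib_s(6.6, 180)              # 6.6 TiB/s over 180 nodes
sort = per_node_gib_s(3.66, 25, seconds=60)  # 3.66 TiB/min over 25 nodes
print(f"{read:.1f} GiB/s per node")  # 37.5 GiB/s per node
print(f"{sort:.2f} GiB/s per node")  # 2.50 GiB/s per node
```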

:inbox_tray: 3FS → deepseek-ai/3FS: a high-performance distributed file system designed to address the challenges of AI training and inference workloads.
:fountain: Smallpond, a data processing framework on 3FS → deepseek-ai/smallpond: a lightweight data processing framework built on DuckDB and 3FS.

2024 AI Infrastructure Paper (SC24)

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

:page_facing_up: Paper Link
:page_facing_up: Arxiv Paper Link
