QUICfeed now faster with pipes: https://quicfeed.net Nicolas Weil suggested that we would need to take direct encoder input rather than reading from files, so I added support for piped fragmented CMAF via GPAC. Thank you, Romain Bouqueau, for your tips on how to make this work!
More Relevant Posts
-
🤖 Post #637: arXiv:2603.03043 **IoUCert: Formal Robustness Verification for Anchor-Based Object Detectors** Until now, formal robustness guarantees for object detectors like YOLO were out of reach — IoUCert introduces the first verified bounds on IoU for realistic anchor-based models. Key contributions: • Coordinate transform eliminates precision-degrading non-linear relaxations • Novel IBP method derives tight optimal IoU bounds • First verification of SSD, YOLOv2, and YOLOv3 under input perturbations • Scales formal verification from classifiers to full detection pipelines #MachineLearning #ComputerVision #ObjectDetection #AIResearch
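To see why this is hard, consider the naive interval-arithmetic baseline: pushing a per-coordinate perturbation budget straight through the IoU formula gives sound but loose bounds, exactly the kind of precision loss the paper's coordinate transform is meant to avoid. A minimal sketch (my own illustration, not IoUCert's algorithm):

```python
def iou_bounds(pred, gt, eps):
    """Sound but loose interval (IBP-style) bounds on IoU when every
    predicted coordinate may shift by +-eps (ground truth held exact).
    The looseness comes from ignoring correlations between the
    intersection and union terms. Boxes are [x1, y1, x2, y2]."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Interval max/min of a perturbed coordinate against an exact one.
    ix1 = (max(px1 - eps, gx1), max(px1 + eps, gx1))
    iy1 = (max(py1 - eps, gy1), max(py1 + eps, gy1))
    ix2 = (min(px2 - eps, gx2), min(px2 + eps, gx2))
    iy2 = (min(py2 - eps, gy2), min(py2 + eps, gy2))

    # Intersection width/height intervals, clamped at zero overlap.
    w_lo, w_hi = max(0.0, ix2[0] - ix1[1]), max(0.0, ix2[1] - ix1[0])
    h_lo, h_hi = max(0.0, iy2[0] - iy1[1]), max(0.0, iy2[1] - iy1[0])
    inter_lo, inter_hi = w_lo * h_lo, w_hi * h_hi

    # Perturbed predicted-box area interval; ground-truth area is exact.
    pa_lo = max(0.0, (px2 - px1) - 2 * eps) * max(0.0, (py2 - py1) - 2 * eps)
    pa_hi = ((px2 - px1) + 2 * eps) * ((py2 - py1) + 2 * eps)
    ga = (gx2 - gx1) * (gy2 - gy1)

    union_lo = max(inter_lo, pa_lo + ga - inter_hi)
    union_hi = pa_hi + ga - inter_lo
    lo = inter_lo / union_hi if union_hi > 0 else 0.0
    hi = min(1.0, inter_hi / max(union_lo, 1e-12))
    return lo, hi
```

Even a small eps can push this naive lower bound well below the true worst-case IoU, which is why tighter relaxations are the paper's selling point.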
-
For frontier MoE inference, GB300 NVL72 FP4 decisively outperforms H100, even when both Blackwell Ultra and Hopper have all optimizations enabled, including disaggregated prefill/decode and wide expert parallelism. We see similar trends when comparing GB300 FP8 to H100 FP8. For pretraining, Blackwell and rack scale offer only 2-4x performance uplifts; inference is where Blackwell shines.
-
The path forward for storage is no longer a debate. Robert Terlizzi charted 70 years of storage evolution—from washing-machine-sized disks to fabric-native speed—in his blog series. Lightbits was designed with NVMe/TCP from day one because at 400G and 800G, every "translation layer" becomes a liability. Read the blog finale: The End of the Beginning 👉 https://ow.ly/WIVk50XRbU7 #NVMeTCP #ITInfrastructure #StorageEvolution
-
🚀 Final version of our ICLR 2026 paper is now available! I’m excited to share that the final version of our paper, “Batch Pruning by Activation Stability,” is now available. 📄 Paper: https://lnkd.in/gjJcmP9T 💻 Code: https://lnkd.in/g_4Hk9U7 In this work, we propose a dynamic data pruning framework that reduces training cost by leveraging activation stability as an internal signal to discard less informative batches, achieving significant savings in data usage and GPU node-hours while preserving accuracy. #ICLR2026 #MachineLearning #DeepLearning
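The paper and code carry the exact criterion; as a rough illustration of the idea, one can score each batch by how much its mean activations drift between epochs and prune the most stable (least informative) ones. A hypothetical sketch, not the authors' implementation:

```python
import numpy as np

def select_batches(prev_acts, curr_acts, keep_ratio=0.7):
    """Rank batches by activation stability: batches whose mean hidden
    activations barely move between consecutive epochs are deemed less
    informative and pruned. prev_acts/curr_acts have shape
    [n_batches, feat]. (Hypothetical scoring rule; the paper's
    criterion may differ.)"""
    drift = np.linalg.norm(curr_acts - prev_acts, axis=1)
    drift /= np.linalg.norm(prev_acts, axis=1) + 1e-8
    n_keep = max(1, int(round(keep_ratio * len(drift))))
    # Keep the batches with the largest relative drift (least stable).
    return np.argsort(drift)[::-1][:n_keep]
```

The training loop would then skip the forward/backward pass for pruned batches, which is where the GPU node-hour savings come from.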
-
Nvidia's Nemotron 3 Super model released:
• 120B total parameters (12B active), pretrained in NVFP4
• Mamba2 + GQA + latent MoE + MTP
• 1M context, 25T pretraining tokens
The closest thing you can get to a true "open source" model (weights, code, partial data, and recipe all open). Frontier performance in this weight class with unmatched throughput and inference speed. Awesome! https://lnkd.in/d-Juy6zM
-
Claude is very helpful. I wondered if my homegrown GP toolkit would be able to explore NN architectures. It does. In one day, I used Claude to write a toy grammar whose typed expressions produce Torch modules. The GP engine is able to combine module-producing operators, using the type system to match tensor dimensions (and what a stress test for polymorphic type matching routines!). Results are promising when compared to SOTA. More runs with more seeds will be needed, of course. Claude seems good at analyzing results too, assuming it does not hallucinate. All this needs to be thoroughly checked, and that is a task that sadly cannot be delegated to an LLM :-/.
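A toy version of the typed-grammar idea, with tensor widths as the types and backtracking search standing in for the GP engine (illustrative only, not the author's toolkit; the primitive names are made up):

```python
import random

# Toy typed grammar: each primitive maps an input width to an output
# width; composition is legal only when the widths line up.
PRIMITIVES = [
    ("linear_64_32", 64, 32),
    ("linear_32_16", 32, 16),
    ("relu_32", 32, 32),
    ("linear_64_16", 64, 16),
]

def grow(in_dim, out_dim, depth, rng):
    """Randomly grow a pipeline from in_dim to out_dim, using the type
    system (the widths) to reject ill-formed compositions and
    backtracking when a branch cannot reach the target type."""
    if in_dim == out_dim:
        return []          # typed base case: nothing left to produce
    if depth == 0:
        return None        # out of budget without matching the type
    for name, i, o in rng.sample(PRIMITIVES, len(PRIMITIVES)):
        if i == in_dim:
            rest = grow(o, out_dim, depth - 1, rng)
            if rest is not None:
                return [name] + rest
    return None

pipe = grow(64, 16, depth=4, rng=random.Random(0))
```

In the real toolkit each primitive would construct a Torch module, and the GP engine would mutate and recombine these typed expressions rather than sample them blindly.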
-
At 167 tok/s/user interactivity on DeepSeek 670B MoE at 8k context length, it would cost $0.96 per million output tokens on GB200 NVL72 FP4 versus $2.30 per million output tokens on B200, even with DeepSeek system optimizations like disaggregated PD & wide EP enabled.
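For context on how such $/Mtok figures are derived: cost per million output tokens falls out of system cost per hour divided by aggregate token throughput. A sketch with made-up inputs (the hourly price and user count below are illustrative assumptions, not the quoted measurements):

```python
def cost_per_million_tokens(system_cost_per_hour, users, tok_s_per_user):
    """$ per million output tokens from basic serving economics.
    All inputs are hypothetical, not vendor figures."""
    tokens_per_hour = users * tok_s_per_user * 3600
    return system_cost_per_hour * 1e6 / tokens_per_hour

# e.g. a hypothetical $60/hr system serving 1000 concurrent users
# at 167 tok/s/user each:
c = cost_per_million_tokens(60.0, 1000, 167)
```

The formula makes the trade-off explicit: at a fixed interactivity target (tok/s/user), whichever system sustains more concurrent users per dollar-hour wins on $/Mtok.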
-
This is one of those things that people don’t really understand when they buy compute based on early specs and data sheets designed for the C-suite instead of engineering.
At 167 tok/s/user interactivity on DeepSeek 670B MoE at 8k context length, it would cost $0.96 per million output tokens on GB200 NVL72 FP4 versus $2.30 per million output tokens on B200, even with DeepSeek system optimizations like disaggregated PD & wide EP enabled.
-
FLUX.2 [klein] 9B just got 2x faster at image editing, especially with multiple reference images. Same quality, no price increase. The update introduces KV-caching and FP8 quantized weights built with NVIDIA - faster inference, less VRAM, and the speedup grows with every reference image you add. Already on Klein 9B via API? Free upgrade, faster, same price. On Klein 4B and want better quality? 9B is now closer in speed.
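KV-caching is why the speedup grows with every reference image: the keys and values for reference-image tokens are computed once and reused, instead of being recomputed at each step. A generic single-head sketch of the mechanism, unrelated to FLUX.2's actual implementation:

```python
import numpy as np

def attend(q, K, V):
    # Single-head scaled dot-product attention for one query vector.
    s = q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Keys/values for fixed context (e.g. reference-image tokens) are
    appended once and reused across steps, so only the new tokens'
    projections need computing each time."""
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))
    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])
```

Attention over a cache populated incrementally is identical to attention over the full context computed from scratch; the cache just avoids redoing the projection work.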
-
New Post: Dynamic Layer‑wise Quantization Scaling for FP8 Inference of Generative Pre‑trained Transformers: A Practical Approach Toward 10× Energy Savings - https://lnkd.in/grkhRnFw

### Abstract

We present a novel *Dynamic Layer‑wise Quantization Scaling* (DLQS) framework that enables efficient FP8 inference of large transformer models with negligible loss in accuracy. DLQS adaptively selects per‑layer scaling factors and precision switches for feed‑forward, attention, and output projection sub‑graphs, guided by a lightweight *Range‑Aware Loss* (RAL) signal derived from forward activations. […]
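As a rough illustration of the mechanism DLQS builds on, per-tensor FP8 (E4M3) quantization with a dynamically chosen scale can be simulated in a few lines. This is a sketch under my own assumptions; the paper's RAL-guided per-layer selection is not shown:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_fp8_sim(x, amax=None):
    """Simulated per-tensor FP8 (E4M3) quantization with a dynamic
    scale chosen from the observed activation range - the basic
    building block a layer-wise scheme would tune per sub-graph."""
    amax = np.abs(x).max() if amax is None else amax
    scale = E4M3_MAX / max(amax, 1e-12)
    xs = np.clip(x * scale, -E4M3_MAX, E4M3_MAX)
    m, e = np.frexp(xs)                      # xs = m * 2**e, |m| in [0.5, 1)
    q = np.ldexp(np.round(m * 16) / 16, e)   # keep ~3 mantissa bits
    return q / scale, scale
```

Rounding the mantissa to 3 bits mimics E4M3's resolution; the dynamic scale keeps the tensor's largest value pinned near the format's range, which is what per-layer scaling factors exist to do.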
Siden • 3K followers • 2mo
As part of the little HLS load testing tool, there is also a Nix config to create a deterministic HLS origin server with ffmpeg + nginx. So basically you can do "nix run .#test-origin-4k-abr", which will build and run a microVM with the origin. It's just the good old ffmpeg test pattern: https://github.com/randomizedcoder/go-ffmpeg-hls-swarm/blob/main/nix/test-origin/ffmpeg.nix