The best Hacker News stories from All from the past day

Latest posts:

Princeton mandates proctoring for in-person exams, upending 133 year precedent

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.<p>We were always frustrated by how little effort goes into building agentic models that run on budget phones, so we investigated and arrived at an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.<p>Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).<p>Training:<p>- Pretrained on 200B tokens across 16 TPU v6e (27 hours)<p>- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)<p>- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)<p>You can test it right now and finetune on your Mac/PC: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a><p>The full writeup on the architecture is here: <a href="https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md" rel="nofollow">https://github.com/cactus-compute/needle/blob/main/docs/simp...</a><p>We found that the "no FFN" result generalizes beyond function calling to any task where the model has access to external structured knowledge (tool use, retrieval-augmented generation). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input.
Experimental results to be published.<p>While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.<p>This is part of our broader work on Cactus (<a href="https://github.com/cactus-compute/cactus" rel="nofollow">https://github.com/cactus-compute/cactus</a>), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: <a href="https://news.ycombinator.com/item?id=44524544">https://news.ycombinator.com/item?id=44524544</a><p>Everything is MIT licensed. Weights: <a href="https://huggingface.co/Cactus-Compute/needle" rel="nofollow">https://huggingface.co/Cactus-Compute/needle</a> GitHub: <a href="https://github.com/cactus-compute/needle" rel="nofollow">https://github.com/cactus-compute/needle</a>
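
For a rough sense of what the quoted throughput means per call, here is a back-of-the-envelope sketch. It assumes latency is simply prefill time plus decode time at the 6000 tok/s and 1200 tok/s figures from the post. The function name and the 600-token prompt / 60-token output sizes are hypothetical, chosen only for illustration; real latency will also include model load, tokenization, and device overhead.

```python
# Throughput figures quoted in the post (tokens per second).
PREFILL_TOK_PER_S = 6000
DECODE_TOK_PER_S = 1200


def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency estimate: prefill time plus decode time.

    Assumption: ignores model load, tokenization, and per-call overhead.
    """
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return prefill_s + decode_s


# Hypothetical request: 600 tokens of tool schemas plus the user query,
# 60 tokens of JSON emitted as the tool call.
latency = estimate_latency_s(prompt_tokens=600, output_tokens=60)
print(f"{latency * 1000:.0f} ms")  # prints "150 ms"
```

Under these assumptions the decode side matters more per token, so keeping the emitted JSON short helps more than trimming the prompt by the same number of tokens.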

Starship V3

Leaving GitHub for Forgejo

Restore full BambuNetwork support for Bambu Lab printers

Linux gaming is faster because Windows APIs are becoming Linux kernel features

I moved my digital stack to Europe

Why senior developers fail to communicate their expertise

EU to crack down on TikTok, Instagram's 'addictive design' targeting kids

Screenshots of Old Desktop OSes

Bambu Lab is abusing the open source social contract

If AI writes your code, why use Python?

Googlebook

<a href="https://www.reddit.com/r/Android/comments/1tb8xls/introducing_googlebook_a_new_category_of_laptops/" rel="nofollow">https://www.reddit.com/r/Android/comments/1tb8xls/introducin...</a>

Ratty – A terminal emulator with inline 3D graphics

Mythos Finds a Curl Vulnerability

GitLab announces workforce reduction and end of their CREDIT values
