Pick a model, type a prompt, hit run. Output streams from the cheapest healthy GPU within your latency tolerance. Cost meter updates in real time. Settle on Solana when done.
◇ programmatic · same thing, from code
The SDK wraps a standard fetch with signature + payment. No SDK lock-in — the underlying HTTP is documented and verifiable.
# install $ pnpm add @nimbut/sdk @solana/web3.js import { nimbut } from "@nimbut/sdk"; import { Keypair } from "@solana/web3.js"; const agent = Keypair.fromSecretKey(process.env.AGENT_KEY); const out = await nimbut.infer({ model: "qwen-2.5-72b", prompt: "summarise this page", agent, rail: "solana-usdc", });
# streaming const stream = nimbut.stream({ model: "deepseek-r1", prompt, agent, }); for await (const tok of stream) { process.stdout.write(tok.text); } // settled on solana in 1 slot · receipt id // → cost: 0.00412 USDC · ttft 142ms · 168 tok
No subscription. No platform fee. Just inference.