net: operational · slot 297,841,022 · nodes 412 / 487 · utilization 68% · p50 14.2 ms/tok
llama-3.1-70b 24.1 ms/tok · qwen-2.5-72b 18.7 ms/tok · mixtral-8x22b 31.4 ms/tok · deepseek-r1 52.3 ms/tok · phi-4 8.1 ms/tok · command-r-plus 22.9 ms/tok · gemma-2-27b 14.6 ms/tok
build v0.7.4-rc.2
◇ playground · live · mainnet · auto-routed to nearest healthy node · no signup · 1k free tokens / day

Run inference.

Pick a model, type a prompt, hit run. Output streams from the cheapest healthy GPU within your latency tolerance. Cost meter updates in real time. Settle on Solana when done.
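The routing rule above (cheapest healthy GPU within your latency tolerance) can be sketched as a filter-then-minimise pass over candidate nodes. This is an illustration under assumptions, not the network's actual router: the GpuNode shape, its field names, and the tie-break on lower latency are all hypothetical.

```typescript
// Hypothetical view of a node as a router might see it; field names are illustrative.
interface GpuNode {
  id: string;
  healthy: boolean;
  latencyMs: number;     // measured latency from the caller to this node
  pricePerToken: number; // quoted price in USDC per token
}

// Return the cheapest healthy node whose latency is within maxLatencyMs.
// Equal prices are broken by lower latency (an assumed tie-break).
// Returns undefined when no node qualifies.
function pickNode(nodes: GpuNode[], maxLatencyMs: number): GpuNode | undefined {
  let best: GpuNode | undefined;
  for (const n of nodes) {
    // Skip unhealthy nodes and nodes outside the latency tolerance.
    if (!n.healthy || n.latencyMs > maxLatencyMs) continue;
    if (
      best === undefined ||
      n.pricePerToken < best.pricePerToken ||
      (n.pricePerToken === best.pricePerToken && n.latencyMs < best.latencyMs)
    ) {
      best = n;
    }
  }
  return best;
}
```

Tightening the tolerance trades price for speed: a cheap but distant node drops out of the candidate set, and the router falls back to the cheapest node that is still close enough.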


◇ programmatic · same thing, from code

Six lines from your agent's runtime.

The SDK wraps a standard fetch call with request signing and payment. No SDK lock-in: the underlying HTTP API is documented and verifiable.
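To illustrate what "fetch with a signature" can mean, here is a minimal sketch of the signing half only, using Node's built-in Ed25519 support (the same curve Solana keypairs use). The header names, the base64 encodings, and the choice to sign the raw JSON body are assumptions for illustration; the documented HTTP API may specify a different wire format, and the payment half is not shown.

```typescript
import { Buffer } from "node:buffer";
import { createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical signed-request shape; header names are illustrative only.
interface SignedRequest {
  body: string;
  headers: Record<string, string>;
}

// Client side: sign the exact body bytes that will be sent.
function signRequest(payload: object, privateKey: KeyObject, publicKey: KeyObject): SignedRequest {
  const body = JSON.stringify(payload);
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  const signature = sign(null, Buffer.from(body), privateKey);
  const pub = publicKey.export({ format: "der", type: "spki" }).toString("base64");
  return {
    body,
    headers: {
      "content-type": "application/json",
      "x-agent-pubkey": pub,
      "x-agent-signature": signature.toString("base64"),
    },
  };
}

// Server side: rebuild the public key from the header and check the body signature.
function verifyRequest(req: SignedRequest): boolean {
  const publicKey = createPublicKey({
    key: Buffer.from(req.headers["x-agent-pubkey"], "base64"),
    format: "der",
    type: "spki",
  });
  const sig = Buffer.from(req.headers["x-agent-signature"], "base64");
  return verify(null, Buffer.from(req.body), publicKey, sig);
}

// Usage: a throwaway key pair stands in for the agent's Solana keypair.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const req = signRequest({ model: "qwen-2.5-72b", prompt: "summarise this page" }, privateKey, publicKey);
// The result would then go to fetch, e.g. (hypothetical URL):
// await fetch("https://api.example.invalid/v1/infer", { method: "POST", headers: req.headers, body: req.body });
```

Signing the exact body bytes means any change in transit, even to the prompt text, invalidates the signature.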

# install
$ pnpm add @nimbut/sdk @solana/web3.js

import { nimbut } from "@nimbut/sdk";
import { Keypair } from "@solana/web3.js";

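// AGENT_KEY is assumed to hold the 64-byte secret key as a JSON array (the Solana CLI keypair format)
const agent = Keypair.fromSecretKey(
  Uint8Array.from(JSON.parse(process.env.AGENT_KEY!)),
);
const out = await nimbut.infer({
  model:  "qwen-2.5-72b",
  prompt: "summarise this page",
  agent,
  rail:   "solana-usdc",
});

// streaming
const prompt = "summarise this page";
const stream = nimbut.stream({
  model:  "deepseek-r1",
  prompt,
  agent,
});

for await (const tok of stream) {
  process.stdout.write(tok.text);
}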

// settled on solana in 1 slot · receipt id
// → cost: 0.00412 USDC · ttft 142ms · 168 tok
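The receipt reports time to first token (ttft) and a token count. If you want comparable numbers client-side, a minimal sketch is to drain the stream while timing it. This assumes only that each streamed item has a text field, as in the loop above; it counts stream chunks, which may not match the token count the network bills for. The fakeStream helper is hypothetical and exists only for local testing.

```typescript
import { performance } from "node:perf_hooks";

// Minimal item shape, matching the tok.text field used in the streaming example.
interface Token {
  text: string;
}

interface StreamStats {
  text: string;
  chunks: number;
  ttftMs: number | null; // null if the stream yielded nothing
  totalMs: number;
}

// Drain any async iterable of tokens, recording time to first token and chunk count.
async function collect(stream: AsyncIterable<Token>): Promise<StreamStats> {
  const start = performance.now();
  let ttftMs: number | null = null;
  let chunks = 0;
  let text = "";
  for await (const tok of stream) {
    if (ttftMs === null) ttftMs = performance.now() - start;
    chunks += 1;
    text += tok.text;
  }
  return { text, chunks, ttftMs, totalMs: performance.now() - start };
}

// Hypothetical stand-in stream: yields each piece after a short delay.
async function* fakeStream(pieces: string[], delayMs: number): AsyncGenerator<Token> {
  for (const p of pieces) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield { text: p };
  }
}
```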

Free up to 1k tokens / day.
Then pay per token.
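The free-tier arithmetic can be sketched as follows: tokens within the 1k daily allowance cost nothing, and only the remainder is billed. The price below is a hypothetical placeholder, not the network's rate, and the sketch assumes the caller tracks tokens already used today (when that counter resets is not specified here). Amounts are kept as integer micro-USDC because USDC has 6 decimal places, which avoids floating-point rounding.

```typescript
// Hypothetical price, for illustration only: 24,500 micro-USDC (0.0245 USDC) per 1,000 tokens.
const PRICE_MICRO_USDC_PER_1K = 24_500n;
const FREE_TOKENS_PER_DAY = 1_000n;
const MICRO_PER_USDC = 1_000_000n; // USDC uses 6 decimal places

interface Charge {
  freeTokens: bigint;
  billableTokens: bigint;
  costMicroUsdc: bigint;
}

// Charge one request, given how many tokens the caller had already used today.
function charge(usedToday: bigint, requestTokens: bigint): Charge {
  const freeLeft = usedToday >= FREE_TOKENS_PER_DAY ? 0n : FREE_TOKENS_PER_DAY - usedToday;
  const freeTokens = requestTokens < freeLeft ? requestTokens : freeLeft;
  const billableTokens = requestTokens - freeTokens;
  // Round up to the next whole micro-USDC so fractional amounts are not dropped.
  const costMicroUsdc = (billableTokens * PRICE_MICRO_USDC_PER_1K + 999n) / 1_000n;
  return { freeTokens, billableTokens, costMicroUsdc };
}

// Format micro-USDC as a decimal USDC string, e.g. 4120n -> "0.004120".
function formatUsdc(micro: bigint): string {
  const whole = micro / MICRO_PER_USDC;
  const frac = (micro % MICRO_PER_USDC).toString().padStart(6, "0");
  return `${whole}.${frac}`;
}
```

For example, a 168-token request after 900 tokens already used today gets 100 tokens free and is billed for 68.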

No subscription. No platform fee. Just inference.