net operational · slot 297,841,022 · nodes 412 / 487 · utilization 68% · p50 14.2 ms/tok

◇ llama-3.1-70b · 24.1 ms/tok
◇ qwen-2.5-72b · 18.7 ms/tok
◇ mixtral-8x22b · 31.4 ms/tok
◇ deepseek-r1 · 52.3 ms/tok
◇ phi-4 · 8.1 ms/tok
◇ command-r-plus · 22.9 ms/tok
◇ gemma-2-27b · 14.6 ms/tok

build v0.7.4-rc.2

◇ release · v0.7.4-rc.2 · mainnet-beta · operational · 412 / 487 nodes · brooklyn · zurich · singapore

Open inference,
routed in milliseconds.

Nimbut is a decentralized inference network for autonomous AI agents on Solana. Open-source frontier models served from 412 nodes across 51 regions. Pay-per-token in USDC. Attested execution. No rate limits, no waitlists, no platform lock-in.

Run inference → · Operate a node

◇ current route — sample optimal

> POST /v1/infer
  model:  qwen-2.5-72b
  region: auto
  pay:    usdc-spl

  ┌─ routed ────────────────────┐
  │  node      nyc-3.gpu-h100   │
  │  rtt       8 ms             │
  │  ttft      142 ms           │
  │  rate      18.7 ms / token  │
  │  cost      $0.00038 / tok   │
  └─────────────────────────────┘
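
As a rough worked example using the figures in the sample route above: the time to receive a full completion is approximately the time to first token plus the per-token rate for each remaining token, and cost scales linearly with token count. The function names and the 500-token completion length are illustrative assumptions, not part of the Nimbut API, and the estimate assumes only output tokens are billed.

```python
# Back-of-envelope estimate from the sample route figures.
# Assumptions: latency ~= ttft + rate * (tokens - 1); cost is linear in tokens.

def estimate_latency_ms(tokens: int, ttft_ms: float, rate_ms_per_token: float) -> float:
    """Approximate time until the last token arrives, in milliseconds."""
    if tokens <= 0:
        return 0.0
    return ttft_ms + rate_ms_per_token * (tokens - 1)


def estimate_cost_usd(tokens: int, price_per_token: float) -> float:
    """Approximate cost of a completion at a flat per-token price."""
    return tokens * price_per_token


# Figures from the sample route: ttft 142 ms, 18.7 ms/token, $0.00038/token.
latency = estimate_latency_ms(500, ttft_ms=142, rate_ms_per_token=18.7)
cost = estimate_cost_usd(500, price_per_token=0.00038)
print(f"{latency / 1000:.2f} s, ${cost:.2f}")
```

At these rates a 500-token completion would take a little under ten seconds and cost about nineteen cents.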

◇ 01 · the network

A planet-scale GPU fabric, addressable from one endpoint.

Every request is routed to the cheapest healthy node within your tolerance window. No region pinning, no manual load-balancing. Failover is automatic and counter-signed.
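
One way to picture "cheapest healthy node within your tolerance window" is the minimal sketch below. It assumes the tolerance window is a maximum round-trip time, which the page does not specify; the `Node` fields, the `pick_node` name, and the two extra nodes are hypothetical and only illustrate the selection rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    healthy: bool
    rtt_ms: float
    price_per_token: float  # USD per token


def pick_node(nodes: Sequence[Node], max_rtt_ms: float) -> Optional[Node]:
    """Return the cheapest healthy node whose RTT fits the tolerance window."""
    candidates = [n for n in nodes if n.healthy and n.rtt_ms <= max_rtt_ms]
    if not candidates:
        return None  # nothing qualifies; the caller may widen the window
    # Cheapest first; break price ties by lower RTT.
    return min(candidates, key=lambda n: (n.price_per_token, n.rtt_ms))


nodes = [
    Node("nyc-3.gpu-h100", healthy=True, rtt_ms=8, price_per_token=0.00038),
    Node("hypothetical-a", healthy=True, rtt_ms=95, price_per_token=0.00031),
    Node("hypothetical-b", healthy=False, rtt_ms=5, price_per_token=0.00020),
]
chosen = pick_node(nodes, max_rtt_ms=50)
print(chosen.name if chosen else "no node available")
```

Widening the window to 100 ms would select the cheaper but slower node, which is the trade-off the tolerance setting controls in this sketch.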

FIG. A · COMPUTE TOPOLOGY · LIVE · 412 nodes / 14 regions / ◇ pulsing = high load

◇ 02 · the loop

Four hops from prompt to paid output.

No API key juggling. No model API to learn. No payment integration. One POST. Settled on Solana in one slot.

i.

Sign the request

Your agent signs an inference request with its Solana keypair. The signature pre-authorizes spend up to a cap.

ii.

Route to the best node

The protocol matches your model + region tolerance to the cheapest available GPU at this instant. ~10ms.

iii.

Stream the output

Tokens stream back as they generate. The operator counts; the contract debits. Pay-per-token, settled in one slot.

iv.

Attest & settle

The output is signed by the operator and committed to the ledger. Disputes resolve from one query.

Repeat

The next request reuses your route. Hot models stay warm. Cold ones spin up in ~800ms.
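
The sign and stream steps above can be sketched together: the request carries a spend cap, the cap is covered by the signature, and each streamed token is debited until the cap would be exceeded. This is a minimal illustration, not the Nimbut SDK. The field names (such as `cap_micro_usdc`) and all function names are hypothetical, amounts are in millionths of a USDC (USDC uses six decimals), and the demo signer is an HMAC stand-in; a real agent would sign with its Ed25519 Solana keypair.

```python
import hashlib
import hmac
import json
from typing import Callable, Dict, Iterable, Iterator, Tuple


def build_request(model: str, prompt: str, cap_micro_usdc: int) -> bytes:
    """Serialize deterministically so signer and verifier see identical bytes."""
    body = {
        "model": model,
        "region": "auto",
        "pay": "usdc-spl",
        "prompt": prompt,
        "cap_micro_usdc": cap_micro_usdc,  # hypothetical field: spend ceiling
    }
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_request(message: bytes, sign: Callable[[bytes], bytes]) -> Dict[str, str]:
    """Wrap the message with a signature produced by the supplied signer."""
    return {"message": message.decode("utf-8"), "signature": sign(message).hex()}


def metered_stream(
    tokens: Iterable[str], price_micro_usdc: int, cap_micro_usdc: int
) -> Iterator[Tuple[str, int]]:
    """Yield (token, running_spend); stop before the cap would be exceeded."""
    spent = 0
    for token in tokens:
        if spent + price_micro_usdc > cap_micro_usdc:
            break
        spent += price_micro_usdc
        yield token, spent


def demo_sign(message: bytes) -> bytes:
    # Stand-in only: HMAC-SHA256 with a throwaway key, NOT an Ed25519 signature.
    return hmac.new(b"throwaway-demo-key", message, hashlib.sha256).digest()


envelope = sign_request(build_request("qwen-2.5-72b", "hello", 1_000), demo_sign)
cap = json.loads(envelope["message"])["cap_micro_usdc"]
# $0.00038 per token is 380 micro-USDC per token.
billed = list(metered_stream(["Hi", " there", "!"], 380, cap))
print(billed)
```

With a cap of 1,000 micro-USDC and a price of 380 per token, the stream stops after two tokens because a third would push spend past the cap.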

◇ for agents

Spin up tooling without spinning up an API account.

Your agent doesn't need an OpenAI key, an Anthropic invoice, a rate-limit dashboard. It signs, it pays, it consumes. Per token. In USDC. On Solana.

┌Open-source frontier models · 47 in catalog
├Pay-per-token in USDC · settled per request
├Attested by TEE or N-of-M operator consensus
└No rate limits, no waitlists, no platform fee
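
To make "N-of-M operator consensus" concrete, here is a minimal sketch: M operators each report a digest of the output they produced, and the result is accepted only if at least N digests agree. The function names are hypothetical, and the sketch assumes decoding is deterministic so that honest operators produce byte-identical output; the page does not describe how Nimbut actually compares outputs.

```python
import hashlib
from collections import Counter
from typing import Optional, Sequence


def output_digest(text: str) -> str:
    """SHA-256 digest of an output, used as the value operators vote on."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def consensus_digest(digests: Sequence[str], n_required: int) -> Optional[str]:
    """Return the digest reported by at least n_required operators, else None."""
    if not digests:
        return None
    digest, votes = Counter(digests).most_common(1)[0]
    return digest if votes >= n_required else None


# Three operators (M = 3); two agree, one diverges.
reports = [output_digest("42"), output_digest("42"), output_digest("41")]
accepted = consensus_digest(reports, n_required=2)
print("accepted" if accepted else "no consensus")
```

Choosing N greater than M / 2 avoids the case where two different digests both reach the threshold.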

◇ for operators

Monetize your idle GPUs at agent-scale.

You have H100s. We have a queue of autonomous agents that want frontier inference, right now, in your region. Stake $NIMBUT, register your fleet, get paid per token served.

┌Reference fleet: 4× H100 SXM5 ≈ $84k/mo revenue
├Paid in USDC, settled hourly
├Choose your models; warm hot ones for premium routing
└Slash exposure proportional to misattested output
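
One possible reading of "slash exposure proportional to misattested output" is sketched below: the slashed amount is the operator's stake scaled by the fraction of served tokens that were misattested. The formula, the function name, and the example numbers are assumptions for illustration; the page does not give the actual slashing rule.

```python
def slash_amount(stake: int, misattested_tokens: int, served_tokens: int) -> int:
    """Stake to slash, proportional to the misattested share of served tokens.

    Integer arithmetic avoids rounding drift; the result never exceeds the stake.
    """
    if served_tokens <= 0 or misattested_tokens <= 0:
        return 0
    bad = min(misattested_tokens, served_tokens)
    return stake * bad // served_tokens


# Hypothetical operator: 10,000 staked units, 50 of 1,000 tokens misattested.
print(slash_amount(10_000, misattested_tokens=50, served_tokens=1_000))
```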

◇ 03 · right now

The network, as we serve.

A small slice of what's happening on mainnet-beta right now. All events are signed; all settle in one Solana slot.

◇ network log · live tail · ● streaming

◇ tokens / second · 24h · 32.8k

◇ median ttft · 7d · 142 ms

◇ revenue · operators · 7d · $2.1M USDC

Open inference,
without the open invoice.

Mainnet-beta is live. Devnet is free. The SDK is six lines. You can be running inference in five minutes.

Try the playground → · Run a node
