Section C · Economics & operations

Trust, Verification & Networking

A marketplace with thousands of independent providers needs systems to make supply legible to demand. DLPerf benchmarks, reliability scores, bandwidth measurements, and instance isolation are the load-bearing infrastructure of the platform.

The trust problem

The marketplace model works only if buyers can trust supply. With thousands of independent providers and no central vetting, that trust has to be system-generated, not human-curated.

The specific things buyers need to trust:

  • Hardware matches the listing. A listing claiming "H100" is actually an H100, not a misrepresented A100.
  • Performance is consistent. The instance delivers throughput within an expected range.
  • Uptime is reliable. The instance won't randomly disappear mid-job.
  • The host won't access the workload. The instance is isolated from the host's reach (or, when it isn't, the buyer knows that and acts accordingly).
  • Networking will support the workload. Bandwidth is as advertised.

Vast addresses each through different mechanisms.

DLPerf benchmarking

DLPerf is Vast's proprietary deep-learning benchmark score. It's a normalized metric — a single number that represents how the instance performs on a representative set of ML workloads.

Mechanically:

  • Vast periodically runs a benchmark suite on each listing (typically when the listing first goes live, and on an ongoing rolling basis).
  • The suite includes operations representative of training and inference — matrix multiplications, transformer-like patterns, kernels common in PyTorch.
  • Results are normalized against a reference baseline so different listings are comparable.
  • The number surfaces in the search UI as a key sortable field.
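Vast's actual DLPerf formula is proprietary, so what follows is not it. It is a minimal Python sketch of the general idea in the list above. It assumes a suite of named workloads, each with a reference throughput measured on a baseline machine. Each workload's measured throughput is divided by the baseline's, and the ratios are combined with a geometric mean so no single kernel dominates the score. The workload names, reference numbers, and scale factor are all hypothetical.

```python
import math

# Hypothetical reference throughputs (items/sec) on a baseline machine.
# Names and numbers are illustrative only, not Vast's benchmark suite.
REFERENCE = {"matmul_fp16": 100.0, "transformer_block": 40.0, "conv2d": 60.0}


def normalized_score(measured: dict[str, float],
                     reference: dict[str, float],
                     scale: float = 10.0) -> float:
    """Geometric mean of per-workload speedups over the reference, times `scale`."""
    ratios = [measured[name] / reference[name] for name in reference]
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return scale * math.exp(mean_log)
```

With these made-up numbers, a healthy listing that runs every workload 3x faster than the reference scores 30. A throttled one at 2.2x scores 22, mirroring the comparison in the next paragraph.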

Why this matters: an "H100" listing with DLPerf 30 is meaningfully faster than an "H100" listing with DLPerf 22. The slower one might be thermally throttled, on a constrained PCIe link, in a system with weak CPU/memory, or otherwise compromised. Buyers can see this before renting.

DLPerf isn't perfect. It captures average ML workload performance but doesn't reflect workload-specific bottlenecks. For genuinely memory-bandwidth-bound workloads or workloads with unusual kernel patterns, the right next step is a short test run on the listing before committing to a long job.
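A short test run can be as simple as timing a representative step on the rented instance and comparing the result with what you expected from the listing. The sketch below makes three assumptions. You supply `step`, one iteration of your real workload. You take the expected throughput from your own prior runs. The 80% tolerance is an arbitrary placeholder. For GPU code, kernels launch asynchronously, so `step` should synchronize before returning (for example with `torch.cuda.synchronize()` in PyTorch); otherwise the timer under-counts.

```python
import time
from typing import Callable


def measure_throughput(step: Callable[[], None], items_per_step: int,
                       warmup: int = 3, iters: int = 20) -> float:
    """Return items processed per second by `step`, after a few warm-up calls."""
    for _ in range(warmup):  # let caches, lazy initialization, and clocks settle
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return items_per_step * iters / elapsed


def passes_smoke_test(measured: float, expected: float, tolerance: float = 0.8) -> bool:
    """True if measured throughput reaches at least `tolerance` of the expected value."""
    return measured >= tolerance * expected


# Placeholder workload: on a real listing, `step` would run one batch of your model.
def cpu_placeholder_step() -> None:
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    rate = measure_throughput(cpu_placeholder_step, items_per_step=10_000)
    print(f"{rate:,.0f} items/s")
```

If `passes_smoke_test` fails on a fresh rental, the cheapest option is usually to destroy the instance and try another listing before the long job starts.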

Reliability scoring

The reliability score tracks how often the host machine is online and accepting rentals, and how often rentals end abnormally (crashes, disconnects, etc.).

A host with consistent uptime over months builds a high reliability score. A host that has frequent disconnects, hard reboots, or other interruptions has a lower score.

Buyers filter on this. A 99%+ reliability score means the listing has been up nearly all the time. An 80% score is a real warning. A new listing with no track record gets a default placeholder while data is collected.
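The exact scoring formula isn't described here. This is a hypothetical sketch of how a score could combine the two signals above. It multiplies the fraction of tracked time the host was online by the fraction of rentals that ended cleanly. Listings that haven't been observed long enough get a fixed placeholder. The placeholder value and the one-week minimum are assumptions.

```python
from dataclasses import dataclass

DEFAULT_RELIABILITY = 0.90   # hypothetical placeholder for listings with little history
MIN_OBSERVED_HOURS = 24 * 7  # hypothetical: one week of tracking before scoring


@dataclass
class HostHistory:
    observed_hours: float  # how long the platform has been tracking the host
    online_hours: float    # time the host was online and accepting rentals
    rentals_ended: int     # rentals that have finished, normally or not
    abnormal_ends: int     # rentals that ended in a crash, disconnect, etc.


def reliability_score(h: HostHistory) -> float:
    """Uptime fraction times clean-finish fraction; placeholder if history is too short."""
    if h.observed_hours < MIN_OBSERVED_HOURS:
        return DEFAULT_RELIABILITY
    uptime = h.online_hours / h.observed_hours
    clean = (1.0 - h.abnormal_ends / h.rentals_ended) if h.rentals_ended else 1.0
    return uptime * clean
```

A buyer-side filter is then a threshold such as `reliability_score(h) >= 0.99`. Under this model, a host that is online 90% of the time and loses 1 in 10 rentals lands at 0.81, in the "real warning" range.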

The reliability score is the closest thing to an SLA Vast offers. There's no money-back guarantee on uptime, but the score is the platform's mechanism for routing demand toward reliable supply.

Networking

Network capabilities vary more than any other dimension on Vast. A given listing has:

  • Internet bandwidth (up/down): Anywhere from residential 100 Mbps to datacenter 10 Gbps+.
  • Latency to common endpoints: Useful if you're calling APIs (OpenAI, S3, etc.) from inside the instance.
  • Public IP: Most listings give the renter a public IP for inbound traffic, but some are behind NAT and offer port forwarding instead.
  • Inter-GPU bandwidth (within a multi-GPU instance): NVLink, PCIe topology, etc.
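The internet-bandwidth range above matters most when moving datasets and checkpoints in and out. Here is a back-of-envelope helper. It assumes decimal gigabytes and a default 70% sustained efficiency. That figure is a guess for illustration, not a measured value.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours to move `size_gb` (decimal GB) over a link advertised at `link_mbps`.

    `efficiency` is the assumed fraction of the advertised rate actually sustained.
    """
    bits = size_gb * 8e9                          # 1 GB = 8 * 10**9 bits
    sustained_bps = link_mbps * 1e6 * efficiency  # bits per second actually achieved
    return bits / sustained_bps / 3600


for label, mbps in [("residential 100 Mbps", 100), ("datacenter 10 Gbps", 10_000)]:
    print(f"{label}: {transfer_hours(200, mbps, efficiency=1.0):.2f} h for 200 GB")
```

Even at full efficiency, a 200 GB dataset takes about 4.4 hours over a 100 Mbps residential link and under 3 minutes over a 10 Gbps datacenter link.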

What Vast doesn't have, materially:

  • InfiniBand between machines. Vast instances are not in a high-bandwidth cluster fabric. Cross-instance bandwidth is whatever the public internet provides.
  • VPC peering / private networking. If you need an instance to talk to your AWS VPC privately, you set up a VPN yourself.
  • SLAs on bandwidth. The advertised bandwidth is what the host reports; actual sustained throughput can vary.

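These gaps shape the use cases. Single-node ML works fine on Vast. Multi-node distributed training across many GPUs doesn't, because synchronizing between machines over the public internet is far slower than over a cluster interconnect.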
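To see the scale of the problem, consider data-parallel training, where every step synchronizes the full gradient across machines. With the standard ring all-reduce, each of N workers sends and receives about 2(N-1)/N times the gradient size per step. The estimate below counts bandwidth only and ignores latency, gradient compression, and overlap with compute. The model size and link speeds are example values.

```python
def ring_allreduce_seconds(num_params: float, bytes_per_param: int,
                           workers: int, link_gbps: float) -> float:
    """Bandwidth-only time for one ring all-reduce of a full gradient.

    Each worker sends (and receives) 2 * (N - 1) / N times the gradient size.
    Latency, compression, and overlap with computation are ignored.
    """
    gradient_bytes = num_params * bytes_per_param
    bytes_per_worker = 2 * (workers - 1) / workers * gradient_bytes
    link_bytes_per_s = link_gbps * 1e9 / 8
    return bytes_per_worker / link_bytes_per_s


# Example values: 7B parameters, fp16 gradients (2 bytes each), 4 machines.
internet = ring_allreduce_seconds(7e9, 2, workers=4, link_gbps=1)
infiniband = ring_allreduce_seconds(7e9, 2, workers=4, link_gbps=400)
print(f"1 Gbps internet: {internet:.0f} s/step; 400 Gb/s InfiniBand: {infiniband:.2f} s/step")
```

That works out to roughly 168 seconds of communication per step over a 1 Gbps link, versus about 0.42 seconds over 400 Gb/s InfiniBand.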

Instance isolation

How is your workload protected from the host?

Vast instances run inside containers (Docker) on the host system. The container provides standard Linux container isolation — namespaces, cgroups, etc. The host root user, however, has access to the underlying machine and could in principle observe activity in the container.

What this means practically:

  • A malicious host could log network traffic from your container.
  • A malicious host could access files in your container's filesystem if they tried.
  • A malicious host could potentially exfiltrate workload data (model weights, datasets, etc.).
  • A malicious host could NOT trivially read GPU memory mid-computation, but with effort might extract residual data.

The platform's protection against this is reputational, not architectural. Hosts caught doing this get banned and lose access to the marketplace. But there's no cryptographic guarantee.

The practical implications for buyers:

  • Don't store API keys, AWS credentials, or other secrets in the image or filesystem.
  • Don't run regulated workloads (HIPAA / PCI / SOC-bound data).
  • Treat the instance as a semi-trusted execution environment.
  • For sensitive data, encrypt at rest and only decrypt at use.
  • Use Vast's "secure cloud" tier (more on this below) for higher-trust workloads.

Some hosts opt into a secure/verified tier that involves stronger isolation (TEE-style hardware features when available, attestation, more curated provider populations). The pool of such instances is smaller and pricier, but it closes some of the trust gap.

Security implications

Stepping back, the Vast security model is:

  • Buyer responsible for workload security. Encrypt secrets, sanitize logs, use ephemeral state. The platform provides isolation primitives but doesn't enforce them at a regulatory level.
  • Vast responsible for marketplace integrity. Banning bad actors, maintaining ranking systems, mediating disputes.
  • Provider responsible for their own host security. Patching, monitoring, etc.

This is a layered model with shared responsibility. It works for the customer base Vast serves; it breaks for customers whose compliance regime requires single-throat-to-choke security guarantees.
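The buyer-side "sanitize logs" responsibility is cheap to act on. Below is a minimal sketch using Python's standard `logging` module: a filter that rewrites each record before a handler emits it. The two patterns are illustrative only: an AWS access-key-ID shape and generic `key=value` secrets. They will not catch every credential format.

```python
import logging
import re

# Illustrative patterns only; extend them for the credential formats you actually use.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                             # AWS access key ID shape
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[=:]\s*\S+"),  # key=value style secrets
]


class RedactSecrets(logging.Filter):
    """Rewrite log records so secret-looking substrings never reach the output."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first
        for pattern in SECRET_PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        record.msg = message
        record.args = ()               # args are already merged into msg
        return True                    # keep the (now redacted) record
```

Attach it to handlers (`handler.addFilter(RedactSecrets())`) rather than only to a logger. Logger-level filters are not applied to records propagated up from child loggers. Redaction limits what lands in log files on the host's disk; it does nothing for secrets held in process memory.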

The implicit tier system

Reading across DLPerf + reliability + verified-host status, Vast effectively has an implicit tier system, even though it's not formally branded that way:

  • Tier 1 (best). Verified hosts with datacenter-grade infrastructure, high DLPerf, 99%+ reliability. Priced higher; competitive with mid-tier cloud pricing but with marketplace flexibility.
  • Tier 2 (typical). Small fleet operators with good infrastructure. 95-99% reliability, DLPerf within 20% of peak. Sweet spot for most users.
  • Tier 3 (budget). Hobbyist and residential hosts. Lower reliability, more variation. Cheap; suitable for fault-tolerant workloads.
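Because the tiers are implicit, any cut-off is a judgment call. The sketch below encodes the thresholds from the list above. The `Listing` fields are hypothetical. Treating "high DLPerf" as within 10% of the best listing for that GPU model is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    verified: bool
    reliability: float  # 0.0 to 1.0
    dlperf: float
    peak_dlperf: float  # best DLPerf currently seen for the same GPU model


def implicit_tier(listing: Listing) -> int:
    """Map a listing to the informal tiers from the text (1 = best)."""
    relative_perf = listing.dlperf / listing.peak_dlperf
    # Assumption: "high DLPerf" means within 10% of the best listing for that GPU.
    if listing.verified and listing.reliability >= 0.99 and relative_perf >= 0.90:
        return 1
    # Tier 2: 95%+ reliability and DLPerf within 20% of peak.
    if listing.reliability >= 0.95 and relative_perf >= 0.80:
        return 2
    return 3
```

Under these rules, an unverified host with 99%+ reliability still lands in Tier 2, since verification is part of what separates Tier 1.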

Pricing reflects the tier — within the same GPU model, you can find a 3x spread from cheapest budget listing to most-expensive verified one. The buyer's choice is a tradeoff: pay more for less variance, or pay less and absorb variance.
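The tradeoff can be made concrete with a rough expected-cost model for a checkpointed job. Everything in the model is an assumption:

  • Interruptions arrive at a constant hourly rate.
  • Each interruption costs half a checkpoint interval of lost work plus a fixed restart overhead.
  • Rework is not itself interrupted. This is a first-order approximation that holds only when interruptions are rare.

```python
def expected_job_cost(price_per_hour: float, job_hours: float,
                      interruptions_per_hour: float,
                      checkpoint_interval_h: float,
                      restart_overhead_h: float) -> float:
    """First-order expected cost of a checkpointed job on a listing that can drop out.

    Each interruption is assumed to lose, on average, half a checkpoint interval
    of work plus a fixed restart overhead; rework is assumed to run uninterrupted.
    """
    expected_interruptions = interruptions_per_hour * job_hours
    lost_hours = expected_interruptions * (checkpoint_interval_h / 2 + restart_overhead_h)
    return price_per_hour * (job_hours + lost_hours)
```

Consider 100 hours of work with hourly checkpoints and a 30-minute restart, using hypothetical rates. A $1.00/hr listing that drops 0.05 times per hour comes out around $105. A $2.00/hr listing that drops 0.002 times per hour comes out around $200. The budget listing stays ahead as long as checkpoints are frequent; as the checkpoint interval grows, so does the work lost per interruption, and the gap narrows.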

Takeaway

Vast's trust and verification systems do the work that enterprise account managers and SLAs do at traditional clouds — they make heterogeneous supply legible to buyers and route demand to reliable supply. The systems are imperfect and don't close every gap. Where they fall short (compliance, multi-node networking, regulated workloads), Vast simply doesn't serve those segments.

The next chapter does the unit economics — how Vast's take-rate, provider margins, and customer pricing fit together.