The ATBF inference engine · v1

We make models fast, cheap, and reliable to run.

ATBF Labs is an inference-engine company. We build the runtime, kernels, and serving layer that turn model weights into a production endpoint, then hold its latency and cost down under real load.


Tail latency

p99 < 200ms

Throughput

3.4× baseline

Cost / token

−60%

Uptime

99.95%

01 The engine

An inference stack, built from first principles.

Inference decides whether a model is usable in production. We own the whole path, from the kernel to the cluster, so nothing in the middle is someone else's problem.

The runtime

One engine, every model.

A purpose-built runtime: continuous batching, paged KV-cache, and a scheduler that keeps GPUs full. Bring weights, get a production endpoint.

Performance

Win at the metal.

Custom GPU kernels, low-precision numerics, and speculative decoding. We push latency down and throughput up, to the last microsecond.

Reliability

Holds under real load.

Multi-GPU, multi-node serving that degrades gracefully and recovers on its own. Tail latency stays flat when traffic does not.

02 Performance

The fine print, large.

Every number here is a design target we hold ourselves to and measure on every release. No vibes, just the survey point.

Latency

p99 under 200ms

Tail latency held flat across bursty production traffic, not just the happy path.

Throughput

3.4× tokens / sec

Continuous batching and fused kernels keep every GPU saturated.

Cost

60% lower per token

Quantization and scheduling that get more serving out of the same hardware.

Scale

Multi-node, zero-downtime

Rolling capacity across nodes with no cold starts and no dropped requests.

03 Careers

We're hiring the first ten.

Small team, high bar, real ownership. If you want to build inference from the metal up, these are the seats. Don't see yours? Write to us anyway.

001 Member of Technical Staff, Inference Engine
Own the runtime that turns model weights into a fast, reliable production endpoint.
Engineering · San Francisco, CA · Open

002 Member of Technical Staff, Performance & GPU Kernels
Win latency and throughput at the lowest layer of the stack, one kernel at a time.
Engineering · San Francisco, CA · Open

003 Member of Technical Staff, Research
Research efficient inference, then ship it into the engine that runs production traffic.
Research · San Francisco, CA · Open

004 Chief of Staff
Be the founders' force multiplier across strategy, operations, and execution.
Operations · San Francisco, CA · Open