Member of Technical Staff, Research

San Francisco, CA · Full-time · On-site / Hybrid · Open

You will push the boundary of efficient inference and carry the result all the way into production. We care about research that moves a real metric: latency, throughput, cost, or quality held constant while one of the others improves. You will collaborate closely with the engine team to turn a prototype into something serving live traffic.

About ATBF Labs

ATBF Labs builds the inference engine for production AI. Every token a model serves in production runs through an inference stack, and that stack decides the latency, the cost, and the reliability of the product sitting on top of it. We build ours from first principles: custom GPU kernels, a purpose-built runtime, and a distributed serving layer that holds its tail latency under real load. We are a small team with a high bar, shipping to production from day one.

What you'll do

Key responsibilities

Conduct research on efficient inference: speculative and parallel decoding, quantization, sparsity, and KV-cache compression.
Design, implement, and evaluate new methods against rigorous, reproducible benchmarks.
Partner with the engine team to transition prototypes into production-grade systems.
Analyze empirical results, find the bottleneck, and iterate quickly to improve model quality and speed.
Track emerging work and bring the high-impact ideas into our roadmap.

Minimum qualifications

Research background in ML, systems, or a quantitative field, with a bias for empirical work.
Strong coding ability in Python and a systems language (C++/CUDA a plus).
Experience designing experiments and communicating results to engineers and researchers alike.
Fluency with at least one ML framework, such as PyTorch or JAX.

Preferred qualifications

PhD in CS, ML, or a related field, or equivalent research experience.
First-author publications at peer-reviewed venues (NeurIPS, ICML, MLSys, ICLR).
Hands-on work with LLM/VLM inference internals.
A history of shipping research into a real system, not just a paper.

Example projects

Design a draft-model strategy for speculative decoding and prove the win on production traces.
Develop a KV-cache compression scheme and characterize the quality curve.
Build an evaluation harness that finds the optimal serving config for a class of models.

Compensation

$215,000 – $285,000 + equity

Base salary plus meaningful equity. The range is a guideline; final numbers reflect experience, skills, and location. Full health, dental, and vision coverage included.

Why ATBF Labs

Solve hard problems

Inference is a systems problem from the kernel up. You will work on the parts that decide whether a model is usable in production: latency, throughput, and cost.

Own the whole stack

Small team, large surface area. You will have real ownership across kernels, runtime, and serving, and your work ships to customers, not a backlog.

Measure everything

We make decisions on numbers, not vibes. Every change is benchmarked, every regression is caught, and the survey point marks exactly where we are.

Learn from the best

Work alongside people who have built and operated inference at scale, and who care more about a clean result than a clever one.

Apply for this role, or get in touch with a note and your work.

ATBF Labs is an equal-opportunity employer. We celebrate diversity and are committed to an inclusive environment for everyone who builds with us.