Member of Technical Staff, Research
You will push the boundary of efficient inference and carry the result all the way into production. We care about research that moves a real metric: latency, throughput, cost, or quality, with one improving while the others hold constant. You will collaborate closely with the engine team to turn a prototype into something serving live traffic.
ATBF Labs builds the inference engine for production AI. Every token a model serves in production runs through an inference stack, and that stack decides the latency, the cost, and the reliability of the product sitting on top of it. We build ours from first principles: custom GPU kernels, a purpose-built runtime, and a distributed serving layer that holds its tail latency under real load. We are a small team with a high bar, shipping to production from day one.
Key responsibilities
- Conduct research on efficient inference: speculative and parallel decoding, quantization, sparsity, and KV-cache compression.
- Design, implement, and evaluate new methods against rigorous, reproducible benchmarks.
- Partner with the engine team to transition prototypes into production-grade systems.
- Analyze empirical results, find the bottleneck, and iterate quickly to improve model quality and speed.
- Track emerging work and bring the high-impact ideas into our roadmap.
Requirements
- Research background in ML, systems, or a quantitative field, with a bias for empirical work.
- Strong coding ability in Python and a systems language (C++/CUDA a plus).
- Experience designing experiments and communicating results to engineers and researchers alike.
- Fluency with at least one ML framework (e.g., PyTorch or JAX).
Nice to have
- PhD in CS, ML, or a related field, or equivalent research experience.
- First-author publications at peer-reviewed venues (NeurIPS, ICML, MLSys, ICLR).
- Hands-on work with LLM/VLM inference internals.
- A history of shipping research into a real system, not just a paper.
Example projects
- Design a draft-model strategy for speculative decoding and prove the win on production traces.
- Develop a KV-cache compression scheme and characterize the quality curve.
- Build an evaluation harness that finds the optimal serving config for a class of models.
Compensation and benefits
Base salary plus meaningful equity. The range is a guideline; final numbers reflect experience, skills, and location. Full health, dental, and vision coverage included.
Solve hard problems
Inference is a systems problem from the kernel up. You will work on the parts that decide whether a model is usable in production: latency, throughput, and cost.
Own the whole stack
Small team, large surface area. You will have real ownership across kernels, runtime, and serving, and your work ships to customers, not a backlog.
Measure everything
We make decisions on numbers, not vibes. Every change is benchmarked, every regression is caught, and the survey point marks exactly where we are.
Learn from the best
Work alongside people who have built and operated inference at scale, and who care more about a clean result than a clever one.
ATBF Labs is an equal-opportunity employer. We celebrate diversity and are committed to an inclusive environment for everyone who builds with us.