AI&

AI&

Member of Technical Staff - Inference Serving

Japan (Hybrid)RemotePosted 27 days ago¥7,000,000 – ¥7,000,000
Full TimeSeniorRemoteJP

See how this job matches your profile

Sign in for an AI-powered fit score, breakdown, and a tailored resume.

Sign in

Job Description

As an inference & serving engineer, your objective is to build a high-performance, multi-tenant serving stack that squeezes maximum utilization out of heterogeneous hardware. This involves navigating

Key Highlights

  • Runtime Selection & Deep Optimization: Lead the evaluation, integration, and continuous tuning of diverse inference frameworks to ensure best-in-class performance across LLM, Video, and Multimodal workloads.
  • Latency & Throughput Engineering: Own the end-to-end performance profile of the model lifecycle, implementing advanced strategies such as disaggregated prefill/decode, speculative decoding, and continuous batching to minimize TTFT and maximize tokens-per-second.
  • Scalable Systems Evolution: Design and implement serving architectures that function seamlessly on small experimental clusters while providing a clear, robust path to massive-scale, multi-node deployments.
  • Advanced Memory & Cache Orchestration: Implement and optimize memory management techniques to maximize KV-cache reuse and minimize redundant computations in multi-turn or high-concurrency scenarios.
  • Day 0 Model Support: Working with the ecosystem, craft a Day 0 model support strategy ensuring our stack provides stable, high-performance support for new models when they are released.

Qualifications

Required Qualifications

  • Inference Engine: Deep experience with the internals of modern runtimes. You are a prominent contributor to inference engine ecosystems, including but not limited to OSS projects or proprietary engines at top-tier AI labs.
  • Multimodal Domain Knowledge: Understanding of the specific challenges involved in serving Large Language Models alongside Video and Vision-based generative models.
  • Scale-First Engineering: A track record of building and managing distributed systems that have evolved from small-scale proofs-of-concept to large-scale production deployments.
  • Great Team Spirit: A mission-driven approach to engineering, valuing clear communication, hands-on execution, and collective success over individual silos.

Interested in this role?

Sign in or create a free account to see how this job matches your skills, apply with one click, and let our AI tailor your resume.

Sign in to apply
AI-powered resume optimization
Save and track your applications

Job Details

Employment Type

Full Time

Experience Level

Senior

Salary Range

¥7,000,000 – ¥7,000,000

Location

Japan (Hybrid)

Work Mode

Remote

Posted

27 days ago

Country

JP