We're building an AI inference service that leverages confidential computing to keep prompts encrypted end-to-end. Our core engineering stack includes Go, Kubernetes, gRPC, and vLLM, with some web development in Next.js and Svelte. Most of our code is on GitHub.
We love building and have few meetings for that reason. Key challenges include scaling infrastructure, extending our AI service (e.g., file upload, new models), contributing to vLLM for secure usage (e.g., secure prompt caching: https://www.privatemode.ai/articles/secure-prompt-caching-fo...), and optimizing inference performance.
We're looking for engineers with ~2 years of work experience who have strong expertise in a subset of our stack and, ideally, an interest in AI innovation, especially in serving customers in the government and healthcare sectors.