About the position
We’re looking for a hands-on technical leader to architect, fine-tune, and deploy on-device small language models (SLMs) for consumer security at scale. You’ll lead a focused team of 3–5 senior engineers while remaining deeply involved in the code and technical architecture.
Your core responsibility is building high-performance, privacy-preserving AI models that run directly on user devices (Mac, iOS, Android, Linux). You’ll own model optimization, fine-tuning for tool-use accuracy, evaluation frameworks, and cost-aware deployment strategies. While you won’t own the agent orchestration platform itself, you’ll work closely with it to ensure models behave correctly in multi-turn conversations and make reliable tool-calling decisions.
This role sits at the intersection of edge ML, applied LLMs, and production engineering. Success requires navigating real-world tradeoffs: latency vs. capability, privacy vs. accuracy, on-device vs. cloud execution, and cost vs. performance.
This is not a traditional director role. You’ll spend 60%+ of your time on technical architecture and implementation, with the remainder focused on mentoring senior engineers and setting technical direction.
This is a hybrid remote position based in one of our hub locations: Frisco, TX or San Jose, CA. You will be required to be onsite on an as-needed basis, typically 1–4 days per month. We are only considering candidates within commutable distance of one of these locations and are not offering relocation assistance at this time.
Responsibilities
• Design and deploy small language models optimized for on-device inference (Mac, iOS, Android, Linux)
• Lead model optimization efforts, including quantization, pruning, distillation, and efficient inference pipelines
• Fine-tune models to improve tool selection accuracy and conversational behavior in security-focused workflows
• Build evaluation frameworks to measure model efficacy, tool-calling accuracy, conversation quality, and safety in production
• Create synthetic data and workflow simulations to train and validate models on security-relevant conversations
• Work closely with the agent orchestration platform to optimize multi-turn dialogue behavior and state handling
• Implement cost-optimization strategies such as intelligent on-device vs. cloud routing, prompt caching, batching, and token efficiency
• Integrate cloud-based LLMs when deeper reasoning or broader context is required
• Build production ML systems that detect threats and protect users directly on-device
• Set technical standards and architectural direction for AI/ML across the security platform
• Mentor principal engineers and architects while remaining hands-on
Requirements
• 10+ years of software engineering experience, with 5+ years focused on ML/AI
• Proven experience shipping ML models to production, with skills transferable to edge or mobile deployment
• Experience with conversational AI systems and tool/function-calling architectures
• Strong Python and systems programming skills (C++ or Rust) for performance-critical code
• Deep expertise in model optimization (INT4/INT8 quantization, pruning, distillation)
• Hands-on experience with PyTorch and at least one edge deployment framework (TensorFlow Lite, CoreML, ONNX Runtime, or llama.cpp)
• Experience building evaluation and benchmarking frameworks for ML systems
Nice-to-haves
• Experience applying ML systems in security, safety, or other adversarial domains
• Master’s degree in CS, ML, or a related field (or equivalent practical experience)
Benefits
• Bonus Program
• Pension and Retirement Plans
• Medical, Dental and Vision Coverage
• Paid Time Off
• Paid Parental Leave
• Support for Community Involvement