About Kedara
Millions of families struggle to coordinate elder care. We're building an AI-powered Care Coordination System with a personalized, voice-AI Assistant that becomes a trusted partner for caregivers, helping them manage the coordination burden and reduce burnout.
Our founders scaled Speech AI and NLP at MindMeld and BabbleLabs (both acquired by Cisco). We're a small team of product, AI/ML, and senior care experts. You'll be our founding systems engineering lead.
The Role
Design and build the core backend infrastructure for the application and for real-time AI inference, powering our app and its voice-AI conversational assistant. You will own the overall system architecture behind the frontend and the AI orchestration layer, optimizing for latency, cost, security, reliability, and human-like conversational fluidity.
Core Infrastructure
- Partner with frontend, product, and AI/ML leads to translate application requirements into end-to-end system architecture
- Own end-to-end delivery from design through production for reliability and scalability
- Design scalable APIs serving user onboarding, the data layer, identity, and role-based access control
- Build security and compliance: HIPAA-compliant data handling, encryption, access controls, PHI protection
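To make the access-control responsibility above concrete, here is a minimal sketch of a role-based permission check for PHI. The roles, permission names, and mapping are illustrative assumptions, not Kedara's actual model; a real HIPAA deployment would also need audit logging, encryption at rest, and minimum-necessary policies.

```python
from dataclasses import dataclass

# Hypothetical role -> permission mapping (illustrative only).
ROLE_PERMISSIONS = {
    "caregiver": {"read_phi", "write_notes"},
    "family_viewer": {"read_summary"},
    "admin": {"read_phi", "write_notes", "manage_users"},
}

@dataclass
class User:
    user_id: str
    role: str

def can_access(user: User, permission: str) -> bool:
    """Return True only if the user's role grants the requested permission."""
    return permission in ROLE_PERMISSIONS.get(user.role, set())
```

The default of an empty permission set means unknown roles are denied by default, which is the safe failure mode for PHI.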
AI Inference Infrastructure
- Architect real-time media transport layers (e.g., WebRTC or optimized WebSockets) to stream raw audio, video, and image data between the mobile client and the inference engine, enabling ultra-low-latency in-app voice and video interactions
- Integrate and orchestrate models (ASR, TTS, LLMs/SLMs, OCR, etc.) in a multi-modal pipeline (audio/voice, video, images), with an orchestration layer for “stateful agents” that coordinate complex care tasks
- Route inference between on-device models (Speech-to-Text, quick responses) and cloud LLMs (reasoning, RAG) based on latency and cost trade-offs
- Establish observability for inference quality: latency, token usage, cost per request, hallucination detection
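For the media-transport bullet above, low-latency streaming typically frames raw audio chunks with a small binary header before sending them over the wire. A stdlib-only sketch of such framing follows; the field layout is an assumption for illustration, not a Kedara protocol, and a production system would use WebRTC or a negotiated WebSocket subprotocol.

```python
import struct

# Hypothetical frame header: sequence number (uint32), capture timestamp in
# milliseconds (uint64), payload length (uint32), all big-endian.
HEADER = struct.Struct(">IQI")

def pack_frame(seq: int, ts_ms: int, payload: bytes) -> bytes:
    """Prefix an audio chunk with its header for transmission."""
    return HEADER.pack(seq, ts_ms, len(payload)) + payload

def unpack_frame(frame: bytes) -> tuple[int, int, bytes]:
    """Recover (sequence, timestamp, payload) from a received frame."""
    seq, ts_ms, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    return seq, ts_ms, payload
```

Sequence numbers and timestamps let the receiver detect loss and reorder or drop stale frames, which matters more than retransmission when the goal is conversational latency.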
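The on-device vs. cloud routing trade-off described in the bullets above could be sketched roughly as follows. The thresholds, backend names, and request fields are illustrative assumptions, not measured numbers or real model identifiers.

```python
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    text: str
    needs_reasoning: bool   # e.g., RAG or multi-step planning
    latency_budget_ms: int  # how long the caller can wait

# Illustrative latency assumptions, not benchmarks.
ON_DEVICE_LATENCY_MS = 50
CLOUD_LATENCY_MS = 800

def route(req: InferenceRequest) -> str:
    """Pick a backend: on-device for quick responses, cloud for heavy reasoning."""
    if req.needs_reasoning:
        return "cloud-llm"
    if req.latency_budget_ms < CLOUD_LATENCY_MS:
        return "on-device-slm"
    return "cloud-llm"  # budget is generous; prefer the higher-quality model
```

A real router would also fold in per-token cost, device battery, and network conditions, but the shape of the decision stays the same.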
Requirements
- 8+ years building distributed backend systems at scale, especially consumer apps, ideally taken from 0 to 1 in high-ambiguity, high-velocity environments
- Strong proficiency in a systems language (Go, Rust, or C++) and in Python, for rapid AI experimentation and for leading high-performance implementations that deliver conversational fluidity
- Hands-on experience with Kubernetes, Docker, CI/CD pipelines, and GCP or AWS stacks
- Experience with relational (PostgreSQL, MySQL) and NoSQL databases, and with event-driven architectures (e.g., Kafka, Redis Pub/Sub, or AWS Kinesis)
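The event-driven pattern named in the last requirement can be shown with a minimal in-process publish/subscribe sketch; Kafka, Redis Pub/Sub, or Kinesis play this role durably and at scale. Topic names here are hypothetical examples.

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal in-process pub/sub: handlers register per topic, publishers fan out."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Deliver to every subscriber of this topic; unknown topics are a no-op.
        for handler in self._subscribers[topic]:
            handler(event)
```

Decoupling producers from consumers this way is what lets, say, a care-task service emit an event without knowing which notification or analytics services consume it.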