About Mentra
Mentra is building an AI therapist that helps people navigate stress, anxiety, and personal growth. We believe that support should feel human - remembering what matters, responding without lag, and reaching people at the right moment. Our objectives are clear: make users come back, and make the AI feel genuinely human.
This is a mission-critical hire. The AI/ML Engineer will own the v1 AI upgrade - prompt quality, conversation flow, contextual memory, and the push notification engine that keeps users engaged. You will sit at the intersection of model quality, product experience, and ethical responsibility in one of the most sensitive domains technology can touch.
The Problem You Will Solve
What Success Looks Like for This Role
- ≥ 80% of audited sessions score 4/5+ on emotional specificity, depth, and closure
- Session abandonment reduced by 25%
- Voice and text latency ≤ 2 seconds (full response, non-streaming)
- ≥ 60% of returning users experience natural contextual memory
- D7 retention improved to 28–30%
Today, roughly 40% of returning users feel the AI does not remember previous sessions. Push notification CTR sits at 12% on generic reminders. Session abandonment before meaningful depth is 35%. You will change all of this.
What You Will Do
1. AI Therapist Quality - Own the v1 Upgrade
- Diagnose why sessions score below 4/5 on emotional specificity, depth, and session closure using NLP techniques: sentiment analysis, topic modelling, lexical diversity, and semantic similarity scoring
- Design and execute a hybrid improvement strategy: prompt engineering for fast iteration cycles and targeted fine-tuning on curated, annotated datasets that emphasize empathy, emotional tone, and safe messaging
- Build automated regression detection pipelines tracking F1 scores, emotional specificity, satisfaction ratings, and latency - with alerting before degradation reaches users
- Implement A/B testing and CI/CD integration to validate improvements before every deployment
- Apply RLHF (reinforcement learning from human feedback) to adapt and refine model outputs safely over time
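To give a flavour of the diagnostic work in the first bullet of this section, here is a minimal sketch of one lexical diversity measure, the type-token ratio. It is an illustration only, not a description of our current tooling: the function names, the tokenization rule, and the 0.6 threshold are all assumptions made for the example.

```python
import re


def type_token_ratio(text: str) -> float:
    """Share of distinct words among all words (0.0 for empty text)."""
    # Simple assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def flag_generic_replies(replies: list[str], min_ratio: float = 0.6) -> list[int]:
    """Return indexes of replies whose ratio is below min_ratio.

    min_ratio is a hypothetical threshold; a real one would be tuned on audited sessions.
    """
    return [i for i, reply in enumerate(replies) if type_token_ratio(reply) < min_ratio]
```

A low ratio is only a rough signal of repetitive phrasing, so in practice it would likely be combined with the other measures listed above rather than used alone.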
2. Real-Time Systems - Hit ≤ 2 Seconds, Every Time
- Architect or optimize the AI response pipeline to deliver complete (non-streaming) voice and text responses within a 2-second total latency budget across STT → LLM inference → TTS → delivery
- Implement event-driven architecture patterns to replace bottlenecks caused by RESTful polling and synchronous processing
- Apply caching, async offloading, and efficient communication protocols (e.g. gRPC) to reduce redundant compute and minimize database round-trips
- Select and integrate optimized TTS engines and LLM providers based on speed, cost, and quality tradeoffs
- Define per-step latency budgets, implement timeouts and fallback mechanisms, and own production monitoring (P50/P99) with alerting on tail latency regressions
- Plan and execute horizontal and vertical scaling strategies to maintain performance under load
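The per-step latency budgets, timeouts, and fallbacks described in this section could be sketched roughly as follows. This is a simplified illustration under stated assumptions: the budget split across steps, the `STEP_BUDGETS_S` name, and the `run_step` helper are hypothetical, and only `asyncio.wait_for` is a real standard-library call.

```python
import asyncio

# Hypothetical split of the 2-second total budget across pipeline steps (seconds).
STEP_BUDGETS_S = {"stt": 0.4, "llm": 1.0, "tts": 0.5, "delivery": 0.1}


async def run_step(step, fn, payload, fallback, budgets=STEP_BUDGETS_S):
    """Run one async pipeline step; return fallback if it exceeds its budget."""
    try:
        return await asyncio.wait_for(fn(payload), timeout=budgets[step])
    except asyncio.TimeoutError:
        # The step overran its budget: degrade gracefully instead of stalling the user.
        return fallback
```

A real pipeline would also record how long each step took so that P50/P99 dashboards and tail-latency alerts have data to work from.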
3. Contextual Memory - Make the AI Remember What Matters
- Design and implement a hybrid memory architecture combining short-term session context with long-term interaction history, using embeddings and structured metadata for efficient, low-latency retrieval
- Define what to store (and critically, what not to store) - creating a schema that is clinically safe, user-trust-building, and compliant with GDPR and HIPAA requirements
- Implement privacy-by-design from day one: data minimization, encryption of sensitive memory fields, and role-based access controls
- Build user-facing memory controls - the ability to view, correct, and delete stored memories - as a product feature, not an afterthought
- Measure memory quality continuously: reference accuracy, hallucination rate, and user-perceived naturalness
4. Push Notifications & Re-engagement - Bring Users Back
- Design and build a notification trigger system - event-driven, personalized by user engagement history, session themes, emotional state at close, and inactivity signals
- Segment users by engagement profile and tailor notification frequency, timing, and content to maximize CTR without causing notification fatigue
- Evolve rule-based triggers toward ML-powered send-time optimization and content variant selection as data matures
- Measure re-engagement lift and iterate on notification logic using engagement metrics, unsubscribe rates, and session re-entry quality
5. Responsible AI - Hold the Ethical Bar
- Implement confidence scoring and thresholding to flag uncertain or emotionally risky model outputs before they reach users
- Design and maintain human-in-the-loop fallback pathways for crisis intervention and high-sensitivity cases - AI must support, never replace, human care
- Conduct continuous bias analysis across training data and production outputs; apply mitigation through data resampling, augmentation, and fairness-aware algorithms
- Maintain GDPR and HIPAA compliance through data classification, strict access controls, and regular security audits
- Proactively flag risks to the product and clinical team before deploying changes that affect real users
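As a rough picture of the confidence thresholding and human-in-the-loop fallback named in this section, consider the sketch below. Everything in it is assumed for illustration: the `Draft` fields, the routing labels, and the 0.7 threshold are hypothetical, and how confidence and risk are actually scored is part of the work this role would own.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    text: str
    confidence: float  # assumed score in [0, 1] from a model or classifier
    risk_flag: bool    # assumed output of a separate crisis or risk detector


def route(draft: Draft, min_confidence: float = 0.7) -> str:
    """Decide what happens to a drafted reply before it reaches the user."""
    if draft.risk_flag:
        # High-sensitivity cases go to a human regardless of confidence.
        return "escalate_to_human"
    if draft.confidence < min_confidence:
        return "hold_for_review"
    return "send"
```

Checking the risk flag before the confidence score reflects the principle that AI must support, never replace, human care.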
What We Need From You
Must-Haves
- 4+ years of hands-on Python development with production AI/ML systems shipped and maintained
- Practical experience with pretrained models and LLM fine-tuning - Hugging Face ecosystem, prompt engineering, RLHF, or equivalent
- Demonstrated ability to design and optimize real-time AI pipelines with hard latency constraints (sub-3s or better)
- Experience with cloud infrastructure and scalable deployment - AWS (EC2, Lambda, S3, Transcribe) or equivalent
- Working knowledge of event-driven architecture, async processing, and efficient API design (REST, gRPC, WebSockets)
- Clear understanding of responsible AI principles: bias detection, confidence thresholding, human fallback design, and data privacy
- Ability to translate technical trade-offs into business impact for non-technical stakeholders - you communicate early, clearly, and with visual aids when needed
- Modular, maintainable code design - you build things that can be extended without heavy refactoring
Strong Advantages
- Experience with speech-to-text and text-to-speech pipelines (Whisper, AWS Transcribe, ElevenLabs, or similar)
- Knowledge of NLP evaluation techniques: n-gram analysis, lexical diversity, semantic similarity, sentiment scoring
- Experience with push notification systems (FCM/APNs) and personalization logic
- Familiarity with GDPR and/or HIPAA requirements in a product context - not just compliance theory
- Experience working in small, fast-moving product teams where you owned decisions end-to-end
- Background in or personal connection to mental health, psychology, or behaviour change - you understand why this mission matters
How You Work
- You protect core user value (real-time availability, accuracy, safety) and make principled trade-offs on everything else
- You think in systems - architecture diagrams, data flows, and failure modes - before writing code
- You involve stakeholders early, set honest expectations, and follow through
- You have a point of view on product decisions, not just engineering ones
- You are genuinely bothered when AI behaves poorly in sensitive contexts - and you do something about it
If you've read this and thought about how you'd solve these problems - we want to talk.
Pay: ₹95,000.00 per month
Experience:
- building and fine-tuning conversational AI systems: 3 years (Required)
- sentiment analysis, emotion AI, or behavioral AI systems: 3 years (Required)
- building human-like AI interactions: 3 years (Required)
Work Location: Remote