M1 - Infra Lead - Observability
Posted 2025-10-26
Remote, USA
Full Time
Immediate Start
We're Hiring: M1 - Infra Lead - Observability!

Are you ready to redefine observability and drive system reliability across a dynamic engineering organization? Join us as our Observability Lead and spearhead the transformation of how we monitor, analyze, and optimize our systems for performance and user impact!

What You’ll Do:
✅ Lead and mentor a team of observability engineers, fostering a culture of ownership and continuous improvement.
✅ Redesign and implement an observability architecture that connects system health metrics, logs, and traces to real business impact.
✅ Define and enforce observability best practices across engineering teams, ensuring proper instrumentation and meaningful telemetry data.
✅ Optimize the integration and usage of observability platforms (e.g., Datadog, Grafana, Prometheus, ELK Stack).
✅ Develop a structured alerting strategy that drives actionable responses and reduces noise.
✅ Partner with SRE, engineering, and product teams to embed observability into the software development lifecycle.
✅ Lead post-incident analysis to drive permanent improvements and prevent recurring issues.
✅ Design and maintain clear, actionable dashboards that give real-time visibility into system health and performance.
✅ Champion a proactive observability mindset, shifting the organization from reactive monitoring toward preventing issues before they occur.
✅ Provide training and documentation to help engineering teams adopt observability practices.
✅ Collaborate with security and compliance teams to align observability practices with regulatory requirements.
✅ Stay ahead of industry trends and emerging technologies to continuously evolve our observability strategy.

What We’re Looking For:
8+ years of experience in observability, SRE, or infrastructure operations.
Proven leadership experience driving accountability and engagement across engineering teams.
Deep understanding of observability principles (monitoring, logging, tracing, metrics).
Expertise with Datadog, Opsgenie, Grafana, OpenTelemetry, Prometheus, and similar tools.
Strong analytical skills to correlate observability data with user experience and business impact.
Experience designing alerting frameworks that prioritize actionable responses over noise.
Ability to drive cultural and process change within engineering organizations.
Strong troubleshooting skills for debugging performance issues and infrastructure failures.
Excellent communication and leadership skills to mentor and influence teams.
Experience in regulated environments, with knowledge of security and compliance requirements.
Advanced English proficiency for technical discussions and collaboration.

Why Join Us?
Be part of a transformative role where your leadership and expertise will shape the future of observability, driving operational excellence and system reliability across the organization. If you're ready to lead the way in observability, apply today!

#Observability #PlatformEngineering #Leadership #Hiring #TechJobs

Spin is committed to a diverse and inclusive workplace. We are an equal opportunity employer and do not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, disability, age, or any other legally protected status. If you would like to request an accommodation, please notify your recruiter.