SRE Engineer
Posted 2025-10-26
Remote, USA
Full Time
Immediate Start
This SRE will ensure the reliability, performance, and scalability of our MarTech SaaS platform that serves millions of users running thousands of marketing campaigns daily. They'll be responsible for monitoring systems, responding to incidents, and implementing automation to improve platform reliability.

About Ansira

Ansira is a leading marketing technology company dedicated to helping brands connect with customers and grow their businesses. Our platform integrates internal and external teams across channels, markets, and regions to deliver impactful brand-to-local growth strategies. At Ansira, we empower companies by optimizing marketing performance through AI-powered technology, growing partner ecosystems, cultivating brand loyalty, and ensuring profitable client growth. We serve a variety of industries, including financial services, retail, automotive, and technology.

About the Role

Join our growing organization as a Site Reliability Engineer and help ensure the reliability and performance of our SaaS platform that serves millions of users executing thousands of marketing campaigns every day. You'll be joining a lean, high-impact team where your work directly influences the experience of our customers and the success of their marketing efforts. This is a remote-first position where you'll play a crucial role in maintaining and improving the reliability, scalability, and performance of our mission-critical systems.

What You'll Do

- Monitor & Alert: Design, implement, and maintain comprehensive monitoring and alerting systems using tools such as Prometheus, Grafana, and DataDog to ensure early detection of issues and optimal system performance
- Incident Response: Lead incident response efforts, conduct root cause analyses, and implement preventive measures to reduce future occurrences
- Automation: Build and maintain automation tools and processes to reduce manual work, improve deployment reliability, and enhance system resilience
- Reliability Engineering: Identify and implement reliability improvements across our platform, working closely with development teams to embed best practices
- Capacity Planning: Monitor system performance trends and plan for scaling needs to support our growing user base and campaign volume
- Documentation: Create and maintain runbooks, procedures, and system documentation to support the team and improve knowledge sharing

What We're Looking For

Required:

- 3+ years of hands-on experience in site reliability engineering, DevOps, or similar roles with a focus on monitoring and reliability improvements
- Strong knowledge of SRE best practices including SLIs/SLOs, error budgets, and reliability engineering principles
- Cloud Platform experience with services like Compute Engine, Kubernetes, Cloud SQL, and related infrastructure components
- DataDog or similar expertise for monitoring, alerting, and observability
- Backend development experience with Java, PHP, and/or Node.js to understand and troubleshoot application-level issues
- Incident management skills including on-call experience, troubleshooting under pressure, and post-incident review processes
- Automation mindset with experience in scripting and Infrastructure as Code principles

Preferred:

- SaaS platform experience, particularly in high-volume environments serving millions of users
- MarTech or AdTech industry background with understanding of campaign management systems
- Experience scaling systems that handle thousands of concurrent operations
- CI/CD pipeline experience and deployment automation
- Security best practices knowledge for cloud environments

What We Offer

- Remote-first culture with flexible working arrangements
- High-impact role in a small, collaborative team where your contributions directly matter
- Growth opportunities as we scale our platform and expand our engineering team
- Competitive compensation and benefits package
- Learning budget for professional development and certifications
- Modern tech stack with opportunities to work with cutting-edge solutions

Our Environment

You'll be working with systems that process millions of user interactions daily across thousands of active marketing campaigns. Our platform operates at significant scale, requiring robust monitoring, quick incident response, and continuous reliability improvements. As part of a small cross-functional team, you'll have the opportunity to make a substantial impact on both our technical infrastructure and our growing engineering culture.

Ready to Apply?

We're looking for someone who thrives in a fast-paced environment, enjoys solving complex technical challenges, and wants to help build reliable systems that power successful marketing campaigns for our customers. Please submit your resume explaining:

- Your relevant SRE/reliability engineering experience
- Examples of monitoring and automation improvements you've implemented
- Why you're interested in joining a MarTech company

Originally posted on Himalayas