
Software engineer jr jobs in CDMX (Ciudad de México DF), full-time, on-site - OCC


Staff Engineer - Application SRE

nir yu, CDMX. Posted 3 days ago.

Salary not disclosed by the company. If the recruiter contacts you, you will be able to learn the salary.

This is an external vacancy; you must complete the application process on the company's website.

About the job

Category: Engineering
Subcategory: Electromechanical engineering
Minimum education required:

Details

Schedule: Full-time
Workplace: On-site

Description

The Role:

We are looking for an experienced Staff Software Engineer - Application SRE to join our Site Reliability Engineering (SRE) team. In this role, you will be responsible for ensuring the reliability, availability, and performance of our mission-critical applications. You will leverage your software engineering skills to build tools, automate processes, and collaborate with cross-functional teams to deliver high-performance, scalable systems. As a senior member of the team, you will take ownership of application reliability and guide other engineers in following best practices for maintaining high-quality services. You will also lead and mentor teams, both directly and indirectly.

Responsibilities:
  • Understand and document non-functional performance and scalability requirements, including SLIs/SLOs, and validate them with business stakeholders.
  • Manage the SLIs/SLOs of customer-facing interfaces and backend services, and provide improvement plans when they are not met.
  • Develop custom dashboards in observability platforms (e.g., New Relic, Dynatrace, Grafana) that present a holistic view of system operational health.
  • Improve the reliability, quality, and time-to-market of our suite of software solutions.
  • Support release engineering by providing automation and by pushing changes to production when manual intervention is needed.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
  • Provide primary operational support and engineering for multiple large-scale distributed software applications.
  • Take ownership of reliability initiatives and deliver them end to end, leading teams directly and indirectly.


Key Responsibilities:
  • Application Reliability Engineering:
    • Lead efforts to design and implement systems that ensure the high availability, scalability, and reliability of critical applications.
  • Incident Management:
    • Drive incident response, root cause analysis, and remediation for application-related issues, ensuring rapid resolution and preventing recurrence.
  • Problem Management:
    • Conduct 5-whys analyses of issues in application design, code, and configuration to identify root causes and the most effective fixes to prevent them from recurring.
  • Automation and Tooling:
    • Develop and maintain automation tools to improve application deployment, monitoring, and scaling, minimizing manual work and reducing time-to-recovery during incidents.
  • Performance Tuning:
    • Analyze and resolve application performance bottlenecks, collaborating with developers to optimize code and infrastructure to improve response times and throughput.
  • Monitoring and Observability:
    • Architect and implement robust monitoring, logging, and alerting systems to gain deep visibility into application performance and health. Use tools such as Prometheus, Grafana, Datadog, or New Relic.
  • Service-Level Objectives (SLOs):
    • Establish and maintain service-level objectives (SLOs) and indicators (SLIs) that ensure operational excellence, working with stakeholders to balance reliability and innovation.
  • Collaboration with Engineering Teams:
    • Work closely with software development teams to embed SRE best practices into the application lifecycle, ensuring reliability is built into all stages of development.
  • Capacity Planning and Scalability:
    • Monitor application traffic and infrastructure capacity, proactively scaling systems to handle growth and ensure smooth application operation during peak loads.
  • Mentorship and Leadership:
    • Mentor junior SREs and software engineers on best practices for reliability engineering and foster a culture of continuous improvement.
  • Continuous Improvement:
    • Lead post-incident reviews and retrospectives, driving improvements to system architecture, operational practices, and incident response processes.


Required Skills and Experience:
  • 10+ years of experience in software engineering, site reliability engineering, or a related role.
  • Proficiency in at least one programming language (e.g., Java, Go, Node) and strong scripting skills (e.g., Bash, Python).
  • Hands-on experience with frameworks such as Spring Boot and React.
  • Hands-on experience with monitoring, observability, and logging tools (e.g., Prometheus, Grafana, Datadog, New Relic) to track system performance and health.
  • Strong experience with cloud platforms (AWS, Google Cloud, Azure) and cloud-native architectures, including containers (e.g., Docker) and orchestration tools (e.g., Kubernetes).
  • Expertise in building and managing CI/CD pipelines and infrastructure as code (IaC) tools such as Terraform, Ansible, or CloudFormation.
  • Experience troubleshooting complex distributed systems in production, performing root cause analysis, and developing remediation strategies.
  • Familiarity with microservice architectures and distributed systems at scale.
  • Strong understanding of containers, networking, databases, and performance optimization techniques.
  • Excellent communication and collaboration skills, with the ability to work effectively across teams and mentor others.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.


Preferred Qualifications:
  • Experience with multi-cloud or hybrid-cloud environments.
  • Knowledge of security best practices for applications and cloud infrastructure.
  • Experience leading blameless postmortems and implementing long-term fixes.
  • Familiarity with database management systems (SQL, NoSQL) and caching technologies.


Category: Technology
Locations: Mexico City
Remote status: Hybrid
Employment type: Full-time
Remember that no recruiter may ask you for money in exchange for an interview or a position. Likewise, avoid making payments or sharing financial information with companies.

ID: 20519582

You can also search

Java Developer

Software Systems Engineer

Embedded Software Engineer




OCC D.R. © 1996-2025. All rights reserved.