Lead Site Reliability Engineer - GCP Ops, Terraform, Python, etc.

Requisition number: 2311848
Job category: Technology
Location: Bangalore, Karnataka

Careers with UnitedHealth Group

We are creating opportunities in every corner of the health care marketplace to improve lives while we build careers. And that means opportunities for ongoing professional growth for you. As we support you with the latest tools, advanced training and the combined strength of high-caliber co-workers, you can continue to follow the path of your life's best work.

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

Position Overview:

We are seeking a motivated and detail-oriented Site Reliability Engineer (SRE) to help us improve the reliability, scalability, and performance of our systems. As an SRE, you will collaborate with cross-functional teams to design, build, and maintain the infrastructure and tools that support our applications. This is an excellent opportunity for someone who is passionate about DevOps, automation, and cloud-native technologies.

 

Primary Responsibilities:

  • Design, deploy, and maintain Kubernetes-based infrastructure to ensure high availability and scalability of applications
  • Build and manage CI/CD pipelines using GitHub Actions to enable fast and reliable deployments
  • Use Terraform to provision and manage infrastructure in Google Cloud Platform (GCP)
  • Manage and optimize Apache Kafka-based systems to ensure reliable message streaming and data processing
  • Monitor and improve system performance and reliability using Prometheus and Grafana
  • Collaborate with developers to automate workflows and implement best practices for infrastructure-as-code (IaC)
  • Write Python scripts for automation and tooling to enhance operational efficiency
  • Troubleshoot and resolve system issues to minimize downtime and impact on users
  • Participate in on-call rotations and incident response to ensure high service reliability
  • Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives at its absolute discretion and without any limitation (implied or otherwise) on its ability to do so

Required Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent work experience)
  • 1+ years of experience in DevOps, SRE, or related roles (internships and project experience are acceptable for entry-level candidates)
  • Hands-on experience with Kubernetes for deploying and managing containerized applications
  • Experience with Apache Kafka for building, maintaining, and troubleshooting message-driven systems
  • Experience using Prometheus and Grafana for monitoring and observability
  • Familiarity with Google Cloud Platform (GCP) services such as Compute Engine, Kubernetes Engine, and Cloud Storage
  • Understanding of GitHub Actions for creating and maintaining CI/CD pipelines
  • Basic to intermediate knowledge of Terraform for infrastructure provisioning and management
  • Proficiency in Python for scripting, automation, and tooling
  • Solid, demonstrated problem-solving skills and an eagerness to learn new technologies
  • Excellent, demonstrated communication and teamwork skills

Preferred Qualifications:

  • Experience with debugging and optimizing distributed systems
  • Experience with Golang for developing infrastructure tools or cloud-native applications
  • Familiarity with other cloud providers (e.g., AWS or Azure)
  • Knowledge of Helm for Kubernetes package management
  • Exposure to security best practices for cloud infrastructure
  • Knowledge of Java for developing and troubleshooting backend systems
  • Familiarity with DataHub or similar data cataloging and metadata management platforms
  • Understanding of Artificial Intelligence (AI) concepts and tools, such as building or managing machine learning pipelines, integrating AI models, or working with ML platforms like TensorFlow, PyTorch, or Vertex AI

 

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health, which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and to enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.

Additional job information

Requisition number: 2311848
Business segment: Optum Global Advantage
Travel: No
Country: IN
Overtime status: Exempt
Telecommute position: No