WELCOME TO SITA
At SITA, we keep airports moving, airlines flying smoothly, and borders open. Our technology and communication innovations power the success of the global air travel industry.
You'll find us in 95% of international airports, working closely with over 2,500 transportation and government clients. Each partnership brings unique challenges, and we thrive on delivering fresh solutions and cutting-edge tech to keep operations running like clockwork. We don't just move the world forward; we're proud to be recognized as a Great Place to Work® by 79% of our employees and certified in most of our growing locations. Here, we feel empowered, supported, and inspired to grow.
Are you ready to love your job?
The adventure begins right here, with you, at SITA.
ABOUT THE ROLE & TEAM
As Lead Site Reliability Engineer / Expert / Specialist, you will be responsible for proactive product support, ensuring high product performance that is continuously improved. You will identify and resolve the root causes of operational incidents and implement solutions that improve stability and prevent recurrence. You will manage the creation and maintenance of the event catalog used to trigger events, and develop both manual remediation approaches and automated workflows to resolve alerts. You will oversee the deployment of IT services and solutions, ensuring successful integration with minimal disruption. The role focuses on operational automation and integration to improve efficiency and collaboration between development and operations within service operations.
WHAT YOU WILL DO:
Define, build, and maintain support systems to ensure high availability and performance.
Handle complex cases for the PSO.
Implement automation for system provisioning, self-healing, auto-recovery, deployment, and monitoring.
Perform incident response and root cause analysis (RCA) for critical system failures.
Monitor system performance and establish Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs).
Collaborate with Development and Operations to integrate reliability best practices, including zero-downtime architecture.
Proactively identify and remediate performance issues.
Work closely with Product T&E, ICE, and Service Architects on the productization of new products, acting as the SGS technical expert.
Coordinate with internal and external stakeholders to improve service performance and ensure high availability.
Ensure Operations readiness to support new products.
Accountable within SGS for in-scope product availability and performance.
Problem Management
Conduct thorough problem investigations and root cause analyses to diagnose recurring incidents and service disruptions.
Coordinate with Incident Management teams and collaborate with PSOs and Engineering/Product teams to implement permanent solutions.
Monitor effectiveness of problem resolution activities and provide regular reporting to ensure continuous improvement.
Event Management
Define, build, and maintain an event catalog specifying active events, thresholds, and remediation actions; optimize it for efficiency.
Develop event response protocols, provide training, and ensure efficient incident handling.
Customer & Operational Support
Collaborate with Customer Success Managers to implement initiatives that enhance customer satisfaction and retention.
Prepare reports, documentation, and communication materials covering customer metrics, updates, and product changes.
Identify and implement improvements in internal processes and workflows.
Contribute to knowledge management resources such as FAQs and training materials.
Data Steward Responsibilities
Implement data governance policies defined by the Data Owner and ensure adherence to standards.
Monitor data quality, consistency, and compliance on an ongoing basis.
Act as a Subject Matter Expert (SME) for data within the assigned area, providing guidance and answering queries.
Qualifications
EXPERIENCE:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- 10+ years of experience in IT operations, service management, or infrastructure management, including roles such as Site Reliability Engineer, Problem Manager, or DevOps Manager.
- Proven experience managing high-availability systems and ensuring operational reliability.
- Extensive experience in root cause analysis (RCA), incident management, and developing permanent solutions for recurring service disruptions.
- Hands-on experience with CI/CD pipelines, automation, system performance monitoring, and infrastructure as code (IaC).
- Strong background in collaborating with cross-functional teams (Development, Operations, Engineering, etc.) to improve operational processes and service delivery.
- Experience managing deployments, conducting risk assessments, and optimizing event and problem management processes.
- Familiarity with cloud technologies, containerization, and scalable architectures, including zero-downtime deployment strategies.
Technical Skills (Must-Have):
- Strong AKS and on-prem Kubernetes (K8s) skills and experience.
- Scripting (Ansible, Bash, Python; any combination would be great).
- Automation.
- CI/CD pipelines.
- Terraform exposure.
- Azure or AWS skills.
- Basic DB skills.
- Strong problem-solving skills and the ability to learn quickly.
- SRE mindset.
WHAT WE OFFER
We're all about diversity. We operate in 200 countries and span 60 different languages and cultures. We're really proud of our inclusive environment. Our offices are comfortable and fun places to work, and we make sure you get to work from home too. Find out what it's like to join our team and take a step closer to your best life ever.
Flex Week: Work from home up to 2 days/week (depending on your team's needs)
Flex Day: Make your workday suit your life and plans.
Flex-Location: Take up to 30 days a year to work from any location in the world.
Employee Wellbeing: We have got you covered with our Employee Assistance Program (EAP), for you and your dependents 24/7, 365 days/year. We also offer Champion Health, a personalized platform that supports a range of wellbeing needs.
Professional Development: At SITA, we believe growth fuels innovation. Our learning ecosystem offers access to world-class platforms and programs designed to help you thrive. From LinkedIn Learning, Microsoft's Enterprise Skills Initiative, and Airport Council International (available to all employees) to specialized solutions like Pluralsight for technology upskilling, Harvard Business Publishing for people leadership, Stanford for strategic development and many others, we align learning opportunities with your Development Plan and our business priorities. Your development journey is supported every step of the way.
Competitive Benefits: Competitive benefits that make sense with both your local market and employment status.
SITA is an Equal Opportunity Employer. We value a diverse workforce. In support of our Employment Equity Program, we encourage women, aboriginal people, members of visible minorities, and/or persons with disabilities to apply and self-identify in the application process.