Innovate to solve the world's most important challenges

The Lead Data Engineer role will be part of a high-performing global team that delivers cutting-edge AI/ML data products for Honeywell's Industrial customers, with a specific focus on IoT and real-time data processing. In this role, you will architect and implement scalable data pipelines that power next-generation AI solutions, including Large Language Models (LLMs), autonomous agents, and real-time inference systems. You will work at the intersection of IoT telemetry data and modern AI technologies to create innovative industrial solutions.
KEY RESPONSIBILITIES
Data Engineering & AI Pipeline Development:
- Design and implement scalable data architectures to process high-volume IoT sensor data and telemetry streams, ensuring reliable data capture and processing for AI/ML workloads (a streaming ingestion sketch follows this list)
- Architect and build data pipelines for AI product lifecycle, including training data preparation, feature engineering, and inference data flows
- Develop and optimize RAG (Retrieval Augmented Generation) systems, including vector databases, embedding pipelines, and efficient retrieval mechanisms (a retrieval sketch also follows this list)
- Design and implement robust data integration solutions that combine industrial IoT data streams with enterprise data sources for AI model training and inference
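For illustration, here is a minimal sketch of the kind of streaming ingestion pipeline this work involves, assuming a Databricks/PySpark environment with a Kafka source; the broker address, topic name, schema, and output paths are all hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("iot-telemetry-ingest").getOrCreate()

    # Hypothetical schema for incoming sensor readings
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("metric", StringType()),
        StructField("value", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read raw telemetry from a Kafka topic (broker and topic are assumptions)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "iot-telemetry")
           .load())

    # Parse the JSON payload, then aggregate per device over 5-minute windows,
    # tolerating late-arriving events via a watermark
    parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
                 .select("r.*"))
    agg = (parsed
           .withWatermark("event_time", "10 minutes")
           .groupBy(F.window("event_time", "5 minutes"), "device_id")
           .agg(F.avg("value").alias("avg_value")))

    # Land the aggregates in a Delta table for downstream AI/ML feature pipelines
    query = (agg.writeStream
             .format("delta")
             .outputMode("append")
             .option("checkpointLocation", "/chk/iot_agg")
             .start("/lake/iot_agg"))

The watermark bounds how long the job waits for late sensor events, which is what makes a windowed aggregation over unbounded IoT streams safe to run in append mode.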
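Similarly, a minimal sketch of the retrieval step in a RAG system: document chunks are embedded once, the query is embedded at inference time, and the top-k nearest chunks are returned as grounding context for the LLM. The embed() function below is a hypothetical stand-in for whatever embedding model or hosted API is actually used:

    import numpy as np

    def embed(texts: list[str]) -> np.ndarray:
        # Placeholder: in practice this would call an embedding model
        # (e.g., a sentence-transformer or a hosted embedding API).
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 384)).astype(np.float32)

    def build_index(chunks: list[str]) -> np.ndarray:
        # Embed and L2-normalize chunks so a dot product equals cosine similarity
        vecs = embed(chunks)
        return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

    def retrieve(query: str, index: np.ndarray, chunks: list[str], k: int = 3) -> list[str]:
        q = embed([query])[0]
        q = q / np.linalg.norm(q)
        scores = index @ q                      # cosine similarity against all chunks
        top = np.argsort(scores)[::-1][:k]      # indices of the k most similar chunks
        return [chunks[i] for i in top]

    chunks = ["Pump P-101 vibration spec ...", "Boiler maintenance schedule ..."]
    index = build_index(chunks)
    context = retrieve("What is the vibration limit for P-101?", index, chunks)
    # `context` would then be injected into the LLM prompt as grounding material

A production system would replace the brute-force dot product with a vector database, but the flow (embed, normalize, rank by similarity, return top-k) is the same.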
DataOps & Governance:
- Define a mature DataOps strategy to ensure continuous integration and delivery of the data pipelines powering AI solutions
- Lead efforts in data quality, observability, and lineage tracking to maintain high integrity in AI/ML datasets (a quality-gate sketch follows this list)
- Create self-service data assets enabling data scientists and ML engineers to access and utilize data efficiently
- Design and maintain automated documentation systems for data lineage and AI model provenance
- Ensure compliance with data governance policies, including security, privacy, and regulatory requirements for AI-driven applications
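As a rough illustration of the kind of declarative quality gate implied above, in plain PySpark (the rule names and thresholds are illustrative; in practice this might be built on Delta Live Tables expectations or a framework such as Great Expectations):

    from pyspark.sql import SparkSession, DataFrame
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-gate").getOrCreate()

    # Illustrative rules: each maps a name to a boolean column expression
    RULES = {
        "device_id_not_null": F.col("device_id").isNotNull(),
        "value_in_range": F.col("value").between(-40.0, 200.0),
    }

    def quality_report(df: DataFrame) -> dict[str, float]:
        # Fraction of rows passing each rule, computed in a single pass
        aggs = [F.avg(F.when(cond, 1.0).otherwise(0.0)).alias(name)
                for name, cond in RULES.items()]
        return df.agg(*aggs).first().asDict()

    def enforce(df: DataFrame, min_pass_rate: float = 0.99) -> DataFrame:
        report = quality_report(df)
        failing = {k: v for k, v in report.items() if v < min_pass_rate}
        if failing:
            # Fail fast so bad data never reaches AI/ML training sets
            raise ValueError(f"Data quality gate failed: {failing}")
        return df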
Technical Leadership & Innovation:
- Lead architectural discussions, establish standards, and drive technical excellence across teams
- Partner with ML engineers and data scientists to implement efficient data workflows for model training, fine-tuning, and deployment
- Mentor data engineers on standards, best practices, and innovative approaches to building extensible and reusable solutions
- Drive innovation and continuous improvement in data engineering practices and tooling
- Manage stakeholder expectations, aligning data engineering roadmaps with business and AI strategy
YOU MUST HAVE
- Minimum 8 years of hands-on experience building data pipelines using large-scale distributed data processing tools, frameworks, and platforms (Python, Spark, Databricks)
- 6+ years of extensive experience with data management concepts, including data modeling, CDC, ETL/ELT processes, data lakes, and data governance
- 4+ years of hands-on experience with PySpark/Scala
- 4+ years of experience with cloud platforms (Azure/GCP/Databricks), particularly in implementing AI/ML solutions
WE VALUE
- Experience implementing RAG architectures and working with LLM-powered applications
- Expertise in real-time data processing frameworks (Kafka, Apache Spark Streaming, Structured Streaming)
- Knowledge of MLOps practices and experience building data pipelines for AI model deployment
- Experience with time-series databases and IoT data modeling patterns
- Familiarity with containerization (Docker) and orchestration (Kubernetes) for AI workloads
- Strong background in data quality implementation for AI training data
- Experience with graph databases and knowledge graphs for AI applications
- Experience working with distributed teams and cross-functional collaboration
- Knowledge of data security and governance practices for AI systems
- Expertise in version control systems, CI/CD methodologies
- Experience working on analytics projects with Agile and Scrum Methodologies