Job Description
Seamless.AI is seeking a Principal Data Engineer to design, develop, and maintain ETL pipelines. The ideal candidate will have deep experience with Python, Spark, AWS Glue, and large datasets, and will collaborate with cross-functional teams to understand data requirements and implement data integration strategies.
Responsibilities:
- Design, develop, and maintain robust and scalable ETL pipelines.
- Collaborate with cross-functional teams to understand data requirements.
- Implement data transformation logic using Python.
- Utilize AWS Glue to manage ETL jobs and data catalogs (a representative job script is sketched after this list).
- Optimize ETL processes for improved performance and scalability.
- Apply methodologies for data matching, deduplication, and aggregation.
- Implement and maintain data governance practices.
- Explore and adopt new technologies to enhance data processing efficiency.
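For illustration, a typical Glue ETL job touching several of these responsibilities might look like the following sketch. This is a minimal example, not Seamless.AI's actual pipeline code: the "crm" database, "contacts" table, the email column, and the S3 path are all hypothetical, and the script assumes it runs inside an AWS Glue job environment where the awsglue library is available.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job boilerplate: resolve arguments and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a source table registered in the Glue Data Catalog
# ("crm" and "contacts" are hypothetical names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="crm", table_name="contacts"
)

# Convert to a Spark DataFrame and apply transformation logic in plain
# PySpark: normalize the email column and drop rows without one.
df = dyf.toDF()
cleaned = df.withColumn("email", F.lower(F.trim(F.col("email")))).dropna(
    subset=["email"]
)

# Write the cleaned output as Parquet to a hypothetical S3 location.
cleaned.write.mode("overwrite").parquet("s3://example-bucket/clean/contacts/")

job.commit()
```

Converting the DynamicFrame to a DataFrame early keeps the transformation logic in ordinary PySpark, which is generally easier to test and reuse outside of Glue.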
Requirements:
- Strong proficiency in Python and experience with its common data-processing libraries.
- Hands-on experience with AWS Glue or similar ETL tools.
- Solid understanding of data modeling and data warehousing principles.
- Expertise in working with large datasets and distributed computing frameworks such as Spark.
- Strong proficiency in SQL.
- Familiarity with data matching, deduplication, and aggregation methodologies (see the PySpark sketch after this list).
- Experience with data governance, data security, and privacy practices.
- Excellent communication and collaboration skills.
- Highly organized and self-motivated.
- Bachelor's degree in Computer Science or a related field.
- 7+ years of experience as a Data Engineer.
- Professional experience with Spark and with developing data pipelines on AWS.
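As a concrete illustration of the matching, deduplication, and aggregation familiarity listed above, the sketch below collapses records that share a normalized email address using PySpark. The column names (email, updated_at), the S3 path, and the single-key matching rule are assumptions made for the example, not a description of Seamless.AI's production logic.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedup-sketch").getOrCreate()

# Hypothetical input: cleaned contact records produced by an upstream ETL job.
df = spark.read.parquet("s3://example-bucket/clean/contacts/")

# Matching: records are considered duplicates when they share a
# normalized email address (a single-key rule, assumed for this example).
keyed = df.withColumn("match_key", F.lower(F.trim(F.col("email"))))

# Deduplication: keep the most recently updated record per match key
# (assumes an "updated_at" timestamp column).
w = Window.partitionBy("match_key").orderBy(F.col("updated_at").desc())
deduped = (
    keyed.withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Aggregation: per-key duplicate counts, useful for auditing match quality.
counts = keyed.groupBy("match_key").agg(F.count("*").alias("duplicate_count"))
```

Real-world matching usually combines several fuzzier signals (name, company, phone) rather than a single exact key; the window-function pattern shown here stays the same, only the match key changes.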
Seamless.AI offers:
- Opportunity to work with cutting-edge technologies.