Job Description
Gigster is seeking a Data Engineer to enhance and maintain the data pipelines that feed Natural Language Processing (NLP) models. The Data Engineer will build enterprise software on cutting-edge technology.
The Role Involves:
- Designing, building, and optimizing scalable ETL/ELT data pipelines using Apache Spark, Apache Kafka, and orchestration tools such as Prefect or Airflow.
- Integrating external data sources and public APIs with internal data systems.
- Working with large-scale datasets to support NLP model training and inference.
- Analyzing existing pipelines and recommending enhancements for performance, reliability, and scalability.
- Collaborating with cross-functional teams, including data scientists and ML engineers.
- Owning the end-to-end engineering process, from planning and technical design through implementation.
- Regularly reporting progress and outcomes to client stakeholders.
Requirements:
- Proficiency in Python, with experience in data transformation and data engineering best practices.
- Strong experience with Apache Spark, Apache Kafka, and Google Cloud Platform (GCP).
- Hands-on experience with workflow orchestration tools (e.g., Prefect, Airflow).
- Demonstrated experience working with large datasets and real-time data processing.
- Experience building and maintaining ETL/ELT pipelines for analytical or machine learning use cases.
- Self-motivated, with excellent communication and project ownership skills.
The Role Offers:
- Opportunity to work on cutting-edge projects.
- Remote work environment.
- Part-time, short-term contract (4-6 weeks).