Job Description

Gigster is seeking a Data Engineer to enhance and maintain the data pipelines that feed Natural Language Processing (NLP) models. The Data Engineer will build enterprise software on cutting-edge technology.

The Role Involves:

  • Designing, building, and optimizing scalable ETL/ELT data pipelines using Apache Spark, Apache Kafka, and orchestration tools such as Prefect or Airflow.
  • Integrating external data sources and public APIs with internal data systems.
  • Working with large-scale datasets to support NLP model training and inference.
  • Analyzing existing pipelines and recommending enhancements for performance, reliability, and scalability.
  • Collaborating with cross-functional teams, including data scientists and ML engineers.
  • Owning the end-to-end engineering process—from planning and technical design to implementation.
  • Regularly reporting progress and outcomes to client stakeholders.

Requirements:

  • Proficiency in Python and experience with data transformation and data engineering best practices.
  • Strong experience with Apache Spark, Apache Kafka, and Google Cloud Platform (GCP).
  • Hands-on experience with workflow orchestration tools (e.g., Prefect, Airflow).
  • Demonstrated experience working with large datasets and real-time data processing.
  • Experience building and maintaining ETL/ELT pipelines for analytical or machine learning use cases.
  • Self-motivated, with excellent communication and project ownership skills.

The Role Offers:

  • Opportunity to work on cutting-edge projects.
  • Remote work environment.
  • Part-time, short-term contract (4-6 weeks).