Job Description
Softermii is seeking a part-time Data Engineering Consultant/Tech Lead to contribute to their expanding portfolio. The consultant will be responsible for technical interviews, assisting with upcoming projects, and providing hands-on support for complex development tasks, particularly in data pipeline design and solution optimization on Databricks. The role involves supervising other engineers, unblocking technical difficulties, and ensuring data quality and governance.
Responsibilities include:
- Interview and hire Data Engineers
- Supervise the work of other engineers and take a hands-on role in the most complex backlog tasks, focusing on unblocking other data engineers when technical difficulties arise
- Develop and maintain scalable data pipelines using Databricks (Apache Spark) for batch and streaming use cases.
- Work with data scientists and analysts to provide reliable, performant, and well-modeled data sets for analytics and machine learning.
- Optimize and manage data workflows using Databricks Workflows and orchestrate jobs for complex data transformation tasks.
- Design and implement data ingestion frameworks to bring data from various sources (files, APIs, databases) into Delta Lake.
- Ensure data quality, lineage, and governance using tools such as Unity Catalog, Delta Live Tables, and built-in monitoring features.
- Collaborate with cross-functional teams to understand data needs and support production-grade machine learning workflows.
- Apply data engineering best practices: versioning, testing (e.g., with pytest or dbx), documentation, and CI/CD pipelines.
Requirements include:
- 5+ years of experience in data engineering or big data development, with production-level work.
- Experience architecting and developing scalable data solutions on the Databricks platform, leveraging Apache Spark, Delta Lake, and the lakehouse architecture to support advanced analytics and machine learning initiatives.
- Proven ability to design, build, and maintain production-grade data pipelines using Python (or Scala) and SQL, ensuring efficient data ingestion, transformation, and delivery across distributed systems.
- Experience leading the implementation of Databricks features such as Delta Live Tables, Unity Catalog, and Workflows to ensure secure, reliable, and automated data operations.
- Ability to optimize Spark performance and resource utilization, applying best practices in distributed computing, caching, and tuning for large-scale data processing.
- Experience integrating data from cloud-based sources (e.g., AWS S3), ensuring data quality, lineage, and consistency throughout the pipeline lifecycle.
- Experience managing the orchestration and automation of data workflows with tools such as Airflow or Databricks Jobs, and implementing robust CI/CD pipelines for code deployment and testing.
- Ability to collaborate cross-functionally with data scientists, analysts, and business stakeholders to understand data needs and deliver actionable insights through robust data infrastructure.
- Experience mentoring and guiding junior engineers, promoting engineering best practices, code quality, and continuous learning within the team.
- Commitment to data governance and security policies, using tools such as Unity Catalog for access control and compliance.
- A track record of evaluating new technologies and practices, driving innovation and improvements in data engineering strategy and execution.
- Experience in designing, building, and maintaining data pipelines using Apache Airflow, including DAG creation, task orchestration, and workflow optimization for scalable data processing.
- Upper-Intermediate English level.
Softermii offers:
- Stable, well-functioning processes with clear roles and responsibilities.
- Independence in decision-making.
- A team of like-minded experts.
- 50% coverage of the cost of courses/conferences/speaking clubs.
- Individual development plan and mentoring.
- Referral bonus system.
- 21 working days off.
- 5 sick days per year.