PayPay is seeking a Data Engineer to join its Data Insights department. This role is crucial for driving product improvements by engineering systems that support a scientific understanding of user and merchant behavior. The Data Engineer will be responsible for designing, developing, and maintaining scalable data ingestion pipelines using a variety of AWS services and tools.
The Data Engineer will work with cross-functional teams, ensuring seamless data flow and integration across the organization. They will implement best practices for observability, data governance, security, and compliance. Collaboration and communication skills are essential for success in this fast-paced environment.
Responsibilities:
- Design, develop, and maintain scalable data ingestion pipelines using AWS Glue, Step Functions, Lambda, and Terraform
- Design, build, and maintain infrastructure to continuously support the improvement and deployment of ML models
- Optimize and manage large-scale data pipelines to ensure high performance, reliability, and efficiency
- Implement data processing workflows using Hudi, Delta Lake, Spark, and Scala
- Maintain and enhance Lake Formation and Glue Data Catalog for effective data management and discovery
- Collaborate with cross-functional teams to ensure seamless data flow and integration across the organization
- Implement best practices for observability, data governance, security, and compliance
Requirements:
- 5+ years of experience as a Data Engineer or in a similar role
- Familiarity with building machine learning systems, or data infrastructure supporting machine learning development and deployment, is preferable
- Hands-on experience with Apache Hudi, Delta Lake, Spark, and Scala
- Experience designing, building, and operating a data lake or data warehouse
- Knowledge of data orchestration tools such as Airflow, Dagster, or Prefect
- Strong expertise in AWS services, including Glue, Step Functions, Lambda, and EMR
- Familiarity with change data capture tools like Canal, Debezium, and Maxwell
- Experience with data warehousing tools like Amazon Athena, BigQuery, and Databricks
- Proficiency in Python and SQL (any variant); experience with Scala and/or Java is preferred
- Experience with data cataloging and metadata management using AWS Glue Data Catalog, Lake Formation, or Unity Catalog
- Proficiency in Terraform for infrastructure as code (IaC)
- Overall understanding of machine learning technologies and deep learning concepts
- Strong problem-solving skills and ability to troubleshoot complex data issues
- Excellent communication and collaboration skills
- Ability to work in a fast-paced, dynamic environment and manage multiple tasks simultaneously
PayPay offers:
- Full-time employment with WFA (Work From Anywhere at Anytime) arrangement
- Super Flex Time, with no core time
- Paid leave, including annual leave and personal leave
- Social insurance
- 401K
- Translation/interpretation support
- Visa sponsorship and relocation support