Roles and Responsibilities
- Design and implement scalable, reliable, and efficient data pipelines for ingesting, processing, and storing large volumes of data from a variety of sources, using cloud-based technologies, Python, and PySpark.
- Build and maintain data lakes, data warehouses, and other data storage and processing systems on the cloud.
- Write and maintain ETL/ELT jobs and data integration scripts to ensure smooth and accurate data flow.
- Implement data security and compliance measures to protect data privacy and ensure regulatory compliance.
- Collaborate with data scientists and analysts to understand their data needs and provide them with access to the required data.
- Stay up-to-date on the latest developments in cloud-based data engineering, particularly on Azure, AWS, and GCP, and proactively bring new ideas and technologies to the team.
- Monitor and optimize the performance of data pipelines and systems, identifying and resolving any issues or bottlenecks that may arise.
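To make the day-to-day pipeline work above concrete, here is a minimal extract-transform-load sketch. It uses plain Python and an in-memory CSV sample for illustration only; the field names (`id`, `amount`) and the validation rule are hypothetical, and production jobs in this role would typically run on PySpark against cloud storage rather than local streams.

```python
import csv
import io

def extract(source):
    """Read raw records from a CSV source (hypothetical schema: id, amount)."""
    return list(csv.DictReader(source))

def transform(rows):
    """Drop malformed rows and cast amounts to float, a typical cleaning step."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"id": row["id"], "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # skip records that fail validation
    return cleaned

def load(rows, sink):
    """Write cleaned records to a CSV sink (a data-lake file in practice)."""
    writer = csv.DictWriter(sink, fieldnames=["id", "amount"])
    writer.writeheader()
    writer.writerows(rows)

# End-to-end run over an in-memory sample
raw = io.StringIO("id,amount\n1,10.5\n2,not_a_number\n3,7\n")
out = io.StringIO()
load(transform(extract(raw)), out)
print(out.getvalue().strip())
```

The same extract/transform/load split carries over directly to PySpark, where `extract` becomes a `spark.read` call and `transform` a chain of DataFrame operations.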
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
- Minimum of 5 years of experience as a Data Engineer, with a strong focus on cloud-based data infrastructure.
- Strong programming skills in Python (preferred), Java, or a similar language.
- Extensive experience with cloud-based data storage and processing technologies, particularly Azure, AWS, and GCP.
- Familiarity with ETL/ELT tools and frameworks such as Apache Beam, Apache Spark, or Apache Flink.
- Knowledge of data modeling principles and experience working with SQL databases.
- Strong problem-solving skills and the ability to troubleshoot and resolve issues efficiently.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
Location: True Digital Park, Bangkok (Hybrid working)