Job Description

We are seeking a Data Engineer to join our data team as we build an enterprise data platform from the ground up. This role is central to our mission of transforming raw data into insights that drive decision-making across the organization. You will be responsible for developing, managing, and optimizing data pipelines and data models, and for ensuring data quality. The ideal candidate has a solid background in SQL and Python and hands-on experience with AWS services; familiarity with both open-source and cloud-based environments is required.

Key Responsibilities:

  • Data Pipeline Development:
    • Design, develop, and maintain scalable ETL processes to support data ingestion and transformation
    • Use Apache Kafka and AWS DMS for real-time Change Data Capture (CDC) so that data streams are continuously and reliably ingested into our data lake (see the Kafka consumer sketch after this list)
  • Data Transformation:
    • Leverage Apache Spark for large-scale data processing and transformation tasks within the data lake (see the Spark transformation sketch after this list)
    • Optimize transformation jobs for performance and cost-efficiency, ensuring that transformations are both scalable and reliable
  • Data Modelling:
    • Use dbt to create robust, reusable data models that align with the business needs of various departments
    • Implement data marts and dimensional models to support analytical queries and reporting
  • Data Lake Management:
    • Manage and optimize our data lake on AWS using services such as EMR, Glue, and S3, ensuring efficient data storage, retrieval, and transformation
    • Implement best practices for organizing and managing raw, processed, and curated data within the lake, with a focus on scalability and future growth
  • Change Data Capture (CDC):
    • Develop and manage CDC processes using Kafka, ensuring that our data lake and warehouse reflect the most up-to-date data
    • Work closely with database administrators and other stakeholders to ensure that CDC implementations are reliable and performant
  • Data Warehousing:
    • Develop and maintain scalable data warehouses using PostgreSQL or ClickHouse
    • Ensure that the warehouse architecture supports efficient querying, reporting, and analysis
  • Metadata Management:
    • Design and maintain a comprehensive metadata layer that facilitates dynamic pipeline creation and improves data discoverability and governance
    • Implement data cataloguing solutions to ensure that data assets are easily searchable and well-documented (see the Glue catalogue sketch after this list)
  • Collaboration & Communication:
    • Collaborate with multiple departments across the organization to gather data requirements and translate them into technical specifications
    • Clearly communicate complex technical concepts to non-technical stakeholders, ensuring alignment on project goals and outcomes
  • R&D:
    • Continuously explore and evaluate new technologies, tools, and methodologies to enhance the efficiency, scalability, and reliability of the data platform
    • Stay updated with industry trends and best practices in data engineering, and apply this knowledge to improve our data architecture
  • Governance & Quality:
    • Implement and enforce data governance policies to ensure data quality, security, and compliance with internal and external regulations
    • Develop and maintain data quality checks, ensuring that data in the lake and warehouse is accurate, consistent, and reliable
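
To give a concrete flavour of the CDC work above, here is a minimal sketch of a Python consumer reading Debezium-style change events from Kafka. The topic name, broker address, and envelope layout are illustrative assumptions, not a description of our actual stack.

```python
import json

from kafka import KafkaConsumer  # kafka-python client

# Hypothetical topic and broker, for illustration only.
consumer = KafkaConsumer(
    "inventory.public.orders",
    bootstrap_servers="localhost:9092",
    group_id="lake-ingest",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Debezium wraps each change in an envelope; "payload" carries the
    # operation type and the row's new image (None for deletes).
    payload = message.value.get("payload", message.value)
    op = payload.get("op")    # "c" = create, "u" = update, "d" = delete
    row = payload.get("after")
    # A real pipeline would buffer these events and land them in S3 as
    # Parquet; printing here just shows the shape of the stream.
    print(op, row)
```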
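
Similarly, a sketch of the kind of Spark transformation this role involves: aggregating raw order events into a partitioned daily table. The bucket paths and column names are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-daily-transform").getOrCreate()

# Placeholder lake paths; the raw zone holds the ingested CDC events.
raw = spark.read.json("s3://example-lake/raw/orders/")

daily = (
    raw.filter(F.col("op") != "d")                  # drop CDC delete events
       .withColumn("order_date", F.to_date("created_at"))
       .groupBy("order_date", "customer_id")
       .agg(
           F.sum("amount").alias("daily_total"),
           F.count("*").alias("order_count"),
       )
)

# Partitioning by date keeps downstream scans cheap, which is the main
# cost-efficiency lever for jobs like this.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-lake/processed/orders_daily/"
)
```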
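
And for metadata management, a minimal sketch of walking the AWS Glue Data Catalog with boto3; the "analytics" database name and the region are hypothetical.

```python
import boto3

glue = boto3.client("glue", region_name="eu-west-1")

# List every table registered in a (hypothetical) "analytics" database,
# along with its S3 location, as a starting point for a data catalogue.
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="analytics"):
    for table in page["TableList"]:
        location = table.get("StorageDescriptor", {}).get("Location", "n/a")
        print(f"{table['Name']}: {location}")
```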

Required Skills & Qualifications:

  • 2-3 years of experience as a Data Engineer or in a similar role
  • Proficient in SQL and Python
  • Experience with Spark is a plus
  • Hands-on experience with AWS services, especially EMR, Glue, S3, CloudWatch, and AWS DMS
  • Experience building scalable applications from the ground up, including server configuration and platform architecture
  • Familiarity with data warehousing solutions, particularly PostgreSQL or ClickHouse
  • Strong understanding of ETL processes and data pipeline development
  • Experience with streaming technologies such as Kafka (a huge plus)
  • Experience with dbt, or a strong willingness to learn and apply it (a huge plus)
  • Knowledge of Change Data Capture (CDC) processes and data organization within a data lake
  • Understanding of data cataloguing and metadata management
  • Excellent communication skills, with the ability to collaborate across departments and explain complex concepts clearly
  • Self-motivated, with strong problem-solving skills and a passion for continuous learning and improvement

Job Summary

  • Published on: 2024-08-16, 5:35 am
  • Vacancy: 1
  • Employment Status: Full Time
  • Experience: 2 Years
  • Job Location: Lahore
  • Gender: No Preference
  • Application Deadline: 2024-12-23