Job Description

We are seeking a Data Engineer to join our data team as we build an enterprise data platform from the ground up. This role is central to our mission of transforming raw data into insights that drive decision-making across the organization. You will be responsible for developing, managing, and optimizing data pipelines and data models, and for ensuring data quality. The ideal candidate has a solid background in SQL and Python and hands-on experience with AWS services; familiarity with both open-source and cloud-based environments is required.

Key Responsibilities:

  • Data Pipeline Development:
    • Design, develop, and maintain scalable ETL processes to support data ingestion and transformation
    • Use Apache Kafka and AWS DMS for real-time Change Data Capture (CDC) so that data streams are continuously and reliably ingested into our data lake (see the Kafka consumer sketch after this list)
  • Data Transformation:
    • Leverage Apache Spark for large-scale data processing and transformation tasks within the data lake (see the Spark transformation sketch after this list)
    • Optimize transformation jobs for performance and cost-efficiency, ensuring that transformations are both scalable and reliable
  • Data Modelling:
    • Use dbt to create robust, reusable data models that align with the business needs of various departments
    • Implement data marts and dimensional models to support analytical queries and reporting
  • Data Lake Management:
    • Manage and optimize our data lake on AWS using services such as EMR, Glue, and S3, ensuring efficient data storage, retrieval, and transformation
    • Implement best practices for organizing and managing raw, processed, and curated data within the lake, with a focus on scalability and future growth
  • Change Data Capture (CDC):
    • Develop and manage CDC processes using Kafka, ensuring that our data lake and warehouse reflect the most up-to-date data
    • Work closely with database administrators and other stakeholders to ensure that CDC implementations are reliable and performant
  • Data Warehousing:
    • Develop and maintain scalable data warehouses using PostgreSQL or ClickHouse
    • Ensure that the warehouse architecture supports efficient querying, reporting, and analysis
  • Metadata Management:
    • Design and maintain a comprehensive metadata layer that facilitates dynamic pipeline creation and improves data discoverability and governance
    • Implement data cataloguing solutions to ensure that data assets are easily searchable and well-documented (see the Glue catalogue sketch after this list)
  • Collaboration & Communication:
    • Collaborate with multiple departments across the organization to gather data requirements and translate them into technical specifications
    • Clearly communicate complex technical concepts to non-technical stakeholders, ensuring alignment on project goals and outcomes
  • R&D:
    • Continuously explore and evaluate new technologies, tools, and methodologies to enhance the efficiency, scalability, and reliability of the data platform
    • Stay updated with industry trends and best practices in data engineering, and apply this knowledge to improve our data architecture
  • Governance & Quality:
    • Implement and enforce data governance policies to ensure data quality, security, and compliance with internal and external regulations
    • Develop and maintain data quality checks, ensuring that data in the lake and warehouse is accurate, consistent, and reliable
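
To give a concrete flavour of the CDC work above, here is a minimal sketch of a Python consumer reading Debezium-style change events from Kafka. The topic name, broker address, and envelope layout are illustrative assumptions, not a description of our actual stack.

```python
import json

from kafka import KafkaConsumer  # kafka-python client

# Hypothetical topic and broker, for illustration only.
consumer = KafkaConsumer(
    "inventory.public.orders",
    bootstrap_servers="localhost:9092",
    group_id="lake-ingest",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Debezium wraps each change in an envelope; "payload" carries the
    # operation type and the row's new image (None for deletes).
    payload = message.value.get("payload", message.value)
    op = payload.get("op")    # "c" = create, "u" = update, "d" = delete
    row = payload.get("after")
    # A real pipeline would buffer these events and land them in S3 as
    # Parquet; printing here just shows the shape of the stream.
    print(op, row)
```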
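
Similarly, a sketch of the kind of Spark transformation this role involves: aggregating raw order events into a partitioned daily table. The bucket paths and column names are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-daily-transform").getOrCreate()

# Placeholder lake paths; the raw zone holds the ingested CDC events.
raw = spark.read.json("s3://example-lake/raw/orders/")

daily = (
    raw.filter(F.col("op") != "d")                  # drop CDC delete events
       .withColumn("order_date", F.to_date("created_at"))
       .groupBy("order_date", "customer_id")
       .agg(
           F.sum("amount").alias("daily_total"),
           F.count("*").alias("order_count"),
       )
)

# Partitioning by date keeps downstream scans cheap, which is the main
# cost-efficiency lever for jobs like this.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-lake/processed/orders_daily/"
)
```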
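
And for metadata management, a minimal sketch of walking the AWS Glue Data Catalog with boto3; the "analytics" database name and the region are hypothetical.

```python
import boto3

glue = boto3.client("glue", region_name="eu-west-1")

# List every table registered in a (hypothetical) "analytics" database,
# along with its S3 location, as a starting point for a data catalogue.
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="analytics"):
    for table in page["TableList"]:
        location = table.get("StorageDescriptor", {}).get("Location", "n/a")
        print(f"{table['Name']}: {location}")
```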

Required Skills & Qualifications:

  • 2-3 years of experience as a Data Engineer or in a similar role
  • Proficient in SQL and Python
  • Experience with Spark is a plus
  • Hands-on experience with AWS services, especially EMR, Glue, S3, CloudWatch, and AWS DMS
  • Experience building scalable applications from the ground up, including server configuration and platform architecture
  • Familiarity with data warehousing solutions, particularly PostgreSQL or ClickHouse
  • Strong understanding of ETL processes and data pipeline development
  • Experience with streaming technologies such as Kafka (a huge plus)
  • Experience with dbt, or a strong willingness to learn and apply it (a huge plus)
  • Knowledge of Change Data Capture (CDC) processes and data organization within a data lake
  • Understanding of data cataloguing and metadata management
  • Excellent communication skills, with the ability to collaborate across departments and explain complex concepts clearly
  • Self-motivated, with strong problem-solving skills and a passion for continuous learning and improvement

Job Summary

  • Published on: 2024-08-16, 5:35 am
  • Vacancy: 1
  • Employment Status: Full Time
  • Experience: 2 Years
  • Job Location: Lahore
  • Gender: No Preference
  • Application Deadline: 2024-12-23