Responsibilities:
Design and develop robust, scalable, high-performance data pipelines and ETL processes to extract, transform, and load data from various sources into our data warehouse or data lake.
Collaborate with stakeholders to understand their data requirements, and design and implement appropriate data models and database schemas to support their needs.
Optimize data pipelines and ETL processes for performance and efficiency, ensuring timely and accurate data delivery to end-users.
Monitor, troubleshoot, and resolve issues related to data quality, data consistency, and data integrity, ensuring the reliability and correctness of our data systems.
Implement and maintain data governance practices and policies, ensuring compliance with data privacy and security regulations.
Collaborate with data scientists and analysts to provide them with the necessary data infrastructure and tools for conducting advanced analytics and deriving insights.
Stay up to date with the latest trends and technologies in data engineering, and recommend innovative solutions to improve our processes and systems.
Document data engineering processes, data flows, and system architectures to ensure knowledge sharing and maintain an up-to-date repository of technical documentation.
Work closely with cross-functional teams, including software engineers and infrastructure teams, to optimize data infrastructure and ensure its seamless integration with other systems.
Requirements:
5+ years of experience in data warehousing.
Bachelor’s degree in Computer Science, Engineering, or a related field.
Proven experience as a Data Engineer or in a similar role, working with large-scale data processing and ETL pipelines.
Strong programming skills in languages such as Python, Java, or Scala, with experience in data manipulation and processing frameworks like Apache Spark.
Strong SQL skills and experience with database technologies, including relational databases and data modeling.
Proficiency with big data technologies such as Hadoop and Hive, and knowledge of distributed systems and cloud computing platforms (e.g., AWS, Azure, GCP).
Familiarity with data integration and workflow management tools such as Apache Airflow.
Knowledge of data warehousing concepts and experience with data warehousing solutions are highly desirable.
Strong analytical and problem-solving skills, with the ability to analyze complex data-related issues and propose effective solutions.
Excellent communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
Attention to detail and a strong commitment to delivering high-quality work within established timelines.