Job Description

We are seeking a skilled and proactive Data Engineer to join our team. The ideal candidate will have expertise in data cleaning, advanced web scraping, and leveraging LLMs for data enhancement and extraction. This role involves building and maintaining robust ETL pipelines, ensuring data quality, implementing modern prompt engineering solutions, and managing data security and versioning to support dynamic business needs.

 

Key Responsibilities:

==> Data Cleaning and Preparation

  • Develop and implement processes to clean, transform, and standardize data from multiple sources.
  • Ensure that high-quality, consistent data is available for analysis and modeling.

==> Web Scraping and Data Extraction

  • Design, develop, and maintain web scrapers to collect data from structured and unstructured web sources.
  • Incorporate modern LLM-based techniques for adaptive and intelligent data scraping.

==> Data Enhancement with LLMs

  • Leverage LLMs to enrich raw data by generating insights, filling gaps, and creating structured datasets.
  • Explore and implement advanced LLM capabilities to optimize data extraction and transformation workflows.

==> ETL Development and Maintenance

  • Build and manage scalable ETL pipelines to ingest, process, and store data efficiently.
  • Automate workflows to ensure timely data availability for downstream systems.

==> Data Versioning and Integrity

  • Implement data versioning strategies to track changes and maintain historical integrity.
  • Ensure data consistency across multiple versions and handle schema changes effectively.

==> Real-time Data Updates

  • Design systems that keep data up to date and reflect the latest information.
  • Monitor and address data latency and synchronization issues.

==> Data Security and Compliance

  • Apply industry best practices to secure sensitive data and ensure compliance with relevant regulations.
  • Proactively identify and mitigate data security risks.

==> Collaboration and Continuous Improvement

  • Collaborate with data scientists, analysts, and other engineering teams to deliver data solutions.
  • Stay current with the latest tools and techniques in data engineering, web scraping, and LLMs.

 

What we are looking for:

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
  • Proven experience in designing and building web scrapers, preferably with modern tools and frameworks.
  • Hands-on experience with prompt engineering and AI-driven data extraction/enhancement methodologies.
  • Strong programming skills in Python.
  • Proficiency in ETL frameworks and tools (e.g., Apache Airflow).
  • Experience implementing data governance policies to ensure data privacy and compliance with regulations such as GDPR and FERPA.
  • Excellent problem-solving and communication skills.
  • Experience with cloud platforms (GCP) for data storage and pipeline management.
  • Experience with NoSQL databases.
  • Familiarity with APIs and modern search methodologies for real-time data extraction.
  • Understanding of MLOps and integrating AI/ML models into data workflows.

Job Summary

  • Published on: 2024-11-22 5:47 pm
  • Vacancy: 1
  • Employment Status: Full Time
  • Experience: 2 Years
  • Job Location: Islamabad
  • Gender: No Preference
  • Application Deadline: 2025-02-21