Job Description
Are you a budding data enthusiast eager to put your skills to the test and gain hands-on experience in the dynamic field of data engineering? This is your chance!
Our 6-month paid Data Engineer internship offers a chance to dive deep into the world of data with a vibrant and innovative team.
At Xref (XF1) we use a modern tech stack including:
- Data Storage - MySQL, PostgreSQL, Redshift, Elasticsearch, MongoDB, DynamoDB, S3
- Visualisation Tools - Tableau and Salesforce CRM Analytics
- ETL - EC2 + Python + Airflow
- Analytics - Ad hoc Analysis (Tableau), Unsupervised Learning (scikit-learn), Supervised Learning (scikit-learn, Keras, PyTorch), NLP (NLTK, spaCy, LLMs)
Qualifications
What We're Looking For:
- Passion for Data: We're seeking individuals with a genuine interest in data and a willingness to learn.
- Basic Python Skills: Familiarity with Python fundamentals is a plus, but a strong desire to develop these skills is just as valuable.
- Curiosity: A desire to explore data and discover insights that can drive decisions.
- Strong Work Ethic: We value commitment and a strong work ethic as you dive into the world of data engineering.
What We Expect From You:
- “Real life” data may not always be clean and ready to be consumed. You’ll have to massage it and clean it. You may have null values, duplicates and who knows what else. We expect you to be able to explain quality checks and describe how to get your data ready to be consumed by data products.
- Dimensions can be scattered across different tables/objects. You should be able to describe how to acquire information from other tables and objects, keeping in mind that they can be at different granularities, which can compromise the computation of aggregated metrics.
- Aggregations: you should be able to describe how to aggregate or compose metrics when your granular data is not at the level of granularity you need. Common aggregations are sum, max, and so on.
- You should be able to describe how and when to apply CTEs (common table expressions).
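To give a feel for the kind of reasoning described in the points above, here is a minimal sketch using Python's built-in sqlite3 module. The tables (orders, order_items), their columns, and the values are hypothetical and purely illustrative; they are not Xref data or an Xref schema. The sketch runs two simple quality checks (nulls and duplicates), then uses CTEs to clean the data and bring the finer-grained table up to order level before joining, so the aggregated amounts are not double counted.

```python
import sqlite3

# Hypothetical sample data: one exact duplicate order row and one null amount.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, quantity INTEGER);
    INSERT INTO orders VALUES
        (1, 10, 100.0),
        (1, 10, 100.0),
        (2, 10, 50.0),
        (3, 11, NULL);
    INSERT INTO order_items VALUES
        (1, 'A', 2), (1, 'B', 1), (2, 'A', 5), (3, 'C', 4);
""")

# Quality check 1: how many orders are missing an amount?
null_amounts = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
).fetchone()[0]

# Quality check 2: how many rows are exact duplicates of another row?
total_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT order_id, customer_id, amount FROM orders)"
).fetchone()[0]
duplicate_rows = total_rows - distinct_rows

# Naive approach: joining order-level rows to item-level rows repeats each
# order amount once per item (and once more per duplicate), inflating the sum.
naive_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
    WHERE o.customer_id = 10
""").fetchone()[0]

# CTE approach: clean the orders, aggregate items up to order granularity,
# and only then join and aggregate per customer.
query = """
WITH clean_orders AS (
    SELECT DISTINCT order_id, customer_id, amount
    FROM orders
    WHERE amount IS NOT NULL
),
items_per_order AS (
    SELECT order_id, SUM(quantity) AS total_quantity
    FROM order_items
    GROUP BY order_id
)
SELECT o.customer_id,
       SUM(o.amount)         AS total_amount,
       SUM(i.total_quantity) AS total_quantity,
       MAX(o.amount)         AS largest_order
FROM clean_orders AS o
JOIN items_per_order AS i ON i.order_id = o.order_id
GROUP BY o.customer_id
ORDER BY o.customer_id
"""
rows = conn.execute(query).fetchall()

print("null amounts:", null_amounts)
print("duplicate rows:", duplicate_rows)
print("naive total for customer 10:", naive_total)
print("per-customer metrics:", rows)
```

In this toy data the naive join reports 450.0 for customer 10, while the cleaned, order-level version reports 150.0. Dropping rows with a null amount is only one possible choice here; depending on the data product, flagging or imputing them may be more appropriate.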