Intermediate Data Engineer
Sandton, GP, South Africa
Hybrid / Sandton, Johannesburg
Job Purpose
We are seeking a talented and experienced Data Engineer to join our MLOps team, which drives critical business applications. As a key member of the team, you will play a crucial role in designing, building, testing, deploying, and monitoring end-to-end data pipelines for both batch and streaming use cases. You will work closely with data scientists, actuaries, software engineers, and other data engineers to help architect our Client's modern Machine Learning ecosystem.
Areas of responsibility may include, but are not limited to:
Data Pipeline Development:
- Design, build, and maintain ETL pipelines for both batch and streaming use cases.
- Optimize and refactor existing ETL pipelines to improve efficiency, scalability, and cost-effectiveness.
- Build data visualizations and reports.
- Re-architect data pipelines around a modern data stack to support actuarial, machine learning, and AI use cases.
Technology Stack:
- Apply expertise in Python and SQL to data pipeline development.
- Use Linux and shell scripting for system automation.
- Hands-on experience with Docker and container orchestration tools is advantageous.
- Knowledge of Spark is advantageous.
Platforms and Tools:
- Work with ETL tools such as Azure Data Factory, dbt, Airflow, and Step Functions.
- Use Databricks, Kafka, and Spark Streaming for big data processing across multiple data sources.
- Work with both relational and NoSQL databases; knowledge of and experience with high-performance in-memory databases is advantageous.
DevOps and Automation:
- Work with Azure DevOps to automate workflows and collaborate with cross-functional teams.
- Familiarity with Terraform for managing infrastructure as code (IaC) is advantageous.
- Experience with other big data platforms is advantageous.
- Create and maintain documentation of processes, technologies, and code bases.
Collaboration:
- Collaborate closely with data scientists, actuaries, software engineers, and other data engineers to understand and address their data needs.
- Contribute actively to the architecture of our Client's modern Machine Learning data ecosystem.
Personal Attributes and Skills
- Strong proficiency in Python, SQL, and Linux shell scripting.
- Experience with Spark is advantageous.
- Previous exposure to ETL tools, relational and NoSQL databases, and big data platforms; experience with Databricks and Azure Data Factory is highly beneficial.
- Knowledge of DevOps practices and tools; experience with Azure DevOps is highly beneficial.
- Familiarity with Terraform for infrastructure automation.
- Ability to collaborate with cross-functional tech teams as well as business/product teams.
- Ability to architect data pipelines for advanced analytics use cases.
- A willingness to embrace a strong DevOps culture.
- Excellent communication skills.
- Commitment to excellence and high-quality delivery.
- Passion for personal development and growth, with a high learning potential.
Education and Experience
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field. Other qualifications will be considered if accompanied by sufficient experience in data engineering.
- At least 3 years of proven experience as a Data Engineer.
Perks/benefits: Career development