Intern, Data Engineer
London City, London, GB
Copyright Clearance Center
Collective licensing pioneer CCC helps you integrate, access, and share information through licensing, content, software and professional services.
Job Overview:
We are looking for a Data Engineer Intern to work with the Architecture team on internal initiatives. These initiatives encompass exploratory work, analytics, and nascent services and products. The Data Engineer Intern will be allocated to one of these initiatives.
Our analytics stack includes Spark / PySpark for bulk processing, Zeppelin notebooks and Airflow for process orchestration and data profiling, graph and relational databases for storage, R for visualization, and a variety of techniques for statistical analyses and machine learning.
The individual must possess oral and written English communication skills and will gain experience of working with a cross-functional engineering team.
Experience with AWS is a plus.
Primary Responsibilities:
- Work with product owners and technical staff to integrate, profile and analyze internal and external data sets to provide insight into the viability and quality of potential and existing CCC data offerings.
- Participate as a team member in the analysis, development, implementation, testing and documentation of data engineering projects, setting and meeting realistic timelines and deadlines.
- Ensure that design and code reviews occur in a timely manner and that systems are documented.
Requirements:
- Python and/or R programming
- Experience with databases, querying, reporting and ETL
- Practiced in working with multiple data sets, creating combined views, measuring data quality, and applying insights to business problems
- Experience working with APIs to query and obtain data
- An understanding of fuzzy matching, entity matching/deduplication would be beneficial
- The ability to track and evaluate experiments, communicate findings and propose next steps based on the outcomes
- Familiar with GitHub for version control, Jira for task/issue tracking, and structured approaches to working on data-centric tasks (such as CRISP-DM)
- Ability to work both independently and collaboratively, subject to peer review
- Capable of setting and meeting deadlines
- Excellent analytical, interpretative and interpersonal skills, backed up by the ability to convey meaningful information through verbal and written communication
- May be accountable for other results and activities as assigned.
Tags: Airflow APIs Architecture AWS Data quality Engineering ETL GitHub Jira Machine Learning PySpark Python R RDBMS Spark Statistics Testing