Lead Data Engineer | Codersbrain
Job Description
Job Summary
We are looking for a highly skilled Lead Data Engineer with expertise in Azure or AWS and Databricks to join our team in Bangalore on a hybrid work model. In this role, you will lead the design, development, and implementation of scalable, secure, and efficient data engineering solutions, delivering robust data processing pipelines. You will collaborate closely with data scientists, analysts, and business stakeholders to maintain data integrity, mentor junior team members, and enforce best practices in data governance and security.
Responsibilities
- Design & Development: Lead the design and development of scalable and reliable data pipelines using Azure Data Services or AWS Data Services and Databricks.
- ETL/ELT Processes: Architect, implement, and optimize ETL/ELT processes for processing large volumes of structured and unstructured data.
- Data Infrastructure: Develop and maintain data models, data lakes, and data warehouses to support analytics and business intelligence initiatives.
- Collaboration: Work with data scientists, analysts, and business stakeholders to ensure data availability and accuracy.
- Data Governance & Security: Implement and enforce best practices related to data governance, security, and compliance.
- Performance Optimization: Optimize and monitor the performance of data processing frameworks (e.g., Apache Spark, Databricks).
- Workflow Automation: Automate and orchestrate data workflows with tools such as Apache Airflow, Azure Data Factory, AWS Step Functions, or AWS Glue.
- Mentorship: Guide and mentor junior data engineers in modern data engineering techniques and best practices.
Qualifications
- Experience: 5+ years of experience in data engineering.
- Cloud Data Services: Strong expertise in Azure Data Services (Azure Data Lake, Azure Synapse, Azure Data Factory) or AWS Data Services (S3, Redshift, Glue, Lambda, Step Functions, EMR).
- Databricks & Spark: Proficiency in Databricks and practical experience with Apache Spark for large-scale data processing.
- Programming: Strong programming skills in Python.
- Database Skills: Experience working with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, DynamoDB, Azure Cosmos DB).
- Data Governance & Compliance: Solid understanding of data governance, security, and compliance standards (such as GDPR and HIPAA).
- Real-Time Streaming: Experience with real-time streaming technologies like Kafka, Kinesis, or Event Hubs is a plus.
- Problem-Solving: Excellent analytical and problem-solving skills, with the ability to thrive in a fast-paced, agile environment.
Preferred Skills
- Experience with machine learning pipelines and MLOps.
- Familiarity with data visualization and business intelligence tools such as Power BI, Tableau, or Looker.
- Strong leadership and communication skills to drive best practices across the team.
Experience
- 5+ years of professional experience in data engineering, with a proven track record of building and managing large-scale data processing pipelines.
Environment
- Location: Bangalore (Hybrid work model)
- Collaborative and agile work environment with both remote and on-site presence to facilitate teamwork and flexibility.
Salary
- ₹22 to 24 LPA (lakhs per annum)