Machine Learning Engineer | Codersbrain
Job Description
Machine Learning Engineer
Company Overview
Fusemachines is a 10+ year old AI company, dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, Ph.D., an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. With a robust presence in four countries and a dedicated team of over 400 full-time employees, we are committed to fostering AI transformation journeys for businesses worldwide. At Fusemachines, we not only bridge the gap between AI advancement and its global impact but also strive to deliver the most advanced technology solutions to the world.
Job Summary
The Machine Learning Engineer will work across data engineering and machine learning tasks, from data collection, cleaning, and preprocessing to training models and deploying them to production. The ideal candidate will possess strong technical and interpersonal skills, along with specific machine learning skills. Collaborating with cross-functional teams to reach the product milestones agreed with stakeholders is a key expectation of this role.
Responsibilities
- Understand business objectives and develop models that help achieve them, along with metrics to track progress.
- Analyze the machine learning algorithms that could solve a given problem and rank them by their probability of success.
- Explore and visualize data to understand it and identify differences in data distribution affecting model performance.
- Verify data quality and, where needed, improve it through data cleaning.
- Define validation strategies and the preprocessing or feature engineering to apply to each dataset.
- Establish data augmentation pipelines and find available datasets for training.
- Train models and tune their hyperparameters.
- Analyze model errors and design strategies to overcome them.
- Deploy models to production.
- Work independently and collaboratively on a multi-disciplinary project team in an Agile development environment.
- Engage in design, development, and testing activities for big data products.
- Provide feedback to development teams on code and architecture optimization.
Qualifications
- Hands-on experience developing in Python and PySpark.
- Experience with Apache Spark is preferred.
- Strong foundation in statistics and ability to utilize statistical methods to derive insights from data.
- Familiarity with Azure Databricks or similar platforms.
- Proficiency with deep learning frameworks such as TensorFlow, PyTorch, or Keras.
- Skilled in machine learning libraries such as scikit-learn and pandas.
- Expertise in visualizing and manipulating large datasets.
- Knowledge of selecting hardware to run machine learning models with the required latency.
- Familiarity with Azure services.
- Proven experience with Continuous Integration/Continuous Deployment (CI/CD).
- Experience with version control systems, such as GitHub or Bitbucket.
- Familiarity with Linux OS and concepts.
- Strong written and verbal communication skills.
- Self-motivated and able to work well in a team.
Preferred Skills
- Knowledge of additional machine learning frameworks or libraries would be a plus.
- Experience with big data tools and platforms beyond those listed above.
Experience
- Previous experience in data engineering or related fields is required.
Environment
- The typical work setting is expected to be collaborative, possibly in a hybrid model, with Agile development practices. Specific location details are not provided.