Job Description

Xometry is seeking a Principal Data & ML Scientist to join its Generative AI team. The role involves providing technical leadership, contributing to strategic planning, and developing generative AI models and large language models (LLMs). The Principal Data & ML Scientist will focus on multimodal document processing, extracting structured data from technical drawings, and developing innovative text- and image-based data processing solutions.

The ideal candidate will collaborate with cross-functional teams to align AI solutions with business needs and mentor team members, ensuring industry standards are followed in AI and ML development. They will also stay updated with the latest research in generative AI and deep learning to incorporate best practices into model development.

Responsibilities:

  • Provide technical leadership to the Generative AI team.
  • Lead strategic planning and roadmap development for generative AI initiatives.
  • Develop and deploy generative AI models and large language models (LLMs).
  • Lead the exploration and development of innovative text- and image-based data processing solutions.
  • Design and implement efficient workflows for data preparation, cleaning, and augmentation.
  • Utilize cloud platforms (e.g., Amazon Web Services) for large-scale data processing, model training, and deployment.
  • Collaborate with cross-functional teams, including engineering and business teams.
  • Mentor and guide team members on advanced machine learning techniques.
  • Continuously experiment and iterate on model performance.
  • Stay updated with the latest research in generative AI, deep learning, and multimodal data processing.

Requirements:

  • A bachelor’s degree is required, but an advanced degree (M.S. or PhD) in computer science, machine learning, AI, or a related field is highly preferred.
  • 7+ years of experience in data science and machine learning.
  • Expertise in large-scale language and vision models (e.g., Transformers, GPT, VLMs).
  • Experience with multimodal data processing (e.g., combining text, image, and 3D data).
  • Proficiency in Python, including key libraries such as PyTorch, TensorFlow, pandas, and NumPy.
  • Strong background in probability, statistics, and optimization techniques relevant to generative modeling.
  • Familiarity with cloud computing resources and tools for model training and deployment (e.g., AWS SageMaker).
  • Familiarity with software engineering principles, including version control, reproducibility, and continuous integration.
  • Experience in the manufacturing, supply chain, or similar industries is a plus.