Job Description
PLOS is seeking a Data Scientist to contribute to its R&D project. This role involves providing insights into the nature and structure of PLOS's data and content, including published content and internal datasets. The Data Scientist will lead the development of models to improve the processing, access, understanding, and use of this data, working closely with Subject Matter Experts, Product Managers, Software Engineers, and Product Designers.
The Data Scientist will be responsible for the large-scale analysis of PLOS's scholarly content, which includes research articles and associated datasets, as well as line-of-business data and information. This requires working with structured and unstructured data, including a large corpus of scholarly articles, and applying programmatic techniques such as statistical analysis, natural language processing, information retrieval, and machine learning.
Responsibilities:
- Creating and applying machine learning models, statistical analysis, and natural language processing to improve scientific content workflows, enhance discoverability, and support Open Science initiatives.
- Collecting, cleaning, and analyzing large datasets of scientific content and related information from various sources, ensuring data quality and integrity.
- Building and testing predictive models and machine learning algorithms.
- Visualizing and presenting findings in a clear, concise, and compelling manner.
- Working as part of a cross-functional team, contributing insights, models, and code, and deploying production services that improve our use of data.
- Collaborating with editorial, marketing, and product teams, and with colleagues across PLOS, to understand data needs and translate business requirements into analytical solutions that enable new open science capabilities.
- Contributing to the development of data strategies and best practices within the organization, and identifying opportunities for workflow optimization and automation.
- Engaging with the latest research and trends in data science, Open Science, and scholarly publishing.
- Considering the ethical implications of all data techniques as applied to our data.
Requirements:
- Extensive experience in statistical modeling, machine learning, and data mining techniques.
- Proficiency in programming languages such as Python, R, and SQL.
- Strong knowledge of machine learning frameworks and libraries.
- Experience with NLP techniques.
- Demonstrated ability to communicate complex technical findings clearly and effectively.
- Strong analytical and problem-solving skills.
- Experience working with large datasets and database systems.
- Familiarity with the scientific research environment, scholarly literature, and open science principles is an advantage.
- Ability to develop hypotheses based on quantitative and qualitative evidence.
- Experience following solid software development practices, such as version control with Git and continuous integration (CI).
- Ability to work effectively both independently and collaboratively within a remote, agile team environment.
- A Master's degree in a relevant field is preferred.
- Relevant work experience in a data science role within scientific publishing, research, or a related field is desirable.
PLOS offers:
- 401(k) with employer match
- Employer-sponsored health, dental, and vision insurance (dental and vision 100% employer-paid)
- Paid vacation, 11 public holidays, and sick leave
- Parental leave
- Your birthday and three winter holiday days off
- Short-term and long-term disability insurance
- Two days of paid time off per year for volunteering
- Fully remote work environment, with a home office stipend on joining