Data Scientists for the Princeton Catalysis Initiative - Schmidt Data X Project

Princeton University, Princeton
Apr 22, 2019
May 22, 2019
Princeton University - Department of Chemistry:

The Princeton Catalysis Initiative at Princeton University seeks two Data Scientists in three-year term positions, at the rank of Professional Specialist, as part of the recently announced Schmidt Data X project. The Princeton Catalysis Initiative is spearheaded by the Department of Chemistry, with strong involvement from other departments including Chemical Engineering, Molecular Biology, and Computer Science. With the aid of Data Scientists, the catalysis initiative at Princeton can lead in building tools for data collection and curation, descriptions of chemical space, and predictive and interpretable algorithms, thereby accelerating the pace of molecular discovery essential to addressing outstanding problems of societal concern.


Specific responsibilities of the two Data Scientists in this group include a combination of the following:
  • Provide expertise to the research group on evolving statistical and machine learning methods and applications necessary to exploit data for predictive purposes, including working on high-throughput experimentation to optimize algorithms.
  • Build user-friendly interfaces and new machine learning platforms to directly serve the chemical community and make advances in computer-assisted synthesis accessible, including the development of Web-based applications that can enable chemists to identify optimal conditions for valuable synthetic transformations in advance of experimentation.
  • Evaluate whether machine learning tools that employ previously underutilized dense information encoded in three-dimensional chemical structure can be combined with heterogeneous information from different publication and patent databases to enable predictions of synthesis routes or reaction conditions.
  • Create software to interface and extract information on reaction conditions and outcomes from electronic notebook platforms utilized on campus and in the chemical industry.

While the specific responsibilities will vary by research project, all Data Scientists will create opportunities to educate, train, convene, and support a broad community of researchers on campus in how to best leverage data science in their research and teaching.  They will also contribute to new graduate-level courses on data science as well as mini courses, workshops, and office hours.  In all three areas, the Data Scientists must demonstrate expertise in researching, designing, and implementing algorithms and techniques to exploit the connections between data analysis/machine learning and the fundamental research questions explored by each group.


  • PhD required in computer science, data/computational science, or a related disciplinary field, or an equivalent combination of educational training and relevant experience
  • 5 to 9+ years working in a data analysis/scientific computing role required
  • Knowledge of mathematical modeling and computational methods
  • Demonstrated experience applying artificial intelligence and machine learning concepts and tools to research questions and projects, including modeling and simulation work
  • Strong coding and algorithm prototyping skills, as well as the ability to explain and document this work in accessible ways; expert knowledge of a general-purpose, dynamically typed, object-oriented language such as Ruby or Python
  • Proficiency in SQL, database design, and building data-driven web applications
  • Experience excelling in a highly collaborative, multi-disciplinary research environment
  • Experience determining strategy and executing interdisciplinary projects strongly preferred
  • Experience as a Principal Investigator is preferred
  • Demonstrated innovative technical achievements and/or extensive managerial experience preferred

These positions are subject to the University's background check policy. Applicants must apply online and submit a cover letter, CV, and contact information for 3 references.

