Faculty Candidate Seminar

DeepDive: A Data Management System for Machine Learning Workloads

Ce Zhang, Postdoc, Stanford

Many pressing questions in science are macroscopic: they require
scientists to consult information expressed in a wide range of
resources, many of which are not organized in a structured relational
form. Knowledge base construction (KBC) is the process of
populating a knowledge base, i.e., a relational database storing
factual information, from unstructured inputs. KBC holds the promise of
facilitating a range of macroscopic sciences by making information
accessible to scientists. One key challenge in building a high-quality
KBC system is that developers must often deal with data that are both
diverse in type and large in size. Further complicating the scenario
is that these data need to be manipulated by both relational operations and
state-of-the-art machine-learning techniques.

My research focuses on building a data management system for machine
learning workloads, with the goal of simplifying the complex process of building
KBC systems. The system I build is called DeepDive; its ultimate goal is
to allow scientists to build KBC systems, and machine learning systems in general,
by declaratively specifying domain knowledge without worrying about any
algorithmic, performance, or scalability issues. DeepDive has been
used by people without machine learning expertise in a number of domains,
from paleobiology to genomics to anti-human trafficking. In this talk, I
will describe the DeepDive framework, its applications, and the underlying
techniques we developed
to speed up a range of machine learning workloads by up to two orders of magnitude.

Ce is a postdoctoral researcher in Computer Science at Stanford University. He is working with Christopher Ré on data management and database systems. With the indispensable help of many collaborators, his PhD work produced the system DeepDive, a trained data system for automatic knowledge-base construction. As part of his PhD thesis, he led the research efforts that won the 2014 SIGMOD Best Paper Award and were invited to the "Best of VLDB 2015" special issue. PaleoDeepDive, a machine-reading system for paleontologists, was featured in Nature magazine, and he also led the Stanford team that produced the top-performing machine-reading system for the TAC-KBP 2014 slot-filling evaluations using DeepDive. Ce obtained his PhD from the University of Wisconsin-Madison, advised by Christopher Ré, and his Bachelor of Science degree from Peking University, advised by Bin Cui.

Sponsored by

CSE