DotData Eases Feature Engineering with AutoML 2.0

DotData, an AI startup spun off from NEC, recently raised $23M, bringing its total funding to $43M. The startup has seen 300% revenue growth over the last 18 months. The company describes its service as an “end-to-end data science automation” platform, offering AutoML 2.0 with capabilities that accelerate “data preparation, feature engineering, model training, and rapid model deployment.”

CEO Ryohei Fujimaki and a team of PhDs are leading the startup’s efforts to expand the platform’s feature set. AutoML is supposed to simplify working with AI by automating time-consuming tasks such as data preparation and feature engineering. However, as Ryohei indicated, “in our experience, there is still a lot of work you have to do before machine learning” comes into play. One figure often quoted in the AI industry is that data scientists spend 80% of their time on data preparation rather than on actual ML work. This problem has created a big opportunity for the AI startup ecosystem.

Feature Engineering

Ryohei explained the challenge of feature engineering by describing a past project of his for a New Zealand telco. In that project, which focused on churn prediction, Ryohei and another data scientist spent months writing 3,400 queries across 10 tables to create just one feature. A project might need a single feature or hundreds, depending on its requirements. Had the telco project needed dozens of features, it would have required a large team of data scientists. Here is a list the DotData team put together of the tasks involved.

Data Engineering

  • Feature cleansing
      • Missing value imputation
      • Outlier filter
      • Data normalization
  • Feature selection
  • Feature transformation
      • One-hot encoding
      • Polynomial transformation
      • Temporal aggregation
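
To make a few of these tasks concrete, here is a minimal sketch in plain Python of missing value imputation, an outlier filter, data normalization, and one-hot encoding. It is only an illustration, not DotData's implementation: the function names, the median and IQR rules, and the sample values are all assumptions made for this example.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def filter_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [0.0 for _ in values]
    return [(v - smallest) / (largest - smallest) for v in values]


def one_hot_encode(labels):
    """Map each label to a 0/1 vector with one position per distinct label."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical sample data: monthly call minutes and plan names.
monthly_minutes = [120, None, 95, 110, 5000, 130]
plans = ["basic", "premium", "basic", "family"]

cleaned = filter_outliers(impute_missing(monthly_minutes))
scaled = min_max_normalize(cleaned)
encoded = one_hot_encode(plans)
```

Even in this toy form, each step involves a choice (median or mean, which outlier rule, which scaling) that normally falls to a data scientist, which is the kind of decision an AutoML platform aims to automate.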

Machine Learning

  • Algorithm selection
      • Accuracy
      • Interpretability
  • Hyper-parameter search
      • Grid search
      • Bayesian optimization
  • Accuracy validation
      • Hold-out validation
      • Cross-validation
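
As a rough illustration of how grid search, cross-validation, and hold-out validation fit together, the sketch below tunes a toy nearest-neighbor classifier in plain Python. The classifier, the data, and every function name are assumptions made for this example. It does not cover Bayesian optimization, and it is not how DotData's platform works internally.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x (one numeric feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test rows whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(data, k, folds=3):
    """Mean accuracy over `folds` splits; each row is held out exactly once."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        scores.append(accuracy(train, test, k))
    return sum(scores) / folds


def grid_search(data, grid, folds=3):
    """Return the candidate k with the best cross-validated accuracy."""
    return max(grid, key=lambda k: cross_val_accuracy(data, k, folds))


# Hypothetical rows: (support calls per month, churned? 1 = yes, 0 = no).
data = [
    (1, 0), (9, 1), (2, 0), (8, 1),
    (3, 0), (7, 1), (4, 0), (6, 1),
    (1.5, 0), (8.5, 1), (2.5, 0), (7.5, 1),
    (3.5, 0), (6.5, 1), (4.5, 0), (5.5, 1),
]

# Hold-out validation: set a few rows aside and never use them for tuning.
holdout, dev = data[:4], data[4:]

best_k = grid_search(dev, grid=[1, 3, 5])          # tune on dev only
holdout_accuracy = accuracy(dev, holdout, best_k)  # final check on unseen rows
```

The hold-out rows play no part in choosing `best_k`, so the final accuracy figure is not inflated by the tuning step.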

While speaking at the Ai4 conference, Ryohei talked about “the gap between AutoML and real data” and how significant it is. One reason is that data usually sits in multiple tables and different databases within an organization. Collecting “dirty” data from all these sources and combining it into a single table that ML can use is therefore a major problem, not only for technical reasons but also because domain experts are needed in this phase. Who are the domain experts? Busy executives who are running the company.
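
To give a sense of what combining several tables into one ML-ready table involves, here is a small sketch in plain Python. The three tables, their columns, and the 30-day window are hypothetical and far simpler than real enterprise data. The point is only to show the shape of the work: pick a time window, aggregate each source table per customer, and join the results into one flat row.

```python
from collections import defaultdict
from datetime import date

# Hypothetical source tables, each a list of rows.
customers = [
    {"customer_id": 1, "plan": "basic"},
    {"customer_id": 2, "plan": "premium"},
]
calls = [
    {"customer_id": 1, "day": date(2020, 1, 5), "minutes": 30},
    {"customer_id": 1, "day": date(2020, 3, 20), "minutes": 12},
    {"customer_id": 2, "day": date(2020, 3, 25), "minutes": 45},
]
tickets = [
    {"customer_id": 1, "day": date(2020, 3, 28)},
]


def build_feature_table(customers, calls, tickets, as_of, window_days=30):
    """Return one flat row per customer with aggregates over a recent window."""
    minutes = defaultdict(int)
    ticket_count = defaultdict(int)

    # Temporal aggregation: only events in the window before `as_of` count.
    for call in calls:
        if 0 <= (as_of - call["day"]).days < window_days:
            minutes[call["customer_id"]] += call["minutes"]
    for ticket in tickets:
        if 0 <= (as_of - ticket["day"]).days < window_days:
            ticket_count[ticket["customer_id"]] += 1

    # Join the aggregates back onto the customer table.
    return [
        {
            "customer_id": cust["customer_id"],
            "plan": cust["plan"],
            f"minutes_last_{window_days}d": minutes[cust["customer_id"]],
            f"tickets_last_{window_days}d": ticket_count[cust["customer_id"]],
        }
        for cust in customers
    ]


feature_table = build_feature_table(customers, calls, tickets, as_of=date(2020, 3, 31))
```

Each choice here (which tables to join, which window to use, which aggregate to compute) is where domain knowledge comes in, and the number of possible combinations grows quickly as tables and columns are added.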

source: AutoML and Beyond Part 1

DotData has made progress in tackling the feature engineering problem by creating AI that can automate a large chunk of the tasks. With AutoML 2.0, feature engineering that would have taken months now takes days.

In the illustration below, the top section represents the old way of tackling data science projects, from data collection to production. The bottom section represents the modern way of accelerating data science projects.

source: AutoML and Beyond Part 1

DotData’s mission is to “democratize” AI so that data scientists, IT engineers, and business executives can use machine learning in their operations to improve productivity, cut costs, increase revenue, automate redundant tasks, and gain predictive insight. The biggest problem the industry faces is that there aren’t enough data scientists to go around for every organization. Putting AI in the hands of non-data scientists could therefore usher in a new era of machine learning.

Background

  • Company: DotData
  • Founded: 2018
  • HQ: San Mateo, CA
  • # of Employees: 31
  • Raised: $43M
  • Founders: Ryohei Fujimaki (CEO), Yusuke Muraoka (Data Scientist), Masato Asahara (Architect), and Yukitaka Kusumura (Research)
  • Product: Data science automation platform (AutoML)