Machine Learning API Comparison

February 10, 2016

Methodology

Creating a prediction requires building a model from existing data. Machine learning therefore has two phases: training and prediction. In the training phase, the algorithm is given a set of input-output examples, "learns" from them, and creates a model. The model is an interpretation of the dataset and of the relationships between the attribute we want to predict and the other attributes. In the prediction phase, the model is applied to new inputs to predict the associated outputs.
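The two phases can be sketched in a few lines of Python. This is only an illustration, not how any of the services below work internally: the `train` and `predict` functions and the sample data are hypothetical, and the "model" is a simple least-squares line fitted to one input attribute.

```python
from statistics import mean


def train(examples):
    """Training phase: fit y = slope * x + intercept to (input, output) pairs."""
    xs = [x for x, _ in examples]
    ys = [y for _, y in examples]
    x_bar, y_bar = mean(xs), mean(ys)
    # Ordinary least squares for a single input attribute.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in examples) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Prediction phase: apply the trained model to a new input."""
    return model["slope"] * x + model["intercept"]


# Hypothetical input-output examples that follow y = 2x + 1.
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
model = train(examples)
prediction = predict(model, 5.0)  # output predicted for an unseen input
```

The hosted APIs follow the same shape at a larger scale: one call (or upload) trains a model on the dataset, and later calls send new inputs to that model.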

Before creating a model, it is important to consider the dataset being used. The most time-consuming part of machine learning is identifying the problem and building the dataset before feeding it into the API. To get predictions with the highest possible accuracy, users should gather as much data as they can around a given context while making as few assumptions as possible about the output.

Both the quantity and the quality of the dataset matter: the data must contain what is needed to predict or classify the target. A poor dataset leads to spurious correlations between the input data and the target value, or to a very high error rate.

Choosing a Machine Learning API

Although Amazon, Google, IBM, and Microsoft lead the growing market for cloud machine learning services, a variety of Machine Learning API options are available, and the best choice depends on the result you want to achieve. The following Machine Learning APIs are compared here:

- Google Prediction API – Cloud-based machine learning and pattern matching tool for upsell opportunity analysis, customer sentiment analysis, churn analysis, spam detection, document classification, purchase prediction, recommendations, intelligent routing and more. The service uses classifiers to make predictions, so users only need a basic programming background and no working knowledge of AI. Reads data from BigQuery and Google Cloud Storage.
- Amazon Machine Learning – A service for building intelligent applications that feature machine learning capabilities such as pattern recognition and prediction. Developers can use Amazon ML APIs to build applications that feature fraud detection, content personalization, document classification, customer churn prediction, and more.
- Microsoft Azure Machine Learning – Provides capabilities such as natural language processing, recommendation engines, pattern recognition, computer vision, and predictive modeling. Azure Machine Learning makes it easy to use predictive models in IoT applications by providing APIs for fraud detection, text analytics, recommendation systems and several other business scenarios. The API is built on the machine learning capabilities available in Microsoft products such as Bing and Xbox.
- BigML – Features anomaly detection, cluster analysis, SunBurst visualization for decision trees, text analysis, and more. The BigML API allows applications to access predictive models and other BigML resources. Using the API, applications can perform CRUD operations on BigML resources using standard HTTP methods. Its "1 Click" feature makes creating predictive models easy. BigML also provides 3 modes of access: a Command Line Interface, a Web Interface and a RESTful API.
- IBM AlchemyAPI – Provides more than a dozen APIs that developers can use to add machine learning-powered features to applications such as sentiment analysis, entity extraction, concept tagging, image tagging, and facial detection/recognition. AlchemyAPI provides nicely designed, comprehensive API documentation that includes code samples, SDKs, demos, and a getting started page.

Below is a feature comparison chart to help distinguish the APIs:

Feature	Google Prediction API	Amazon Machine Learning	MS Azure Machine Learning	BigML	IBM AlchemyAPI
Dataset Max Size	Text file: 2.5 GB; HTTP request: 2 MB	100 GB	10 GB	No limit, depending on credits	No limit
Algorithms	Unknown	Linear	Linear & Nonlinear	Linear & Nonlinear	Linear
Batch Training	Yes	Yes	Yes	Yes	No
Incremental Training	Yes	No	No	Yes	No
Real-Time Training	Yes	Yes	No	No	Yes
C++ Source Code	Yes	Yes	Yes	Yes	Yes
Open Source	Yes	No	No	Yes	No
Model Exporting	No	No	No	Yes	No
Data Visualization	No	Table	Table, Histogram, Stat Summary	Table, Histogram, Stat Summary	Table, Histogram, Stat Summary
Cost	$6.50 + $10/month + Google Cloud Storage fees (negligible)	Data analysis and model building fees: $0.42/hour; batch predictions: $0.10/1,000 predictions, rounded up to the next 1,000; real-time predictions: $0.0001/prediction, rounded up to the nearest penny	$9.99/month, plus $1/hour for model training, $2/compute hour to feed results out to APIs for application integration, plus 50 cents/1,000 API transactions	$6 assuming predictions are made using an offline model; additional $11.50 if using online predictions (through the website or API)	3 packages: Free, Small Business ($250.00) and Basic ($800.00)
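To make the rounding rules in the Amazon Machine Learning cost cell concrete, here is a rough estimator using only the rates quoted in the chart above. The function names and the sample workload are hypothetical, and the code assumes the rounding is applied once to each total; the rates themselves should be checked against current pricing before being relied on.

```python
# Rates taken from the comparison chart above, expressed in cents.
MODEL_BUILDING_CENTS_PER_HOUR = 42   # $0.42/hour
BATCH_CENTS_PER_1000 = 10            # $0.10 per 1,000 predictions


def batch_cost(n_predictions):
    """Batch cost in dollars; volume is rounded up to the next 1,000."""
    blocks = -(-n_predictions // 1000)  # integer ceiling division
    return blocks * BATCH_CENTS_PER_1000 / 100


def realtime_cost(n_predictions):
    """Real-time cost in dollars at $0.0001 each, rounded up to a whole cent."""
    cents = -(-n_predictions // 100)  # 100 predictions cost exactly one cent
    return cents / 100


def estimate(hours, n_batch, n_realtime):
    """Total estimated cost in dollars for one hypothetical workload."""
    total_cents = (
        round(hours * MODEL_BUILDING_CENTS_PER_HOUR)
        + round(batch_cost(n_batch) * 100)
        + round(realtime_cost(n_realtime) * 100)
    )
    return total_cents / 100


# Hypothetical workload: 2 hours of model building,
# 12,500 batch predictions and 250 real-time predictions.
total = estimate(hours=2, n_batch=12_500, n_realtime=250)
```

Working in whole cents avoids floating-point surprises in the rounding steps. The same approach extends to the other columns of the chart once an expected usage profile is known.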

Takeaway

Although Machine Learning APIs are great tools for predictive analysis, they are far from perfect. Results vary depending on the quantity and quality of the data fed into the algorithm. Each of the Machine Learning APIs mentioned above has features targeted at specific scenarios, e.g. image recognition, opportunity analysis, document classification, etc.

Thus, selecting the right Machine Learning API first and foremost requires having a clean set of data that can be interpreted easily by the API. Fortunately, there are great tools such as pandas or OpenRefine for data parsing. Also, the pricing structures vary between the services, so the larger concern should be looking into the details to determine which one will be the cheapest based on expected usage.
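As a small illustration of what "clean" means in practice, the sketch below trims whitespace, drops rows with a missing target value, and removes exact duplicates. It uses only Python's standard `csv` module rather than pandas or OpenRefine so that it runs anywhere; the `clean_rows` helper, the column names, and the sample data are all hypothetical.

```python
import csv
import io


def clean_rows(csv_text, target):
    """Return de-duplicated rows that have a non-empty target value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    cleaned = []
    for row in reader:
        if None in row:  # row has more fields than the header: skip it
            continue
        # Trim stray whitespace; treat missing fields as empty strings.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        if not row.get(target):  # no value for the attribute to predict
            continue
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:  # exact duplicate of an earlier row
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


# Hypothetical churn dataset with a duplicate row and a missing target value.
raw = """age,plan,churned
34, basic ,no
34, basic ,no
51,premium,
27,basic,yes
"""

rows = clean_rows(raw, target="churned")
```

Of the four data rows in the sample, two survive: the duplicate and the row without a `churned` value are dropped before the data would be uploaded to any of the services.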
