Machine learning platforms: Dataiku vs. Alteryx vs. Sagemaker vs. Datarobot vs. Databricks

What are machine learning platforms?

Code is only a small part of machine learning solutions. In order to manage a machine learning solution, a number of different tools and services must be used, including:

  • Computing capacity to prepare data and train machine learning models;
  • Data management software to clean, edit, track and secure data;
  • Software development tools to write and manage code;
  • Dashboarding tools to interact with the solution and present results.

The goal of machine learning platforms is to integrate these four components into a single overall solution.

However, not all machine learning services can be directly compared with one another. Tools like AWS Sagemaker help to make machine learning solutions less complex, but you still need developers with programming experience on the team. Such tools concentrate primarily on providing scalable computing capacity. Vendors like Alteryx tend to focus on presentation instead. Thanks to a no-code user interface, these tools let you use simple machine learning functions even without much programming experience.

In general, these platforms often contain the dashboarding and / or workflow orchestration tools that we have presented and compared in previous articles.

Tools like Alteryx can therefore be seen as a higher level of abstraction: they bundle and simplify several functions, but offer less flexibility than more specialized tools.

We compared the most popular platforms so you can make an informed decision about which is best for you.

What should I choose?

As always: "It depends" - but if you are looking for a quick answer, you can use the following as a rough guide:

  • Dataiku is suitable if you are not already working with your own development, orchestration and machine learning tools, but are looking for a ready-made all-in-one solution. The team should have some basic technical knowledge, but does not have to consist mainly of software developers.
  • Alteryx if you focus on marketing and analytics and want access to machine learning and data management without writing code yourself.
  • Knime is a cheaper, not quite as sophisticated, but more flexible alternative to Alteryx.
  • Sagemaker if your team has technical knowledge but would like to abstract away the machine learning infrastructure and can accept some loss of flexibility.
  • Datarobot if you have tabular data and are looking for the easiest way to quickly train ML models.
  • Databricks if you are already working with Apache Spark and are looking for an easier way to use this platform.

If you are just considering which approach is best for you and your team, please book a free (and independent) consultation with us.

Short overview

Before we turn to the detailed comparisons, here is a brief overview of the individual platforms.

  • Dataiku is a cross-platform desktop application that covers a wide range of tools, such as notebooks (similar to Jupyter Notebook), workflow management (similar to Apache Airflow) and automated machine learning. In general, Dataiku aims to replace many of the tools you already use rather than to integrate with them.
  • Alteryx is an analysis-oriented platform that is more comparable to dashboarding solutions such as Tableau, but also provides machine learning components. The focus is on no-code alternatives to machine learning, advanced analytics, and other applications that typically require code.
  • Knime is comparable to Alteryx, but has an open source option for self-hosting and is cheaper in the paid version. It contains machine learning components and modular analysis integrations.
  • Datarobot concentrates on a small part of machine learning solutions: Automated training of machine learning models. You upload data in tabular form and the tool automatically finds a suitable model with parameters to predict certain columns.
  • Databricks is primarily an Apache Spark environment that can be integrated with tools such as MLFlow for workflow orchestration.
  • Sagemaker focuses on simplifying machine learning infrastructure for training and serving models. It now also includes Sagemaker Autopilot (similar to Datarobot) and Sagemaker Studio (similar to Dataiku).

We rated each of these platforms based on the following criteria:

  • Maturity: How long has the platform been around and how reliable is it?
  • Popularity: How often is it searched for on Google?
  • Breadth: Does the tool have a specific focus or is it more broadly based?

These ratings are not strict benchmarks but a rough overview of where the tools are similar and where they differ. You can find more details in our head-to-head comparisons:

Dataiku vs. Alteryx

Dataiku and Alteryx are both machine learning platforms, but Dataiku mainly focuses on technical aspects, while Alteryx focuses on analytics and presentation.

Dataiku includes Data Science Studio (DSS), a cross-platform desktop application that includes a notebook (similar to Jupyter Notebook) for developers to write code and a workflow orchestration tool (similar to Apache Airflow). While it offers a few graphical interfaces, the focus is clearly on developing code. In contrast, Alteryx offers a better dashboarding experience, but less flexibility: in Alteryx, you assemble no-code machine learning components through a graphical user interface.

  • Choose Dataiku if your team is tech savvy and you want data scientists, developers, and analysts to use the same tool.
  • Choose Alteryx if your team is less technically experienced and you want to carry out sophisticated analyses with ready-made components.

Dataiku vs. Databricks

Both Dataiku and Databricks aim to enable data scientists, developers and analysts to use a unified platform. Dataiku relies on its own software, while Databricks integrates existing tools. Databricks forms the central interface to connect Apache Spark, AWS or Azure, and MLFlow with each other.

Dataiku includes integrations for machine learning libraries such as Tensorflow and an AutoML interface that can perform machine learning on data in table format.

  • Choose Dataiku if you would like to manage your own infrastructure, but need a platform for your machine learning pipelines and analyses.
  • Choose Databricks if you are looking for a platform that manages your infrastructure for you and you are comfortable with Apache Spark.

Dataiku vs. Datarobot

Datarobot and Dataiku both offer AutoML: a no-code machine learning platform on which you upload your data as a table and select a target variable; the platform then selects a suitable machine learning model for predicting the target variable and optimizes it accordingly.

This AutoML function is the central component of Datarobot. Dataiku, on the other hand, offers a lot more: a comprehensive selection of data science tools, including an IDE, a task orchestrator and visualization tools.

  • Choose Datarobot if you already have cleaned data sets and want to use predefined machine learning models for data analysis without the need for developer knowledge.
  • Choose Dataiku if you are looking for something more flexible with which you can develop your own bespoke machine learning models.
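
The AutoML workflow described in this section (a table goes in, a target column is chosen, the best-scoring model comes out) can be sketched in plain Python. This is only an illustration of the idea, not how Datarobot or Dataiku implement it: the two candidate model types, the column names and the synthetic data are all assumptions made up for this example, and real platforms search far larger model libraries.

```python
import math
import random
from collections import Counter


class MajorityClass:
    """Baseline: always predicts the most common label seen in training."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class KNearest:
    """k-nearest-neighbours classifier using Euclidean distance."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Sort training points by distance to the query row.
            nearest = sorted(zip(self.X, self.y), key=lambda p: math.dist(row, p[0]))
            votes = Counter(label for _, label in nearest[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


def cross_val_accuracy(make_model, X, y, folds=5):
    """Mean accuracy over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        val_idx = set(range(i, len(X), folds))  # every `folds`-th row
        X_train = [x for j, x in enumerate(X) if j not in val_idx]
        y_train = [t for j, t in enumerate(y) if j not in val_idx]
        X_val = [X[j] for j in sorted(val_idx)]
        y_val = [y[j] for j in sorted(val_idx)]
        preds = make_model().fit(X_train, y_train).predict(X_val)
        scores.append(sum(p == t for p, t in zip(preds, y_val)) / len(y_val))
    return sum(scores) / len(scores)


def auto_select(rows, target):
    """Score every candidate on the table and return the best one."""
    features = [c for c in rows[0] if c != target]
    X = [[float(r[c]) for c in features] for r in rows]
    y = [r[target] for r in rows]
    # Hypothetical search space: one baseline plus three values of k.
    candidates = {"majority": MajorityClass}
    for k in (1, 3, 5):
        candidates[f"knn(k={k})"] = lambda k=k: KNearest(k)
    scores = {name: cross_val_accuracy(make, X, y) for name, make in candidates.items()}
    return max(scores, key=scores.get), scores


# Synthetic table with made-up column names; "churned" is the target.
random.seed(0)
rows = []
for i in range(40):
    churned = i % 2 == 0
    center = 5.0 if churned else 0.0
    rows.append({
        "monthly_usage": center + random.gauss(0, 0.5),
        "support_tickets": center + random.gauss(0, 0.5),
        "churned": "yes" if churned else "no",
    })

best, scores = auto_select(rows, target="churned")
print(best, scores)
```

The point of the sketch is the division of labour: the user supplies only the table and the name of the target column, while model choice and parameter tuning happen inside `auto_select`. Writing your own models means replacing exactly that part, which is the flexibility the comparison above refers to.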

Dataiku vs. Sagemaker

Dataiku focuses on providing software development and analysis tools for data scientists and developers; Sagemaker focuses more on the underlying infrastructure: the servers that run and provide these models. Dataiku offers integration with Sagemaker, but Sagemaker also provides tools that are in direct competition with Dataiku: Sagemaker Studio and Sagemaker Autopilot.

You can use these platforms in combination, with Dataiku for developing and managing your models and Sagemaker for training and deployment. Alternatively, you can use Sagemaker for all of these functions.

  • Choose Dataiku if you are looking for a more mature platform with a focus on user interface and user experience that can be used by developers as well as analysts.
  • Choose Sagemaker if your team has more developers than analysts, you need more flexibility and you do not mind interfaces that are not yet fully mature.

Alteryx vs. Datarobot

Alteryx is a more comprehensive solution that offers analysis, data management and dashboarding components as well as no-code machine learning. Datarobot has a more limited focus on just no-code machine learning.

  • Choose Alteryx if your focus is on data and analytics and you are looking for a platform for the entire organization.
  • Choose Datarobot if you want to analyze an existing data set with predefined machine learning models.

Alteryx vs. Knime

Alteryx and Knime are very similar tools that largely overlap in their capabilities. Alteryx is more commercial and only offers a paid platform, while Knime also provides a free, open source option. Knime lacks a bit of Alteryx's fine-tuning, but offers more flexibility.

  • Choose Alteryx if you have more business analysts than developers on your team and want to create sophisticated reports and dashboards.
  • Choose Knime if you are looking for an inexpensive option and flexibility is more important to you than presentation.

Sagemaker vs. Databricks

Sagemaker offers the possibility to deploy and use machine learning models on AWS infrastructure using various machine learning frameworks. Databricks lets you run Jupyter notebooks on Apache Spark clusters (which in turn run on AWS).

Databricks focuses on analyzing large amounts of data by letting your code run on compute clusters. Sagemaker focuses on tracking experiments and deploying models. With both tools, data scientists can write code in their familiar notebook environment and execute it on a scalable infrastructure.

  • Choose Sagemaker if you are looking for a universal platform to develop, train and use your machine learning models.
  • Choose Databricks if you want to specifically use Apache Spark and MLFlow to manage your machine learning pipeline.

Sagemaker vs. Datarobot

Sagemaker includes Sagemaker Autopilot, which is very similar to Datarobot. With both tools you can upload a simple data set in tabular form and select a target variable. The platform automatically runs experiments and selects the most suitable machine learning model for the data.

Since this so-called "AutoML" is the core focus of Datarobot, Datarobot has a more comprehensive model library than Sagemaker. Sagemaker lags a bit behind Datarobot in this special application, but Sagemaker is equipped with more functions overall (e.g. to develop models or track experiments).

  • Choose Sagemaker when you need a more flexible platform that includes AutoML.
  • Choose Datarobot if you're looking for a simpler platform specializing in AutoML and with more pre-built models.

Notes at the end

If you look at the websites of these platforms, you will quickly come across claims about how powerful and easy to use the respective tools are. However, it is important to remember that these services all try to solve fairly complex problems. Getting started on these platforms is therefore in most cases a long and costly process that you can hardly avoid.

All of these tools and services aim to provide a shortcut for data processing, machine learning, and analytics. But that also means that they can be very restrictive in some areas. If machine learning is a central component of your company, then building your own pipeline is often still the best option. There are excellent, mature, open source platforms that you can use to build a completely bespoke solution.

The machine learning platforms presented here sell the idea that people without developer experience can build machine learning solutions. In practice, however, it is usually experienced machine learning developers who use these tools and services most effectively. Professionals with a sound understanding of the underlying concepts can use ML platforms as a shortcut to create proofs of concept, because they understand the process that the platform is designed to simplify and how to use it properly. But those lacking this experience often find that ML platforms are too limited to meet their exact needs. At the same time, the platforms are still too complicated to be used properly by non-technical team members.
