Comparing OSS Data Viz solutions
X-posting from https://www.linkedin.com/pulse/comparing-oss-data-viz-solutions-micheal-benedict-xvqfc
Disclaimer: This is an informal post, a follow-up to this question. It captures my findings while looking for an OSS solution (comparing Metabase, Superset, Redash and Querybook) that fits my specific needs. Note: I was an avid Tableau user in the past; then at Pinterest I had a chance to use Querybook/Presto, and more recently Redash (which was acquired by Databricks and integrated into its product).
TL;DR
The best platform depends on your specific needs and priorities:
- For simplicity and quick insights: Choose Metabase.
- For complex visualizations and customization: Choose Superset.
- For SQL-focused workflows and automation: Choose Redash.
- For ad-hoc querying, collaboration, and Presto integration: Consider Querybook.
I ended up going with Redash.
How did I evaluate these tools?
My evaluation method wasn't scientific. It is based on firsthand experience and a subjective review of six areas. Each tool received a score per area (higher is better) reflecting its features, ease of use, and big data handling capabilities. One other thing to add: this is for a personal project that uses Google BigQuery as its primary data warehouse.
- Charts and Visualizations: How many different ways can you show your data? This covers the range and diversity of chart types available to represent data effectively.
- Query Editors: How easy is it to write and edit queries?
- Backend Query Processing: Compatibility with different database systems and the ability to handle data volume and complexity.
- Dashboard Flexibility: Can you customize your dashboards and are updates smooth?
- Scheduled Queries and Automation: Can you schedule things and keep your data fresh automatically?
- Setup: How much work is it to get things up and running?
Results
1. Charting Your Course: Visualizations Offered
- Metabase (4): Offers a diverse range of charts including bar, line, pie, scatter plots, maps, funnels, and more, catering to various data representation needs.
- Superset (5): Boasts an extensive collection of visualizations, covering everything from basic charts to heatmaps, treemaps, box plots, and various graph types, providing immense flexibility for complex data exploration.
- Redash (3): Focuses on core chart types like bar, line, pie, and scatter plots, offering additional visualizations through plugins. Its strength lies in clear and concise data representation.
- Querybook (2): Primarily geared towards tabular data and excels in presenting query results in a clear, structured format. Built-in visualizations are limited, but integration with Jupyter notebooks allows for more advanced options.
2. Querying with Ease: Editors and Interfaces
- Metabase (3): Provides a user-friendly GUI query builder, but it felt very limited. I personally didn't find it useful and ended up going back to the SQL editor. Saving queries and using them in dashboards was OK.
- Superset (4): Features an SQL editor with helpful functionalities like code highlighting and auto-completion, making writing and editing queries efficient.
- Redash (4): Champions a SQL-centric approach with a powerful editor and snippets, catering to users comfortable with SQL and seeking a streamlined workflow.
- Querybook (3): Offers a notebook-style interface with SQL cells, allowing for iterative query development and exploration. It also includes execution history.
3. Backend Connections: Integrating with Data
- Metabase (4): Supports a wide array of databases including MySQL, Postgres, BigQuery, Snowflake, and more, ensuring compatibility with diverse data environments. Setup with BigQuery was very easy.
- Superset (5): Similar to Metabase in database support, with additional options like Druid and Kylin, catering to specific data processing needs.
- Redash (4): Connects to various databases like MySQL, Postgres, Redshift, BigQuery, and others, providing flexibility for different data storage solutions. Setup with BigQuery was OK, though I ran into an issue setting the right "Processing Location".
- Querybook (2): Primarily designed for Presto; it can be extended, but doing so is non-trivial.
4. Dashboards: Building a Story
- Metabase (4): Enables drag-and-drop dashboard creation with filtering and drill-down capabilities, allowing users to explore data dynamically. Updates are generally reliable, ensuring stability.
- Superset (4): Offers highly customizable dashboards with interactive elements and extensive layout options. However, updates can sometimes introduce breaking changes requiring adjustments.
- Redash (3): Provides dashboards with essential features, but its flexibility is less pronounced compared to Metabase and Superset. Updates are generally stable, minimizing disruption.
- Querybook (1): Not focused on dashboards, emphasizing ad-hoc querying and analysis.
5. Automation: Keeping Your Data Fresh
- Metabase (3): Allows scheduling queries and sending results via email or Slack, along with setting alerts based on data thresholds.
- Superset (3): Offers scheduled reports and alerts to keep dashboards updated with fresh data and insights.
- Redash (4): Excels in scheduling queries and setting up alerts based on various conditions, making it ideal for automated data workflows.
- Querybook (2): While not directly focused on scheduled updates, its notebook-style interface and collaboration features facilitate iterative data exploration and analysis.
6. Setup: Getting Up and Running
- Metabase (5): Offers the simplest setup, requiring just Java and offering easy Docker or direct JAR file execution. Cloud deployments are straightforward with pre-built images. I could run it on a micro instance.
- Superset (3): Involves setting up a Python environment with several libraries. It needs a beefy machine.
- Redash (3): Similar to Superset: it needs a Python environment and offers Docker or manual configuration with a web server. It also needs a beefy machine; I ended up using an N2 instance on Google Cloud.
- Querybook (2): Relies on Docker and docker-compose, demanding familiarity with these tools. Configuration involves environment variables and manually setting up database connections. A non-trivial amount of configuration is required to run it properly.
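As a rough cross-check, the per-area scores above can be tallied per tool. The sketch below simply transcribes those scores and sums them, with equal weights by default; the `weighted_totals` helper and the example weights are hypothetical, for illustration only, and not part of how the scores were originally produced.

```python
# Scores transcribed from the six result sections above (higher is better).
SCORES = {
    "Charts and Visualizations": {"Metabase": 4, "Superset": 5, "Redash": 3, "Querybook": 2},
    "Query Editors": {"Metabase": 3, "Superset": 4, "Redash": 4, "Querybook": 3},
    "Backend Query Processing": {"Metabase": 4, "Superset": 5, "Redash": 4, "Querybook": 2},
    "Dashboard Flexibility": {"Metabase": 4, "Superset": 4, "Redash": 3, "Querybook": 1},
    "Scheduled Queries and Automation": {"Metabase": 3, "Superset": 3, "Redash": 4, "Querybook": 2},
    "Setup": {"Metabase": 5, "Superset": 3, "Redash": 3, "Querybook": 2},
}


def weighted_totals(scores, weights=None):
    """Hypothetical helper: sum each tool's scores, multiplying each
    category by an optional weight (categories not listed default to 1)."""
    weights = weights or {}
    totals = {}
    for category, by_tool in scores.items():
        w = weights.get(category, 1)
        for tool, score in by_tool.items():
            totals[tool] = totals.get(tool, 0) + w * score
    return totals


if __name__ == "__main__":
    # Equal weights: a plain sum of the scores above.
    unweighted = weighted_totals(SCORES)
    print(sorted(unweighted.items(), key=lambda kv: kv[1], reverse=True))

    # Hypothetical weights for a SQL-first, automation-heavy workflow.
    sql_first = weighted_totals(
        SCORES, {"Query Editors": 2, "Scheduled Queries and Automation": 2}
    )
    print(sorted(sql_first.items(), key=lambda kv: kv[1], reverse=True))
```

An unweighted sum gives Superset 24, Metabase 23, Redash 21 and Querybook 12, so the totals alone don't settle the choice; as the TL;DR says, the right pick depends on which areas matter most for your workflow, which is what the optional weights are meant to express.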