GOBY: An Enterprise Benchmark for Data Integration

MIT, Intel Labs, Amazon, Technical University of Munich, University of Washington

Overview

What is GOBY?
GOBY is a benchmark dataset for evaluating data integration techniques on enterprise data. It was derived from a real-world production workload in the event promotion and marketing domain, compiled around 2017. Unlike public benchmarks, GOBY is built from private enterprise data, which makes it more representative of the challenges enterprises face.

How was it collected?
The data was collected over several years using more than 1,000 wrappers developed by professionals. These wrappers converted web pages and APIs into relational tables, which form the source tables of the benchmark.

What does it represent?
GOBY contains over 4 million rows of event data. It includes detailed semantic labels such as event locations, organizers, and other metadata relevant to the domain, and it reflects the structural and semantic complexity often found in enterprise datasets.

What does it contain?
- Source Tables: Nearly 1,200 source tables, each generated from a wrapper.
- Semantic Types: A hierarchy of semantic types developed by domain experts.
- Universal Schema: A unified schema combining all source tables.
- Statistics:
   - 4.04 million rows
   - 23,203 columns
   - Average 3,405 rows per table
   - Average 20 columns per table
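
The averages can be cross-checked against the totals with a little arithmetic. The sketch below uses only the figures listed above; the table count it derives is an estimate implied by the rounded averages, not a number read from the dataset itself.

```python
# Figures as reported in the statistics list above.
total_rows = 4_040_000        # "4.04 million rows" (rounded)
total_columns = 23_203
avg_rows_per_table = 3_405

# Table count implied by the totals; an estimate, since the inputs are rounded.
implied_tables = total_rows / avg_rows_per_table

# Average columns per table implied by that estimate.
implied_avg_columns = total_columns / implied_tables

print(f"implied tables: {implied_tables:.0f}")
print(f"implied average columns per table: {implied_avg_columns:.1f}")
```

Both derived values are consistent with the "nearly 1,200 source tables" and "average 20 columns per table" figures.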

GOBY is semantically richer and structurally more complex than typical public benchmarks, such as VizNet or T2Dv2, making it well-suited for enterprise data integration tasks.

File Structure of GOBY

The primary data archive, goby.tar.gz, contains the following key directories:

  • dump/: PostgreSQL dump files that include:
    • doit_categories: Data categories with record counts.
    • doit_data: Triple-style (entity, attribute, value) data, stored as rows of (category_id, source_id, entity_id, name, value).
    • Additional mapping and result files.
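
Because doit_data stores one attribute value per row, analysis usually needs the rows regrouped into one record per entity. The following is a minimal sketch of that pivot in plain Python, assuming the five fields mean what their names suggest and that entity_id is unique within a source. The pivot_rows helper and the sample rows are made up for illustration and are not taken from the dump.

```python
def pivot_rows(rows):
    """Group (category_id, source_id, entity_id, name, value) rows
    into one record per (source_id, entity_id) pair."""
    records = {}
    for category_id, source_id, entity_id, name, value in rows:
        record = records.setdefault(
            (source_id, entity_id),
            {"category_id": category_id, "attributes": {}},
        )
        # A repeated attribute name overwrites the earlier value.
        record["attributes"][name] = value
    return records

# Hypothetical rows for illustration only.
sample_rows = [
    (1, 10, 100, "title", "Jazz Night"),
    (1, 10, 100, "venue", "Blue Room"),
    (1, 10, 101, "title", "Book Fair"),
]

for key, record in pivot_rows(sample_rows).items():
    print(key, record)
```

Keeping the attributes in a nested dictionary avoids collisions between attribute names and the bookkeeping field category_id.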

Download Instructions

To access the GOBY dataset:

  1. Download the goby.zip file with the button below. The password is:
    GOBY2025

  2. Extract the zip file via the file explorer or in the terminal using a command like unzip -P your_password goby.zip -d /path/to/extract/, replacing your_password with the password above.
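
Extraction can also be scripted. The following is a minimal sketch using Python's standard zipfile module, assuming the archive uses the legacy ZIP password encryption that zipfile can decrypt (it cannot read AES-encrypted archives). The extract_archive helper and the paths in the usage comment are placeholders.

```python
import zipfile
from pathlib import Path

def extract_archive(archive_path, dest_dir, password):
    """Extract a password-protected zip archive into dest_dir.

    Returns the list of member names in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # zipfile expects the password as bytes, not str.
        archive.extractall(path=dest, pwd=password.encode("utf-8"))
        return archive.namelist()

# Example call (placeholder paths):
# extract_archive("goby.zip", "/path/to/extract/", "your_password")
```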

Contact

Your support in improving this dataset is greatly appreciated! If you have any questions or feedback, please send an email to Moe Kayali.