
Benchmark Datasets for Structured Knowledge Grounding

In progress

We present a comprehensive collection of datasets for Structured Knowledge Grounding.

Summary

| Dataset | Knowledge | User Input | Output | Keywords | Included in UnifiedSKG |
|---|---|---|---|---|---|
| Spider | Database | Question | SQL | Fully supervised; Cross-domain; Single turn | v1 |
| Single Domain Text2SQL | Database | Question | SQL | Fully supervised; Single turn | |
| Squall | Database (converted from table) | Question | SQL | Fully supervised from 80% of WikiTQ | |
| KaggleDBQA | Database | Question | SQL | Fully supervised; Realistic | |
| Spider-Syn | Database | Question | SQL | Fully supervised; Robustness | |
| Spider-DK | Database | Question | SQL | Fully supervised; Annotates the types of domain knowledge needed for text-to-SQL | |
| SEDE | Database | Question | SQL | Fully supervised; Real usage on the Stack Exchange website | |
| Break | | | | | |
| GrailQA | Knowledge Graph | Question | S-Expression | Large, 64k; Test generalization: i.i.d./compositional/zero-shot | v1 |
| WebQSP | Knowledge Graph | Question | S-Expression | - | v1 |
| Russ | API | Question | Query | - | |
| MTOP | API | Question | TOP-representation | Spoken Language Understanding; TOP representation | v1 |
| WebAPI | | | | | |
| WikiSQL | Table | Question | Answer (adopted)/SQL | Fully/weakly supervised semantic parsing (SQL provided); Large data | v1 |
| WikiTableQuestion | Table | Question | Answer | Weakly supervised semantic parsing (using question-answer pairs as supervision); Row-sensitive (some QA pairs depend on row order) | v1 |
| CompWebQ | Knowledge Graph | Question | Answer | Weakly supervised; Multi-hop | v1 |
| HybridQA | Table + Text passages | Question | Answer | Multi-hop; Short-form entity/extractive | v1 |
| OTT-QA | Table + Text passages | Question | Answer | More open table/text; Extractive answers | v1 |
| MultiModalQA | Table + Text + Images | Question | Answer | Short-form entity/extractive | v1 |
| FeTaQA | Table | Question | Free-Form Answer | Free-form answer | v1 |
| TAT-QA | Table + Text | Question | Answer (diverse forms, including single span, multiple spans and free-form) | Context hybrid; Numerical reasoning; Financial | |
| AIT-QA | Complex Table | Question | Answer | Airline industry; Complex table | |
| HiTab_qa | Hierarchical Table | Question | Answer | Hierarchical table; TableQA & Table-to-Text | |
| WikiSQL-TS_WikiTQ-TS | | | | | |
| FinQA | Table | Question | Answer | Numerical reasoning; Financial data | |
| MULTIHIERTT | Multiple Hierarchical Table & Text | Question | Answer | Large-scale; Built from financial reports | |
| WebQA | | | | | |
| CFQ | | | | | |
| E2E | Table | None | Text | Text generation; Restaurant domain | |
| WebNLG | Knowledge Graph (triples) | None | Text | Text generation; RDF triples | |
| DART | Triples | None | Text | Text generation; Large data; E2E and WebNLG contained | v1 |
| ToTTo | Highlighted Table | None | Text | Highlighted table; Text generation | v1 |
| LogicNLG | Table | None | Logical Natural Language Generation | Logical NL generation | |
| HiTab_NLG | Hierarchical Table | None | Text | Hierarchical table; TableQA & Table-to-Text | |
| MultiWoZ | Ontology | Dialogue | Dialogue State | Dialogue system | v1 |
| KVRET(SMD) | Table | Dialogue | Response | Dialogue system; Each dialogue has a separate table as KB | v1 |
| SParC | Database | Multi-turn query | SQL | Fully supervised semantic parsing; Multi-turn | v1 |
| CoSQL | Database | Dialogue | SQL | Fully supervised semantic parsing; Dialogue | v1 |
| SQA(MSR SQA) | Table | Multi-turn query | Answer | Weakly supervised semantic parsing; Sequential | v1 |
| SMCALFLOW | | | | | |
| HybriDialogue | | | | | |
| TabFact | Table | Statement | Boolean | NL inference; Large data | v1 |
| FEVEROUS | Table + Text | Statement | Boolean | NL inference; Large data | v1 |
| SQL2Text | Optional Database | SQL | Text | High-fidelity NLG | v1 |
| Logic2Text | Table Schema | Python-like program | Text | High-fidelity NLG | v1 |
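
Each row of the summary can be read as a task signature: a kind of structured knowledge, an optional user input, and the output a model must produce. The sketch below is one way to hold such a row in code. The class name `TaskSignature`, its fields, and the `describe` helper are hypothetical and are not taken from any released codebase; only the example values come from the Spider and ToTTo rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskSignature:
    """One row of the summary table (hypothetical container, for illustration only)."""
    dataset: str
    knowledge: str             # kind of structured knowledge, e.g. "Database"
    user_input: Optional[str]  # None for data-to-text tasks that take no user input
    output: str                # what the model must produce, e.g. "SQL"

    def describe(self) -> str:
        # Join knowledge and user input when both are present.
        if self.user_input is None:
            source = self.knowledge
        else:
            source = f"{self.knowledge} + {self.user_input}"
        return f"{self.dataset}: {source} -> {self.output}"


# Values copied from the Spider and ToTTo rows of the summary.
spider = TaskSignature("Spider", "Database", "Question", "SQL")
totto = TaskSignature("ToTTo", "Highlighted Table", None, "Text")

print(spider.describe())
print(totto.describe())
```

Keeping the user input optional mirrors the rows whose User Input column is "None", such as the data-to-text datasets.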

Semantic Parsing

Spider

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. EMNLP-18

Single Domain Text2SQL

Improving Text-to-SQL Evaluation Methodology. ACL-18

Squall

On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries. EMNLP-20

KaggleDBQA

KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers. ACL-21

Spider-Syn

Towards Robustness of Text-to-SQL Models against Synonym Substitution. ACL-21

Spider-DK

Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization. EMNLP-21

SEDE

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data. NLP4Prog-21

Break

Break It Down: A Question Understanding Benchmark. TACL-20

GrailQA

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases. WWW-21

WebQSP

The Value of Semantic Parse Labeling for Knowledge Base Question Answering. ACL-16

Russ

Grounding Open-Domain Instructions to Automate Web Support Tasks. NAACL-21

MTOP

MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark. EACL-21

WebAPI

Compositional Generalization for Natural Language Interfaces to Web APIs. arxiv-21


Question Answering

WikiSQL

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arxiv-17

WikiTableQuestion

Compositional Semantic Parsing on Semi-Structured Tables. ACL-15

Comments: The 5-fold validation evaluation in the original dataset has been deprecated by recent work. The first fold's train set and dev set are used as the train set and dev set.
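
The split convention in the comment above can be sketched as follows. This is a minimal illustration, not the dataset's official loader: the function name `wikitq_split_files`, the directory passed in, the fold numbering from 1 to 5, and the file-name pattern `random-split-{fold}-{part}.tsv` are all assumptions that should be checked against a local copy of the data.

```python
from pathlib import Path


def wikitq_split_files(data_dir: str, fold: int = 1) -> dict:
    """Return train/dev paths for one fold (the file-name pattern is an assumption)."""
    if not 1 <= fold <= 5:
        raise ValueError("expected a fold number between 1 and 5")
    root = Path(data_dir)
    return {
        "train": root / f"random-split-{fold}-train.tsv",
        "dev": root / f"random-split-{fold}-dev.tsv",
    }


# Use only the first fold, rather than evaluating over all five.
splits = wikitq_split_files("WikiTableQuestions/data", fold=1)
print(splits["train"].name)
print(splits["dev"].name)
```

Defaulting `fold` to 1 keeps the common case short while still allowing the other folds to be selected for comparison with older results.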

CompWebQ

The Web as a Knowledge-base for Answering Complex Questions. NAACL-18

HybridQA

HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data. EMNLP-20

OTT-QA

Open Question Answering over Tables and Text. ICLR-21

MultiModalQA

MultiModalQA: Complex Question Answering over Text, Tables and Images. ICLR-21

FeTaQA

FeTaQA: Free-form Table Question Answering. TACL-22

TAT-QA

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance. ACL-21

AIT-QA

AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry. arxiv-21

HiTab

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation. arxiv-21

WikiSQL-TS_WikiTQ-TS

Topic Transferable Table Question Answering. EMNLP-21

FinQA

FinQA: A Dataset of Numerical Reasoning over Financial Data. EMNLP-21

WebQA

WebQA: Multihop and Multimodal QA. arxiv-21

MULTIHIERTT

MULTIHIERTT: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data. ACL-22

CFQ

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data. ICLR-20


Data-to-Text

E2E

The E2E Dataset: New Challenges For End-to-End Generation. SIGDIAL-17

Comments: The authors have also released a cleaned version of E2E.

WebNLG

The WebNLG Challenge: Generating Text from RDF Data. INLG-17

Comments: The WebNLG challenge has many datasets available. There is a useful link that summarizes them.

Table-to-Text

Table-to-Text: Describing Table Region with Natural Language. AAAI-18

DART

DART: Open-Domain Structured Data Record to Text Generation. NAACL-21

ToTTo

ToTTo: A Controlled Table-To-Text Generation Dataset. EMNLP-20

LogicNLG

Logical Natural Language Generation from Open-Domain Tables. ACL-20

HiTab

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation. arxiv-21


Conversational

MultiWoZ

MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling. EMNLP-18

KVRET(SMD)

Key-Value Retrieval Networks for Task-Oriented Dialogue. SIGDIAL-17

Comments: KVRET is also called SMD (Stanford Multi-Domain task-oriented dialogue dataset). The de facto standard version of this dataset is the pre-processed version in Mem2Seq.

SParC

SParC: Cross-Domain Semantic Parsing in Context. ACL-19

CoSQL

CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases. EMNLP-19

SQA(MSR SQA)

Search-based Neural Structured Learning for Sequential Question Answering. ACL-17

SMCALFLOW

Task-Oriented Dialogue as Dataflow Synthesis. TACL-20

HybriDialogue

HYBRIDIALOGUE: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data. ACL-22


Fact Verification

TabFact

TabFact: A Large-scale Dataset for Table-based Fact Verification. ICLR-20

FEVEROUS

The Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS) Shared Task. EMNLP-21


Formal-Language-to-Text

SQL2Text

Logic-Consistency Text Generation from Semantic Parses. ACL-21

Logic2Text

Logic2Text: High-Fidelity Natural Language Generation from Logical Forms. EMNLP-20