Benchmark Datasets for Structured Knowledge Grounding
In progress
We present a comprehensive collection of datasets for Structured Knowledge Grounding.
Table of contents
- Semantic Parsing
- Question Answering
- Data-to-Text
- Conversational
- Fact Verification
- Formal-Language-to-Text
- Other Related Datasets
Summary
Dataset | Knowledge | User Input | Output | Keywords | Contained in UnifiedSKG |
Spider | Database | Question | SQL | Fully supervised; Cross-domain; Single turn | v1 |
Single Domain Text2SQL | Database | Question | SQL | Fully supervised; Single turn | |
Squall | Database(converted from table) | Question | SQL | Fully supervised from 80% of WikiTQ | |
KaggleDBQA | Database | Question | SQL | Fully supervised; Realistic | |
Spider-Syn | Database | Question | SQL | Fully supervised; Robustness | |
Spider-DK | Database | Question | SQL | Fully supervised; Annotated with the types of domain knowledge that text-to-SQL requires | |
SEDE | Database | Question | SQL | Fully supervised; Real usage on the Stack Exchange website | |
Break | |||||
GrailQA | Knowledge Graph | Question | S-Expression | Large, 64k; Test generalization: i.i.d./compositional/zero-shot | v1 |
WebQSP | Knowledge Graph | Question | S-Expression | - | v1 |
Russ | API | Question | Query | - | |
MTOP | API | Question | TOP-representation | Spoken Language Understanding; TOP representation | v1 |
WebAPI | |||||
WikiSQL | Table | Question | Answer(adopted)/SQL | Fully/weakly supervised semantic parsing(SQL provided); Large data | v1 |
WikiTableQuestion | Table | Question | Answer | Weakly supervised semantic parsing (using question-answer pairs as supervision); Row-sensitive (some QA pairs depend on row order) | v1 |
CompWebQ | Knowledge Graph | Question | Answer | Weakly supervised; Multihop | v1 |
HybridQA | Table + Text passages | Question | Answer | Multi-hop; Short-form entity/extractive | v1 |
OTT-QA | Table + Text passages | Question | Answer | More open table/text; Extractive answer | v1 |
MultiModalQA | Table + Text + Images | Question | Answer | Short-form entity/extractive | v1 |
FeTaQA | Table | Question | Free-Form Answer | Free-form answer | v1 |
TAT-QA | Table + Text | Question | Answer(diverse form, including single span, multiple spans and free-form) | Context hybrid; Numerical reasoning; Financial | |
AIT-QA | Complex Table | Question | Answer | Airline industry; Complex table | |
HiTab_qa | Hierarchical Table | Question | Answer | Hierarchical table; TableQA & Table-to-Text | |
WikiSQL-TS_WikiTQ-TS | |||||
FinQA | Table | Question | Answer | Numerical reasoning; Financial data | |
MULTIHIERTT | Multiple Hierarchical Table & Text | Question | Answer | Large-scale; Built from financial reports | |
WebQA | |||||
CFQ | |||||
E2E | Table | None | Text | Text generation; Restaurant domain | |
WebNLG | Knowledge Graph(triples) | None | Text | Text generation; RDF Triples | |
DART | Triples | None | Text | Text generation; Large data; E2E and WebNLG contained | v1 |
ToTTo | Highlighted Table | None | Text | Highlighted Table; Text generation | v1 |
LogicNLG | Table | None | Logical Natural Language Generation | Logical NL generation | |
HiTab_NLG | Hierarchical Table | None | Text | Hierarchical table; TableQA & Table-to-Text | |
MultiWoZ | Ontology | Dialogue | Dialogue State | Dialog system | v1 |
KVRET(SMD) | Table | Dialogue | Response | Dialogue system; Each dialogue has a separate table as KB | v1 |
SParC | Database | Multi-turn query | SQL | Fully supervised semantic parsing; Multi-turn | v1 |
CoSQL | Database | Dialog | SQL | Fully supervised semantic parsing; Dialogue | v1 |
SQA(MSR SQA) | Table | Multi-turn query | Answer | Weakly supervised semantic parsing; Sequential | v1 |
SMCALFLOW | |||||
HybriDialogue | |||||
TabFact | Table | Statement | Boolean | NL inference; Large data | v1 |
FEVEROUS | Table + Text | Statement | Boolean | NL inference; Large data | v1 |
SQL2Text | Optional Database | SQL | Text | High-fidelity NLG | v1 |
Logic2Text | Table Schema | Python-like program | Text | High-fidelity NLG | v1 |
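Each row of the summary can be read as a task signature: a kind of structured knowledge, a user input, and an output. As a minimal sketch (not part of any released tooling), the following Python parses a few rows copied from the table above and selects those marked `v1`. The `SKGDataset` class and the `parse_row` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SKGDataset:
    """One row of the summary table (hypothetical container for illustration)."""
    name: str
    knowledge: str
    user_input: str
    output: str
    keywords: str
    in_unifiedskg: str  # "v1", or "" when the last column is empty


# A few rows copied from the summary table, in its pipe-delimited layout.
ROWS = """\
Spider | Database | Question | SQL | Fully supervised; Cross-domain; Single turn | v1 |
KaggleDBQA | Database | Question | SQL | Fully supervised; Realistic | |
ToTTo | Highlighted Table | None | Text | Highlighted Table; Text generation | v1 |
TabFact | Table | Statement | Boolean | NL inference; Large data | v1 |
"""


def parse_row(line: str) -> SKGDataset:
    """Split one pipe-delimited row into its six columns."""
    stripped = line.strip()
    cells = [cell.strip() for cell in stripped.split("|")]
    # A trailing pipe produces one extra empty cell at the end; drop it.
    if stripped.endswith("|"):
        cells = cells[:-1]
    if len(cells) != 6:
        raise ValueError(f"expected 6 columns, got {len(cells)}: {line!r}")
    return SKGDataset(*cells)


datasets: List[SKGDataset] = [
    parse_row(line) for line in ROWS.splitlines() if line.strip()
]
in_v1 = [d.name for d in datasets if d.in_unifiedskg == "v1"]
print(in_v1)
```

Keeping the empty last cell as an empty string, rather than dropping it, means every row has the same six fields, so rows that are not marked `v1` can still be parsed and filtered.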
Semantic Parsing
Spider
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. EMNLP-18
Single Domain Text2SQL
Improving Text-to-SQL Evaluation Methodology. ACL-18
Squall
On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries. EMNLP-20
KaggleDBQA
KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers. ACL-21
Spider-Syn
Towards Robustness of Text-to-SQL Models against Synonym Substitution. ACL-21
Spider-DK
Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization. EMNLP-21
SEDE
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data. NLP4Prog-21
Break
Break It Down: A Question Understanding Benchmark. TACL-20
GrailQA
Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases. WWW-21
WebQSP
The Value of Semantic Parse Labeling for Knowledge Base Question Answering. ACL-16
Russ
Grounding Open-Domain Instructions to Automate Web Support Tasks. NAACL-21
MTOP
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark. EACL-21
WebAPI
Compositional Generalization for Natural Language Interfaces to Web APIs. arxiv-21
Question Answering
WikiSQL
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arxiv-17
WikiTableQuestion
Compositional Semantic Parsing on Semi-Structured Tables. ACL-15
Comments: The 5-fold cross-validation evaluation from the original dataset has been deprecated by recent work. The first fold's train set and dev set are used as the train set and dev set.
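The split convention in the comment above can be sketched as follows. This is a minimal illustration, assuming the dataset release stores each fold as `random-split-<k>-train.tsv` and `random-split-<k>-dev.tsv` under a `data/` directory. Those file names, the directory, and the `fold_files` helper are assumptions for this example and should be checked against the copy you download.

```python
from pathlib import PurePosixPath
from typing import Dict


def fold_files(fold: int = 1, root: str = "data") -> Dict[str, str]:
    """Return the train/dev file paths for one fold (assumed naming scheme)."""
    # The original dataset defines five folds; only fold 1 is used in practice.
    if not 1 <= fold <= 5:
        raise ValueError("fold must be between 1 and 5")
    base = PurePosixPath(root)
    return {
        "train": str(base / f"random-split-{fold}-train.tsv"),
        "dev": str(base / f"random-split-{fold}-dev.tsv"),
    }


# Default to the first fold, following the convention described above.
splits = fold_files()
print(splits)
```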
CompWebQ
The Web as a Knowledge-base for Answering Complex Questions. NAACL-18
HybridQA
HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data. EMNLP-20
OTT-QA
Open Question Answering over Tables and Text. ICLR-21
MultiModalQA
MultiModalQA: Complex Question Answering over Text, Tables and Images. ICLR-21
FeTaQA
FeTaQA: Free-form Table Question Answering. TACL-22
TAT-QA
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance. ACL-21
AIT-QA
AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry. arxiv-21
HiTab
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation. arxiv-21
WikiSQL-TS_WikiTQ-TS
Topic Transferable Table Question Answering. EMNLP-21
FinQA
FinQA: A Dataset of Numerical Reasoning over Financial Data. EMNLP-21
WebQA
WebQA: Multihop and Multimodal QA. arxiv-21
MULTIHIERTT
MULTIHIERTT: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data. ACL-22
CFQ
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data. ICLR-20
Data-to-Text
E2E
The E2E Dataset: New Challenges For End-to-End Generation. SIGDIAL-17
Comments: The authors have also released a cleaned version of E2E.
WebNLG
The WebNLG Challenge: Generating Text from RDF Data. INLG-17
Comments: The WebNLG challenge has released many datasets. There is a useful link that summarizes them.
Table-to-Text
Table-to-Text: Describing Table Region with Natural Language. AAAI-18
DART
DART: Open-Domain Structured Data Record to Text Generation. NAACL-21
ToTTo
ToTTo: A Controlled Table-To-Text Generation Dataset. EMNLP-20
LogicNLG
Logical Natural Language Generation from Open-Domain Tables. ACL-20
HiTab
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation. arxiv-21
Conversational
MultiWoZ
MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling. EMNLP-18
KVRET(SMD)
Key-Value Retrieval Networks for Task-Oriented Dialogue. SIGdial-17
Comments: KVRET is also called SMD (Stanford Multi-Domain task-oriented dialogue dataset). The de facto standard version of this dataset is the pre-processed version used in Mem2Seq.
SParC
SParC: Cross-Domain Semantic Parsing in Context. ACL-19
CoSQL
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases. EMNLP-19
SQA(MSR SQA)
Search-based Neural Structured Learning for Sequential Question Answering. ACL-17
SMCALFLOW
Task-Oriented Dialogue as Dataflow Synthesis. TACL-20
HybriDialogue
HYBRIDIALOGUE: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data. ACL-22
Fact Verification
TabFact
TabFact: A Large-scale Dataset for Table-based Fact Verification. ICLR-20
FEVEROUS
FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information. NeurIPS-21
Formal-Language-to-Text
SQL2Text
Logic-Consistency Text Generation from Semantic Parses. ACL-21
Logic2Text
Logic2Text: High-Fidelity Natural Language Generation from Logical Forms. EMNLP-20