The Switchboard Corpus of Transcribed Dialogue, distributed through the Linguistic Data Consortium, contains 95 hours of sound recordings of 1,155 conversations composed of 115,000 speaker turns. It was previously annotated for dialogue acts according to the DAMSL scheme. In this project, it was re-annotated according to the new international standard for dialogue act annotation certified by the International Organization for Standardization (ISO), of which Alex Fang is one of the authors.
The project set out to construct an appropriately annotated corpus of spoken dialogues in order to identify the semantics and communicative functions of spoken utterances, to better understand the interactive mechanisms of human speech, and to help design a computational model of speech-mediated interaction. More specifically, the resource is expected to generate knowledge that can enhance the naturalness of human-machine dialogue, reduce dysfluencies in human-computer dialogue, improve the design of the dialogue manager in interactive dialogue systems, and increase user satisfaction.
Linguistic insight was applied automatically to the transcriptions, which covered 94.2% of the corpus. The remainder was annotated manually.
Top-15 ISO DAs | Tokens | Token% | Cum% | |
---|---|---|---|---|
Inform | 1,265,448 | 82.88 | 82.88 | |
AutoPositive | 66,501 | 4.35 | 87.23 | |
PropositionalQuestion | 39,602 | 2.60 | 89.83 | |
SetQuestion | 15,841 | 1.04 | 90.87 | |
Answer | 11,784 | 0.77 | 91.64 | |
CheckQuestion | 11,053 | 0.72 | 92.36 | |
InitialGoodbye | 9,442 | 0.62 | 92.98 | |
Question | 5,068 | 0.33 | 93.31 | |
ChoiceQuestion | 4,502 | 0.29 | 93.60 | |
Completion | 3,188 | 0.21 | 93.81 | |
Stalling | 2,998 | 0.20 | 94.01 | |
Disconfirm | 1,660 | 0.11 | 94.12 | |
AutoNegative | 798 | 0.05 | 94.17 | |
Offer | 590 | 0.04 | 94.21 | |
AcceptApology | 358 | 0.02 | 94.23 | |
Total | 1,438,833 | 94.23 | | |
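The arithmetic behind the table above can be checked with a short script. The following is a minimal sketch, not part of the project's own tooling: it copies the Tokens and Token% columns from the table, sums the token counts, and rebuilds the Cum% column as a running total. The names `ROWS` and `cumulative_percentages` are hypothetical, and `Decimal` is used only so that the two-decimal percentages add up without floating-point drift.

```python
from decimal import Decimal
from itertools import accumulate

# (dialogue act, token count, token %) copied from the table above.
ROWS = [
    ("Inform", 1_265_448, "82.88"),
    ("AutoPositive", 66_501, "4.35"),
    ("PropositionalQuestion", 39_602, "2.60"),
    ("SetQuestion", 15_841, "1.04"),
    ("Answer", 11_784, "0.77"),
    ("CheckQuestion", 11_053, "0.72"),
    ("InitialGoodbye", 9_442, "0.62"),
    ("Question", 5_068, "0.33"),
    ("ChoiceQuestion", 4_502, "0.29"),
    ("Completion", 3_188, "0.21"),
    ("Stalling", 2_998, "0.20"),
    ("Disconfirm", 1_660, "0.11"),
    ("AutoNegative", 798, "0.05"),
    ("Offer", 590, "0.04"),
    ("AcceptApology", 358, "0.02"),
]


def cumulative_percentages(rows):
    """Return the running total of the Token% column as exact Decimals."""
    return list(accumulate(Decimal(pct) for _, _, pct in rows))


# Sum of the Tokens column, to compare with the final row of the table.
total_tokens = sum(count for _, count, _ in ROWS)

# Rebuilt Cum% column, to compare with the reported one.
cum = cumulative_percentages(ROWS)

for (name, count, pct), running in zip(ROWS, cum):
    print(f"{name:<22} {count:>10,} {pct:>6} {running:>6}")
print(f"{'Total':<22} {total_tokens:>10,} {cum[-1]:>13}")
```

Run as written, the token counts sum to the 1,438,833 shown in the final row, and the running total ends at 94.23, in line with the reported coverage.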
That over 94% of the transcriptions could be handled automatically in the project demonstrates that linguistic knowledge, when properly applied through computer technologies, becomes a powerful instrument that reduces time and labour and increases accuracy and efficiency, thereby maximising productivity.