Difference between revisions of "Parallel Annotation of Speech and Text"
Revision as of 21:51, 18 March 2010
This page is under construction
Project Description
The goal of this short pilot was the parallel annotation of sound and text. The study was conducted by Professor Wim van Dommelen and Assoc. Professor Dorothee Beermann at the Institute of Languages and Communication Studies at the Norwegian University of Science and Technology. The scientific assistant for the project was Asger Hagerup. The project was funded by the SSTL.
The pilot investigated integrated presentations of linguistically annotated audio and text material, combining Praat and TypeCraft.
Praat is signal analysis software developed by Paul Boersma and David Weenink at the University of Amsterdam. It is widely used for the annotation of sound objects. For the present study we took advantage of the fact that Praat annotation data resides in a TextGrid object that exists separately from the sound object. Specifying a sentence tier allowed us to reference data easily across applications. At present our sound signal representations are static and selective; that is, each one focuses on presenting a single selected feature to illustrate interesting correlations across phonetic and linguistic categories.
Description of the material
For our study we selected 10 sentences from the phonetic database of the Sound to Sense project.
Sentences 1 to 3
Speaker dialect: Bergen
Word | Morphemes | Gloss | POS
Jeg | e | 1SG | PN
ser | se:-r | see-PRES | V
bildet | bild-e | picture-DEFSG | N
kan | kan: | canPRES | V
du | ʉ | 2SG | CL
si | si: | sayINF | V
litt | lit: | a.little | ADVm
på | po | onDIR | PREP
skrått | skro:-t | diagonal-ADJ>ADV | ADVm
ned | ned | downDIR | ADVm
ovenifra | ovenifra | from.aboveDIRSRC | ADVm
File download for viewing in Praat (Downloading Help): Sound (PSTA01.mp3), TextGrid (PSTA01.txt)
Word | Morphemes | Gloss | POS
Det | de | 3SGNEUT | PN
dekker | dek:-er | cover-PRES | V
omtrent | umtrent | approximately | ADVm
hele | he:l-e | whole-DEF | ADJ
det | de | DEFSGNEUT | ART
venstre | venstre | left | ADVm
mest | mest | mostSUP | ADJ
altså | aso | that.isDM | ADVm
venstreste | venstre-st-e | left-SUPMU-DEF | ADJ
kortsiden | kort-sid-en | short-side-DEFSG | N
File download for Praat (Downloading Help): Sound (PSTA02.mp3), TextGrid (PSTA02.txt)
Word | Morphemes | Gloss | POS
Hun | hun | 3SGFEM | PN
står | sto:-r | stand-PRES | V
med | med | withMNR | PREP
ryggen | ryɡ:-en | back-DEFSG | N
mot | mut | againstDIR | PREP
veggen | veɡ:-en | wall-DEFSG | N
opp | up | upDIRMU | PREP
og | o | and | CONJC
ser | se:-r | see-PRES | V
på | po | atDIR | PREP
han | han | 3SGMASC | PN
som | som | | PNrel
skal | skal: | shallPRES | V
kaste | kast-e | throw-INF | V
ballen | bal:-en | ball-DEFSG | N
som | som | | PNrel
står | sto:-r | stand-PRES | V
utenfor | ʉtenfor | outside | ADVm
og | o | and | CONJC
peker | pe:k-er | point-PRES | V
på | po | atDIR | PREP
boksene | boks-ene | box-DEFPL | N
File download for Praat (Downloading Help): Sound (PSTA03.mp3), TextGrid (PSTA03.txt)
Downloading Help
Speaker Dialect: Trondheim
Parallel Processing of Speech and Text Data - Part 2
Speaker Dialect:
Parallel Processing of Speech and Text Data - Part 3
About the TextGrid files
The TextGrid files are opened in Praat together with the matching sound files. Each TextGrid file consists of three tiers: 'Word' (rendered in Bokmål orthography), 'Phoneme' (showing underlying segments), and 'Note' (showing the surface realisation in IPA symbols, plus other notes).
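For readers who want to inspect the tiers outside Praat, the following is a minimal Python sketch that reads interval tiers from a TextGrid saved in Praat's long text format. It assumes single-line labels and interval tiers only. The function name `parse_textgrid` and the sample content (including all time stamps) are invented for illustration and are not taken from the project's files.

```python
import re

def parse_textgrid(text):
    """Return {tier_name: [(xmin, xmax, label), ...]} for the interval tiers.

    Assumes Praat's long text format with one label per line.
    """
    tiers = {}
    current = None        # interval list of the tier being read
    in_interval = False   # True while inside an "intervals [n]:" block
    xmin = xmax = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r'name = "(.*)"$', line)
        if m:  # a new tier starts here
            current = tiers.setdefault(m.group(1), [])
            in_interval = False
            continue
        if re.match(r'intervals \[\d+\]:?$', line):
            in_interval = True
            continue
        if not in_interval or current is None:
            continue  # skip file-level and tier-level header lines
        m = re.match(r'(xmin|xmax) = ([-+\d.eE]+)$', line)
        if m:
            if m.group(1) == "xmin":
                xmin = float(m.group(2))
            else:
                xmax = float(m.group(2))
            continue
        m = re.match(r'text = "(.*)"$', line)
        if m:
            # Praat escapes a double quote inside a label by doubling it
            current.append((xmin, xmax, m.group(1).replace('""', '"')))
            in_interval = False
    return tiers

# Invented sample; the time stamps are not real measurements.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.9
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "Word"
        xmin = 0
        xmax = 0.9
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = "Jeg"
        intervals [2]:
            xmin = 0.4
            xmax = 0.9
            text = "ser"
    item [2]:
        class = "IntervalTier"
        name = "Note"
        xmin = 0
        xmax = 0.9
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.9
            text = "RD"
'''

if __name__ == "__main__":
    for name, intervals in parse_textgrid(SAMPLE).items():
        print(name, intervals)
```

To use it on a downloaded file, read the file into a string first. Praat may save TextGrids as UTF-8 or UTF-16, so the encoding passed to `open` may need adjusting.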
Here is a list of glosses used in the 'Note' tier:
Phonology/Phonetics:
BrV = Segment realised with breathy voice
CrV = Segment realised with creaky voice
DV = Underlying voiced segment realised devoiced
EPN = Epenthesis
RD = Reduction of segment (e.g. corner vowel realised as schwa or plosive as fricative).
V = Underlying non-voiced segment realised voiced
Morphophonology/Syntax:
CL = Clitic
Other:
ERR = The speaker errs and corrects himself
HES = (Audible) hesitation from speaker
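The gloss list above can be kept as a simple lookup table when processing 'Note' labels programmatically. The sketch below is one possible way to do this in Python. The names `NOTE_GLOSSES` and `expand_note` are hypothetical, and it is an assumption that several glosses in one label are separated by whitespace, commas or semicolons; anything not in the table (for example IPA symbols) is skipped.

```python
import re

# Abbreviations as listed for the 'Note' tier.
NOTE_GLOSSES = {
    "BrV": "Segment realised with breathy voice",
    "CrV": "Segment realised with creaky voice",
    "DV": "Underlying voiced segment realised devoiced",
    "EPN": "Epenthesis",
    "RD": "Reduction of segment",
    "V": "Underlying non-voiced segment realised voiced",
    "CL": "Clitic",
    "ERR": "The speaker errs and corrects himself",
    "HES": "(Audible) hesitation from speaker",
}

def expand_note(label):
    """Return (abbreviation, description) pairs for known glosses in a label.

    Assumes glosses are separated by whitespace, commas or semicolons.
    """
    tokens = re.split(r"[\s,;]+", label.strip())
    return [(t, NOTE_GLOSSES[t]) for t in tokens if t in NOTE_GLOSSES]

if __name__ == "__main__":
    for abbreviation, description in expand_note("DV, CL"):
        print(abbreviation, "=", description)
```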