Abstract:
Labeling speech corpora is an integral part of spoken language
research, and reliable, accurate automated labeling is a highly
desirable yet very difficult goal. In this paper we describe some of
the different methods that we have developed and applied to this
problem area. One method uses seed databases: a given orthographic
(textual) input is used to find the best-matching diphone contexts in
a seed database, and feature-based matching and alignment are then
applied to find the most probable phone boundaries. Other
approaches are based on specialized neural networks which are trained
on the same seed databases to implicitly learn segment boundaries.
Different post-processing methods, based on rules and search
strategies, are used to obtain the forced final labeling. The
methods are compared with one another in terms of accuracy and
computational cost, and with the typical performance achieved by
standard hidden Markov model based labeling systems.
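
The abstract does not say which alignment algorithm the seed-database
method uses. As a rough illustration only, the sketch below assumes plain
dynamic time warping (DTW) over per-frame feature vectors: a labeled seed
segment is aligned with an unlabeled target segment, and the seed's phone
boundaries are carried across the warping path. The function names, the
Euclidean frame distance, and the toy one-dimensional features are all
hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_path(seed: List[Frame], target: List[Frame]) -> List[Tuple[int, int]]:
    """Return the minimum-cost DTW path as (seed_index, target_index) pairs.

    Both sequences are assumed to be non-empty.
    """
    n, m = len(seed), len(target)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seed[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seed[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def transfer_boundaries(
    seed: List[Frame], target: List[Frame], seed_boundaries: List[int]
) -> List[int]:
    """Map seed boundary frame indices onto the target via the DTW path.

    Each boundary is the index of the first frame of a phone in the seed;
    it is mapped to the first target frame aligned with that seed frame.
    """
    first_match: Dict[int, int] = {}
    for s, t in dtw_path(seed, target):
        first_match.setdefault(s, t)
    return [first_match[b] for b in seed_boundaries]


# Toy example: the second phone starts at seed frame 2; the target holds
# the first phone one frame longer.
seed_frames = [[0.0], [0.0], [1.0], [1.0]]
target_frames = [[0.0], [0.0], [0.0], [1.0], [1.0]]
target_boundaries = transfer_boundaries(seed_frames, target_frames, [2])
```

A real system would operate on acoustic features such as cepstral
coefficients and would typically constrain the warping path; the rule-based
and search-based post-processing mentioned above is not modeled here.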