Fonetiikan Päivät 2002

ABSTRACT

A new database project has been launched at the Institute of Cybernetics this year. It aims to collect telephone speech from a large number of speakers for speech and speaker recognition purposes. At least 1000 speakers are expected to participate in the recordings. The SpeechDat databases (http://www.speechdat.org), in particular Finnish SpeechDat, have been chosen as the prototype for the Estonian database. This means that the principles of corpus design, the file formats, and the recording and labeling methods implemented by the SpeechDat consortium will be followed as closely as possible. The automatic recording system and labeling software developed by the SpeechDat partners have also been adopted for Estonian.

The main characteristics of the Estonian SpeechDat database will be as follows:

Sampling rate: 8 kHz

Signal format: 8-bit A-law, mono

Signal source: calls from fixed and cellular phones

Calling environment: home/office, public place

Speakers: at least 1000 (500 female, 500 male)

Speech items: isolated digits, connected digits, natural numbers, money amounts, spelled words, time phrases, date phrases, yes/no questions, names, application words, phonetically rich words, application phrases, phonetically rich sentences.
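As an illustration of the signal format listed above: 8-bit A-law is the ITU-T G.711 companding scheme, and its codes are usually expanded to linear PCM before any feature extraction. The sketch below is a minimal, hypothetical decoder written from the standard G.711 expansion rule. It assumes the signal is available as raw A-law bytes with any file header already stripped. The names `alaw_to_linear` and `decode_alaw` are illustrative only and are not part of the SpeechDat tools.

```python
from typing import List


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a linear sample (range +/-32256)."""
    code ^= 0x55                    # undo the even-bit inversion applied by A-law
    mantissa = (code & 0x0F) << 4   # 4-bit step within the segment
    segment = (code & 0x70) >> 4    # 3-bit segment number (exponent)
    if segment == 0:
        value = mantissa + 0x008
    else:
        value = (mantissa + 0x108) << (segment - 1)
    return value if code & 0x80 else -value  # bit 7 set means a positive sample


def decode_alaw(data: bytes) -> List[int]:
    """Decode a buffer of raw A-law bytes (hypothetical helper) to linear samples."""
    return [alaw_to_linear(b) for b in data]


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 8000  # sampling rate given in the database specification
    one_second = bytes([0xD5]) * SAMPLE_RATE_HZ  # 0xD5 is the A-law code nearest zero
    samples = decode_alaw(one_second)
    print(len(one_second), "bytes ->", len(samples), "samples")
```

Since each sample occupies one byte, one second of speech at 8 kHz takes 8000 bytes, i.e. 64 kbit/s.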

The project is planned to last 24 months, divided into four main stages:

Preparatory activities (6 months)

Recordings (4-6 months)

Segmentation and labeling (10-12 months)

Completion (4-6 months)

The preparatory activities have now been completed, and the intensive recording period is about to begin.

In our presentation, various aspects of the first stage of the project will be discussed, and up-to-date progress will be reported.