This month you can read about an exciting inverse problem in the new issue of Physics World. Professor Samuli Siltanen from the University of Helsinki was interviewed about current research in Finland on synthesizing speech and creating artificial voices. This study is part of a large research project in Finland, funded by the Academy of Finland, that connects the University of Helsinki, the University of Tampere, Aalto University and the University of Eastern Finland. I have decided to publish a series of posts explaining the motivation and the state of the art.
If you have read my blog for a while, you know that I like research projects with a potential impact on society, especially on health care. In Finland alone, 60,000 people are affected by speech impairments, and every year about 3,000 lose their ability to speak, often due to strokes, cancer or circulatory problems. To give another figure, the annual cost of teachers being absent because of temporary voice-related problems is 600 million dollars. In the US, NIDCD has estimated that 6 to 8 million people have some speech impairment (2.4% of the population), and 1 million of those are affected by aphasia, being unable to speak. About 80,000 people acquire aphasia every year in the US. (*)
Artificial voice already exists (everybody has heard of Stephen Hawking's voice simulator device), but so far there are no efficient systems to simulate women's and children's voices, so these users are forced to speak with a grown man's voice (at best sounding high on helium), not to mention the problem of embedding emotions in the tone. Voice is a fundamental aspect of our personal identity, and research projects like this one aim to create a more complete mathematical model of human speech. The medical application is my favourite, but not the only one. Think how such an achievement could improve personal assistant apps, information announcements or automated call centers.
The approach of this research is to analyse human speech and then model it mathematically, so that it can be efficiently reproduced by computer programmes. The alternative approach, which is widely used but often sounds very unnatural, is to collect a huge amount of data (i.e. recordings of real people pronouncing a lot of words, in different languages).
In the next posts, I will try to explain how our body produces our voice and how mathematics enters the picture.
(*) Source: NIDCD statistics.