…
In the W!0+ generation, every child will know how to write programs;
only the language tools they use
will differ from those of our era.
─── Mrphs
Word has it that this year's Oxford 'Word of the Year' is
, a face-with-tears-of-joy emoji. Is this a 'word'? Perhaps any symbol that 'conveys feeling and meaning' can serve as writing. Since everyone under heaven can recognize and use it, why should it not be called a 'word'! 'Rice carving' once enjoyed a moment of glory,
Rice carving is a craft of writing and painting on grains of rice, which are then mounted as ornaments. It flourished after 2000, established by a folk artist in Harbin (street artists had earlier called it 'carving characters on rice'; when it began is unknown). There is folk legend but no official history. The legend tells of a poor scholar in the reign of Emperor Huizong of Song who travelled to the capital for the imperial examination and failed. His money spent, desperate with hunger and thirst, he hit upon the idea of writing people's names and the character 福 ('fortune') on grains of glutinous rice in the street. To his surprise, buyers flocked to him and the income was handsome; within a year he had become a grain merchant and returned home in splendor.
So, is this 'universe within a grain of rice' art or not?! Just as some say 'year's end' while others say 'year-crossing': have 歲 and 年, both meaning 'year', really become different creatures that no longer speak to one another!?
Thus a 'phono-semantic character score' can give form to the shapes of the great wide world, image the sounds of all nature, and score the feelings of every being in the universe. Would it not be fully equipped for 'communicating with one another'?
If you want to know what 'gnuspeech' is, it is best to hear the official site tell it:
gnuspeech makes it easy to produce high quality computer speech output, design new language databases, and create controlled speech stimuli for psychophysical experiments. gnuspeechsa is a cross-platform module of gnuspeech that allows command line, or application-based speech output. The software has been released as two tarballs that are available in the project Downloads area of http://savannah.gnu.org/projects/gnuspeech. Those wishing to contribute to the project will find the OS X (gnuspeech) and CMAKE (gnuspeechsa) sources in the Git repository on that same page. The gnuspeech suite still lacks some of the database editing components (see the Overview diagram below) but is otherwise complete and working, allowing articulatory speech synthesis of English, with control of intonation and tempo, and the ability to view the parameter tracks and intonation contours generated. The intonation contours may be edited in various ways, as described in the Monet manual. Monet provides interactive access to the synthesis controls. TRAcT provides interactive access to the underlying tube resonance model that converts the parameters into sound by emulating the human vocal tract.
The suite of programs uses a true articulatory model of the vocal tract and incorporates models of English rhythm and intonation based on extensive research that sets a new standard for synthetic speech.
The original NeXT computer implementation is complete, and is available from the NeXT branch of the SVN repository linked above. The port to GNU/Linux under GNUStep, also in the SVN repository under the appropriate branch, provides English text-to-speech capability, but parts of the database creation tools are still in the process of being ported.
Credits for research and implementation of the gnuspeech system appear in the section "Thanks to those who have helped" below. Some of the features of gnuspeech, together with the tools that are part of the software suite, include:
It is a play on words. This is a new (g-nu) “event-based” approach to speech synthesis from text, that uses an accurate articulatory model rather than a formant-based approximation. It is also a GNU project, aimed at providing high quality text-to-speech output for GNU/Linux, Mac OS X, and other platforms. In addition, it provides comprehensive tools for psychophysical and linguistic experiments as well as for creating the databases for arbitrary languages.
The goal of the project is to create the best speech synthesis software on the planet.
Since the author has no Mac OS X environment, the following merely verifies installation on a Raspberry Pi, based on the INSTALL file inside gnuspeechsa-0.1.5.tar.gz:
mkdir gnuspeech
cd gnuspeech/
# fetch the software
wget http://ftp.gnu.org/gnu/gnuspeech/gnuspeechsa-0.1.5.tar.gz
tar -zxvf gnuspeechsa-0.1.5.tar.gz
# build and install
cd gnuspeechsa-0.1.5/
pkg_dir=pkg_dir
make
sudo make install
sudo ldconfig
# test
./gnuspeech_sa -c $pkg_dir/data/en -p /tmp/test_param.txt -o /tmp/test.wav "Hello world." && aplay -q /tmp/test.wav
For an introduction to this program, readers may consult its
README
GnuspeechSA (Stand-Alone)
==========================
GnuspeechSA is a port to C++/C of the TTS_Server in the original Gnuspeech (http://www.gnu.org/software/gnuspeech/) source code written for NeXTSTEP.
It is a command-line program that converts text to speech.
This project is based on code from Gnuspeech SVN, rev. 672, downloaded on 2014-08-02. The source code was obtained from the directories:
nextstep/trunk/ObjectiveC/Monet.realtime
nextstep/trunk/src/SpeechObject/postMonet/server.monet
This software is part of Gnuspeech.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the COPYING file for more details.
Although building gnuspeech-0.9.tar.gz requires a Mac OS X environment, some important documents inside are well worth reading, so readers are encouraged to fetch it as well:
wget http://ftp.gnu.org/gnu/gnuspeech/gnuspeech-0.9.tar.gz
As for how to play with it, readers are invited to suit themselves. The author is a newcomer too; perhaps we shall speak of it another day.
If insect chirr and birdsong are inborn instincts, then human speech is likewise nature's endowment. Yet a 'physical model' of how all things make sound is very hard to build. Hence
Gnuspeech is an extensible text-to-speech computer software package that produces artificial speech output based on real-time articulatory speech synthesis by rules. That is, it converts text strings into phonetic descriptions, aided by a pronouncing dictionary, letter-to-sound rules, and rhythm and intonation models; transforms the phonetic descriptions into parameters for a low-level articulatory speech synthesizer; uses these to drive an articulatory model of the human vocal tract producing an output suitable for the normal sound output devices used by various computer operating systems; and does this at the same or faster rate than the speech is spoken for adult speech.
The synthesizer is a tube resonance, or waveguide, model that models the behavior of the real vocal tract directly, and reasonably accurately, unlike formant synthesizers that indirectly model the speech spectrum.[1] The control problem is solved by using René Carré’s Distinctive Region Model[2] which relates changes in the radii of eight longitudinal divisions of the vocal tract to corresponding changes in the three frequency formants in the speech spectrum that convey much of the information of speech. The regions are, in turn, based on work by the Stockholm Speech Technology Laboratory[3] of the Royal Institute of Technology (KTH) on “formant sensitivity analysis” – that is, how formant frequencies are affected by small changes in the radius of the vocal tract at various places along its length.[4]
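To make the waveguide idea concrete, here is a toy Kelly-Lochbaum-style tube ladder in Python. This is not gnuspeech's tube resonance model, only a sketch of the underlying technique: the tract is cut into short cylindrical sections, and scattering at each junction follows from the ratio of adjacent cross-sectional areas. All areas and reflection coefficients below are invented for illustration.

```python
# Toy Kelly-Lochbaum ladder: a vocal tract as cylindrical sections, with
# scattering at each junction driven by neighbouring cross-sectional areas.
# Invented numbers throughout; a sketch of the technique, not gnuspeech code.

def kelly_lochbaum(areas, excitation, lip_reflection=-0.85, glottal_reflection=0.75):
    """Propagate a pressure wave through tube sections of the given areas."""
    n = len(areas)
    # reflection coefficient at each internal junction:
    # k = (A_i - A_{i+1}) / (A_i + A_{i+1})
    ks = [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1]) for i in range(n - 1)]
    fwd = [0.0] * n   # right-going wave components
    bwd = [0.0] * n   # left-going wave components
    out = []
    for sample in excitation:
        new_fwd = [0.0] * n
        new_bwd = [0.0] * n
        # glottis end: inject the source, reflect what comes back
        new_fwd[0] = sample + glottal_reflection * bwd[0]
        for i, k in enumerate(ks):
            # scattering junction between sections i and i+1
            new_fwd[i + 1] = (1 - k) * fwd[i] + k * bwd[i + 1]
            new_bwd[i] = -k * fwd[i] + (1 + k) * bwd[i + 1]
        # lips: partial reflection; the rest radiates as output
        new_bwd[n - 1] = lip_reflection * fwd[n - 1]
        out.append((1 + lip_reflection) * fwd[n - 1])
        fwd, bwd = new_fwd, new_bwd
    return out

# impulse through a tract that narrows then flares (a vowel-ish shape)
areas = [2.6, 1.8, 1.0, 0.8, 1.2, 2.4, 3.2, 4.0]   # cm^2, made up
impulse = [1.0] + [0.0] * 255
response = kelly_lochbaum(areas, impulse)
print(len(response))   # 256 output samples
```

Changing the area profile (the "regions" of the Distinctive Region Model) reshapes the impulse response, and hence the formants, which is exactly the control handle the passage above describes.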
may well represent one future of 'sound synthesis'. Within it, consider the 'vocal tract':
The vocal tract is the cavity in human beings and in animals where sound that is produced at the sound source (larynx in mammals; syrinx in birds) is filtered.
In birds it consists of the trachea, the syrinx, the oral cavity, the upper part of the esophagus, and the beak. In mammals it consists of the laryngeal cavity, the pharynx, the oral cavity, and the nasal cavity.
The estimated average length of the vocal tract in adult male humans is 16.9 cm and 14.1 cm in adult females.[1]
Sagittal section of human vocal tract
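The quoted tract lengths already determine ballpark resonances: a uniform tube closed at the glottis and open at the lips resonates at odd quarter-wavelength multiples, f_n = (2n - 1)c / 4L. A quick check in Python, using the textbook approximation rather than any claim about real formants:

```python
# Rough formant estimates for a uniform tube, closed at the glottis and open
# at the lips: resonances at odd quarter-wavelength multiples,
# f_n = (2n - 1) * c / (4 * L). Tract lengths are the averages quoted above.

SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 C

def tube_resonances(length_m, count=3):
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m) for n in range(1, count + 1)]

for label, length_cm in [("male", 16.9), ("female", 14.1)]:
    f = tube_resonances(length_cm / 100.0)
    print(label, [round(x) for x in f])
# male   -> roughly [507, 1522, 2537] Hz
# female -> roughly [608, 1824, 3041] Hz
```

These land near the classic neutral-vowel formants, which is why tract length alone already colors a voice as male or female.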
A model of this vocal tract is the very foundation for shaping the characteristic sounds of all things. It is happy news, then, that:
From: David Hill <drh-AT-firethorne.com>
To: Gnu Announce <info-gnu-AT-gnu.org>
Subject: First release of gnuspeech project software
Date: Mon, 19 Oct 2015 18:41:22 -0700
Message-ID: <AD48546B-E89C-4F7C-A2C5-D45D5C3C46A3@firethorne.com>
gnuspeech-0.9 and gnuspeechsa-0.1.5 first official release
Gnuspeech is a new approach to synthetic speech as well as a speech research tool. It comprises a true articulatory model of the vocal tract, databases and rules for parameter composition, a pronouncing dictionary of over 70,000 words, a letter-to-sound fall-back module, and models of English rhythm and intonation, all based on extensive research that sets a new standard for synthetic speech and computer-based speech research.
There are two main components in this first official release. For those who would simply like speech output from whatever system they are using, including incorporating speech output in their applications, there is the gnuspeechsa tarball (currently 0.1.5), a cross-platform speech synthesis application, compiled using CMake.
For those interested in an interactive system that gives access to the underlying algorithms and databases involved, providing an understanding of the mechanisms, databases, and output forms involved, as well as a tool for experiment and new language creation, there is the gnuspeech tarball (currently 0.9) that embodies several sub-apps, including the interactive database creation system Monet (My Own Nifty Editing Tool), and TRAcT (the Tube Resonance Access Tool) — a GUI interface to the tube resonance model used in gnuspeech, that emulates the human vocal tract and provides the basis for an accurate rendition of human speech.
This second tarball includes full manuals on both Monet and TRAcT. The Monet manual covers the compilation and installation of gnuspeechsa on a Macintosh under OS X 10.10.x, and references the related free software that allows the speech to be incorporated in applications. Appendix D of the Monet manual provides some additional information about gnuspeechsa and associated software that is available, and details how to compile it using CMake on the Macintosh under 10.10.x (Yosemite).
The digitally signed tarballs may be accessed at
http://ftp.gnu.org/gnu/gnuspeech/
There is a list of mirrors at http://www.gnu.org/order/ftp.html, and the site http://ftpmirror.gnu.org/gnuspeech will redirect to a nearby mirror.
A longer project description and credits may be found at: http://www.gnu.org/software/gnuspeech/
which is also linked to a brief (four page) project history/component description, and a paper on the Tube Resonance Model by Leonard Manzara.
Signed: David R Hill
———————–
drh@firethorne.com
http://www.gnu.org/software/gnuspeech/
http://savannah.gnu.org/projects/gnuspeech
https://savannah.gnu.org/users/davidhill
Though for the moment, one probably must first learn how to compile and install it.
So what is 'speech synthesis'? Wikipedia says:
Speech Synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.[1]
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely “synthetic” voice output.[2]
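The concatenative approach just described can be sketched in a few lines. The 'diphone' waveforms below are invented toy sample lists joined by a linear crossfade; real systems store recorded audio and select units by target and join costs, none of which is modeled here.

```python
# Toy concatenative synthesis: look up stored "units" (fake diphone
# waveforms as plain sample lists) and join them with a linear crossfade.
# Everything here is invented for illustration.

def crossfade_concat(units, overlap=4):
    """Join unit waveforms, linearly crossfading `overlap` samples."""
    out = list(units[0])
    for unit in units[1:]:
        tail, head = out[-overlap:], unit[:overlap]
        faded = [t * (1 - i / overlap) + h * (i / overlap)
                 for i, (t, h) in enumerate(zip(tail, head))]
        out = out[:-overlap] + faded + list(unit[overlap:])
    return out

# fake diphone database: name -> waveform samples
db = {
    "h-e": [0.0, 0.2, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0],
    "e-l": [0.0, 0.1, 0.3, 0.5, 0.3, 0.1, 0.0, 0.0],
    "l-o": [0.0, 0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.0],
}
wave = crossfade_concat([db["h-e"], db["e-l"], db["l-o"]])
print(len(wave))   # 8 + (8 - 4) + (8 - 4) = 16 samples
```

The size trade-off in the passage is visible even here: the database grows with the number of stored units, while the joins are where audible discontinuities creep in.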
The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s.
A text-to-speech system (or “engine”) is composed of two parts:[3] a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),[4] which is then imposed on the output speech.
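The front-end pipeline just described can be sketched in a few lines of Python. The normalization table, lexicon, and fallback rule below are all made up for illustration; real systems use large pronouncing dictionaries plus prosody models, and only the data flow is shown here.

```python
# Sketch of a TTS front-end: text normalization, then grapheme-to-phoneme
# conversion via a (tiny, made-up) dictionary with a letter-to-sound
# fallback. Illustrates the data flow only.

NUMBERS = {"2": "two", "4": "four"}          # toy normalization table
LEXICON = {"hello": "HH AH L OW",            # toy pronouncing dictionary
           "world": "W ER L D",
           "two": "T UW"}

def normalize(text):
    """Expand digits and strip punctuation into written-out words."""
    words = text.lower().replace(",", "").replace(".", "").split()
    return [NUMBERS.get(w, w) for w in words]

def to_phonemes(word):
    """Dictionary lookup, with a crude letter-to-sound fallback."""
    if word in LEXICON:
        return LEXICON[word]
    return " ".join(c.upper() for c in word)   # fallback: spell it out

front_end = [(w, to_phonemes(w)) for w in normalize("Hello, world 2.")]
print(front_end)
# [('hello', 'HH AH L OW'), ('world', 'W ER L D'), ('two', 'T UW')]
```

A back-end such as gnuspeech's tube model would then take this symbolic representation, add prosody, and turn it into sound.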
The author's earliest encounter with all this was in the 'screen reader' software of computer systems for the blind. As memory serves, the voices then sounded like R2-D2, with a distinctly computer flavor; much like a poorly tuned 'phonetic transcription' in
eSpeak is derived from the “Speak” speech synthesizer for British English for Acorn RISC OS computers which was originally written in 1995 by Jonathan Duddington.
A rewritten version for Linux appeared in February 2006 and a Windows SAPI 5 version in January 2007. Subsequent development has added and improved support for additional languages.
Because of infrequent updates over the last few years, several eSpeak forks emerged on GitHub.[3] After discussions on eSpeak's discussion list,[4][5] the espeak-ng fork maintained by Reece Dunn was chosen as the new canonical home of further eSpeak development.
Because of its small size and many languages, it is included as the default speech synthesizer in the NVDA open source screen reader for Windows, and on the Ubuntu and other Linux installation discs.
The quality of the language voices varies greatly. Some have had more work or feedback from native speakers than others. Most of the people who have helped to improve the various languages are blind users of text-to-speech.
Word has it, though, that Carnegie Mellon University's 'festive'
Welcome to festvox.org
This project is part of the work at Carnegie Mellon University’s speech group aimed at advancing the state of Speech Synthesis.
The Festvox project aims to make the building of new synthetic voices more systemic and better documented, making it possible for anyone to build a new voice. Specifically we offer:
The documentation, tools and dependent software are all free without restriction (commercial or otherwise). Licensing of voices built by these techniques is the responsibility of the builders. This work is firmly grounded within Edinburgh University's Festival Speech Synthesis System and Carnegie Mellon University's small-footprint Flite synthesis engine. This work has been supported by various groups including Carnegie Mellon University, the US National Science Foundation (NSF), and the US Defense Advanced Research Projects Agency (DARPA).
Requirements for building a voice
Note the techniques and processes described here do not guarantee that you'll end up with a high-quality acceptable voice, but with a little care you can likely build a new synthesis voice in a supported language in a few days, or in a new language in a few weeks (more or less depending on the complexity of the language, and the desired quality). You will need:
speech synthesis software has already grown quite warm and human. Raspbian Jessie carries both packages; interested readers can install them and play.
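For instance, both can be tried straight from the command line on Raspbian; the package names below are assumed from the jessie archives, so verify with apt-cache search if they have moved:

```shell
# install both synthesizers (package names assumed from Raspbian jessie)
sudo apt-get install espeak festival

# eSpeak: speak directly, or write a WAV file at a chosen voice and speed
espeak "Hello world"
espeak -v en -s 140 -w /tmp/hello.wav "Hello world"

# Festival: pipe text into its text-to-speech mode
echo "Hello world" | festival --tts
```

A few minutes of listening to each makes the difference between formant synthesis (eSpeak) and concatenative synthesis (Festival's default voices) very audible.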
Having followed M♪o's footsteps all the way, the author finally arrives at the quest for 'articulation': joint, juncture, and distinct utterance.
Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways which usually involves modifying the position of the speech articulators, such as the tongue, jaw, and lips. Speech is created by digitally simulating the flow of air through the representation of the vocal tract.
Therein lies the hidden mechanism of a 'phono-semantic character score' synthesis system!!??