CMU Sphinx 4: A Good Speech Recognition Utility

Sphinx-4 is a state-of-the-art speech recognition system written entirely in the Java™ programming language. It was created through a collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

Sphinx-4 started out as a port of Sphinx-3 to the Java programming language, but evolved into a recognizer designed to be much more flexible than Sphinx-3, thus becoming an excellent platform for speech research.

How to use it

Required Software

Sphinx-4 has been built and tested on the Solaris™ Operating Environment, Mac OS X, Linux and Win32 operating systems. Running, building, and testing Sphinx-4 requires additional software. Before you start, you will need the following software available on your machine.

Java SE 6 Development Kit or better. Go to java.sun.com, and select “J2SE” from popular downloads. At the time of writing, the latest release version is JDK 6 Update 14, which is the one we recommend.

Ant 1.6.0 or better, available at ant.apache.org. The site has a manual with instructions on how to download, install, and use ant. You will only need ant if you wish to build Sphinx-4 from the source distribution.
Subversion (svn), but only if you want to interact directly with the svn tree (which we recommend). The canonical place to get it is subversion.tigris.org. If you are using Windows, your best choice is to install cygwin, which will give you a Linux-like environment in a command prompt window. Make sure to choose “svn” when you install cygwin.
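
Before going further, it can help to confirm that these tools are actually reachable from your shell. The following is a minimal sketch, not part of the Sphinx-4 distribution: the function name check_tool is made up for illustration, and it only tests that each command is on your PATH, not that its version is recent enough.

```shell
#!/bin/sh
# Report whether a command can be found on PATH.
# Returns 0 if it is found, 1 otherwise.
# (check_tool is a hypothetical helper, not something Sphinx-4 provides.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# java is always needed; ant and svn are only needed for source builds.
for tool in java ant svn; do
    check_tool "$tool" || true
done
```

If java is reported as missing, install the JDK first. A missing ant or svn only matters if you plan to build from source or work with the svn tree.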

Downloading Sphinx-4
Instructions for retrieving code from a release package.
Sphinx-4 has two packages available for download:

sphinx4-{version}-bin.zip: provides the jar files, documentation, and demos
sphinx4-{version}-src.zip: provides the sources, documentation, demos, unit tests and regression tests.

After you have downloaded the distribution, unpack the ZIP files using the jar command, which is in the bin directory of your Java installation:

jar xvf sphinx4-{version}-bin.zip
jar xvf sphinx4-{version}-src.zip

For both downloads, a directory called “sphinx4-{version}” will be created.

The RM1 acoustic model and the HUB4 acoustic and language models are also available for download at the same location on SourceForge. Download them only if you want to run the regression tests for RM1 and HUB4.

Instructions for retrieving code from the svn repository
If you want to be able to get the latest updates, you should retrieve the code from the svn repository on SourceForge, where the Sphinx-4 code is hosted as open source. Please follow the instructions below to retrieve it.

Get the code from sourceforge.net:

% svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx4

Building Sphinx-4

Since the sphinx4-{version}-bin.zip distribution does not contain the source code, you must download sphinx4-{version}-src.zip, or retrieve the code from SourceForge using svn, in order to build from the sources. The software required for building Sphinx-4 is listed in the Required Software section.

Set up JSAPI 1.0

Before you build Sphinx-4, it is important to set up your environment to support the Java Speech API (JSAPI), because a number of tests and demos rely on having JSAPI installed.

Run ant

To build Sphinx-4, change at the command prompt to the directory where you installed Sphinx-4 (usually, a simple “cd sphinx4” will do). Then set the required environment variables: JAVA_HOME to the location of the JDK, ANT_HOME to the location of ant, and PATH to include the bin subfolders of both the JDK and ant. For example:

export JAVA_HOME=/usr/local/jdk1.6.0_14
export ANT_HOME=/usr/local/apache-ant-1.8.0
export PATH=/usr/local/jdk1.6.0_14/bin:/usr/local/apache-ant-1.8.0/bin:$PATH
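
One way to keep the three settings consistent is to build PATH from the other two variables instead of typing the directories twice. This is only a sketch: the install locations are the example paths from above and will differ on your machine.

```shell
#!/bin/sh
# Example install locations (taken from the text above); adjust to your machine.
JAVA_HOME=/usr/local/jdk1.6.0_14
ANT_HOME=/usr/local/apache-ant-1.8.0

# Derive PATH from the two variables so the JDK and ant versions
# on PATH always match JAVA_HOME and ANT_HOME.
PATH="$JAVA_HOME/bin:$ANT_HOME/bin:$PATH"

export JAVA_HOME ANT_HOME PATH
```

With this form, upgrading the JDK or ant means changing a single line.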

Then type the following:

ant

This executes the Apache Ant command to build the Sphinx-4 classes under the bld directory, the jar files under the lib directory, and the demo jar files under the bin directory.

To delete all the build output and start fresh:

ant clean

Create Javadocs

The javadocs have already been built if you downloaded the sphinx4-{version}-bin.zip. In order to build the javadocs yourself, you must download the sphinx4-{version}-src.zip distribution instead. To build the javadocs, go to the top level directory (“sphinx4-{version}”), and type:

ant javadoc

This will build javadocs from public classes, displaying only the public methods and fields. In general, this is all the information you will need. If you need more details, such as private or protected classes, you can generate the corresponding javadoc by running, for example:

ant -Daccess=private javadoc


Demos

Sphinx-4 contains a number of demo programs. If you downloaded the binary distribution (sphinx4-{version}-bin.zip), the JAR files of the demos are already built, so you can just run them directly. However, if you downloaded the source distribution (sphinx4-{version}-src.zip or via svn), you need to build the demos first. The demos are listed below; the Sphinx-4 documentation has instructions on how to build and run each one.

Simple demos to start with sphinx4

Hello World Demo: a command line application that recognizes simple phrases.

Hello N-Gram Demo: a command line application using an N-gram language model for speech recognition
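
As a rough sketch of what running one of these demos looks like from the top-level Sphinx-4 directory: the helper name run_demo, the bin/<DemoName>.jar naming, and the 256 MB heap size are all assumptions made for illustration, so check the instructions of the individual demo for the exact command.

```shell
#!/bin/sh
# Run a demo JAR from the bin directory of the Sphinx-4 installation.
# Usage: run_demo HelloWorld
# (run_demo is a hypothetical helper; bin/<DemoName>.jar is an assumed layout.)
run_demo() {
    jar="bin/$1.jar"
    if [ ! -f "$jar" ]; then
        echo "demo jar not found: $jar (build with ant first?)" >&2
        return 1
    fi
    # -Xmx256m raises the maximum Java heap to 256 MB; the value is an assumption.
    java -Xmx256m -jar "$jar"
}
```

Calling run_demo HelloWorld would then start the Hello World demo, provided the JAR exists under that name.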

Demos for audio file transcription

Transcriber Demo: a simple demo program showing how to transcribe a continuous audio file that has multiple utterances separated by silences.
Confidence Demo: a simple demo program showing how to obtain confidence scores for a recognition result.

Lattice Demo: a simple demo program showing how to extract lattices from recognition results.

Class-Based Language Model Demo: a simple demo of the class-based language model.

Aligner Demo: aligns an audio file to its transcription and gets the times of the words. This can be useful for closed captioning.

Dialog demos for writing advanced dialog systems

ZipCity Demo: a Java Web Start technology application that recognizes spoken zip codes and locates the associated city and state.
JSGF Demo: a simple demo program showing how a program can swap between multiple JSGF grammars.
Dialog Demo: a demo program showing how a program can swap between multiple JSGF and dictation grammars.
Action Tags Demo: a demo program showing how to use action tags for post-processing of RuleParse objects obtained from JSGF grammars.
There is also a live-mode test program, which is available only in the source distribution (sphinx4-{version}-src.zip) and not in the binary distribution (sphinx4-{version}-bin.zip).

The AudioTool is a visual tool that records and displays the waveform and spectrogram of an audio signal. It is available in both the binary and source releases.

Use it and enjoy!
