[
Japanese
|
English
|
Back to top
]
Last update: "2005/09/29 04:03:21"
JULIAN(1) JULIAN(1)
NAME
Julian - grammar based continuous speech recognition
parser
SYNOPSIS
julian [-C jconffile] [options ...]
DESCRIPTION
Julian is a high-performance, multi-purpose, free speech
recognition parser based on finite state grammar. It is
capable of performing real-time recognition of continuous
speech with over thousands of vocabulary.
Julian is a derived version of Julius, and almost all com-
ponents are the same except language model related part.
To execute a recognition, it needs an acoustic model and a
finite state grammar that describes sentence patterns to
be recognized. The grammar format is an original one, and
tools to create a recognirion grammar are included in the
distribution. For acoustic model, standard format (i.e.
HTK) with any word/phone units and sizes are supported.
So users can build a recognition system customized for
specific tasks using own task grammar and acoustic models.
For details about models and how to write a grammar,
please see the documents contained in this package.
Julian can perform recognition on audio files, live micro-
phone input, network input and feature parameter files.
The maximum size of vocabulary is 65,535 words.
RECOGNITION MODELS
Julian supports the following models.
Acoustic Models
Same as Julius: Sub-word HMM (Hidden Markov
Model) in HTK format are supported. Phoneme
models (monophone), context dependent phoneme
models (triphone), tied-mixture and phonetic
tied-mixture models of any unit can be used.
When using context dependent models, interword
context is also handled. You can further use a
tool mkbinhmm to convert the ascii HMM defini-
tion file to binary format, for speeding up the
startup (this format is incompatible with that
of HTK).
Language model
The grammar format is an original one, and tools
to create a recognirion grammar are included in
the distribution. A grammar consists of two
files: one is a 'grammar' file that describes
sentence structures in a BNF style, using word
'categories' as terminate symbols. Another is a
'voca' file that defines word with its pronunci-
ations (i.e. phoneme sequences) for each cate-
gory. They should be converted by mkdfa.pl(1)
to a deterministic finite automaton file (.dfa)
and a dictionary file (.dict), respectively.
SPEECH INPUT
Same as Julius: Both live speech input and recorded speech
file input are supported. Live input stream from micro-
phone device, DatLink (NetAudio) device and tcpip network
input using adintool is supported. Speech waveform files
(16bit WAV (no compression), RAW format, and many other
format will be acceptable if compiled with libsndfile
library). Feature parameter files in HTK format are also
supported.
Note that Julian itself can only extract MFCC_E_D_N_Z fea-
tures from speech data. If you use an acoustic HMM
trained by other feature type, only the HTK parameter file
of the same feature type can be used.
SEARCH ALGORITHM OVERVIEW
Recognition algorithm of Julian is based on a two-pass
strategy. In the first pass, a high-speed approximate
search is performed using weaker constraints then the
given grammar. Here a LR beam search using only inter-
category constraints extracted from the grammar is per-
formed. The second pass re-searches the input, using the
original grammar rules and intermediate results from the
first pass, to gain a high precision result quickly. In
the second pass the optimal solution is theoretically
guaranteed using the A* search.
When using context dependent phones (triphones), interword
contexts are taken into consideration. For tied-mixture
and phonetic tied-mixture models, high-speed acoustic
likelihood calculation is possible using gaussian pruning.
For more details, see the related document or web page
below.
OPTIONS
The options below specify the models, system behaviors and
various search parameters. These option can be set all at
once at the command line, but it is recommended that you
write them in a text file as a "jconf file", and specify
the file with "-C" option.
Most are the same as Julius.
Options only in Julian: -gram, -gramlist, -dfa, -penalty1,
-penalty2, -looktrellis
Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2,
-transp, -silhead, -siltail, -spdur, -sepnum, -sepa-
ratescore
Speech Input
-input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
Select speech data input source. 'rawfile' is
waveform file, and specified after startup from
stdin). 'mic' means microphone device, and 'adin-
net' means receiving waveform data via tcpip net-
work from an adinnet client. 'netaudio' is from
DatLink/NetAudio input, and 'stdin' means data
input from standard input.
WAV (no compression) and RAW (noheader, 16bit,
BigEndian) are supported for waveform file input.
Other format can be supported using external
library. To see what format is actually supported,
see the help message using option "-help". For
stdin input, only WAV and RAW is supported.
(default: mfcfile)
-filelist file
(With -input rawfile|mfcfile) perform recognition
on all files listed in the file.
-adport portnum
(with -input adinnet) adinnet port number (default:
5530)
-NA server:unit
(with -input netaudio) set the server name and unit
ID of the Datlink unit.
-zmean -nozmean
With speech input, this options enable/disable
whether to remove DC offset using zero mean source.
(default: disabled (-nozmean))
-nostrip
Julian by default removes zero samples in input
speech data. In some cases, such invalid data may
be recorded at the start or end of recording. This
option inhibit this automatic removal.
-record directory
Auto-save input speech data successively under the
directory. Each segmented inputs are recorded to a
file each by one. The file name of the recorded
data is generated from system time when the input
starts, in a style of "YYYY.MMDD.HHMMSS.wav". File
format is 16bit monoral WAV. Invalid for mfcfile
input. With input rejection by "-rejectshort", the
rejected input will also be recorded even if they
are rejected.
-rejectshort msec
Reject input shorter than specified milliseconds.
Search will be terminated and no result will be
output. In module mode, '<REJECTED REASON="..."/>'
message will be sent to client. With "-record",
the rejected input will also be recorded even if
they are rejected. (default: 0 = off)
Speech Detection
Options in this section is invalid for mfcfile input.
-cutsilence
-nocutsilence
Force silence cutting (=speech segment detection)
to ON/OFF. (default: ON for mic/adinnet, OFF for
files)
-lv threslevel
Level threshold (0 - 32767) for speech triggering.
If audio input amplitude goes over this threshold
for a period, Julius begin the 1st pass recogni-
tion. If the level goes below this level after
triggering, it is the end of the speech segment.
(default: 2000)
-zc zerocrossnum
Zero crossing threshold per a second (default: 60)
-headmargin msec
Margin at the start of the speech segment in
milliseconds. (default: 300)
-tailmargin msec
Margin at the end of the speech segment in mil-
liseconds. (default: 400)
Acoustic Analysis
-smpFreq frequency
Set sampling frequency of input speech in Hz. Sam-
pling rate can also be specified using "-smpPe-
riod". Be careful that this frequency should be
the same as the trained conditions of acoustic
model you use. This should be specified for micro-
phone input and RAW file input when using other
than default rate. Also see "-fsize", "-fshift",
"-delwin".
(default: 16000 (Hz = 625ns))
-smpPeriod period
Set sampling frequency of input speech by its sam-
pling period (nanoseconds). The sampling rate can
also be specified using "-smpFreq". Be careful
that the input frequency should be the same as the
trained conditions of acoustic model you use. This
should be specified for microphone input and RAW
file input when using other than default rate.
Also see "-fsize", "-fshift", "-delwin".
(default: 625 (ns = 16000Hz))
-fsize sample
Analysis window size in number of samples.
(default: 400).
-fshift sample
Frame shift in number of samples (default: 160).
-delwin frame
Delta window size in number of samples (default:
2).
-lofreq frequency
Enable band-limiting for MFCC filterbank computa-
tion: set lower frequency cut-off.
(default: -1 = disabled)
-hifreq frequency
Enable band-limiting for MFCC filterbank computa-
tion: set upper frequency cut-off.
(default: -1 = disabled)
-sscalc
Perform spectral subtraction using head part of
each file. With this option, Julius assume there
are certain length of silence at each input file.
Valid only for rawfile input. Conflict with
"-ssload".
-sscalclen
With "-sscalc", specify the length of head part
silence in milliseconds (default: 300)
-ssload filename
Perform spectral subtraction for speech input using
pre-estimated noise spectrum from file. The noise
spectrum data should be computed beforehand by
mkss. Valid for all speech input. Conflict with
"-sscalc".
-ssalpha value
Alpha coefficient of spectral subtraction. Noise
will be subtracted stronger as this value gets
larger, but distortion of the resulting signal also
becomes remarkable. (default: 2.0)
-ssfloor value
Flooring coefficient of spectral subtraction. The
spectral parameters that go under zero after sub-
traction will be substituted by the source signal
with this coefficient multiplied. (default: 0.5)
GMM-based Input Verification and Rejection
-gmm filename
GMM definition file in HTK format. If specified,
GMM-based input verification will be performed con-
currently with the 1st pass, and you can reject the
input according to the result as specified by
"-gmmreject". Note that the GMM should be defined
as one-state HMMs, and their training parameter
should be the same as the acoustic model you want
to use with.
-gmmnum N
Number of Gaussian components to be computed per
frame on GMM calculation. Only the N-best Gaus-
sians will be computed for rapid calculation. The
default is 10 and specifying smaller value will
speed up GMM calculation, but too small value (1 or
2) may cause degradation of identification perfor-
mance.
-gmmreject string
Comma-separated list of GMM names to be rejected as
invalid input. When recognition, the log likeli-
hoods of GMMs accumulated for the entire input will
be computed concurrently with the 1st pass. If the
GMM name of the maximum score is within this
string, the 2nd pass will not be executed and the
input will be rejected.
Language Model (Finite State Grammar)
The recognition grammar can be specified in three ways:
"-gram", "-gramlist" or combination of "-dfa" and "-v".
Multiple grammars can be specified by using "-gram" and
"-gramlist". When you use these options several times,
all of them will be read at startup. Note that this is a
different behavior from other options (last one override
previous ones). You can use "-nogram" to reset the
already specified grammars at that point.
-gram gramprefix1[,gramprefix2[,gramprefix3,...]]
Comma-separated list of grammars to be used. the
argument should be prefix of a grammar, i.e. if you
have "foo.dfa" and "foo.dict", you can specify them
by single argument "foo". Multiple grammars can be
specified as comma-separated list.
-gramlist listfile
Specify a grammar list file that contains list of
grammars to be used. The list file should contain
the prefixs of grammars, each per line. A relative
path in the list file will be treated as relative
to the list file, not the current path or
configuration file.
-dfa dfa_filename
Finite state automaton grammar file.
-v dictionary_file
Word dictionary file (required)
-nogram
Remove the current list of grammars already speci-
fied by the options above.
-penalty1 float
Word insertion penalty for the first pass.
(default: 0.0)
-penalty2 float
Word insertion penalty for the second pass.
(default: 0.0)
-spmodel {WORD|WORD[OUTSYM]|#num}
Name of short pause model as defined in the
hmmdefs. In Julian, a word whose pronunciation
consists of only this short pause model is called
'short pause word', and handled especially in
recognition: even if its appearance in a sentence
is explicitly specified in the grammar, it can be
skipped while parsing. This behavior is for deal-
ing with insertion and deletion of short pause that
often appear unintensionally in user utterances.
They can be specified in a style as shown below
(default: "sp").
Example
Word_name <s>
Word_name[output_symbol] <s>[silB]
#Word_ID #14
(Word_ID is the word position in the dictionary
file starting from 0)
-forcedict
Ignore dictionary errors and force running. Words
with errors will be dropped from dictionary at
startup.
Acoustic Model (HMM)
-h hmmfilename
HMM definition file to use. Format (ascii/binary)
will be automatically detected. (required)
-hlist HMMlistfilename
HMMList file to use. Required when using triphone
based HMMs. This file provides a mapping between
the logical triphones names genertated from the
phonetic representation in the dictionary and the
HMM definition names.
-iwcd1 {best N|max|avg}
When using a triphone model, select method to han-
dle inter-word triphone context on the first and
last phone of a word in the first pass.
best N: use average likelihood of N-best scores
from the same
context triphones
max: use maximum likelihood of the same
context triphones
avg: use average likelihood of the same
context triphones (default)
-force_ccd / -no_ccd
Normally Julius determines whether the specified
acoustic model is a context-dependent model from
the model names, i.e., whether the model names con-
tain character '+' and '-'. You can explicitly
specify by these options to avoid mis-detection.
These will override the automatic detection result.
-notypecheck
Disable checking of the input parameter type.
(default: enabled)
Acoustic Computation
Gaussian Pruning will be automatically enabled when using
tied-mixture based acoutic model. It is disabled by
default for non tied-mixture models, but you can activate
pruning to those models by explicitly specifying
"-gprune". Gaussian Selection needs a monophone model
converted by mkgshmm.
-gprune {safe|heuristic|beam|none}
Set the Gaussian pruning technique to use.
(default: 'safe' (setup=standard), 'beam'
(setup=fast) for tied mixture model, 'none' for non
tied-mixture model)
-tmix K
With Gaussian Pruning, specify the number of Gaus-
sians to compute per mixture codebook. Small value
will speed up computation, but likelihood error
will grow larger. (default: 2)
-gshmm hmmdefs
Specify monophone hmmdefs to use for Gaussian Mix-
ture Selectio. Monophone model for GMS is gener-
ated from an ordinary monophone HMM model using
mkgshmm. This option is disabled by default. (no
GMS applied)
-gsnum N
When using GMS, specify number of monophone state
to select from whole monophone states. (default:
24)
Inter-word Short Pause Handling
-iwsp (Multi-path version only) Enable inter-word con-
text-free short pause handling. This option
appends a skippable short pause model for every
word end. The added model will be skipped on
inter-word context handling. The HMM model to be
appended can be specified by "-spmodel" option.
Search Parameters (First Pass)
-b beamwidth
Beam width (number of HMM nodes) on the first pass.
This value defines search width on the 1st pass,
and has great effect on the total processing time.
Smaller width will speed up the decoding, but too
small value will result in a substantial increase
of recognition errors due to search failure.
Larger value will make the search stable and will
lead to failure-free search, but processing time
and memory usage will grow in proportion to the
width.
default value: acoustic model dependent
400 (monophone)
800 (triphone,PTM)
1000 (triphone,PTM, setup=v2.1)
-1pass Only perform the first pass search.
-realtime
-norealtime
Explicitly specify whether real-time (pipeline)
processing will be done in the first pass or not.
For file input, the default is OFF (-norealtime),
for microphone, adinnet and NetAudio input, the
default is ON (-realtime). This option relates to
the way CMN is performed: when OFF CMN is calcu-
lated for each input independently, when the real-
time option is ON the previous 5 second of input is
always used. Also refer to -progout.
-cmnsave filename
Save last CMN parameters computed while recognition
to the specified file. The parameters will be
saved to the file in each time a input is recog-
nized, so the output file always keeps the last CMN
parameters. If output file already exist, it will
be overridden.
-cmnload filename
Load initial CMN parameters previously saved in a
file by "-cmnsave". This option enables Julian to
recognize the first utterance of a live microphone
input or adinnet input with CMN.
Search Parameters (Second Pass)
-b2 hyponum
Beam width (number of hypothesis) in second pass.
If the count of word expantion at a certain length
of hypothesis reaches this limit while search,
shorter hypotheses are not expanded further. This
prevents search to fall in breadth-first-like sta-
tus stacking on the same position, and improve
search failure. (default: 30)
-n candidatenum
The search continues till 'candidate_num' sentence
hypotheses have been found. The obtained sentence
hypotheses are sorted by score, and final result is
displayed in the order (see also the "-output"
option).
The possibility that the optimum hypothesis is cor-
rectly found increases as this value gets
increased, but the processing time also becomes
longer.
Default value depends on the engine setup on com-
pilation time:
10 (standard)
1 (fast, v2.1)
-output N
The top N sentence hypothesis will be Output at the
end of search. Use with "-n" option. (default: 1)
-cmalpha float
This parameter decides smoothing effect of word
confidence measure. (default: 0.05)
-sb score
Score envelope width for enveloped scoring. When
calculating hypothesis score for each generated
hypothesis, its trellis expansion and viterbi oper-
ation will be pruned in the middle of the speech if
score on a frame goes under [current maximum score
of the frame- width]. Giving small value makes the
second pass faster, but computation error may
occur. (default: 80.0)
-s stack_size
The maximum number of hypothesis that can be stored
on the stack during the search. A larger value may
give more stable results, but increases the amount
of memory required. (default: 500)
-m overflow_pop_times
Number of expanded hypotheses required to discon-
tinue the search. If the number of expanded
hypotheses is greater then this threshold then, the
search is discontinued at that point. The larger
this value is, The longer Julius gets to give up
search (default: 2000)
-lookuprange nframe
When performing word expansion on the second pass,
this option sets the number of frames before and
after to look up next word hypotheses in the word
trellis. This prevents the omission of short
words, but with a large value, the number of
expanded hypotheses increases and system becomes
slow. (default: 5)
-looktrellis
Expand only the words survived on the first pass
instead of expanding all the words predicted by
grammar. This option makes second pass decoding
slightly faster especially for large vocabulary
condition, but may increase deletion error of short
words. (default: disabled)
-graphrange nframe
When graph output is enabled (--enable-graphout),
merge same words at neighbor position. If the
position of same words differs smaller than this
value, they will be merged. The default is 0 (no
merging) and specifying larger value will result in
smaller graph output.
Forced Alignment
-walign
Do viterbi alignment per word units from the recog-
nition result. The word boundary frames and the
average acoustic scores per frame are calculated.
-palign
Do viterbi alignment per phoneme (model) units from
the recognition result. The phoneme boundary
frames and the average acoustic scores per frame
are calculated.
-salign
Do viterbi alignment per HMM state from the recog-
nition result. The state boundary frames and the
average acoustic scores per frame are calculated.
Server Module Mode
-module [port]
Run Julian on "Server Module Mode". After startup,
Julian waits for tcp/ip connection from client.
Once connection is established, Julian start commu-
nication with the client to process incoming com-
mands from the client, or to output recognition
results, input trigger information and other system
status to the client. The multi-grammar mode is
only supported at this Server Module Mode. The
default port number is 10500. jcontrol is sample
client contained in this package.
-outcode [W][L][P][S][C][w][l][p][s]
(Only for Server Module Mode) Switch which symbols
of recognized words to be sent to client. Specify
'W' for output symbol, 'L' for grammar entry, 'P'
for phoneme sequence, 'S' for score, and 'C' for
confidence score, respectively. Capital letters
are for the second pass (final result), and small
letters are for results of the first pass. For
example, if you want to send only the output sym-
bols and phone sequences as a recognition result to
a client, specify "-outcode WP".
Message Output
-multigramout
Enable multiple grammar output. Usually, Julian
will search for the best hypothesis among the gram-
mars. This options will change the search to find
the best result one by one for each grammar.
-quiet Omit phoneme sequence and score, only output the
best word sequence hypothesis.
-progout
Enable progressive output of the partial results on
the first pass.
-proginterval msec
set the output time interval of "-progout" in mil-
liseconds.
-demo Equivalent to "-progout -quiet"
-charconv from to
Enable output character set conversion. "from" is
the source character set used in the language
model, and "to" is the target character set you
want to get.
On Linux, the arguments should be a code name. You
can obtain the list of available code names by
invoking the command "iconv --list". On Windows,
the arguments should be a code name or codepage
number. Code name should be one of "ansi", "mac",
"oem", "utf-7", "utf-8", "sjis", "euc". Or you can
specify any codepage number supported at your
environment.
OTHERS
-debug (For debug) output enoumous internal status and
debug information.
-C jconffile
Load the jconf file. The options written in the
file are included and expanded at the point. This
option can also be used within other jconf file.
-check wchmm
(For debug) turn on interactive check mode of tree
lexicon structure at startup.
-check triphone
(For debug) turn on interactive check mode of model
mapping between Acoustic model, HMMList and dictio-
nary at startup.
-setting
Display compile-time engine configuration and exit.
-help Display a brief description of all options.
EXAMPLES
For examples of system usage, refer to the tutorial sec-
tion in the Julian documents.
NOTICE
Note about jconf files: relative paths in a jconf file are
interpreted as relative to the jconf file itself, not to
the current directory.
SEE ALSO
julius(1), jcontrol(1), adinrec(1), adintool(1), mkdfa(1),
mkbinhmm(1) mkgsmm(1), wav2mfcc(1), mkss(1)
http://julius.sourceforge.jp/en/
DIAGNOSTICS
Julian normally will return the exit status 0. If an
error occurs, Julian exits abnormally with exit status 1.
If an input file cannot be found or cannot be loaded for
some reason then Julian will skip processing for that
file.
BUGS
There are some restrictions to the type and size of the
models Julian can use. For a detailed explanation refer
to the Julius documentation. For bug-reports, inquires
and comments please contact julius@kuis.kyoto-u.ac.jp or
julius@is.aist-nara.ac.jp.
COPYRIGHT
Copyright (c) 1991-2005 Kyoto University, Japan
Copyright (c) 2000-2005 Nara Institute of Science and
Technology, Japan
Copyright (c) 2005 Nagoya Institute of Technology,
Japan
AUTHORS
Rev.1.0 (1998/07/20)
Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
University)
Rev.2.0 (1999/02/20)
Rev.2.1 (1999/04/20)
Rev.2.2 (1999/10/04)
Rev.3.1 (2000/05/11)
Development of above versions by Akinobu LEE (Kyoto
University)
Rev.3.2 (2001/08/15)
Rev.3.3 (2002/09/11)
Rev.3.4 (2003/10/01)
Rev.3.4.1 (2004/02/25)
Rev.3.4.2 (2004/04/30)
Development of above versions by Akinobu LEE (Nara
Institute of Science and Technology)
Rev.3.5 (2005/09/30)
Development of above versions by Akinobu LEE
(Nagoya Institute of Technology)
THANKS TO
From rev.3.2, Julian is released to the member of the
"Information Processing Society, Continuous Speech Consor-
tium". From rev.3.4, Julian becomes an open-source prod-
ucts incorporated with Julius.
The Windows Microsoft Speech API compatible version was
developed by Takashi SUMIYOSHI (Kyoto University).
LOCAL JULIAN(1)
$Id: julian.html.en,v 1.1.1.1 2007/01/10 08:01:57 kudravka_ Exp $