Posted by yatsu Mon, 04 Jun 2007 07:00:13 GMT
ruby-pytstのソースが行方不明になっていたので、RubyForgeに登録しました。
初めてRubyForgeに登録したのですが、WikiとSVNブラウザがエラーになります。なぜでしょうね?
ruby-pytstは、Wiki文章中のキーワード一括置換のために毎日使用していますが、半年以上、安定動作しています。
以下、READMEを載せておきます。
ruby-pytst
ruby-pytst is a port of pytst to Ruby. It is a implementation of Ternary Search Trie (TST) and it also supports scanning by the Aho-Corasick algorithm. It is built with SWIG.
This software is distributed under LGPL.
I have successfully built on the following environments.
- (1) Mac OS X 10.4, Ruby 1.8.4 (Fink)
- (2) Fedora Core 5, Ruby 1.8.4
tst_wrap.cxx in this package was generated on (1) with SWIG 1.3.19.
Author: Masaki Yatsu <yatsu at yatsu.info>
Installation
Generate the Makefile.
% ruby extconf.rb
Make it.
% make
Then, install it as root.
% make install
If you want to generate the Ruby wrapper codes from tst.i by yourself, execute the following at the beginning.
% swig -c++ -ruby -Iinclude tst.i
How to Use
There is no documentation at this point, but there is README.html of pytst in the doc directory.
There are examples in the example directory. You can run examples like this.
% ruby example/tokenize.rb
ruby-pytst supports following character sets.
- ASCII ($KCODE = ‘NONE’)
- EUC-JP ($KCODE = ‘EUC’)
- Shift_JIS ($KCODE = ‘SJIS’)
- UTF-8 ($KCODE = ‘UTF8’)
You have to specify $KCODE before you create a Pytst::TST.
There is the EUC-JP example in the example directory.
% ruby -Ke example/japanese_euc.rb
The option -Ke means $KCODE = 'EUC'.
See also
pytst:
http://nicolas.lehuen.com/download/pytst/
Ternary Search Trie:
http://en.wikipedia.org/wiki/Ternarysearchtries
Aho-Corasick algorithm:
http://en.wikipedia.org/wiki/Aho-Corasick_algorithm
