Revision 13 as of 2012-05-17 19:59:43

Segment

Segment is a program that splits text into segments (sentences, paragraphs, words). The splitting rules are read from a file in the XML-based Segmentation Rules eXchange (SRX) standard format. Segment can also be used as a programming library.
Homepage: http://sourceforge.net/projects/segment/

Documentation

For detailed information, see the docs directory in the Segment package.

Using as a sentence splitter for Polish

In the Segment main directory:

./bin/segment -l pl -s sample.srx

Run this way, the program reads text from stdin and writes the sentences to stdout, one per line.
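Because the program works on stdin and stdout, ordinary shell redirection is enough to process files. The following is a minimal sketch, not part of the Segment package: the function names segment_pl and count_sentences, and the override variables SEGMENT_BIN and SRX_FILE, are made up for illustration. Only the -l pl -s sample.srx invocation comes from the command above.

```shell
#!/bin/sh
# Sketch only: wraps the documented command so that input and output
# files can be named. SEGMENT_BIN and SRX_FILE are hypothetical shell
# variables used by this wrapper; they are not Segment options.

segment_pl() {
    # $1 = input text file, $2 = output file (one sentence per line)
    "${SEGMENT_BIN:-./bin/segment}" -l pl -s "${SRX_FILE:-sample.srx}" < "$1" > "$2"
}

count_sentences() {
    # The output holds one sentence per line, so counting lines counts sentences.
    # tr strips the padding that some wc implementations add.
    wc -l < "$1" | tr -d ' '
}
```

For example, segment_pl input.txt sentences.txt followed by count_sentences sentences.txt, where both file names are placeholders and the commands are run from the Segment main directory.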

Recommended resources: