Revision 7 as of 2012-05-17 19:54:35, edited by MichalLenart.

Segment

The Segment program splits text into segments (sentences, paragraphs, words). Splitting rules are read from a file in the XML-based Segmentation Rules Exchange (SRX) standard format. Segment can also be used as a programming library.
Homepage: http://sourceforge.net/projects/segment/
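The following minimal Python sketch shows what the rules in an SRX file do. It is independent of Segment's own implementation. It assumes the SRX 2.0 layout (namespace `http://www.lisa.org/srx20`). The Polish rules, the helper names `load_rules` and `split`, and the sample sentence are hypothetical.

At each position in the text, the rules are tried in order. The first rule that matches decides whether to break. A rule matches when its `beforebreak` pattern matches the text just before the position and its `afterbreak` pattern matches the text just after it. For this reason, `break="no"` exceptions (here the abbreviation "np.") must come before the general rule. Python's `re` module stands in for the ICU-style regular expressions that SRX specifies; the two agree on the simple patterns used here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical minimal SRX 2.0 file: one "no break" exception, then one general break rule.
SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no"/>
  <body>
    <languagerules>
      <languagerule languagerulename="Polish">
        <rule break="no">
          <beforebreak>\bnp\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern="pl.*" languagerulename="Polish"/>
    </maprules>
  </body>
</srx>"""

NS = {"srx": "http://www.lisa.org/srx20"}


def load_rules(srx_text, lang):
    """Return (is_break, before_re, after_re) tuples for the first language map matching lang."""
    root = ET.fromstring(srx_text)
    for lang_map in root.iterfind("srx:body/srx:maprules/srx:languagemap", NS):
        if re.match(lang_map.get("languagepattern"), lang):
            name = lang_map.get("languagerulename")
            break
    else:
        raise ValueError(f"no SRX language rule matches {lang!r}")

    rules = []
    for lang_rule in root.iterfind("srx:body/srx:languagerules/srx:languagerule", NS):
        if lang_rule.get("languagerulename") != name:
            continue
        for rule in lang_rule.iterfind("srx:rule", NS):
            before = rule.findtext("srx:beforebreak", default="", namespaces=NS)
            after = rule.findtext("srx:afterbreak", default="", namespaces=NS)
            rules.append((
                rule.get("break", "yes") == "yes",  # SRX treats a missing break attribute as "yes"
                re.compile(rf"(?:{before})\Z"),     # must end exactly at the candidate position
                re.compile(after),                  # must start exactly at the candidate position
            ))
    return rules


def split(text, rules):
    """Cut text wherever the first rule that matches at a position says break="yes"."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides, so exceptions must come first
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]
```

With these rules, `split("Mam zwierzęta, np. psa i kota. Ala ma kota! Koniec.", load_rules(SRX, "pl"))` returns three sentences and keeps "np." inside the first one. Rescanning from the start of the text at every position keeps the sketch short, but it is quadratic. The sketch also ignores `cascade` and format handles.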

Documentation

For detailed info see the docs directory in the Segment package file.

Using as sentence splitter

In the Segment main directory, type:

./bin/segment -l <language_code> -s <SRX_file>

For example:

./bin/segment -l pl -s sample.srx

Run this way, the program reads text from stdin and writes the sentences to stdout, one sentence per line.
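Because the splitter reads stdin and writes one sentence per line, it can be driven from another program. The Python sketch below wraps the command with `subprocess`. It uses only the `-l` and `-s` options documented on this page. It assumes the tool reads and writes UTF-8. The helper names and the `ECHO_STAND_IN` command are hypothetical; the stand-in only echoes its input, so the wrapper can be tried without Segment installed.

```python
import subprocess
import sys

# Hypothetical stand-in that echoes stdin, for trying the wrapper without Segment installed.
ECHO_STAND_IN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def build_command(lang, srx_file, segment="./bin/segment"):
    """Build the sentence-splitter command line (only the -l and -s options are used)."""
    return [segment, "-l", lang, "-s", srx_file]


def run_splitter(cmd, text):
    """Send text on stdin and return the non-empty stdout lines, one sentence each."""
    result = subprocess.run(
        cmd,
        input=text,
        capture_output=True,
        encoding="utf-8",  # assumption: the tool reads and writes UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit status
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

From the Segment main directory, `run_splitter(build_command("pl", "sample.srx"), text)` would return the sentences of `text` as a list.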

Recommended resources: