Spejd 1.3.6

Copyright (C) IPI PAN, 2007-2012. All rights reserved.
Available under the terms of the GNU General Public License;
see the COPYING file for details.

ABOUT

Spejd is a shallow parser, which allows for simultaneous syntactic
parsing and morphological disambiguation, developed at the
Institute of Computer Science, Polish Academy of Sciences, Warsaw.

Spejd homepage:
    official: http://zil.ipipan.waw.pl/Spejd/
    sourceforge (with main bugtracker): http://sourceforge.net/projects/spejd/

Author:
    Bartosz Zaborowski [bartosz.zaborowski@ipipan.waw.pl]

The current implementation is based on ideas and work of:
    Aleksander Buczyński
    Aleksander Wawer
    Adam Przepiórkowski
    Bartosz Zaborowski
    Aleksander Zabłocki

REQUIREMENTS

* Windows version:
    Windows XP SP3 or newer
    40 MB of hard drive space
* Linux static binaries:
    GNU/Linux 2.6.9 or newer (64-bit for the 64-bit version)
    15 MB of hard drive space
* Source:
    POSIX operating system (or POSIX libraries for Windows/MinGW)
    Either GNU make or some other build environment (which you have to set up manually)
    C and C++ compilers
    All dependencies met (see below)

Dependencies (tested versions in parentheses):

    POSIX system, or Windows with libraries implementing POSIX
      (at least posix-threads and iconv are needed)

    zlib compression library (1.2.5)
    ICU Unicode library (4.6.1, 4.8)
    boost library (at least regex and iostreams) >= 1.40 (1.46)
      (1.42.0 probably has a bug in its gzip compressor; do not use it)

    rt library (for the realtime clock)
    libxmlpp library (2.32.0)
    morfeusz morphological analyzer library (0.82/20110416)
      (available from http://sgjp.pl/morfeusz/dopobrania.html)

Optional dependencies:
    google perftools (tcmalloc library) (1.6) - highly recommended

    pantera tagger (available from http://code.google.com/p/pantera-tagger/)

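Before building from source, it can help to check that the helper programs the build relies on are reachable. The sketch below only looks for make, icu-config and pkg-config, which are the tools named in this README. The list is an assumption, not a description of what the configure script actually checks, and library versions are not verified.

```shell
# Report which build helpers are missing from PATH.
# The tool list is an assumption based on this README, not on configure itself.
missing=""
for tool in make icu-config pkg-config; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        missing="$missing $tool"
    fi
done

if [ -n "$missing" ]; then
    echo "Not found in PATH:$missing"
else
    echo "All checked tools were found."
fi
```

A missing tool does not necessarily mean the build will fail; for example, icu-config may live in a nonstandard directory that is passed to configure separately.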
INSTALLATION

* Windows binary:
    The Windows binary version comes with an installer. Run it and follow
    the instructions.

* Linux binary:
    Just unpack the package anywhere; it is ready to use.

* Source:

    Installation from source is similar to that of many other packages. In the simplest
    case the shell commands `./configure; make; make install' should
    configure, build, and install Spejd (the last command may require
    superuser privileges). For standard installation options see the INSTALL file.

    Spejd's configure script accepts the following nonstandard options:

    --enable-static-spejd
        Links the spejd executable statically. The result will probably
        not be portable if Spejd is compiled with Pantera support
        (since that package installs other data, not only the library).
        Disabled by default.

    --with[out]-pantera
        Enables or disables Pantera tagger support.
        By default configure checks whether the package is present on the system.

    --with[out]-tcmalloc
        Enables or disables the custom allocation methods from
        google perftools. They are optional but recommended: they can speed up
        Spejd by approx. 10% and give more reliable memory management/limits.
        The default is to build with tcmalloc.

    --with-libicu-prefix=DIR
        Searches for the ICU library in DIR. More precisely: icu-config
        is searched for in $PATH and also in DIR/bin, and it is executed with
        the --prefix=DIR parameter. By default icu-config is searched for in
        $PATH only and it looks for the library in its default
        locations. This option is probably useful only if you have
        installed ICU in a nonstandard location, such as $HOME.

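As an illustration of how the nonstandard options combine, the following sketch configures a build without Pantera support and with ICU installed under a private prefix. The prefix $HOME/icu is a hypothetical location chosen for the example; the option names are the ones listed above.

```shell
# Run from the top of the unpacked Spejd source tree.
# $HOME/icu is a hypothetical ICU install prefix; replace it with your own.
icu_prefix="$HOME/icu"

if [ -x ./configure ]; then
    # Build without Pantera and point configure at the private ICU copy.
    ./configure --without-pantera --with-libicu-prefix="$icu_prefix" && make
    # Afterwards, possibly as superuser: make install
else
    echo "No ./configure in the current directory; skipping the build." >&2
fi
```

The install step is left as a comment because it may need superuser privileges, depending on the installation prefix.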
Compilation from source on Windows:

MinGW:
    Starting from version 1.3.3 the source package is MinGW-compatible. Once the
    dependencies are met, you should be able to compile and install it as under Unix systems.
    In some cases you have to provide manually:
      the --with-boost-prefix= argument (pointing to the place where the libboost_*.dll files are)
      the PKG_CONFIG environment variable, set to the full path of pkg-config.exe

    Since there is no icu-config script in the Windows ICU distribution, the
    path to the ICU library is currently hardcoded (/mingw/lib).

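A MinGW build that needs both manual settings might look like the sketch below. Both paths are hypothetical placeholders for a typical MinGW/MSYS layout and must be replaced with the locations on your system.

```shell
# Hypothetical paths; adjust them to your MinGW/MSYS installation.
boost_prefix="/mingw/lib"                  # directory holding the libboost_*.dll files
PKG_CONFIG="/mingw/bin/pkg-config.exe"     # full path to pkg-config.exe
export PKG_CONFIG

if [ -x ./configure ]; then
    ./configure --with-boost-prefix="$boost_prefix" && make
else
    echo "No ./configure in the current directory; skipping the build." >&2
fi
```

PKG_CONFIG is exported so that the configure script, which runs as a child process, can see it.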
Other environments:
    Under Unix systems the configure script generates the config.h file. Under Windows the build
    environment must be set up manually and a special config.h file must be used;
    it is in the package under the name config.h.win. Just remember to rename it to config.h.

BASIC USAGE

Spejd is a command line tool. It does not have a graphical interface.

The basic invocation is as follows:

    spejd [-c <config file name>] <paths to input files or dirs>

For a more detailed description of usage see doc/getting_started.txt

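A concrete call following the synopsis above could look like this sketch. The configuration path assumes the example config.ini shipped in the examples directory, and corpus/ is a hypothetical input directory.

```shell
# Paths are illustrative: the config path assumes the packaged example file,
# and corpus/ is a hypothetical directory with input files.
config="examples/config.ini"
input="corpus/"

if command -v spejd >/dev/null 2>&1; then
    spejd -c "$config" "$input"
else
    echo "spejd was not found in PATH." >&2
fi
```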
INPUT AND OUTPUT FORMATS

Spejd can currently read plain text and the XCES and TEI P5 corpus XML formats
(as used in the IPIPAN Corpus and the National Corpus of Polish, respectively).
There is a configuration option for setting the input format; however,
in most cases the format is recognized automatically.

Spejd can write output in two formats: XCES and TEI. The output format is
set in the configuration file.

For a detailed description of the supported formats refer to the doc/manual.pdf file.

DOCUMENTATION

The main documentation is included in the package in the doc/manual.pdf file.
Typical usage is documented in the doc/getting_started.txt file.
Additional documentation can be found in the comments of the example files
(examples directory). They describe the configuration, tagset definition,
dictionaries and rule syntax.

EXAMPLES

See the examples directory in the package for an example config.ini
(explained in its comments) and other files, including example input.


FOR DEVELOPERS

Feel free to play around with the sources,
modify them and post patches on Spejd's bugtracker at sourceforge.

Starting from version 1.1 the Spejd engine is compiled as a shared library,
which can be used from other applications. See src/spejd.h for
more information (in the Windows binary package: devel/spejd.h).