
CITRI Last change: 1994/11/12 1

USER COMMANDS mgmerge(1)


mgmerge - update an mg system database with new documents


mgmerge [ -c ] [ -g get ] [ -s source ] [ -S ] [ -w ] collection-name


mgmerge is a csh script that executes all the appropriate programs in the correct order to merge new documents into an existing mg(1) system database, saving the need for a complete database rebuild with mgbuild(1). It does this by building a second mg(1) system database from the new documents and then merging that database with the old one. It uses the mg_get_merge(1) script to obtain the text of the collection. mgmerge cannot edit or delete documents already in a database; it can only add new ones.


Options can occur in any order, but the collection name must be last.

-c
This specifies that the get program is "complex". A "complex" get program requires initialisation and cleanup with the -i and -c options.

-g get
This specifies the program to use for getting the source text for the build. If no -g option is given, the default program mg_get_merge(1) is used.

-s source
The mgmerge program consists of two parts. The first part initializes variables to default values. The second part uses these variables to control how the mg(1) database is built. This option specifies a program to execute between the first and second parts. The variables, and how they may be changed, are described in comments in the mgmerge program. If this option is used, the program should ideally be the same one that was called by mgbuild(1) to build the database, since some parameters need to be consistent between mgbuild and mgmerge.

-S
This option causes a slow merge to be performed on the inverted files, in which each inverted file entry is decoded and recoded. The default is a fast merge. Accumulated fast merges slowly degrade compression performance on the resulting inverted file, so a periodic slow merge is recommended.

-w
Adding new documents can affect the weights of documents already in the collection. By default, the weights of existing documents are not recomputed, since the change in their values is usually small. This option forces the weights to be recomputed. Periodic use of this option, as with the "-S" option, is recommended; otherwise query rankings may become inaccurate.

collection-name
This is the collection name, as required by the mg_get_merge(1) program. It serves both as a case statement selector and as the name of a subdirectory that holds the indexing files.
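The option rules above (options in any order, collection name last, periodic use of -S and -w) can be sketched as a small helper that assembles an mgmerge command line. This is only an illustrative sketch: the function names build_mgmerge_argv and maintenance_due, the program name "my_get", and the interval of 10 merges are assumptions made for the example, not part of mgmerge itself.

```python
from typing import List, Optional


def build_mgmerge_argv(
    collection: str,
    complex_get: bool = False,
    get_program: Optional[str] = None,
    source_program: Optional[str] = None,
    slow_merge: bool = False,
    recompute_weights: bool = False,
) -> List[str]:
    """Assemble an mgmerge command line (hypothetical helper).

    Options may appear in any order; the collection name must be last.
    """
    if not collection:
        raise ValueError("a collection name is required")
    argv = ["mgmerge"]
    if complex_get:
        argv.append("-c")                  # the get program is "complex"
    if get_program is not None:
        argv += ["-g", get_program]        # otherwise mg_get_merge is the default
    if source_program is not None:
        argv += ["-s", source_program]     # run between the two parts of mgmerge
    if slow_merge:
        argv.append("-S")                  # decode and recode every inverted file entry
    if recompute_weights:
        argv.append("-w")                  # recompute weights of existing documents
    argv.append(collection)                # collection name always goes last
    return argv


def maintenance_due(merge_count: int, every: int = 10) -> bool:
    """Return True on every `every`-th merge (the interval is an assumed policy)."""
    return merge_count > 0 and merge_count % every == 0


if __name__ == "__main__":
    # "my_get" and "mycollection" are placeholder names for illustration.
    due = maintenance_due(merge_count=10)
    print(build_mgmerge_argv("mycollection", get_program="my_get",
                             slow_merge=due, recompute_weights=due))
```

The resulting list could be passed to a process launcher to run the merge; how often the slow merge and weight recomputation are worth their cost depends on the collection, so the interval should be treated as a tunable guess.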


MGDATA
If this environment variable exists, its value is used as the default directory for the mg(1) collection files. If it does not exist, the directory "." is used. The command line option -d directory overrides the directory in MGDATA. Note that a temporary directory under the MGDATA directory is used to perform the merge. The default name for this directory is MERGE.
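The directory lookup described above can be mirrored in a few lines. This is a sketch of the documented behaviour only, not the code in the mgmerge script; the function name resolve_mg_dirs and its parameters are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_mg_dirs(
    env: Mapping[str, str],
    directory: Optional[str] = None,
    merge_name: str = "MERGE",
) -> Tuple[str, str]:
    """Return (collection directory, temporary merge directory).

    Precedence follows the text: an explicit -d directory wins, then
    MGDATA if it is set, then "." as the fallback.
    """
    if directory is not None:
        base = directory
    else:
        base = env.get("MGDATA", ".")
    # The temporary merge directory lives under the collection directory.
    return base, os.path.join(base, merge_name)


if __name__ == "__main__":
    print(resolve_mg_dirs(os.environ))
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the lookup easy to check with different settings.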


Inverted file.

Compressed stemmed dictionary.

*.invf.dict.blocked The `on-disk' stemmed dictionary.

The index into the inverted file.

The exact weights file.

Compressed documents.

Text statistics.

Compressed compression dictionary.

Index into the compressed documents.

Interleaved index into the compressed documents and document weights.

Approximate document weights.


mg(1), mgbuild(1), mg_get_merge(1), mg_get(1), mg_invf_merge(1), mg_text_merge(1), mg_query(1), mg_weights_build(1).
