I found your description of how the metadata and browse structures
are stored in GDBM and the search indexes in MG very interesting.
I wonder if I can take advantage of that to solve a problem I have:
I have a collection of periodical issues that use the BookPlug to
break each issue up into articles. To create browse structures on
metadata that describe individual articles I use a modified version
of the AZCompactList classifier (modified, for example, to use the
classify_section method instead of the classify method). Anyway,
it takes a very long time (about 12 hours on a 2-processor Linux
server) to build this 700-document collection, and the bulk of that
time is spent building these classifiers.
The poor performance is most likely due to some change I made
interacting badly with the recursive nature of the algorithm (I get
lots of "classify called multiple times" warnings). Hopefully an AZCompactList.pm
that works with document sections will be in the next version of
Greenstone. But in the meantime I need to rebuild this collection
often because it is under development.
Do you think I could set up a process that rebuilds the search
indexes and the simple (issue-level) browse structures nightly, and
continues to use the old AZCompactList classifiers until maybe each
weekend, when I could rebuild the whole thing?
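Concretely, I imagine something like the following sketch. The collection name "periodicals" is made up, and I am guessing that buildcol.pl's -mode option can rebuild just the indexes without touching the GDBM info database that holds the old browse structures; please correct me if that is wrong:

```shell
#!/bin/sh
# Sketch of the nightly/weekend split I have in mind. The collection
# name "periodicals" is invented, and -mode build_index is my
# assumption about how to rebuild only the MG search indexes while
# leaving the GDBM info database (and thus the old AZCompactList
# browse structures) from the last full build in place.
GSDLHOME=${GSDLHOME:-/usr/local/gsdl}
COLLECTION=periodicals

# With DRY_RUN set (the default here) commands are only printed, so
# the sketch can be read without a Greenstone installation; set
# DRY_RUN= to really run them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

nightly_build() {
    # Rebuild only the search indexes; keep last weekend's
    # classifiers as-is.
    run "$GSDLHOME/bin/script/buildcol.pl" -mode build_index "$COLLECTION"
}

weekend_build() {
    # Full rebuild: re-import and run every build phase, including
    # the slow AZCompactList classifiers.
    run "$GSDLHOME/bin/script/import.pl" "$COLLECTION"
    run "$GSDLHOME/bin/script/buildcol.pl" -mode all "$COLLECTION"
}
```

The two functions would then be driven from cron: nightly_build every night, weekend_build on Saturdays.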