Re: Digital camera archiving + Greenstone

From John R. McPherson
Date Mon, 18 Feb 2002 11:21:41 +1300
Subject Re: Digital camera archiving + Greenstone
In-Reply-To (200202180200-DAA05882-geri-narc-com)
nlin@nlin.net wrote:
>
> Hello,
>
> I just discovered the Greenstone system and have a question about its applicability
> in my particular circumstance. I am looking to digitize around 20000 pages of
> material with a 2 megapixel digital camera. The raw output format is JPEG.
> From what I understand 2 megapixels is not quite sufficient for OCR purposes,
> so I will NOT be using OCR.
>
> I was wondering if Greenstone is suitable for this purpose - digitizing 20000
> text pages without OCR. I am aware that I will not be able to search the texts.
> This is fine with me. Is Greenstone a good tool for working with 20000 digitized
> images of text pages? Are there any tips about how to use Greenstone in the
> most effective way when dealing with images of text?

Hi,

Firstly, I don't see why OCR couldn't work with a 2MP camera. The
most important thing would be that the pictures are taken completely
level, or else the lines at the bottom are wider than the lines at
the top or vice versa. (Speaking from experience...)

Secondly, I think you would need some kind of textual data for
navigation purposes - either document text (via OCR) or metadata.
I think this would involve manual entry of metadata, which does not
sound like a trivial undertaking with 20,000 images. I'm guessing
that you won't be able to use filenames either, as digital cameras
generally give the images sequential names.

You could try with a few images to get some idea of whether or not
the navigation features of Greenstone are adequate, either using
1) manually-entered metadata via the "metadata.xml" file
(which I think is discussed in the user guide, or online at
http://nzdl2.cs.waikato.ac.nz/cgi-bin/library?a=d&c=gsdldocs&cl=search&d=HASH01bfddd10446dba2720c23cb)

or
2) the Image plugin, which I think won't really be useful for
images of text. (It automatically creates thumbnails.) It
would also require some metadata anyway.
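
For option 1, typing a metadata.xml by hand for thousands of images would be tedious, so it may be worth generating it from a list of filenames. The following is only a rough sketch, not something from the Greenstone distribution. It assumes the DirectoryMetadata / FileSet / FileName / Description / Metadata layout that I believe the user guide describes, and that FileName is matched as a regular expression (hence the escaping). The function name, the "DSC00001.JPG" style filenames and the "Page N" titles are made up for illustration, so check the output against the documentation before relying on it.

```python
import re
import xml.etree.ElementTree as ET


def build_directory_metadata(entries):
    """Return metadata.xml content as a string.

    entries: iterable of (filename, {metadata name: value}) pairs.
    Assumed layout: <DirectoryMetadata><FileSet><FileName/><Description/>.
    """
    root = ET.Element("DirectoryMetadata")
    for filename, fields in entries:
        fileset = ET.SubElement(root, "FileSet")
        # Assumption: FileName is treated as a regular expression,
        # so escape the dot in names such as "DSC00001.JPG".
        ET.SubElement(fileset, "FileName").text = re.escape(filename)
        description = ET.SubElement(fileset, "Description")
        for name, value in fields.items():
            meta = ET.SubElement(description, "Metadata", name=name)
            meta.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example: three sequentially named camera images,
# each given a page-number title.
entries = [
    ("DSC%05d.JPG" % n, {"Title": "Page %d" % n})
    for n in range(1, 4)
]
metadata_xml = build_directory_metadata(entries)
```

The resulting string would then be saved as metadata.xml in the same directory as the images. Sequential camera filenames are not meaningful on their own, but if the pages were photographed in order, a mapping like the one above at least gives each image a page number to browse by.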

John McPherson