Monday, September 22, 2008

Update #Next

I had a meeting with SP yesterday afternoon. She wanted to know whether my final products were going to be images or text; she is concerned, and thinks the committee of officers will be concerned, about the end functionality of the digitization project. She posed the question, "How usable do we want this to be?", referring to whether relying on well-tagged items will be enough or whether everything should be text-searchable. It occurs to me as I write this that argument #1 for sticking to the original plan of scanned images is that if you want it to be a good resource, you have to have good metadata. Obviously I am no expert on weaving terminology, so the metadata (subjects, keywords, etc.) would have to come from them.

So, my current tasks are:

1. Create a "talking points" memo covering: the pros and cons of images vs. text; available, effective OCR software; what constitutes a "file" and a "document," and what a standard file size is; and what the WSSA will need, e.g. server space, a dedicated guild computer, etc. I think there is something else I need to cover in this memo, but my notes are at home.

2. Figure out what sort of software to use for the collection. Fedora is another option I have been looking into; its advantage over Greenstone is that scanned images, such as the pages of a journal, can be clearly associated with their parent item in a hierarchical layout. Also, I have been looking at the documentation for both, and it will not be easy, but I think I have a better chance of getting Fedora up and running than Greenstone. Possibly.

3. Start the scanning, so that at least that part is under way and I have something to work with as soon as everyone comes to a conclusion about point 1.


Re: OCR research, I am going to go into the Rebecca wiki because I remember Russell posting the results of his investigation into online OCR resources; also, Liza told me there is Adobe OCR software on the lab computers.
