Some Early Thoughts

Library by fenraven

The Armenian Library project continues to move forward. Today I posted on its Facebook page that Google has a library partner program, but that I don’t see them prioritizing Armenian content any time soon. They ask libraries to fill out a form, and all they say is that at some point in the future they may contact the libraries. The Facebook post prompted a very fundamental question (thank you for asking it!): what do you propose to do about that? Following are some of my early thoughts. The vision is very clear in my mind, but I welcome feedback on this subject from anyone associated with any of the Armenian libraries. Please keep in mind my overall objective: to expand the Armenian-language section of the Internet, that is, the searchable and indexed content in Armenian that is accessible to all who want to find it.

I propose (and inch by inch I am working on this as part of the Armenian Library project):

1. We establish a working group of interested parties (such as librarians of Armenian libraries around the world). I would very much welcome a group of advisers to guide my efforts related to the Armenian Library. I am not aware of any such working groups today (though that could simply be because I am new to this space), nor am I aware of any large-scale resource sharing among libraries (again, this may already exist). The group would meet quarterly to discuss issues such as the proposals below. Library content is only one part of Armenianizing the Armenian section of the web.

2. We develop and deploy a cloud-based, unified Armenian card catalog built on modern standards such as Unicode and on modern database technologies. The catalogs I have seen so far use transliterated (Latin-script) characters, which makes it impossible to search for a book title or keywords with Armenian-language search tools. This may involve several large sub-projects, such as linking new records to existing transliterated records and converting existing records to proper Armenian script. The objective is to enable search in Armenian for content in Armenian. There may already be a computerized Armenian card catalog in Armenia that we could reuse. However, the card catalog I have in mind would be multi-tenant (in other words, many book owners could participate in the same catalog). We can also research what it would take to use existing Internet-based catalogs, based on their stated missions and governance; the last thing anyone wants is to work hard with a cloud provider only to find that its objectives or motives conflict with ours.

3. We keep Armenian content separate from Google and any other centralized library project unless their mission, vision, objectives and governance are clearly stated and do not conflict with the objective of preserving and promoting the Armenian language and its content, or with the missions of the various Armenian libraries. As a people, we should be able to develop and maintain our own repository of our content. However, we should not expect every single library to build or buy its own catalog in the next decade. (All computing systems are shifting significantly from on-premises to cloud-based services, and we need to help libraries in the Armenian space embrace that shift.) We wouldn’t outsource the Matenadaran (Մատենադարան) to a foreign organization, and this is no different. The challenge is how to fund shared resources, but I believe where there is a will there is always a way (to be addressed in a future post as the business plan comes together).

4. We promote the use of digitization technology at each library (such as the scanner I have developed from open-source hardware) to upload all scanned content to the shared cloud with various flags. All content would be “owned” by a library, whether a personal, small, or large one, and the content owner would determine whether the content is public or private. For example, a very private library could upload its content and mark all of it private, essentially using the cloud resource as an offsite backup, archive, research facility, and card catalog. An Armenian who owns many books and wants to digitize them could likewise create an account, upload all her content to the shared library, and mark it public or private. There could be a number of options, such as automatically submitting content for public-flag review once its copyright is cleared; each owner would decide their preferences. The driving principle is that each library owns its content even when that content is in the cloud, while the central organization has the ability (and the right) to perform OCR, content analysis, and indexing, and to make the content findable (again, assuming the owner wants it to be discovered). For example, suppose someone searches for “Արեգակնային Համակարգ” (“Solar System”) and a thousand books mention those words (finding them requires OCR of the content). If half are in private libraries, a quarter are in public libraries that chose to keep those specific items private (for copyright or other reasons, such as encouraging personal visits to the library), and the remaining quarter are public items in public libraries, the system would return 250 results for public books in public libraries plus 250 links to the public libraries holding the items marked private; the 500 books in private libraries would not appear. There would be many additional metadata options, but the focus is content in Armenian, although non-Armenian content related to Armenia would also be relevant.
To make the system easy for all types of libraries to adopt, it would likely also need to accept non-Armenian content unrelated to Armenia, so that Armenian libraries do not have to use multiple services.

5. There would need to be a few volunteer-based resource pools. For example, OCR output produced by the system may need to be proofread (how much depends on how successful I am in the current stage of the project). There would need to be content review boards or teams to decide whether content may be made public, insulating the libraries from potential copyright violations. There would need to be a group of system maintainers (this, I believe, must be a volunteer group of people who care deeply about the cause). Finally, there would need to be a governance board to make decisions about the overall project and an advisory board of experts to help guide its management.

6. We develop a business plan for funding and cost control to ensure the mission is properly funded through donations and contributions. I am against charging any Armenian for access to any part of the content. Possible ideas include bookstores for newly written content, fundraising activities, and organized volunteer opportunities such as “scanathons” (a term coined by my beautiful wife). The business plan may be the most challenging aspect of this effort, given the way Armenian communities work. One idea is to charge fees to non-Armenian libraries that want to digitize and host their content, and to use that revenue to fund the work on Armenian content.

7. We ensure everyone understands that this effort is based on inclusion and cooperation. There are already projects in Armenia and elsewhere to digitize content, and we need to respect the tremendous progress they have made, learn from them, and support their efforts, while also using our support and tools to enable libraries that are in no position to undertake mass digitization on their own.
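The catalog and ownership rules in items 2 and 4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a design: the record fields (`owner_is_public`, `item_is_public`, `legacy_title`) and the `search` and `demo_records` functions are hypothetical names chosen for illustration. Titles and OCR text are stored as Unicode Armenian, and queries are normalized with NFC and case folding so that “Արեգակնային Համակարգ” and “արեգակնային համակարգ” match. The demo reproduces the 1,000-book example: items in private libraries are never surfaced, private items in public libraries come back only as links to the owning library, and public items come back in full.

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class CatalogRecord:
    # Hypothetical record layout for a multi-tenant catalog.
    record_id: str
    owner: str              # owning library: personal, small, or large
    owner_is_public: bool   # False for a "very private" library (archive/backup use only)
    item_is_public: bool    # owner's per-item choice, e.g. for copyright reasons
    title: str              # title in Armenian script (Unicode)
    legacy_title: str = ""  # transliterated title carried over from an older catalog
    text: str = ""          # OCR text of the scanned pages, if available

def normalize(s: str) -> str:
    # NFC makes equivalent code point sequences compare equal;
    # casefold lets uppercase and lowercase Armenian letters match.
    return unicodedata.normalize("NFC", s).casefold()

def search(records, query):
    """Return (full_results, link_only_results) as lists of record IDs."""
    q = normalize(query)
    full, links = [], []
    for r in records:
        if not r.owner_is_public:
            continue  # private libraries are never surfaced in public search
        haystack = normalize(" ".join([r.title, r.legacy_title, r.text]))
        if q not in haystack:
            continue
        if r.item_is_public:
            full.append(r.record_id)   # content can be shown directly
        else:
            links.append(r.record_id)  # only point the reader to the owning library
    return full, links

def demo_records():
    # The 1,000-book example: 500 in private libraries, 250 private items
    # in public libraries, 250 public items in public libraries.
    records = []
    for i in range(1000):
        records.append(CatalogRecord(
            record_id=f"r{i}",
            owner=f"library-{i // 100}",
            owner_is_public=i >= 500,
            item_is_public=i >= 750,
            title="Աստղագիտություն",
            text="... Արեգակնային Համակարգ ...",
        ))
    return records

if __name__ == "__main__":
    full, links = search(demo_records(), "արեգակնային համակարգ")
    print(len(full), len(links))  # prints: 250 250
```

Keeping the library-level and item-level flags separate mirrors the two kinds of “private” in the example: a whole library that only wants an offsite archive, and a public library that withholds specific items.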

 

On the technical front, the proposal is to develop the following:

1. Book scanner (a prototype exists and works fairly well at 4 seconds per page). All open source and open to the public.

2. Develop OCR capabilities for Armenian, all open source and free to the public. The OCR must be enhanced by a dictionary (spell) check and a number of other screens. The objective is no more than 3 errors per 1,000 characters, i.e. a character error rate of at most 0.3% (a very hard objective).

3. Develop central cloud-based card catalog and book storage systems with corresponding web interfaces.

4. Enable content for indexing by search engines such as Google and Bing.

5. Perform content analysis to develop language tools for data entry (such as completely free and open-source spell-checking dictionaries).

6. Submit requests to the relevant major vendors to include these data entry tools.

7. Develop and distribute a physical Armenian keyboard for computers to make it easier to maintain Armenian content in Armenian.
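The accuracy goal in item 2 amounts to a character error rate (CER) of at most 0.3%. One common way to measure it, assumed here rather than taken from the project, is the Levenshtein edit distance between a proofread reference text and the OCR output, divided by the reference length. The sketch also shows the simplest form of a dictionary screen: flagging OCR tokens that are not in a word list. The function names and the punctuation set are hypothetical, and the lookup is exact-match only; a real Armenian spell check would also need to handle inflected forms.

```python
# Punctuation stripped from token edges before dictionary lookup
# (an illustrative set including Armenian marks, not an exhaustive one).
PUNCTUATION = ".,:;!?()«»։՝՛՞"

TARGET_CER = 3 / 1000  # at most 3 errors per 1,000 characters

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

def character_error_rate(reference: str, ocr_output: str) -> float:
    return edit_distance(reference, ocr_output) / max(len(reference), 1)

def meets_target(reference: str, ocr_output: str) -> bool:
    return character_error_rate(reference, ocr_output) <= TARGET_CER

def flag_unknown_words(ocr_output: str, dictionary: set) -> list:
    """Return tokens (edge punctuation stripped) that are not in the word list."""
    tokens = (w.strip(PUNCTUATION) for w in ocr_output.split())
    return [t for t in tokens if t and t not in dictionary]
```

Flagged tokens are natural candidates for the volunteer proofreading pool, since they are where OCR errors are most likely to hide.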
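For item 4, one standard way to help search engines such as Google and Bing discover pages is an XML sitemap following the sitemaps.org protocol. The sketch below is an assumption about how the catalog might use it, not a plan from this post: `build_sitemap` is a hypothetical helper that lists only items flagged public, and the `https://example.org/...` URLs are placeholders. The protocol caps a single sitemap file at 50,000 URLs, so a large catalog would need several files plus a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(items):
    """items: iterable of (url, is_public) pairs; only public URLs are listed."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, is_public in items:
        if not is_public:
            continue  # private items must not be advertised to crawlers
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.org/record/r750", True),   # public item: listed
        ("https://example.org/record/r600", False),  # private item: omitted
    ]))
```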

This is all for now. There are many more ideas and requirements, but they will become possible only after the major chunks of the initial effort are completed. For example, all content could be tagged by location or name (show me everything about Adana, or about the Petrossian family, etc.).
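The tagging idea can be sketched as a small inverted index from tags to record IDs, so that “everything about Adana” becomes a single lookup and a multi-tag query becomes a set intersection. `TagIndex` and the sample record IDs are hypothetical; tags are case-folded so that “Adana” and “adana” match.

```python
from collections import defaultdict

class TagIndex:
    """Inverted index: tag (case-folded) -> set of record IDs."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, record_id: str, tags) -> None:
        for tag in tags:
            self._by_tag[tag.casefold()].add(record_id)

    def find(self, *tags: str) -> set:
        """Return IDs of records that carry every requested tag."""
        if not tags:
            return set()
        # .get avoids creating empty entries for unknown tags
        matches = [self._by_tag.get(t.casefold(), set()) for t in tags]
        return set.intersection(*matches)

if __name__ == "__main__":
    index = TagIndex()
    index.add("book-1", ["Adana", "Petrossian family"])
    index.add("book-2", ["Adana"])
    print(sorted(index.find("adana")))                       # prints: ['book-1', 'book-2']
    print(sorted(index.find("Adana", "Petrossian family")))  # prints: ['book-1']
```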

I strongly encourage your comments and want you to know that I appreciate your feedback very much. This is a long term undertaking. There will be many impediments along the way but united we can overcome them all.
