Installing Article Index

I was recently asked, “I am looking at your Article Index and really like it. I’m not really a perl writer, but I can modify existing perl. What steps do I need to take to make a version of open journal access articles on my site?” Below is my attempt to answer the question.

Article Index demonstrates how MyLibrary can be used as an OAI-PMH service provider against content of the Directory of Open Access Journals (DOAJ). Here’s how to get it to work for you:

  1. Install MyLibrary - Describing this in detail is beyond the scope of this document. For more detail, see the online instructions.
  2. Download Article Index - A tarball of the entire application is available at http://mylibrary.library.nd.edu/download/article-index-2008-10-14.tar.gz.
  3. Create a new MySQL database - Use the file named etc/mylibrary-mysql-schema.sql to initialize an additional database. Again, this is documented as a part of Step #1. If you want to, you can use the file named etc/article-index.sql instead. It contains about 77 MB of sample data.
  4. Create a new MyLibrary instance - Use the program called config_mylibrary.pl, found in the MyLibrary distribution, to create a MyLibrary instance. For this implementation we suggest an instance named “articles”.
  5. Harvest data - If you did not use etc/article-index.sql to create your database, then run bin/doajarticles2mylibrary.pl. This program will use OAI-PMH to harvest article-level content from the DOAJ and insert the metadata into your MyLibrary instance. Be patient. The process is not zippy.
  6. Browse data - Once you get this far you can use a terminal-based interface to see what is in your collection. Get started by running bin/main-menu.pl.
  7. Index data - To make the harvested content searchable you need to index it. This is done with the program bin/index.pl. The program will loop through each record in the MyLibrary articles index, extract the metadata, and save the result in etc/index. Again, the process is not zippy. Once finished you should be able to use bin/search.pl to apply rudimentary queries against the index.
  8. Put it on the Web - You are almost done. Put this entire distribution in your HTTP file system and make index.cgi executable. You should now be able to connect to your Web server, browse the collection, and search the index. If you want to change the user interface, then edit etc/template.txt because it defines the look & feel of the Web interface.
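
Step 5 is where OAI-PMH does the work. To give a feel for what a harvester does with each response, here is a minimal sketch. It is written in Python rather than Perl, and it is not the code in bin/doajarticles2mylibrary.pl. The base URL, the sample response, and the function names are all invented for illustration; only the ListRecords verb, the oai_dc metadata prefix, the resumptionToken, and the XML namespaces come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumptionToken replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

# An invented response, standing in for what a repository would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented article title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Return (records, next_token); next_token is None on the last page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in record.iterfind(".//dc:creator", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL, store the parsed records, and keep requesting pages until no resumption token comes back, which is part of why the process is not zippy.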

MyLibrary: A Digital library framework & toolkit

I recently published an article in Information Technology and Libraries entitled “MyLibrary: A Digital library framework & toolkit”. From the abstract:

This article describes a digital library framework and toolkit called MyLibrary. At its heart, MyLibrary is designed to create relationships between information resources and people. To this end, MyLibrary is made up of essentially four parts: 1) information resources, 2) patrons, 3) librarians, and 4) a set of locally-defined, institution-specific facet/term combinations interconnecting the first three. On another level, MyLibrary is a set of object-oriented Perl modules intended to read and write to a specifically shaped relational database. Used in conjunction with other computer applications and tools, MyLibrary provides a way to create and support digital library collections and services. Librarians and developers can use MyLibrary to create any number of digital library applications: full-text indexes to journal literature, a traditional library catalog complete with circulation, a database-driven website, an institutional repository, an image database, etc. The article describes each of these points in greater detail. 

The folks at ITAL are gracious enough to allow authors to distribute their work on the Web as long as the distribution happens after print publication. “Nice policy!”

MyLibrary vs. Primo

On the mylib-dev mailing list a person saw a press release regarding Notre Dame’s acquisition of Primo, and they asked, “There is concern from some on-campus here that this announcement means the end of future development of MyLibrary. What can I tell them?” I thought I’d echo my reply, below.

As Rob said, there are not any plans to discontinue the work/development of MyLibrary. MyLibrary is alive and well.

Here at Notre Dame we use MyLibrary to drive much of our website. [1, 2, 3] It works. It does what it is supposed to do. We also use it to manage a couple of other digital library thingees. For example, it drives a FAQ for our reference department [4], and we expect to expand this beyond the reference department. We also use it to drive much of the fledgling “Catholic Portal” [5].

MyLibrary is getting tweaked

MyLibrary is getting tweaked. For example, more granular recommendations against information resources will be implemented in order to satisfy a need for our upcoming website redesign. I believe librarians will be able to prioritize recommended resources in the form of a numbered list. “This is the most important resource. This is the second most important resource. Etc.” Work is also being done on a statistics module that will count how many times a resource has been used and by what type of person. This will enable us to answer questions such as, “What resource is most popular?” and “What resources are used by people like you?” I hope these enhancements will be finished by the end of the summer. (Famous last words.)
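
The statistics idea boils down to counting usage events by resource and by type of person. Here is a minimal sketch of that counting in Python. It is not the MyLibrary statistics module, which is not described in any detail here; the event list, the patron types, and the function names are all assumptions made up for illustration.

```python
from collections import Counter

def tally(events):
    """Count uses per resource and per (resource, patron type) pair."""
    by_resource = Counter()
    by_resource_and_type = Counter()
    for resource, patron_type in events:
        by_resource[resource] += 1
        by_resource_and_type[(resource, patron_type)] += 1
    return by_resource, by_resource_and_type

def popular_for(by_resource_and_type, patron_type, n=3):
    """Answer "What resources are used by people like you?" for one patron type."""
    counts = Counter({
        resource: count
        for (resource, ptype), count in by_resource_and_type.items()
        if ptype == patron_type
    })
    return counts.most_common(n)

if __name__ == "__main__":
    # Invented usage events: (resource, type of person).
    events = [
        ("Resource A", "student"),
        ("Resource A", "faculty"),
        ("Resource B", "student"),
        ("Resource A", "student"),
    ]
    by_resource, by_pair = tally(events)
    print(by_resource.most_common(1))      # the most popular resource overall
    print(popular_for(by_pair, "student")) # popular among students
```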

MyLibrary is not a turn-key application

MyLibrary 3.x is a long way from MyLibrary 2.x. Version 3.x is not a turn-key application; it is a true object-oriented module. I think this scared many would-be MyLibrary adopters away. To use MyLibrary 3.x there needs to be true co-operation between librarian and developer. This gap is sometimes difficult to bridge. Moreover, the implementation of facet/term combinations (which just so happened to pre-date the current fascination with “faceted browsing” in our “next-generation” library catalogs) is really rather foreign to many people. Most of us are used to LCSH and the like. I allude to many of these ideas in an upcoming article that will be appearing in Information Technology & Libraries (ITAL) sometime in September. I will also elaborate on these ideas at the Access Conference in October.

MyLibrary vs Primo

Finally, MyLibrary provides complementary services when compared to “next-generation” library catalog things such as Primo, VUFind, and AquaBrowser. Technically speaking, MyLibrary is a Perl-based API allowing the librarian and developer to read/write to a specifically shaped database. In this sense, MyLibrary is a digital library framework and toolbox. Librarians and developers are expected to fill the MyLibrary database with content, write reports against the database, and thus facilitate digital library collections and services. As they are being touted and implemented, “next-generation” library catalog applications are essentially indexes with enhanced services applied against them. Aggregate content. Index it. Provide access to the index (search), and provide additional services against the index (Did you mean?, faceted browse, relevancy ranked output, tagging, etc.). As a database application, MyLibrary purposely does not support search very well. To support search, librarians and developers are expected to pipe their content to an indexer like swish-e, Kinosearch, or Lucene, the indexers supporting most “next-generation” library catalog systems. On the other hand, “next-generation” library catalogs do not support the finely grained data creation and maintenance operations a database can support, nor do they have the flexibility to create the myriad of reports that MyLibrary can generate. MyLibrary and things like Primo each have their own strengths and weaknesses. Neither one is a silver bullet for implementing the sorts of information services our patrons increasingly expect.

In short, MyLibrary is more like a database application. “Next-generation” library catalog systems are more like indexes. Databases and indexes are two sides of the same information retrieval coin.
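
To make the database/index distinction concrete, here is a small sketch in Python. The database side is good at creating and maintaining records; the index side is good at search, and it is built by piping the database's content through an indexer. The table layout below is invented for the example and is not the MyLibrary schema, and the tiny in-memory inverted index merely stands in for a real indexer such as swish-e or Lucene.

```python
import re
import sqlite3
from collections import defaultdict

def build_database():
    """Create a toy database; the schema is hypothetical, not MyLibrary's."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT, note TEXT)")
    db.executemany(
        "INSERT INTO resources (id, title, note) VALUES (?, ?, ?)",
        [
            (1, "Walden", "An account of simple living near a pond"),
            (2, "Civil Disobedience", "An essay on conscience and government"),
            (3, "Pond Life", "A field guide to life in a pond"),
        ],
    )
    return db

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(db):
    """Read every record from the database and build word -> record ids."""
    index = defaultdict(set)
    for rid, title, note in db.execute("SELECT id, title, note FROM resources"):
        for word in tokenize(f"{title} {note}"):
            index[word].add(rid)
    return index

def search(index, query):
    """Return ids of records containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

if __name__ == "__main__":
    db = build_database()
    index = build_index(db)
    print(search(index, "pond"))
    print(search(index, "pond guide"))
```

Editing a record is a one-line UPDATE against the database, but the change does not show up in search results until the index is rebuilt. That is the trade-off in miniature.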

Links

[1] Subject page - http://www.library.nd.edu/subjects/
[2] Formats page - http://www.library.nd.edu/formats/
[3] Tools page - http://www.library.nd.edu/research_tools/
[4] FAQ - http://www.library.nd.edu/reference/faq/
[5] Catholic Portal - http://www.catholicresearch.net/

MyLibrary version 2.63: A Blast from the past

There was a request recently to make re-available an older version of MyLibrary, version 2.63. I have rooted through my archives, found a copy, and uploaded it to the local download directory. A blast from the past.

Flexibility of open source software: An example

I have been using MyLibrary to manage the content of a thing called the “Catholic Portal”, and because MyLibrary is open source I have greater flexibility in my development. Here’s an example.

The Portal is expected to ingest both EAD and MARC records. As these files are submitted I have been stuffing their entire contents into a MyLibrary::Resource object. After all, I know a lot of metadata about the file(s): name/title, date, creator, etc. As for the contents of the file itself I use the Resource object’s note method.

Unfortunately, and by default, the underlying database note field is defined as text. This means my data values can be no longer than 64K in size. Alas, some of my EAD and MARC records are larger than that.

No problem. Dump the database schema. Change the definition of the text field to mediumtext (16MB). Reinitialize the database. Begin again. No fuss. No muss. The MyLibrary API continues without a hitch. Alternatively, I could have used an SQL ALTER TABLE command to edit things in place.

Actually, I used this hack once before, specifically, in my Alex Catalogue of Electronic Texts. There too I wanted to stuff the entirety of an electronic text into a MyLibrary::Resource. In this case I upped the value to longtext (4GB) because, believe it or not, some of those electronic texts were huge!
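
For the curious, here is a small sketch of the arithmetic behind the hack, in Python. It picks the smallest MySQL text type that can hold a given record and prints the corresponding ALTER TABLE statement. The size limits are MySQL's documented maximums; the table name "resources" and the column name "note" are assumptions made for illustration, so check your own schema before running anything like this.

```python
# Maximum sizes, in bytes, of MySQL's larger text column types.
TEXT_TYPES = [
    ("TEXT", 2**16 - 1),        # 65,535 bytes, the "64K" limit
    ("MEDIUMTEXT", 2**24 - 1),  # about 16 MB
    ("LONGTEXT", 2**32 - 1),    # about 4 GB
]

def smallest_text_type(payload):
    """Return the smallest text type able to hold the UTF-8 encoded payload."""
    size = len(payload.encode("utf-8"))
    for name, limit in TEXT_TYPES:
        if size <= limit:
            return name
    raise ValueError("payload is too large for any MySQL text type")

def alter_statement(table, column, column_type):
    """Build an ALTER TABLE statement; table and column names are hypothetical."""
    return f"ALTER TABLE {table} MODIFY {column} {column_type};"

if __name__ == "__main__":
    record = "x" * 70000  # stands in for an EAD or MARC record larger than 64K
    needed = smallest_text_type(record)
    print(alter_statement("resources", "note", needed))
```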

The fun and power of open source software! 

Simple, terminal-based scripts

I have made a number of simple, terminal-based scripts available for downloading from the local archive. From the distribution’s README file:

This distribution includes a set of terminal-based scripts to
manage a MyLibrary instance. Its primary purpose is to
demonstrate how to use some of the MyLibrary modules.

As such, the scripts allow you to manage facets, terms,
locations, resources. When it comes to the first three, they
allow you to create, find, edit, and delete them.

When it comes to resources, the scripts allow not only creating,
finding, editing, and deleting, but they also allow you to import
data via OAI from the Directory of Open Access Journals as well
as the Infomotions Image Gallery. They also support exporting
content as RDF, indexing the content with swish-e, and searching
the swish-e index.

Hackers will want to also take a look at the file named
subroutines.pl because it contains a number of things used in
just about every script.

Finally, remember, MyLibrary is not really about resources.
Instead, it is about creating relationships between people
(patrons and librarians) and resources. Moreover, it can also be
about creating relationships between people and people, or
resources and resources. All of this is done by: 1) creating a
set of facet/term combinations, 2) assigning the combinations to
things (people or resources), and finally, 3) creating reports
from the database against those things. A traditional library
subject page is just one example of a report. Sets of RDF files
are another. Content to be indexed is a third. This list is only
as long as your imagination combined with the principles of
librarianship.

Here’s what you will find in this distribution, and most of them
can be run from the command line without the need of the menu
program:

  • main-menu.pl - a rudimentary menu
  • manage-facets.pl - create, find, edit, and delete facets
  • manage-terms.pl - create, find, edit, and delete terms
  • manage-locations.pl - create, find, edit, and delete locations
  • manage-resources.pl - create, find, edit, and delete resources as well as report on them
  • doaj2mylibrary.pl - import DOAJ content into MyLibrary
  • images2mylibrary.pl - import Infomotions Image Gallery into MyLibrary
  • index-resources.cfg - a swish-e configuration file
  • resources2swish.pl - read resources and export for swish-e
  • index-resources.pl - index resources with swish-e
  • search.pl - query a swish-e index and display results
  • resources2rdf.pl - dump resources as simple RDF/XML
  • subroutines.pl - commonly used functions throughout the system
  • README - this file
  • LICENSE - the GNU General Public License

Enjoy, and happy hacking.

MyLibrary and OAI-ORE

Hmm… Maybe something could be hacked to make MyLibrary and OAI-ORE work closely together.

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) posits a thing called a Resource Map (ReM). These ReMs are used to bundle together sets of Internet resources. Once created these bundles are expected to be discoverable through ATOM, RSS, RDF, and/or microformats embedded in things like HTML files. Once discovered the ReMs can be harvested and used to create value-added services. For example, an ReM could be created for an issue of a journal. Create the ReM. Make it accessible. Download it. Parse it. Harvest the actual data (not just the metadata). Index it. Repeat for other ReMs. Other ReMs might be bundles for TEI files and associated images. A person could bundle up a whole library website or special collection exhibit.

ReMs look a whole lot like ATOM feeds. As alluded to previously, reports written against a MyLibrary database could be in the form of ATOM feeds. This leads me to believe MyLibrary content could (easily) support OAI-ORE. Implement MyLibrary. Fill it up with content either manually or automatically. Make its content available in any number of Web Services: RSS, ATOM, REST, and now OAI-ORE.
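
As a thought experiment, here is what such a report might look like, sketched in Python. It turns a list of resources into a plain Atom feed. It does not read from MyLibrary, and it is not a complete OAI-ORE Resource Map, which carries additional requirements not shown here. The function names, the feed identifier, and the sample resources are all invented; only the Atom namespace and element names are real.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"

def resources_to_atom(feed_id, title, updated, author_name, resources):
    """Serialize a list of {'title', 'url'} dicts as a simple Atom feed."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = author_name
    for resource in resources:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = resource["url"]
        ET.SubElement(entry, atom("title")).text = resource["title"]
        ET.SubElement(entry, atom("updated")).text = updated
        ET.SubElement(entry, atom("link"), href=resource["url"])
    return ET.tostring(feed, encoding="unicode")

if __name__ == "__main__":
    # Invented resources standing in for the result of a database report.
    sample = [
        {"title": "Article one", "url": "http://example.org/articles/1"},
        {"title": "Article two", "url": "http://example.org/articles/2"},
    ]
    print(resources_to_atom(
        "urn:example:journal-issue", "A sample journal issue",
        "2008-10-14T00:00:00Z", "A. Librarian", sample))
```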

Food for thought.

MyLibrary is about relationships

The attached image illustrates that MyLibrary is about relationships. You first design a set of facet/term combinations, and then you apply them to people (librarians and patrons). Through these associations MyLibrary enables relationships such as: my resources are, the things I curate are, people who are in my class include, resources for my class include, my library is, my patrons are, resources for this discipline of study include, because I’m a faculty member I have access to X, Y, & Z.

MyLibrary is about creating relationships.

 

MyLibrary in Code4Lib Journal

By the way, MyLibrary was briefly reviewed in Issue #2 of Code4Lib Journal. From the text:

…Today, MyLibrary is now much more than simply a subject guide portal. While creating dynamic Web pages, such as subject guides, is still one of the major features of MyLibrary, it is now a full-fledged digital library framework and toolbox. Librarians have various options to import content into the system. Aside from accepting manual data-entry, MyLibrary can import MARC data and OAI-accessible data. Content from the system can be syndicated into institutional-wide portals and other systems, and users can also personalize pages in “my library” that are akin to MyYahoo, iGoogle, etc. MyLibrary is well documented, as evidenced by a 200+ page, two-volume manual. MyLibrary is written in Perl and requires MySQL. It is licensed under the GNU General Public License (GPL) Version 2.

Lots of good stuff there at Code4Lib Journal, but hey, I’m biased; I’m on the editorial board. Check it out anyway.

Your Page

A couple of months ago Chris Gray brought to my attention the fact that the MyLibrary Your Page demonstration was broken. Well, I (finally) got around to fixing it. Please try:

http://mylibrary.library.nd.edu/demos/your-page/   

There you will be presented with a list of famous intellectuals whom you can impersonate. Once you log in with their credentials, you are presented with:

  1. a set of recommended resources based on their course of study
  2. the names and email addresses of their subject-specific librarians
  3. a list of the courses they are enrolled in and some recommended resources for each course
  4. a list of the books they have checked out

The process of creating such a page is not difficult:

  1. Create sets of information resources and “catalog” them in MyLibrary.
  2. Acquire the names of students and “catalog” them in MyLibrary.
  3. Acquire the classes given across the University and “catalog” them in MyLibrary.
  4. Query MyLibrary and “join” students, classes, and information resources through their shared facet/term combinations.
  5. Display the result.
  6. Go to Step #4 for each person who logs in.
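
The “join” in Step #4 is the heart of the matter, so here is a minimal sketch of it in Python. This is not the demonstration's source code, and it does not use the MyLibrary API. The patrons, librarians, resources, and facet/term pairs below are invented; the point is only to show how things sharing a facet/term combination end up on the same page.

```python
# Everything is "cataloged" with a set of (facet, term) pairs. All data is invented.
PATRONS = {
    "student-1": {("Discipline", "Philosophy"), ("Course", "PHIL 101")},
}
LIBRARIANS = {
    "librarian-1": {("Discipline", "Philosophy")},
    "librarian-2": {("Discipline", "Biology")},
}
RESOURCES = {
    "Philosophy index": {("Discipline", "Philosophy")},
    "PHIL 101 reading list": {("Course", "PHIL 101")},
    "Biology database": {("Discipline", "Biology")},
}

def related(terms, things, facet):
    """Names of things sharing at least one of the patron's terms in a facet."""
    wanted = {pair for pair in terms if pair[0] == facet}
    return sorted(name for name, pairs in things.items() if pairs & wanted)

def your_page(patron):
    """Assemble the parts of a page for one patron from shared facet/term pairs."""
    terms = PATRONS[patron]
    return {
        "recommended": related(terms, RESOURCES, "Discipline"),
        "librarians": related(terms, LIBRARIANS, "Discipline"),
        "course_resources": related(terms, RESOURCES, "Course"),
    }

if __name__ == "__main__":
    print(your_page("student-1"))
```

Nothing links a student directly to a resource or a librarian. The relationships fall out of the shared facet/term combinations, which is why adding a new class or a new resource requires no changes to the page-building code.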

The source code for this demonstration is online. I created a very similar application – Sonne of Your Page – except it contains real data. Real people and real class rosters. This is not a production service, and you will not be able to access it unless you are a member of the University of Notre Dame community. The resulting pages look a lot like this:

[Image: Sonne of Your Page]

Fun with MyLibrary.