Queequeg's Content Saloon: Internet Librarian day 1: preconference workshop & wine

Implementing Federated Searching and Open URL-based Linking Services
Presenters:
Frank Cervone
Asst. University Librarian for Info Technology
Northwestern University

Jeff Wisniewski (pinch hitter)
Web Services Librarian
University of Pittsburgh

Frank’s plane was delayed in LA due to fog, so Jeff filled in with Frank on the conference phone. Thanks to Jeff for filling in at the last minute and being such a good sport. Overall, he did a great job, and I hope that InfoToday, Frank, and Jeff (if he’s still talking to any of us ; ) will consider presenting this online as well so we can get to all the details and nuts & bolts of implementing a federated search product. If anyone’s interested, we could easily do it via OPAL, and I would be happy to set it up with password protection if that’s needed. Since Frank already has his slides, it would be easy-peasy to set it up in OPAL, kick back, enjoy the slides, and use the text chat and VoIP to chat about federated search engines and everything Z39.50 has to offer (and not).

Here’s my overview, but there are lots and lots of details in the PowerPoint, and I’ll link to that once it’s up online.

“Librarians are the toughest critics of federated searching.” ~ Jeff Wisniewski

This is a great point, and I think one of the most important elements of implementing a federated search is managing expectations—particularly the expectations of staff.

Federated searching isn’t about trying to replace the sophisticated type of searching that you get via the native database. Federated searching is really a googlized search, and for most users that is perfectly fine. If someone is an advanced searcher and wants to do advanced-level research, federated searching isn’t the best tool. Average users are just doing keyword searches anyway, and most likely they aren’t using our databases now, so federated searches are a great tool for libraries to provide to their patrons.

Agenda:
What is federated searching?
What does OpenURL have to do with it? How does it work?
How does federated searching work?
Implementation issues

Why do I care about federated searching?
To provide a single interface to info resources
Lots of the databases/purchased content packages have less-than-intuitive names, so patrons don’t know where to start searching (Example: what do patrons think when they see a link to “Ebsco”?)
Decrease duplication of effort: you don’t want to have to re-run your search in multiple databases if you can get around it
Remote access to everything (which you probably already have with ez-proxy or some other method) PLUS a single-search interface, is a nifty thing for patrons.

Northwestern did focus groups with faculty and students to find out what they liked and didn’t like about library research, and found that the federated search met most of the needs identified by both faculty and students.

Univ. of Pittsburgh has WebFeat. Depending on the product, you can save searches, set up alerts, and use other additional services. They have had WebFeat up for 1.5 years, and they are now implementing the extra features like saved searches and alerts, and their faculty have been really excited about it.

A good question to ask federated search vendors: what can’t your software federate?
What WebFeat has always said to them: anything that has a search box, you can federate. But it’s going to depend on the product.

Another thing that will change, depending on the vendor, will be the configuration decisions you can make regarding the pre-processing of the results, and how the results display to the user.

Where does OpenURL fit in?
OpenURL is a NISO standard. Because link resolvers are based on that standard, there isn’t necessarily any reason to use the same vendor for the link resolver and the federated search engine.

If you have an OpenURL link resolver in addition to your federated search product, you not only get a total list of citations and resources, but the OpenURL resolver also links directly to the full text of the resources, no matter which database vendor is actually providing the full text.
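To make the linking idea concrete, here is a minimal sketch of how a brief citation could be turned into an OpenURL 1.0 link using key/value pairs. The resolver address, the citation field names, and the sample citation are all made up for illustration; a real setup would use whatever base URL and record fields your resolver and search product actually provide.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; a real one comes from your link resolver setup.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Assumed citation field names (left) mapped to OpenURL journal-format keys (right).
FIELD_MAP = {
    "article_title": "rft.atitle",
    "journal_title": "rft.jtitle",
    "issn": "rft.issn",
    "volume": "rft.volume",
    "start_page": "rft.spage",
    "year": "rft.date",
}


def build_openurl(citation, base=RESOLVER_BASE):
    """Turn a brief citation record into an OpenURL 1.0 key/value link."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # Copy over only the fields the citation actually has.
    for field, key in FIELD_MAP.items():
        value = citation.get(field)
        if value:
            params[key] = value
    return base + "?" + urlencode(params)


# A made-up citation, just to show the shape of the link.
print(build_openurl({
    "article_title": "A Made-Up Article",
    "journal_title": "Journal of Examples",
    "issn": "1234-5678",
    "year": "2005",
}))
```

The resolver, not the database, decides where that link goes, which is why the same citation can lead to full text from whichever vendor happens to hold it.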

How is OpenURL enabled?
1. Buy/install an OpenURL link resolver (or set up an externally hosted solution)
2. Provide local holdings information
3. Tell source vendors to enable Open URL functionality (identify your OpenURL resolver to the vendors)
4. Once it is enabled, the OpenURL-enabled button will appear as an option (some vendors let you substitute an image, others text, etc…).

On-going maintenance
Need to edit subscription information as your subscriptions change, your vendors change content holdings, etc. (Some vendors provide these updates monthly. Some Open URL resolvers take care of the subscription maintenance as long as you at least let them know when you add/drop databases.)

One of the challenges is the full-text content available in aggregated databases, such as LexisNexis. Because that content changes frequently, it’s important to have a vendor that updates often to keep the holdings information current.
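A rough sketch of why the holdings information matters: before offering a full-text link, a resolver has to check whether any subscription covers the journal and year in the citation. The `Holding` record, the provider names, and the coverage dates below are invented for illustration; real resolvers keep this in a vendor-maintained knowledge base with far more detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Holding:
    """One line of (hypothetical) local holdings information."""
    issn: str
    provider: str
    first_year: int
    last_year: Optional[int] = None  # None means coverage runs to the present


def providers_with_full_text(holdings: List[Holding], issn: str, year: int) -> List[str]:
    """Return the providers whose coverage includes this journal and year."""
    return [
        h.provider
        for h in holdings
        if h.issn == issn
        and h.first_year <= year
        and (h.last_year is None or year <= h.last_year)
    ]


# Made-up holdings: an aggregator that dropped the title after 2003,
# and a publisher package that started in 2001 and is still current.
holdings = [
    Holding("1234-5678", "Aggregator A", 1995, 2003),
    Holding("1234-5678", "Publisher B", 2001),
]

print(providers_with_full_text(holdings, "1234-5678", 2002))
print(providers_with_full_text(holdings, "1234-5678", 2004))
```

If the aggregator changes its coverage and the holdings data isn’t updated, this check gives the wrong answer, and patrons land on dead links.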

What do you get when you metasearch/use a federated search?
Issues of consistency come from databases not having the same types of searches. The same keyword search will work differently in different databases. (Searches run differently, look at different fields, etc.)

Types of results returned
Level 1: Link to (Provides a link to other databases and wishes you luck)
Level 2: Search & link (The content of the database is searched by the federated search)
Level 3: Search & return a brief record (IR returns brief parsable records containing enough information to construct a basic OpenURL)
Level 4: Search & return a full record (IR returns fully parsable records)

How does federated search affect collection development and/or database usage statistics?
With regard to the stats, federated search turns everything on its head. One of the metrics that used to make sense to them (at Pitt) was “number of searches” by database. But now that number doesn’t mean much, since the federated search hits each database constantly to pull results. The number they look at now is the number of full-text retrievals.

There is also a standards initiative called “Project COUNTER” that is working with the database vendors to standardize how they count and report end-user stats. During negotiations, both Northwestern and Univ. of Pittsburgh ask database vendors if they are supporting Project COUNTER. And, if they’re not, they encourage the vendors to start. (You would want your federated search vendor to support Project COUNTER as well.)

Does federated searching affect the number of concurrent users you have to have?
Yes. At first, Univ. of Pittsburgh had an option to search everything, and they had a lot of turn-away issues because of concurrent users. Users just tried to search everything, but then they didn’t get good results. Since they have 200+ databases, it slowed it all down, so it was better for the library staff to make some decisions up front. They still have the option to search everything, but it’s not promoted as the main or first option.

Issues with federated searching:
Variety in the returned results (because of different protocols, metadata formats)
Multiple vocabularies, ontologies, disciplines
Databases that are not fundamentally bibliographic
Merging result sets is difficult
It’s difficult to handle duplicate records, because of field discrepancies (a title here isn’t a title there)
Result sets that differ due to time outs
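To show why merging and de-duplication get messy, here is a minimal sketch that matches records on a normalized title plus year. This is just one simple approach, not how any particular product does it, and the record fields and database names are assumptions. Note that the normalization only keeps plain ASCII letters and digits, so it would need work for non-English titles.

```python
import re


def dedupe_key(record):
    """Build a rough match key from the title and year."""
    title = record.get("title", "").lower()
    title = re.sub(r"[^a-z0-9 ]", "", title)  # drop punctuation
    title = " ".join(title.split())           # collapse runs of whitespace
    return (title, str(record.get("year", "")))


def merge_results(result_sets):
    """Merge result lists from several databases, keeping the first copy
    of each record and noting every database that returned it."""
    merged = {}
    for source, records in result_sets.items():
        for record in records:
            key = dedupe_key(record)
            if key not in merged:
                merged[key] = dict(record, sources=[source])
            else:
                merged[key]["sources"].append(source)
    return list(merged.values())


# Made-up results: the same article with slightly different title formatting.
results = merge_results({
    "db_a": [{"title": "Federated Searching: A Primer", "year": 2005}],
    "db_b": [
        {"title": "Federated searching - a primer.", "year": 2005},
        {"title": "Something Else Entirely", "year": 2004},
    ],
})
for r in results:
    print(r["title"], r["sources"])
```

Even this toy version depends on both databases agreeing about what a "title" is, which is exactly the field-discrepancy problem.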

Z39.50 issues
A lot of the problems people have with federated searching have to do with Z39.50 issues.

Z39.50 was designed prior to the Web; it has a lot of features, but it’s very complicated.
It uses MARC as its basic format, and it has robust search options (Boolean, truncation, proximity, completeness).
It has lots of benefits and functions as the “fundamental glue” that binds disparate systems together (For example, many use it to search multiple library catalogs).

Transaction overhead: There is a lot of stuff going on to complete each transaction, so it slows the system down, or takes a long time to deliver the results. Therefore if you have a federated search vendor that is doing a number of Z39.50 transactions at once, it can really affect your network. It can also affect the server side as it tries to de-duplicate all the results.
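The interaction between many simultaneous searches and time outs can be sketched like this. It does not speak Z39.50 at all: `search_one` is a stand-in that just sleeps to simulate a slow target, and the target names, delays, and timeout value are invented. The point is only that a target that misses the cutoff contributes nothing, so two runs of the same search can return different result sets.

```python
import asyncio


async def search_one(name, delay, hits):
    """Stand-in for a real database search; sleeps to simulate latency."""
    await asyncio.sleep(delay)
    return hits


async def federated_search(targets, timeout):
    """Query every target at once and drop any that miss the timeout."""

    async def guarded(name, delay, hits):
        try:
            found = await asyncio.wait_for(search_one(name, delay, hits), timeout)
            return name, found
        except asyncio.TimeoutError:
            return name, None  # timed out: no results from this target

    pairs = await asyncio.gather(*(guarded(*t) for t in targets))
    return dict(pairs)


# Made-up targets: (name, simulated delay in seconds, simulated hits).
targets = [
    ("catalog", 0.01, ["record 1", "record 2"]),
    ("slow_database", 0.5, ["record 3"]),
]
results = asyncio.run(federated_search(targets, timeout=0.1))
print(results)
```

With 200+ targets instead of two, every one of those simultaneous requests also counts against network capacity and concurrent-user limits.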

Differences in implementation of the standard: When people compare the federated search results with the results from the native database, they will often see that the two are quite different. This is because of variations in how the standard is implemented (what counts as a keyword, which fields a keyword search covers, etc…)

On the horizon: CQL queries: XML-based, but lighter, faster, and easier to process
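For a feel of what this looks like in practice: a CQL query is itself a short plain-text string, and it is typically sent over SRU (Search/Retrieve via URL), where the server answers in XML. Below is a minimal sketch of building such a request. The server address is made up, and the `dc.title` and `dc.creator` index names assume the server supports the Dublin Core context set, which not every server does.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; a real one comes from the catalog or database vendor.
SRU_BASE = "https://sru.example.edu/catalog"


def cql_clause(index, term):
    """Build one CQL search clause, quoting the term."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} = "{escaped}"'


def sru_url(clauses, base=SRU_BASE, max_records=10):
    """Combine CQL clauses with 'and' and wrap them in an SRU searchRetrieve URL."""
    query = " and ".join(clauses)
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": max_records,
    }
    return base + "?" + urlencode(params)


clauses = [
    cql_clause("dc.title", "federated searching"),
    cql_clause("dc.creator", "cervone"),
]
print(" and ".join(clauses))
print(sru_url(clauses))
```

Compared with a Z39.50 session, the whole request fits in one ordinary URL.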

(There was lots and lots more here, but we didn’t get a chance to get into most of it. Frank was trying to talk to us via phone, but the connection faded in and out.)

Implementation
http://www.library.pitt.edu/
Pitt uses 3 different entry points to the federated search:
From the front page of the library: the search pulls four major general databases and the catalog (and you can exclude either the databases or the catalog)
Databases by Subject: search across a collection of subject-specific databases, with the option to do a basic search across all or an advanced search across subject-specific options
Resource A-Z list: users can choose which databases are included in their total federated search (they do have the option to search all, but it’s at the bottom of the page, so they don’t slam into their concurrent user limitations)

Major trouble spots to implementing federated searching and open URL:
Configuring databases and resources: Have the correct technical information—it’s not complicated, it’s just getting that information pulled together.
Defining collections: Kick start the process by appointing a small group to make the first pass at each area to get a draft out, and then work on it with the subject specialists. Also, look at performance characteristics: “do these databases ‘play well’ together?”
User interface: How much are you going to customize the interface? Will you de-dup the results and have relevance ranking?
Rollout and acceptance: Be ready for internal resistance and plan for promoting it to the public.

In the end, a federated search isn’t perfect, but for most patrons, good enough usually is. Yes, librarians are often troubled by the discrepancies in the searches between native and fed searches, but we need to manage expectations and not sell federated searching as a replacement for native database interfaces. There are a lot of users who don’t look at the library as a source for information, and if we can bring those users into the library (or library Web page) to show them how the library’s resources are beneficial and a good ROI, it’s all good for all of us in the end.

Now...onto the wine!

Technorati/Flickr Tag: IL05


Saturday, October 22, 2005
