Abstracts

Controlled Vocabularies in the Digital Age: Are They Still Relevant?

by William Baker




Institution: University of North Texas
Department:
Year: 2017
Keywords: Library of Congress Subject Headings; LCSH; Controlled Vocabularies
Posted: 02/01/2018
Record ID: 2170296
Full text PDF: https://digital.library.unt.edu/ark:/67531/metadc1011802/


Abstract

Keyword searching and controlled vocabularies such as Library of Congress subject headings (LCSH) proved to work well together in automated technologies, and the two systems have been considered complementary. When the Internet burst onto the information landscape, users embraced the simplicity of keyword searching of this resource, while researchers and scholars seemed unable to agree on how best to make use of controlled vocabularies in this huge database. This research looked at a controlled vocabulary, LCSH, in the context of keyword searching of a full text database. The Internet and probably its most used search engine, Google, seemed to have set a standard that users have embraced: a keyword-searchable single search box on an uncluttered web page. Libraries have even introduced federated single search boxes to their web pages, another testimony to the influence of Google. UNT's Thesis and Dissertation digital database was used to compile quantitative data, with the results entered into an Excel spreadsheet. Both LCSH terms and author-assigned keywords were analyzed within selected dissertations, and the two systems were compared. When the LCSH terms from the dissertations were quantified, the results showed that from a total of 788 words contained in the 207 LCSH terms assigned to 70 dissertations, 246, or 31%, did not appear in the title or abstract, while only 8, or about 1% of the total of 788, did not appear in the full text. When the author-assigned keywords were quantified, the results showed that from a total of 552 words from 304 author-assigned keywords in 86 dissertations, 50, or 9%, did not appear in the title or abstract, while only one word from the total of 552, or 0.18%, did not appear in the full text. Qualitatively, the LCSH terms showed a hierarchical construction that was clearly designed for a print card catalog, seemingly unnecessary in a random access digital environment.
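The word counts reported above can be reproduced in outline with a short script. The following is a minimal sketch only: the abstract does not state the study's exact counting rules (tokenization, treatment of repeated words, stop words), so the rules used here are assumptions, and the function names `words`, `count_missing`, and `percent` are hypothetical.

```python
import re


def words(text):
    """Split text into lowercase word tokens (an assumed, simplified tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_missing(terms, text):
    """Return (total_words, missing_words) for the words in `terms` checked against `text`.

    Every word occurrence in the subject terms or keywords is counted,
    including repeats; this is an assumption about the counting rule.
    """
    vocabulary = set(words(text))
    term_words = [w for term in terms for w in words(term)]
    missing = [w for w in term_words if w not in vocabulary]
    return len(term_words), len(missing)


def percent(part, total):
    """Express part/total as a percentage rounded to two decimal places."""
    return round(100 * part / total, 2)


# Illustrative (made-up) record: two subject terms checked against a title.
terms = ["Subject headings, Library of Congress", "Keyword searching"]
title = "Keyword searching of library catalogs"
total, missing = count_missing(terms, title)
```

Applied to the figures quoted in the abstract, `percent(246, 788)` gives 31.22 and `percent(1, 552)` gives 0.18, consistent with the reported 31% and 0.18%.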
While author-assigned keywords were important words and phrases, they often appeared in the title, metadata, and full text of the dissertation, making them seemingly unnecessary in a keyword search environment because they added no additional access points. Authors cited in this research have tended to agree that controlled vocabularies such as LCSH are complicated to develop and implement and expensive to maintain. Most researchers have also tended to agree that LCSH needs to be simplified for large, full text databases such as the Internet. Some of the researchers have also called for some form of automation that seamlessly links LCSH to subject terms in a keyword search. This research tends to confirm that LCSH could benefit from simplification as well as automation, and it offers some suggestions for improvements in both areas.

Advisors/Committee Members: Oyarce, Guillermo A; O'Connor, Brian; Senn, Will