In the first of a series of discussions with leading figures in the world of libraries and metadata, BDS talks to Dr Lars G. Svensson, a world expert on linked data and the Semantic Web, and asks him about his vision of the future for libraries…
A Connected, Courageous New World
BDS interviews Dr Lars G. Svensson
BDS: Can we start this discussion by asking you, Lars, about your background?
Lars Svensson: That’s an extremely diverse question because I have an extremely diverse background. After school I had the choice between studying sacred music or mathematics, and I went for mathematics, which proved too dry. So I took time out and worked for two years in an oil refinery. After that I moved from my home in Sweden to Germany, deciding I wanted to do something completely different. I started a PhD in library history with the goal of eventually becoming a librarian, but I wasn’t accepted for library training. Instead I retrained as a software developer and worked for two years writing e-commerce applications, but then came the dot-com crash and I lost my job. At exactly the same time, however, a position became available at the German National Library (DNB) that required someone between a librarian and a developer. I applied, got the job, and the initial two-year project was extended into a long-term position.
During those early years I became interested in something I had read about called the Semantic Web, which I couldn’t make sense of at all. It wasn’t until I came into a library and really worked with metadata that I saw that we have the titles here and the authorities there, and they are linked together with typed relations. I realised I had read about this… this is the Semantic Web. I asked, “Why aren’t libraries contributing to the Semantic Web? The data is predestined for it.” I went on to suggest that we should try it. So in 2010 we ventured into our first linked data project here at the DNB, starting with the authorities, which are much easier than bibliographic data, and since then we have built it up, revamping the service three or four times, taking a great deal of inspiration and encouragement from the Swedish National Library, which in 2008 was the first to publish its entire union catalogue as linked data.
BDS: We hear the term “linked data” with ever increasing regularity, so much so that it has become a buzzword. But what exactly is linked data?
Lars Svensson: Linked data is a design principle. It is a set of technologies working together. It goes back to the famous article by Tim Berners-Lee, James Hendler and Ora Lassila, written in 2001, where they first coined the term “Semantic Web.” This was something that never really took off because it was seen as too rigid and formal, and no one ever showed that the large data sets made sense outside the specific communities that had created them, largely because the data resided in silos. So, at that time, it never developed into the data-driven infrastructure that Berners-Lee really wanted. After a few years it was realised that what was needed was to start linking things together, so people were asked to publish data on the Web using the four linked data principles. These define how to identify things and how to provide useful information about them; the fourth, and most important, is to supply links to other data sets, so that a person or a machine can see how data sets around the world link together. That is the key point of linked data: providing links to the outside world.
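The fourth principle can be illustrated with a toy example. The sketch below models statements as plain (subject, predicate, object) triples in Python; all the URIs and identifiers are hypothetical, invented for illustration, and the owl:sameAs link pointing outside the local data set is what makes the data “linked”:

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("http://example.org/work/42", "dct:title", "Pippi Longstocking"),
    ("http://example.org/work/42", "dct:creator", "http://example.org/person/p1"),
    # The link out to another data set (a hypothetical VIAF URI) is
    # what turns local data into linked data.
    ("http://example.org/person/p1", "owl:sameAs", "http://viaf.org/viaf/0000000"),
]

# A consumer, human or machine, can follow every owl:sameAs link to
# pull in data the publishing library never had to create itself.
external_links = [o for s, p, o in triples
                  if p == "owl:sameAs" and not o.startswith("http://example.org/")]
print(external_links)  # ['http://viaf.org/viaf/0000000']
```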
BDS: So the links have to exist to constitute linked data, but how are they created?
Lars Svensson: The one we are most used to in libraries is the manual creation of links. Links from a publication to a work, to an author, to topical subject headings, to events, whatever. Serials are also linked together; this is manual work. There are also projects like MACS (Multilingual Access to Subjects), which aimed to link together subject headings from different thesauri in order to provide multilingual search options. Then there are machine-generated links, which require some sort of algorithm that can decide how to match things together. This requires controls around what maps onto what and to what extent it maps: is one term broader than another, or narrower?
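A machine-generated mapping step of the kind MACS automates can be sketched as a function that proposes a SKOS mapping property for a pair of headings. This is purely illustrative, assuming simple string containment as the matching rule; real matchers rely on multilingual lexical resources, not substring tests:

```python
def propose_mapping(a, b):
    """Propose a SKOS mapping property for the triple "a <property> b".

    Illustrative only: string containment stands in for a real
    matching algorithm.
    """
    a_n, b_n = a.strip().casefold(), b.strip().casefold()
    if a_n == b_n:
        return "skos:exactMatch"
    if a_n in b_n:
        # a is the broader term, so b is a narrower match of a
        return "skos:narrowMatch"
    if b_n in a_n:
        # a is the narrower term, so b is a broader match of a
        return "skos:broadMatch"
    return None  # no link proposed; leave it to a human cataloguer

print(propose_mapping("Music", "Church music"))  # skos:narrowMatch
print(propose_mapping("Dogs", "Cats"))           # None
```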
BDS: Surely, working with machines poses problems, as anyone who has searched for information on their computer can testify when the computer returns information that is completely tangential to their requirements?
Lars Svensson: Indeed. We have a similar problem with metadata matching. How do you decide whether two publications are the same? The answer is context-dependent. With people it is easy: it is either this person or it isn’t. But with bibliographic metadata, is the first edition the same as the second edition? Is a reprint? At what point should it be considered different? When identifiers, e.g. ISBNs, match, you can rank that higher than matching title strings, but it is a choice. If we create a PDF from a Word document, a computer will definitely see it as different, even though the contents are exactly the same.
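The ranking described here, identifier agreement outweighing a matching title string, can be sketched as a simple weighted score. The field names, the weights and the example records are invented for illustration:

```python
def normalise_isbn(isbn):
    """Strip hyphens and spaces so two spellings of one ISBN compare equal."""
    return "".join(ch for ch in (isbn or "") if ch.isalnum())

def match_score(a, b):
    """Score (0-100) how likely two records describe the same publication.

    An ISBN match is weighted far above a title-string match; the
    weights themselves are arbitrary choices, as the interview notes.
    """
    score = 0
    if a.get("isbn") and normalise_isbn(a["isbn"]) == normalise_isbn(b.get("isbn")):
        score += 80
    if a.get("title", "").strip().casefold() == b.get("title", "").strip().casefold():
        score += 20
    return score

rec_a = {"title": "Der Prozess", "isbn": "978-3-16-148410-0"}
rec_b = {"title": "Der Prozess", "isbn": "9783161484100"}
print(match_score(rec_a, rec_b))  # 100
```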
BDS: Are bibliographic agencies capturing the right information to facilitate the creation of linked data?
Lars Svensson: By and large, yes. RDA goes in the right direction because it relies much more on authorities, using codes or controlled vocabularies for different things. Also, the current move from “strings to things” is a very welcome development. However, there are still plenty of other cataloguing codes in use, and we have to attempt some kind of “fuzzy” matching to link them up.
BDS: In your published article “Are bibliographic models suitable for integration with the Web?” you seem to be offering a challenge to libraries, suggesting that they are retreating from addressing these issues around linked data and the semantic web. Is that impression correct?
Lars Svensson: Essentially, yes. Libraries are very good at creating standards, but those standards are sometimes interoperable only within the library community, and definitely not outside it, and sometimes we seem to ignore the possibility of reusing standards created by other communities. But my main point in that article is that right now we don’t know what the best bibliographic model is. Is it RDA/FRBR? Is it BIBFRAME? Is it better to have a two-tier architecture of just bibliographic data and holdings, as it used to be? I don’t think so. In order to figure this out we now need to do something with this data; we need to build applications that consume it in order to discover the best way of creating it. The proof of the pudding is in the eating. It is not until we start using this data, and seeing it work, that we can answer these questions.
BDS: It’s a bit like using a map. You don’t know if it is any good until you get lost.
Lars Svensson: Yes. And in this case there is no GPS.
BDS: In the paper mentioned above you make the point several times that the bibliographic information used in the library community would not make sense to Joe Public; indeed, you clearly state that it could mislead the woman in the street. Is this changing?
Lars Svensson: Well, perhaps not to the woman in the street, since she usually doesn’t try to interpret the underlying data model. If she, however, were a web developer wanting to write an application using library data, she would probably be confused by the concept of authorities. If you look at traditional library data, it looks as if authorities write books, but they don’t; people write books. There has been recent talk of “entity-based cataloguing”, which ignores the authority part of the data, at least as far as people and corporate bodies are concerned. Libraries are starting to walk this path and I am largely comfortable with that. Larger libraries and consortia are leading, and smaller libraries will follow, especially as their consortia adopt a new model. They may not understand its intricacies, but they will say, “as long as it works, it’s OK.”
BDS: If you were evangelising for linked data what would you say to smaller libraries? Why should they adopt the linked data model?
Lars Svensson: If I go very far into the future, I envision an infrastructure that is entirely built on linked data. For a small library, that would essentially mean less cataloguing. The moment that you acquire some kind of item, you look it up in the National Database and you just link to it. You say, “I’ve got this book, too.” And then you add an acquisition number and a note that there is a coffee stain on page forty two and that’s it. The rest of the data is out there. The system will know how to retrieve it, do the indexing and everything else.
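That workflow can be sketched as a local holdings record that stores only copy-specific information plus a link into the national database; the URI and field names here are hypothetical, invented for illustration:

```python
# The local system stores only what is unique to this copy; title,
# author and subjects live behind the linked URI and are fetched and
# indexed by the system, not catalogued locally.
holding = {
    "describes": "http://example.org/national-db/record/123",  # hypothetical URI
    "acquisition_number": "2024-0815",
    "copy_note": "coffee stain on page 42",
}

print(sorted(holding))  # ['acquisition_number', 'copy_note', 'describes']
```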
BDS: Who should be creating this bibliographic metadata?
Lars Svensson: A large part of the job has to be done by national bibliographic agencies. They have a mandate to do this. They will rely on publisher metadata which they in turn will augment and enhance. Then services can be built on top of this data which we, for example at the DNB, have recently made available to everyone, openly and freely.
BDS: This requires a national will and it also requires national funding. There are surely huge issues around open and free data with regard to funding its creation?
Lars Svensson: Absolutely. To me library metadata is part of the national infrastructure. It has to be government funded one way or the other. What business model each country chooses is a matter for the country.
BDS: Another consequence of the thinking around linked data is that it extends to much more than library holdings, in fact it embraces cultural objects of all kinds. Are you currently linking into other national collections, say in museums?
Lars Svensson: Indeed. What we discuss a great deal is the common use of entities. As soon as you have a cultural heritage project, it is about persons, places, events and so on. It doesn’t really make sense for each institution to create data on its own. For example, there is a service in Germany that lets smaller museums upload their data to a national database. At first it was very plain: a photo of a painting and a short description, for example. Then, when I revisited the site, I noticed an information link next to the painter’s name. Up came a lightbox with data from the integrated authority file. Oh, this is nice, I thought. The third time I came back, they had pulled in content from Wikipedia in every language they could get. They had extracted the links from the authority file, and those proliferate if you curate the entry correctly.
BDS: Does the reliability of such linked information worry you?
Lars Svensson: Not very much. Sources such as Wikipedia are much more reliable today, and quality control is much stricter. We co-operate with the Wikipedia community now; the community creates links for us in the national authority file. Of course, there is the usual caveat: use this information at your own risk; it is not curated by us. And we must remember that library metadata isn’t perfect either. Also, the more the data is used by the public, the more likely it is that mistakes will be discovered and corrected.
BDS: It seems that linked data is the ultimate expression of library values because everybody can share in the effort that is being made in one place and you are getting something that is greater than the work that you have undertaken in your own institution.
Lars Svensson: Absolutely. The heart of librarianship is access to information. How people get access to it is a secondary question, and the answer to that secondary question has been changing for the last twenty years. We are moving, at least in the connected world (and this raises many questions about regions with low internet connectivity, such as parts of Africa, I might add), towards a global database that integrates seamlessly and is accessible to everyone, with no single point or source but as many sources as wish to participate, each with its own vision of its own data and each, by and large, interoperable.
BDS: Lars Svensson, thank you very much.