John Skiles Skinner's portfolio

A photograph of Cornell University campus at sunset in winter

Using the Open Syllabus Project API

Data on what books are assigned in college coursework is the basis of a book recommendation system for libraries

The Open Syllabus Project (opensyllabus.org) has gathered millions of syllabi from millions of college and university classes. Among the data they've scraped from the syllabi are the works (textbooks, novels, journal articles, essays, etc.) assigned as reading in those classes. This post is about a way to use this data to build a book recommendation engine.

A video demonstration of the book recommendation engine driven by Open Syllabus Project data

The problem

Library discovery faces a disadvantage in comparison to e-commerce discovery. Amazon can show me books related to a particular book that interests me by using their records of what books have been purchased by the same people. Libraries, being committed to privacy and free inquiry, cannot ethically make use of checkout history in this way. Often, to prevent misuse, libraries do not even store the data that would be needed for a book recommendation algorithm.

Syllabus assignment data addresses this limitation. Two works that are assigned in the same college class are (probably?) related. Perhaps they are related in a way that's similar to how two books purchased online by the same customer are related. The people at Open Syllabus Project (OSP) have given the name "coassignment" to this property of being assigned in the same class.

A photograph of an empty classroom
A Cornell University classroom, Plant Science Building

In September 2019 I attended a Blacklight meeting at Stanford. Blacklight is a discovery layer — software people use to locate things in a library. Hearing attendees discussing this potential use of syllabus data, I wrote to Open Syllabus Project.

OSP kindly sent me a sample of syllabus data, from which I built a data service prototype that calculated coassignments. It was so successful as a recommendation engine that OSP collaborated with me to build a coassignments API call of their own:

https://api.opensyllabus.org/coassignments/isbn/9780385490818

When they were done, I switched my Blacklight code from my prototype API to this official OSP API. This blog post is about my implementation.

How to query the OSP API

If you have Open Syllabus Project API credentials, clicking the above link will show you some API documentation and the JSON result of the API call. That result is a listing of works coassigned with (assigned in the same classes as) a given work, which is specified by ISBN. In this example, the ISBN is "9780385490818" at the end of the URL. (The book with that ISBN is The Handmaid's Tale.)

The same API call can be viewed in raw JSON format, like this:

https://api.opensyllabus.org/coassignments/isbn/9780385490818?format=json

In addition to ISBN, the API also supports finding works by DOI and by Open Syllabus Project's in-house ID system.

Query with multiple ISBNs

The API can also handle multiple ISBNs. They should be separated by commas, like this:

https://api.opensyllabus.org/coassignments/isbn/9780525435006,9780385490818

Both of the ISBNs in that example are valid identifiers for The Handmaid's Tale, but Open Syllabus Project only knows about one of them. What counts as "a book" is a complex matter, and ISBN assignments over history are correspondingly messy. In my implementation of the API, I transmit all ISBNs known for a given library catalog entry to maximize the chance that the API will discover the book(s) I mean.
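As a rough sketch of that step, the Ruby below joins every ISBN known for a catalog entry into one request. It uses Net::HTTP from the standard library rather than the HTTPClient gem used later in this post, and it only builds the request without sending it. The method name build_coassignments_request and the token value are made up for illustration.

```ruby
require 'net/http'
require 'uri'

OSP_COASSIGNMENTS_BASE = 'https://api.opensyllabus.org/coassignments/isbn/'

# Build (but do not send) a GET request for the works coassigned with a book
# that the catalog knows under several ISBNs.
def build_coassignments_request(isbns, token)
  # Trim whitespace, drop blanks and duplicates, then join with commas
  cleaned = isbns.map { |isbn| isbn.to_s.strip }.reject(&:empty?).uniq
  uri = URI.parse(OSP_COASSIGNMENTS_BASE + cleaned.join(','))
  request = Net::HTTP::Get.new(uri)
  request['Authorization'] = "Token #{token}" # header format used by the back-end code in this post
  request
end

request = build_coassignments_request(%w[9780525435006 9780385490818], 'example-token')
puts request.uri
# → https://api.opensyllabus.org/coassignments/isbn/9780525435006,9780385490818
puts request['Authorization']
# → Token example-token
```

Sending every known ISBN costs nothing extra in requests, since they all travel in a single URL.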

A photograph of a bookshelf
A bookshelf where I work, Mann Library, Cornell University

The implementation

Below you'll find code for a Blacklight implementation of the OSP API. The implementation consists of two parts: back-end code that gets and processes data from the Open Syllabus Project API, and front-end code that shows that information to the user.

This code is intended to recommend works related to a current item being viewed in the catalog, which is the first feature in the above video. The second feature in the video, browsing works by course subject, is not currently supported by the API.

The back-end code

Blacklight is a Ruby on Rails application. My Rails code, which can also be found on GitHub, looks like this:

def osp_coassignments
  # nothing to look up: respond with an empty list
  (render json: [], status: 200; return) if params[:isbns].blank?

  range = 0..19 # return top 20 most assigned works
  token = ENV['OSP_API_TOKEN'] # Token in .env file
  isbns = params[:isbns] # ISBNs joined with commas
  osp_response = HTTPClient.get(
    "https://api.opensyllabus.org/coassignments/isbn/" + isbns,
    nil, # query not used by Open Syllabus Project's API
    { authorization: "Token #{token}" }
  )
  json_body = JSON.parse(osp_response.content)
  # return empty results if ISBNs not found
  (render json: [], status: 200; return) unless json_body.kind_of?(Array)
  # sort by coassignment count, highest first, and keep the top results
  sort_rank = json_body.sort_by { |a| a['count'] }.reverse
  top_items = sort_rank[range]
  # convert each work's ISBNs to integers; drop works that list no ISBNs
  isbn_nums = top_items.map { |b| b['isbns'].map{|c| c.to_i } }.reject { |d| d.empty? }

  render json: isbn_nums, status: osp_response.status
end

This Ruby method is mainly a wrapper for the OSP API. It connects to and authenticates with the Open Syllabus Project API, processes the results a little, and re-transmits them for use by our front-end code. The output is an array of works, where each work is represented by an array of ISBN integers. The resulting nested arrays look like this:

[[9780156035842,679417397,78389368,30565073,151660387,452254264],[307264882,375704140,701130601,452261368,1400033411,140283404,394535979,896211231]]

The most frequently coassigned works are listed first. This is accomplished by sorting by the count property, which represents the number of times each listed work was coassigned with the input work. The listing stops after the 20 most coassigned works; this number can be adjusted with the range variable.
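The sort-and-truncate step can be tried on its own with a few hand-made records. The sketch below assumes each work is a hash with count and isbns keys, as the method above expects. The counts are invented, and top_coassigned is a hypothetical helper name.

```ruby
# Invented sample of a parsed API response: one hash per coassigned work
works = [
  { 'count' => 12, 'isbns' => ['9780156035842'] },
  { 'count' => 40, 'isbns' => ['9780385490818', '9780525435006'] },
  { 'count' => 7,  'isbns' => [] },
  { 'count' => 25, 'isbns' => ['0679417397'] }
]

# Return the `limit` works with the highest coassignment counts, highest first
def top_coassigned(works, limit)
  works.sort_by { |work| -work['count'] }.first(limit)
end

top_two = top_coassigned(works, 2)
puts top_two.map { |work| work['count'] }.inspect
# → [40, 25]
```

Negating the count inside sort_by gives the same descending order as sorting and then reversing, though works with equal counts may come out in a different relative order.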

The .map{|c| c.to_i } call, which converts each ISBN from a string to an integer, was chosen as a simple way to remove the nonnumeric suffixes that ISBNs in the resulting data sometimes have.
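To see what String#to_i does here, the sketch below runs it over a few sample strings. The suffix " (pbk.)" and the ISBN-10 ending in X are assumptions for illustration; I have not confirmed what suffixes the OSP data actually contains. The second half shows a possible string-preserving alternative, which is not what the method above does.

```ruby
# Sample raw ISBN strings; the suffix and the X-terminated ISBN-10 are illustrative assumptions
raw_isbns = ['9780156035842', '9780385490818 (pbk.)', '0679417397', '080442957X']

# to_i reads the leading digits and ignores everything after them
as_integers = raw_isbns.map { |isbn| isbn.to_i }
puts as_integers.inspect
# → [9780156035842, 9780385490818, 679417397, 80442957]

# Alternative: keep the leading run of ISBN characters as a string
as_strings = raw_isbns.map { |isbn| isbn[/\A[\dXx]+/] }.compact
puts as_strings.inspect
# → ["9780156035842", "9780385490818", "0679417397", "080442957X"]
```

Integer conversion also drops leading zeros and a trailing X check digit, which would be consistent with the eight- and nine-digit values in the sample output above. Whether that matters depends on how ISBNs are indexed in the catalog being searched.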

The front-end code

The front-end code can be found in full on GitHub. I'll highlight a few parts below.

The JavaScript function that triggers our back-end code, getting a list of works by ISBN, is pretty simple:

// Get coassignments via internal API that fronts OSP API
queryOspCoassignmentsApi: async function(isbnsParam) {
  try {
    const localRoute = "/browseld/osp_coassignments?isbns=";
    return await $.get(localRoute + isbnsParam);
  } catch (err) {
    return false;
  }
},

That function, and the others to come, use the async/await pattern to sequence the API calls. The definition begins with the key queryOspCoassignmentsApi: because the function is a method of an object that collects the related code together.

We don't want to recommend any books that aren't actually in our library catalog. So, we need a way to query Solr (the search platform used by Blacklight) to check that a given work is present. The first of these two functions asks Solr for a count of the works matching a list of ISBNs, and the second returns a boolean indicating whether that count is nonzero:

// isbns: an array of ISBNs designating a book to be looked up in Solr
querySolrCheckSuggestion: async function(isbns) {
  const joinedIsbns = isbns.join(" OR ");
  const solrServer = $("#solr-server-url-data").html();
  // Encode the query so the spaces around OR are safe in the URL
  const solrParams = "/select?wt=json&rows=1&q=" + encodeURIComponent(joinedIsbns);
  // Using JSONP to avoid CORS errors, but it prevents use of try/catch
  return await $.ajax({
    url: solrServer + solrParams,
    type: "GET",
    dataType: "jsonp",
    jsonp: "json.wrf"
  });
},

// solrResults: result of querySolrCheckSuggestion
hasSolrResults: function(solrResults) {
  const numFound = solrResults["response"]["numFound"];
  return numFound > 0;
}

The address of the Solr server comes from a hidden HTML element in the page.

Ideally, I would have liked some error handling on that AJAX call, in case Solr is not responding. But, as currently configured, our Solr server requires a JSONP connection. This prevents error catching.

The Solr query wouldn't have to be done on the front end. Our Ruby method could have filtered out the books missing from the catalog before sending ISBNs on to the front-end JS. The reason I chose not to do it that way is speed: I want works known to be in the catalog to pop into the UI as soon as their existence in Solr is discovered. If we'd done the check on the server side, we would have to wait until every book was checked in Solr before beginning to render the results.
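For comparison, here is a minimal sketch of how the server-side alternative might start: building the same kind of Solr select URL in Ruby for one suggested work. The method name solr_check_url and the Solr base URL are hypothetical, and the sketch stops short of sending the request or reading numFound.

```ruby
require 'uri'

# Build a Solr select URL asking whether any of a work's ISBNs is in the catalog.
# solr_base is a hypothetical core URL, not a real server address.
def solr_check_url(solr_base, isbns)
  query = URI.encode_www_form(wt: 'json', rows: 1, q: isbns.join(' OR '))
  "#{solr_base}/select?#{query}"
end

puts solr_check_url('http://localhost:8983/solr/catalog', [9780156035842, 679417397])
# → http://localhost:8983/solr/catalog/select?wt=json&rows=1&q=9780156035842+OR+679417397
```

A server-side version would need one such request per suggested work, up to 20 with the current range, all finished before the response is sent. That is the wait the front-end approach avoids.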

I haven't included the code in this blog post, but we also need a function that writes the recommended books to the page. How this looks will depend on how your catalog looks and how you want the recommendations to fit into it. You can see what I did on GitHub.

A photograph of a wall with objects attached to it, including strips of film and a floppy disk
Inside Cornell digitization services

Putting the front-end together

Finally, this is the code that triggers all the JS we've written so far:

// Display books related to (coassigned with) the current book view
getCoassignedBooks: async function(suggestions) {
  const isbns = $("#isbns-json-data").html();
  const isbnsParam = JSON.parse(isbns).join(",");
  const coAssigned = await this.queryOspCoassignmentsApi(isbnsParam);
  if (!coAssigned) {
    return; // stop execution if API call does not work
  }
  for (const assignment of coAssigned) {
    const queryCatalog = await this.querySolrCheckSuggestion(assignment);
    if (this.hasSolrResults(queryCatalog)) {
      const result = queryCatalog["response"]["docs"][0];
      openSyllabus.formatAndListSuggestions(result, assignment); // write HTML
    }
  }
  window.bookcovers.onLoad(); // fill cover images
},

Above you can see that the functions to query for Open Syllabus data, to query Solr, to count Solr results, and to write HTML to the page are all used. The list of input ISBNs comes from the webpage that the display is to be rendered on. I've assumed here it's in JSON form, in a hidden HTML element with the id isbns-json-data. This element should be added to the Rails view where the recommendations will appear, or to a partial used in that view.
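One way to produce that hidden element is sketched below in plain Ruby. In a real Rails view a view helper would be the more likely choice; string building is used here only so the sketch runs outside Rails. The method name isbns_data_element is hypothetical, and the hidden attribute is one of several ways to keep the element out of sight.

```ruby
require 'json'
require 'erb'

# Render the hidden element that the front-end code reads its input ISBNs from.
def isbns_data_element(isbns)
  # Serialize to JSON, then escape it for safe inclusion in HTML
  json = ERB::Util.html_escape(JSON.generate(isbns))
  %(<div id="isbns-json-data" hidden>#{json}</div>)
end

puts isbns_data_element(%w[9780525435006 9780385490818])
# → <div id="isbns-json-data" hidden>[&quot;9780525435006&quot;,&quot;9780385490818&quot;]</div>
```

The browser decodes the escaped quotes when it parses the page, so the JSON.parse call in getCoassignedBooks should receive an ordinary JSON array of ISBN strings.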

Credits

Jody Leonard and Cassey Lottman were helpful in writing this code!

The code was written at Cornell University Library as a part of the Linked Data for Production: Pathway to Implementation (LD4P.org) grant by the Andrew W. Mellon Foundation. Its use is governed by the Apache License.

