Last fall, with little fanfare, the US Copyright Office granted an exemption to a long-standing restriction on digital access to copyrighted books and films, allowing academic researchers to circumvent encryption so they can apply data mining techniques to contemporary books and films. These same techniques have yielded powerful insights in the financial, scientific, and medical fields for decades because the materials they depend on are generally not protected by encryption backed by federal law. As a result, researchers were able, for example, to quickly survey a mass of coronavirus literature.
Some film scholars may be able to take advantage of the Copyright Office’s exemption by buying DVDs and bypassing their encryption. That would be a big win for our collective understanding of an important part of our culture, especially given the global dominance of the American film industry.
But for those wishing to study literature, the exemption has proven frustratingly out of reach. Virtually all e-books available on the market today are licensed under terms that prohibit bypassing encryption. So while an academic who breaks encryption for data mining no longer violates federal law, researchers could still be forced to withdraw a paper for breaching contractual terms, as has already happened with a paper on Covid-19 vaccine hesitancy. In addition, researchers may be liable for damages for breach of contract.
This means that humanities researchers using text data mining techniques are still largely limited to studying public domain (i.e., pre-1925) works. Imagine if a data scientist were restricted to using demographic data from 1950, or if a medical researcher were barred from conducting a meta-analysis on DNA samples from the last 25 years.
Although no one will discover a cure for cancer by studying popular culture, this new copyright exemption has the potential to inform and change the cultural conversation in ways that were not possible before. Given the huge influence of American popular culture on our global society, not to mention our country’s continued reckoning with its history of racial injustice, that is no small feat.
Until the Copyright Office granted the exemption, Section 1201 of the Digital Millennium Copyright Act (DMCA) prevented researchers from engaging in data mining of copyrighted works. The DMCA includes a provision that prevents anyone, including scholars pursuing clearly legal research projects, from accessing copyrighted material that is under a digital lock and key. Violators of the law, which aims to deter Internet piracy, face stiff criminal and civil penalties: fines of up to $500,000 and up to five years in prison for a first offense, and double those fines and prison terms for a second offense. Even for a good cause, few scholars are willing to go to jail in search of knowledge.
To remove this barrier, 14 researchers, as well as two experts in academic publishing and the Association for Computing and the Humanities, a professional organization, submitted letters supporting a petition filed by Authors Alliance, a digital advocacy group for writers, with the help of the Samuelson Law, Technology and Public Policy Clinic at Berkeley Law (which I direct). The Copyright Office granted an exemption to circumvent encryption in October 2021, removing one barrier to further research. That is progress.
But the problem remains that academics who want to engage in e-book data mining are still largely prevented from doing so. Academics will not carry out research projects, however valuable, whose results cannot be published because completing them requires breaching a contract. Moreover, few scholars will be willing to accept personal liability for tens or hundreds of thousands of dollars in damages for breach of contract in order to advance their research programs.
There are several ways to ensure that academics can circumvent encryption to perform data mining, but each of them brings its own challenges. The best solution would be for Congress to protect researchers’ rights under copyright law by passing legislation ensuring that publishers cannot, by contract, limit what the law otherwise allows researchers to do. But Congress is in the throes of partisan gridlock, and the content industry’s lobbying power is formidable.
States, too, could act. After all, they administer robust higher education systems and have a vested interest in ensuring that scholars can continue to do cutting-edge work. In a related controversy over the contractual restrictions that publishers impose on libraries buying e-books, some have proposed that states regulate the terms of e-book licenses. Assuming this novel approach is successful, states could also consider legislating that e-book contract provisions prohibiting academics from circumventing encryption to perform data mining are contrary to public policy and unenforceable. But that would only result in piecemeal protections, as not all states are likely to take action.
Finally, large university systems might try to leverage their market power to insist that e-book contracts allow their professors and students to circumvent encryption for data mining. In some recent battles between publishers and university systems, the universities succeeded in obtaining more favorable contractual provisions than those initially proposed. However, academic collections tend to under-represent the popular works that generate the most research interest among digital humanities scholars. Thus, the major platforms providing these works, such as Amazon, Apple and Google, should also use their considerable bargaining power to ensure that the rights their users enjoy under the law are not taken away from them by contract.
Admittedly, some authors and publishers fear that “rogue actors” will break the encryption of e-books and then make them freely available on the Internet, depriving authors and publishers of compensation. But this concern has been adequately addressed. The Copyright Office already requires university researchers to use strict security measures to protect e-books that have been unlocked for text mining. Academic researchers routinely secure sensitive research data ranging from individuals’ medical data to national security information. These security measures are certainly more than enough to secure e-books as well.
One thing is clear: data mining is a valuable research technique in many areas of learning. The US Copyright Office has finally opened the door for American academics to engage in this 21st-century technique by allowing researchers to circumvent the encryption of copyrighted works, but publishers’ outdated policies keep this potential source of cultural progress firmly locked in the past.