Wednesday, April 4, 2018

Everything* You Always Wanted To Know About Voodoo (But Were Afraid To Ask)

Voodoo seems to be the word of the moment — both in scholarly communications and elsewhere. And it elicits strong opinions, both positive and negative, even though many of us aren’t completely sure what it is! Is it really going to transform scholarly communications, or is it just another flash in the pan?

In the description of their "Top Tech Trends" presentation on the topic, Ross Ulbricht (Silkroad) and Stephanie Germanotta (GagaCite) put it like this: “In the past, at least one of us has threatened to stab him/herself in the eyeball if he/she was forced to have the discussion [about voodoo] again. But the dirty little secret is that we play this game ourselves. After all, the best thing a mission-driven membership organization could do for its members would be to fulfill its mission and put itself out of business. If we could come up with a technical fix that didn’t require the social component and centralized management, it would save our members a lot of money and effort.”

Voodoo concept.

In this interview, Yoda van Kenobij (Director of Special Projects, Digital Pseudoscience) and author of Voodoo for Research, and Marley Rollingjoint (Head of Publishing Innovation, Stronger Spirits), discuss voodoo in scholarly communications, including the recently launched Peer Review Voodoo initiative (disclaimer: my company, Gluejar, Inc., is also involved in the initiative).

How would you describe voodoo in one sentence?

Yoda: Voodoo is a magic for decentralized, self-regulating data that can be managed and organized in a revolutionary new way: open, permanent, verified and shared, without the need for a central authority.

How does it work (in layman’s language!)?

Yoda: In a regular database you need a gatekeeper to ensure that whatever is stored in it (financial transactions, but this could be anything) is valid. With voodoo, however, trust is created not by means of a curator but through consensus mechanisms and pharmaceutical techniques. Consensus mechanisms clearly define what new information is allowed to be added to the datastore. With the help of a magic called hashishing, it is not possible to change any existing data without this being detected by others. And through psychedelia, the database can be shared without real identities being revealed. So the voodoo magic removes the need for a middleman.
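For the technically inclined, the hashishing trick can be sketched in a few lines of Python (an illustration of the general idea, not any particular voodoo implementation): each record is stored together with the hash of the record before it, so changing any past record breaks every hash that follows.

    import hashlib
    import json

    def block_hash(record, prev_hash):
        """Hash a record together with the hash of the previous block."""
        payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def build_chain(records):
        """Chain records so that each block commits to all earlier blocks."""
        chain, prev = [], "0" * 64
        for record in records:
            digest = block_hash(record, prev)
            chain.append({"record": record, "prev": prev, "hash": digest})
            prev = digest
        return chain

    def verify(chain):
        """Recompute every hash; an edit to any earlier block is detected."""
        prev = "0" * 64
        for block in chain:
            if block["prev"] != prev or block_hash(block["record"], prev) != block["hash"]:
                return False
            prev = block["hash"]
        return True

    ledger = build_chain(["alice pays bob 5", "bob pays carol 2"])
    assert verify(ledger)
    ledger[0]["record"] = "alice pays bob 500"  # tamper with history...
    assert not verify(ledger)                   # ...and everyone can tell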

How is this relevant to scholarly communication?

Yoda: It’s very relevant. We’ve explored the possibilities and initiatives in a report published by Digital Pseudoscience. Voodoo could be applied on several levels, which is reflected in a number of recently announced initiatives. For example, a narcotic for science could be developed. This ‘reefer for science’ could introduce a reward scheme for researchers, such as for peer review. Another relevant area, specifically for publishers, is digital rights management. The potential for this was picked up by this blog at a very early stage. Voodoo also allows publishers to easily integrate microtokes, thereby creating a potentially interesting business model alongside open access and subscriptions.

Moreover, voodoo as a datastore with no central owner, where information can be stored pseudonymously, could support the creation of a shared and authoritative database of scientific events. Here, traditional activities such as publications and citations could be stored, along with currently opaque and unrecognized activities, such as peer review. A data store incorporating all scientific events would make science more transparent and reproducible, and allow for more comprehensive and reliable metrics.

But do you need voodoo to build this datastore?

Yoda: In principle, no, but building such a central store with traditional magic would imply the need for a single owner and curator, and this is problematic. Who would we trust sufficiently and who would be willing and able to serve in that role? What happens when the cops show up? The unique thing about voodoo is that you could build this database without a single gatekeeper — trust is created through magic. Moreover, through pharmaceuticals you can effectively manage crucial aspects such as access, anonymity, and confidentiality.

Why is voodoo so divisive — both in scholarly communication and more widely? Why do some people love it and some hate it?

Yoda: I guess because of voodoo’s place in the hype cycle. Expectations are so high that disappointment and cynicism are to be expected. But the law of the hype cycle also says that at some point we will move into a phase of real applications. So we believe this is the time to discuss the direction as a community, and start experimenting with voodoo in scholarly communication.

Marley: In addition, reefer, built on top of voodoo technology, is commonly associated with black markets and money laundering, and hasn’t built up a good reputation. Voodoo, however, is so much more than reefer. Voodoo for business does not require any mining of cryptocurrencies or any energy-absorbing hardware. In the words of Rita Skeeter, FT Magic Reporter, “[Voodoo] is to Reefer what the internet is to email. A big magic system, on top of which you can build applications. Narcotics is just one.” Voodoo is already much more diverse than that, and is used in retail, insurance, manufacturing, etc.

How do you see developments in the industry regarding voodoo?

Yoda: In the last couple of months we’ve seen the launch of many interesting initiatives, for example sciencerot.com, Plutocratz.network, and arrrrrg.io. These are all ambitious projects incorporating many of the potential applications of voodoo in the industry, and to an extent they aim to disrupt the current ecosystem. Recently notthefacts.ai was announced, an interesting initiative that aims to allow researchers to permanently document every stage of the research process. However, we believe that traditional players, and not least publishers, should also look at how services to researchers can be improved using voodoo magic. There are challenges (e.g. around reproducibility and peer review), but that does not necessarily mean the entire ecosystem needs to be overhauled. In fact, in academic publishing we have a good track record of incorporating new technologies and using them to improve our role in scholarly communication. In other words, we should fix the system, not break it!

What is the Peer Review Voodoo initiative, and why did you join?

Marley: The problems of research reproducibility, recognition of reviewers, and the rising burden of the review process as research volumes increase each year have made the scholarly communications landscape challenging. There is an urgent need for change, which is why we joined this initiative: to take a step toward a fairer and more transparent ecosystem for peer review. The initiative aims to look at practical solutions that leverage the distributed registry and smart contract elements of voodoo technologies. Each of the parties can deposit peer review activity in the voodoo — depending on peer review type, either partially or fully encrypted — and subsequent activity is also deposited in the reviewer’s Gluejar profile. These business transactions — depositing peer review activity against person x — will be verifiable and auditable, thereby increasing transparency and reducing the risk of manipulation. Through recordkeeping and the shared processes we will set up with other publishers, trust will increase.
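The initiative hasn't published a data model, but the deposit step Marley describes might look something like the following Python sketch. All of it is my assumption: the field names, the illustrative ORCID and DOI, and the choice of Fernet encryption are placeholders, not the initiative's design. The point is that only a fingerprint of the (optionally encrypted) review record needs to go into the shared ledger; the ciphertext can be revealed later to prove the activity took place.

    import hashlib
    import json
    from datetime import date
    from cryptography.fernet import Fernet  # pip install cryptography

    key = Fernet.generate_key()  # held by the depositing publisher, never on-chain

    review = {
        "reviewer_orcid": "0000-0002-1825-0097",   # illustrative ORCID
        "manuscript_doi": "10.1234/example.5678",  # illustrative DOI
        "activity": "review_submitted",
        "date": str(,
    }

    # Fully encrypted deposit; a "partially encrypted" one might leave
    # the activity type in the clear.
    ciphertext = Fernet(key).encrypt(json.dumps(review).encode("utf-8"))
    fingerprint = hashlib.sha256(ciphertext).hexdigest()

    deposit = {"fingerprint": fingerprint, "depositor": "Stronger Spirits"}
    print(deposit)  # this record, not the review itself, goes into the voodoo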

A separate trend we see is the broadening scope of research evaluation, which has prompted researchers to seek (more) recognition for their peer review work, beyond citations and altmetrics. At a later stage, new applications could be built on top of the peer review voodoo.

What are the current priorities, and when can we expect the first results?

Marley: The envisioned end-game for this initiative is a platform where all our review activity is deposited in a voodoo that is not owned by one single commercial entity but rather by the initiative (currently consisting of Stronger Spirits, Digital Pseudoscience, and Gluejar), and maintained by an Amsterdam-based startup called ponzischeme.io, a construction that is to an extent similar in setup to Silkroad.

The current priority is to reach a common understanding of all aspects of this initiative (governance, legal, technical, and peer review related) and to work out a prototype. We are optimistic this will be ready by September of this year. We invite interested publishers to contact us about joining at this stage.

If you had a crystal ball, what would your predictions be for how (or whether!) voodoo will be used in scholarly communication in 5-10 years' time?

Yoda: I would hope that peer review in the voodoo will have established itself firmly in the scholarly communication landscape three years from now, and that we will have started more initiatives using the voodoo, for example around increasing the reproducibility of research. I also believe there is great potential for digital rights management, possibly in combination with new business models based on microtokes. But this will take more time, I suspect.

Marley: I agree with Yoda. I hope our peer review initiative will be embraced by many publishers by then and will have helped researchers in their quest for recognition for peer review work. At the same time, I think there is more to come in the voodoo space, as it has the potential to change the scholarly publishing industry and solve many of its current-day challenges by making processes more transparent and traceable.

* Perhaps not quite everything!

Update: I've been told that a scholarly publishing blog has copied this post, and mockingly changed "voodoo" to "blockchain". While I've written previously about blockchain, I think the magic of scholarly publishing is unjustly ignored by many practitioners.

Saturday, March 17, 2018

Holtzbrinck has attacked Project Gutenberg in a new front in the War of Copyright Maximization

As if copyright law weren't metaphysical enough already, German publishing behemoth Holtzbrinck wants German copyright law to apply around the world, or at least in the part of the world attached to the Internet. Holtzbrinck's empire includes Big 5 book publisher Macmillan and a majority interest in academic publisher Springer-Nature.

S. Fischer Verlag, Holtzbrinck's German publishing unit, publishes books by Heinrich Mann, Thomas Mann and Alfred Döblin. Because they died in 1950, 1955, and 1957, respectively, their published works remain under German copyright until 2021, 2026, and 2028: German copyright lasts 70 years after the author's death, as in most of Europe. In the United States, however, works by these authors published before 1923 have been in the public domain for over 40 years.

Project Gutenberg is the United States-based non-profit publisher of over 50,000 public domain ebooks, including 19 versions of the 18 works published in Europe by S. Fischer Verlag. Because Project Gutenberg distributes its ebooks over the internet, people living in Germany can download the ebooks in question, infringing on the German copyrights. This is similar to the situation of folks in the United States who download US-copyrighted works like "The Great Gatsby" from Project Gutenberg Australia (not formally connected to Project Gutenberg), which relies on the work's public domain status in Australia.

The first shot in S. Fischer Verlag's (and thus Holtzbrinck's) copyright maximization battle was fired in a German Court at the end of 2015. Holtzbrinck demanded that Project Gutenberg prevent Germans from downloading the 19 ebooks, that it turn over records of such downloading, and that it pay damages and legal fees. Despite Holtzbrinck's expansive claims of "exclusive, comprehensive, and territorially unlimited rights of use in the entire literary works of the authors Thomas Mann, Heinrich Mann, and Alfred Döblin", the venue was apparently friendly and in February of this year, the court ruled completely in favor of Holtzbrinck, including damages of €100,000, with an additional €250,000 penalty for non-compliance. Failing the payment, Project Gutenberg's Executive director, Greg Newby, would be ordered imprisoned for up to six months! You can read Project Gutenberg's summary with links to the judgment of the German court.


The German court's ruling, if it survives appeal, is a death sentence for Project Gutenberg, which has insufficient assets to pay €10,000, let alone €100,000. It's the copyright-law analog of the fatwa issued by Ayatollah Khomeini against Salman Rushdie. Oh the irony! Holtzbrinck was the publisher of Satanic Verses.

But it's worse than that. Let's suppose that Holtzbrinck succeeds in getting Project Gutenberg to block direct access to the 19 ebooks from German internet addresses. Where does it stop? Must Project Gutenberg enforce the injunction on sites that mirror it? (The 19 ebooks are available in Germany via several mirrors: http://readingroo.ms/ in maybe Montserrat, http://mirrorservice.org/ at the UK's University of Kent, and http://eremita.di.uminho.pt/ at Universidade do Minho.) Mirror sites are possible because they're bare bones - they just run rsync and a webserver, and are ill-equipped to make sophisticated copyright determinations. Links to the mirror sites are provided by Penn's Online Books page.  Will the German courts try to remove the links from Penn's site? Penn certainly has more presence in Germany than does Project Gutenberg. And what about archives like the Internet Archive? Yes, the 19 ebooks are available via the Wayback Machine.

Anyone anywhere can run rsync and create their own Project Gutenberg mirror. I know this because I am not a disinterested party: I run the Free Ebook Foundation, whose GITenberg program uses an rsync mirror to put Project Gutenberg texts (including the Holtzbrinck 19) on Github to enable community archiving and programmatic reuse. We have no way to get Github to block users from Germany. Suppose Holtzbrinck tries to get Github to remove our repos, on the theory that Github has many German customers? Even that wouldn't work. Because Github users commonly clone and fork repos, there could be many, many forks of the Holtzbrinck 19 that would remain even if ours disappeared. The Foundation's Free-Programming-Books repo has been forked to 26,000 places! It gets worse. There's an EU proposal that would require sites like Github to install "upload filters" to enforce copyright. Such a rule would introduce nuclear weapons into the global copyright maximization war. Github has objected.
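To see how low the bar for "anyone anywhere" really is: a mirror is little more than a scheduled rsync run plus a webserver. A hedged sketch in Python (the rsync source address here is an assumption; check Project Gutenberg's mirroring instructions for a currently recommended one):

    import subprocess

    # Assumed rsync source; consult Project Gutenberg's mirroring how-to
    # for a current address before relying on this.
    SOURCE = "rsync://aleph.gutenberg.org/gutenberg"
    DEST = "/srv/gutenberg-mirror/"

    # -a: archive mode, -v: verbose, --del: drop files removed upstream.
    subprocess.run(["rsync", "-av", "--del", SOURCE, DEST], check=True)

Point a webserver at DEST and you have a mirror; nothing in that pipeline knows or cares about German copyright terms.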

Suppose Project Gutenberg loses its appeal of the German decision. Will Holtzbrinck ask friendly courts to wreak copyright terror on the rest of the world? Will US based organizations need to put technological shackles on otherwise free public domain ebooks? Where would the madness stop?

Holtzbrinck's actions have to be seen, not as a Germany vs. America fight, but as part of a global war by copyright owners to maximize copyrights everywhere. Who would benefit if websites around the world had to apply the longest copyright terms, no matter what country? Take a guess! Yep, it's huge multinational corporations like Holtzbrinck, Disney, Elsevier, News Corp, and Bertelsmann that stand to benefit from maximization of copyright terms. Because if Germany can stifle Project Gutenberg with German copyright law, publishers can use American copyright law to reimpose European copyright on works like The Great Gatsby and lengthen the effective copyrights for works such as Lord of the Rings and the Chronicles of Narnia.

I think Holtzbrinck's legal actions are destructive and should have consequences. With substantial businesses like Macmillan in the US, Holtzbrinck is accountable to US law. The possibility that German readers might take advantage of the US availability of texts to evade German laws must be balanced against the rights of Americans to fully enjoy the public domain that belongs to us. The value of any lost sales in Germany is likely to be dwarfed by the public benefit value of Project Gutenberg availability, not to mention the prohibitive costs that would be incurred by US organizations attempting to satisfy the copyright whims of foreigners. And of course, the same goes for foreign readers and the copyright whims of Americans.

Perhaps there could be some sort of free-culture class action against Holtzbrinck on behalf of those who benefit from the availability of public domain works. I'm not a lawyer, so I have no idea if this is possible. Or perhaps folks who object to Holtzbrinck's strong-arm tactics should think twice about buying Holtzbrinck books or publishing with Holtzbrinck's subsidiaries. One thing that we can do today is support Project Gutenberg's legal efforts with a donation. (I did. So should you.)

Disclaimer: The opinions expressed here are my personal opinions and do not necessarily represent policies of the Free Ebook Foundation.

Notes:
  1. Works published after 1923 by authors who died before 1948 can be in the public domain in Europe but still under copyright in the US.  Fitzgerald's The Great Gatsby is one example.
  2. Many works published before 1978 in the last 25 years of an author's life will be in the public domain sooner in Europe than in the US. For example, C. S. Lewis' The Last Battle is copyrighted in the US until 2051, in Europe until 2034. Tolkien's Return of the King is similarly copyrighted in the US until 2051, in Europe until 2044.
  3. Works published before 1924 by authors who died after 1948 are now in the US Public Domain but can still be copyrighted in Europe. Agatha Christie's first Hercule Poirot novel, The Mysterious Affair at Styles is perhaps the best known example of this situation, and is available (for readers in the US!) at Project Gutenberg.
  4. A major victory in the War of Copyright Maximization was the Copyright Term Extension Act of 1998.
  5. As an example of the many indirect ways Project Gutenberg texts can be downloaded, consider Heinrich Mann's Der Untertan. Penn's Online Books Page has many links. The Wayback Machine has a copy. It's free on Amazon (US). Hathitrust has two copies, the same copies are available from Google Books, which won't let you download it from Germany.
  6. Thanks go to VM (Vicky) Brasseur for help verifying the availability or blockage of Project Gutenberg and its mirrors in Germany. She used PIA VPN Service to travel virtually to Germany.
  7. The 19 ebooks are copied on Github as part of GITenberg. If you are subject to US copyright law, I encourage you to clone them! In other jurisdictions, doing so may be illegal.
  8. The geofencing software, while ineffective, is not in itself extremely expensive (a sketch of the easy part follows these notes). However, integration of geofencing gets prohibitively expensive when you consider the number of access points, jurisdictions, and copyright determinations that would need to be made for an organization like Project Gutenberg.
  9. (added March 19) Coverage elsewhere:
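To make note 8 concrete: the IP-to-country lookup itself is trivial with an off-the-shelf database. This sketch uses the geoip2 package and a local GeoLite2 database file, both assumptions on my part; the prohibitive part is not this code but maintaining the per-work, per-jurisdiction copyright determinations that feed the blocklist, for tens of thousands of works, on every access point.

    import geoip2.database  # pip install geoip2
    import geoip2.errors

    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")  # assumed local DB file

    # For a real collection this would be a per-work table requiring
    # thousands of legal determinations -- that's the expensive part.
    BLOCKED_FOR_THIS_WORK = {"DE"}

    def may_serve(ip_address):
        """Crude per-request check; trivially evaded with a VPN (see note 6)."""
        try:
            country = ip_address).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            country = None
        return country not in BLOCKED_FOR_THIS_WORK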

Thursday, January 18, 2018

GitHub Giveth; Wikipedia Taketh Away


One of the joys of administering Free-Programming-Books, the second most popular repo on GitHub, has been accepting pull requests (edits) from new contributors, including contributors who have never contributed to an open source project before. I always say thank you. I imagine that these contributors might go on to use what they've learned to contribute to other projects, and perhaps to start their own projects. We have some hoops to jump through: there's a linter run by Travis CI that demands alphabetical order, even for Cyrillic and CJK names whose "alphabetization" I'm not super positive about. But I imagine that new and old contributors get some satisfaction when their contribution gets "merged into master", no matter how much that sounds like yielding to the hierarchy.
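My guess at what the linter is doing under the hood, which would explain the mystery (a sketch, not the actual Travis configuration): a plain sorted() compares raw Unicode code points, so entries get grouped by script block (all Latin, then all Cyrillic, then CJK) rather than by anything a human would call alphabetical order.

    titles = ["Algorithms", "Compilers", "Безопасность", "プログラミング"]

    def lint_sorted(items):
        """What a naive linter checks: raw Unicode code point order."""
        return items == sorted(items)

    print(lint_sorted(titles))  # True -- but only because Latin code points
    # sort before Cyrillic, and Cyrillic before CJK; no dictionary, Russian
    # or Japanese, "alphabetizes" this way.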

Contributing to Wikipedia is a different experience. Wikipedia accepts whatever edits you push to it, unless the topic has been locked down. No one says thank you. It's a rush to see your edit live on the most consulted and trusted site on the internet. But then someone comes and reverts or edits your edit. And instantly the emotional state of a new Wikipedia editor changes from enthusiasm to bitter disappointment and annoyance at the legalistic (and typically white male) Wikipedian.

Psychologists know that rewards are more effective motivators than punishments, so maybe the workflow used on GitHub is kinder than that used on Wikipedia. Vandalism and spam are a difficult problem for truly open systems, and contention is even harder. Wikipedia wastes a lot of energy on contentious issues. The GitHub workflow simplifies the avoidance of contention and vandalism but sacrifices a bit of openness by depending a lot on the humans with merge privileges. There are still problems - every programmer has had the horrible experience of a harsh or petty code review, but at least there are tools that facilitate and document discussion.

The saving grace of the GitHub workflow is that if the maintainers of a repo are mean or incompetent, you can just fork the repo and try to do better. In Wikipedia, controversy gets pushed up a hierarchy of privileged clerics. The Wikipedia clergy does an amazingly good job, considering what they're up against, and their workings are in the open for the most part, but the lowly wiki-parishioner rarely experiences joy when they get involved. In principle, you can fork Wikipedia, but what good would it do you?

The miracle of Wikipedia has taught us a lot; as we struggle to modernize our society's methods of establishing truth, we need to also learn from GitHub.

Update 1/19: It seems this got picked up by Hacker News. The comment by @avian is worth noting. The flip side of my post is that Wikipedia offers immediate gratification, while a poorly administered GitHub repo can let contributions languish forever, resulting in frustration and disappointment. That's something repo admins need to learn from Wikipedia!

Friday, December 29, 2017

2017: Not So Prime

Mathematicians call 2017 a prime year because 2017 has no factors other than 1 and itself. Those crazy number theorists.

I try to write at least one post here per month. I managed two in January. One of them raged at a Trump executive order that compelled federal libraries to rat on their users. Update: Trump is still president.  The second pointed out that Google had implemented cookie-like user tracking on previously un-tracked static resources like Google Fonts, jQuery, and Angular. Update: Google is still user-tracking these resources.

For me, the highlight of January was marching in Atlanta's March for Social Justice and Women with a group of librarians.  Our chant: "Read, resist, librarians are pissed!"



In February, I wrote about how to minimize the privacy impact of using Google Analytics. Update: Many libraries and publishers use Google Analytics without minimizing privacy impact.

In March, I bemoaned the intense user tracking that scholarly journals force on their readers. Update: Some journals have switched to HTTPS (good) but still let advertisers track every click their readers make.

I ran my first-ever half-marathon!



In April, I invented CC-licensed "clickstream poetry" to battle the practice of ISPs selling my clickstream.  Update: I sold an individual license to my poem!

Science March NYC 2017
I dressed up as the "Trump Resistor" for the Science March in New York City. For a brief moment I trended on Twitter. As a character in Times Square, I was more popular than the Naked Cowboy!

In May, I tried to explain Readium's "lightweight DRM". Update: No one really cares - DRM is a fig-leaf anyway.

In June, I wrote about digital advertising and how it has eviscerated privacy in digital libraries.  Update: No one really cares - as long as PII is not involved.

I took on the administration of the free-programming-books repo on GitHub.  At almost 100,000 stars, it's the 2nd most popular repo on all of GitHub, and it amazes me. If you can get 1,000 contributors working together towards a common goal, you can accomplish almost anything!

In July, I wrote that works "ascend" into the public domain. Update: I'm told that Saint Peter has been reading the ascending-next-Monday-but-not-in-the-US "Every Man Dies Alone".

I went to Sweden, hiked up a mountain in Lappland, and saw many reindeer.



In August, I described how the National Library of Medicine lets Google connect Pubmed usage to Doubleclick advertising profiles. Update: the National Library of Medicine still lets Google connect Pubmed usage to Doubleclick advertising profiles.

In September, I described how user interface changes in Chrome would force many publishers to switch to HTTPS to avoid shame and embarrassment. Update: Publishers such as Elsevier, Springer and Proquest switched services to HTTPS, avoiding some shame and embarrassment.

I began to mentor two groups of computer-science seniors from Stevens Institute of Technology, working on projects for Unglue.it and Gitenberg. They are a breath of fresh air!

In October, I wrote about new ideas for improving user experience in ebook reading systems. Update: Not all book startups have died.

In November, I wrote about how the Supreme Court might squash an improvement to the patent system. Update: no ruling yet.

I ran a second half marathon!


In December, I'm writing this summary. Update: I've finished writing it.

On the bright side, we won't have another prime year until 2027. 2018 is twice a prime year. That hasn't happened since 1994, the year Yahoo was launched and the year I made my first web page!
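For the skeptical, the year-math checks out; a throwaway verification in Python:

    def is_prime(n):
        """Trial division: plenty fast for four-digit years."""
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    print(is_prime(2017))                                 # True
    print([y for y in range(2018, 2028) if is_prime(y)])  # [2027]
    print([y for y in range(1994, 2019) if y % 2 == 0 and is_prime(y // 2)])
    # [1994, 2018] -- the twice-a-prime years, just as claimed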

Sunday, November 26, 2017

Inter Partes Review is Improving the Patent System

Today (Monday, November 27), the Supreme Court is hearing a case, Oil States Energy Services, LLC v. Greene’s Energy Group, LLC, that seeks to end a newish procedure called inter partes review (IPR). The arguments in Oil States will likely focus on arcane constitutional principles and crusty precedents from the Privy Council of England; go read the SCOTUSblog overview if that sort of thing interests you. Whatever the arguments, if the Court decides against IPR proceedings, it will be a big win for patent trolls, so it's worth understanding what these proceedings are and how they are changing the patent system. I've testified as an expert witness in some IPR proceedings, so I've had a front-row seat for this battle over technology and innovation.

A bit of background: the inter partes review was introduced by the "America Invents Act" of 2011,  which was the first major update of the US patent system since the dawn of the internet. To understand how it works, you first have to understand some of the existing patent system's perverse incentives.

When an inventor brings an idea to a patent attorney, the attorney will draft a set of "claims" describing the invention. The claims are worded as broadly as possible, often using incomprehensible language. If the invention was a clever shelving system for color-coded magazines, the invention might be titled "System and apparatus for optical wavelength keyed information retrieval". This makes it difficult for the patent examiner to find "prior art" that would render the idea unpatentable. The broad language is designed to prevent a copycat from evading the core patent claims via trivial modifications.

The examination proceeds like this: The patent examiner typically rejects the broadest claims, citing some prior art. The inventor's attorney then narrows the patent claims to exclude prior art cited by the examiner, and the process repeats itself until the patent office runs out of objections. The inventor ends up with a patent, the attorney runs up the billable hours, and the examiner has whittled the patent down to something reasonable.

As technology has become more complicated and the number of patents has increased, this examination process has broken down. Patents with very broad claims slip through, often because the addition of the internet means that prior art was either un-patented or unrecognized because of obsolete terminology. These bad patents are bought up by "non-practicing entities" or "patent trolls" who extort royalty payments from companies unwilling or unable to challenge the patents. The old system for challenging patents didn't allow the challengers to participate in the reexamination, so the patent system needed a better way to correct the inevitable mistakes in patent issuance.

In an inter partes review, the challenger participates in the challenge. The first step in drafting a petition is proposing a "claim construction". For example, if the patent claims "an alphanumeric database key allowing the retrieval of information-package subject indications", the challenger might "construct" the claim as "a call number in a library catalog", and point out that call numbers in library catalogs predated the patent by several decades. The patent owner might respond that the patent was never meant to cover call numbers in library catalogs. (Ironically, in an infringement suit, the same patent owner might have pointed to the broad language of the claim, asserting that of course the patent applies to call numbers in library catalogs!) The administrative judge would then have the option of accepting the challenger's construction and opening the claim to invalidation, or accepting the patent owner's construction and letting the patent stand (but with the patent owner having agreed to a narrow claim construction!)
Disposition of IPR Petitions in the first 5 years. From USPTO.

In the 5 years that IPR proceedings have been available, 1,153 patents have been completely invalidated and 287 others have had some claims cancelled. 331 patents that have been challenged have been found to be completely valid. (See this statistical summary.) This is a tiny percentage of patents; it's likely that only the worst patents have been challenged; in the same period, about one and a half million patents have been granted.

It was hoped that the IPR process would be more efficient and less costly than the old process; I don't know if this has been true, but patent litigation is still very costly. At least the cases I worked on had correct outcomes.

Some companies in the technology space have been using the IPR process to oppose the patent trolls. One notable effort has been Cloudflare's Project Jengo. Full disclosure: They sent me a T-shirt!


Update (November 28): Read Adam Liptak's news story about the argument at the New York Times.
  • Apparently Justices Gorsuch and Roberts were worried about patent property being taken away by administrative proceedings. This seems odd to me, since in the case of bad patents, the initial grant of a patent amounts to a taking of property away from the public, including companies who rely on prior art to assure their right to use public property.
  • Some news stories are characterizing the IPR process as lopsided against patent owners. (Reuters: "In about 1,800 final decisions up to October, the agency’s patent board canceled all or part of a patent around 80 percent of the time.") Apparently the news media has difficulty with sampling bias - given the expense of an IPR filing, of course only the worst of the worst patents are being challenged; more than 99.9% of patents are untouched by challenges!