Libraries and copyfraud

For the last week, I’ve been exchanging emails with curators at the Huntington Library about their use policies for digital images. For the Darwin Day 2009 Main Page effort on Wikipedia, I’ve been putting together a list of portraits of Darwin. Although a number of websites have significant collections of Darwin images, there isn’t any single comprehensive collection. One interesting shot I came across is an 1881 photograph, possibly the last one before Darwin’s death, that was allegedly “rediscovered” in the mid-1990s when a copy was donated to the Huntington. Press releases and exhibition descriptions invite people to contact the Huntington to request images, so I requested the Darwin photo. The response I got was typical of how libraries and archives deal with digital copies of rare public domain material.

The Huntington quoted distribution fees for the digital files (different sizes, different prices), and also asked for specific descriptions of how the image would be used, so that the library could give explicit permission for each use. Had I wanted to use it for more than just publicity (e.g., in a publication), more fees would have applied. Apparently the curators were not used to the kind of response they got back from me: I politely but forcefully called them out for abusing the public domain and called their policy of attempting to exert copyright control over a public domain image “unconscionable”.

In the exchange that followed, I tried to explain why the library has neither the moral nor the legal right to claim authority over the image (although, I pointed out, charging fees for distribution is fine, even if their fees are pretty steep). A Curatorial Assistant, and then a Curator, tried to explain to me that the Huntington actually has generous lending policies (you don’t “lend” a PD digital image, I replied), that while the original is PD, using the digital file is “fair use” that the library has the right to enforce (fair use, by definition, only applies to copyrighted works, I replied), that having the physical copy entails the right to grant, or not, permission to use reproductions (see Bridgeman v. Corel, I replied), that other libraries and museums do the same thing (that doesn’t make it right, I replied), that big corporations might use it without giving the library a cut if they didn’t claim rights (nevertheless, claiming such rights is called copyfraud and it’s a crime, I replied), and finally that I should contact the Yale libraries and museums and see if they do things any differently (a return to the earlier “everyone else does it” argument with a pinch of ad hominem for good measure, to which I saw no point in replying).

Unfortunately, the Curator is right that copyfraud is standard operating procedure for libraries and archives. Still, I think it’s productive to point out the problem each time one encounters it; sooner or later, these institutions will start to get with the program.

As an aside, the copyright status of this image is rather convoluted. The original is from 1881. The photographer, Herbert Rose Barraud, died in 1896. The version shown here (originally; now lost) is a postcard from 1908 or soon after, making it unquestionably public domain. It comes from the delightful site Darwiniana, a catalog of the reproductions and reinterpretations of Darwin’s image that proliferated in the wake of his spreading fame. Apparently, when the image was “rediscovered” in a donation to the Huntington, they thought it had never been published and was one of but two copies; a short article about the photograph appeared in Scientific American in 1995. Had it actually never been published until then, it would arguably be under copyright until 2047 because of the awful Copyright Act of 1976. I say “arguably” because the vague definition of “publish” and the rules for copyright transfer (“transfer of ownership of any material object that embodies a protected work does not of itself convey any rights in the copyright”), combined with the fact that another copy exists, would seem to indicate that, at the very least, the Huntington has no place claiming copyright. Paradoxically, publishing it for the first time in 1995 would have extended the copyright to 2047 but would have made the Huntington and/or Scientific American into violators of the copyright of whoever actually owned it (which would likely be indeterminable). But if it had remained unpublished, it would now be public domain. I’m still unclear about whether it would have been public domain before 2002, when the perpetual copyright window of the 1976 law closed.
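For those who want to trace the arithmetic, here is a minimal sketch in Python of how the 1976 Act treats works created before 1978 but still unpublished on January 1, 1978, under my reading of 17 U.S.C. section 303; the function name and the scenarios are my own illustration, not legal advice.

    # Sketch of the term rules for pre-1978 unpublished works (17 U.S.C. section 303),
    # as I understand them; not legal advice.
    from typing import Optional

    def section_303_expiry(death_year: int, first_published: Optional[int]) -> int:
        """Year at the end of which copyright expires for a work created before 1978
        but unpublished as of Jan 1, 1978."""
        term = death_year + 70          # section 302: life of the author plus 70 years
        term = max(term, 2002)          # section 303: in no case expires before the end of 2002
        if first_published is not None and first_published <= 2002:
            term = max(term, 2047)      # published by the end of 2002: protected through 2047
        return term

    # Barraud died in 1896. Two hypothetical scenarios for the photograph:
    print(section_303_expiry(1896, None))   # never published: 2002, so public domain since 2003
    print(section_303_expiry(1896, 1995))   # "first" published in 1995: protected until 2047

Of course, the 1908 postcard means the photograph was published long before 1978, so none of this applies and the image is simply public domain; the sketch just shows where the 2047 figure comes from.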

UPDATE – My thanks to the others who’ve linked to and discussed this post:

Wikipedia’s search engine dominance = informational homogeneity?

Nicholas Carr (of “Is Google Making Us Stupid?” fame) is a consistent source of thought-provoking but (in my view) off-base critiques of the information age in general and Wikipedia in particular. He has an interesting post on the Britannica Blog, “All Hail the Information Triumvirate”. This coincides with Britannica’s rollout of new features to invite readers to suggest improvements, and with some of the usual impotent snipes from Robert McHenry and other Britannica editors. Wikipedia gets 97% of all encyclopedia traffic on the Internet, so Britannica’s editors have little to do but whine about the culture that let this happen and/or try to learn from Wikipedia’s success.

A favorite tactic of Wikipedia critics is to bemoan Wikipedia’s search engine success. Carr demonstrates Wikipedia’s dominance of results from the most popular search engine (Google), showing that for ten diverse searches that he first ran in August 2006, then again in December 2007, and again this month, Wikipedia articles rose from an average placement of #4 to being the #1 hit for all ten. Carr “wonder[s] if the transformation of the Net from a radically heterogeneous information source to a radically homogeneous one is a good thing” and has difficulty imagining “that Wikipedia articles are actually the very best source of information for all of the many thousands of topics on which they now appear as the top Google search result.” But this rings hollow without examples (say, for any of his ten searches) of what single web pages would be better starting points.

The idea that the Net has become “radically homogeneous” just because Wikipedia is often the first Google hit is absurd. Wikipedia itself is far from homogeneous, and indeed its great strength is the way it brings together the good parts of many of the other sources of information on the Internet (and beyond). Carr’s implication seems to be that without Wikipedia (the “path of least resistance” for information delivery) search results would be better and finding valuable web content would be easier.

Carr seems to conceive of Wikipedia as a filter placed over Google that lets through only a homogeneous mediocrity. Wikipedia is better thought of as a refined version of Google’s method of harnessing the heterogeneity of the Internet; where Google relies on a purely mechanical process, Wikipedia brings together sources with consideration of the individual topic at hand and human evaluation of the importance and reliability of each source.

Public weighs in on Flagged Revisions

Andrew Lih’s blog post “English Wikipedia ready for Flagged Revisions?” is a nice overview of the big news this week: it seems likely that some form of the Flagged Revisions extension is finally going to be used. For more details on the on-wiki discussion, this soon-to-be-published Signpost article is a good place to start.

The comments on the NYT blog story on this development give a nice cross-section of public perceptions of Wikipedia among the Times’ audience, and their reactions to the possible change in the way the site works. Some choice quotes:

  • It’s a cesspool of misinformation and bias. Now that the Wikipedians are in charge, it will become even more useless as a reliable resource.

    Someone needs to be monitoring the Wikipedians. They are not to be trusted with the interpretation of things. -Wango

  • It’s a living, multidimensional document and I’m of the mind that it should be left the frak alone […] WIKIPEDIA NEEDS MISTAKES if it is to remain the vital document that it is today. Living things change, static dead things are perfect and immutable. -jengod
  • It’s not arrogant for wikipedia – or any source of authoritative information – to want to be right […] Grafitti on the wall may be instructive, but it does not make the wall more valuable or more purposeful. -Frank
  • Any edit beyond spelling, grammar and syntax, must be considered suspect, if done by a minor, an artist or any individual that does not have any expertise on the subject. -CGC
  • The real bad blunders are almost always corrected within hours (if the article is of no great interest) or minutes (if it is). So why bother? The true capital of Wikipedia is ALL of its contributors – and not just the “trustworthy” elite. Such measures will discourage new, fresh, motivated contributors, and in the long run dry out the project. -Oystein
  • It’s a standard fascist procedure to declare an outrage and then restrict freedoms under the guise of making things better for all. I’m not saying that’s what Wales is doing. Just saying that it sounds like a jack-booted tactic. -Kacoo
  • Is it possible that [the anons who ‘killed’ Kennedy and Byrd] weren’t vandals at all, but just people trying to be “that guy” who made the change to such an important entry. Who knows? -Light of Silver

Biting the newbies on Wikipedia

An example:

  1. New user creates article in good faith.
  2. Two minutes later, editor tags it for speedy deletion; article “does not indicate the importance or significance of the subject” (even though it does make a basic claim of notability).
  3. New user responds on talk page, explaining in more detail why the subject is significant by noting newspaper and magazine coverage.
  4. Administrator deletes article without either checking the talk page or verifying the speedy deletion rationale: “No indication that the article may meet guidelines for inclusion”.
  5. Newspaper publishes yet another piece of journalism that makes Wikipedia seem like a petty and unfriendly place and shows how overzealous deletion makes Wikipedia worse.

Will the Stanton usability grant stop Wikipedia community atrophy?

The recent Stanton Foundation grant to improve MediaWiki’s usability will hopefully lower the barrier for computer novices to start editing Wikipedia. This comes at an opportune time: we recently learned that the Wikipedia community has not only stopped growing exponentially, it has actually been gradually shrinking since early 2007. The most likely causes of the decline include:

  • lack of “low-hanging fruit”
  • lack of new potential editors who are just discovering Wikipedia
  • Wikipedia’s scope gradually narrowing to mirror that of traditional encyclopedias (a.k.a. deletionism run amok)
  • Wikipedia’s occasionally expert-unfriendly culture that turns off those with the most to contribute
  • a Wikipedia culture that gives little priority (or even respect) to activities focused on the community itself rather than the encyclopedia
  • the natural decline in participation of early community members; according to Meatball Wiki, users of any online community generally say GoodBye after between 6 months and 3 years unless that community is connected to their offline lives

Usability improvements, it is hoped, will open editing opportunities to people who are scared off by the intimidating and sometimes overwhelming markup that appears when one clicks “edit”.

Whether or not this will halt or reverse the decline in editing activity on English Wikipedia is tied up with several conflicting currents of thought in the community. As Liam Wyatt and Andrew Lih have been pointing out in recent Wikipedia Weekly podcasts (66 and 68 are both very astute discussions), the standards for what is and is not valuable content have been shifting consistently towards the conventional encyclopedia definition of valid topics. Quirky lists, small organizations that don’t meet the ever-harsher notability standards, obscure books and concepts, anything ScienceApologist finds to be an illegitimate invocation of scientific authority, anything dismissed as ‘mere news’, and, increasingly, simply anything that wouldn’t be found in traditional encyclopedias–these are candidates for deletion.

The implications of deletion trends for community health are not entirely straightforward. Overzealous deletion leaves a sour taste in the mouths of many editors who have spent a lot of time adding the kinds of content that now get deleted regularly. Some leave because of it, or lose their enthusiasm. On the other hand, a lot of what gets deleted is simply weak, unsourced content; removing it from the article pool means that new editors will not base their own contributions on such bad examples. Deleting content on the borderline of notability, or better yet, downright notable and significant topics, also replenishes the supply of low-hanging fruit. If someone thought a topic deserved an article, someone in the future may think the same thing and recreate it in better form. Citizendium recognized the advantage of redlinks early on, and decided to start from scratch rather than from a Wikipedia dump.

And while about two-thirds of those polled want to see Flagged Revisions implemented, the other third think it would be too much of a dilution of the “anyone can edit” ethos. Although I’m in favor of Flagged Revisions, it’s not clear to me whether it would improve or worsen the problem of community atrophy. It’s a question of balance: some people are drawn in by ‘instant edit gratification’, while others are turned off by the perceived free-for-all nature of Wikipedia and assume their contributions would simply be swept away in the chaos. So the lure of stability might or might not outweigh the immediate thrill of seeing one’s edits go live. (I suspect the waiting, and the tacit acknowledgement of good work when someone approves a newbie’s edit, would do more to draw new users into the community than the instant, impersonal status quo.)

So how would improved usability shake things up? On the one hand, it might spark a wave of naive article creation followed immediately by a wave of deletion of new content produced by newbies with no grasp of the community’s standards. If someone can’t or won’t figure out how to use basic wiki markup (says the cynic), how can we expect them to use proper sourcing and adhere to Wikipedia’s core policies of NPOV and Verifiability? Lowering the barriers to entry might just exacerbate the us-versus-them mentality of deletionism. On the other hand, maybe a host of new users would integrate well with the community and restore some of its past vitality while pulling the philosophical center back a bit from the deletionist brink. (Of course, it’s an open question how much usability improvements could actually affect the influx of new users; the difference might be rather small if lack of tech savvy is highly correlated with other factors that make people unlikely to edit.)

As Erik Zachte has pointed out (in the earlier version of this post), many Wikipedias are still growing; English Wikipedia is not the be-all, end-all. It is not clear whether each language will follow a similar pattern in the rise and peak of its community (accounting for number of speakers, connectivity, and economic issues) or whether different languages can develop sufficiently different Wikipedia cultures to avoid the failings of English Wikipedia (or perhaps generate unique problems of their own).

Wikipedia blogging outside the Wiki Planet orbit

The main discussion platforms in the Wikimedia community can be pretty insular. Lots of people write about their (often negative) experiences with and views on Wikipedia, but only a handful are part of Wiki Blog Planet, post on the Village Pump or the mailing lists, or hang out on freenode IRC. So I like to browse the wider world of Wikipedia blogging. Lots of other people do this too, I know, because usernames I recognize often appear in comments sections. Here’s what I found this time.

  • Have You Ever Edited Wikipedia? – a thoughtful post on notability by Terrance of The Republic of T., explaining why he stopped contributing articles on victims of LGBT-related hate crimes.
  • The coming Wikipedia election. – an interesting take on the way Wikipedia is increasingly significant for U.S. state-level politics, by a Virginia political junkie (User:WaldoJ).
  • Is it safe to edit Wikipedia? – Kelly Martin’s Nonbovine Ruminations isn’t on the wiki blog aggregators any more, but she posted this a few weeks ago.
  • final group project: editing USF’s wikipedia page – University of San Francisco media studies professor David Silver is running a Wikipedia assignment, group editing of the USF article. To be more precise, he’s grading it. The comments on his blog post are heartwarming. See how the USF article has changed since the assignment started two weeks ago.

Also, for those who haven’t seen it, Robert Rohde (User:Dragons flight) has some vital, long-wanted editing frequency statistics for the English Wikipedia community. The long and short of it is that the size of the Wikipedia editing community peaked around March 2007. I’ve been playing around with the data, and there are lots of interesting things hiding in there.
[Chart: editing frequency of editors with 20+ mainspace edits, 2001-2008]
The big research/data-crunching questions I have now concern what the life course of a Wikipedia editor looks like. Anecdotally, active Wikipedians have a typical lifespan of a few years; most of the early contributors have left, and many of the most active editors today joined around the time I did or later (that is, in the 2005-2007 boom). Do many or most editors follow a typical pattern in their editing rate over the course of their involvement (e.g., a rapid rise that levels off, then gradually declines before fading away)? Can we expect (or are we experiencing) a generational die-off in the wake of the exponential expansion period? What would a histogram of recent edits, sorted by when editors joined, look like?
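As a toy illustration of that last question, here is a minimal Python/pandas sketch, assuming the edit history has already been flattened into a CSV with one row per edit and editor/timestamp columns (the file name and column names are placeholders of my own, not the format of Rohde's dataset):

    import pandas as pd

    # Hypothetical input: one row per edit, with the editor's username and the edit timestamp.
    edits = pd.read_csv("enwiki_edits.csv", parse_dates=["timestamp"])

    # Approximate each editor's "join date" by their first recorded edit.
    first_edit = edits.groupby("editor")["timestamp"].min()
    join_year = first_edit.dt.year.rename("join_year")

    # Look only at recent edits (say, the last three months covered by the data).
    cutoff = edits["timestamp"].max() - pd.DateOffset(months=3)
    recent = edits[edits["timestamp"] >= cutoff]

    # Histogram of recent edits, bucketed by the year the editor joined.
    recent_by_cohort = recent.join(join_year, on="editor").groupby("join_year").size()
    print(recent_by_cohort)

A strong skew toward recent join years would suggest rapid generational turnover; a long tail of 2001-2004 cohorts still editing heavily would suggest the opposite.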

Tougher questions that probably can’t be answered directly even with really great statistical analysis: Does Wikipedia attract a different kind of editor than it used to? How much of the pool of potential editors has been used up? Are there really significant numbers of potential editors who would contribute if usability issues were addressed?

These are the kind of stories Wikinews should be doing

The election numerology blog fivethirtyeight.com has been publishing a series of fascinating “On the road” posts by Sean Quinn and photographer Brett Marty. Quinn and Marty have been traveling through battleground states investigating the “ground game” of the McCain and Obama campaigns, reporting on the voter registration and get-out-the-vote operations managed by volunteers and paid staffers in the regional and local campaign offices.

See the latest few:

Individually, these might seem minor, but the series as a whole makes for an important story that has been largely neglected by traditional news sources. It’s also the type of thing Wikinews could excel at, with a little more organization. Wikimedians all over the U.S. could go out the same weekend and do stories on the local dimensions of these national campaigns, and the result could be something very special.

Bonus link:

  • The Wikipedian Candidate – an interesting analysis of the (it seems increasingly clear) ill-advised selection of Sarah Palin as McCain’s VP and the important things that don’t come across in a Wikipedia article, from fivethirtyeight.com’s Nate Silver

How are your Wikimedia Commons photos being used elsewhere?

I don’t know about yours, but I do have some idea of how mine are being used.

Google searches for my name and my username reveal a lot more instances than I was aware of, especially for news article illustrations.

In the “license, schmicense” category, I found this article from The Jerusalem Post, which takes a recent photo of mine (either from Flickr or Wikipedia, but more likely Wikipedia) and simply says “Photo: Courtesy:Ragesoss”.

Marginal cases include the hundreds of Google hits for “ragesoss” that come from World News Network websites. This organization runs thousands of online pseudo-newspapers, such as the West Virginia Star and Media Vietnam, that aggregate content from real news organizations. Stories at all of their portals link to World News pages that have teasers for the actual articles at the original sources. And I’ve found a bunch of my photographs used as illustrations on these pages. See these:

Of course, my photographs are not the ones used in the original articles. World News seems to have used almost every photo I uploaded from the February 4 Barack Obama rally in Hartford to illustrate campaign news unrelated to the Hartford rally. In terms of photo credits (see the links), most of them say “photo: Creative Commons / Ragesoss” or “photo: GNU / Ragesoss”. Nearly all of my photos on Wikimedia Commons are copyleft under the GFDL and/or CC-by-sa, so non-specific credits like that do not constitute legitimate use under the terms of either license. The GFDL requires a link to the license (the GFDL, not “GNU”), and CC-by-sa at least requires notice that the image is free to reuse as long as derivatives are issued under the same license (simply “Creative Commons” is not a license). It is also implicit with CC licenses that credits for my photos should include a link to my Commons userpage, since the author field on the image pages is typically a link titled “Ragesoss”, not just the text. (The third link above, among others I found, does link to the GFDL, although the photo has nothing to do with the article.)

Another major user of my photos is Associated Content, a commercial user-generated content site that pays contributors. AC is a mixed bag in terms of legitimate uses of photos, since individual contributors are responsible for selecting and crediting the illustrations for their articles. This one, which uses a photo of Ralph Nader, credits my shot as “credit: ragesoss/wikipedia copyright: ragesoss/GNU FDL 1.2”. It almost meets the basic requirements of the license (all it needs is a link to the text of the license), although a link to the source would be preferable to simply mentioning Wikipedia. This one, on the other hand, just says “credit: Ragesoss copyright: Wikimedia Commons”.

Popular Science, in this article, lists the GFDL, but links it to the Wikipedia article on the license rather than the actual text.

The Bottle Bill Resource Guide links to my Commons userpage, but does not list the license or link to the image source.

Another partly legit use is by LibraryThing, a book-related site that uses several of my photos for authors (e.g., Dava Sobel). They include links back to the original image pages, but the site behaves erratically and sometimes insists that I sign in or create an account to view the image details.

Unexpectedly, I also found several of my photos illustrating Encyclopædia Britannica. See:

In each case, they provide a link to one of the licenses (GFDL 1.2 and CC-by-sa 3.0 unported, in these cases), although they don’t provide a userpage link. At least they seem to take the licenses seriously.

Of course, it’s much tougher to find out where my photos are being used without mentioning me at all. I suspect that the majority of uses don’t even attempt to assign credit or respect copyright. Most of the publications that are serious about copyright aren’t even willing to use copyleft licenses, preferring to get direct permission from the photographer (even if it often means paying).

Fun photo project for Wikipedian photographers

Taking pictures to illustrate Wikipedia articles is the reason I got into photography. I started with my wife’s point-and-shoot, and pretty soon I came to appreciate the joys of photography for their own sake…and to feel that strong desire for better and still better equipment. A few weeks ago I finally realized my long-time goal of shooting an original Featured Picture (FP), this ‘Peach Glow’ water lily.

My equipment (Canon EOS 400D, 50 mm prime lens, 18-55mm kit lens, and low-end 70-300mm superzoom/macro) is not professional, but it’s not cheap either. With my setup and my intermediate skill level, the circumstances under which I could take an FP are pretty narrow.

But there are many opportunities for taking valuable photos for Wikipedia. A project that I just completed, which many American Wikipedians could do as well, was to take photos of every Registered Historic Place in my town. In West Hartford, there are 28 Registered Historic Places, only a few of which had images or articles. But there is a wonderful List of Registered Historic Places in Hartford County, Connecticut, that lists the addresses and geographical coordinates for every one in my town and the surrounding towns. It has slots for thumbnail images, so even the ones without articles have a home for photos, and there is even a Google Maps link at the bottom that maps out every place on the list.

I spent a couple days doing bike trips to all the West Hartford places on the list, and now I’ve shot them all. Now I’m starting a series of longer trips to shoot the places in neighboring towns. It’s definitely been worthwhile; I learned a lot about local geography, got some exercise, and took a bunch of photos.

Not all local NRHP lists have the useful table format that the Connecticut lists have (and the Western U.S. has relatively few registered places), but the NRHP WikiProject can help, and there is a tool for automatically generating formatted lists by county. Only a handful of lists are fully illustrated so far, but I hope eventually to add the Hartford County, Connecticut list to that group. An even more ambitious goal would be to create articles for all the places on that list, but I’m afraid there may not be relevant sources for most of them.

Wikipedia’s epistemological methods

A colleague of mine recently asked me about Wikipedia’s policy on sources and evidence, Wikipedia:Verifiability (WP:V). In short, the threshold for including content in Wikipedia is “verifiability, not truth”. Truth alone, without appropriate evidence that fits with the Wikipedia community’s standards, is not enough to justify adding something to Wikipedia.

You can interpret this in a number of ways. For some, it’s an embodiment of post-modern notions of truth and subjectivity (people disagree about truth, so we don’t let people simply add what they know to be true, instead relying on authority). For others, it’s just a practical concession to the sociological nature of Wikipedia, in which some people are more objective and more capable than others (and those are the people that know how to leverage authority effectively). The Verifiability standards could also be taken as a fundamentally rhetorical, rather than epistemological, policy: communal standards of evidence ensure a basic level of apparent reliability, since readers can be pointed straight to relevant authorities. (Citizendium, in contrast, has looser evidentiary standards and relies in part on the personal authority of its Authors and Editors.)

From an academic standpoint, there are plenty of relevant sets of literature that bear on the problems that Wikipedia’s evidentiary standards and policies attempt to deal with. But from my own perspective as a historian of science, I think the parallel to scientific epistemology and evidentiary norms is an interesting one.

WP:V works in ways that are closely paralleled in scientific (and historical) method as it is actually practiced. Communities of scientists have various norms (mostly unwritten) for what does and does not constitute legitimate evidence for making novel scientific claims. These norms are highly context dependent, and can include (or exclude) experiments, reference to the work of others, reasoning and rhetoric, visual evidence, artificially simulated data, etc., depending on field and venue. Verifiability in the traditional scientific sense of experimental repeatability is actually very rarely a consideration in science (and in fact, many philosophers and historians of science have argued that repeating experiments is rarely possible and almost never desirable… the questions instead are: do the results accord with the results of related experiments? can we build on these results? etc.)

Science, as scientists have become increasingly willing to admit in recent decades, is about what is verifiable rather than what is true, in a sense similar to WP:V, since experimental science is increasingly conducted in largely artificial physical contexts. What happens in the lab is hoped to be a faithful reflection of what happens in nature, but the whole point of the lab is to isolate certain parts of nature so that they can be studied without all the complicating factors…and sometimes those complicating factors mean that a given experimental result may actually only be “true” for the very peculiar and artificial set of circumstances tested. The analogous situation on Wikipedia is when a seemingly reliable source is wrong; all the Wikipedian can do, without other sources to compare it to, is either limit claims to “source X says Y” (instead of just claiming Y and citing X) or ignore the source altogether. On Wikipedia we also hope that what the source says accords with reality, but (for sociological rather than technical reasons) editors can’t go out and probe reality in its full complexity and must stick within the (negotiable) norms (which, like in science, are tailored to try to maximize the chances of accord between evidence and reality).

WP:V, and Wikipedia’s approach to sourcing and evidence more broadly, is just a different set of evidentiary norms, suited to a different group of people with a different purpose.