Josh Greenberg, Zotero, and Scholarship 2.0 (!! Beta! Zap! Pow!)

Today, my department’s Holmes Workshop speaker was Josh Greenberg (aka, Epistemographer): an historian/STSer/hacker, formerly of the Center for History and New Media, now the “Director of Digital Strategy and Scholarship” (how rad a title is that?) at the New York Public Library.

I’ve been following the CHnM for a while now, and I had read about their flagship project Zotero, but I never realized what a revolutionary vision they have for this thing. Zotero is a Firefox plugin that does citations. It was initially conceived as an open source replacement for EndNote (the only selling point for which, from what I hear, is that it’s not quite as bad as Word for footnotes).

In his introduction, Josh had an insightful comparison of “Finding vs. Searching”, basically the difference between an organized hierarchy of information (e.g., early Yahoo!, library stacks, and bibliographies), in which serendipitously finding things is the great benefit, and using the ubiquitous search boxes of the modern internet (e.g., Google, online library catalogs), with which you are searching for finite results in an undifferentiated database where anything outside the search parameters is simply invisible. (By coincidence, he had included this picture by me as an icon of the finding mode; hooray for unattributed syndication!)

Part of the goal of Zotero is to harness the best of both the searching and finding modes by adding a Web 2.0 social element to the citation program. This summer, the developers will be launching a Zotero server that will archive a user’s citation database so that it can be accessed from anywhere and retained in case of hardware failure. The upshot is that, unless the user opts out, the citation database will be used (sans private information, if desired) to create a sort of del.icio.us for scholarly material. Zotero will be useful enough to be used on its own, with the aggregate social aspect as icing that brings the potential for scholarly collaboration and recommendation to a new level. You can find other bibliographies similar to yours to see what like-minded scholars are reading that you aren’t, and you might be able to find other scholars you didn’t know about with similar research interests. In future versions, you’ll be able to share your marginalia, your original sources (interviews, photographs from archives, etc.), etc.

What makes Zotero cool today is the ability to automatically pull citation data from a large and ever-growing list of online sources. So you do a search on your local library catalog, and with one click you import the metadata for that source to your library. Then, when you want to cite that source, you have a wide range of output options (MLA, Chicago Style, EndNote, etc.). What sold me is that it even does export in Wikipedia citation template syntax. I never use the cite templates, because it’s usually easier to just type in the references how I want them. But with Zotero, I’m going to start using them. For the Wikipedians reading this, I recommend trying it out (make sure you get Beta 4, from the Zotero website; the one straight from Firefox is out of date and doesn’t have the Wikipedia support). It’s under heavy development and improving rapidly, but it’s already a very helpful thing.
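For a rough sense of what template export involves, here is a minimal sketch (in Python, with hypothetical field names; this is not Zotero’s actual code) of turning item metadata into Wikipedia’s {{cite book}} syntax:

```python
# Hypothetical sketch (not Zotero's actual code): render a metadata dict
# as Wikipedia {{cite book}} wikitext. Field names are illustrative.
def to_cite_book(item):
    field_order = ["last", "first", "title", "publisher", "location", "year", "isbn"]
    parts = ["{{cite book"]
    for field in field_order:
        if field in item:
            parts.append(f"| {field} = {item[field]}")
    parts.append("}}")
    return " ".join(parts)

source = {"last": "Kepler", "first": "Johannes", "title": "Dioptrice", "year": "1611"}
print(to_cite_book(source))
# {{cite book | last = Kepler | first = Johannes | title = Dioptrice | year = 1611 }}
```

The real exporter handles many more fields and item types, of course; the point is just that structured metadata maps mechanically onto citation formats, which is why one tool can output MLA, Chicago, and wikitext alike.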

Superb Wikipedia podcast; Ideas for Wikipedia to steal

There’s an extremely, superbly, awesomely good Wikipedia debate podcast at Language Lab Unleashed! It’s not good because it’s so correct (there are a number of misunderstandings, clichés, and analog wine in digital bottles) or insightful (Wikipedians have hashed out most of the discussion many times over), but because it gives a great cross-section of the ways academic humanists view Wikipedia.

The star of the show is Don Wyatt, chair of History at Middlebury College. He’s a classic curmudgeon, and gives voice to much of what I despise about the culture of the modern academy (a regular topic of my polemics), though he seems like a nice enough guy and it’s a rich and eloquent voice he gives it. Most of the comments coming out of Middlebury have been notably consonant with the wiki way (hence Jimbo’s endorsement of their official policy). But the policy was obviously a compromise, with Wyatt at the far end, viewing Wikipedia as a fundamentally flawed endeavor and an unequivocal waste of time for any real scholar.

On the other end, Bryan Alexander and Robert Berkman (you know a geek when you hear one) have a good grasp of Wikipedia’s virtues, real and potential. In the middle is Elizabeth Colantoni, who is running a Wikipedia assignment at Oberlin (shoutout to User:WAvegetarian, apparently the student who inspired the assignment).

One of the best parts starts at around 55:15 (spun off from issues first posed beginning at 46:18), exploring the confluence of philosophy, epistemology, and copyright, with attitudes of today’s academics contrasted with the kids these days (and projecting into the future of the academy, when us kids will be in charge).

In other news, I found a major Wikipedia assignment I hadn’t noticed before: Marx Blog, the class blog of Derek Stanovsky at Appalachian State University, which is being used to write a monumental article/outline on Capital, Volume I.

Via Mills Kelly, I found a very cool site whose concept Wikipedia should steal: Swivel.com. Users upload data sets (in spreadsheets), and the site creates a huge and flexible array of graphs. Multiple data sets can be used to make a single graph, so it would be easy to create custom graphs for specific articles, with baselines of some sort of general data graphed together with more specific data (e.g., a non sequitur mash-up of Wikipedia stats with the temperature in Fresno). Kelly describes it as a Flickr for data (in another excellent Digital Campus podcast, though with no mention of Wikipedia this time, except for a plug of Joseph Reagle’s recent plagiarism post). There is a lot of room for improvement in Swivel’s functionality, but the bigger reason Wikipedia needs to steal the concept is that (in my humble opinion) the potential reach of “a Flickr for data” is rather limited unless it’s part of a larger project.

A watershed in the history of Wikipedia?

Now may be as arbitrary a time as any other to identify as a watershed in the history of Wikipedia, but it seems like things are changing in a number of ways.

The most obvious change is in public perception and media coverage. Between the Middlebury College story, the Essjay story, and the Sinbad story, Wikipedia has been a constant presence in the headlines for several weeks running. With the possible exception of Essjay, none of these is even close to the significance of the Seigenthaler controversy, but the volume of related news and blog noise since late January (when the first of these stories emerged) has been as large or larger.

The remarkable thing about the Middlebury story is that it’s the only one like it; all the media stories have focused on it because no similar policies (beyond individual professors) have been enacted elsewhere. The history department banned citations of Wikipedia, but actually endorsed Wikipedia as a starting point for research (and held an excellent recorded debate which highlighted the non-sensational reality of the Middlebury situation, and includes an eloquent argument for the pedagogical value of Wikipedia). Meanwhile, many other professors have been interviewed by student newspapers (e.g., 1, 2, 3, 4, 5, 6, and many more) and professional news organizations, and have written their own defenses of Wikipedia in venues like The Chronicle of Higher Education and The New Republic. The second Digital Campus podcast has a follow-up on the previous Wikipedia discussion I mentioned, with some commentary on a few of these stories; apparently, reporters are practically knocking these professors’ doors down requesting Wikipedia-related expert opinions. Professors have by and large become familiar enough with Wikipedia to respect its strengths and not to project onto it too many of the weaknesses they expect to find.

Other Wikipedian bloggers have covered the recent challenges to and discussions about Jimbo’s nebulous role in Wikipedia governance; see Joseph Reagle and Stephen Bain for more.

Another side to the watershed, which nobody is quite recognizing yet, relates to the limits of Wikipedia. The exponential phase of (English) Wikipedia’s growth (in terms of number of articles, and in terms of number of active users) is probably over. From 2003 to mid-2006, the number of articles followed a very regular exponential pattern. Had exponential growth continued, it would have hit 2,000,000 a few weeks ago; it just passed 1,700,000 today. The average number of articles created per day since late December (around 1724) has actually been lower than the average number per day over the previous year (1823). This difference is only partly the result of the always-slower holiday season. It seems that the supply of unwritten encyclopedic topics is becoming a significant constraint.
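To make the back-of-envelope comparison concrete, here is a sketch with round, illustrative numbers (assumed for the example, not taken from the actual statistics): English Wikipedia was near 1,000,000 articles in early 2006, and its earlier doubling time was roughly a year.

```python
# Illustrative, assumed round numbers (not the actual statistics):
# ~1,000,000 articles in early 2006, earlier doubling time ~1 year,
# recent creation rate ~1,750 articles/day.
start_articles = 1_000_000
days = 365

# If the old exponential trend (doubling yearly) had continued:
exponential = start_articles * 2 ** (days / 365)

# If growth instead settled at a steady ~1,750 new articles per day:
linear = start_articles + 1_750 * days

print(f"exponential projection: {exponential:,.0f}")  # 2,000,000
print(f"steady-rate projection: {linear:,.0f}")       # 1,638,750
```

The gap between those two projections is roughly the gap between the 2,000,000 articles the old trend predicted and the 1,700,000 we actually have.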

The number of active users is harder to gauge, since Erik Zachte’s statistics page has not been updated for the English Wikipedia since mid-October. However, we can probably look to the German Wikipedia as a rough analog, since German Wikipedia seems to have a higher level of market saturation, when you account for the ratio of English-speakers to German-speakers (~8:1). The number of “active” (5 edits per month) and “very active” (100 edits per month) German Wikipedians seems to have plateaued in August 2006, at about 7500 and 1000 respectively. At that time, English Wikipedia had around 44,000 actives and 4500 very actives. If English Wikipedia’s active community has continued to expand toward catching up with the German ratio, we could expect the number of very active Wikipedians to max out around 8000 somewhere near the end of 2007. However, unlike English, German Wikipedia has had a near-linear growth curve since early 2004 in terms of number of articles. I don’t know why article numbers grew linearly while editor numbers were growing exponentially (until they plateaued), but it seems likely that because of topic saturation, English Wikipedia will plateau (or peak) in terms of editor:speaker ratio at a lower level than German. Consistent with the watershed thesis, my guess is that active community size is plateauing right now.
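The scaling estimate itself is simple arithmetic; a sketch using the rough figures above:

```python
# The post's rough figures: German very-active editors plateaued at ~1000,
# English speakers outnumber German speakers ~8:1, and English had ~4500
# very-active editors when German plateaued.
german_plateau = 1_000
speaker_ratio = 8
english_then = 4_500

# If English saturates at the same editors-per-speaker ratio as German:
english_ceiling = german_plateau * speaker_ratio
headroom = english_ceiling / english_then

print(english_ceiling)     # 8000
print(round(headroom, 1))  # 1.8
```

So even under the optimistic assumption that English catches up to German saturation, the very-active community had less than a doubling left in it as of August 2006.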

Of unknown but likely relevance to the watershed, two central Wikimedia employees announced their resignations today (apparently for unrelated reasons): Danny Wool and Brad Patrick. Both have implied that as independent Wikimedians they will, in the immediate to intermediate future, be bringing forward some constructive criticisms of the way the Wikimedia Foundation runs.

Other recent Wikipedia reading material:

Digital Campus podcast: “Wikipedia: Friend or Foe”

GMU’s Center for History and New Media has a new podcast that launched a week and a half ago: Digital Campus.

It’s “A biweekly discussion of how digital media and technology are affecting learning, teaching, and scholarship at colleges, universities, libraries, and museums”, and the first episode is on Wikipedia.

The intro music is worth listening to. After that, I recommend skipping to 17:33, when the Wikipedia discussion begins.

What you would be skipping includes:

  • Vague speculation about Windows Vista
  • Banter about the value and limitations of Google Docs
  • Hand-wringing about a recently granted overly broad patent for Blackboard Inc.

Wikipedia topics include:

  • Mills Kelly explaining why he is using Wikipedia as the “textbook” for his Western Civ course this semester
  • The similarity between constructing knowledge on Wikipedia and in scholarly venues, as revealed by those pages “hidden” behind the articles
  • How Citizendium’s name is crappy, and how in the end scholars are going to have to “roll up their sleeves and just get involved with the main Wikipedia” to set things straight
  • How scholars write for themselves and their peers too often, when they should be engaged with and teaching their students about the “enthusiast communities” like Wikipedia
  • What Wikipedia could do better to work with the professional community
  • “Specialized wikis for specialized topics in specialized communities” and the ways Wikipedia (and its pitfalls) may overshadow the wiki technologies
  • Friend-or-foe conclusion: “sometimes unreliable, sometimes stands you up, but good friend”

Overall, it’s pretty good.

(via T. Mills Kelly at edwired)

A day in the wiki life

I went to bed last night with the express intention of focusing all of today’s energy on the Johannes Kepler Wikipedia article. The article, which I rewrote almost entirely from scratch in mid-December and have been gradually improving since with the help of others, was nominated for Featured Article status by a passerby about a week ago. This slipped under my radar until just recently (as I was still recovering from qualifiers and working on my Wii Tennis skills to avoid mental exertion), but it garnered a lot of positive feedback… despite not being finished. I had planned to spend spring break working on it casually, for a final push toward featured quality (and completeness), but the Great Wikipedia Spirit had other plans.

Anyhow, I spent some time working out the language kinks after waking up in the mid-afternoon. Then I took a break and stopped by one of those interesting but rarely-visited (by me, at least) areas of Wikipedia, the Humanities Reference Desk. I gave my two cents on what a self-motivated student interested in art history should be reading, and the proverbial three hours of fascinated clicking later, I had found my way to an interesting topic that, horror of horrors, didn’t have a Wikipedia article! So now there is a short, unsourced article about the Hockney-Falco thesis, but still no discussion of the historiographical and philosophical legacy of Kepler. (Although, the Hockney-Falco thesis is only two degrees of Wikipedia from Kepler: the Hockney-Falco thesis is about optical aids to drawing like the (click-1) camera lucida, which was first described by (click-2) Johannes Kepler in Dioptrice… a fact I did not know before this evening, despite writing the Kepler article. Wiki works in mysterious ways.)

In other Wikipedia fun (as opposed to the Wikipedia work I ought to be doing to finish the Kepler article), I recently started a trend at Featured pictures candidates (where I learned, and am learning, everything I know about photography). Now a number of editors are demanding adequate extended captions for featured image candidates, and new nominations are starting to appear that take this into account. Hooray for context!

The FPC process on English Wikipedia is an interesting beast. As with Featured Articles, the standards for Featured Pictures have risen enormously over the last two years or so. And the way editors at FPC analyze pictures, and what they expect out of a good picture, is quite different from what a random viewer values in an image. The most dramatic example of this is the Picture of the Year on Wikimedia Commons. Commons has a separate featured picture process (which, unlike on Wikipedia, does not take encyclopedicity into account), and it recently held a well-publicized vote for the best picture of the 321 that achieved featured status in 2006. The winner (below) was not yet an FP on Wikipedia, and its subsequent nomination stood a chance of passing FPC only out of deference to its Picture of the Year status; in the end it did not pass, and it was widely criticized on both technical and aesthetic grounds.

[Image: Polarlicht 2, the winning Picture of the Year]

Wikipedia and Notability

Wikipedia:Notability (WP:N), one of the most cited, and most contentious, elements of Wikipedia editorial policy (it’s technically a “guideline”, which still means it’s a pretty firm part of the rules), looks to be on its way out. “Notability” is a concept that evolved from a sort of common sense “worth having in an encyclopedia” to a monster of Wikipedia jargon. Until recently, the stable version of the “primary notability criterion” was:

A topic is notable if it has been the subject of at least one substantial or multiple, non-trivial published works from sources that are reliable and independent of the subject and of each other.

In practice, the significance line for what makes an acceptable Wikipedia article has increasingly diverged from the official guideline, and the bar has dropped as the userbase has grown. Especially with entertainment, art, and other elements of popular culture, new editors and established ones who disagree with or don’t know about WP:N continue to write new articles on cultural ephemera and minutiae, and many others find value in such articles or at least don’t see any harm in including them (“wiki is not paper“). However, editors who are involved with policy and with deletion discussions (i.e., the ones who create and use notability policy) tend more toward deletionism than inclusionism (mergism is roughly my position).

Notability policy has been a cause of minor but growing irritation in the form of bad press, especially in recent weeks. Webcomics have attracted more attention than most areas of Wikipedia; a few complaints about deletions focused the attention of the Wikipedia community, which then resulted in stricter adherence to WP:N for minor webcomics (and hence more deletions), which fed into even more negative reactions within the webcomics community, etc., etc. The issue of notability, and with it the confusing and sometimes arbitrary conventions for deletion, has appeared in a few mainstream news pieces as well (such as Marshall Poe’s September 2006 article in The Atlantic Monthly); these have been less significant than the webcomics issue among editors, but have brought some of Wikipedia’s dirty laundry to a wider audience. Slate writer Timothy Noah, in a recent series of articles (some content of which was also on NPR and in the Washington Post), explored WP:N through the lens of watching the article on him go through the deletion process; ironically, the first article in the series, which was itself about him, provided a level of sourcing for the Wikipedia article to pass WP:N muster.

The Wikipedia mailing list was aflame for several days over the constellation of notability issues, and the discussion there, beginning with Phil Sandifer’s report on a dinner discussion with comics expert Scott McCloud, finally generated enough heat for a real attempt at melting down the old WP:N and forging Wikipedia’s inclusion criteria anew.

A straw poll resulted in a clear lack of consensus for WP:N, even in a slightly looser form: it looks like half or more of the community wants to rebuild WP:N from scratch or ditch notability altogether and simply rely on the policy that everything in Wikipedia must be attributable to a published (though not necessarily paper) source. At this point, it looks like the most probable conclusion of the WP:N debate will be the adoption of the more flexible substitute Wikipedia:Article inclusion and the reformation of the subject-specific notability guidelines to be a baseline for automatic inclusion (assuming someone actually writes the article) rather than a justification for exclusion.

Whether someone has a Wikipedia article is fast becoming a validation of their fame and importance, in the popular imagination (or at least among many of the non-Wikipedians I’ve interacted with over the last two years or so). So there is some level of implicit understanding that not just anybody gets an article. But the notability process, most editors are starting to agree, is (or, hopefully, was) badly broken. It’s heartening to see that Wikipedia is not so resistant to change that it cannot deal with its scaling problems, though it remains to be seen how effective the response will be. If the fate of WP:N works out well, maybe there’s hope for another topic of frequent debate: the admin promotion process and the ever-increasing standards for adminship, and the resulting increase in admin workload (and perhaps admin burnout).

Wikipedia, Original Research, and popular culture

Wikipedia has a (nominally) strict policy of “No Original Research” (NOR), which is for the most part both necessary and beneficial. After all, for most topics that warrant inclusion, there has been more than enough analysis in reliable published sources that original arguments are not necessary and simply degrade the quality and reliability of articles.

Occasionally, this policy leads to the deletion of valuable material, but mostly it keeps out the crap. (I recently instigated the deletion of one interesting and informative–and, I’m fairly certain, true–article that very clearly violated NOR. It left a bad taste in my mouth and reminded me why I don’t participate in any of the constant push to delete content that is clearly accurate but fails to meet the requirements for Notability and NOR.)

But there is (at least) one conspicuous area where banning original research gets in the way of creating high-quality content. Articles about popular culture fiction (for example, Battlestar Galactica episodes or little-known novels from “paraliterary” genres like science fiction) represent the borders of what can–and in many cases can’t–be analyzed using reliable published sources. Yet amateur literature or film analysis is often of high quality (especially when it can be contested, debated, and talked out among a group of intelligent fans), even comparable to academic criticism.

Of course, some kinds of original analysis are better than others. The Wikipedia Manual of Style guideline for “writing about fiction” (the only substantive guideline I’ve had a hand in developing and implementing as official) requires that articles on fictional content take an “out-of-universe” approach, looking at the work of fiction as a work of fiction rather than part of a “real” timeline. Out-of-universe analysis prevents some of the most egregious and useless original research, but the main reason why I supported (and continue to support) the “writing about fiction” guideline is that it just makes for better, more useful articles. Placing cultural products in cultural context is what makes for the most useful and interesting content. But the downside of this is that many articles, especially for fiction about which little or no criticism has been published, can offer no more than a plot summary.

See, for example, the many articles on Battlestar Galactica (re-imagining) episodes. Following the letter of Wikipedia’s rules would mean deleting all or nearly all of them; there are no reliable, independent sources to establish the notability (much less analysis) of every individual episode. And yet, these kinds of articles are of great interest to readers (especially since Wikipedia has so many, for a wide and growing range of tastes). However, current conventions lead to sterile plot-summary-only articles because original research about the allusions and symbolism, artistic and technical elements, dramatic development, acting, and resonance with contemporary cultural and political issues cannot be included. Even aspects that do have relevant sources are frequently excluded because articles are written more based on examples (e.g., other plot-summary-only articles) than on official guidelines. Most editors understand NOR, but not very many know about the “writing about fiction” guideline.

Case in point: Chief Tyrol’s speech as union leader in the final episode of season 2 comes almost word-for-word from Mario Savio’s famous Dec. 2, 1964 speech at Berkeley–“you’ve got to put your bodies upon the gears and upon the wheels”. Though the episode’s article makes no mention of it (as of now), the speech and its source are discussed in a podcast from the director–according to the Mario Savio article. But even if there were no official source, astute fans notice things like this. They notice when one piece of fictional material alludes to another one. They notice when an episode’s plot parallels what’s been in the news lately. They notice obvious hints and foreshadowing conveyed through camera work and music. These things aren’t part of a plot summary per se (and are decidedly out-of-universe, since they are related more to the viewing experience than to the internal “causes” of plot events), but they are often straightforward. Such analysis certainly involves a high level of originality (by both Wikipedia and conventional definitions), but when done judiciously it can be close enough to “right” (as far as there is a right interpretation of a work of fiction) to secure the consensus of nearly anyone who views or reads the work.

So that’s my argument. Wikipedia original research isn’t all bad, and Wikipedia’s rules about notability and NOR should be a little more lenient when it comes to cultural artifacts (music, movies, books, TV, etc.).

Wikipedia as a source

Yale Daily News ran a story on Wednesday, “Profs question students’ Wikipedia dependency“. I guess it’s a disturbing sign that I thought angry and vindictive thoughts about the student, freshman John Behan, who created a number of fake articles. I used to think that kind of thing was funny, and I feel like I should still (in principle, at least; Behan’s work didn’t even rise to the level of BJAODN). The article focuses on one fake in particular, “emysphilia” (turtle fetish). But as it turns out, emysphilia (Behan’s “most successful” article) was deleted rather quickly; its only traction came from syndication on Answers.com. It’s unclear whether anyone besides people with direct knowledge of the hoax and the Wikipedians voting to delete it even read the article; it’s not something one would just run into on Answers.com without searching for the non-existent term.

The YDN article goes on with some quotes from professors about how Wikipedia is not an acceptable academic source. The headline for the page 6 continuation is “Inaccuracies make Wikipedia an unreliable academic source”, which is a pretty mediocre summary of what the faculty actually say about the subject. One prof makes reference to the “rigorous editing standards of hard copy sources”, compared to an anecdote about a Wikipedia article (with accurate, referenced information) giving the wrong first name (James Boswell instead of John Boswell) for a source. Unfortunately, the professor failed to take any action; I just tracked down the article I presume he was referring to, and it took 5 seconds to correct. (I have my own share of anecdotes about contradictions between a hard copy scholarly source and WP where it’s the hard copy that is wrong, but I digress.)

Like most stories about Wikipedia as an academic source, the Yale story misses the point. Another professor hits on the legitimate basis for excluding Wikipedia as an academic source: it’s an encyclopedia. 5 years from now, Wikipedia is going to be more accurate than any general print encyclopedia (at least on topics that traditional encyclopedias actually cover). And for random contradictions between a book source and a referenced Wikipedia article, Wikipedia will be the correct one more often than not. But it still won’t be an acceptable academic source, except perhaps as a place to point readers for peripheral background information. Because it will still be a tertiary source.

This issue has been in the news a lot since the Middlebury College Wikipedia ban and the Chronicle of Higher Education story on it.

Here’s a similar blog post about the issue, from a clear-headed historian.

What will Wikipedia be like 5 years from now?

With the continued growth of Wikipedia and its sister projects, it’s worth asking what the Wikimedia ecosystem will look like down the road. Here’s my vision of what it will and/or should be like.

Necessary functional improvements:

  1. Search. Wikipedia’s current internal search program is horrible. It is bizarrely sensitive to case, but lacks all the features we’ve come to expect from search. Quotation marks mean nothing. Results are often woefully incomplete (I often have to use a site-specific Google search to find what I’m looking for on Wikipedia). The interface is clunky, especially with all the check boxes at the bottom for different namespaces (and the fact that checking/unchecking only registers if you use the right search box, of the three available). But when search finally gets done right on Wikipedia, it will be a great thing; we’ll need a new verb to complement “to google” (“look it up on Wikipedia” just doesn’t have the same ring). Wikipedia search will be cross-project, with redirects and related entries (Wiktionary and Wikisaurus, Wikimedia Commons, articles in other languages) nested together. It should have some of the elements of Google’s search algorithm; the readable text of piped links should affect results, and results should be ordered by a sort of internal PageRank with the option of reordering them by size, date of last edit, etc.
  2. Stable versions and Approved versions. It’s been in the works for a while now, but there is still no system for managing stable articles where acceptable edits are few and far between, nor is there a good way to flag vetted versions (e.g., a version approved as a Featured Article). Semi-protection is a mediocre substitute for version control, while proposals to implement similar features manually have been too complicated for the community to accept. For stable, largely complete articles, new edits should not show up until they have been screened by one or a few other editors. And for ultra-stable articles, there should be an integrated system for revision and draft work while the consensus version remains viewable to readers.
  3. Audio/Visual accessibility. Because the major formats are all patented and could potentially have significant use limitations placed on them, Wikipedia uses Ogg files with free and open encoding to store and serve audio and video content. For the most part, users must go through a bit of trouble (i.e., downloading and installing codecs from off-site), although audio content now has rudimentary in-browser support. Obviously, the ideal would be integrated audio-video content without leaving the article; YouTube and Google Video have done this fairly well, though with proprietary technology (Adobe Flash with patented codecs). Video (both historical and user-created) will undoubtedly become a much bigger part of Wikipedia and Commons in the future.
  4. Unified login. Obviously, it would be convenient to have a single account for all the Wikimedia projects. It’s been in the works for a while now, but it’s more of a convenience for editors (and a correction of a design flaw) than a major improvement.
  5. Metadata handling. The current system of templates, categories, and other article metadata (beyond basic linking and formatting markup) is unintuitive, inconsistent, awkward, and intimidating to new editors, and the categories are difficult to navigate and far less useful than they could be. Something like a metadata namespace, for infoboxes, categories, Featured Article stars, interwiki links and the like, would be very beneficial.
  6. Categories. Related to the metadata issue, the category system needs to be completely overhauled. In the current system, large categories must be divided and subdivided into more specific ones to remain useful, but editors (new and established) often apply overly general categories to new articles, so broad categories like “American people” or “Songs” must be constantly monitored so they do not grow out of control. For a given song, the subdivision branches into a wilderness of partially-overlapping subcategories like “songs by year”, “songs by artist”, “songs by lyricist”, “songs by nationality”, and “songs by genre”, along with a host of other possible orthogonal categories like “songs with sexual themes” and “cat songs”. Ideally, categorization would be both simpler and more flexible. Assigning broad categories (“songs”, “folk music”, “1963”, “protest”, “Bob Dylan”), with some semantic information (“is”, “from”, “related to”, “performed by”), should automatically create appropriate subcategories (Blowin’ in the Wind is a song and is folk music, from 1963, related to protest, performed by Bob Dylan).
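A hypothetical sketch of how such semantic categorization might work, with subcategories derived automatically as intersections of broad facets (the names and relations are illustrative, not any real proposal’s syntax):

```python
from itertools import combinations

# Hypothetical sketch: editors assign broad facets with a semantic relation,
# and specific subcategories are derived automatically as pairwise
# intersections. Names and relations are illustrative.
def derived_categories(facets):
    """facets: list of (relation, value) pairs describing one article."""
    derived = set()
    for (_, a), (rel_b, b) in combinations(facets, 2):
        # A pair like ("Song", "from: 1963") stands in for a subcategory
        # such as "1963 songs".
        derived.add(f"{a} ({rel_b}: {b})")
    return derived

blowin = [("is", "Song"), ("is", "Folk music"), ("from", "1963"),
          ("related to", "Protest"), ("performed by", "Bob Dylan")]
for cat in sorted(derived_categories(blowin)):
    print(cat)
```

The appeal of this design is that editors only ever assign a handful of broad, obvious facets, while the narrow intersections (which today must each be created and patrolled by hand) fall out automatically.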

Hoped-for functional improvements:

  1. Verifiability assessment. Eventually, Wikipedia will need a way to sort articles according to verifiability and sourcing (as a proxy for reliability, the direct measurement of which will always run into the problem of self-reference and the authority of editors). Readers should be able to tell immediately (before even beginning to read) whether an article is based on peer-reviewed articles and scholarly books, mainstream media sources, local or niche-oriented professional journalism, blogs and internet sites, primary sources, etc. Potentially, this could solve some of the perennial contentious issues about notability and the borderline of original research. The volume of material on minor topics (especially related to popular culture, current events, and minor/local institutions) is growing much faster than it can be strictly vetted (and deleted when appropriate) according to the current notability and verifiability guidelines, and there is a lot of material that is de facto acceptable, even if it doesn’t strictly comply with the current rules. And a lot of this is good, accurate material that readers and editors find useful. If material with few or potentially unreliable sources is clearly flagged as such, there will be less incentive to wage futile wars of deletionism on what is undeniably valuable. In other words, a compromise between elitist and populist visions of what Wikipedia should be.
  2. Discussion forums. I envision a discussion board for each article, separate from the talk page, where users (editors and readers alike) can discuss the subject of the article without the constraint of trying to improve the article. This departs somewhat from the core mission of Wikipedia, but I think it would be beneficial in several ways. First, it would direct most of the irrelevant commentary away from talk pages, making collaboration among editors run more smoothly. Second, it could host ads for the support of the Wikimedia Foundation, without compromising the non-commercial nature of Wikipedia itself. And third, it would enhance the usefulness of Wikipedia at the borders of verifiability; readers who want more than the article has to offer can turn to the other forum participants for the speculation, rumor, and strained interpretation they seek.
  3. Stat tracking. Mainly for performance reasons, Wikipedia does very little in the way of internal stat tracking. But in the long run, it would be useful, both for identifying popular articles and for studying Wikipedia itself. In addition to hit counters for every article, the site should track (without retaining any potentially identifying information) visit paths as readers surf from one article to another. And for those with editcountitis, some sophisticated automatic contribution analysis (like what knowledgeable editors can already do through JavaScript hacks) would be nice: total content added and deleted, histograms of edit size and frequency, and so on.
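As a rough illustration of the kind of contribution analysis I have in mind, here is a hypothetical sketch (the real data would come from per-editor revision histories, which this ignores; the bucketing scheme is just an assumption for the example):

```python
from collections import Counter

def contribution_stats(deltas):
    """Summarize one editor's revisions, given a list of byte deltas
    (positive = content added, negative = content removed): totals
    plus a rough order-of-magnitude histogram of edit sizes."""
    added = sum(d for d in deltas if d > 0)
    deleted = -sum(d for d in deltas if d < 0)
    histogram = Counter()
    for d in deltas:
        size, bucket = abs(d), 0
        while 10 ** bucket <= size:  # smallest power of 10 above the size
            bucket += 1
        histogram[f"<10^{bucket} bytes"] += 1
    return {"added": added, "deleted": deleted,
            "histogram": dict(histogram)}
```

For example, `contribution_stats([5, -3, 120, 4000, -50])` reports 4125 bytes added, 53 deleted, and a histogram showing two tiny edits, one medium one, and so on, which is exactly the editcountitis fodder described above.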

So what will the future Wikipedia be like in a broader sense? Its cultural authority and perceived reliability will continue to increase, but surely both will begin to level off within the next few years. Traditional non-specialist encyclopedias will simply be irrelevant, and probably bankrupt. Given the degree of brand success Wikipedia has already achieved, the chances for a successful fork are quickly approaching nil. Citizendium seemed like it had an outside chance at becoming a viable competitor, but it has been managed poorly thus far and I think the window of opportunity is closing rapidly. Citizendium membership is turning out to be an odd mix of people who don’t edit Wikipedia either because it doesn’t respect (their) authority enough or because it respects the authority (of published sources) too much; thus, many of the same issues that drive experts away from Wikipedia will show up in Citizendium if it grows large enough to matter. If it retains the GFDL license, Citizendium may have a place as a minor satellite of Wikipedia from which content is occasionally imported.

Wikipedia will also seriously eat away at the specialist encyclopedia market. I expect the viability of specialist encyclopedias will vary by field, according to which experts embrace and contribute to Wikipedia. In general, scientists (especially in the “harder” fields) and mathematicians have shown a great deal more enthusiasm than humanists, with social scientists somewhere between. (I find this ironic, because humanities fields have so much more to gain from an integrated and cross-linked ecology of knowledge; despite constant flux and discipline genesis at the borders and the current rhetorical vogue of “interdisciplinary” research, science topics are relatively self-contained compared to humanities topics.) It’s an open question whether the academic culture of the humanities will get on board in a significant way. Unfortunately, I think the Ivory Tower mentality and its paradoxical counterpart of academic careerism (especially in the current tight job market) are too entrenched; I expect participation just to continue with incremental gains through the recruitment of individual humanist Wikipedians.

As more and more people look to Wikipedia as their first (and often only) source for arbitrary information, Wikipedia will begin to seriously encroach on the market share of the search companies. It’s entirely possible that one or more of the major portals (most likely Ask.com and Yahoo!) will replace Wikipedia search results with mirrored content with added advertising. And if implemented well, some users might even prefer this; after all, ad results are sometimes just what you were looking for. (Similarly, Wikipedia itself might implement optional ads, which would only appear if explicitly enabled by users.) The ecosystem of value-added and exploitative businesses making a living off of Wikipedia will expand dramatically, which is bound to create plenty of unforeseen issues and controversies. But I don’t expect any major crises in that respect, since Wikipedia has always been built with the (legal and practical) potential for commercial exploitation.

The bigger problem will be professional PR and information management. In the next year or two, Wikipedia will have to create a system to deal with the complaints and requests of powerful economic and political entities. The recent Microsoft brouhaha over paid editing is the tip of the iceberg. It will be a challenge to create a system that is acceptable to the community but also acceptable enough to outsiders that they will use it instead of guerrilla editing. However, 5 years from now I think there will be some kind of stable equilibrium through a combination of an official system for dealing with accusations of bias from article subjects and vigilant groups of Wikipedians on the lookout for whitewashing.

In addition to encyclopedias, search, and PR, a number of other industries are going to feel pressure from the free content behemoth of Wikimedia projects. Wikimedia Commons will cut drastically into the market for stock photography, although Getty Images and Corbis will still have control of plenty of images that can’t be reproduced, and free media from limited-access venues (like celebrity functions) will still be hard to come by. (Wikipedia has tried, unsuccessfully thus far, to get Wikipedian photographers into red carpet events and award shows.) The glut of easily available images is already prompting stock photography companies to go the MPAA/RIAA route of suing liberally over copyright.

Politically, Wikipedia will do a lot to foster the free culture movement and especially to improve the atmosphere for copyright reform. It’s probably too optimistic to expect a reduction of copyright terms within the next five years, but at least any further extension (beyond the atrocious Copyright Term Extension Act of 1998) should be unlikely. Unfortunately, there’s no good way to show people how lame 95-year copyright terms are until the great content from the 1920s, 30s, 40s, and 50s starts to come into the public domain. (That stuff is our cultural heritage and ought to be in the public domain already; I think something like 50 years, or the life of the author plus 20, is more than enough protection to serve the intended purpose of copyright.)

That’s all…the crystal’s gone dark.

Closing in on qualifiers

My first semester as a teaching assistant is done, and in about two months I’ll be taking qualifiers. I was pretty pleased with the way the course (Ole Molvig’s “History of Modern Science in Society”) went. I was disappointed with the low amount of reading the class was willing to do; discussions were perpetually hamstrung because a large portion of the class didn’t do the reading in any given week. However, leading discussions was fun and I think I got a lot better at it as the semester went by. We used the Wikipedia assignment that I designed, which took a lot more work (on my part and the students’) than the original assignment. You can see the results here:
http://en.wikipedia.org/wiki/User:Ragesoss/HIST_236

(On a related note, I wrote an article for the Wikipedia Signpost on Wikipedia assignments.)

As it turns out, I’m a hard-ass grader. (Technically, grades were Ole’s responsibility while grading was mine, but still.) History of Science, History of Medicine has the unfortunate reputation of being an easy major at Yale. So, especially considering the amount of work I required of them, many students were frustrated with the low grades (a B+ average, which is considered a bad grade these days). Now I have the reputation of a TA to avoid. But it’s hard to feel bad about it; grade inflation doesn’t do anyone any favors.

This semester will probably be different. I’m TAing for Susan Lederer’s “A History of American Bodies”, which has 7 TAs and possibly up to 300 students. That means the grading will be fairly uniform across sections and the overall distribution will probably be higher.

Matt Gunterman is back at Yale, and he’s the head TA, which means he has to run point for all the class logistics. Ha ha. Sucks to be him. He has a blog post about the first lecture. I’m glad he’s back, and I’m also TAing with my fellow 3rd year Brendan; it looks like we’ll do our orals on the same day, and we have two fields more-or-less in common, so I’m looking forward to having some orals prep discussions with him. It’s nice to have someone to bitch about orals with.

For the first time in my graduate career, I’m not going to audit or sit in on any extra classes this semester. In theory, that means I have more time for orals reading. But in practice, I can’t read for more than about four hours a day; after that, nothing sinks in and I lose all will to keep at it. Some people are capable of more sustained reading, but I think most graduate students are not (unless they’re popping Ritalin); qualifier preparation is like a hazing ritual. (At Yale, qualifiers are actually not so harrowing an experience, but they still have enough of the traditional elements to cause plenty of stress and induce plenty of depression.)

Speaking of Ritalin, I’ve been trying to convince Faith to score me some free samples from the pharma reps, but she won’t. (They give out whatever prescriptions medical students are on that are still under patent protection; unfortunately, Faith’s meds just went generic but aren’t yet being produced by very many sources, so they’re hardly any cheaper but no longer free.) Since becoming a coffee and tea drinker, I’ve become much more attuned to the effects of things like carbohydrates, salt, and caffeine. I want to branch out to self-testing of more psychoactive substances, but I haven’t gotten around to it. Oh well.