What will Wikipedia be like 5 years from now?

With the continued growth of Wikipedia and its sister projects, it’s worth asking what the Wikimedia ecosystem will look like down the road. Here’s my vision of what it will and/or should be like.

Necessary functional improvements:

  1. Search. Wikipedia’s current internal search program is horrible. It is bizarrely sensitive to case, but lacks all the features we’ve come to expect from search. Quotation marks mean nothing. Results are often woefully incomplete (I often have to use a site-specific Google search to find what I’m looking for on Wikipedia). The interface is clunky, especially with all the check boxes at the bottom for different namespaces (and the fact that checking/unchecking only registers if you use the right search box, of the three available). But when search finally gets done right on Wikipedia, it will be a great thing; we’ll need a new verb to complement “to google” (“look it up on Wikipedia” just doesn’t have the same ring). Wikipedia search will be cross-project, with redirects and related entries (Wiktionary and Wikisaurus, Wikimedia Commons, articles in other languages) nested together. It should have some of the elements of Google’s search algorithm; the readable text of piped links should affect results, and results should be ordered by a sort of internal PageRank with the option of reordering them by size, date of last edit, etc.
  2. Stable versions and Approved versions. It’s been in the works for a while now, but there is still no system for managing stable articles where acceptable edits are few and far between, nor is there a good way to flag vetted versions (e.g., a version approved as a Featured Article). Semi-protection is a mediocre substitute for version control, while proposals to implement similar features manually have been too complicated for the community to accept. For stable, largely complete articles, new edits should not show up until they have been screened by one or a few other editors. And for ultra-stable articles, there should be an integrated system for revision and draft work while the consensus version remains viewable to readers.
  3. Audio/Visual accessibility. Because the major formats are all patented and could potentially have significant use limitations placed on them, Wikipedia uses Ogg files with free and open encoding to store and serve audio and video content. For the most part, users must go through a bit of trouble (i.e., downloading and installing codecs from off-site), although audio content now has rudimentary in-browser support. Obviously, the ideal would be integrated audio-video content without leaving the article; YouTube and Google Video have done this fairly well, though with proprietary technology (Adobe Flash with patented codecs). Video (both historical and user-created) will undoubtedly become a much bigger part of Wikipedia and Commons in the future.
  4. Unified login. Obviously, it would be convenient to have a single account for all the Wikimedia projects. It’s been in the works for a while now, but it’s more of a convenience for editors (and a correction of a design flaw) than a major improvement.
  5. Metadata handling. The current system of templates, categories, and other article metadata (beyond basic linking and formatting markup) is unintuitive, inconsistent, awkward, and intimidating to new editors, and the categories are difficult to navigate and far less useful than they could be. Something like a metadata namespace, for infoboxes, categories, Featured Article stars, interwiki links and the like, would be very beneficial.
  6. Categories. Related to the metadata issue, the category system needs to be completely overhauled. In the current system, large categories must be divided and subdivided into more specific ones to remain useful, yet editors (new and established) often apply overly general categories to new articles. Broad categories like “American people” or “Songs” must therefore be constantly monitored so they do not grow out of control. For example, for a given song, the subdivision branches into a wilderness of partially-overlapping subcategories like “songs by year”, “songs by artist”, “songs by lyricist”, “songs by nationality”, and “songs by genre”, along with a host of other possible orthogonal categories like “songs with sexual themes” and “cat songs”. Ideally, categorization would be both simpler and more flexible. Assigning broad categories (“songs”, “folk music”, “1963”, “protest”, “Bob Dylan”), with some semantic information (“is”, “from”, “related to”, “performed by”) should automatically create appropriate subcategories (Blowin’ in the Wind is a song and is folk music, from 1963, related to protest, performed by Bob Dylan).
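
To make the category idea in item 6 a little more concrete, here is a minimal sketch in Python of how broad tags plus a relation might yield subcategories on demand. Everything in it is an assumption for illustration: the data model, the function names (`build_index`, `query`), and the second sample song are hypothetical and have nothing to do with how MediaWiki actually stores categories.

```python
from collections import defaultdict

# Hypothetical sample data: each article carries (relation, tag) pairs
# instead of membership in hand-built subcategories.
ARTICLES = {
    "Blowin' in the Wind": [
        ("is", "song"),
        ("is", "folk music"),
        ("from", "1963"),
        ("related to", "protest"),
        ("performed by", "Bob Dylan"),
    ],
    "The Times They Are a-Changin'": [
        ("is", "song"),
        ("from", "1964"),
        ("performed by", "Bob Dylan"),
    ],
}


def build_index(articles):
    """Map each (relation, tag) pair to the set of article titles carrying it."""
    index = defaultdict(set)
    for title, pairs in articles.items():
        for pair in pairs:
            index[pair].add(title)
    return index


def query(index, *pairs):
    """Return the sorted titles that carry every requested (relation, tag) pair."""
    if not pairs:
        return []
    matches = [index.get(pair, set()) for pair in pairs]
    return sorted(set.intersection(*matches))


index = build_index(ARTICLES)
# "Songs performed by Bob Dylan" is computed as an intersection,
# not maintained by hand as a subcategory.
dylan_songs = query(index, ("is", "song"), ("performed by", "Bob Dylan"))
songs_from_1963 = query(index, ("is", "song"), ("from", "1963"))
```

The point of the design is that nobody has to create or police “songs by artist”; any combination of broad tags can be asked for when a reader wants it.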

Hoped-for functional improvements:

  1. Verifiability assessment. Eventually, Wikipedia will need a way to sort articles according to verifiability and sourcing (as a proxy for reliability, the direct measurement of which will always run into the problem of self-reference and the authority of editors). Readers should be able to tell immediately (before even beginning to read) whether an article is based on peer-reviewed articles and scholarly books, mainstream media sources, local or niche-oriented professional journalism, blogs and internet sites, primary sources, etc. Potentially, this could solve some of the perennial contentious issues about notability and the borderline of original research. The volume of material on minor topics (especially related to popular culture, current events, and minor/local institutions) is growing much faster than it can be strictly vetted (and deleted when appropriate) according to the current notability and verifiability guidelines, and there is a lot of material that is de facto acceptable, even if it doesn’t strictly comply with the current rules. And a lot of this is good, accurate material that readers and editors find useful. If material with few or potentially unreliable sources is clearly flagged as such, there will be less incentive to wage futile wars of deletionism on what is undeniably valuable. In other words, a compromise between elitist and populist visions of what Wikipedia should be.
  2. Discussion forums. I envision a discussion board for each article, separate from the talk page, where users (editors and readers alike) can discuss the subject of the article without the concern of trying to improve the article. This departs somewhat from the core mission of Wikipedia, but I think it would be beneficial in several ways. First, it would direct most of the irrelevant commentary away from talk pages, making collaboration among editors run more smoothly. Second, it could host ads for the support of the Wikimedia Foundation, without compromising the non-commercial nature of Wikipedia itself. And third, it would enhance the usefulness of Wikipedia at the borders of verifiability; readers who want more than the article has to offer can turn to the other forum participants for the speculation, rumor, and strained interpretation they seek.
  3. Stat tracking. Mainly for performance reasons, Wikipedia does very little in the way of internal stat tracking. But in the long run, it would be useful, both for identifying popular articles and for studying Wikipedia itself. In addition to hit counters for every article, the site should track (without retaining any potential identifying information) visit paths as readers surf from one article to another. And for those with editcountitis, some automatic sophisticated contribution analysis (like what can be done through JavaScript hacks by knowledgeable editors now) would be nice: things like total content added, deleted, histograms of edit size and frequency, etc.
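
For the stat tracking in item 3, a rough Python sketch of the two kinds of numbers I have in mind might look like the following. It assumes, purely for illustration, that each edit is available as a signed byte change and that visit paths are logged as anonymous lists of article titles; the function names and the bucket size are hypothetical.

```python
from collections import Counter


def contribution_summary(deltas, bucket=100):
    """Summarize an editor's edits, given signed size changes in bytes."""
    added = sum(d for d in deltas if d > 0)
    deleted = -sum(d for d in deltas if d < 0)
    # Histogram of edit sizes, grouped into buckets of `bucket` bytes
    # and keyed by the lower bound of each bucket.
    histogram = Counter((abs(d) // bucket) * bucket for d in deltas)
    return {
        "added": added,
        "deleted": deleted,
        "histogram": dict(sorted(histogram.items())),
    }


def transition_counts(paths):
    """Count article-to-article hops across anonymous visit paths."""
    counts = Counter()
    for path in paths:
        # Pair each article with the one visited right after it.
        counts.update(zip(path, path[1:]))
    return counts


summary = contribution_summary([120, -30, 5, 250, -110])
hops = transition_counts([["A", "B", "C"], ["A", "B"]])
```

Only aggregate counts are kept here; nothing in the sketch records who followed which path.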

So what will the future Wikipedia be like in a broader sense? Its cultural authority and perceived reliability will continue to increase, but surely both will begin to level off within the next few years. Traditional non-specialist encyclopedias will simply be irrelevant, and probably bankrupt. Given the degree of brand success Wikipedia has already achieved, the chances for a successful fork are quickly approaching nil. Citizendium seemed like it had an outside chance at becoming a viable competitor, but it has been managed poorly thus far and I think the window of opportunity is closing rapidly. Citizendium membership is turning out to be an odd mix of people who don’t edit Wikipedia because it doesn’t respect (their) authority enough and people who don’t edit it because it respects authority (of published sources) too much; thus, many of the same issues that drive experts away from Wikipedia will show up in Citizendium if it grows large enough to matter. If it retains the GFDL, Citizendium may have a place as a minor satellite of Wikipedia from which content is occasionally imported.

Wikipedia will also seriously eat away at the specialist encyclopedia market. I expect the viability of specialist encyclopedias will vary by field, according to which experts embrace and contribute to Wikipedia. In general, scientists (especially in the “harder” fields) and mathematicians have shown a great deal more enthusiasm than humanists, with social scientists somewhere between. (I find this ironic, because humanities fields have so much more to gain from an integrated and cross-linked ecology of knowledge; despite constant flux and discipline genesis at the borders and the current rhetorical vogue of “interdisciplinary” research, science topics are relatively self-contained compared to humanities topics.) It’s an open question whether the academic culture of the humanities will get on board in a significant way. Unfortunately, I think the Ivory Tower mentality and its paradoxical counterpart of academic careerism (especially in the current tight job market) are too entrenched; I expect participation just to continue with incremental gains through the recruitment of individual humanist Wikipedians.

As more and more people look to Wikipedia as their first (and often only) source for arbitrary information, Wikipedia will begin to seriously encroach on the market share of the search companies. It’s entirely possible that one or more of the major portals (most likely Ask.com and Yahoo!) will replace Wikipedia search results with mirrored content carrying added advertising. And if implemented well, some users might even prefer this; after all, ad results are sometimes just what you were looking for. (Similarly, Wikipedia itself might implement optional ads, which would only appear if explicitly enabled by users.) The ecosystem of value-added and exploitive businesses making a living off of Wikipedia will expand dramatically, which is bound to create plenty of unforeseen issues and controversies. But I don’t expect any major crises in that respect, since Wikipedia has always been built with the (legal and practical) potential for commercial exploitation.

The bigger problem will be professional PR and information management. In the next year or two, Wikipedia will have to create a system to deal with the complaints and requests of powerful economic and political entities. The recent Microsoft brouhaha over paid editing is the tip of the iceberg. It will be a challenge to create a system that is acceptable to the community but also acceptable enough to outsiders that they will use it instead of guerrilla editing. However, 5 years from now I think there will be some kind of stable equilibrium through a combination of an official system for dealing with accusations of bias from article subjects and vigilant groups of Wikipedians on the lookout for whitewashing.

In addition to encyclopedias, search, and PR, a number of other industries are going to feel pressure from the free content behemoth of Wikimedia projects. Wikimedia Commons will cut drastically into the market for stock photography, although Getty Images and Corbis will still have control of plenty of images that can’t be reproduced, and free media from limited-access venues (like celebrity functions) will still be hard to come by. (Wikipedia has tried, unsuccessfully thus far, to get Wikipedian photographers into red carpet events and award shows.) The glut of easily available images is already prompting stock photography companies to go the MPAA/RIAA route of suing liberally over copyright.

Politically, Wikipedia will do a lot to foster the free culture movement and especially to improve the atmosphere for copyright reform. It’s probably too optimistic to expect a reduction of copyright terms within the next five years, but at least any further extension (beyond the atrocious Copyright Term Extension Act of 1998) should be unlikely. Unfortunately, there’s no good way to show people how lame 95-year copyright terms are until the great content from the 1920s, 30s, 40s, and 50s starts to come into the public domain. (That stuff is our cultural heritage and ought to be in the public domain already; I think something like 50 years or the life of the author plus 20 is more than enough protection to serve the intended purpose of copyright.)

That’s all…the crystal’s gone dark.