Email: Saeid Mousavi
My CS 590 page.
Project or Thesis: How Wikipedia works
What I am planning to accomplish
Wikipedia has been known as the de facto online encyclopedia for the last couple of years. The beauty of this work is manifold, as the creators of the material are not professional staff but ordinary people who intend to share their knowledge and experience with the world. My intention in taking this subject for my Master's thesis is to find out the secret of this business model's success from different perspectives: psychosocial, software/hardware, and administration. I hope this work can single out and emphasize the important factors behind the successful progression of this business model, and that it can be used for similar works in progress.
(Thesis) Issues to be investigated
Administration: How is the management of the material/people/hardware and software/finance, … done?
Psychosocial: What factors inspired and led people to chip in and continue to support Wikipedia?
Hardware: What kind of hardware supports the backbone of the operation?
Network: The topology/hardware of the network.
Software: How do the software components come together to create an (almost) user-friendly environment for the users?
Finance: How does Wikipedia survive/prosper?
Community and users
What classes of users are there?
How do users get promoted or designated to higher administrative levels?
How many users are there in each category, and what do they contribute?
What are the privileges/rights assigned to each user level?
How is a policy approved/adopted and later on maintained?
Will a user in the higher management levels ever be a part of the Wikimedia Foundation? How?
Is the promotion of users automatic, or do they need to volunteer themselves for it?
What kinds of automatic edits/updates/maintenance do bots perform?
Since the different-language wikis seem so different, are the policies and procedures governing them also different?
In what direction is Wikipedia moving? (Future trends in administration, goals, community, ...)
Where do the donations to Wikipedia go? Are they spent on HW/SW, or might they trickle down to the management?
(Thesis) Annotated table of contents
Why this is academically interesting
This work can be a blueprint of success for some categories of companies and organizations to follow. Also, if done thoroughly, this work could be commercialized and distributed among interested parties.
Academia: Can help with initiation of similar academic works.
Computer professionals: Provides a model of a very efficient, successful system.
Sociologists: To study the basic driving force inspiring and empowering people who are involved in this project.
Ordinary people: Those who are already involved will get a better picture of what is going on, and those who are not can see the difference they could make and, if they want, take part.
Why this is MS-level work
If done properly, this research work will create a roadmap of success for similar enterprises.
There are many books on how to operate/edit inside the Wiki environment, but I haven’t come across any that look at the topic from the perspective put forth in this project.
There are, however, many online documents that cover bits and pieces here and there.
Researching books/documents/multimedia resources on Amazon for Wikipedia:
This is an email-based interview between four of the Wiki admins.
Here is the gist of the interviews:
As for roles in Wikipedia, there are readers, editors, administrators, recent changes patrollers (who revert vandalism), policy makers, subject area experts (WikiProjects offer a place for people who want to focus on one topic to have a focused community within the larger Wikipedia community), content maintainers, software developers, system operators and many more. There are also all sorts of informal groups within the project. For example, the welcoming committee is a self-selected group of people who say they will help with welcoming new users.
Most people start out as editors or uploaders. The majority stays in that role. After that, though, many different roles are possible. Maybe the most prominent one is the administrator role.
Wikipedia is approaching the quality problem from two sides: from the bottom and from the top. From the bottom, the deletion process is simply used to weed out poor articles. From the top, we encourage high-quality articles by giving an author’s work extra recognition as ‘excellent articles’. Such recognition by the community, as well as visibility to the general Wikipedia readership, gives authors immaterial rewards for their work.
Some challenges that Wikipedians face:
Legal threats: In particular, libelous edits and copyright infringements. In general, a legal conflict can harm a project, even if in the end no real conflict before a court arises between the rights holder and the Wikimedia Foundation. The problem is that being in limbo might prevent further development of content and might be a source of human conflict on the project. But usually, it is nothing that can’t be fixed.
Keeping integrity as a project: Some Wikipedias, like the English or German ones, have many editors who are also involved with global activities like the Commons, Meta, or Foundation wikis. On other Wikipedias, far fewer such volunteers exist, and poor communication between the local level and the global level might result. This can be a severe problem for the local projects.
Lack of involvement: A lot of people are needed to keep a project alive! For smaller wikis, a dearth of contributors happens easily. Poor involvement of editors, or even inactivity, challenges the sustainability of the project. Therefore we need to go back to the first and foremost challenge: keeping the openness of the wikis that makes it easy for people to join.
Credibility: Young Wikipedias need to build a certain level of credibility. If they fail to establish their credibility, or take too long to do so, the project might falter.
More than a third of American adult internet users (36%) consult the citizen-generated online encyclopedia Wikipedia, according to a new nationwide survey by the Pew Internet & American Life Project.
In addition, young adults and broadband users have been among the earlier adopters of Wikipedia. While 44% of those ages 18-29 use Wikipedia to look for information, just 29% of users age 50 and older consult the site.
Hitwise data suggest several reasons for the popularity of Wikipedia:
First, there is the sheer amount of material on the site, covering everything from ancient history to current events and popular culture.
Second, Wikipedia's dramatic growth is strongly correlated with Americans’ affection for search engines. Wikipedia’s article structure helps explain this. Many of the pieces in the encyclopedia are full of links to other Wikipedia articles and other material on the Web. One of the prime factors in Google's search results algorithm is the number of links connected to a given webpage.
The paper aims to be the first detailed study of user behavior on Wikipedia, and of how users of the system create and maintain the information appearing on its pages. As the authors tell us:
"This paper tries to model the behavior of users contributing to Wikipedia (hereafter called contributors) as a way of understanding its evolution over time. It presents what we believe to be the first extensive effort in that direction. This understanding will allow us, in the future, to create a model for Wikipedia evolution that will be able to show its trends and possible effects of changes in the way it is managed."
A few of the findings about user interactions, from the paper:
1. The number of articles on Wikipedia has been growing at an exponential rate since it started, but the number of articles from each contributor has decreased over time.
2. Most users tend to revise existing articles rather than creating new ones.
3. Most users tend to focus their attention on a single main article.
There are also some interesting numbers coming out of the study (which uses Wikipedia data from October, 2006). Here are a few of those:
1) The number of links in Wikipedia is 58.9 million, an average of 45 links per article.
2) The number of broken links is 6.5 million. (I wonder if that includes “citation needed” links that appear in some articles.)
3) The number of internal redirects is 6.8 million.
4) The number of revisions listed in the study’s data is 48.2 million.
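As a rough sanity check on these numbers, the following minimal sketch derives a few ratios from the figures quoted above. The derived values (an implied article count of roughly 1.3 million, a broken-link share of about 11%, and about 37 revisions per article) are my own arithmetic, not figures reported by the study, and the function and constant names are hypothetical. The study's own definition of "article" may differ from the one this division implies.

```python
# Figures quoted above from the study (Wikipedia data, October 2006).
TOTAL_LINKS = 58.9e6        # total links
LINKS_PER_ARTICLE = 45      # average links per article
BROKEN_LINKS = 6.5e6        # broken links
REVISIONS = 48.2e6          # revisions in the study's data


def derived_figures():
    """Return ratios derived from the quoted figures (assumption: simple division)."""
    # Implied article count: total links divided by the average links per article.
    articles = TOTAL_LINKS / LINKS_PER_ARTICLE
    return {
        "implied_articles": articles,
        "broken_link_share": BROKEN_LINKS / TOTAL_LINKS,
        "revisions_per_article": REVISIONS / articles,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.2f}")
```

Because the 45 links per article is itself a rounded average, the implied article count should be read only as an order-of-magnitude estimate.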
Detailed HW/SW specifications/configurations
May need to interview some of the administration figures
Anticipated approach to each challenge
Detailed HW/SW specifications/configurations: Digging deeply into the available documents; speaking with the few people who work for Wikipedia.
Financial data: Using the available documents/?
May need to interview some of the administration figures: Using connections to set up some interview time.
What I bring to this work
My background as a software engineer who has been in the field for the last 18 years.
My relevant background and experience (CS 590)
I got my BS in Software Engineering and was a developer/system integrator for quite a while.
In 2002 I took Microsoft Certification courses and became:
Microsoft Certified Systems Engineer (MCSE)
Microsoft Certified Systems Administrator (MCSA)
Microsoft Certified Database Administrator (MCDBA)
The material covered in these courses gave me the necessary insight to see the interaction between front end, back end and network in a new light.
Afterward I pushed my expertise more toward database administration/development, and professionally I am now mostly focused on the architecture, design and implementation of backend solutions.
In 2006 and 2007, I expanded my networking knowledge to the internetworking world through Cisco and got certified as:
Cisco Certified Network Associate (CCNA)
Cisco Certified Network Professional (CCNP)
This has given me more insight into how networks interoperate at much larger scales than I previously knew.
With this background I am planning to approach the hardware/software architecture of Wikipedia and find out the secret to its highly efficient design and implementation.
What I find interesting about this work
The improbability of a volunteer effort becoming so global
The relative accuracy of the work despite its diverse sources of contribution
Decentralized management strategies governing the community
How this work goes beyond my experience and course work
As my professional life has revolved around technical aspects most of the time, this thesis gives me the opportunity to look at the managerial aspects of a highly successful information technology project.
I'll need to explore some of the behind-the-scenes activities that are not normally presented to a technical individual.
I'll also need to talk to people who are deeply involved in and/or are managing Wikipedia. This will demand a level of public relations I haven't used in my professional career.