Courses/CS 491ab/Winter 2008/Yuet-Chi Lee

I am Yuet-Chi Lee.

My time schedule is 12:15AM.


Week 1 - January 4, 2008

Introduced myself. Thought about what to do next.

Week 2 - January 11, 2008

Week 3 - January 18, 2008

Installation Guide of MediaWiki

Installation requirements

In addition to the software MediaWiki itself, a standard MediaWiki installation has the following requirements:

  • PHP is required to run the software.
  • A database server is required to store the pages and site data.
  • A web server is required to send the generated pages to your web browser.

PHP

PHP is the programming language in which MediaWiki is written, and is required in order to run the software.

For the latest version of MediaWiki, PHP version 5.0 or later is recommended.

Database server

MediaWiki stores all the text and data (articles, user details, system messages, etc.) in a database, which it is capable of sharing with other web-based applications.

For the latest version of MediaWiki, MySQL 4.0 or later is recommended.

Web server

In order to serve the generated pages to your browser, MediaWiki requires some web server software. Often you will not have a choice of which software to use - it will be the one provided by your hosting provider.

For the latest version of MediaWiki, Apache is recommended.

Installing MediaWiki

Week 4 - January 25, 2008

Continued installing the wiki.

Week 5 - February 1, 2008

Installing an extension

MediaWiki is ready to accept extensions as soon as installation is finished. To add an extension, follow these steps:

  1. Before you start
    1. A few extensions require the installation of a patch. Many of them also provide installation instructions written as Unix commands. You need shell access (SSH) to run the commands listed on the extension help pages.
  2. Download and install ExtensionFunctions.php.
    1. Some extensions, especially newer ones, require a helper file called ExtensionFunctions.php. ExtensionFunctions includes a series of functions that allow extensions to be modularized away from the MediaWiki core code. The best way to install this file is to download the current version from SVN. This file is visible to the public here at all times. Once downloaded, copy the ExtensionFunctions.php file to the $IP/extensions/ subdirectory of your MediaWiki installation.
  3. Download your extension.
    1. Extensions are usually distributed as modular packages. They generally go in their own subdirectory of $IP/extensions/. A list of extensions documented on MediaWiki.org is available on the extension matrix. Some extensions are available as source code within this wiki. You may want to automate copying them.
  4. Install your extension.
    1. Generally, the following line should be added at the end of the LocalSettings.php file (but above the PHP end-of-code delimiter, "?>"):
require_once "$IP/extensions/extension_name/extension_name.php";

This line forces the PHP interpreter to read the extension file, and thereby makes it accessible to MediaWiki.


Extensions to be studied

  • OpenSearch:
www.mediawiki.org/wiki/Extension:OpenSearch
  • RigorousSearch: an ineffective search on full text
www.mediawiki.org/wiki/Extension:RigorousSearch
  • SphinxSearch: a complex search engine that seems to allow phrase search
www.mediawiki.org/wiki/Extension:SphinxSearch
  • Wildcard search: allows the use of the wildcard "*"
www.mediawiki.org/wiki/Extension:Wildcard_search
  • Lucene: the one used by Wikipedia
en.wikipedia.org/wiki/Lucene

Week 6 - February 8, 2008

test code ideas

searchForm.html

<html>
<body><form action="search.php" method="post">
Keywords: <input type="text" name="keywords" />
<input type="submit" />
</form></body>
</html>

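search.php

<?php
// Read the submitted keyword from the POST form (searchForm.html)
$keywords = isset($_POST['keywords']) ? trim($_POST['keywords']) : "";
// Error: no keyword
if ($keywords == "")
{
  echo "You forgot to enter a keyword";
  exit;
}
// Connect to the database
$con = mysql_connect("servername","username","password");
if (!$con)
{
  die('Could not connect: ' . mysql_error());
}
// Select the database that holds the articles table ("databasename" is a placeholder)
mysql_select_db("databasename", $con);
// Escape the keyword before putting it into the query
$safe = mysql_real_escape_string($keywords, $con);
// searching; '%' is not a wildcard in MATCH ... AGAINST (boolean mode uses a trailing '*')
$data = mysql_query("SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('$safe' IN BOOLEAN MODE)");
if (!$data)
{
  die('Query failed: ' . mysql_error());
}
while($result = mysql_fetch_array( $data ))
{
  //some kind of output such as echo $result['title'];
}
// no result found
$matches=mysql_num_rows($data);
if ($matches == 0)
{
  echo "Sorry, no match for your query<br><br>";
  // spelling check part
}
// Escape the keyword before echoing it back to the browser
echo "Searched For: " . htmlspecialchars($keywords);
// close connection
mysql_close($con);
?>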

About spelling check

  1. I am thinking about using a word list called "Ispell English Word Lists" (downloads.sourceforge.net/wordlist/ispell-enwl-3.1.20.zip), putting the list in a wiki page (e.g. Special:Wordlist) and searching it for:
    1. the keyword itself (correct spelling = a match is found)
    2. the first few letters of the misspelled keyword (using the wildcard *, e.g. catc*)
    3. the last few letters of the misspelled keyword (using the wildcard *, e.g. *atch)
  2. Research available spell-checking programs for smarter spell-check rules.
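
The three lookups in step 1 can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not the planned implementation: the function name suggestSpellings, the in-memory word list, and the choice to keep all but one letter for the wildcard match are assumptions made for the example.

```javascript
// Hypothetical helper: look a keyword up in a word list, falling back to
// prefix (catc*) and suffix (*atch) matches when the exact word is missing.
function suggestSpellings(keyword, wordList, keep) {
  const word = keyword.toLowerCase();
  if (word === "") {
    return [];
  }
  const words = wordList.map((w) => w.toLowerCase());
  // 1. The keyword itself: correct spelling means a match is found.
  if (words.includes(word)) {
    return [word];
  }
  // Assumption: keep all but one letter unless the caller says otherwise.
  const n = keep === undefined ? Math.max(1, word.length - 1) : keep;
  const prefix = word.slice(0, n); // 2. first few letters, e.g. "catc"
  const suffix = word.slice(-n);   // 3. last few letters, e.g. "atch"
  const seen = new Set();
  const candidates = [];
  for (const w of words) {
    if ((w.startsWith(prefix) || w.endsWith(suffix)) && !seen.has(w)) {
      seen.add(w);
      candidates.push(w);
    }
  }
  return candidates;
}

const sampleWords = ["catch", "cat", "match", "dog"];
console.log(suggestSpellings("catcj", sampleWords));
console.log(suggestSpellings("xatch", sampleWords));
```

A real word list is far larger, so the same three lookups would more likely be run as database queries than over an array in the browser.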

Final Report Draft

==Brief project description== 
An extension for MediaWiki with MySQL that allows phrase searching, search keyword usage (AND,
OR, NOT etc.) and spelling check.
==Anticipated users==
The users will be MediaWiki/MySQL users who want more complex search functions in a wiki.
==Main conceptual (i.e., user-level) objects==
A textbox with a button labelled "Search".
The user will enter the search phrase in the textbox and click the search button as usual.
==Primary conceptual (i.e., user-level) operations==
The extension will enable the wiki user to do phrase searching. The user will enter the search
phrase in a textbox and click the search button as usual.  The result of the search will be
displayed.  If no entry is found, the search engine will run a spelling check on the phrase and
suggest possible words that the user may have wanted to search for.
==Why I am interested in this project==
I am interested in the area of database usage and management.

Week 7-8 - February 15&22, 2008

A draft of the custom search that produces its output in the wiki's own format

Reference: www.mediawiki.org/wiki/Manual:Special_pages

body code draft

<?php
// A custom search
// Yuet-Chi Lee
// require SpecialPage.php to do output
require_once "SpecialPage.php";
class MyCustomSearch extends SpecialPage {
   function MyCustomSearch() {                
       SpecialPage::SpecialPage("MyCustomSearch");
       wfLoadExtensionMessages('MyCustomSearch');
   }
   function execute( $par ) {
       global $wgRequest, $wgOut;
       $this->setHeaders();
       $param = $wgRequest->getText('param');
       // set up the namespace to search
       $spaces = SearchEngine::searchableNamespaces();
       $searchNs = $this->selectedNamespaces($wgRequest, $spaces);
       if (!$searchNs)
           $searchNs = $spaces;
       // do the search and set up the output
       $wgOut->setPagetitle("MyCustomSearch");
       if ($param)
           $wgOut->addWikiText($this->searchResults($param, $searchNs));
   }
   // Extract the selected namespaces' settings from the request  
   function selectedNamespaces(&$aRequest, &$someSpaces) {
       // ...
   }
   // Search for the given pattern
   function searchResults($pattern, &$spaces) {
       $db = &wfGetDB(DB_SLAVE);
       $out = "";
       $out .= "\n";
       $out .= "You searched for: " . $pattern . " .\n";
       $matches = 0;
       // ...
       return $out;
   }
}
?>

Code to study: SpecialSearch.php

<?php
# Copyright (C) 2004 Brion Vibber <brion@pobox.com>
# http://www.mediawiki.org/
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
# http://www.gnu.org/copyleft/gpl.html
/**
 * Run text & title search and display the output
 * @addtogroup SpecialPage
 */
/**
 * Entry point
 *
 * @param $par String: (default '')
 */
function wfSpecialSearch( $par = '' ) {
 global $wgRequest, $wgUser;
 $search = $wgRequest->getText( 'search', $par );
 $searchPage = new SpecialSearch( $wgRequest, $wgUser );
 if( $wgRequest->getVal( 'fulltext' ) ||
  !is_null( $wgRequest->getVal( 'offset' ) ) ||
  !is_null ($wgRequest->getVal( 'searchx' ) ) ) {
  $searchPage->showResults( $search );
 } else {
  $searchPage->goResult( $search );
 }
}
/**
 * implements Special:Search - Run text & title search and display the output
 * @addtogroup SpecialPage
 */
class SpecialSearch {
 /**
  * Set up basic search parameters from the request and user settings.
  * Typically you'll pass $wgRequest and $wgUser.
  *
  * @param WebRequest $request
  * @param User $user
  * @public
  */
 function SpecialSearch( &$request, &$user ) {
  list( $this->limit, $this->offset ) = $request->getLimitOffset( 20, 'searchlimit' );
  if( $request->getCheck( 'searchx' ) ) {
   $this->namespaces = $this->powerSearch( $request );
  } else {
   $this->namespaces = $this->userNamespaces( $user );
  }
  $this->searchRedirects = $request->getcheck( 'redirs' ) ? true : false;
 }
 /**
  * If an exact title match can be found, jump straight ahead to it.
  * @param string $term
  * @public
  */
 function goResult( $term ) {
  global $wgOut;
  global $wgGoToEdit;
  $this->setupPage( $term );
  # Try to go to page as entered.
  $t = Title::newFromText( $term );
  # If the string cannot be used to create a title
  if( is_null( $t ) ){
   return $this->showResults( $term );
  }
  # If there's an exact or very near match, jump right there.
  $t = SearchEngine::getNearMatch( $term );
  if( !is_null( $t ) ) {
   $wgOut->redirect( $t->getFullURL() );
   return;
  }
  # No match, generate an edit URL
  $t = Title::newFromText( $term );
  if( ! is_null( $t ) ) {
   wfRunHooks( 'SpecialSearchNogomatch', array( &$t ) );
   # If the feature is enabled, go straight to the edit page
   if ( $wgGoToEdit ) {
    $wgOut->redirect( $t->getFullURL( 'action=edit' ) );
    return;
   } 
  }
  $wgOut->addWikiText( wfMsg( 'noexactmatch', wfEscapeWikiText( $term ) ) );
  return $this->showResults( $term );
 }
 /**
  * @param string $term
  * @public
  */
 function showResults( $term ) {
  $fname = 'SpecialSearch::showResults';
  wfProfileIn( $fname );
  $this->setupPage( $term );
  global $wgOut;
  $wgOut->addWikiText( wfMsg( 'searchresulttext' ) );
  #if ( !$this->parseQuery() ) {
  if( '' === trim( $term ) ) {
   $wgOut->setSubtitle( '' );
   $wgOut->addHTML( $this->powerSearchBox( $term ) );
   wfProfileOut( $fname );
   return;
  }
  global $wgDisableTextSearch;
  if ( $wgDisableTextSearch ) {
   global $wgForwardSearchUrl;
   if( $wgForwardSearchUrl ) {
    $url = str_replace( '$1', urlencode( $term ), $wgForwardSearchUrl );
    $wgOut->redirect( $url );
    return;
   }
   global $wgInputEncoding;
   $wgOut->addHTML( wfMsg( 'searchdisabled' ) );
   $wgOut->addHTML(
    wfMsg( 'googlesearch',
     htmlspecialchars( $term ),
     htmlspecialchars( $wgInputEncoding ),
     htmlspecialchars( wfMsg( 'searchbutton' ) )
    )
   );
   wfProfileOut( $fname );
   return;
  }
  $search = SearchEngine::create();
  $search->setLimitOffset( $this->limit, $this->offset );
  $search->setNamespaces( $this->namespaces );
  $search->showRedirects = $this->searchRedirects;
  $titleMatches = $search->searchTitle( $term );
  $textMatches = $search->searchText( $term );
  $num = ( $titleMatches ? $titleMatches->numRows() : 0 )
   + ( $textMatches ? $textMatches->numRows() : 0);
  if ( $num > 0 ) {
   if ( $num >= $this->limit ) {
    $top = wfShowingResults( $this->offset, $this->limit );
   } else {
    $top = wfShowingResultsNum( $this->offset, $this->limit, $num );
   }
   $wgOut->addHTML( "<p>{$top}</p>\n" );
  }
  if( $num || $this->offset ) {
   $prevnext = wfViewPrevNext( $this->offset, $this->limit,
    SpecialPage::getTitleFor( 'Search' ),
    wfArrayToCGI(
     $this->powerSearchOptions(),
     array( 'search' => $term ) ),
     ($num < $this->limit) );
   $wgOut->addHTML( "<br />{$prevnext}\n" );
  }
  if( $titleMatches ) {
   if( $titleMatches->numRows() ) {
    $wgOut->addWikiText( '==' . wfMsg( 'titlematches' ) . "==\n" );
    $wgOut->addHTML( $this->showMatches( $titleMatches ) );
   } else {
    $wgOut->addWikiText( '==' . wfMsg( 'notitlematches' ) . "==\n" );
   }
   $titleMatches->free();
  }
  if( $textMatches ) {
   if( $textMatches->numRows() ) {
    $wgOut->addWikiText( '==' . wfMsg( 'textmatches' ) . "==\n" );
    $wgOut->addHTML( $this->showMatches( $textMatches ) );
   } elseif( $num == 0 ) {
    # Don't show the 'no text matches' if we received title matches
    $wgOut->addWikiText( '==' . wfMsg( 'notextmatches' ) . "==\n" );
   }
   $textMatches->free();
  }
  if ( $num == 0 ) {
   $wgOut->addWikiText( wfMsg( 'nonefound' ) );
  }
  if( $num || $this->offset ) {
   $wgOut->addHTML( "<p>{$prevnext}</p>\n" );
  }
  $wgOut->addHTML( $this->powerSearchBox( $term ) );
  wfProfileOut( $fname );
 }
 #------------------------------------------------------------------
 # Private methods below this line
 /**
  *
  */
 function setupPage( $term ) {
  global $wgOut;
  $wgOut->setPageTitle( wfMsg( 'searchresults' ) );
  $subtitlemsg = ( Title::newFromText($term) ? 'searchsubtitle' : 'searchsubtitleinvalid' );
  $wgOut->setSubtitle( $wgOut->parse( wfMsg( $subtitlemsg, wfEscapeWikiText($term) ) ) );
  $wgOut->setArticleRelated( false );
  $wgOut->setRobotpolicy( 'noindex,nofollow' );
 }
 /**
  * Extract default namespaces to search from the given user's
  * settings, returning a list of index numbers.
  *
  * @param User $user
  * @return array
  * @private
  */
 function userNamespaces( &$user ) {
  $arr = array();
  foreach( SearchEngine::searchableNamespaces() as $ns => $name ) {
   if( $user->getOption( 'searchNs' . $ns ) ) {
    $arr[] = $ns;
   }
  }
  return $arr;
 }
 /**
  * Extract "power search" namespace settings from the request object,
  * returning a list of index numbers to search.
  *
  * @param WebRequest $request
  * @return array
  * @private
  */
 function powerSearch( &$request ) {
  $arr = array();
  foreach( SearchEngine::searchableNamespaces() as $ns => $name ) {
   if( $request->getCheck( 'ns' . $ns ) ) {
    $arr[] = $ns;
   }
  }
  return $arr;
 }
 /**
  * Reconstruct the 'power search' options for links
  * @return array
  * @private
  */
 function powerSearchOptions() {
  $opt = array();
  foreach( $this->namespaces as $n ) {
   $opt['ns' . $n] = 1;
  }
  $opt['redirs'] = $this->searchRedirects ? 1 : 0;
  $opt['searchx'] = 1;
  return $opt;
 }
 /**
  * @param SearchResultSet $matches
  * @param string $terms partial regexp for highlighting terms
  */
 function showMatches( &$matches ) {
  $fname = 'SpecialSearch::showMatches';
  wfProfileIn( $fname );
  global $wgContLang;
  $tm = $wgContLang->convertForSearchResult( $matches->termMatches() );
  $terms = implode( '|', $tm );
  $off = $this->offset + 1;
  $out = "<ul start='{$off}'>\n";
  while( $result = $matches->next() ) {
   $out .= $this->showHit( $result, $terms );
  }
  $out .= "</ul>\n";
  // convert the whole thing to desired language variant
  global $wgContLang;
  $out = $wgContLang->convert( $out );
  wfProfileOut( $fname );
  return $out;
 }
 /**
  * Format a single hit result
  * @param SearchResult $result
  * @param string $terms partial regexp for highlighting terms
  */
 function showHit( $result, $terms ) {
  $fname = 'SpecialSearch::showHit';
  wfProfileIn( $fname );
  global $wgUser, $wgContLang, $wgLang;
  $t = $result->getTitle();
  if( is_null( $t ) ) {
   wfProfileOut( $fname );
   return "\n";
  }
  $sk = $wgUser->getSkin();
  $contextlines = $wgUser->getOption( 'contextlines',  5 );
  $contextchars = $wgUser->getOption( 'contextchars', 50 );
  $link = $sk->makeKnownLinkObj( $t );
  //If page content is not readable, just return the title.
  //This is not quite safe, but better than showing excerpts from non-readable pages
  //Note that hiding the entry entirely would screw up paging.
  if (!$t->userCanRead()) {
   return "<li>{$link}</li>\n";
  }
  $revision = Revision::newFromTitle( $t );
  $text = $revision->getText();
  $size = wfMsgExt( 'nbytes', array( 'parsemag', 'escape'),
   $wgLang->formatNum( strlen( $text ) ) );
  $lines = explode( "\n", $text );
  $max = intval( $contextchars ) + 1;
  $pat1 = "/(.*)($terms)(.{0,$max})/i";
  $lineno = 0;
  $extract = '';
  wfProfileIn( "$fname-extract" );
  foreach ( $lines as $line ) {
   if ( 0 == $contextlines ) {
    break;
   }
   ++$lineno;
   $m = array();
   if ( ! preg_match( $pat1, $line, $m ) ) {
    continue;
   }
   --$contextlines;
   $pre = $wgContLang->truncate( $m[1], -$contextchars, '...' );
   if ( count( $m ) < 3 ) {
    $post = '';
   } else {
    $post = $wgContLang->truncate( $m[3], $contextchars, '...' );
   }
   $found = $m[2];
   $line = htmlspecialchars( $pre . $found . $post );
   $pat2 = '/(' . $terms . ")/i";
   $line = preg_replace( $pat2,
    "<span class='searchmatch'>\\1</span>", $line );
   $extract .= "<br /><small>{$lineno}: {$line}</small>\n";
  }
  wfProfileOut( "$fname-extract" );
  wfProfileOut( $fname );
  return "<li>{$link} ({$size}){$extract}</li>\n";
 }
 function powerSearchBox( $term ) {
  $namespaces = '';
  foreach( SearchEngine::searchableNamespaces() as $ns => $name ) {
   $checked = in_array( $ns, $this->namespaces )
    ? ' checked="checked"'
    : '';
   $name = str_replace( '_', ' ', $name );
   if( '' == $name ) {
    $name = wfMsg( 'blanknamespace' );
   }
   $namespaces .= " <label><input type='checkbox' value=\"1\" name=\"" .
    "ns{$ns}\"{$checked} />{$name}</label>\n";
  }
  $checked = $this->searchRedirects
   ? ' checked="checked"'
   : '';
  $redirect = "<input type='checkbox' value='1' name=\"redirs\"{$checked} />\n";
  $searchField = '<input type="text" name="search" value="' .
   htmlspecialchars( $term ) ."\" size=\"16\" />\n";
  $searchButton = '<input type="submit" name="searchx" value="' .
   htmlspecialchars( wfMsg('powersearch') ) . "\" />\n";
  $ret = wfMsg( 'powersearchtext',
   $namespaces, $redirect, $searchField,
   '', '', '', '', '', # Dummy placeholders
   $searchButton );
  $title = SpecialPage::getTitleFor( 'Search' );
  $action = $title->escapeLocalURL();
  return "<br /><br />\n<form id=\"powersearch\" method=\"get\" " .
   "action=\"$action\">\n{$ret}\n</form>\n";
 }
}

    Week 9 - February 29, 2008

    This week, I spent 5 days on extracting and organizing the contents of the MyISAM tables of the database, but made little progress.

    This was mainly due to my insufficient knowledge of the MyISAM part of MySQL.

    Material studied this week:

    dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html

    dev.mysql.com/doc/refman/5.1/en/myisam-table-formats.html


    Meanwhile, worried about the pace of progress, I spent the remaining 2 days on an alternative project: a ringtone-making program on Android.

    Week 10 - March 7, 2008

    Search Suggestion

    Previously I had been planning to work on a spelling check function.

    Since Firefox already has a plugin for spell checking and many wiki page titles are not even dictionary words, I am now thinking of working on search suggestion instead.

    Search suggestion is a function that suggests page titles matching the user's entry while the user is still typing.

    For example, when the user enters 'sequenc' in the search box and there is a page titled 'Consequence' in the wiki, the search engine will automatically add the link 'Consequence' under the search box as the user types.

    To do this, I will probably need to take advantage of MySQL's 'LIKE' keyword and JavaScript.
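
The matching rule behind such a suggestion can be illustrated with a small JavaScript sketch. It is not the extension's code: the names escapeLikeWildcards, likePattern and suggestTitles are hypothetical, the titles live in a plain array instead of the page table, and the case-insensitive comparison is an assumption. The sketch shows two things: how the user's entry would be wrapped into a '%entry%' pattern with the LIKE wildcards escaped, and which titles such a substring match selects.

```javascript
// Escape the characters that have a special meaning inside a LIKE pattern,
// so that a literal "%" or "_" typed by the user is not treated as a wildcard.
function escapeLikeWildcards(entry) {
  return entry.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Build the "contains" pattern that would go after the LIKE keyword.
// The result still has to be quoted/escaped by the database layer.
function likePattern(entry) {
  return "%" + escapeLikeWildcards(entry) + "%";
}

// In-memory equivalent of "title LIKE '%entry%' LIMIT limit".
// Assumption: comparison ignores case.
function suggestTitles(entry, titles, limit = 10) {
  if (entry.length === 0) {
    return [];
  }
  const needle = entry.toLowerCase();
  return titles
    .filter((title) => title.toLowerCase().includes(needle))
    .slice(0, limit);
}

const titles = ["Consequence", "Main_Page", "Sequence"];
console.log(likePattern("sequenc"));
console.log(suggestTitles("sequenc", titles));
```

A pattern that starts with '%' cannot use an ordinary index on the title column, so limiting the number of rows returned matters once the wiki grows.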

    MySQL part

    Reference

    www.mediawiki.org/wiki/Manual:Database_access#select

    MySearch.php

    <?php

    ...

    // Export the function so that sajax_do_call() in suggest.js can reach it
    $wgAjaxExportList[] = 'searchSuggest';
    function searchSuggest( $entry ) {
        $l = new Linker;
        $dbr = wfGetDB( DB_SLAVE );
        $suggestions = $dbr->select(
            'page', // table
            'page_title', // var = page title
            array( 'page_namespace' => 0, // main namespace only
                // escapeLike() escapes quotes as well as the LIKE wildcards % and _
                "page_title LIKE '%" . $dbr->escapeLike( $entry ) . "%'" ),
            'Database::select',
            array( 'LIMIT' => 10 ) // at most 10 rows
        );
        $result = "<ul>"; // unordered list
        while( $row = $dbr->fetchObject( $suggestions ) ) {
            $link = Title::newFromDBkey( $row->page_title ); // get the title object of the page
            $result .= '<li>' . $l->makeKnownLinkObj( $link ) . "</li>\n"; // add the link
        }
        $result .= '</ul>';
        return $result;
    }

    ...

    ?>

    JavaScript part

    To add the JavaScript, I am going to use the OutputPageBeforeHTML hook:

    www.mediawiki.org/wiki/Manual:Hooks/OutputPageBeforeHTML

    <?php

    ...

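    $wgHooks['OutputPageBeforeHTML'][] = 'searchSuggestJS';
    // The hook passes the OutputPage object and the page text by reference
    function searchSuggestJS( &$out, &$text ) {
        $out->addScript( "<script type=\"text/javascript\" src=\"/extensions/mySearch/suggest.js\"></script>\n" );
        return true; // let other handlers of this hook run
    }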

    ...

    ?>

    suggest.js

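    function key_pressed() {
        var x = document.getElementById( 'searchInput' );
        // keyup rather than keypress, so the value already contains the new character
        x.onkeyup = function() { press_then(); };
    }
    function press_then() {
        var entry = document.getElementById( "searchInput" ).value;
        if( entry.length != 0 ) {
            // sajax_do_call takes the arguments as an array and a target for the reply;
            // "searchSuggestions" is a placeholder id for the element under the search box
            sajax_do_call( "searchSuggest", [ entry ], document.getElementById( "searchSuggestions" ) );
        }
    }
    // attach the handler once the page has loaded
    addOnloadHook( key_pressed );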

    Brief project description

    An extension for MediaWiki with MySQL that allows phrase searching, search keyword usage (AND, OR, NOT etc.) and search suggestion.

    Anticipated users

    The users will be MediaWiki/MySQL users who want more complex search functions in a wiki.

    Main conceptual (i.e., user-level) objects

    A textbox with a button labelled "Search". The user will enter the search phrase in the textbox and click the search button as usual.

    Primary conceptual (i.e., user-level) operations

    The extension will enable the wiki user to do phrase searching. The user will enter the search phrase in a textbox and click the search button as usual. The result of the search will be displayed. The search engine will suggest, below the search box, possible words that the user may want to search for.

    Why I am interested in this project

    I am interested in the area of database usage and management.

    Status

    • phrase searching, e.g. "go through" --- done
    • AND/OR/NOT keywords translation --- done
    • search suggestion --- working on it
    • other additions?

    Final Report

    Brief project description

    An extension for MediaWiki with MySQL that allows phrase searching, search keyword usage (AND, OR, NOT etc.) and search suggestion.

    Anticipated users

    The users will be MediaWiki/MySQL users who want more complex search functions in a wiki.

    Main conceptual (i.e., user-level) objects

    A textbox with a button labelled "Search". The user will enter the search phrase in the textbox and click the search button as usual.

    Primary conceptual (i.e., user-level) operations

    The user will enter the search phrase in a textbox, and the search engine will instantly suggest, below the search box, words from the full-text word list that match what the user has entered. The user clicks the search button as usual and the result of the search is displayed.

    Why I am interested in this project

    I am interested in the area of database usage and management.

    Status

    • phrase searching, e.g. "go through" --- done
    • AND/OR/NOT keywords translation --- done
    • search suggestion --- working on it
      • check for the entered phrase on the full-text word list
      • (check for synonyms on the full-text word list)
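
The AND/OR/NOT keyword translation listed in the status above could map onto the operators of MySQL's boolean full-text mode, where a leading + marks a required word, a leading - marks an excluded word, no prefix marks an optional word, and double quotes mark a phrase. The following is a minimal sketch under that assumption. The function name toBooleanMode is hypothetical and this is not the extension's actual code.

```javascript
// Translate a query written with AND / OR / NOT into boolean-mode syntax.
// Quoted phrases such as "go through" are kept together as one term.
function toBooleanMode(query) {
  const tokens = query.match(/"[^"]*"|\S+/g) || [];
  const terms = [];  // each entry: { text, sign }
  let pending = "";  // sign to apply to the next term
  for (const token of tokens) {
    if (token === "AND") {
      // Both sides of AND are required, so mark the previous term too.
      const last = terms[terms.length - 1];
      if (last && last.sign === "") {
        last.sign = "+";
      }
      pending = "+";
    } else if (token === "OR") {
      pending = "";   // optional term: no prefix
    } else if (token === "NOT") {
      pending = "-";  // excluded term
    } else {
      terms.push({ text: token, sign: pending });
      pending = "";
    }
  }
  return terms.map((t) => t.sign + t.text).join(" ");
}

console.log(toBooleanMode('"go through" AND wiki NOT draft'));
```

The sketch does not strip operator characters that the user may type inside a word, so a real implementation would still need to sanitize each term before building the query.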