Site Search

 

Enter search terms or phrases, then press "Find":

 

   Match all terms specified.
   Rank results by IDF weighted hit count.
   Force whole word matching.

How To Use This Page

The notes below are for insomniacs and for those who want to enter something a bit more sophisticated than single-word searches but are not getting back what they expect.

 

What Have I Found

The search results are formatted as links to pages on this web site. The name given to the link is the "Title" set in the HTML header for the page, so if I've done my job right, it should suggest what is on the page. If I haven't, it will be of no great help at all! In the extreme--if I've neglected to create a Title element--the relative file path to the page will be used, and that will be real cryptic! Please let me know about any pages where you thought the title was badly wrong.

Important: If you press your browser's [Back] button from any of the pages selected by these found links, it will obediently re-execute the search. If this is what you want, fine, but if what you really wanted to do was to enter a new search, you're better off re-selecting the "Search This Site" option from the left-hand navigation menu to save yourself some time.

Simple Searches

This search facility will crawl over all my HTML pages, looking for key words or phrases that you supply. A key word is any sequence of characters, delimited by the space character. A phrase is any sequence of characters, including the space character, enclosed in double quotes.

To preserve us from trivial searches that do nothing but consume CPU time, single-character key words and phrases will be silently ignored. If that was all you entered, you'll get an error message for your collection, complaining that you did not supply any search terms.
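For the curious, the term rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual script (which is written in Perl): the function name parse_terms is made up, any whitespace is treated as a delimiter rather than just the space character, and an unbalanced double quote is simply left attached to its key word.

```python
import re

# Hypothetical helper, not taken from the real search script.
# A phrase is anything between double quotes; a key word is a run of
# non-whitespace characters.
TERM_RE = re.compile(r'"([^"]*)"|(\S+)')

def parse_terms(query: str) -> list[str]:
    """Split a query into phrases and key words, dropping trivial terms."""
    terms = []
    for phrase, word in TERM_RE.findall(query):
        term = phrase if phrase else word
        # Single-character (and empty) terms are silently ignored.
        if len(term) > 1:
            terms.append(term)
    return terms

print(parse_terms('sparey "ETA .29" a diesel'))
# prints ['sparey', 'ETA .29', 'diesel']
```

An empty list coming back from a function like this would be the point at which a script reports that no search terms were supplied.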

You may enter as many key words and/or phrases as you wish, although more terms mean longer search times and, when all terms must match, the possibility that you will filter out all possible pages!

By default, the search will use OR logic. That is, if you enter:

sparey diesel

each page in the list returned will contain the word "sparey", the word "diesel", or both. If you specifically want to locate pages that contain the two words together exactly as typed, enclose them in double quotes like this:

"sparey diesel"

The default OR search is the fastest type of search when multiple terms are involved, as searching of an individual page stops as soon as the first "hit" is encountered.

AND Searches

Ticking the "Match all terms specified" checkbox will enable AND logic. This means that a page must contain all of the terms you specify in order to qualify. Note that even with AND logic, unless you use phrases as shown above, you are only guaranteed that returned pages contain all the terms somewhere, without any regard to order or proximity. So using the first example shown above with the "all" checkbox ticked will return the pages that contain both of the words sparey and diesel somewhere, at least once each, with no regard to where they appear. Naturally, AND searches will take longer.
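The difference between the two kinds of logic can be shown with a small Python sketch. The function name page_matches and the sample pages are invented for illustration, and plain substring tests stand in for whatever matching the real script does.

```python
def page_matches(text: str, terms: list[str], match_all: bool = False) -> bool:
    """Return True if the page text qualifies under OR (default) or AND logic."""
    haystack = text.lower()
    # A generator, so terms are only tested until the answer is known.
    hits = (term.lower() in haystack for term in terms)
    # any() stops at the first hit (OR); all() stops at the first miss (AND).
    return all(hits) if match_all else any(hits)

# Hypothetical page contents, for illustration only.
pages = {
    "a.html": "Sparey wrote about his 5cc engine",
    "b.html": "A diesel designed by Sparey",
    "c.html": "Lathe tooling notes",
}
terms = ["sparey", "diesel"]

print([name for name, text in pages.items() if page_matches(text, terms)])
# prints ['a.html', 'b.html']
print([name for name, text in pages.items() if page_matches(text, terms, match_all=True)])
# prints ['b.html']
```

Note that the AND result says nothing about order or proximity: b.html qualifies even though the two words are several words apart and reversed.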

Ranked Searches

To give back results as fast as possible, the pages found are, by default, simply sorted alphabetically. However, you have the option to "rank" the results by ticking the second checkbox. This causes the listed result pages to be ordered using a technique known as "term weighted Inverse Document Frequency" (IDF).

Briefly, this technique assumes that terms that do not appear very often on any page will be more interesting than those that appear frequently on lots of pages, hence these terms are "weighted" to rank their page(s) higher. So a page with a few interesting words may be ranked higher than one with lots of common words.

Each page is allocated a "score", being the sum, over all the search terms, of the number of times the term occurs on the page multiplied by the IDF value for that term. The number displayed for each result is a percentage derived by dividing the page's score by the maximum score recorded for all of the pages found. Hence the one at the top will be 100%, and so on down. I think you can guess that this takes longer--and a ranked, AND logic query will take longest of all.
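A rough Python sketch of this kind of scoring follows. The page does not say which IDF formula is used, so the sketch assumes one common smoothed form, log(1 + N/df), where N is the number of pages searched and df is the number of pages containing the term. The function name rank_pages and the sample pages are likewise invented.

```python
import math

def rank_pages(pages: dict[str, str], terms: list[str]) -> list[tuple[str, int]]:
    """Return (page, percent) pairs, best first, for pages containing any term."""
    lowered = {name: text.lower() for name, text in pages.items()}
    terms = [term.lower() for term in terms]
    total = len(lowered)

    # Assumed IDF formula: rarer terms get a larger weight.
    idf = {}
    for term in terms:
        df = sum(1 for text in lowered.values() if term in text)
        idf[term] = math.log(1 + total / df) if df else 0.0

    # Score = sum over terms of (occurrences on the page) * (IDF of the term).
    scores = {}
    for name, text in lowered.items():
        score = sum(text.count(term) * idf[term] for term in terms)
        if score > 0:
            scores[name] = score
    if not scores:
        return []

    # Express each score as a percentage of the best score found.
    top = max(scores.values())
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(name, round(100 * score / top)) for name, score in ordered]

# Hypothetical page contents, for illustration only.
pages = {
    "engines.html": "diesel diesel engine",
    "sparey.html": "the sparey diesel",
    "tools.html": "lathe tools",
}
for name, percent in rank_pages(pages, ["sparey", "diesel"]):
    print(f"{percent:3d}%  {name}")
```

With these made-up pages, sparey.html comes out on top at 100% even though engines.html has more hits in total, because "sparey" is the rarer term and so carries more weight.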

Case Sensitivity

Searches are not case sensitive, so the terms Sparey and sparey (or SPAREY) will all return the same result. Apart from rampaging feature-itis, I can see no real benefit in providing an option for a case sensitive search. Please let me know if you can provide a scenario where this would help.

Word Matching

By default, term and phrase searches are made without consideration for word boundaries, so a search for carb would match things like carb, carburettor, carbon and bicarbonate!! Checking the "Force whole word matching" box will restrict matches to whole words (surprise). The extra cost, time-wise, is small. I've made "match partials" the default to increase the chance of hits. Check the box if you're being deluged with false positives.
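In regular expression terms, whole word matching usually amounts to wrapping the term in word-boundary anchors. The Python sketch below shows the effect on the carb example; the function name term_pattern is invented, and the real script may well build its patterns differently.

```python
import re

def term_pattern(term: str, whole_word: bool = False) -> re.Pattern:
    """Build a case-insensitive pattern for a literal search term."""
    body = re.escape(term)  # treat the term as plain text
    if whole_word:
        # \b matches only at the edge of a run of letters, digits or underscores.
        body = r"\b" + body + r"\b"
    return re.compile(body, re.IGNORECASE)

words = ["carb", "carburettor", "carbon", "bicarbonate"]

partial = term_pattern("carb")
print([w for w in words if partial.search(w)])
# prints ['carb', 'carburettor', 'carbon', 'bicarbonate']

whole = term_pattern("carb", whole_word=True)
print([w for w in words if whole.search(w)])
# prints ['carb']
```

One caveat with this approach: the anchors only behave as expected when the term itself begins and ends with a letter or digit.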

Under the Hood

For the techos, I must confess that the term parsing is rather simplistic and I imagine it will be possible to break it in several ways. Under the hood, the work is being done by a CGI Perl script using regular expressions (REs). If you don't know an RE from a hole in the road, consider yourself fortunate, and possibly still relatively sane. If you do, you can enter one as a term and really cause chaos with my poor script! (Several of the RE metacharacters are escaped, though--someone searching for an "ETA .29" really wants ".29", not any character followed by "29", right?)
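The ".29" problem is easy to demonstrate. The Python sketch below compares an unescaped term with an escaped one; the function name find_hits and the sample text are invented. It uses re.escape, which escapes every metacharacter, whereas the script described here escapes only some of them (which ones is not stated).

```python
import re

def find_hits(term: str, text: str, escape: bool = True) -> list[str]:
    """Return every match of the term, optionally treating it as literal text."""
    pattern = re.escape(term) if escape else term
    return re.findall(pattern, text, flags=re.IGNORECASE)

text = "An ETA .29 and an ETA 129"  # made-up sample text

# Unescaped, the dot matches any character, so "129" is a false hit.
print(find_hits(".29", text, escape=False))
# prints ['.29', '129']

# Escaped, the dot must be a literal full stop.
print(find_hits(".29", text))
# prints ['.29']
```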

Currently, this site is hosted on an old Solaris box with an old Perl implementation. Tests have shown a modern Linux server runs the search an order of magnitude faster--I suspect that file globbing and lack of support for pre-compiled REs are the culprits. Sad, but that's life.

Oops, I broke it!

Finally, if you manage to break the script in some way, indicated by an Apache error page that mumbles about "internal server error" and suggests that you contact the webmaster, please contact me--not our poor, besieged Web Mistress (or you'll be sorry).