Internet Search Engines
The World Wide Web is so big that trying to find specific information by yourself would be like being an ant named Simon in the MCG during the grand final. Search engines are software and hardware systems that visit websites, catalogue the words on web pages, and present links to relevant pages to people who are searching for information. The software 'agents' that search for web pages and digest their contents are called 'spiders' because they travel across the 'web'.
Simply cataloguing the words on a web page is not enough. When someone performs a search, there may be thousands of relevant web pages. An important role of search engines is to present those pages in a logical order, ranked by their likely relevance to the searcher's needs. On this page, "search terms" means the words a user enters into a search engine.
Search Engine Relevance Ranking Methods
- Keyword matching. Checks how many of the search terms appear on a page. For example, when searching for King Henry, a page with both search terms would be judged more relevant than a page with only one of them.
- Frequency. If a page mentions King Henry frequently, that is a good sign it is more valuable than another page that mentions him only once.
- Position. Pages with 'King Henry' in the page title or in a heading, or formatted in bold, would probably be more valuable than pages that do not emphasise the words or place them prominently. A page that has the search terms near the beginning of the text would be ranked more highly than a page that has them near the end.
HTML web pages also have an invisible section, the 'head', where the page's author can insert 'META tags'. Two important META tags are KEYWORDS and DESCRIPTION, and another important head tag is TITLE. If search terms appear in these tags, it is a clue that the words are important to the page, and the page will tend to be ranked more highly.
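The ranking methods above can be combined into a single relevance score. The sketch below is a minimal Python illustration, not any real engine's formula: the `Page` record, the `score_page` and `rank` functions, and every weight (`MATCH_WEIGHT`, `TITLE_WEIGHT` and so on) are hypothetical, since real search engines keep their weights secret. Frequency is counted with a logarithm so each extra mention adds a little less, which is an assumed way of blunting simple keyword repetition.

```python
import math
import re
from dataclasses import dataclass, field

# Hypothetical weights chosen for illustration only; real engines keep theirs secret.
MATCH_WEIGHT = 5.0     # each distinct search term found anywhere on the page
FREQ_WEIGHT = 1.0      # applied to log(1 + occurrences) in the body text
POSITION_WEIGHT = 1.0  # bonus for terms that appear early in the body
TITLE_WEIGHT = 3.0     # term appears in the TITLE tag
HEADING_WEIGHT = 2.0   # term appears in a heading
META_WEIGHT = 1.0      # term appears in the KEYWORDS/DESCRIPTION META tags


@dataclass
class Page:
    """A simplified web page; the field names are illustrative assumptions."""
    url: str
    title: str = ""
    headings: list[str] = field(default_factory=list)
    meta: str = ""  # contents of the KEYWORDS and DESCRIPTION META tags
    body: str = ""


def words(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_page(page: Page, search_terms: list[str]) -> float:
    """Combine keyword matching, frequency, position and head-tag bonuses."""
    body = words(page.body)
    title = set(words(page.title))
    headings = {w for h in page.headings for w in words(h)}
    meta = set(words(page.meta))

    score = 0.0
    for term in (t.lower() for t in search_terms):
        count = body.count(term)
        if count == 0 and term not in title and term not in headings and term not in meta:
            continue  # keyword matching: this term is missing from the page
        score += MATCH_WEIGHT
        score += FREQ_WEIGHT * math.log1p(count)  # frequency, with diminishing returns
        if count:
            first = body.index(term)  # position: earlier in the text scores higher
            score += POSITION_WEIGHT * (1 - first / len(body))
        if term in title:
            score += TITLE_WEIGHT
        if term in headings:
            score += HEADING_WEIGHT
        if term in meta:
            score += META_WEIGHT
    return score


def rank(pages: list[Page], search_terms: list[str]) -> list[Page]:
    """Return the pages sorted from most to least relevant."""
    return sorted(pages, key=lambda p: score_page(p, search_terms), reverse=True)


pages = [
    Page("one-term.html", body="Henry went hunting in the forest."),
    Page("both-terms.html", title="King Henry VIII", body="King Henry had six wives."),
]
for page in rank(pages, ["King", "Henry"]):
    print(page.url, round(score_page(page, ["King", "Henry"]), 2))
```

In this example the page containing both terms, with both of them in its title, outranks the page that mentions only Henry.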
There is a continuing battle to be the best search engine. While their services are free to users, search engines make a great deal of money through advertising and sponsorship. The current king of search engines is Google, pursued enviously by eager youngsters like Alltheweb, Teoma, MSN, Yahoo and a dozen others. Although they all do the same job, they use a variety of techniques to do it.
Google introduced a new concept in determining the rank of sites: it counts the number of other sites that link to a site. The more a site is linked to, the more authoritative it is likely to be. It's like a popularity contest for websites. A much-linked-to site will be ranked more highly in the search results than a less popular page, especially if the sites containing the links are themselves highly ranked.
'Click-through' popularity measures how many people use those links to visit your site, how long they stay, and how often they return. Some search engines also take into account whether a site is listed in prestigious directories. One example is DMOZ, a directory of sites selected by human editors. A site's inclusion in this list is an indicator of its quality, and search engines like Google take this into account.
Search Cheating
Many people try to work out how different search engines calculate the positions of sites in search results. Most engines keep their formulas secret so that people cannot exploit them to inflate their sites' rankings artificially. Tricks like including lots of irrelevant keywords in the META tags have been popular with web authors. A site about a band, for example, might include irrelevant but popular keywords like "Pamela Anderson, nude, porn, free". For this reason, many search engines now pay little attention to the KEYWORDS META tag, and Google ignores it completely.
Another trick is to repeat keywords many times on a page, often making the text invisible by setting its colour to the page's background colour (pressing CTRL+A to select all will reveal these bogus words). Again, some search engines know this trick and will ignore blatant mass-listings. Warning: many search engines will penalise a page that attempts to 'spam' them.
Groups of enthusiasts enjoy trying to determine the formulas used by search engines. They set up test pages using different types of tricks, META tags, etc., and see how different search engines rate the pages. By studying the different rankings given to different pages, they can often deduce the secret ranking formulas. Unfortunately, search engine companies change their formulas often, which makes it hard to create the 'perfect' page.
Oddly, Google will occasionally list pages that don't contain your search terms, because other pages linked to that page using those words. This led to the
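The original version of Google's link-counting idea was published as PageRank: each page shares its own rank among the pages it links to, so a link from a highly ranked site counts for more. The sketch below is a simplified power-iteration version in Python on a tiny hypothetical link graph (`LINKS`, the `.example` sites and their link text are invented). The damping factor 0.85 is the value commonly quoted for PageRank, but Google's actual formula is secret and changes over time. The `anchor_index` function is an assumed illustration of how a page can be listed for words that appear only in other pages' links to it.

```python
from collections import defaultdict

# Hypothetical link graph: each page maps to its outgoing (target, link text) pairs.
LINKS: dict[str, list[tuple[str, str]]] = {
    "a.example": [("c.example", "great band"), ("b.example", "tour dates")],
    "b.example": [("c.example", "great band")],
    "c.example": [("a.example", "fan club")],
    "d.example": [("c.example", "best band ever")],
}


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by repeatedly passing rank along links."""
    pages = set(links) | {t for out in links.values() for t, _ in out}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            out = links.get(page, [])
            if out:
                # Share this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(out)
                for target, _ in out:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over every page.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


def anchor_index(links):
    """Map each word of link text to the pages those links point at."""
    index = defaultdict(set)
    for out in links.values():
        for target, anchor in out:
            for word in anchor.lower().split():
                index[word].add(target)
    return index


ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))

# c.example is listed under "band" purely because of other pages' link text.
print(anchor_index(LINKS)["band"])
```

Here c.example comes out on top because three of the four pages link to it, while d.example, which nobody links to, comes last.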
Social implications of search engines
Sounds odd, doesn't it? 'Social implications of search engines'? You might as well talk about the social implications of screwdrivers. They are just tools, aren't they? In fact, search engines carry heavy moral and social responsibilities and can wield considerable influence. Google, for instance, was pressured by the Church of Scientology to remove links to pages that the church said contained copyrighted material. In fact, the material was on an anti-Scientology site, and many people saw the removal of the links as the silencing of free speech. They believed that a person searching for information on the church would get a sanitised and biased list of links. A bit of a
* Does Google have a responsibility to fight censorship?
What if a search engine were programmed to boost the ratings of sites with a particular political inclination? People would, for example, see a lot of sites advocating a certain political stance at the top of the list, while contrary pages could be ranked way down where most people would not see them.
The search terms entered by users also provide an immediate snapshot of people's thoughts. Google has a whimsical feature that tracks how often different topics are searched for. Such information would be invaluable to advertisers, politicians, newspaper reporters, commercial organisations, etc., because it gives them a clue about what's "hot". People use indexes of search terms as indicators of fame: is Madonna being searched for less often? Is Britney Spears on the way up or down in her career? You can often get clues from the trends in search terms submitted to search engines.
Different search engines handle moral issues differently. Some reward sponsors by inserting their links into search results regardless of the words used in the search terms. Often these sponsored links are not identified as such, leaving people to wonder why "Bob's Hardware" appeared in the results of their search for Britney Spears. Other search engines pride themselves on more reputable behaviour: they include sponsored links only if they are relevant to a search (though their rank will be artificially inflated). Google goes further, keeping sponsored links separate from the search results and clearly identifying them as sponsored.
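At its simplest, tracking search trends is counting. The sketch below assumes a hypothetical query log of (month, search terms) pairs (`QUERY_LOG` is invented data, and `monthly_counts` and `trend` are hypothetical helpers) and compares the raw number of searches for a query in two months. A fuller version would probably divide by each month's total searches, so that growth in searching overall is not mistaken for growing fame.

```python
from collections import Counter

# Hypothetical query log of (month, search terms) pairs, invented for illustration.
QUERY_LOG = [
    ("2003-01", "madonna"), ("2003-01", "britney spears"), ("2003-01", "madonna"),
    ("2003-02", "britney spears"), ("2003-02", "britney spears"), ("2003-02", "madonna"),
]


def monthly_counts(log):
    """Count how often each query was submitted in each month."""
    return Counter(log)


def trend(log, query, earlier, later):
    """Return the change in the number of searches for a query between two months."""
    counts = monthly_counts(log)
    return counts[(later, query)] - counts[(earlier, query)]


for query in ("madonna", "britney spears"):
    change = trend(QUERY_LOG, query, "2003-01", "2003-02")
    direction = "up" if change > 0 else "down" if change < 0 else "steady"
    print(f"{query}: {direction} ({change:+d})")
```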
Page Created March 21, 2003
Last changed March 21, 2003 1:35 PM
IT Lecture notes copyright © Mark Kelly 2001-