Powerful Semantic Search Engines

The World Wide Web holds so much information that finding a specific piece of it has become a chore at best. A typical search returns hundreds of thousands of web pages, and it becomes extremely difficult to find the information needed. Current search engines rely on meta tags to categorize web pages. Meta tags are keywords used to describe web sites so that non-semantic search engines can display suitable web pages when queried. The search engine itself has no understanding of the content of the web page. Using XML to structure a web page allows one to search for information with string and structure matching tools; however, the user must know the exact specification of the document, so this approach is not sufficient for searching on a large scale.
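The limitation of structure matching can be illustrated with a minimal Python sketch. It uses the standard library's xml.etree.ElementTree; the sample document, its element names (catalog, company, name, industry), and the helper find_companies are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are assumptions for illustration.
DOC = """
<catalog>
  <company>
    <name>Philip Morris</name>
    <industry>Tobacco</industry>
  </company>
  <company>
    <name>Example Bank</name>
    <industry>Finance</industry>
  </company>
</catalog>
"""

def find_companies(xml_text, path, industry):
    """Return company names whose <industry> text equals `industry`.

    `path` must name the exact element that holds each record.
    """
    root = ET.fromstring(xml_text)
    return [
        record.findtext("name")
        for record in root.findall(path)
        if record.findtext("industry") == industry
    ]

# Works only because the caller knows the exact structure and spelling.
known = find_companies(DOC, "company", "Tobacco")
# A guess at a different structure or wording silently finds nothing.
wrong_path = find_companies(DOC, "firm", "Tobacco")
wrong_text = find_companies(DOC, "company", "tobacco industry")
```

The last two queries return empty lists even though the document clearly concerns a tobacco company, which is the sense in which string and structure matching depends on knowing the document's specification in advance.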

Semantic search engines are powerful and robust. As stated earlier, semantic metadata describes content and relates it to specific domains. Metadata enables semantic search engines and semantic applications to fill in the gaps. For example, if a financial advisor is evaluating Philip Morris, the search engine may, through semantic association, provide a study on the impact of new laws governing the tobacco industry. The financial advisor did not explicitly ask for this information; however, the semantic search engine detected that Philip Morris and the tobacco industry are related.
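The association in this example can be pictured as a lookup over explicit relations. In the minimal Python sketch below, the relation table, the document list, and both function names are hypothetical; a real semantic search engine would presumably draw on a far larger knowledge base and richer metadata.

```python
# Hypothetical relations: entity -> set of related domains.
RELATED = {
    "Philip Morris": {"tobacco industry"},
}

# Hypothetical documents, each tagged with semantic metadata (its topics).
DOCUMENTS = [
    {"title": "Philip Morris quarterly report", "topics": {"Philip Morris"}},
    {"title": "Impact of new tobacco laws", "topics": {"tobacco industry"}},
    {"title": "Airline fuel costs", "topics": {"airline industry"}},
]

def semantic_search(query):
    """Return titles tagged with the query entity or anything related to it."""
    wanted = {query} | RELATED.get(query, set())
    return [doc["title"] for doc in DOCUMENTS if doc["topics"] & wanted]

def keyword_search(query):
    """Baseline: return titles that literally contain the query string."""
    return [doc["title"] for doc in DOCUMENTS if query.lower() in doc["title"].lower()]

semantic_hits = semantic_search("Philip Morris")
keyword_hits = keyword_search("Philip Morris")
```

Here the keyword baseline finds only the document that names Philip Morris, while the relation-aware search also returns the study on tobacco laws that the advisor never asked for explicitly.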

The Value of Metadata

Meta tags are keywords used to describe web sites so that search engines can display suitable web pages when queried. Semantic metadata describes content and relates it to specific domains. As new areas are introduced and existing web pages are converted, metadata will play a key role. Consider the name Michael Jackson, which could refer to the entertainer or to a financial analyst on Wall Street. The name would match an entity in the knowledgebase, but there may be more than one Michael Jackson in the knowledgebase, so it should be able to tell them apart based on the context and the content. Deciding which context to choose is called resolving metadata extraction ambiguities. The ability to detect such differences in content will dramatically reduce the number of hits displayed by a semantic search engine.

Information retrieval is the most common activity involving search engines. A keyword or phrase is entered into the search engine, which displays a ranked list of pertinent information based on the criteria specified in the search. Current search engines do a poor job of producing appropriate links. Semantic search engines will be able to deliver a list of pertinent links with greater precision, because a semantic search engine can understand web content on two levels: matching the actual input with appropriate documents, and understanding content areas, trends, and variations in opinion.

In the case of information retrieval, the content of the web page determines which semantic metadata to extract. In the example of Michael Jackson, several contexts would apply, such as the singer and dancer, the celebrity spokesperson, the advocate of children’s causes, and his personal life. Automatic classification technology helps select the context by classifying documents into categories and extracting or inferring semantic metadata from one or more of the contexts.
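One simple way to picture context selection is to score each candidate context by how many of its characteristic terms appear in the page. The Python sketch below does only that: the context names, the term lists, and the overlap score are assumptions made for illustration, not the classification technology the text refers to.

```python
import re

# Hypothetical contexts for the same name, each with characteristic terms.
CONTEXTS = {
    "Michael Jackson (entertainer)": {"album", "singer", "dance", "concert", "music"},
    "Michael Jackson (financial analyst)": {"stock", "analyst", "earnings", "wall", "street"},
}

def tokenize(text):
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def resolve_context(page_text):
    """Return the context whose terms overlap the page most, or None if none match."""
    words = tokenize(page_text)
    scores = {name: len(terms & words) for name, terms in CONTEXTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

music_page = "Michael Jackson released a new album and announced a concert tour."
finance_page = "Analyst Michael Jackson raised his earnings estimate for the stock."
```

Calling resolve_context on each sample page picks the entertainer for the first and the financial analyst for the second, and a page with no characteristic terms yields None rather than a guess.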

Once the semantic search engine determines the context of the information described in the web page, it can explore the relationships that the entity has with other domains. Continuing the same example, if the context selected was the entertainer, it could produce a list of awards won, videos completed, number of albums created, the certification level each album reached (for example, platinum or gold), award shows hosted, popular dances invented, and more.
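Exploring an entity's relationships can be sketched as a query over (subject, relation, object) triples. The triple representation, the relation names, and the small data set in the Python sketch below are assumptions for illustration; the text does not say how the knowledgebase actually stores its relationships.

```python
from collections import defaultdict

# Hypothetical knowledge base of (subject, relation, object) triples.
TRIPLES = [
    ("Michael Jackson (entertainer)", "released_album", "Thriller"),
    ("Michael Jackson (entertainer)", "released_album", "Bad"),
    ("Michael Jackson (entertainer)", "won_award", "Grammy Award"),
    ("Michael Jackson (financial analyst)", "works_in", "Wall Street"),
]

def relationships(entity):
    """Group everything known about `entity` by relation name."""
    grouped = defaultdict(list)
    for subject, relation, obj in TRIPLES:
        if subject == entity:
            grouped[relation].append(obj)
    return dict(grouped)

profile = relationships("Michael Jackson (entertainer)")
```

Because the query is keyed on the resolved entity rather than on the bare name, facts about the financial analyst never leak into the entertainer's profile.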

Semantic search engines are superior in their ability to perform precise searches. Much of the effort put into finding exact and precise information will be a thing of the past. Through inference and the ability to understand content and context, searching will become much more efficient.