Enriching and populating academic ontology for search optimization


Information. It is the one thing that sets the times we live in apart from all that preceded them. Once considered the ultimate differentiator because it was so hard to come by (so hard, indeed, that we believed Information is Power), it has become so abundant and freely available that it should be regarded as the great leveler.

Or should it?

Information is so freely available and in such great quantity that our times have been labeled the Information Age, but in effect this abundance seems to have succeeded only in replacing one problem with another. Where there was too little information available, now there is too much. Whereas finding information at all used to be the greatest challenge, the challenge now is sifting through all the information available and finding what you need when you need it.

People find information on the Internet by searching using search engines. These searches are basically of two kinds. The first kind is navigational, where the searcher knows exactly what resource or page has the required information, and uses the search engine to locate it on the web. The second type of search is research, where the searcher doesn’t know where the required information is located and is trying to locate a document or a collection of documents which would yield the required information.

The most common search engines, such as Google, use programmes called spiders or crawlers, which go from page to page on the net by following links. The pages they visit are then indexed. When a search is initiated, the search engine locates the pages on which the key word or words used in the search are present. The pages that are found are then ranked according to relevance as defined by the programme's algorithm. In the case of Google, the three basic criteria that are used are:

  1. The frequency and location of the key word(s) within the page. If the word is present in the URL or in the header of the page, the page is ranked higher than if the word is present only in the content. The more instances of the word on the page, the higher the rank, and so on.
  2. How long the page has existed. Newer pages are ranked lower than ones that have survived longer.
  3. The number of other pages that link to the page with the key word(s).
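
The three criteria above can be sketched as a toy scoring function. This is only an illustration of the idea, not how any real search engine is implemented: the `Page` fields, the bonus values for the URL and header, and the weights for age and inbound links are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    header: str
    body: str
    age_days: int       # how long the page has existed
    inbound_links: int  # number of other pages that link to this page


def keyword_score(page: Page, keyword: str) -> float:
    """Criterion 1: frequency and location of the key word."""
    kw = keyword.lower()
    # Every occurrence in the content counts once.
    score = float(page.body.lower().count(kw))
    # Illustrative (assumed) bonuses: header and URL matches weigh more.
    if kw in page.header.lower():
        score += 5.0
    if kw in page.url.lower():
        score += 10.0
    return score


def total_score(page: Page, keyword: str) -> float:
    """Combine the three criteria using assumed, arbitrary weights."""
    return (
        keyword_score(page, keyword)   # criterion 1
        + 0.01 * page.age_days         # criterion 2: older pages rank higher
        + 0.5 * page.inbound_links     # criterion 3: pages linking here
    )


def rank(pages: list[Page], keyword: str) -> list[Page]:
    """Keep only pages that contain the key word, best score first."""
    matches = [p for p in pages if keyword_score(p, keyword) > 0]
    return sorted(matches, key=lambda p: total_score(p, keyword), reverse=True)
```

Calling `rank(pages, "ontology")` on a small hand-made list of `Page` objects returns the matching pages in descending order of score; pages that never mention the key word are dropped entirely, which is exactly the limitation that meaning-driven search tries to address.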

The results are displayed as a list of pages ranked as per the criteria above. This method can throw up dozens of pages, sometimes millions of them. The searcher then has to go through as many pages as time and inclination permit to find the ones that are most relevant.

Search accuracy can be improved if the searcher’s intention is understood properly. This can be done only if the search is driven by the meaning of what the searcher specifies rather than by the words alone. This method of searching is called semantic searching. Semantic searching adds little to navigational searching, but is highly effective in retrieving information for research.

As the name implies, semantic searching employs semantics, which is the study of meaning in language. Natural language, which we human beings use to communicate, as opposed to the machine languages used by computers, has rules that govern how words interact, based on their meanings. These rules create the essential framework that enables speakers of a language to understand one another.

Semantic searches can work on the open web or within a closed system. A semantic web search requires the consideration of various points such as context, location, intent, variations and synonyms, generalizations, concepts, etc., all of which qualify natural language queries. Closed systems, on the other hand, can be highly structured, with the information within them organized along well-defined frameworks. Called ontologies, these structural frameworks provide for the representation of entities, ideas and events, along with their properties and relations, according to a system of categories.

An ontology defines a common vocabulary for researchers who need to share information in a domain. Ontologies play an important role in semantic search. The creation of ontologies contributes towards building the knowledge base for intelligent web search. This is particularly important considering that the number of domains within the web that have ontological structures is growing, so much so that they have come to be referred to, collectively, as the Semantic Web.
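
To make the idea of an ontology concrete, here is a minimal sketch of one as a data structure: a set of categories arranged in a hierarchy, plus named relations between them. The `Ontology` class, its method names, and the academic concepts used in the usage note are hypothetical and chosen for illustration; real ontologies are usually written in dedicated languages such as OWL rather than in ad hoc code.

```python
from collections import defaultdict


class Ontology:
    """A toy ontology: a category hierarchy plus named relations."""

    def __init__(self):
        # parent concept -> set of direct sub-concepts
        self.subclasses = defaultdict(set)
        # concept -> set of (relation name, related concept) pairs
        self.relations = defaultdict(set)

    def add_subclass(self, parent, child):
        self.subclasses[parent].add(child)

    def add_relation(self, subject, relation, obj):
        self.relations[subject].add((relation, obj))

    def descendants(self, concept):
        """Return every concept below `concept` in the hierarchy."""
        found = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            # .get avoids creating empty entries in the defaultdict
            for child in self.subclasses.get(current, ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found
```

For instance, after `add_subclass("Person", "Professor")`, `add_subclass("Person", "Student")` and `add_subclass("Student", "ResearchScholar")`, the call `descendants("Person")` yields all three narrower concepts, and `add_relation("Professor", "supervises", "ResearchScholar")` records a property linking two of them. A shared structure of this kind is what gives everyone working in a domain the same vocabulary.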

Retrieving relevant information from this growing corpus is becoming increasingly challenging. The problem is that existing semantic search engines return results that are too inclusive when the focus is on ontologies, and too exclusive when the focus is on individual concepts. Any approach that would help to bridge this gap would help to take semantic search to the next level.

At the IITB-Monash Research Academy in Mumbai, research scholar Chetana Gavankar has come up with a solution that does just that. Working under the guidance of Prof. Yuan-Fang Li and Prof. David Taniar of Monash University, and Prof. Ganesh Ramakrishnan and Prof. N. L. Sarda of IIT Bombay, Chetana has created a novel approach that searches, indexes and ranks concepts across ontologies. This would give search engines the semantic knowledge they need to conduct searches effectively.

For example, the ontology will provide the information that a “student” can be a research scholar or a person pursuing a master's or bachelor's degree. The populated ontology will also provide information on the relationships between various concepts. For instance, it will tell the search engine that “web search” is related to concepts such as algorithms, tools, etc.
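
One way a search engine might use such knowledge is to expand the searcher's query with narrower and related concepts before matching documents. The sketch below is an assumption about how that could look, not a description of the project's actual method; the two dictionaries are a hand-written stand-in for a populated ontology, and `expand_query` is a hypothetical helper.

```python
# Hypothetical, hand-written fragment of a populated ontology.
NARROWER = {
    "student": ["research scholar", "masters student", "bachelors student"],
}
RELATED = {
    "web search": ["algorithms", "tools"],
}


def expand_query(query: str) -> list[str]:
    """Return the query term followed by its narrower and related concepts."""
    term = query.strip().lower()
    expanded = [term]
    for candidate in NARROWER.get(term, []) + RELATED.get(term, []):
        if candidate not in expanded:
            expanded.append(candidate)
    return expanded
```

With this fragment, `expand_query("student")` adds the three kinds of student to the search, while a term the ontology does not know is returned unchanged, so the search simply falls back to plain keyword matching.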

These methods are generic and domain-independent. They would benefit search engine providers as well as end users by optimizing search. Says Chetana, “These would have universal relevance ranging from crop selection queries of a farmer to queries on stem cell research.”

Research scholar: Chetana Gavankar, IITB-Monash Research Academy

Project title: Enriching and populating academic ontology for search optimization

Supervisors: Prof. Yuan-Fang Li, Prof. Ganesh Ramakrishnan, Prof. N. L. Sarda, Prof. David Taniar

Contact details: chetana_gavankar@yahoo.com

Contact research@ for more information on this, and other projects.