Dealing with Structured Documents in Information Retrieval Systems
Abstract
In this paper we suggest how hypertext links and the content of HTML pages can be used to cluster pages into what we call Web documents. We put forward a method to automatically construct a hierarchy of Web documents and, with the help of an abstraction function, the context hierarchy of a site. This hierarchy is represented by a graph whose links are structurally typed. Structural links between nodes reveal a context relationship. The context hierarchy, together with the graph of the pages underlying the site, is used to better index and retrieve the pages. Furthermore, it permits a new operator to be added to the IRS (Information Retrieval System) query language, whereby the user will be able to differentiate the context from the subject of his queries.