Monday 30 November 2009

Week 8 - Information Retrieval

Information Retrieval (IR) has been defined in many ways. Information-seeking behaviour is the process of finding information relevant to the task at hand, given the state of the cognitive environment (Wilson 1999). This seeking can be conscious or subconscious.

Formal definitions of IR can be grouped by perspective:

The User – resolving their anomalous state of knowledge (ASK) – Belkin, Oddy and Brooks (1982).
The System – software and hardware: persistent storage, processing and retrieval.
The Sources – presentation, e.g. Reuters, AFP and Dialogue, who supply information.

Needs can also be classified:

Known Item retrieval – I want to know about the film ‘Orient Express’.
Fact retrieval – Time of trains from Weybridge to Waterloo.
Subject – How often did the trains run on time last year?
Exploratory – What information can be found about types of trains in Britain?

Indexing keywords or concepts allows users to find what they’re looking for. It is important to identify appropriate fields. In the example above this could be destination stations, departure stations, times of trains, types of trains.
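As a rough illustration of indexing by field, here is a minimal Python sketch. The records, the field names and the helper functions are all hypothetical, made up from the train example above, and this is only one simple way to do it.

```python
from collections import defaultdict

# Hypothetical timetable records; the field names follow the examples in the text.
RECORDS = [
    {"departure": "Weybridge", "destination": "Waterloo", "time": "08:15", "type": "stopping"},
    {"departure": "Weybridge", "destination": "Waterloo", "time": "08:45", "type": "fast"},
    {"departure": "Woking", "destination": "Waterloo", "time": "09:00", "type": "fast"},
]


def build_field_index(records):
    """Map each (field, value) pair to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        for field, value in record.items():
            index[(field, value.lower())].append(position)
    return index


def lookup(index, field, value):
    """Return the positions of records whose field equals value (case-insensitive)."""
    return index.get((field, value.lower()), [])
```

Calling lookup(build_field_index(RECORDS), "departure", "Weybridge") returns the positions of the two Weybridge records without scanning every record.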

Metadata, such as Dublin Core, is one form of indexing. One must also identify the words (parsing), which means deciding how to treat digits, non a-z characters in words, case, compound words and characters.

It may be useful to remove stop words such as "to", "be", "or", "not" and "and". Stemming can also prove valuable: water, waters, watered and watering all conflate to water.
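The parsing, stop-word and stemming steps can be sketched in a few lines of Python. This is only an illustration: the stop-word list is just the examples above, and the suffix stripping is a deliberately naive stand-in for a real stemmer such as Porter's. The function names are my own.

```python
import re

# Illustrative stop-word list taken from the examples in the text.
STOP_WORDS = {"to", "be", "or", "not", "and"}

# Naive suffix list; a real system would use a proper stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def tokenise(text):
    """Lower-case the text and split it into runs of a-z letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the stemmed terms of a text with stop words removed."""
    return [stem(token) for token in tokenise(text) if token not in STOP_WORDS]
```

With these rules water, waters, watered and watering all reduce to water, although such crude stripping would mangle many other words.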

Specifying a list of synonyms can also provide more accurate results. At NPIA, for example, the Neighbourhood Policing team is also known as the Citizen Focus team or Local Force Policing; all are relevant terms.
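A synonym list can be applied by expanding the user's query before searching. The sketch below assumes a hand-maintained list of synonym groups; the structure and the function name are hypothetical, and only the phrases come from the NPIA example.

```python
# Hypothetical synonym groups; the phrases come from the NPIA example in the text.
SYNONYM_GROUPS = [
    {"neighbourhood policing", "citizen focus", "local force policing"},
]


def expand_query(phrase):
    """Return the phrase plus any listed synonyms, sorted for stable output."""
    key = phrase.lower().strip()
    for group in SYNONYM_GROUPS:
        if key in group:
            return sorted(group)
    # No synonyms known: search for the phrase on its own.
    return [key]
```

A search for "Citizen Focus" would then be run against all three phrases, so entries filed under any of the team's names are found.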

Index structures vary, each with its own advantages and disadvantages. There are also two main types of searching: exact match and best match. Exact match suits targeted, specific or narrow searches; best match suits cases where the user wants a range of possible results.
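To make the difference between the two search types concrete, here is a small Python sketch using an inverted index, which is one common index structure. The three documents are made up from the train examples, exact match is treated as a Boolean AND of the query words, and best match is ranked simply by counting matching words. Real systems use more refined weighting, so treat this as an assumption-laden toy.

```python
from collections import defaultdict

# Hypothetical mini-collection, loosely based on the train examples in the text.
DOCS = {
    1: "trains from weybridge to waterloo",
    2: "types of trains in britain",
    3: "waterloo station departures",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def exact_match(index, query):
    """Return ids of documents containing every query word (Boolean AND)."""
    postings = [index.get(word, set()) for word in query.split()]
    return sorted(set.intersection(*postings)) if postings else []


def best_match(index, query):
    """Rank documents by how many distinct query words they contain, best first."""
    scores = defaultdict(int)
    for word in set(query.split()):
        for doc_id in index.get(word, set()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For the query "trains waterloo", exact match returns only document 1, while best match returns all three documents with document 1 ranked first.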

To make the information on my blog retrievable, I needed to index it by applying appropriate tags. All my entries have been tagged with DITA and relevant subject tags, which provides quick access to each week’s entries.
