Information Retrieval (IR) has been defined in many ways. Information-seeking behaviour is the process of finding information relevant to the task at hand, given the state of the cognitive environment (Wilson 1999). This seeking can be conscious or subconscious.
Formal definitions of IR can be grouped:
The user’s perspective – resolving their anomalous state of knowledge (ASK) – Belkin, Oddy and Brooks (1982).
The system – software and hardware: persistent storage, processing and retrieval.
The sources – presentation, e.g. Reuters, AFP and Dialogue, who supply information.
Needs can also be classified:
Known Item retrieval – I want to know about the film ‘Orient Express’.
Fact retrieval – Time of trains from Weybridge to Waterloo.
Subject – How often did the trains run on time last year?
Exploratory – What information can be found about types of trains in Britain?
Indexing keywords or concepts allows users to find what they’re looking for. It is important to identify appropriate fields. In the example above this could be destination stations, departure stations, times of trains, types of trains.
Metadata such as the Dublin Core is one form of this. One must also identify the words (parsing), which means deciding how to treat digits, non a-z characters within words, case, compound words and characters.
It may be useful to remove stop words (to, be, or, not, and, etc.). Stemming can also prove valuable: water, waters, watered and watering all conflate to water.
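The parsing, stop word and stemming steps above can be sketched in a few lines of Python. This is only a rough illustration: the stop list is taken from the examples in the text, and the suffix-stripping rule is a toy assumption, not an established stemming algorithm.

```python
import re

# Stop list taken from the examples in the text (illustrative only).
STOP_WORDS = {"to", "be", "or", "not", "and"}

# Suffixes stripped by this toy stemmer, tried in this order (an assumption).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Tokenize, drop stop words, then stem what is left."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]
```

Calling index_terms on the four water words returns the same stem for each, which is what lets a search for one form find the others.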
Specifying a list of synonyms can also provide more accurate results. At NPIA we refer to the Neighbourhood Policing team, also known as the Citizen Focus team or Local Force Policing; all are relevant terms.
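One simple way to use such a synonym list is to expand the query before searching. The sketch below assumes a hand-built list of equivalent terms; the names SYNONYMS and expand_query are made up for illustration.

```python
# Hypothetical synonym list built from the team names mentioned above.
SYNONYMS = [
    {"neighbourhood policing", "citizen focus", "local force policing"},
]


def expand_query(term):
    """Return the term plus any listed synonyms, sorted for a stable order."""
    key = term.lower()
    for group in SYNONYMS:
        if key in group:
            return sorted(group)
    return [key]
```

A search for any one of the three names would then be run against all three.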
Various index structures are used, each with its own advantages and disadvantages. There are also two main types of searching: exact match and best match. Exact match suits targeted, specific or narrow searches; best match suits cases where the user wants a range of possible results.
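To make the difference between exact and best match concrete, here is a minimal sketch using an inverted index (one common index structure, chosen here as an assumption). The three documents are invented from the train examples above, and the best match score is simply the number of query words a document contains.

```python
from collections import defaultdict

# Hypothetical three-document collection, for illustration only.
DOCS = {
    1: "trains from weybridge to waterloo",
    2: "types of trains in britain",
    3: "film orient express",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def exact_match(index, query):
    """Return ids of documents containing every query word (Boolean AND)."""
    sets = [index.get(word, set()) for word in query.split()]
    return sorted(set.intersection(*sets)) if sets else []


def best_match(index, query):
    """Rank documents by how many query words they contain, highest first."""
    scores = defaultdict(int)
    for word in query.split():
        for doc_id in index.get(word, set()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Exact match returns nothing unless every word is present, whereas best match still returns partially matching documents, ordered by how well they fit.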
To retrieve information from my blog I needed to index it appropriately by applying appropriate tags. All my entries have been tagged with DITA and relevant subject tags, thereby providing quick access to each week’s entries.
Monday, 30 November 2009
Friday, 27 November 2009
Week 7 - Databases
The original file-based approach was slow and cumbersome, with a high risk of error, and often resulted in duplicated information. The database management system (DBMS) was developed next, using a centralised approach: data is stored once and administered centrally.
The DBMS acts as the interface between the users and the data. It provides improved security via the Administrator, who can grant or revoke rights. It provides a comprehensive collection of data instead of multiple files and also offers support via recovery routines and centralised back-up.
The Relational Data Model consists of two-dimensional tables of rows and columns (columns are sometimes called fields or attributes). One column is generally assigned as the primary key, and no two records can contain the same primary key value.
Most DBMSs can automatically generate a numerical primary key, thereby avoiding problems such as two people sharing a surname. Tables can be related to each other: when the primary key of one table is referenced in another table, it is called a foreign key there. This acts rather like an index into the first table.
The most common language used with a DBMS is SQL (Structured Query Language). Here’s a simple example of a table depicting the structure of our business within the National Policing Improvement Agency (NPIA), ‘created’ in SQL.
CREATE TABLE Business_Units (Bus_Unit_No INT PRIMARY KEY, Bus_Name CHAR(32), Bus_Dir CHAR(32), Location CHAR(16));
INSERT INTO Business_Units VALUES (1, 'Training', 'Professional Development', 'London');
The SELECT statement begins the query:
SELECT * -- selects all columns
FROM Business_Units
WHERE Bus_Unit_No = 1
AND Location = 'London';
Various comparison operators can be used with WHERE: =, <, > etc.
% operates like a wildcard, e.g. WHERE Bus_Name LIKE 'C%' would search for all units with names beginning with C. Logical operators can also be used, such as AND, OR and NOT.
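As a way of trying these statements out, the sketch below runs them against an in-memory SQLite database using Python's standard sqlite3 module. The table and column names follow the example, written with underscores instead of spaces so that they are valid SQL, and the second row is invented purely to give the LIKE query something to find.

```python
import sqlite3

# An in-memory database, discarded when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Business_Units ("
    " Bus_Unit_No INTEGER PRIMARY KEY,"
    " Bus_Name CHAR(32), Bus_Dir CHAR(32), Location CHAR(16))"
)
conn.executemany(
    "INSERT INTO Business_Units VALUES (?, ?, ?, ?)",
    [
        (1, "Training", "Professional Development", "London"),
        # Invented row, for illustration only.
        (2, "Citizen Focus", "Example Directorate", "London"),
    ],
)

# Comparison operators combined with AND.
exact = conn.execute(
    "SELECT * FROM Business_Units WHERE Bus_Unit_No = ? AND Location = ?",
    (1, "London"),
).fetchall()

# % as a wildcard: every unit whose name begins with C.
wild = conn.execute(
    "SELECT Bus_Name FROM Business_Units WHERE Bus_Name LIKE 'C%'"
).fetchall()
```

Passing values as ? parameters, rather than pasting them into the SQL string, leaves the quoting to the database library.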
A DBMS becomes much more powerful when joining tables. It’s important to match primary keys to foreign keys in the join; otherwise every row of one table is paired with every row of the other, producing useless rows of information.
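The effect of matching keys in a join can be seen in a small sketch, again using sqlite3. The Staff table and all of its contents are hypothetical, added only so that there is a foreign key to join on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Business_Units (
        Bus_Unit_No INTEGER PRIMARY KEY,
        Bus_Name CHAR(32)
    );
    -- Hypothetical second table; Bus_Unit_No here is the foreign key.
    CREATE TABLE Staff (
        Staff_No INTEGER PRIMARY KEY,
        Staff_Name CHAR(32),
        Bus_Unit_No INTEGER REFERENCES Business_Units (Bus_Unit_No)
    );
    INSERT INTO Business_Units VALUES (1, 'Training'), (2, 'Research');
    INSERT INTO Staff VALUES (1, 'A. Example', 1), (2, 'B. Example', 2);
""")

# No join condition: every staff row is paired with every unit row.
unmatched = conn.execute(
    "SELECT Staff_Name, Bus_Name FROM Staff, Business_Units"
).fetchall()

# Foreign key matched to primary key: one meaningful row per person.
matched = conn.execute(
    "SELECT Staff_Name, Bus_Name FROM Staff"
    " JOIN Business_Units ON Staff.Bus_Unit_No = Business_Units.Bus_Unit_No"
    " ORDER BY Staff_No"
).fetchall()
```

With two rows in each table the unmatched query returns four rows, half of them pairing a person with the wrong unit; the matched query returns two.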
Labels: databases, DBMS, DITA, Module 7, Relationship Data Model, SQL, Structured Query Language
Wednesday, 11 November 2009
Week 6 - DOM & CSS
The Document Object Model (DOM) is a way of thinking about documents as a hierarchical structure. Starting from a root, a document forms a tree of nodes: element nodes can have subnodes, referred to as children, but text nodes cannot. Elements can also have attributes with values; one such attribute is the 'id', and if an element has an 'id' its value must be unique within the document. The DOM helps to clarify data hierarchies and the use of Cascading Style Sheets (CSS).
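The tree idea is easier to see with a small document. The sketch below parses a made-up fragment with Python's standard xml.dom.minidom module, prints nothing itself, and builds an outline of the node tree plus a list of the 'id' values found; the markup and the helper names walk and ids are assumptions for illustration.

```python
from xml.dom import minidom

# Made-up, well-formed markup for illustration.
MARKUP = (
    '<html><body><h1 id="title">Week 6</h1>'
    '<p id="intro">DOM and CSS</p></body></html>'
)

doc = minidom.parseString(MARKUP)
root = doc.documentElement


def walk(node, depth=0):
    """Return an indented outline of the tree, one entry per node."""
    if node.nodeType == node.TEXT_NODE:
        # Text nodes are leaves: they never have children.
        return ["  " * depth + "text: " + node.data]
    lines = ["  " * depth + "element: " + node.tagName]
    for child in node.childNodes:
        lines.extend(walk(child, depth + 1))
    return lines


def ids(node):
    """Collect every id attribute value found on element nodes."""
    found = []
    if node.nodeType == node.ELEMENT_NODE:
        if node.hasAttribute("id"):
            found.append(node.getAttribute("id"))
        for child in node.childNodes:
            found.extend(ids(child))
    return found


outline = walk(root)
all_ids = ids(root)
```

Comparing len(all_ids) with len(set(all_ids)) is a quick check that no 'id' has been used twice.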
CSS provides a language used to apply consistent formatting to web pages. It separates out the presentation semantics, the look and feel, from the actual content itself. It is most commonly used to style web pages in HTML and XHTML but can also be applied to XML.
Designers used to arrange graphical content on a page using raster images created in graphics packages; however, these take up a lot of space and make content difficult to search. CSS provides the rules for the layout of information using specific syntax. It specifies such things as typeface, font size, colour, positioning and spacing.
Whilst there are lots of positives to using CSS, it’s important to understand some of the disadvantages. The additional download of the CSS file slows down delivery of the page initially. CSS uses different syntax to HTML, so it is something else to learn! A stylesheet can also be easily overwritten, as it is open and text based, so it offers little security. It can also complicate templates if using Dreamweaver or a CMS. Nevertheless, the pros far outweigh the cons.
The term cascading describes how styles defined later override those defined earlier. For example, a link to a web level CSS file may precede a link to a site level file, which in turn precedes a page level style definition. Assuming that all three define level 1 headings with different colours, for instance blue, red and green, and the rules are otherwise equally specific, the last definition will take precedence and the headings will end up green.
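The ordering rule in this example can be modelled as a simple merge in which later declarations overwrite earlier ones. This is a deliberately simplified sketch: it ignores selector specificity and other parts of the real cascade, and the font-size declaration is added only to show that properties not redefined later survive.

```python
# Hypothetical stylesheets, listed in the order they are linked.
SHEETS = [
    ("web level", {"h1": {"color": "blue", "font-size": "2em"}}),
    ("site level", {"h1": {"color": "red"}}),
    ("page level", {"h1": {"color": "green"}}),
]


def resolve(sheets, selector):
    """Merge declarations for one selector; later sheets overwrite earlier ones."""
    result = {}
    for _name, rules in sheets:
        result.update(rules.get(selector, {}))
    return result
```

Resolving h1 here gives green headings that still keep the font size set by the first stylesheet.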
Links to my CSS example1 and example 2
Here's Zen Garden's fantastic example of the power of CSS.