Monday, 21 December 2009

References and Resources

Weblinks
Link to my Blog

Link to my Webspace

Link to JavaScript

Bibliography:
I've listed below the main sites used online; I researched many more, too numerous to list. These web sites helped to define the importance of IA, IR and the many other subjects this course has debated and discussed.

http://www.webmonkey.com/reference/Unix_Guide
For research into using Unix.

http://www.city.ac.uk/tsg/unix/
Research into Unix basics.

http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/
The mass production of information, and the need for top-class Information Architecture.

http://en.wikipedia.org/wiki/
Used for a further understanding of the definitions of digital information and of information as a concept.

http://www.amazon.co.uk/gp/product/0596514433/ref=cm_rdp_product
Web 2.0 Architectures: What entrepreneurs and information architects need to know

http://www.w3schools.com/
Great all round tutorial guide and help

http://www.webmonkey.com/
Good tutorials, easy to use

http://www.siriusweb.com/tutorials/gifvsjpg/
Good source for background info on images. GIF v JPG

http://www.scantips.com/basics09.html
Comparison GIF, JPG and PNG

http://dublincore.org/documents/dcmes-xml/
General background for information retrieval, indexing and XML information

http://www.brighthub.com/internet/web-development/articles/25619.aspx?p=2
CSS - Pros and Cons

http://www.csszengarden.com/
Great site clearly demonstrating the impact of CSS and its capabilities

http://www.bing.com/
http://www.google.com/
http://searchenginewatch.com/
http://www.searchtools.com

Reference Sources
Belkin, N.J., Oddy, R.N. and Brooks, H.M. (1982), ASK for information retrieval: Part I. Background and theory. Journal of Documentation, 38(2).

Wilson, T.D. (1999) Models in information behaviour research. Journal of Documentation 55 (3), 249-270.

Rosenfeld, L. and Morville, P. (2007), Information Architecture for the World Wide Web (3rd Edition), Sebastopol, CA.: O'Reilly, 504 pp.
The third edition of an excellent introduction to the design, development and delivery of large-scale web sites.

Monday, 14 December 2009

Week 10 - Information Architecture

Information Architecture (IA) is the art and science of organising content: the ordering of information, and the categorising and labelling of data. Web site design and build are critical to the success of a site, but they are not IA itself. IA centres on the findability and usability of information.

We constantly expand upon the traditional methods used in libraries, but the www reaches further and wider, and IA evolves with improved methods and techniques. Louis Rosenfeld and Peter Morville (2007) have much to say on the issue of IA, one of their more obvious suggestions being that web site producers are experienced web consumers.

We all know what we like and don’t like about particular websites. Information architecture plays a critical role in this.

Some key considerations for quality IA:

Organisation systems – alphabetical, chronological, geographical.
Ambiguous schemes – topical, task- or audience-specific. These can be good for browsing; personally I often defer to search in these instances.
Labelling, thesauri and synonyms – important for indexing the content, sometimes with narrow specific terms, at other times broad general ones.
Navigation – intuitive and scalable, with links provided throughout the site to hold and capture the user's imagination.
Search – ranging from simple search to more advanced, tailored to users' requirements.
Visual appearance – good design and aesthetics; clarity is key.
Writing styles – tone and style suited to the audience; these can differ substantially.
Personalisation and customisation – both can offer great features, but they need to be bespoke for the audience.

The DITA module provides a taster and background to a broad range of topics. The impact on the digital structure of content has undoubtedly had an impact on the way I work day to day, even the simple things such as accurate filing and folder systems, down to better understanding of the implications of changing business requirements and matching this to IT support and potential development of our online services.

Week 9 - Client side programming

Client side processing is often used when a web page requires high levels of interaction, with frequent changes and updates. Server side processing would be used if interaction between the client and the server were needed for each query; think of Streetmap.co.uk. Client side applications generally transfer less data and respond more quickly, and can therefore be interactive. Certain forms of data transfer, such as streaming media, use both server and client to balance speed and accuracy. Plugin software can sometimes be downloaded to speed up the process.

Getting computers to talk to each other is achieved using a programming language. Each language has standard terms and carries instructions. In our session we reviewed JavaScript: very powerful but very unforgiving – code must match the syntax exactly.

Almost all programming languages follow seven fundamental concepts:

Variables – the buckets that hold the data.
Input and Output – JavaScript can both read and manipulate the structure and content of the HTML and CSS.
Arrays – think of an ordered list of buckets (variables).
Sequence – as the word implies, one thing happens after another.
Selection – making code act only when certain conditions are met: if this, then that.
Iteration – repeating an action, often with a While statement.
Procedures – (also called functions, methods or sub-routines) the ability to define commands in your own language.
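These ideas can be sketched in a few lines of JavaScript. The names and messages below are my own invention, purely to show each concept in miniature:

```javascript
// Variables - buckets that hold data
var greeting = "Hello";

// Arrays - an ordered list of buckets
var students = ["Ann", "Bob", "Cat"];

// Procedures (functions) - commands in your own language
function describe(name) {
  // Selection - act only when a certain condition is met
  if (name === "Ann") {
    return greeting + ", " + name + " (first on the list)";
  }
  return greeting + ", " + name;
}

// Sequence and Iteration - one thing after another, repeated with a While loop
var messages = [];
var i = 0;
while (i < students.length) {
  messages.push(describe(students[i]));
  i = i + 1;
}

// Output - in a browser this might update the HTML; here we simply log it
console.log(messages.join("; "));
```

In a real page the output step would write into the document rather than the console, but the seven concepts are the same.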

We started to build a simple JavaScript in the lab to answer the exercise laid out in this section. Time in the lab was limited, but further discussion with colleagues – face to face, via discussion boards on CitySpace and by email – resulted in successful completion of the exercise. Take a look. The file is placed separately in my Unix folder to display the JavaScript; some interactivity was lost in the transfer.

Andy’s advice of taking it step by step is critical. How to eat an elephant – in small chunks.

Monday, 30 November 2009

Week 8 - Information Retrieval

Information Retrieval (IR) can be defined in many ways. The seeking involved can be conscious or subconscious: 'Information-seeking behaviour is the process of finding information relevant to the task at hand, given the state of the cognitive environment' (Wilson 1999).

Formal definitions of IR can be grouped:

The user's perspective – resolving their anomalous state of knowledge (ASK) – Belkin, Oddy and Brooks (1982).
The system – software and hardware: persistent storage, processing and retrieval.
The sources – presentation, e.g. Reuters, AFP or Dialogue, who supply information.

Needs can also be classified:

Known Item retrieval – I want to know about the film ‘Orient Express’.
Fact retrieval – Time of trains from Weybridge to Waterloo.
Subject – How often did the trains run on time last year?
Exploratory – What information can be found about types of trains in Britain?

Indexing keywords or concepts allows users to find what they’re looking for. It is important to identify appropriate fields. In the example above this could be destination stations, departure stations, times of trains, types of trains.

Metadata schemes such as the Dublin Core are one form of indexing. One must also identify the words themselves (parsing): how to treat digits, non a–z characters in words, case and compound words.

It may be useful to remove stop words ('to', 'be', 'or', 'not', 'and', etc.). Stemming can also prove valuable: water, waters, watered and watering all conflate to water.
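These steps can be sketched in JavaScript. The stop-word list and suffix rules below are my own crude illustration, not a real stemmer such as Porter's:

```javascript
// A toy indexing pipeline: parse, remove stop words, then stem.
var STOP_WORDS = ["to", "be", "or", "not", "and", "the"];

function indexTerms(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)               // parsing: split on non a-z characters
    .filter(function (w) {             // drop empty strings and stop words
      return w.length > 0 && STOP_WORDS.indexOf(w) === -1;
    })
    .map(function (w) {                // naive stemming: strip common suffixes
      return w.replace(/(ing|ed|s)$/, "");
    });
}

console.log(indexTerms("The waters watered and watering"));
// -> [ 'water', 'water', 'water' ]
```

All three surviving words conflate to 'water', so a search for any of them would match the same index entry.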

Specifying a list of synonyms can also provide more accurate results. At NPIA we refer to The Neighbourhood Policing team, aka Citizen Focus team or Local Force Policing. All relevant terms.

Various index structures are used, each with its own advantages and disadvantages. There are also two main types of searching: exact match and best match. Exact match suits targeted, specific or narrow searches; best match is used when the user's query admits various possibilities.

To retrieve information from my blog I needed to index it appropriately by applying suitable tags. All my entries have been tagged with DITA and relevant subject tags, thereby providing quick access to each week's entries.

Friday, 27 November 2009

Week 7 - Databases

The original file-based approach was slow and cumbersome, with a high risk of error, and often resulted in duplicated information. The database management system (DBMS) that followed was developed using a centralised approach: data is stored once and centrally administered.

The DBMS acts as the interface between the users and the data. It provides improved security via the administrator, who can grant or revoke rights. It provides a comprehensive collection of data instead of multiple files, and also offers support via recovery routines and centralised back-up.

The Relational Data Model consists of two-dimensional tables of rows and columns (columns are sometimes called fields or attributes). One column is generally assigned as the primary key, and no two records can contain the same value in it.

Most DBMSs can automatically generate a numerical primary key, thereby avoiding problems such as two records sharing the same surname. Tables can be related to each other: when the primary key of one table is referenced in another, it is called a foreign key there and acts as an index.
The most common language used with a DBMS is SQL (Structured Query Language). Here's a simple example of a table depicting the structure of our business within the National Policing Improvement Agency (NPIA), 'created' in SQL.

CREATE TABLE Business_Units (Bus_Unit_No INT PRIMARY KEY, Bus_Name CHAR(32), Bus_Dir CHAR(32), Location CHAR(16));

INSERT INTO Business_Units VALUES (1, 'Training', 'Professional Development', 'London');

The SELECT statement begins a query:
SELECT * (the * selects every column)
FROM Business_Units
WHERE Bus_Unit_No = 1
AND Location = 'London';

Various comparison operators can be used with WHERE: =, <, > etc.
% operates as a wildcard, e.g. WHERE Bus_Name LIKE 'C%' would find all units whose name begins with C. Logical operators can also be used: AND, OR, NOT.

A DBMS becomes much more powerful when joining tables. It's important to join on matching primary and foreign keys to avoid meaningless combinations of rows.

Wednesday, 11 November 2009

Week 6 - DOM & CSS

The Document Object Model (DOM) is a concept used when designing documents, assuming they follow the usual hierarchical structure. Starting from a root, a document forms a tree of nodes; element nodes can have subnodes, referred to as children, but text nodes cannot. An element can also have attributes with values; if it has an 'id' attribute, that id must be unique within the document. The DOM helps to clarify data hierarchies and the use of Cascading Style Sheets (CSS).
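The tree idea can be illustrated with a toy model in JavaScript. This is my own sketch of the concept, not the real browser API; the findById function only mimics what document.getElementById does:

```javascript
// A toy document tree: element nodes have children, text nodes do not.
var doc = {
  type: "element", tag: "html", children: [
    { type: "element", tag: "body", children: [
      { type: "element", tag: "h1", id: "title", children: [
        { type: "text", value: "Hello DOM" }
      ]}
    ]}
  ]
};

// Walk the tree looking for an element with a given (unique) id.
function findById(node, id) {
  if (node.type !== "element") return null;   // text nodes are leaves
  if (node.id === id) return node;
  for (var i = 0; i < node.children.length; i++) {
    var hit = findById(node.children[i], id);
    if (hit) return hit;
  }
  return null;
}

console.log(findById(doc, "title").tag);  // -> h1
```

Because ids must be unique, the first match found is the only possible match.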

CSS provides a language for applying consistent formatting to web pages. It separates the presentation semantics – the look and feel – from the actual content itself. It is most commonly used to style web pages in HTML and XHTML, but can also be applied to XML.

Designers used to arrange graphical content on a page as rasters produced with graphics packages; however, these use lots of space and make content difficult to search. CSS instead provides rules for the layout of information using a specific syntax, specifying such things as typeface, font size, colour, positioning and spacing.

Whilst there are lots of positives to using CSS, it's important to understand some of the disadvantages. The additional download of the CSS file slows the initial delivery of a page. CSS uses a different syntax to HTML, so it is something else to learn! Being open, text-based files, stylesheets can also easily be overridden, so they offer little security. They can also complicate templates when using Dreamweaver or a CMS. Nevertheless, the pros far outweigh the cons.

The 'cascading' term describes how style rules defined later override those defined earlier. For example, a link to a web level CSS file may precede a link to a site level file, which in turn precedes a page level style definition. If all three define level 1 headings in different colours – say blue, red and green – the last definition takes precedence and the headings end up green.

Links to my CSS example1 and example 2

Here's Zen Garden's fantastic example of the power of CSS.

Thursday, 29 October 2009

Week 5 - eXtensible Markup Language (XML)

XML, the eXtensible Markup Language, is a set of rules used to describe or create markup languages; it is not a language in itself. Produced by the World Wide Web Consortium (W3C), it is particularly good at assisting with the transfer of information without losing its meaning. It is not designed for searching, but one could transfer XML data into a database to allow searching.

XML consists of a series of elements; these may also have attributes with certain values.

Elements – everything contained within a tag or between a pair of matching opening and closing tags.

Attributes – name/value pairs that assign properties to the given occurrence of a tag. (Course Notes: Lecture 5. 2.2)

A simple XML file can just define the semantics of the information and no more: only what the data means, with nothing further added. A Document Type Definition (DTD) is a collection of declarations providing an unambiguous description of the XML – the legal structure, elements, attributes and values that can be used. All XML should be well formed, conforming to a set of syntax rules (e.g. a single root element, every tag closed). It should also be valid, conforming to the grammatical rules set out in the DTD.
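As a small illustration (the element and attribute names are my own invention for the example, and the ISBN value is a placeholder), a well-formed XML file with its DTD declared inline might look like this:

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: names invented for illustration only -->
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (title, author)>
  <!ATTLIST book isbn CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<library>
  <book isbn="0000000000">
    <title>Information Architecture for the World Wide Web</title>
    <author>Rosenfeld and Morville</author>
  </book>
</library>
```

The file is well formed (one root element, all tags closed) and valid against its own DTD: a library must contain at least one book, and every book must carry an isbn attribute.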

An XML application specifically appropriate to my field of work would be one whose Document Type Definition (DTD) describes the books we might store in the National Policing Improvement Agency library. Here's a list of some of the books I’ve used so far in my MSc course.

The second example is a copy of code to deliver an RSS feed of vacancies on the NPIA main website. Simon, a web developer in my team, walked me through this step by step to help with my understanding and education for this DITA course. I found this invaluable. Makes so much more sense when it’s real.

Sunday, 25 October 2009

Week 4 - Graphics & Images

This session introduced two ways to capture graphical information, Raster and Vector.

Rasters work well for complex images. They consist of discrete units – cells – that divide up space, so the information recorded can be quite blocky in appearance. Each cell in a sequence, usually arranged in a grid, is allocated a numeric value that defines the shape of what it represents; the value may also represent colour and shade. Raster cells are often referred to as 'pixels' (picture elements).

The vector approach divides information into discrete geometric elements, recording their position and shape using a coordinate system; it is useful for images that have defined boundaries.

The Graphics Interchange Format (GIF) and the Joint Photographic Experts Group (JPEG) format are the two most commonly used raster formats. GIF is an 8-bit format, using 8 bits for every pixel and so offering 256 distinguishable colours. It is best used at a 2:1 or 3:1 compression ratio, and is generally more suitable for less complex images such as graphs or diagrams.

JPEG is defined by ISO standard 10918 (1991). As a 24-bit format it can contain around 16 million colours. Its complex compression techniques sacrifice some of the original data, so it is often recommended to save originals before loading images onto the www. It can compress to around 20:1, and often the alteration of colour cannot be seen by the human eye. JPEGs generally work well with photography and detailed imagery.
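The arithmetic behind those colour counts is just powers of two; a quick JavaScript check:

```javascript
// Colours available at a given bit depth: 2 raised to the number of bits per pixel.
var gifColours = Math.pow(2, 8);    // GIF: 8 bits per pixel
var jpegColours = Math.pow(2, 24);  // JPEG: 24 bits per pixel

console.log(gifColours);   // -> 256
console.log(jpegColours);  // -> 16777216, roughly 16 million
```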

The use of Portable Network Graphics (PNG) is not as prolific but is increasing; it is a useful alternative, designed to combine the best of JPEG and GIF. Again, it is wise to store a master copy.

GIF, JPEG and, increasingly, PNG images can be included in web pages, either embedded within the document or included as links to a separate source, using the appropriate img tags.

Monday, 12 October 2009

Week 3 - Internet & www

Using the Internet as the highway and www as the car metaphor puts things into perspective for me. Email is just another car that uses the same highway.

The internet provides the infrastructure that allows computers to communicate around the world using the Internet protocol suite (IP). It was originally set up by the American military as an 'intelligence' network. The www is a service available on the internet that allows hyperlinked documents to be stored and viewed by computers using a set of protocols, or instructions. It was invented at CERN in the early 1990s. Both the Internet and the www were funded with public money.

The www needs client machines to function and operate – e.g. laptops, desktops and mobile devices – that respond to user input. These in turn talk to a high-powered server that waits for requests from the 'client'. They communicate using the Transmission Control Protocol/Internet Protocol (TCP/IP), which splits the data into 'packets' that travel independently and are reassembled on the client side to produce the page. A web site can receive millions of requests per second; a single web page may involve 100 file requests to pull together all of its information. It is also possible for one machine to act as both server and client.

Computers are identified by their Internet Protocol (IP) address, in effect a unique number. Domain name servers then map these numbers to human-readable names, which form part of a Uniform Resource Locator (URL).

Hypertext is text, presented digitally, that incorporates links from one piece of information to another, allowing the user to move between documents in a non-linear fashion.

Links embedded in pages are added as meta information using a markup language. HTML is one such markup language, allowing links to documents held anywhere on the Internet.

Thursday, 8 October 2009

Week 2 - Digital Representation

Session 2, learning the basics of what makes a computer tick.

Data can be stored, represented and managed in different ways. Computers use two electronic states, positive and non-positive, so they use a binary system of just two digits – bits – usually represented by 0 and 1. Data is stored inside the computer in various sizes, from a single bit upwards; a collection of 8 bits is called a byte, and computers are generally built to deal with multiples of 8 bits: 8, 16, 32, 64, 128.

This binary information can represent many other kinds of information. Software applications and operating systems need data to be structured in a specific way in order to interpret and process it; this structure is referred to as the file format. ASCII – the American Standard Code for Information Interchange – provides a common coding system. It uses 7-bit binary sequences to represent characters, which can then be displayed as text or, more accurately, ASCII 'text' format.
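A one-character example in JavaScript (the choice of 'A' is arbitrary):

```javascript
// The character "A" as an ASCII code and as binary digits.
var code = "A".charCodeAt(0);   // the ASCII code point
var bits = code.toString(2);    // the same number written in binary

console.log(code);  // -> 65
console.log(bits);  // -> "1000001", a seven-bit sequence
```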

In order for the computer to interpret and apply meaning to information, it needs data about the data – metadata! Markup is a way of including this metadata. It can be semantic – describing what the data means (sometimes including presentation) – or presentational – purely how it's displayed.

Digital information on a certain topic or theme is collected into distinct packages – files and folders – just like a traditional library.

Documents can be made up of one file or more, and files may belong to more than one document. One file can also contain information from another file: if the information is embedded in the document it's a 'file-centred' view; if simply linked, it's 'document-centred'.

This organised hierarchical structure is the beginning of information architecture. Groupings of files are often referred to as folders or directories. The top level is known as the 'root', a unique location in the hierarchical structure, below which folders can sit within folders, within sub-directories. Tracing a file back to its location gives the file's path name.

Sunday, 4 October 2009

Week 1 - Blogging & DITA

Daunting. Week 1 as a mature student, looking forward to the challenge and here goes with the first session, learning all about DITA and how to use a blog to record progress!

Digital technologies – computers – don't deal with analogue words, pictures or images; they need numeric information to translate and interpret, so analogue material must be digitally converted.

Information as a concept has many meanings, (Wikipedia). It's readily available and accessible in various formats and devices.

The technologies are the systems and machines that process and manipulate the information.

Architecture describes the way in which information is organised. How it is collected, organised and displayed affects the way it is used. Rosenfeld and Morville (2002) say that 'fine information architecture results in good digital information solutions that work well'.

Using a blog to record weekly activity has been a great way to learn the tool. Blogs are generally intuitive and make it easy to organise and present information.

Content in a blog is usually displayed in reverse chronological order with the latest entry or post appearing on the front page providing permanent links to previous posts. Originally used as online diaries and pointing users to other sites via links, blogs have now extended into individuals or groups using them as a mechanism for delivering news, providing niche or micro-journalism, or as a means of broadcasting information eg. from a live event. Even used to extend the reach of one’s CV or advert.

Information can be disseminated in various ways: by providing links through to other blogs (a 'blogroll'), by syndication, or by using trackback. Content delivered on a blog is often short and precise, with little or no preamble, and delivered on a regular basis; other blogs are longer and updated less frequently.

Lastly, somewhat frustrating copying from blog to PDF but think I got there in the end. Using UNIX and FTP has also been interesting!!