Thursday 29 October 2009

Week 5 - eXtensible Markup Language (XML)

XML, the eXtensible Markup Language, is a set of rules used to describe or create information rather than a fixed language in itself; it is used to define other markup languages. Produced by the World Wide Web Consortium (W3C), it is particularly good at assisting with the transfer of information without losing its meaning. It is not designed for search, but one could transfer data using XML into a database to allow searching.

XML consists of a series of elements; these may also have attributes with certain values.

Elements – everything contained within a tag or between a pair of matching opening and closing tags.

Attributes – name/value pairs that assign properties to the given occurrence of a tag. (Course Notes: Lecture 5. 2.2)
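To make the difference between elements and attributes concrete for myself, here is a minimal sketch in Python using the standard library's ElementTree parser. The book record, its ISBN and the tag names are all made up for illustration; they are not taken from any real library catalogue.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, invented purely for illustration.
xml_text = """<library>
  <book isbn="0000000000" format="paperback">
    <title>An Example Title</title>
    <author>A. N. Author</author>
  </book>
</library>"""

root = ET.fromstring(xml_text)
book = root.find("book")

# Elements: the tag name plus the text between its opening and closing tags.
child_elements = [(child.tag, child.text) for child in book]

# Attributes: name/value pairs attached to this occurrence of <book>.
book_attributes = dict(book.attrib)

print(root.tag)
print(child_elements)
print(book_attributes)
```

The title and author are elements nested inside the book element, while isbn and format are attributes that describe that particular book.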

A simple XML file can just define the semantics of information and no more: only what the data means, with nothing further added. A Document Type Definition (DTD) is a collection of declarations providing an unambiguous way of describing the XML: the legal structure, elements, attributes and values that can be used. All XML should be well formed, that is, conform to the basic syntax rules, e.g. there must be a single root element and every element must be closed. It should also be valid, that is, conform to the grammatical rules set out in the DTD.
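A rough sketch of a well-formedness check, assuming Python's standard ElementTree parser. The helper name is_well_formed and the sample strings are my own inventions. Note that this parser only checks well-formedness; checking validity against a DTD needs a validating parser, which I have not attempted here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Example</title></book>"
bad = "<book><title>Example</book>"      # <title> is never closed
two_roots = "<book></book><book></book>"  # more than one root element

print(is_well_formed(good))
print(is_well_formed(bad))
print(is_well_formed(two_roots))
```

Only the first string passes; the other two break the syntax rules described above.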

An XML-based format specifically appropriate to my field of work would be one, defined by a Document Type Definition (DTD), in which we might store books within the National Policing Improvement Library. Here's a list of some of the books I’ve used so far in my MSc course.

The second example is a copy of code to deliver an RSS feed of vacancies on the NPIA main website. Simon, a web developer in my team, walked me through this step by step to help with my understanding and education for this DITA course. I found this invaluable. Makes so much more sense when it’s real.

Sunday 25 October 2009

Week 4 - Graphics & Images

This session introduced two ways to capture graphical information, Raster and Vector.

Rasters work well for complex images. They consist of discrete units, or cells, that divide space into a grid, so the recorded information can look very blocky. Each cell in the grid is allocated a numeric value, and together these values define the shape of what is being represented. The value may also represent colour and shade. Raster cells are often referred to as 'pixels' (picture elements).
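Here is a toy illustration of the raster idea, written in Python. The 5 by 5 grid and the render helper are my own invention, not from the course notes: each cell holds a number, and the pattern of numbers is the picture.

```python
# A tiny 5x5 raster: each cell holds a numeric value (1 = ink, 0 = background).
raster = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
]

def render(grid):
    """Turn the grid of numbers into text, one character per cell."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)

print(render(raster))
```

Printed out, the cells form a rough diamond outline, and at this size the blocky appearance is very obvious.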

Vector graphics divide information into discrete elements and record the position and shape of each one; they are useful for images that have defined boundaries. The location of each geometric element is recorded using a co-ordinate system.

The Graphics Interchange Format (GIF) and the Joint Photographic Experts Group (JPEG) format are the two most commonly used raster formats. GIF is an 8 bit format, using 8 bits for every pixel, which gives 256 distinguishable colours. It is best used at a 2:1 or 3:1 compression ratio. GIF is generally more suitable for less complex images such as graphs or diagrams.

JPEG is defined by ISO/IEC standard 10918. As a 24 bit format it can contain around 16 million colours. It is a lossy format: it can sacrifice some of the original data using complex compression techniques. For this reason it is often recommended to save originals before uploading to the www. It can compress to around 20:1, and the alteration of colour often cannot be seen by the human eye. JPEGs generally work well with photography and detailed imagery.
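The colour counts for GIF and JPEG follow directly from the number of bits per pixel, which I found easier to believe once I worked it through. A small Python sketch follows; the 640 by 480 image size is just an assumed example, and the 20:1 figure is applied as a simple division, not a model of how JPEG really compresses.

```python
def colours_for(bits_per_pixel: int) -> int:
    """Number of distinct values that fit in the given number of bits."""
    return 2 ** bits_per_pixel

gif_colours = colours_for(8)    # 256
jpeg_colours = colours_for(24)  # 16777216, i.e. roughly 16 million

# Assumed example image: 640 x 480 pixels at 24 bits (3 bytes) per pixel.
width, height = 640, 480
raw_bytes = width * height * 24 // 8  # 921600 bytes uncompressed
at_20_to_1 = raw_bytes // 20          # 46080 bytes at a 20:1 ratio

print(gif_colours, jpeg_colours, raw_bytes, at_20_to_1)
```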

The use of Portable Network Graphics (PNG) is not as widespread but is on the increase, and it is a useful alternative. It was designed to combine the best of JPEG and GIF; again, it is wise to store a master copy.

GIF, JPEG and more often now PNG can be included in web pages. These can be embedded within the document or included as links to a separate source. Appropriate img tags are used to do this.

Monday 12 October 2009

Week 3 - Internet & www

The metaphor of the Internet as the highway and the www as a car puts things into perspective for me. Email is just another car that uses the same highway.

The internet provides the infrastructure that allows computers to communicate around the world using the Internet protocol suite (TCP/IP). It grew out of ARPANET, a network originally funded by the American military. The www is a service available on the internet that allows hyperlinked documents to be stored and viewed by computers using a set of protocols or instructions. It was invented at CERN by Tim Berners-Lee around 1989-90. Both the Internet and the www were funded using public money.

The www needs client machines to function, e.g. a laptop, desktop or mobile device that responds to user input. These in turn talk to a high-powered server that waits for requests from the 'client'. They communicate using Transmission Control Protocol/Internet Protocol (TCP/IP), which splits the data into ‘packets’ that travel independently and can be resent if lost, making the transfer more reliable. The packets are then reassembled client side to provide the page. A busy web site can receive millions of requests per second, and one web page may need 100 file requests to pull together all the information. It is possible to have one machine that acts as both server and client.
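To picture the packet idea, here is a toy sketch in Python. It is not how TCP/IP is actually implemented (real packets carry headers, checksums and much more); the function names and the 8 byte packet size are my own assumptions. It only shows data being cut into numbered pieces, arriving in a jumbled order, and being put back together.

```python
import random

def split_into_packets(data: bytes, size: int):
    """Cut data into (sequence number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Sort packets by sequence number and join the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"<html><body>Hello from the server</body></html>"
packets = split_into_packets(message, 8)

random.shuffle(packets)  # simulate packets arriving out of order
rebuilt = reassemble(packets)

print(len(packets), rebuilt == message)
```

The sequence numbers are what let the receiving side rebuild the page even though the pieces did not arrive in order.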

Computers are identified by their Internet Protocol (IP) address, in effect a unique number. Domain name servers translate the human-readable domain name in a Uniform Resource Locator (URL) into the matching IP address.
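The "in effect a unique number" part can be shown with Python's standard ipaddress module. The address below comes from a range reserved for documentation examples, so it does not belong to a real machine. I have left out the name lookup itself because it needs a live network connection.

```python
import ipaddress

# 192.0.2.1 is from a range reserved for documentation examples.
address = ipaddress.ip_address("192.0.2.1")

# The dotted form is just a readable way of writing one 32-bit number.
as_number = int(address)
back_to_dotted = str(ipaddress.ip_address(as_number))

print(as_number)       # 3221225985
print(back_to_dotted)  # 192.0.2.1
```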

Hypertext is text used digitally that incorporates links from one bit of information to another, allowing the user to move between documents in a non-linear fashion.

Links embedded in pages are added as meta information using a markup language. HTML is an example of such a markup language; its links point to other documents using URLs.
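As a sketch of how links sit inside the markup, the Python below uses the standard library's HTMLParser to pull the link targets out of a snippet of HTML. The snippet, the URLs and the LinkCollector class name are all invented for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page fragment with one absolute and one relative link.
page = ('<p>See the <a href="http://www.example.org/notes.html">course notes</a> '
        'and <a href="week2.html">last week</a>.</p>')

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

The visible words ("course notes", "last week") are the text; the href values are the meta information that tells the browser where each link goes.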

Thursday 8 October 2009

Week 2 - Digital Representation

Session 2, learning the basics of what makes a computer tick.

Data can be stored, represented and managed in different ways. Computers use two electronic states, positive and non-positive, so they use a binary system of just two digits (bits), usually represented by 0 and 1. Bits are stored inside the computer in groups of various sizes; a collection of 8 bits is called a byte. Computers are generally built to deal with multiples of 8 bits, such as 8, 16, 32, 64 and 128.
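A quick way to see bits and bytes in action is to let Python do the conversion. This is only a small sketch; the number 77 is an arbitrary choice of mine.

```python
number = 77

# Write the number as one byte: 8 binary digits, padded with leading zeros.
bits = format(number, "08b")  # "01001101"

# Read the binary digits back into an ordinary number.
back = int(bits, 2)           # 77

# How many bytes make up a 64-bit value?
bytes_in_64_bits = 64 // 8    # 8

print(bits, back, bytes_in_64_bits)
```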

This binary information can also represent other kinds of information. Software applications and operating systems need data to be structured in a specific way in order to interpret and process it; this is referred to as the file format. ASCII, the American Standard Code for Information Interchange, provides a common coding system. It uses 7-bit binary sequences to represent characters that can then be displayed as text or, more accurately, ASCII 'text' format.
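The ASCII coding can be seen directly in Python, where ord gives a character's code and chr turns a code back into a character. A minimal sketch, using the word "DITA" as my own example:

```python
text = "DITA"

# Each character has a numeric ASCII code.
codes = [ord(ch) for ch in text]               # [68, 73, 84, 65]

# The same codes written as 7-digit binary sequences.
seven_bit = [format(c, "07b") for c in codes]  # ['1000100', '1001001', '1010100', '1000001']

# Decoding: binary digits back to numbers, numbers back to characters.
decoded = "".join(chr(int(b, 2)) for b in seven_bit)

print(codes, seven_bit, decoded)
```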

In order for the computer to interpret and apply meaning to information it needs data about the data! This is metadata. Markup is a way of including the metadata. It can be semantic, describing what the data means (and sometimes including presentation), or presentational, describing purely how it's displayed.

Collections of digital information on a certain topic or theme are kept in distinct packages, files and folders, just like a traditional library.

Documents can be made up of one file or more. Files may be in more than one document. One file can also contain information from another file. If information is embedded in the document it’s a 'file-centred' view; if simply linked it’s 'document-centred'.

This organised hierarchical structure is the beginning of information architecture. Groupings of files are often referred to as folders or directories. The top level, known as the 'root', is a unique location in the hierarchical structure, below which there can be folders within folders within sub-directories. The route traced from the root to a file's location is known as the file path name.
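The hierarchy is easy to see by taking a path name apart. A minimal sketch using Python's pathlib follows; the path itself is hypothetical, and I have assumed a UNIX-style layout where the root is written as a single slash.

```python
from pathlib import PurePosixPath

# Hypothetical UNIX-style path name, invented for illustration.
path = PurePosixPath("/home/student/dita/week2/notes.txt")

root = path.anchor         # "/" is the top of the hierarchy
levels = path.parts        # every step from the root down to the file
file_name = path.name      # "notes.txt"
folder = str(path.parent)  # "/home/student/dita/week2"

print(root, levels, file_name, folder)
```

Reading the parts from left to right traces the route from the root, through each folder, down to the file.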

Sunday 4 October 2009

Week 1 - Blogging & DITA

Daunting. Week 1 as a mature student, looking forward to the challenge and here goes with the first session, learning all about DITA and how to use a blog to record progress!

Digital technologies, or computers, don't deal directly with analogue words or pictures/images; they need numeric information to translate and interpret, so everything has to be converted into digital form.

Information as a concept has many meanings (Wikipedia). It's readily available and accessible in various formats and on various devices.

The technologies are the systems and machines that process and manipulate the information.

Architecture describes the way in which information is organised. How it is collected, organised and displayed impacts on the way it is used. Rosenfeld and Morville (2002) say that 'fine information architecture results in good digital information solutions that work well'.

Using a blog to record weekly activity has been a great way to learn the tool. Blogs are generally intuitive, and make it easy to organise and present information.

Content in a blog is usually displayed in reverse chronological order, with the latest entry or post appearing on the front page along with permanent links to previous posts. Originally used as online diaries and for pointing users to other sites via links, blogs are now used by individuals or groups as a mechanism for delivering news, providing niche or micro-journalism, or as a means of broadcasting information, e.g. from a live event. They are even used to extend the reach of one’s CV or advert.

Information can be disseminated in various ways: by providing links through to other blogs (a ‘blogroll’), by syndication or by using trackback. Content on some blogs is short and precise, with little or no preamble, and delivered on a regular basis; on others, posts are longer and updated less frequently.

Lastly, somewhat frustrating copying from blog to PDF but think I got there in the end. Using UNIX and FTP has also been interesting!!