Sunday, 30 October 2011

DITA Coursework part 1

Alexandra Santos  (100059670)

‘The web has grown exponentially over the past decade. Identifying, organizing and retrieving information becomes more complex as the size of the web increases.’ (Chowdhury and Chowdhury, 2007, p. 132)

As a future information professional, I feel that there are two great challenges to face: first, understanding the importance of managing the storage of information resources efficiently; and second, using the right tools and technical knowledge to improve their accessibility. Sessions 2, 3 and 4 of this module made me reflect on both.

‘Can the Web change the way people work together and advance knowledge in a small company, a large organization, a country? If it works for a small group and can scale up, can it be used to change the world? We know the web let us do things more quickly, but can it make a phase change in society, a move to a new way of working- and will that be for better or for worse?’ (Berners-Lee, 2000, p. 216)

Over a decade has passed since Tim Berners-Lee posed these questions, and the changes have undoubtedly been substantial. While some studies point to deeper effects on the way we think, or even on the way we build human relationships (Turkle, 2011), there are quite evident transformations in the way we communicate, work and exchange information resources. Because of the ever-changing nature of the web and its resourcefulness, a number of issues have arisen for libraries and other information centers, such as keeping up with the fast pace of change and with the distribution and control of resources.


Weaving is easy

I became acquainted for the first time with the essence of markup languages: by highlighting particular sections of information resources through agreed codes, and by using the right technologies to interpret them, we can establish valuable links. After some familiarization with the notion of hypertext and with HTML, I quickly learnt good ways to store information on a computer, link files to documents and understand the relationships between data. We were then able to get crucial information about how data was structured and how it was presented, basic concepts that are essential for our information professions. I learnt how to create hyperlinks to files within the same document, in other documents or on the Internet: for instance, using a URL that retrieves a specific file on my computer, or the URL of a specific web page. I then managed to incorporate these links in my index HTML page.
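To make the three kinds of link concrete, here is a minimal sketch in Python that writes an index page containing a link within the same document (#top), a relative link to another local document (first.html) and an absolute URL to a web page. The build_index helper and the page layout are my own illustrative assumptions, not the actual lab files.

```python
from html import escape

def build_index(links):
    """Return a minimal HTML index page (hypothetical helper).

    links: list of (label, href) pairs. An href can be a fragment (#id) in
    the same document, a relative path to another local document, or an
    absolute URL on the web.
    """
    items = "\n".join(
        f'    <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for label, href in links
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head><title>Index</title></head>\n<body>\n"
        '  <h1 id="top">My index page</h1>\n'
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n</html>\n"
    )

page = build_index([
    ("Back to top (same document)", "#top"),
    ("My first page (another local document)", "first.html"),
    ("Isaiah on Chronotext (a web page)", "http://chronotext.org/Isaiah/"),
])
print(page)
```

Relative links such as first.html keep working when the whole folder is moved, which matters later when the page is published.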


From micromanaging file systems to the Database approach

After working in a company where almost every month there were problems with our wages, paid leave and other staff details, I thought communication between departments was the main issue. Clearly it was, but on a different level: the departments did have most of the right information stored in computer files, but they had to rely on people to pass constant updates to each other, which left room for human error. There was also a clear lack of a centralized electronic system that would store and manage all this data and make it easily accessible.
I now realize that storing structured data in tables using a relational model can avoid duplication of information, reduce updating errors, improve precision and help users quickly find the right information without having to sift through all the data. Libraries and other information centers have been applying database management systems to their bibliographic records for quite some time:

‘In essence, the database management approach aims to identify and store discrete data elements that represent the attributes (e.g. author, title, etc) of each specific instance of an entity (i.e. a resource type, such as a book or an article) in a collection. (…) The backbone of a database is the entity-relationship diagram that conceptually represents the various constituent entities, their attributes and, more importantly, their relationships.’ (Chowdhury and Chowdhury, 2007, p. 21)
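The entity-relationship idea can be sketched with Python's built-in sqlite3 module. This is a minimal, hypothetical schema (not Bertline's or any real library catalogue's): author and book are entities, and book_author records the relationship between them, so each author's name is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- Relationship: a book can have several authors, an author several books
CREATE TABLE book_author (
    book_id   INTEGER REFERENCES book(book_id),
    author_id INTEGER REFERENCES author(author_id),
    PRIMARY KEY (book_id, author_id)
);
INSERT INTO author VALUES (1, 'Chowdhury, G.G.'), (2, 'Chowdhury, S.'), (3, 'Arms, W.Y.');
INSERT INTO book VALUES (1, 'Organizing Information from the Shelf to the Web'),
                        (2, 'Digital Libraries');
INSERT INTO book_author VALUES (1, 1), (1, 2), (2, 3);
""")

# Follow the relationship from book to author through the linking table
QUERY = """
    SELECT b.title, a.name
    FROM book AS b
    JOIN book_author AS ba ON ba.book_id = b.book_id
    JOIN author AS a ON a.author_id = ba.author_id
    ORDER BY b.title, a.name
"""
rows = conn.execute(QUERY).fetchall()
for title, name in rows:
    print(title, "|", name)

# Correcting a name is a single UPDATE; every linked book sees the change
conn.execute("UPDATE author SET name = 'Arms, William Y.' WHERE author_id = 3")
rows_after = conn.execute(QUERY).fetchall()
```

Because the name lives in one row only, there is no second copy left behind to go out of date, which is exactly the kind of updating error described above.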

Searching for an exact piece of information in a relational bibliographic database can nevertheless be quite a struggle, especially when you have to use the query language SQL with the right syntax (DITA Lab 03). From simple queries to more complex ones, I was able to gather the right data, but only after long hours of trial and error. Here are some of the problems I encountered:

- using the underscore (_) to match a single character in a specified position;
- = and LIKE can return different results, especially when you need a wildcard character such as %;
- an ISBN has to be typed in quotation marks, dashes included, because it works as a textual reference and not as a whole number;
- using "…" instead of '…' (standard SQL reserves single quotes for text values);
- and, most importantly, joining several tables properly so that the right relationship is established.
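The pitfalls listed above can be reproduced with Python's sqlite3 module. This is a sketch on a tiny made-up table: the titles come from my reading list, but the ISBNs are invented for illustration. SQLite's dialect differs in places from the database used in the lab; for example, its LIKE ignores case for ASCII letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (
    book_id      INTEGER PRIMARY KEY,
    title        TEXT,
    isbn         TEXT,   -- stored as text because it contains dashes
    publisher_id INTEGER REFERENCES publisher(publisher_id)
);
INSERT INTO publisher VALUES (1, 'MIT Press'), (2, 'Facet Publishing');
-- Made-up ISBNs, for illustration only
INSERT INTO book VALUES
    (1, 'Digital Libraries', '978-0-000-00001-1', 1),
    (2, 'Organizing Information from the Shelf to the Web', '978-0-000-00002-8', 2);
""")

def q(sql, params=()):
    """Run a query and return all result rows."""
    return conn.execute(sql, params).fetchall()

# _ matches exactly one character in that position
underscore = q("SELECT title FROM book WHERE title LIKE '_igital%'")
# With =, the % is an ordinary character, so nothing matches
equals_wild = q("SELECT title FROM book WHERE title = 'Digital%'")
# With LIKE, % matches any run of characters
like_wild = q("SELECT title FROM book WHERE title LIKE 'Digital%'")

# Quoted ISBN: compared as text, dashes included
isbn_quoted = q("SELECT title FROM book WHERE isbn = '978-0-000-00001-1'")
# Unquoted ISBN: the dashes are read as minus signs (978-0-0-1-1 = 976)
isbn_unquoted = q("SELECT title FROM book WHERE isbn = 978-0-000-00001-1")
# Safer still: let the driver quote the value through a ? placeholder
isbn_param = q("SELECT title FROM book WHERE isbn = ?", ("978-0-000-00001-1",))

# Join with a condition: one row per book
joined = q("""SELECT b.title, p.name FROM book AS b
              JOIN publisher AS p ON p.publisher_id = b.publisher_id""")
# Forgetting the condition pairs every book with every publisher
cartesian = q("SELECT b.title, p.name FROM book AS b, publisher AS p")

print(len(joined), "rows with the join condition,", len(cartesian), "without")
```

The last two queries show why the join condition matters most: without it, two books and two publishers already give four rows, half of them wrong.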

As a bookseller I deal daily with such a structured database (Bertline), which I see as a priceless search tool. Nevertheless it has many limitations: it can only return an exact answer from each field (exact title, author or ISBN), and sadly it is not yet designed for repeatable fields. This is where my colleagues and I, almost simultaneously, use an example of a text retrieval system that is also the traditional booksellers' biggest competitor: Amazon. Amazon's search engine lets us search for a particular book by typing the author and/or title and/or publisher and/or even subject into the search field; we can misspell a title or an author and it still retrieves information fast.
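The contrast between an exact-field lookup and a search that tolerates misspellings can be sketched with Python's standard difflib module. This says nothing about how Amazon actually works: difflib's similarity ratio is simply a stand-in technique, and the small title list is borrowed from my references.

```python
import difflib

# Illustrative catalogue (titles from this post's reference list)
titles = [
    "Digital Libraries",
    "Weaving the Web",
    "Alone Together",
    "Information Architecture for the World Wide Web",
]

def exact_lookup(query):
    """Exact-field search: a title is returned only if it matches character for character."""
    return [t for t in titles if t == query]

def tolerant_lookup(query, cutoff=0.6):
    """Similarity search: tolerates misspellings, best matches first."""
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[m] for m in matches]

print(exact_lookup("Weeving the Web"))     # []
print(tolerant_lookup("Weeving the Web"))  # ['Weaving the Web']
```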

Unstructured Information – the quest for the right information

‘The task of information retrieval is to find objects in the collection that match the query. Since a computer does not have the time to go through the entire collection for each search, looking at every object separately, the computer must have an index of some sort that enables it to retrieve information by looking up entries in indexes.’ (Arms, 2001, p. 45)
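The kind of index Arms describes is commonly built as an inverted index: a table from each term to the documents that contain it, so a search looks up entries instead of reading every document. A minimal Python sketch, assuming naive lower-case word splitting and three made-up "documents":

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

docs = {
    1: "Digital libraries and information retrieval",
    2: "Weaving the web",
    3: "Information architecture for the web",
}
index = build_inverted_index(docs)

# A query is now a lookup, not a scan of every document
print(sorted(index.get("web", set())))          # [2, 3]
print(sorted(index.get("information", set())))  # [1, 3]
```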

In the information retrieval (IR) process, our roles as information professionals can be quite interchangeable: when we need information, we become users; at other times we contribute to developing the software or hardware systems that support IR; and at others we provide users with specific information by pointing them to the source. When contributing to information accessibility, we need a clear and careful view of how to manage data so that it becomes retrievable. One of the most useful tools for this is indexing:

‘In a word, metadata. Metadata is the primary key that links information architecture to the design of database schema. It allows us to apply the structure and power of relational databases to the heterogeneous, unstructured environments of web sites and intranets. By tagging documents and other information objects with controlled vocabulary metadata, we enable powerful searching, browsing, filtering, and dynamic linking.’ (Morville and Rosenfeld, 2007, p. 74)

As constant information seekers we all have different information needs. Following Broder's taxonomy (Broder, 2002), we can perform navigational, informational or transactional queries and then measure the results. We can also structure the way we search: either with natural language queries or with the Boolean model. I believe, however, that the Boolean model makes less sense nowadays, especially in a search engine like Google, and that natural language queries combined with browsing can produce more satisfactory results.
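Under the Boolean model, each term's entry in an index is a set of documents, and AND, OR and NOT become set intersection, union and difference. A minimal Python sketch over three made-up documents (real engines add ranking, stemming and much more):

```python
docs = {
    1: "digital libraries and information retrieval",
    2: "weaving the web",
    3: "information architecture for the web",
}

# Build a tiny index: term -> set of document ids
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)
all_ids = set(docs)

def postings(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return index.get(term.lower(), set())

web_and_information = postings("web") & postings("information")  # AND -> {3}
web_or_libraries = postings("web") | postings("libraries")       # OR  -> {1, 2, 3}
web_not_information = postings("web") - postings("information")  # web AND NOT information -> {2}
not_web = all_ids - postings("web")                               # NOT web -> {1}
print(web_and_information, web_or_libraries, web_not_information, not_web)
```

Every document either matches the expression or it does not; nothing ranks the matches, which is one reason the model feels rigid next to natural language queries.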
There are two approaches to evaluating IR: a qualitative one, which evaluates the user's satisfaction, and a quantitative one, which calculates precision and recall. The latter gives information professionals like us a priceless tool if we ever want to test the efficiency of the search engines available to users (MacFarlane, 2007).
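Precision is the share of retrieved items that are relevant, and recall is the share of all relevant items that were retrieved. A minimal Python sketch; the page ids and the relevance judgements are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgement: the engine returns pages p1..p10,
# while the relevant pages are p5..p16 (12 in total).
retrieved = [f"p{i}" for i in range(1, 11)]
relevant = [f"p{i}" for i in range(5, 17)]
print(precision_recall(retrieved, relevant))  # (0.6, 0.5)
```

Here 6 of the 10 results are relevant (precision 0.6), but only 6 of the 12 relevant pages were found (recall 0.5).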


The Grail

In this quest for the right information, the database management approach makes an effective contribution to IR systems and to unstructured information. IR techniques can also narrow and ease the search, but they are still based on matching words rather than meaning. Search engines usually return the sites with the most hits, not necessarily the most relevant ones for us: we get pages ranked on the high frequency of a term, and they may still not answer our query. Things evolve fast on the WWW and, as Tim Berners-Lee (2000, p. 169) foresaw, the future (which is already the present) is the semantic web.
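To illustrate why matching words is not the same as matching meaning, here is a deliberately naive ranking that scores pages only by how often the query terms occur. The two pages are invented, and real search engines use far more signals than raw term frequency:

```python
from collections import Counter

# Invented pages: one repeats "jaguar", the other actually answers the question
pages = {
    "jaguar-cars": "jaguar jaguar jaguar dealer jaguar price",
    "jaguar-habitat": "the jaguar lives in the rainforest of south america",
}

def tf_rank(query, pages):
    """Rank page names by the total count of query words they contain."""
    terms = query.lower().split()
    scores = {}
    for name, text in pages.items():
        counts = Counter(text.lower().split())
        scores[name] = sum(counts[t] for t in terms)
    return sorted(scores, key=scores.get, reverse=True)

print(tf_rank("where does the jaguar live", pages))
# ['jaguar-cars', 'jaguar-habitat']
```

The car page wins (score 4 against 3) even though only the habitat page answers the question; it does not even match "live", because the word on the page is "lives".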






References
Arms, W.Y., (2001) Digital Libraries. 2nd ed. Cambridge MA: MIT Press.
Berners-Lee, T., (2000) Weaving the Web: The Past, Present and Future of the World Wide Web by its Inventor. 2nd ed. London: Texere.
Broder, A., (2002) A taxonomy of web search. SIGIR Forum, 36(2), Fall. Available at: http://www.sigir.org/forum/F2002/broder.pdf (visited 19th October 2011).
Chowdhury, G.G. and Chowdhury, S., (2007) Organizing Information from the Shelf to the Web. London: Facet Publishing.
Deitel, H.M., Deitel, P.J. and Nieto, T.R., (2002) Internet and World Wide Web: How to Program. 2nd ed. New Jersey: Prentice Hall.
Law Librarian Blog, (2011) Launch of Schema.org: Structured Data Markup Using Microdata for Web Search Engines. Law Librarian Blog [blog], 17 June. Available at: http://lawprofessors.typepad.com/law_librarian_blog/2011/06/launch-of-schemaorg-structured-data-markup-using-microdata-for-web-search-engines.html (visited 21st October 2011).
MacFarlane, A., (2007) Evaluation of web search for the information practitioner. Aslib Proceedings, 59(4/5), pp. 352-366.
MacFarlane, A., (2011) Lecture 04: Information Retrieval. London: City University.
MacFarlane, A., Butterworth, R. and Dykes, J., (2011) Lecture 02: The Internet and the World Wide Web. London: City University.
MacFarlane, A., Butterworth, R. and Krause, A., (2011) Lecture 03: Structuring and querying information stored in databases. London: City University.
Mi Islita, (2009) Document Indexing Tutorial. Available at: http://www.miislita.com/information-retrieval-tutorial/indexing.html (visited 20th October 2011).
Mizzaro, S., (1997) Relevance: The whole history. Journal of the American Society for Information Science, 48(9), pp. 810-832.
Morville, P. and Rosenfeld, L., (2007) Information Architecture for the World Wide Web. 3rd ed. Cambridge: O’Reilly.
Turkle, S., (2011) Alone Together: Why We Expect More From Technology and Less from Each Other. New York: Basic Books.

Thursday, 13 October 2011

Dita Lab Session 02


While writing this blog I am using the Internet: I can visit websites and access data on Moodle, the university's online learning platform. I could just as well be in Hawaii right now and still be able to access information and share documents in the same way. As long as my computer was connected to this network of computer networks, which communicate through agreed protocols (TCP/IP), I could use the World Wide Web to send or receive information, transfer multimedia files and link to other information. A web server would deliver the data to me using HTTP (HyperText Transfer Protocol), and to read it I would use a web browser, such as Internet Explorer.
Email (SMTP), Telnet and SSH are examples of other protocols.
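Behind the scenes, the browser sends the web server a short plain-text HTTP request. This Python sketch only builds the text of a minimal HTTP/1.1 GET request for my lab page (it does not send anything over the network); build_get_request is a hypothetical helper, and real browsers add many more headers:

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"     # request line: method, path, protocol version
        f"Host: {parts.netloc}\r\n"    # which site on the server we want
        "Connection: close\r\n"
        "\r\n"                         # a blank line ends the headers
    )

request = build_get_request("http://www.student.city.ac.uk/~abkb860/Ditalab2/greatfinal.html")
print(request)
```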






Task: Architecting our own simple information

Starting from our humble index web page, we managed to weave more pages together, using paths that allowed us to travel through information and to share images, text and web pages.


http://chronotext.org/Isaiah/

With these exercises I was able to learn HTML, the markup language that forms the basis of the World Wide Web and allows us to establish links and share information that can be viewed remotely. After becoming familiar with HTML, I went on to create a humble HTML document using the right references, adding images and linking some other web pages to it. Oh, the sense of achievement after opening it in a browser! :)

I learnt how to create hyperlinks to files within the same document, in other documents or on the Internet, using either a URL that retrieves a specific file on my computer, like file:///C:/Users/ME/Desktop/Ditalab2/first.html, or the URL of a specific web page, and then incorporating them correctly in an index HTML page.
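A local file path and a file:// URL are two spellings of the same location. The hypothetical helper below converts a Windows path into a file URL by hand, percent-encoding characters such as spaces; it assumes a simple absolute path starting with a drive letter and is not a complete implementation (Python's pathlib also offers as_uri() for this job).

```python
from urllib.parse import quote

def windows_path_to_file_url(path):
    """Turn e.g. C:\\Users\\ME\\page.html into file:///C:/Users/ME/page.html.

    Hypothetical helper: assumes an absolute path that starts with a drive letter.
    """
    drive, rest = path.split(":", 1)
    # Backslashes become forward slashes; quote() keeps '/' but escapes spaces etc.
    return "file:///" + drive + ":" + quote(rest.replace("\\", "/"))

print(windows_path_to_file_url(r"C:\Users\ME\Desktop\Ditalab2\first.html"))
# file:///C:/Users/ME/Desktop/Ditalab2/first.html
print(windows_path_to_file_url(r"C:\Users\ME\My Pages\index.html"))
# file:///C:/Users/ME/My%20Pages/index.html
```

Such a link only works on the computer that holds the file, which is why it stops working once the page is published on a web server.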

After finalizing my HTML page,

http://www.student.city.ac.uk/~abkb860/Ditalab2/greatfinal.html

I moved on to the publishing stage. First I needed to copy my HTML file into a public directory, the W:/ drive, so that it could be accessed by a web server (quite easy to grasp), and then I published it using the Telnet program. I didn't get it right straight away: I had put the HTML file name in the URL but forgot that I had transferred the whole folder to the W:/ drive, so I had to add the folder name to the URL too.
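The missing-folder problem comes down to how a relative path is resolved against a base URL. A small sketch with Python's urllib.parse.urljoin, assuming (from the published URL) that the W:/ drive corresponds to the ~abkb860 directory on the web server:

```python
from urllib.parse import urljoin

site = "http://www.student.city.ac.uk/~abkb860/"

# File name only: points outside the Ditalab2 folder, where the file is not
without_folder = urljoin(site, "greatfinal.html")
# -> http://www.student.city.ac.uk/~abkb860/greatfinal.html

# Folder name added: matches the folder that was actually uploaded
with_folder = urljoin(site, "Ditalab2/greatfinal.html")
# -> http://www.student.city.ac.uk/~abkb860/Ditalab2/greatfinal.html

# The trailing slash matters too: without it, ~abkb860 is treated as a file and dropped
no_slash = urljoin("http://www.student.city.ac.uk/~abkb860", "Ditalab2/greatfinal.html")
# -> http://www.student.city.ac.uk/Ditalab2/greatfinal.html

print(without_folder, with_folder, no_slash, sep="\n")
```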

Apart from HTML, I also had a fiddle around with CSS, and it was quite interesting to see the different looks a web page takes on with different style sheets. I had particular fun with the inverse style sheet, which uses white text on a black background to ease reading, reminding me of Blackle, the energy-saving search page powered by Google. Applied to my page, it looked... rather odd.

Thursday, 29 September 2011

Dita Lab Session 01

In this first session of DITA Lab I followed the tasks assigned to us. Initially they seemed quite basic, but after revising them I realized there were some things I had missed...

Autumn leaf that depicted my weather report


I got a general picture of how information is "processed" in a computer, from the smallest unit up to a whole document, and of how it is organized into files and accessed through particular software programs, which I summarized in a humble graphic of my own.
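The path from characters down to the smallest unit can be seen in a few lines of Python: text is encoded as bytes (here with ASCII), and each byte is eight bits. The two-letter string is just an example:

```python
text = "Hi"
data = text.encode("ascii")               # characters -> bytes
codes = list(data)                        # each byte as a number
bits = [format(b, "08b") for b in data]   # each byte as 8 binary digits
print(codes, bits)                        # [72, 105] ['01001000', '01101001']
print(len(data), "bytes =", len(data) * 8, "bits")
```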




I also reflected on the importance of file extensions and how they help the computer associate a file with the appropriate program, and on how Notepad (or WordPad, when saving as plain text) lets us view and store files as ASCII text but without metadata. For that purpose there are more complex formats, such as Word or HTML, which can give us information about layout, style, font and more by using tags. Besides the ASCII text, such a file contains a whole series of meta information, like this excerpt of the style definitions Word generates:
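Windows keeps its own table of which program opens which extension. As a stand-in, Python's standard mimetypes module shows the same idea by guessing a file's type from its extension alone (the file names are made up):

```python
import mimetypes

for name in ["first.html", "notes.txt", "leaf.jpg"]:
    mime_type, _encoding = mimetypes.guess_type(name)  # decided by the extension only
    print(name, "->", mime_type)
# first.html -> text/html
# notes.txt -> text/plain
# leaf.jpg -> image/jpeg
```

Nothing inside the files is read; rename leaf.jpg to leaf.txt and the guess changes, which is why a wrong extension sends a file to the wrong program.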


mso-font-signature:-520081665 -1073717157 41 0 66047 0;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;
mso-font-charset:0;
mso-generic-font-family:modern;
mso-font-pitch:fixed;
mso-font-signature:-520092929 1073806591 9 0 415 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-unhide:no;
mso-style-qformat:yes;
mso-style-parent:"";
margin-top:0in;
margin-right:0in;
margin-bottom:10.0pt;
margin-left:0in;
line-height:115%;


A document saved in HTML format can be made up of several files if it contains images, formatted text, sound files and more.
Starting from a file-centred view we progressed towards a document-centred view.

An interesting task was inserting an image into the file we were working on as a link, so that the image data no longer needs to be in the document, only a reference to it. (If the image is deleted from the computer, the document won't be able to show it.) It's a useful tip in terms of computer storage.
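Word's "link to file" option is specific to Word, but the same trade-off exists in HTML: an img tag can either point to an image file or carry the image data inside the page as a base64 data URI. A Python sketch of both; the file name leaf.jpg and the stand-in bytes are hypothetical:

```python
import base64

def img_by_reference(src):
    """Store only a pointer to the image file; the page breaks if the file is deleted."""
    return f'<img src="{src}" alt="Autumn leaf">'

def img_embedded(image_bytes, mime="image/jpeg"):
    """Store the image data itself inside the page, as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime};base64,{encoded}" alt="Autumn leaf">'

fake_image = bytes(range(256)) * 40   # 10,240 stand-in bytes, not a real JPEG
linked = img_by_reference("leaf.jpg")
embedded = img_embedded(fake_image)
print(len(linked), "characters when linked")
print(len(embedded), "characters when embedded")
```

The linked version stays a few dozen characters whatever the image size, while base64 makes the embedded version roughly a third larger than the image itself.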

I did some experimenting and, among other things, was able to find the "mention" of an image in an HTML document viewed in Notepad, by using the Find command (Ctrl+F) and typing /jpg./: easy and fast!
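Notepad's Find looks for literal text. A regular expression does the same search more flexibly, for instance collecting every quoted file name ending in .jpg regardless of letter case. A Python sketch on a small invented HTML fragment:

```python
import re

html_source = '''<p>My weather report</p>
<img src="images/autumn_leaf.jpg" alt="Autumn leaf">
<a href="photos/Holiday.JPG">holiday</a>'''

# Every quoted attribute value that ends in .jpg, in any letter case
jpg_refs = re.findall(r'"([^"]+\.jpg)"', html_source, flags=re.IGNORECASE)
print(jpg_refs)  # ['images/autumn_leaf.jpg', 'photos/Holiday.JPG']
```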