Before the implementation and uses of the Semantic Web can be fully explored, it is best to understand at least some of the basic rules and technologies on which it is based. All resources in the Semantic Web must be identified by the same kind of URIs that are already used by the World Wide Web. The documents themselves will use markup such as Extensible Markup Language (XML) and Resource Description Framework (RDF) to encapsulate their data.
XML allows users to define their own tags, the equivalents of tags such as <p> and <span> in HTML and XHTML documents. XML itself does nothing beyond encapsulating the data between the tags, but the tags can carry much more specific meanings than HTML generics like the aforementioned paragraph and span tags (W3C [1]). For example, you may write a web page and list yourself as the author. Simply putting your name in a span tag in XHTML is fine if a human is reading your web page. However, if your web page were passed to an automatic bibliography machine, the machine would not be able to tell the difference between the span tag containing your name and any of the other span tags on the page. To fix this problem, you may use an agreed-upon "author" tag defined in an XML namespace that is linked to your document (Bray). The bibliography machine can then easily pick out the author's name from the rest of the text on the page. Whereas the span tag indicates only part of the structure of the document, the author tag gives machine-readable semantic meaning to the document's contents.
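The bibliography machine described above can be sketched in a few lines of Python. This is only an illustration: the namespace URI, the "bib" prefix, and the sample page are hypothetical and do not come from any real standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, chosen only for this illustration.
BIB_NS = "http://example.org/bibliography"

# A tiny sample page: two generic span tags and one namespaced author tag.
PAGE = f"""
<page xmlns:bib="{BIB_NS}">
  <span>Welcome to my page.</span>
  <bib:author>Conrad Williams</bib:author>
  <span>Some other text.</span>
</page>
""".strip()


def find_authors(xml_text):
    """Return the text of every author element in the bibliography namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as {namespace-uri}localname.
    return [el.text for el in root.iter(f"{{{BIB_NS}}}author")]


authors = find_authors(PAGE)
print(authors)
```

The span tags are ignored because they tell the program nothing about what they contain, while the author tag can be found by name alone.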
RDF goes a step beyond XML. Both the strength and the drawback of XML lie in the fact that it classifies data in a strict hierarchy (Herman). For example, an author is a kind of person. However, strict hierarchies sometimes break down when confronted with unusual situations. An organization could also be an author. If the previously mentioned hypothetical bibliography machine were to look only for author tags contained within person tags, it would miss those contained within organization tags, because the two occurrences of the author tag sit in different parts of the Document Object Model tree. This is where RDF comes in. RDF is a format that is actually used to denote things such as author, title, and modification date on web pages, and it is specifically designed to be read and understood by computers (W3C [2]). All RDF statements contain a subject, predicate, and object, which are represented in tag form by a resource, property, and property value. Without going into how this would actually be represented in an RDF document, in the statement "http://webspace.utexas.edu/pcw288/www/semantic/ was authored by Conrad Williams", the resource would be the web page (all resources are URIs), the property would be author, and the property value would be Conrad Williams. The overall point of RDF's loose, generalized structure is to allow data integration even when the data is stored under different schemas (Herman), while XML tends to be more application specific. All of that said, the most common way to represent RDF data is as an XML document using the RDF/XML syntax (W3C [2]). RDF can also be embedded through XHTML attributes. This is known as RDFa and is used in the Creative Commons License at the bottom of this page.
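The subject, predicate, and object structure can be modeled with plain Python tuples. The sketch below is not RDF syntax and does not use an RDF library. The property URIs under example.org and the second resource are hypothetical, while the first statement is the one from the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource, always a URI
    predicate: str  # the property
    obj: str        # the property value


# Hypothetical property URIs, used only for illustration.
AUTHOR = "http://example.org/terms/author"
TITLE = "http://example.org/terms/title"

triples = [
    Triple("http://webspace.utexas.edu/pcw288/www/semantic/", AUTHOR, "Conrad Williams"),
    # An organization can be an author just as easily as a person can.
    Triple("http://example.org/report", AUTHOR, "Example Organization"),
    Triple("http://example.org/report", TITLE, "Annual Report"),
]


def values_of(data, predicate):
    """Return (resource, value) pairs for every statement using the predicate."""
    return [(t.subject, t.obj) for t in data if t.predicate == predicate]


for resource, author in values_of(triples, AUTHOR):
    print(resource, "was authored by", author)
```

Because every statement is flat, the lookup finds both authors without caring whether the author is a person or an organization, which is the situation that tripped up the strictly hierarchical approach.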
RDF is a more powerful descriptive format than XML, but even it has its failings. An even more powerful descriptive format is the Web Ontology Language (abbreviated OWL, even though the words are not in that order). OWL is often implemented in the form of RDF/XML documents. OWL and its three sublanguages are even more machine-interpretable than RDF and allow for very fine-grained descriptions, including relations between classes at the same level of a document and cardinality of content (McGuinness). Continuing with our example of an article in some publication, suppose that we want to show that an article is coauthored. RDF may allow us to define two authors for the document, but not their relationship. OWL allows us to describe the relationship between the authors. Under OWL's syntax, if the relationship "coauthorWith" is given the property characteristic "SymmetricProperty", then whatever program is interpreting the data can infer that the relationship works both ways (i.e. if "x" is coauthor with "y", then "y" is coauthor with "x"). OWL also allows equivalent classes, such as "writer" and "author", to be explicitly noted, something that RDF alone cannot express.
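The inference that a symmetric property makes possible can be sketched as follows. This is not OWL syntax and not a real reasoner, only a minimal Python stand-in for the single rule described above. The "wrote" property and the resource names are hypothetical.

```python
COAUTHOR_WITH = "coauthorWith"

# Properties that have been declared symmetric, as "SymmetricProperty" would do in OWL.
SYMMETRIC_PROPERTIES = {COAUTHOR_WITH}


def infer_symmetric(statements, symmetric=SYMMETRIC_PROPERTIES):
    """Return the statements plus the reverse of every symmetric one."""
    inferred = set(statements)
    for subject, predicate, obj in statements:
        if predicate in symmetric:
            # If x relates to y through a symmetric property, y relates to x.
            inferred.add((obj, predicate, subject))
    return inferred


asserted = {
    ("x", COAUTHOR_WITH, "y"),
    ("x", "wrote", "article"),  # not symmetric, so it is never reversed
}
closed = infer_symmetric(asserted)
print(("y", COAUTHOR_WITH, "x") in closed)
```

Only the one coauthor statement has to be written down by a person. The reverse statement is derived by the program, while the ordinary "wrote" statement is left alone.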
With potentially vast amounts of data distributed across many servers, it seems logical to devise a way to query all of the information about a specific topic at the same time. SPARQL Protocol and RDF Query Language (SPARQL, a recursive acronym) is a query language, similar to Structured Query Language (SQL), for data stored in or viewable as RDF (Prud'hommeaux). This allows for "federated search", the querying of multiple searchable resources at once. For example, a vocabulary called Friend of a Friend (FOAF, discussed in detail later) could be used by websites all over the world to list the names and email addresses of the people associated with them. SPARQL could then be used to retrieve this data with a simple query statement, much like one that would be used with a traditional database.
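To give a feel for such a query, the sketch below shows what a SPARQL query over FOAF data might look like, followed by a plain Python stand-in that answers the same question over two data sources. The query string is shown for reference only and is never executed, no SPARQL engine is involved, and the two sites and the people listed on them are hypothetical. The foaf:name and foaf:mbox properties are taken from the FOAF vocabulary.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Roughly what the question looks like in SPARQL (reference only, not executed here).
SPARQL_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
  ?person foaf:name ?name .
  ?person foaf:mbox ?mbox .
}
"""

# Two hypothetical sites, each publishing its own FOAF statements as triples.
site_a = [
    ("http://a.example/people#alice", FOAF + "name", "Alice"),
    ("http://a.example/people#alice", FOAF + "mbox", "mailto:alice@a.example"),
]
site_b = [
    ("http://b.example/staff#bob", FOAF + "name", "Bob"),
    ("http://b.example/staff#bob", FOAF + "mbox", "mailto:bob@b.example"),
    ("http://b.example/staff#carol", FOAF + "name", "Carol"),  # no mailbox listed
]


def names_and_mailboxes(*sources):
    """Collect (name, mailbox) pairs for people who have both, across all sources.

    Simplification: each person is assumed to have at most one name and one mailbox.
    """
    names, mailboxes = {}, {}
    for source in sources:
        for subject, predicate, obj in source:
            if predicate == FOAF + "name":
                names[subject] = obj
            elif predicate == FOAF + "mbox":
                mailboxes[subject] = obj
    return sorted((names[s], mailboxes[s]) for s in names if s in mailboxes)


results = names_and_mailboxes(site_a, site_b)
print(results)
```

As in the SPARQL query, a person appears in the results only when both patterns match, so the person with a name but no mailbox is left out.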
'The Semantic Web' by Conrad Williams is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.