The Semantic Web

Implementations

The aforementioned Semantic Web technologies, and others, will shape the continuing evolution of the Web. Current and projected implementations of these technologies include FOAF and other social tools, Google Scholar and other data integration services, and applications in the medical field.

The most recent 'revolution' on the World Wide Web has been social networking, dubbed 'Web 2.0'. Websites like Facebook, LinkedIn, and LiveJournal connect real-life friends as well as people with similar interests who have never actually met. One problem with websites like these is that the network of friends created on one site is not easily transferable to another social networking site. An RDF and OWL format called Friend of a Friend (FOAF) seeks to solve this problem using Semantic Web technologies. FOAF allows descriptions of people, their interests, their friends, and the links between all of them to be stored in a machine-readable, portable format that can be extended, merged, and re-used (FOAF project). FOAF has yet to be widely implemented by the major social networking sites, particularly Facebook, but is currently being used by some smaller sites like MyOpera (W3C Wiki). FOAF is being pushed by Tim Berners-Lee and the W3C because, as Berners-Lee says, "It's not the Social Network Sites that are interesting -- it is the Social Network itself. The Social Graph. The way I am connected, not the way my Web pages are connected" (Berners-Lee [2]).
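To make the idea of a portable, mergeable description more concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model in which a graph is just a set of (subject, predicate, object) tuples. The namespace URI and the terms Person, name, knows, and interest come from the FOAF vocabulary, but the people, the example.org URIs, and the helper functions are hypothetical and exist only for illustration. A real application would more likely use an RDF library than plain tuples.

```python
# Simplified model: a graph is a set of (subject, predicate, object) tuples.
# This sketch does not distinguish URIs from literals the way real RDF does.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical people, identified by made-up example.org URIs.
ALICE = "http://example.org/people/alice#me"
BOB = "http://example.org/people/bob#me"
CAROL = "http://example.org/people/carol#me"


def person_triples(uri, name, knows=(), interests=()):
    """Return the set of triples describing one person in FOAF terms."""
    triples = {
        (uri, RDF_TYPE, FOAF + "Person"),
        (uri, FOAF + "name", name),
    }
    triples.update((uri, FOAF + "knows", friend) for friend in knows)
    triples.update((uri, FOAF + "interest", topic) for topic in interests)
    return triples


def friends_of(graph, uri):
    """List everyone the given person is stated to know, in sorted order."""
    return sorted(o for s, p, o in graph if s == uri and p == FOAF + "knows")


# The same person as described by two hypothetical social networking sites.
site_a = person_triples(ALICE, "Alice", knows=[BOB])
site_b = person_triples(
    ALICE,
    "Alice",
    knows=[CAROL],
    interests=["http://example.org/topics/semantic-web"],
)

# Merging is a set union: identical statements collapse, new ones are added.
merged = site_a | site_b
```

Because both sites identify Alice with the same URI, friends_of(merged, ALICE) returns both Bob and Carol, even though neither site knew about both friendships on its own.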

Another way of adding semantic value to web content is through the use of microformats. Microformats are very simple data formats that can be implemented through already existing and widely deployed technologies such as XHTML and XML (Microformats.org). Two microformats, hCard (an XHTML version of the vCard format) and hCalendar, are starting to be implemented by major websites. As of February 2011, Facebook has added the hCalendar and hCard microformats to every 'event' page stored on the site. This, in turn, lets the user easily export events to any of several calendar programs such as Google Calendar or iCal (Microformats Blog).
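The following sketch suggests how a program might read an hCard out of ordinary markup. It uses only the HTMLParser class from Python's standard library. The class names vcard, fn, org, tel, and email are hCard property names, while the sample markup, the person it describes, and the HCardParser class are hypothetical. The parser is intentionally simplified: it assumes every opened element is also closed, so markup containing unclosed void elements such as a bare br tag would confuse it.

```python
from html.parser import HTMLParser

# A small subset of hCard property class names handled by this sketch.
HCARD_PROPERTIES = {"fn", "org", "tel", "email"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: a property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost enclosing element that names a property.
        prop = next((p for p in reversed(self._stack) if p), None)
        text = data.strip()
        if prop and text:
            self.card[prop] = (self.card.get(prop, "") + " " + text).strip()


# Hypothetical markup of the kind a site might embed in a profile page.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Museum</span>
  <a class="email" href="mailto:jane@example.org">jane@example.org</a>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
card = parser.card
```

After parsing, card maps each property found to its text, so the same page that a person reads in a browser also yields a small structured record for a program.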

Scholarly Databases

Social networking will not be the only area affected by Semantic Web technologies. These technologies also allow for more efficient data access by integrating data from multiple sources, and various scientific fields are already benefiting from this capability. One data integration service that relies at least partially on Semantic Web technologies is Google Scholar. The indexing system that Google Scholar uses looks for Highwire Press-style tagging, and a few other formats, in HTML meta tags (Google). Highwire Press tags are a format devised by Stanford University for disseminating scientific information and are usually produced by Highwire Press software (Stanford). For example, the tag '<meta name="citation_title" content="Article Title" />' indicates the title of the article to be indexed. Many websites that host scientific documents use this and other citation styles that can be indexed by Google Scholar, allowing Google Scholar to return relevant, predominantly peer-reviewed results for scientific queries. A lesser-known, but perhaps even more astounding, use of data integration in scientific research is WorldWideScience.org. WorldWideScience is a truly federated search. Whereas Google Scholar indexes human-readable pages on the 'surface web', WorldWideScience digs into the 'deep web' of databases that aren't normally accessible (WorldWideScience). Federated database calls of this type are much less taxing on the distributed systems than, say, forcing every one of them to generate HTML pages, as Google's crawlers do (OSTI).
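As a rough illustration of how an indexer could pick up this kind of tagging, the sketch below collects every meta tag whose name begins with citation_ from a page. It uses only Python's standard library. The tag names citation_title, citation_author, and citation_publication_date follow the Highwire Press style described above, but the sample page, its values, and the CitationMetaParser class are hypothetical, and nothing here is meant to describe how Google Scholar's own indexer is actually written.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect the content of every <meta> tag whose name starts with 'citation_'."""

    def __init__(self):
        super().__init__()
        self.citation = {}  # tag name -> list of values, since tags may repeat

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith("citation_") and content is not None:
            self.citation.setdefault(name, []).append(content)


# Hypothetical article page; only the <head> matters for this sketch.
PAGE = """
<html><head>
<meta name="citation_title" content="Article Title" />
<meta name="citation_author" content="Doe, Jane" />
<meta name="citation_author" content="Roe, Richard" />
<meta name="citation_publication_date" content="2011/02/01" />
<meta name="description" content="Not a citation tag" />
</head><body></body></html>
"""

parser = CitationMetaParser()
parser.feed(PAGE)
citation = parser.citation
```

Values are stored in lists because a tag such as citation_author is normally repeated once per author. The ordinary description tag is ignored, which shows how a simple naming convention lets a program separate bibliographic data from everything else on the page.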

Medicine

Semantic Web technologies are also poised to make an impact on the medical field. Clinical research has been hampered by the highly fragmented nature of medical data storage and gathering (Ogbuji). The rapid advancement of the medical field has led to a proliferation of special-purpose databases that are not easily integrated, and a single patient's data can easily span multiple, geographically separated sources. To combat this problem, the Cleveland Clinic turned to Semantic Web technologies. The Cleveland Clinic developed a program called SemanticDB that uses XML and RDF to unify all of the distributed data sources. Patient records are available in both human-readable and machine-readable RDF formats. Powerful operations, including querying and form-based data entry, can be performed on patient records (Ogbuji). Systems like this will become increasingly important in the coming years as more and more medical data becomes available on computers under President Obama's push toward electronic medical records (Goldman).

Public health surveillance can also benefit from Semantic Web technologies. Public health surveillance is the ongoing collection and analysis of data to identify and respond to community health problems in a timely manner (Mirhaji). The University of Texas School of Health Information Sciences developed a prototype system called Situation Awareness and Preparedness for Public Health Incidents using Reasoning Engines (SAPPHIRE). SAPPHIRE is the only biosurveillance system that integrates data from many different sources and pieces it together to form constructs that are useful in environmental protection and environmental epidemiology in addition to public health. SAPPHIRE can even be quickly reconfigured to take in real-time data from new sources. During Hurricane Katrina, SAPPHIRE was extended to include information from a just-in-time PDA-based questionnaire, a clinical information system from Katrina shelters, and surveillance reports captured by the Houston Department of Health. All of this was implemented within eight hours of the shelters opening, probably the most impressive response time of any entity involved with the Katrina aftermath (Mirhaji). The adaptability and multidisciplinary use of SAPPHIRE ensure that it lives up to its name as a shining jewel among implementations of Semantic Web technologies.

Another Potential Implementation

The future of the interplay between science and the Semantic Web is even brighter. The web has made vast amounts of data available to many people, but pictures of tables in PDF files and data encapsulated in generic HTML table tags are nearly useless beyond the relatively small amounts that humans can process in a given time. The future lies in hosting semantically meaningful raw data that can be accessed by computer programs and brought together with data from other sources for the purpose of scientific advancement. This revolution has already begun with Data.gov. This government-sponsored website provides descriptions of the Federal datasets, information about how to access the datasets, and tools that leverage government datasets (Data.gov). More datasets, available in formats including XML, RDF, and others, are added periodically as this vast resource continues to grow. Scientific data produced outside of government agencies should begin to become available in a decentralized way. A hypothetical example would be making the raw data about the various biological and paleontological specimens around the world available. By encapsulating this data in an RDF or OWL format, you could make comparisons across species and other taxa and across geologic time with relatively simple queries. The truly great thing is that you wouldn't be limited to a small subset of the type of specimen you wanted to research. Instead, you would effectively have access to every specimen of the type needed in the entire world. For a concrete example, if you wanted to study the variation in the shape of the skull among various dog breeds, you would make a query that returned, at the very least, a small set of agreed-upon, 'important' measurements for many, if not all, of the museum-preserved specimens for which the breed was known.
Websites like Digimorph.org can approach this by hosting measurements (or, in Digimorph's case, something that can be measured), but the benefits will not be fully realized until federated search of all databases of this type is possible. Other types of scientific data that are specific to different regions, such as ecological and geological data, could be hosted on servers within those regions.
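The dog skull example can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the vocabulary URI, the property names breed and skullLengthMm, the specimen identifiers, the two museums, and the measurements themselves are all invented, and a set of tuples stands in for a real RDF store and query language such as SPARQL.

```python
from collections import defaultdict

# Hypothetical vocabulary; no such shared specimen vocabulary is implied to exist.
EX = "http://example.org/specimen-vocab#"

# Invented records published independently by two hypothetical museums.
museum_a = {
    ("a:001", EX + "breed", "Greyhound"),
    ("a:001", EX + "skullLengthMm", 228.0),
    ("a:002", EX + "breed", "Pug"),
    ("a:002", EX + "skullLengthMm", 121.0),
}
museum_b = {
    ("b:17", EX + "breed", "Greyhound"),
    ("b:17", EX + "skullLengthMm", 232.0),
    ("b:18", EX + "skullLengthMm", 180.0),  # breed unknown, so it is skipped
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in graph
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def skull_lengths_by_breed(*graphs):
    """Merge the given graphs and group skull lengths by known breed."""
    merged = set().union(*graphs)
    result = defaultdict(list)
    for specimen, _, breed in match(merged, predicate=EX + "breed"):
        lengths = match(merged, subject=specimen, predicate=EX + "skullLengthMm")
        for _, _, length in lengths:
            result[breed].append(length)
    return {breed: sorted(values) for breed, values in result.items()}


lengths = skull_lengths_by_breed(museum_a, museum_b)
```

The query never needs to know which museum holds which specimen. Because both sources use the same property names, their records combine into one graph, and the specimen with no recorded breed simply drops out of the result.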
