
  • All entities (resources in the linked data jargon) in the data are named with URIs (Uniform Resource Identifiers).
  • The names should in most cases be HTTP(S) URIs, as this allows a standardized way to resolve the names (i.e. access the resources).
  • When a client resolves the name, relevant information about the resource should be provided. This means, for example, that a human user receives a human-readable representation of the resource, while a machine receives a machine-readable representation of it.
  • The resources should refer (be linked) to other resources when it aids in discoverability, contextualizing, validating, or otherwise improving the usability of the data.
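
The third principle is commonly realized with HTTP content negotiation: the client states in its Accept header which representation it wants. The following is only a minimal sketch of that idea. The function name, the list of supported media types and the simplified header parsing (q-values are ignored) are assumptions made for illustration, not a description of any particular server.

```python
# Hypothetical list of representations a server could offer for one resource.
SUPPORTED_MEDIA_TYPES = ["text/html", "text/turtle", "application/ld+json"]


def choose_representation(accept_header: str) -> str:
    """Pick a media type for the response based on the Accept header.

    Simplification: q-values are ignored and media types are tried in the
    order in which the client listed them.
    """
    requested = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in requested:
        if media_type in SUPPORTED_MEDIA_TYPES:
            return media_type
    # Fall back to the human-readable representation.
    return "text/html"
```

A browser sending `text/html` would get the human-readable page, while a client asking for `text/turtle` would get RDF for the same URI.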

It is crucial to understand that by nature, linked data is atomic and always in the form of a graph. Which resources exist depends entirely on the use case at hand. There is, though, a deep philosophical distinction between linked data and traditional data modeling. Traditionally, data modeling is done in a siloed way with an emphasis on modeling records: records of income taxes, places of residence, medical history. Through unique identifiers these records can be connected to the individual in question, but there is a clear separation of concerns: the data model is typically somewhat denormalized and serves a process or system view of the domain at hand.

With linked data, the records view of the world is certainly possible to model and in some cases the appropriate solution. In principle, however, linked data, being in graph format, allows statements about the world to be expressed in a much more natural way, particularly in RDF (Resource Description Framework), which is the lingua franca of linked data. With RDF, the data is expressed as triples, which you can think of simply as rows of data in three columns. The first column (subject) determines the perspective, i.e. from what point of view we are talking. The second column (predicate) determines the context or theme we are talking about. The third column (object) determines the target to which the context is applied. As an example:

Naming Things

As mentioned, all resources are named ("minted") with URI identifiers, which we can then use to refer to them when needed. URLs and URNs are subsets of URIs, so any URL (be it for an image, a website, a REST endpoint address or anything else) is already ready to be incorporated into the linked data ecosystem. URNs (e.g. urn:isbn:0-123-456-789-123) can be used as well, but unlike the aforementioned URLs they can't be directly resolved.
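
As a small illustration of the difference between resolvable and non-resolvable names, the sketch below inspects the scheme of a URI with Python's standard `urllib.parse`. The helper name `is_directly_resolvable` is hypothetical, and treating only `http` and `https` as resolvable is an assumption that mirrors the HTTP(S) recommendation above.

```python
from urllib.parse import urlparse


def is_directly_resolvable(uri: str) -> bool:
    """Return True if the URI uses a scheme a web client can dereference.

    Assumption for this sketch: only http and https count as resolvable.
    """
    return urlparse(uri).scheme in {"http", "https"}


url_name = "https://finland.fi/"
urn_name = "urn:isbn:0-123-456-789-123"  # the URN example from the text

print(urlparse(url_name).scheme, is_directly_resolvable(url_name))
print(urlparse(urn_name).scheme, is_directly_resolvable(urn_name))
```

Both strings are valid names for a resource; only the first one also tells a client where to look the resource up.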

There is a deep philosophical difference in how and what things are named in linked data compared to traditional data modeling (e.g. with UML), and there are reasons behind it, but covering this requires going through the elementary principles first.

Linked Data is Atomic

It is crucial to understand that by nature, linked data is atomic and always in the form of a graph. The lingua franca of linked data is RDF (Resource Description Framework), which allows for a very intuitive and natural way of representing information. In RDF everything is expressed as triples (3-tuples): statements consisting of three components (resources). You can think of triples as simply rows of data in a three-column data structure: the first column represents the subject resource (from whose point of view the statement is made), the second column represents the predicate resource (the context of what is being stated about the subject), and the third column represents the object or value resource of the statement. Simplified to the extreme, "Finland (subject) is a (predicate) country (object)" is a statement in this form.

[Figure: graph of the example statement]
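
The three-column view of a statement can be sketched in a few lines of Python. The `Triple` type below is a hypothetical helper for illustration only, and it uses plain labels instead of URIs to match the simplified example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: a row with exactly three columns."""
    subject: str    # whose point of view the statement is made from
    predicate: str  # what is being stated
    object: str     # the target or value of the statement


statement = Triple(subject="Finland", predicate="is a", object="country")

# A triple unpacks like any 3-tuple.
s, p, o = statement
print(f"{s} {p} {o}")
```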

If all data is explicitly in RDF, it means we have a fully atomic dataset where everything from the types of entities down to their attributes exists as individual resources (nodes) connected by associations (edges). If we expand the example above to also include statements about the number of lakes and Finland's capital, we could end up with the following dataset:

[Figure: graph of the expanded example dataset]

As you can see, there is no traditional class/instance structure with inner fields. Finland as an entity does not have fixed attribute slots inside it for the number of lakes or its capital: everything is expressed by simply adding more associations between individual nodes. As mentioned above, everything is named with a URI, so a more realistic example would actually look like this:

[Figure: the same graph with resources named by URIs]

The triples in this dataset can be serialized very simply, or stored e.g. in a three column tabular structure:

subject resource        predicate resource                 object resource
<https://finland.fi/>   <https://foobar/isA>               <https://foobar/Country>
<https://finland.fi/>   <https://foobar/numberOfLakes>     "187888"^^xsd:integer
<https://finland.fi/>   <https://foobar/hasCapital>        <https://finland.fi/Helsinki>
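
To make the "no fixed attribute slots" point concrete, here is a minimal sketch that holds the three statements as plain Python tuples in a set. The `describe` helper and the extra statement about Helsinki are hypothetical additions for illustration, and the literal is kept as a plain Python integer rather than a typed RDF literal.

```python
FINLAND = "https://finland.fi/"

# The three statements from the table, as (subject, predicate, object) tuples.
dataset = {
    (FINLAND, "https://foobar/isA", "https://foobar/Country"),
    (FINLAND, "https://foobar/numberOfLakes", 187888),
    (FINLAND, "https://foobar/hasCapital", "https://finland.fi/Helsinki"),
}


def describe(graph, subject):
    """Return all (predicate, object) pairs stated about `subject`, sorted by predicate."""
    pairs = [(p, o) for (s, p, o) in graph if s == subject]
    return sorted(pairs, key=lambda pair: pair[0])


# Adding a fact is just adding one more triple; no structure has to change.
# Hypothetical statement, not part of the original table:
dataset.add(("https://finland.fi/Helsinki", "https://foobar/isA", "https://foobar/City"))

for predicate, obj in describe(dataset, FINLAND):
    print(predicate, obj)
```

Note that the object of one statement (Helsinki) became the subject of another, which is what turns a list of rows into a graph.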

A small exception to the naming rule is that the literal integer value of 187888 does not have an identity (nor do any other literal values).
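
The distinction between named resources and literal values can be sketched with two small types and a line-based serialization in the style of N-Triples. The class and function names are hypothetical, `xsd:integer` is written out as its full datatype URI, and escaping of special characters is deliberately left out of this sketch.

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class IRI:
    """A named resource: it has an identity that others can refer to."""
    value: str

    def serialize(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """A plain value with a datatype: no URI, hence no identity of its own."""
    lexical: str
    datatype: str

    def serialize(self) -> str:
        # Simplification: the lexical form is not escaped.
        return f'"{self.lexical}"^^<{self.datatype}>'


def to_ntriples_line(subject, predicate, obj) -> str:
    """Serialize one statement as a single N-Triples-style line."""
    return f"{subject.serialize()} {predicate.serialize()} {obj.serialize()} ."


line = to_ntriples_line(
    IRI("https://finland.fi/"),
    IRI("https://foobar/numberOfLakes"),
    Literal("187888", XSD_INTEGER),
)
print(line)
```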

You might have already guessed that this kind of data structure becomes cumbersome when it is used, for example, to store lists or arrays. Both are possible in RDF, but the flexibility of linked data 

Everything Has an Identity

Another 

The literal numeric value 187888 is also a resource (node), but it does not have an identity 


Which resources exist depends entirely on the use case at hand. There is, however, a deep philosophical distinction between linked data and traditional data modeling. Traditionally, data modeling is done in a siloed way with an emphasis on modeling records, in other words data structures that each describe a set of data for a specific use case. As an example, different information systems might hold data about an individual's income taxes, medical history etc. These data sets relate to the individual indirectly via some kind of permanent identifier, such as the Finnish Personal Identity Code, but neither the identifier nor the records are meant to represent the concepts 


With linked data, the records view of the world is certainly possible to model and in some cases the appropriate solution. In principle, however, linked data, being in graph format, allows statements about the world to be expressed in a much more natural way. As an example:

subject resource   predicate resource                                    object resource
<>                 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>     <https://schema.org/LandmarksOrHistoricalBuildings>

subject             predicate   object
Matti Meikäläinen   is a        Finnish citizen.

As a graph, this data would look like:

...

In this example, "Matti Meikäläinen" is an individual ("instance"), and "Finnish citizen" is a class to which all individuals classified as Finnish citizens belong. "Is a" functions as an association denoting the class membership. But as stated earlier - the 


Core Vocabularies (Ontologies)

...