Workshop Agreement version of November 15 2006
- Abstract
- This is the proposed CEN Workshop Agreement on MetaLex XML, an Open XML Interchange Format for Legal and Legislative Resources, for the meeting of December 6, 2006 in Paris.
- Date
- 19:09, 15 November 2006 (CET)
- This version
- http://www.metalex.eu/wiki/index.php/Workshop_Agreement_version_of_November_15_2006
- Latest version
- http://www.metalex.eu/wiki/index.php/Workshop_Agreement
- Version history
- http://www.metalex.eu/wiki/index.php?title=Workshop_Agreement&action=history
- Editors
- Alexander Boer (aboer@uva.nl, of the Leibniz Center for Law)
- Fabio Vitali (fabio@cs.unibo.it, of the University of Bologna)
- Emile de Maat (e.demaat@uva.nl, of the Leibniz Center for Law)
- Status
- The metadata section and the content models section are not consistent with each other. The numbering attribute in the content models section is contentious. The current version of the schema does not implement the attributes suggested by the metadata section. Some of the texts in the MetaLex Concepts section need to be clarified. The OWL schema is not yet finished, and the description of actions in the metadata section is not consistent with the OWL file.
Introduction
The CEN workshop on an Open XML interchange format for legal and legislative resources aims to develop a CEN Workshop Agreement (CWA) on such a format. A CWA is accepted by the CEN and associated standards organisations as a publicly available specification (PAS or pre-norm) for a period of three years, after which the agreement must be renewed or upgraded to a norm.
Goals of the Workshop
The business plan agreed by the workshop states the objectives of the workshop. This document is a draft specification to be submitted for workshop agreement by December 2006. Additional activities are expected, and some of the categories of legal documents and entities explicitly excluded here may be included in later activities.
Origin
The MetaLex schema is intended as an interchange format between other, more jurisdiction-specific XML standards. As such it is very abstract. The MetaLex/CEN schema is based on best practices from, among others, the previous versions of the MetaLex schema, the Akoma Ntoso schema, and the Norme in Rete schema. Other relevant parties include LexDania, CHLexML, and FORMEX. In addition to these government or open standards, many XML languages for publishing legislation are in use by publishers. Many of the participants of the workshop have also been involved in the Legislative XML workshops (see for instance the archive of the front page of the MetaLex website for previous calls for participation and online proceedings and presentations).
Scope of this Agreement
This proposal contains agreements about:
- the abstract content models supported by the standard,
- the way a text fragment is cited,
- the way metadata is added to a document, and
- a generic model for organizing metadata in RDF.
Terminology
- Bibliographic entity: A work, expression, manifestation, or item, as intended by the IFLA-FRBR, or any part of these.
- Work: A work or work of law is the abstract collection associated to the set of provisions that can be described and named as a single entity, that was originally created by its originator (a legislator) in a single creative process. The concept of work was taken from the IFLA-FRBR.
- Expression: An expression, version, or variant is (one of) the realization(s) of a work in a specific collection of actual sentences and words and punctuation and (where appropriate) presentation choices, made by the originator of the expression. For instance, each consolidation of a formal act of law is an expression of that work. For instance, the English, Dutch, Italian, and German versions of a European directive are different expressions of the same work. The concept of expression was taken from the IFLA-FRBR.
- Manifestation: A manifestation is one of the physical or electronic embodiments of an expression of a work. Thus, a specific XML representation, a PDF file (as generated by printing into PDF a specific Word file with a specific PDF distiller), a printed booklet, all represent different manifestations of the same expression, version, or variant of a work. The concept of manifestation was taken from the IFLA-FRBR.
- Item: An item is one specific exemplar of a manifestation: a specific copy of a book on a specific shelf in a library, a file stored on a computer in a specific location, etc. Items stored on a computer can be easily copied to another location, resulting in another item, but the same manifestation. The concept of item was taken from the IFLA-FRBR.
- Source of law: A source of law is a document that can be, is, was, or presumably will be used to back an argument concerning the existence of a norm or definition in a certain legal system, or, alternatively, a writing used by a competent legislator to communicate a norm to a certain group of addressees. The classification as a source of law can be derived from the intent and legislative competence of its originator, or from the way it is used in the processes of law. Source of law is a familiar concept in law schools, and may be used to refer to both legislators (fonti delle leggi, sources des lois), legislation and case law (fonti del diritto, sources du droit), custom, etc. In this workshop it strictly refers to bibliographic entities, i.e. documents. A single source of law is an expression, unless otherwise specified.
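The four FRBR levels defined above nest inside each other, which can be made concrete as a small data structure. The following is only a sketch: the class names, field names, and file locations are hypothetical illustrations and are not defined by this agreement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One specific exemplar of a manifestation, e.g. a file at a location."""
    location: str


@dataclass
class Manifestation:
    """A physical or electronic embodiment of an expression (XML, PDF, print)."""
    media_format: str
    items: List[Item] = field(default_factory=list)

    def add_copy(self, new_location: str) -> Item:
        """Copying yields another item, but the manifestation stays the same."""
        duplicate = Item(location=new_location)
        self.items.append(duplicate)
        return duplicate


@dataclass
class Expression:
    """One realization of a work: a language version or a consolidation."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract work of law created by its originator."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


# Hypothetical example data: one directive with two language versions.
directive = Work(title="Example directive")
english = Expression(label="English version")
dutch = Expression(label="Dutch version")
directive.expressions += [english, dutch]

# One XML manifestation of the English expression, stored in two places.
xml_form = Manifestation(media_format="XML")
english.manifestations.append(xml_form)
xml_form.add_copy("/srv/laws/directive-en.xml")
xml_form.add_copy("/backup/directive-en.xml")
```

Copying the file adds an item but leaves the number of manifestations, expressions, and works unchanged, which mirrors the definition of item given above.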
MetaLex Concepts
This section is informative.
Legal and Legislative Resource, and Sources of Law
The CEN Workshop on an Open XML Interchange Format for Legal and Legislative Resources declares, by way of its title, an interest in legal and legislative resources, but the scope statement of the first workshop agreement limits the applicability of the proposed XML standard to sources of law and references to sources of law.
As understood by the members of the workshop, the source of law is a writing that can be, is, was, or presumably will be used to back an argument concerning the existence of a norm or definition in a certain legal system, or, alternatively, a writing used by a competent legislator to communicate a norm to a certain group of addressees. Both the legislator and the user of the source of law understand it as a medium used for communicating the existence of legal norms, as well as auxiliary declarations required for the proper understanding of the legal norms communicated between legislator and user. Because the CEN Workshop is concerned only with an XML standard, it chooses not to appeal to other common ingredients of definitions of law that have no relevant counterpart in Information Technology, like justice, coercion, judges, etc.
Source of law is a familiar concept in law schools, and may be used to refer to both legislators (fonti delle leggi, sources des lois), legislation and case law (fonti del diritto, sources du droit), custom, etc. It should be noted that many Romance languages make a distinction between the legislator as source of law, by way of speaking or writing, and the law as source of right(s), which is presumably what the existence of the law brings about.
In its broadest sense, the source of law is anything that can be conceived of as the originator of legal norms. In the context of this workshop it strictly refers to communication in writing, and in a sense covers the fonti del diritto in Italian and sources du droit in French. There are two main categories of source of law in writing: legislation and case law.
As understood by the members of the workshop, legislative resources and legal resources are also writings.
The notion of a legislative resource includes legislation, and all writings produced by the legislator explaining and justifying legislation. The legislator is a legal person: it exists separately from any natural persons and organizations involved in the process of drafting and evaluating legislation. It is the formally correct completion of certain processes, usually dictated by law, that makes the legislator the formal author of a writing, and at the same time identifies the addressees to whom it applies. Obviously, the persons and organizations involved in the process of legislating may produce writings that are clearly precursors or legally required ingredients to the end product. These writings are also included in the notion of a legislative resource, but in this case it is not easy to give straightforward rules for deciding whether they are, or are not to be considered legislative resources. Different jurisdictions will have different theories on this subject.
The notion of a legal resource is potentially very wide in scope: it includes at least sources of law and legislative resources. It also includes all writings that create a legally recognized fact. All writings required in a court procedure, in legislating, or in exercising a(nother) declarative or proclamative power (granting permits, taking administrative decisions, civil marriage ceremonies, etc.) are potentially legal resources, as well as writings required for compliance with duties to inform (tax application forms, etc.). These writings are only of interest to the CEN Workshop if:
- large quantities of these writings have to be stored;
- there is an interest in information retrieval from these writings;
- there is a certain amount of standardization of the structure of the writing;
- the writing exhibits certain features that are common across many different jurisdictions;
- no other, more appropriate, standard exists.
Legal and Legislative Activities
Each relation between two expressions of a source of law, or two XML manifestations of an expression, is mediated by some action in which both bibliographic entities are direct or indirect participants. A modification, for instance, relates the modified text, the text to be modified, and the modifying text if one is required for the modification. Similarly, one could also consider an annotation action that results in a new XML manifestation of the same expression with additional metadata.
The general principle is that it should never be necessary to actually make changes in an XML manifestation once it exists. Metadata should adhere to the principle of monotonicity: adding metadata should not invalidate existing metadata.
There are two ways to associate a description of an action to an XML document: by embedding it (which is an annotation action) or by making an RDF description of the action, referring to the expressions or manifestations to which the action applies.
Note that it makes a difference at which date an XML manifestation is serialized. Let T1, T2, T3, T4 be a chronological order of four dates. At date T1, source S may for instance state that it will enter into force at date T3. At T2, however, S is modified to state that it will enter into force at date T4, which actually happens at T4. If we now consider a rendering at T1 and one at T4, then we consider the metadata at T1 to be hypothetical because it refers to the future, while the metadata at T4 is actual. The previous metadata at T1 should however not be considered invalid, because retaining it has actual value for reconstructing what beliefs were legitimately held about S at T1. If the metadata is embedded in the manifestation, the metadata itself can be dated to the date of serialization of the manifestation. If the metadata was serialized in RDF, then the date has to be added explicitly as a qualification of the statement. Another relevant qualification of a metadata statement is its author.
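The T1 to T4 scenario can be sketched as an append-only log of metadata statements, each qualified by the date on which it was asserted and by its author. This is a minimal illustration under stated assumptions: the class names, the property name entersIntoForce, and the concrete dates are hypothetical and are not part of the MetaLex schema or its RDF model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """A metadata statement qualified by its assertion date and its author."""
    subject: str
    prop: str
    value: date
    asserted_on: date
    author: str


class MetadataLog:
    """Append-only store: statements are added, never changed or removed."""

    def __init__(self) -> None:
        self._statements: List[Statement] = []

    def add(self, statement: Statement) -> None:
        self._statements.append(statement)

    def believed_at(self, subject: str, prop: str, when: date) -> Optional[Statement]:
        """Return the most recent statement asserted on or before `when`."""
        known = [s for s in self._statements
                 if s.subject == subject and s.prop == prop and s.asserted_on <= when]
        return max(known, key=lambda s: s.asserted_on, default=None)


def is_hypothetical(statement: Statement, rendered_on: date) -> bool:
    """A statement is hypothetical if it refers to a date after the rendering."""
    return statement.value > rendered_on


# Hypothetical dates standing in for T1 < T2 < T3 < T4.
t1, t2, t3, t4 = date(2006, 1, 1), date(2006, 3, 1), date(2006, 6, 1), date(2006, 9, 1)

log = MetadataLog()
# At T1, S states that it will enter into force at T3.
log.add(Statement("S", "entersIntoForce", t3, asserted_on=t1, author="legislator"))
# At T2, S is modified to state that it will enter into force at T4.
log.add(Statement("S", "entersIntoForce", t4, asserted_on=t2, author="legislator"))

view_at_t1 = log.believed_at("S", "entersIntoForce", t1)
view_at_t4 = log.believed_at("S", "entersIntoForce", t4)
```

Because the earlier statement is never overwritten, the belief legitimately held at T1 can still be reconstructed after the modification at T2.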
This approach was already described and implemented in Boer et al. (2004) for MetaLex, and was earlier described as an approach for metadata in Lagoze et al. (2000).
Legal Subjects, Legal Actions, and Legal Competence
There are several closely related common methods for describing a legal order: classification of legal subjects, classification of legal objects, and classification of legal norms.
Classification of legal subjects starts from the group of intentional agents that constitute the legal order, and classifies them by their roles in the legal order. Relevant differentiae are the kinds of actions these agents can engage in, the legal consequences that these actions have, and the ways in which they are liable or privileged towards other agents (i.e. what other agents can do to them). A related notion is that of jurisdiction over the person: for each source of law we can ask ourselves which group of agents it applies to.
The classification of legal objects identifies which behaviours (for instance those involving driving a vehicle, those involving sales transactions) and matters (for instance the design of ships) are the object of the law. This is related to the (unfortunately named) subject-matter jurisdiction.
The legal norm does one of three things: it directs behaviour, by way of a prohibition or an obligation, it allows behaviour, or it attributes the legal competence (power, potestative right) to bring about certain legal consequences. It has three components: a description of the intended addressees (those to whom the norm applies: derived from the German Normadressat), a description of the subject matter (the legal object: a behaviour of the addressee or some matter that can be brought about by the addressee), and a legal qualification of the subject matter (for instance as a prohibition, obligation, permission, or attribution of a competence, or more complex patterns involving these). The CEN Workshop will not attempt to define a final jurisdiction-independent theory of any of these components.
Legal subjects are at least natural persons and legal persons. Legal persons are entities with legal personality (for instance a limited company, or a municipality). Entities with legal personality generally act through agents representing them, since they cannot act by themselves. There are also legal subjects that are neither natural persons nor legal persons. A criminal organization for instance typically has no legal personality, but can be held responsible for its behaviour. A constituting part of a legal person can in some cases also be a legal subject without personality. Other special legal subjects, like the parliament, may also be excluded from legal personality based on the assumption that their existence precedes the present legal order and that they have a role in bringing it about.
The kinds of legal action we are interested in are usually performed by public legal subjects: organizations dedicated to some public interest. We also distinguish public persons, or persons brought about by public law, public bodies, their constituting parts, public offices, that are held by a natural person for a specific period, as well as any auxiliary public legal subject that does not belong to any other class.
The object of interest to the legal order is the behaviour of legal subjects. Part of this behaviour consists of actions. An action is an intentional behaviour in the sense that it occurs because the legal subject has the intention to bring some effect about. This doesn't mean that the action cannot have unintentional effects: I may want to pick up a glass, but instead unintentionally knock it over. In this case I unintentionally knocked over the glass, but acted with another intention, i.e. to pick it up.
Of special interest for the CEN standard are actions that only public legal entities can engage in that are relevant to the production of sources of law, and public competences used in the process. The notion of competence is important for two reasons:
- In the discussion of legal subjects it is often unclear whether one is talking about the (natural) person itself or the role he fills at the moment of acting. The minister is competent to produce ministerial guidelines, but the person filling this office can only use the competence when acting in his capacity as minister, and loses the competence when he stops being minister. "The minister has decided to grant a pardon to illegal immigrants" and "the minister has decided to divorce his wife" describe fundamentally different actions: in the first case he is acting as minister (the office), uses a public competence, and creates legal consequences, while in the second case he is a natural person making a decision. If the law states that "the minister sets restrictions additional to this law on some subject matter", it should be clear that the competence attributed here remains with the office of minister and not with the person filling the office at the moment of enacting the relevant law. Where such ambiguity exists, the acting agent is in this standard always the office holder. The office may however hold competences of a different character. The minister may hold the competence to create internal guidelines for his employees, the competence to create ministerial directives on specific types of legal object, and the competence to make limited changes in formal law that can otherwise only be changed by parliament (for instance indexing tax law for inflation). It should be clear which competence he was exercising.
- In most jurisdictions the fact that a certain directive was based on a competence attributed in a certain source of law is relevant to deciding:
- the relative priority of the directive in the legal system in case of apparent conflict with other directives;
- the addressees;
- the exact scope of the subject matter;
- the potential legal consequences of violation.
Provided that we take the position that the user will not generally question the validity of a legal action on a source of law, some subset of {action, agent, competence} is usually sufficient to store metadata about the lifecycle and position of legislation.
Separating Content and Metadata
A guiding principle of the workshop is that content is described by XML Schema, and metadata by the Resource Description Framework (RDF). XML Schema refers to an accessible world consisting of XML data structures that can (usually) be retrieved in some way, and RDF to an in principle inaccessible one consisting of things out there that can only be referred to by way of a symbol that functions as a mental substitute for the thing itself.
XML Schema uses the same symbols, uniform resource identifiers (URIs), as RDF. The fundamental difference is that in XML Schema the URI is used to attach identifiers to XML data structures so that software can refer to those structures, while in RDF it is used to refer to things in a usually inaccessible domain of reference out there that are described by way of XML data structures.
A set {a, b, c} in a knowledge representation language could be a representation of the three legs of a specific barstool out there, in which case a, b, c are only identifiers for real things that are inaccessible. But in some cases a knowledge representation language like RDF does indeed refer to things that are accessible, or the distinction between the real thing and the description of the thing is immaterial. This is for instance the case with fully abstract notions like a mathematical set. There is no material difference between the set {a, b, c} and the model theoretic representation {a, b, c} of this set, since this mathematical set only exists by virtue of the fact that it is being described.
Something similar is the case with documents: for most purposes there is no material difference between the accessible XML structure and the document. Only when one considers issues like how many copies of the document are around does one start noting a material difference. It has therefore become a convention that the XML data structure referred to by a URI is a manifestation of the document, and not some description of it.
Compare the following two XML statements:
- <name xml:base="http://www.metalex.eu/people.xml" xml:id="aboer">Alexander Boer</name>
- <xml-name-element rdf:about="http://www.metalex.eu/people.xml#aboer"><rdf:value>Alexander Boer</rdf:value></xml-name-element>
The first one is an XML element defined in some XML schema with a URI identity marker http://www.metalex.eu/people.xml#aboer. It identifies the XML element <name>Alexander Boer</name>, and not its value Alexander Boer, the fact that the value is a name, or the fact that its carrier is a person. The second statement is RDF, and encodes a (rather trivial) statement about the thing referred to by the identity marker http://www.metalex.eu/people.xml#aboer, being <name>Alexander Boer</name>.
This difference in interpretation cannot be inferred from the XML but is a difference in the semantics of basic XML (specifically xml:base and xml:id) and RDF.
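How the first statement yields its identity marker can be sketched with the Python standard library. This is a simplified illustration rather than a conforming XML Base implementation: the helper element_uri is hypothetical and only looks at xml:base on the element itself, whereas real documents may inherit it from an ancestor.

```python
import xml.etree.ElementTree as ET

# The reserved namespace to which the xml: prefix is always bound.
XML_NS = "http://www.w3.org/XML/1998/namespace"
# The URI of the rdf:value property in the RDF vocabulary.
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def element_uri(element):
    """Combine xml:base and xml:id of one element into a fragment URI.

    Simplification: xml:base declared on ancestors is not considered.
    """
    base = element.get(f"{{{XML_NS}}}base")
    ident = element.get(f"{{{XML_NS}}}id")
    if base is None or ident is None:
        return None
    return f"{base}#{ident}"


name_element = ET.fromstring(
    '<name xml:base="http://www.metalex.eu/people.xml" xml:id="aboer">'
    "Alexander Boer</name>"
)

# In XML the URI identifies the element, i.e. the data structure itself.
uri = element_uri(name_element)

# In RDF the same URI is the subject of a statement about the thing it denotes.
triple = (uri, RDF_VALUE, name_element.text)
```

The same string appears in both roles; only the interpretation differs, which is the point made in the paragraph above.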
The interface between metadata and the XML structure consists of these shared URIs. The XML structure is what the URI refers to, and in RDF you describe the thing referred to by the URI. Obviously it is possible to describe everything, including the structure, contained text blocks, references, etc. in RDF.
The big question of course is which part of this RDF data belongs to the standard. The kinds of entities in the XML document structure are often classified as follows:
- Form
- A law can be recognized by certain required phrases and formulas. These formal requirements usually come from the constitution, administrative law, and guidelines on legislative drafting. The formal requirements are influenced mostly by considerations of consistency of language, ease of access for the reader, and giving cues to the reader to help him position the document in relation to activities of public bodies. The formal structure of the document provides a context for the interpretation of the content of the document. For other legal resources, the structure may be less formal and in the end only ordinary text structure elements may be applicable (e.g. chapters, sections, paragraphs and sentences).
- Role
- Although we may look at the phrases and formulas in a written decision to classify a document as a law, we know that it is not the structure of the document that makes it a law, but the role the document plays in the activities of public bodies - most importantly the activities that produced the document. This is often called metadata in the narrow sense: it is not in the document, but describes the relation between the document and its environment. The date of publication of a document is an example.
- Content
- Knowledge representation for knowledge-based systems and knowledge management is concerned with the question what the text contained in the document (interpreted in the context of form and role of the document) means for our activities. Is it a source of constraints on your activities, or maybe an instrument for predicting other people's behaviour towards you, or making other people do desirable things for you, or does it give you the right to do things you would otherwise not be allowed to do?
Role is usually strictly considered to be in the domain of metadata, while content is considered the domain of knowledge representation languages. This distinction between role and content makes complete sense if you consider for instance a medical textbook: its content is completely unrelated to contextual information describing its role in activities involving the book like its publisher, author, date of publication, the schools that use it, etc. There is no problem with using different standards for both.
In law this is different, and this is the main reason that more general standards often do not seem to apply to law. The content of one source of law may be the context in which another source of law is interpreted. A simple example is the modification of a source of law S_in pursuant to a directive to modify it in the modifying source of law S_trans, resulting in a modified source of law S_out. We can conceptualize this as an action producing S_out, using S_in and S_trans as instruments.
Alternatively we can conceptualize it as a process working on the normative system: S_trans causes a state transition from S_in to S_out. Let S_trans state that at T_trans, S_in will be replaced in the normative system by S_out. Firstly, observe that the modification is achieved by S_trans using the same legislative instruments that are used for bringing about other effects: either a directive or a declaration. If the legislator uses a directive, then it is possible (for some editor, or for the addressees who have to make the modification for themselves) to violate it by not making the modification. If the legislator declares it, it happens automatically as a delayed effect of the action of the legislator.
The transforming action therefore breaks down the distinction between role and content. Also observe that T_trans (the timepoint at which the modification goes into effect) is context, and therefore metadata, of S_in and S_out, but part of the content of S_trans. T_trans is a property of S_in (closing the time interval in which it exists) and of S_out (opening the time interval in which it exists), and is therefore described by the XML structures of S_in and S_out. The actual source of T_trans is however S_trans, and a change in S_trans (before its one-shot execution) also changes the metadata of S_in and S_out. For maintenance purposes it is therefore much more convenient to keep the triple {S_trans, S_in, S_out} together, and it is only natural to explicitly organize them by their thematic roles in a description of the action itself. Hence the focus on actions.
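Keeping the three sources together in one description of the action can be sketched as follows. The record type, its field names, and the identifiers are hypothetical; the sketch only illustrates how the validity intervals of the source before modification and the source after modification can be derived from the action instead of being stored separately with each of them.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, Optional, Tuple

# (start, end) of the interval in which a source exists; None means open-ended.
Interval = Tuple[Optional[date], Optional[date]]


@dataclass(frozen=True)
class Modification:
    """An action relating three sources of law by their thematic roles."""
    modifying: str    # the modifying source, whose content states the date
    before: str       # the source being replaced
    after: str        # the source that results from the modification
    effective: date   # the timepoint at which the modification takes effect


def validity_intervals(action: Modification) -> Dict[str, Interval]:
    """Derive interval metadata of both sources from the action itself.

    The effective date closes the interval of the replaced source and
    opens the interval of the resulting source.
    """
    return {
        action.before: (None, action.effective),
        action.after: (action.effective, None),
    }


# Hypothetical identifiers and dates.
action = Modification(modifying="act-2006-12", before="code-v1",
                      after="code-v2", effective=date(2007, 1, 1))
intervals = validity_intervals(action)

# If the modifying source is itself changed before it takes effect, only the
# action description is updated; both derived intervals follow automatically.
postponed = replace(action, effective=date(2007, 7, 1))
postponed_intervals = validity_intervals(postponed)
```

With this organization the effective date is recorded once, in the description of the action, which is the maintenance advantage argued for above.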
This applies to all activities pertaining to the lifecycle of a source of law. This also applies to all activities undertaken based on the competence to act attributed, or delegated, or subdelegated, by a source of law. Attribution, delegation, and subdelegation of the competence to legislate are themselves activities subject to explicit regulation. Another example is the notion of applicability: (parts of) legislative documents may also set conditions on the applicability of other (parts of) legislative documents.
Most of the specifically legal classifications we would otherwise be tempted to use as a descriptive name of XML elements on law, or as a property of such an element, turn out to be roles of the involved bibliographic entities in actions, in business processes, etc.
The CEN standard strips bibliographic entities used in law of their role. The content models are very generic, but at the same time also slightly different in their choice of granularity from presentation-oriented content models (XHTML, etc.). The standard focuses on the structure that is relevant to and intended by the legislator, and not by the editor of the specific manifestation of the text we are looking at.
Content Models instead of Elements
A MetaLex XML element is characterized by a name, a content model, and zero or more attributes.
According to the philosophy of descriptive markup (cf. [1]), the name of an XML element is usually semantically-charged (i.e. it provides a hint as to the meaning of the text fragment, or its role within the whole of the document). Additional information about the content of the element (e.g. explanatory, parametric, collateral data) goes into attributes. The content model is an algebraic expression of the elements that may (or must) be found in the content of the element.
Generic elements, on the other hand, are named after the content model: they are merely a label identifying the kind of content model. Generic elements can be reused in different situations but they provide no description of their content and role. It is possible to add specific attributes providing hints as to their meaning and role.
All XML vocabularies contain a mix of descriptive and generic elements, and, depending on the foreseen uses of the documents, emphasize one of the approaches. For instance, vocabularies with precise procedural semantics (e.g. XSLT, SVG) do not depend on generic elements, while vocabularies intended for diverse content (for instance XHTML) employ generic elements. Consider for instance that in XHTML 2.0 both <a> and <img> elements are being replaced or phased out in favour of generic substitutes using attributes.
Current validation languages (e.g., XML Schema) do not allow validation rules to be associated with attribute values, so element names are currently the only way to associate validation rules with documents. This is a cause of pollution of principles, forcing semantically-charged elements to assume a rigid content model, while generic elements take care of odd situations that were not foreseen when the content models were designed.
Legislative documents are in a strange position with regard to standards. On the one hand, legislative drafting technique has a long tradition, and often its own standards of what legislative documents should look like. This makes descriptive markup combined with strict content models very tempting. On the other hand, there are so many exceptions that can be found in concrete examples throughout the legislative history of the average state that we sometimes just want to give up on precise description altogether and resort to incredibly generic elements, in particular because there should be not one iota of difference between the original expression of the legislator and the XML manifestation of that expression.
If we are too normative we end up preventing the markup of a possibly large range of documents, especially legacy documents. On the other hand, by giving up on description of elements we end up preventing the analysis of the content of legislative documents, and permitting a chaotic and anarchic approach to legal document management that will prevent any further application of sophisticated automated assistance.
To further add to the complexity of this issue we have the problem of multilingualism, which guarantees that perfectly descriptive and appropriate element names in the language that the proposed vocabulary is supposed to cater to are incomprehensible gibberish in all others.
The approach of the workshop is to provide for a complete and automatic interchangeability of approaches, from generic to descriptive and vice versa. While locally an editorial staff can enjoy the precision and naturalness of descriptive markup catering to a specific jurisdiction and language, other users can obtain full and immediately understandable results using another vocabulary.
This can be achieved by using two special attributes, name and type, that provide information about the meaning and the content model of the element. The values of these attributes can be freely used as names for the elements. The name of the element must be identical to either the value of the attribute name or the value of the attribute type.
This approach is different from the language extensions (implemented using substitution groups) of legacy MetaLex (1.3.1 and before): no central registry of extensions is necessary.
The MetaLex standard is based on a very limited set of content models. All elements having the same content model must have the same value of the type attribute. For each content model there is a generic element with the same name.
Suppose for instance that we have an element with the name clause and the content model container. The following elements are equivalent from the point of view of the standard:
- <clause metalex:type="metalex:container" metalex:name="clause"> ... </clause>
- <clause metalex:type="metalex:container"> ... </clause>
- <metalex:container metalex:type="metalex:container" metalex:name="clause"> ... </metalex:container>
- <metalex:container metalex:name="clause"> ... </metalex:container>
If either one of the two basic attributes is missing, the name of the element produces the missing value.
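The equivalence of these four forms can be checked mechanically. The sketch below normalizes each form to a (name, type) pair using the Python standard library. Several things in it are assumptions: the namespace URI is a placeholder (this agreement lists the MetaLex/CEN namespace as to be decided), the function names are hypothetical, the value metalex:container is compared as a plain string, and an element is treated as generic when its own name lies in the MetaLex namespace.

```python
import xml.etree.ElementTree as ET

# Placeholder only: the real MetaLex/CEN namespace is still to be decided.
METALEX_NS = "http://example.org/metalex"


def split_tag(tag):
    """Split an ElementTree tag of the form '{namespace}local'."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def resolve(element):
    """Return (name, type); a missing attribute is taken from the element name."""
    namespace, local = split_tag(element.tag)
    own = f"metalex:{local}" if namespace == METALEX_NS else local
    name = element.get(f"{{{METALEX_NS}}}name")
    type_ = element.get(f"{{{METALEX_NS}}}type")
    if name is None:
        name = own
    if type_ is None:
        type_ = own
    # The element name must equal the value of either name or type.
    if own not in (name, type_):
        raise ValueError(f"element name {own!r} matches neither name nor type")
    return name, type_


NS_DECL = f'xmlns:metalex="{METALEX_NS}"'
forms = [
    f'<clause {NS_DECL} metalex:type="metalex:container" metalex:name="clause"/>',
    f'<clause {NS_DECL} metalex:type="metalex:container"/>',
    f'<metalex:container {NS_DECL} metalex:type="metalex:container" metalex:name="clause"/>',
    f'<metalex:container {NS_DECL} metalex:name="clause"/>',
]
resolved = [resolve(ET.fromstring(form)) for form in forms]
```

All four forms resolve to the same pair, so software that exchanges documents only needs to compare the resolved values, not the element names themselves.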
Content Models and Attributes
This section is normative.
Scope
This section is about the level of granularity of identification of the structure of legal and legislative resources. The choice of a level of granularity should first and foremost make unambiguous reference to text fragments possible. Secondly, it should allow the identification of semantically coherent text fragments. Thirdly, the assignment of elements to text fragments should be unambiguous: the same text should be marked up the same way by different encoders. Fourthly, it should make reproducing layout effects possible.
Namespace
MetaLex/CEN is the MetaLex norm agreed upon in the CEN Workshop Agreement. The name MetaLex/CEN distinguishes the CEN/ISSS proposal from earlier versions of the MetaLex schema. The namespace of MetaLex/CEN is: to be decided. The names of elements and attributes in this document are unqualified wherever ambiguity about namespaces does not exist. If ambiguity could arise, the namespace of MetaLex is referred to by the namespace prefix metalex:
Attributes
Attributes give meaning to elements: meaning in terms of semantics, roles, additional information, and metadata. This means that any element with any name, as long as it has the correct set of attributes, can be placed in the document. This ensures that conformance, validity and interchange are based on attributes alone, and not on element names.
Attribute names are fixed and required; element names are subject to localization, of both language and jurisdiction.
The attributes are the following:
- name
- a semantically-charged name that identifies in a human-understandable way the purpose and meaning and role of the element. This attribute is required for generic elements, and optional for descriptive elements (but if present it must coincide with the element name).
- type
- the name of one of the (few) content models approved for use in this schema. Of course, the content model of the element must also be coherent with the content model specified in the type attribute. Furthermore, this attribute is required for descriptive elements, and optional for generic elements (but if present it must coincide with the element name).
- id
- a unique id that identifies that element among all of the document. The syntax of the value of the id attribute depends on the numbering attribute. All elements in the document must have such an id except for globally and locally unique ones (where they are optional).
- numbering
- the mechanism by which the id of the element must be created. The values are globally unique, locally unique, globally numbered, locally numbered, globally unnumbered and locally unnumbered. All elements are required to have such attribute. This one is contentious --Aboer 01:09, 12 November 2006 (CET)
- class
- the name of a style class that can be found in a presentation package. Optional attribute for all elements.
- style
- a collection of (CSS) styles that need to be associated to the current element only. Optional attribute for all elements.
- date
- all elements containing or referring to a date or a moment in time must provide a normalized value for that date, conformant to the subset of ISO 8601 used in the XML Schema language, that can be used regardless of understandability of the element's name or the ambiguity of the shown value.
Metadata attributes
For metadata there are two candidate solutions. Consistent with the current XML schema are the following:
- href
- the URI of a related resource. Defined (and required) for reference and metadata elements only.
- appliesto
- a reference to an instance of a class of the ontology. Defined and required for metadata elements only.
- value
- ???
- show-as
- a readable string to be used whenever displayable concepts are being referred to by codes or URI. For instance, all references to class instances in the ontology will require a show-as attribute for display purposes. Defined and optional for metadata elements only.
The proposed replacement (consistent with [2]) is:
- about
- the subject (URI reference to an entity) of an RDF/A statement;
- property
- the predicate of an RDF/A statement whose object is a literal value;
- rel
- the predicate of an RDF/A statement whose object is a URI reference to another entity;
- href
- the object URI reference to another entity of an RDF/A statement;
- content
- the literal value that is the object of an RDF/A statement;
The only thing left is the showAs: we need a label for both the predicate and the object at least. Labels are useful for applications that want to show the user a description of a relation and target that is understandable (and not a URI or QName derived from a URI).
The XML schema of MetaLex/CEN is the normative source on the use of attributes, and the values allowed.
xml:base
It must be possible to establish the base of an element, in conformance with the XML Base specification and IETF RFC 2396. The string concatenation base+'#'+id should result in a valid URI, conformant to the addressing recommendations of W3C.
The easiest way to achieve this requirement is to always add an xml:base attribute in scope of the element. The xml:base is in scope if it is on the element itself, or on one of its ancestors.
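As a rough sketch of the scoping rule, the following Python function looks up the nearest xml:base on the element or its ancestors and concatenates base+'#'+id. It assumes an unqualified id attribute and does not resolve relative xml:base values against outer bases, which a full XML Base implementation would have to do. The function name element_uri and the example URI are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def element_uri(root, target):
    """Return base + '#' + id for target, or None if no base or id is in scope."""
    identifier = target.get("id")
    if identifier is None:
        return None
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        base = node.get(XML_BASE)
        if base is not None:
            return base + "#" + identifier
        node = parents.get(node)
    return None


doc = ET.fromstring(
    '<root xml:base="http://example.org/law">'
    '<article id="art1"><clause id="c1"/></article>'
    "</root>"
)
clause = doc.find("./article/clause")
clause_uri = element_uri(doc, clause)
```

Here the clause element has no xml:base of its own, so the value on the root element is the one in scope.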
Content Models
All content models are constrained to just 10 different abstract complex types, of which 5 fundamental (the patterns) and 5 specialized for specific purposes. Three more complex types are added for creating the hierarchy of derivation, but cannot be used for anything but derivation.
These are the fundamental patterns:
- urTcontainer
- a container of a sequence of other elements;
- urThcontainer
- a hierarchical container of nested elements with titles and numbers;
- urTblock
- the largest structure where text and inline elements mix freely, e.g., paragraphs and other (usually vertically-organized) containers of both text and smaller structures;
- urTinline
- an inline container of text and other inline elements (e.g., bold); and
- urTmilestone
- an empty element that can be found in the text (as opposed to urTmeta, following).
These are specialized patterns:
- urTroot
- the container which is actually the root of the document;
- urTmcontainer
- a container of metadata elements;
- urTmeta
- an empty element that is interpreted as metadata;
- urTanchor
- an inline element with anchor properties (it is the source or destination of a hypertext anchor); and
- urTdate
- an inline element with date properties (so that the contained date, in whichever format, can be specified in an unambiguous syntax in its e:date attribute).
The three basic types from which all others are derived are:
- urType
- which specifies the basic attributes for all our elements;
- urContentType
- for the elements that contain content; and
- urMetaType
- for the elements that become metadata.
These types only distribute the correct attributes to the actual content models. All names of these abstract types are prefixed with ur, to signal their abstractness. Abstractness in this context means that they cannot be (directly) instantiated.
Abstract Elements
Abstract elements for each element contained in the abstract complex types:
- ur-root,
- ur-container,
- ur-hcontainer,
- ur-block,
- ur-htitle,
- ur-inline,
- ur-milestone,
- ur-mcontainer,
- ur-meta.
These elements are used as the heads of the substitution groups of elements conforming to the standard. Abstractness in this context means that they cannot be (directly) instantiated, but only substituted. All names of abstract elements are also prefixed with 'ur-' (with a dash).
Generic Elements and Types
The schema contains generic types and generic elements for each abstract type and abstract element, to be used when appropriate. All generic types are prefixed by gen (genTroot, etc.). The generic types are used to define the generic elements.
Generic elements are named after the patterns of which they are substitutions:
- root
- container
- hcontainer
- block
- htitle
- inline
- milestone
- mcontainer
- meta
Generic elements can be instantiated, and conform to the standard.
Concrete Types
Concrete types are included for all abstract types:
- Troot
- Tcontainer
- Thcontainer
- Tblock
- Thtitle
- Tinline
- Tmilestone
- Tmcontainer
- Tmeta
These may be used for defining elements conforming to the standard.
Presentation
The MetaLex schema provides no predefined support for presentation. Document publishers may provide hints as to how the document should be presented. Similarly to the use of CSS styles in HTML, this happens through the use of three different and complementary solutions: element names, element classes and element styles.
- Element names
- Usually presentation is associated with element names. Since we have two issues in MetaLex that affect element names, analogously we expect presentation agents to provide support for both approaches. That is to say, for descriptive elements all CSS styles or XSLT templates may be associated with either elements whose name is X, or whose name attribute is X. Analogously, for elements whose name is not known, a default behavior may be provided associated with the type, so default presentation must be associated with elements whose name is equal to the type and the value attribute is not known. Presentation engines will usually create their own instructions for such elements.
- Element classes
- special presentation rules may be provided for all elements whose attribute class has a certain value. This is similar to what is done in HTML/CSS: usually the presentation rule is used for a number of similarly classed elements, and possibly in a number of different documents. For these reasons, the presentation instructions are usually external (i.e., stored in a separate document) and accessed similarly by all documents that use the same set of classes. Presentation engines may use such instructions, but they may also ignore them and replace them with their own, or with no presentation instructions at all.
- Element styles
- individual elements could require once-only presentation rules, to reproduce atypical layout decisions in the source document. In this case it is inappropriate to use a shared stylesheet, and the style attribute may be used to store the presentation instruction within the element. It is strongly recommended that presentation engines use style instructions, but they may also ignore them.
Conformance of Elements to the Standard
To conform to the standard you may use generic elements defined by the standard. You may also create new elements conforming to the standard.
Defined Elements
To create an element conforming to the standard that can be used in XML manifestations of sources of law, define a non-abstract complex type, and create an element belonging to the substitution group of one of the abstract elements according to the subtype specified, as follows:
<xsd:element name="article" substitutionGroup="e:ur-hcontainer" type="Tarticle" />
The process of creating the concrete vocabulary of elements therefore works as follows:
- You must use one of the abstract content models provided for your element;
- You may define a restriction of the corresponding concrete type;
- You may not define an extension to the content model of a concrete type;
- You may define an extension of a concrete type for adding attributes;
- You must define the elements as a substitution group of one of the abstract elements and you must identify a type which is either one of the provided concrete types, or the restriction of the content model or extension of attributes of a concrete type that you have defined.
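One of the rules above, that every defined element must belong to the substitution group of an abstract element, lends itself to a simple mechanical check. The sketch below is a minimal illustration and not a conformance test: it inspects only top-level xsd:element declarations, does not check the type rules, and uses a placeholder namespace for the e: prefix. The function name nonconforming_declarations is hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Abstract elements listed in the "Abstract Elements" section.
ABSTRACT_ELEMENTS = {
    "ur-root", "ur-container", "ur-hcontainer", "ur-block", "ur-htitle",
    "ur-inline", "ur-milestone", "ur-mcontainer", "ur-meta",
}


def nonconforming_declarations(schema_root):
    """Names of top-level elements whose substitutionGroup is not an ur-* head."""
    bad = []
    for decl in schema_root.findall(XSD + "element"):
        if decl.get("abstract") == "true":
            continue  # the abstract heads themselves are not checked
        head = decl.get("substitutionGroup", "")
        if head.rsplit(":", 1)[-1] not in ABSTRACT_ELEMENTS:
            bad.append(decl.get("name"))
    return bad


# "stray" is a made-up declaration that lacks a substitution group.
schema = ET.fromstring(
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"'
    ' xmlns:e="urn:example:metalex-cen">'
    '<xsd:element name="article" substitutionGroup="e:ur-hcontainer" type="Tarticle"/>'
    '<xsd:element name="stray" type="Tblock"/>'
    "</xsd:schema>"
)
problems = nonconforming_declarations(schema)
```

The article declaration from the example passes, while the declaration without a substitution group is reported.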
References
This section is normative.
Scope
This section describes the elements of the standard that are meant for the marking of references to bibliographic entities. References to agents, activities etc. should be made using elements specific for these concepts. This applies only to citation practices in the text: not to implicit references (which may be marked using metadata elements).
Examples of references in a text are: The Law on the Income Tax 2001, articles 10 and 12 and articles 10 and 12 of the Law on the Income Tax 2001. An anaphoric or cataphoric reference (the previous clause or provisions based on this Act) is also a citation.
Elements
For the marking of citations, the following elements are defined in the standard:
- cite
- citeGroup
- citeRange
- rangeFrom
- rangeTo
- exception
- exceptionGroup
- exceptionRange
All these elements are inline elements.
A single citation should be marked with a cite element. A range of citations should be marked with a citeRange element, with the citation denoting the start of the range marked as rangeFrom, and the end citation marked as rangeTo.
If a text fragment contains several citations that cannot be split (in the text), such as "articles 10 and 12", then the entire text should be marked as a citeGroup, which contains citeGroup, citeRange and/or cite elements.
A cite or citeRange element can contain a single exceptionGroup element that contains any exceptions to the citation range mentioned. The exceptionGroup contains one or more exception and/or exceptionRange elements denoting the specific exceptions.
Only the cite, rangeFrom, rangeTo and exception elements should contain actual references, and they should use the following attributes:
- about
- the reference itself
- rel
- the type of the relation (reference)
- href
- the target of the reference (a work or an expression)
Examples
In case of a single reference, the entire text of the reference is simply marked with cite elements:
<cite>article 10</cite>
In case of multiple references in a text (that cannot be separated), parts of the text that belong uniquely to one of the references are marked with cite elements. These are then grouped with a citeGroup element.
<citeGroup>articles <cite>7</cite> and <cite>13, part e</cite>, of the Law on Registration 1977</citeGroup>
If the references can be split in the text, then no citeGroup element is required.
<cite>article 7 of the Law on Registration 1977</cite> and <cite>article 12 of the Civil Servants Law</cite>
A range can be included by means of a citeRange element. The beginning and end of each range are marked in a way similar to cites within a citeGroup, that is, each piece of text belonging uniquely to one reference is marked:
<citeRange>article <rangeFrom>7</rangeFrom>-<rangeTo>11</rangeTo></citeRange>
A citeGroup can contain several cite, citeRange and other citeGroup elements, allowing for complex multiple references to be marked:
<citeGroup>articles <cite>5</cite>, <citeRange><rangeFrom>10</rangeFrom>-<rangeTo>14</rangeTo></citeRange>, <citeGroup>17, <cite>first</cite> and <cite>second</cite> member</citeGroup>, <cite>18, first member</cite> and <citeRange><rangeFrom>21</rangeFrom>-<rangeTo>24b</rangeTo></citeRange> of the Law on the Income Tax 2001</citeGroup>
If there are any exceptions to a range, then these can be marked with an exceptionGroup. The exceptionGroup includes all text regarding the exceptions, with the actual exceptions marked as exception or exceptionRange.
<cite>chapter 2 <exceptionGroup>with exception of <exception>article 34</exception></exceptionGroup></cite>
<citeRange>articles <rangeFrom>55</rangeFrom>-<rangeTo>63b</rangeTo> <exceptionGroup>with exception of <exception>article 56, first member</exception></exceptionGroup></citeRange>
Should the exceptions be more elaborate, then they are marked in a manner similar to the structure of citeGroup:
<cite>chapter 1 <exceptionGroup>with exception of articles <exceptionRange><rangeFrom>6</rangeFrom> to <rangeTo>9</rangeTo></exceptionRange> and <exception>11</exception></exceptionGroup></cite>
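To illustrate how a consumer might read such markup, the following Python sketch walks a citeGroup and lists the single citations and ranges it contains, descending into nested groups. It assumes the element names cite, citeGroup, citeRange, rangeFrom and rangeTo exactly as listed under Elements, without a namespace, and it does not interpret exceptions (any exception text is simply included in the text of its cite). The function name list_citations is hypothetical.

```python
import xml.etree.ElementTree as ET


def list_citations(element):
    """Return ('cite', text) and ('range', start, end) tuples in document order."""
    found = []
    for child in element:
        if child.tag == "cite":
            found.append(("cite", "".join(child.itertext()).strip()))
        elif child.tag == "citeRange":
            found.append(
                ("range", child.findtext("rangeFrom"), child.findtext("rangeTo"))
            )
        elif child.tag == "citeGroup":
            # Nested groups are flattened into the same list.
            found.extend(list_citations(child))
    return found


group = ET.fromstring(
    "<citeGroup>articles <cite>5</cite>, "
    "<citeRange><rangeFrom>10</rangeFrom>-<rangeTo>14</rangeTo></citeRange>, "
    "<citeGroup>17, <cite>first</cite> and <cite>second</cite> member</citeGroup>"
    " of the Law on the Income Tax 2001</citeGroup>"
)
citations = list_citations(group)
```

Note that the marked fragments alone ("5", "first") are not complete references: the shared text ("articles", "17", "of the Law on the Income Tax 2001") still has to be combined with them, which is what the href attribute is meant to make explicit.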
Conformance to the Standard
References must either use one of the generic elements, or a defined element restricting the Tcite or Tccontainer type, or any of the other available types. Defining your own citation elements works in the same way as for other MetaLex elements.
Metadata
This section is normative.
MetaLex metadata can be used to generate RDF triples. Entities are identified using URI. Statements about entities are interpreted as subject, predicate, object triples. This section describes what counts as a MetaLex metadata statement, how it is stored inside a MetaLex document, and what classes of entities and which predicates (properties) MetaLex distinguishes.
Part of the MetaLex standard is an OWL schema. The namespace of the first schema is: http://www.metalex.eu/classes/20061115/metalex-cen.owl#
The namespace of the latest schema is: http://www.metalex.eu/classes/latest/metalex-cen.owl#
Scope
The MetaLex OWL schema defines:
- bibliographic entities
- content models, work, expression, manifestation, and item, and combinations of a content model and bibliographic entity;
- citation
- between bibliographic entities, through the target property;
- activities
- actions and thematic links, and thematic roles of bibliographic entities in at least the actions creation, enactment, repeal;
- agent and competence
- the generic superclasses of jurisdiction-specific legislators and legislative competences.
The description of classes in the MetaLex OWL schema is normative.
The standard also prescribes a method for embedding metadata directly in MetaLex.
The Resource Description Framework
The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. RDF is based on the idea of identifying things using Uniform Resource Identifiers, or URIs, and describing resources in terms of simple properties and property values. The URI identifies things, classes of things, properties of things, and values of those properties. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values. RDF statements can be used by an RDF processor to generate RDF triples. RDF is well-supported by existing tools and server software like Aduna Autofocus and Metadata Server, TopBraid Composer, Protégé, SemanticWorks, etc.
An RDF statement has the following parts:
- subject
- the thing the statement describes;
- predicate
- a specific property;
- object
- the thing the statement says is the value of the property, for the thing the statement describes.
Refer to the RDF W3C Recommendation for more information about using RDF. RDF Schema introduces the notion of the RDF Class. Resources may be classified into classes. The members of a class are known as instances of the class. Classes are themselves resources. They are identified by URI and may be described using properties, etc. The rdf:type property may be used to state that a resource is an instance of a class. Web Ontology Language (OWL) is an extension to RDF Schema that allows precise definition of valid instances of a class (similar to the relation between content models and the XML schema). We will refer to an RDF class defined in OWL as an OWL class.
All MetaLex metadata statements can be used to generate RDF triples. Any method of storing RDF statements may be used to store MetaLex metadata. A MetaLex metadata statement is a statement that has as its type a MetaLex class, or as its subject, predicate, or instance a MetaLex class or an instance of a MetaLex class. MetaLex classes are defined in an OWL schema.
Embedded Metadata: Elements and Attributes
MetaLex metalex:meta, and other elements derived from the metalex:Tmeta type, are carriers of RDF/A attributes, and are therefore RDF/A statements. All elements derived are also RDF/A statements, but do note that the element name meta has a special significance to RDF/A processors, just like the name link. MetaLex metalex:meta elements are used to embed metadata that can also be stored in the form of RDF in RDF documents; RDF triples can be generated from RDF/A statements. RDF/A statements may be added to any MetaLex element if the content model allows it.
RDF/A does not have its own namespace: the significance of XML elements and attributes to RDF/A processors is determined entirely by names. All URI references in attributes may be relative to the xml:base of the element. The object must be a URI: this is a restriction of RDF/A syntax.
An RDF/A element is defined as any XML element that contains one or more RDF/A attributes:
- about,
- property,
- rel,
- href,
- content.
Processing proceeds by examining each RDF/A element in turn. The RDF/A element under consideration at any time is the current statement, and its parent element is the context statement. Note that the context statement does not need to be an RDF/A element. RDF/A also includes a datatype attribute, that can be used to restrict the XML datatype of a literal object. The presence of that attribute does not by itself designate an RDF/A element.
The RDF/A processor generates RDF triples from RDF/A elements. There is exactly one triple generated per rel or property attribute. RDF/A also supports a third mechanism using the attribute rev. Thus, an RDF/A element can generate at most 3 RDF triples, disregarding RDF's reification mechanism, one for each of those three attributes.
The predicate of a statement is specified using a property or rel attribute. A property attribute indicates a new statement whose predicate is the value of that attribute. The subject of the triple is decided using subject resolution. The object of the triple is decided using literal object resolution. A rel attribute indicates a new statement whose predicate is the value of that attribute. The subject of the triple will be decided using subject resolution. The object of the triple is decided using URI reference object resolution.
The subject of a triple is usually indicated using the about attribute. If the RDF/A element that includes the predicate attribute does not have an about attribute, then the subject of the triple is determined by the about attribute of the first ancestor element that has an about attribute.
If an RDF/A statement is generated by a predicate attribute of a meta or link element, and this element does not contain an explicit about attribute, subject resolution is slightly different. Only the immediate context statement is considered, whether or not it has its own about attribute.
If the context statement is an element named meta or link itself, the RDF/A statement represented by the context statement is reified as the subject of this new RDF/A statement. If the context statement is named neither meta nor link, two cases should be considered. The context statement may have an about attribute, in which case the RDF/A statement's subject is resolved as the value of this attribute (exactly as if the current RDF/A statement weren't a link or meta). However, if the context statement does not have an about attribute, the subject of the current RDF/A statement is the (URI of the) parent element itself. In the case of the default metalex:meta element, this is generally its containing metalex:mcontainer. This is not the case for other elements derived from the metalex:Tmeta type.
The object of the statement can be set using one of the attributes content or href. If the predicate was set using property then the object will be a literal, and its value will come from the content attribute or the element content. The metalex:meta element type may not however contain element content. Without a datatype attribute, the object literal will either be a plain literal or an XML literal, depending on whether the content attribute or element content is used. The metalex:meta may therefore only be used for plain literal and datatyped values. If the predicate was set with the rel attribute, then the object will be a URI whose value is obtained from the href attribute.
You may not use an attribute with the name rev on elements derived from the metalex:Tmeta type.
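The general processing rules can be sketched as follows. This simplified Python illustration covers only the ordinary case: the subject comes from the nearest about attribute on the element or an ancestor, property yields a literal object, and rel yields a URI object taken from href. It deliberately leaves out the special subject resolution and reification for elements named meta or link, the rev attribute, datatype handling, and resolution of relative URIs. The element names and the dc: predicates in the example are assumptions chosen for illustration, and the function name rdfa_triples is hypothetical.

```python
import xml.etree.ElementTree as ET


def rdfa_triples(root):
    """Generate (subject, predicate, object) tuples from property/rel attributes."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    def subject_of(node):
        # Nearest 'about' on the element itself or on an ancestor.
        while node is not None:
            if node.get("about") is not None:
                return node.get("about")
            node = parents.get(node)
        return None

    triples = []
    for node in root.iter():
        subject = subject_of(node)
        if node.get("property") is not None:
            # Literal object: the content attribute, else the element content.
            literal = node.get("content")
            if literal is None:
                literal = "".join(node.itertext())
            triples.append((subject, node.get("property"), literal))
        if node.get("rel") is not None:
            # URI reference object taken from href.
            triples.append((subject, node.get("rel"), node.get("href")))
    return triples


doc = ET.fromstring(
    '<block about="http://example.org/act/1">'
    '<inline property="dc:title" content="Example Act"/>'
    '<inline rel="dc:source" href="http://example.org/act/0"/>'
    "</block>"
)
triples = rdfa_triples(doc)
```

Both inline elements lack an about attribute, so they inherit the subject from the enclosing block element.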
Events and Actions
The MetaLex OWL Schema defines a general framework for describing events and actions. The basic categories are:
- Event
- Action
- Transaction
Each of these has participants in certain roles. MetaLex uses a classification in terms of thematic role. The thematic role is the semantic relationship between a verb and an argument (the noun phrases) of a sentence. It is important to realize that thematic roles are based on linguistic criteria, and do not offer an ontologically sound criterion for classifying entities. The classification is therefore somewhat arbitrary, but easy to remember and use.
In the CEN standard we propose to use a simple categorization of thematic roles loosely based on Judith Dick's (1991) representation of legal arguments (see also John Sowa's website). Each occurrent has one or more participants (properties), that are either:
- Immanent or determinant
- a determinant participant determines direction, while an immanent participant is passively present throughout.
- Source or product
- a source must be present at the beginning, but need not participate throughout, while a product must be present at the end but need not participate throughout.
For event and action we recognize the following participants:
- agent
- is determinant and source of the action; a person or some organised group of persons; only actions have an agent;
- instrument
- is immanent and source of the action, and is not changed during the action;
- patient
- is immanent and product of the action, and undergoes some structural change as a result of the action; at the level of bibliographic entities this applies to the work;
- recipient
- is determinant and product of the action: the person towards whom the action was directed; in the case of sources of law this is usually the addressee; only transactions have a recipient;
- result
- is determinant and product of the action: a thing that was created by the action; at the level of bibliographic entities this can apply only to the expression;
- date
- is immanent and product of the action: when it happened, which is in the domain of legislation always a date.
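The classification of participants above amounts to a small two-axis table, which can be written down as a lookup structure. The Python sketch below merely transcribes the list; the names ROLES and roles_with are hypothetical and not part of the OWL schema.

```python
# role -> (determinant or immanent, source or product), as listed above
ROLES = {
    "agent": ("determinant", "source"),
    "instrument": ("immanent", "source"),
    "patient": ("immanent", "product"),
    "recipient": ("determinant", "product"),
    "result": ("determinant", "product"),
    "date": ("immanent", "product"),
}


def roles_with(direction=None, phase=None):
    """Return the roles matching the given classification, sorted by name."""
    return sorted(
        role
        for role, (d, p) in ROLES.items()
        if direction in (None, d) and phase in (None, p)
    )


source_roles = roles_with(phase="source")
determinant_products = roles_with("determinant", "product")
```

Such a table makes it easy to see, for instance, that only agent and instrument are classified as sources.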
The initial agreement covers three generic actions:
- Creation
- a bibliographic entity (result) is created by an author (agent), at a date. It is not relevant whether the text is a verbatim copy or a modification of an earlier text by another author: the identity of a bibliographic entity does not depend on its content. When a bibliographic entity is created, its parts are also created, but the parts can be independently modified, resulting in a new creation. The expression of which an element is the manifestation cannot be created before a containing expression.
- Enactment
- The action of an agent with the competence (instrument) to enact by which an expression enters into force. The trigger may be the more or less autonomous execution of an enactment provision (instrument) created by the agent before. The agent responsible for the enactment provision can still be considered to be acting.
- Repeal
- The action of an agent with the competence (instrument) to repeal by which an expression goes out of force. May be the more or less autonomous execution of a repeal provision (instrument) created by the agent.
Example
Consider a Minister of Finance with the competence to index amounts in taxation for the purpose of dealing with inflation. At date T1 he publishes a directive S1 to modify income tax law S2 at date T2 to compensate for inflation, resulting in S3.
The first action:
- agent: the minister of finance;
- date: T1;
- result: S1;
- instrument: the competence based in Scomp.
A background action:
- agent: some legislator;
- date: some time before T1;
- result: Scomp;
The second action:
- agent: the minister of finance;
- date: T2;
- result: S3;
- instrument: S2;
- instrument: S1;
- instrument: the competence based in S1.
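The three actions of the example can be written down as plain data and flattened into subject, predicate, object statements. The Python sketch below is only an illustration of that shape: the class Action, its field names and the action labels are hypothetical, and the values are the informal designations used in the example rather than URIs from the OWL schema.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    label: str
    agent: str
    date: str
    result: str
    instruments: list = field(default_factory=list)

    def statements(self):
        """Flatten the action into (action, role, participant) tuples."""
        triples = [
            (self.label, "agent", self.agent),
            (self.label, "date", self.date),
            (self.label, "result", self.result),
        ]
        triples += [(self.label, "instrument", i) for i in self.instruments]
        return triples


first = Action(
    "first action", "the minister of finance", "T1", "S1",
    ["the competence based in Scomp"],
)
background = Action(
    "background action", "some legislator", "some time before T1", "Scomp",
)
second = Action(
    "second action", "the minister of finance", "T2", "S3",
    ["S2", "S1", "the competence based in S1"],
)
```

An action may have several participants in the same role, which is why the instruments are kept in a list.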
It is also possible to replace the second action by an event. The difference is that the minister of finance (as an office) no longer has to exist at T2, which is in this case immaterial. The directive to act in a certain way at a certain time can be violated, while an event of this type is a purely institutional fact that occurs by definition. One of the greater qualities of thematic classification of participants is that it is largely impervious to differences in legal theory.
Conformance to the Standard
Any RDF statement may be made about a MetaLex entity. The definition of a MetaLex OWL class may not be extended by any means, but only restricted. It is recommended to at least embed information about the work and expression the XML structure is a manifestation of, the author of the work, official designations (titles or other means for citation) for the work, and publication, enactment, and repeal information. It is strongly recommended to refrain from defining new OWL classes if a suitable MetaLex class already exists.
Extensions
These XML schema files are not normative.
An XML schema file providing MetaLex/CEN substitutions for MetaLex 1.3.1 (deprecated by the CEN Workshop Agreement) elements can be found at: not available yet
An XML schema file providing MetaLex/CEN substitutions for Akoma Ntoso elements can be found at: http://darius.lri.jur.uva.nl/svn/MetaLexWS/branches/latest/akomantoso-adapter-test.xsd (SVN path)
These examples demonstrate that files marked up in existing XML standards can mostly be made conformant to MetaLex, provided that these standards allow the use of attributes from the MetaLex/CEN namespace.
Schema
MetaLex/CEN is specified in an XML schema file. This XML schema forms a normative part of this specification. The normative XML schema for content models can be found at: not available yet. The XML schema file has been generated from a DTD++ file, which can be found at: not available yet. Because the schema is generated from the DTD++ file, conformance to the DTD++ file is considered an adequate alternative for conformance to the XML schema file. The latest proposed XML schema for content models can be found at: http://darius.lri.jur.uva.nl/svn/MetaLexWS/branches/latest/metalex.xsd (SVN path). The XML schema file has been generated from a DTD++ file, which can be found at: not available yet.
The latest proposed XML schema for references can be found at: not available yet. The XML schema file has been generated from a DTD++ file, which can be found at: not available yet.
The normative OWL schema can be found at: not available yet. The latest proposed OWL schema can be found at: http://darius.lri.jur.uva.nl/svn/MetaLexWS/branches/latest/metalex-cen.owl (SVN path)






