RELAX

RELAX (REgular LAnguage description for XML)

2000 Feb 24

MURATA Makoto

1. Political

For more on this, see the RELAX FAQ.

2. Technical

RELAX Core = Datatypes + Horn clauses + Regular Hedge Grammars

2.1 Datatypes

Borrowed from XML Schema Part 2

Built-in datatypes as well as facets.

2.2 Horn Clauses

A clause describes conditions on tag names and attribute values.

(1) Simple example

foo(x) :-
  getTag(x)="foo",
  getAttr(x, "bar") ∈ Domain´(integer),
  common(x).

common(x) :- 
  getAttr(x, "lang") ∈ Domain´(language).

<tag name="foo">
  <attribute name="bar" type="integer"/>
  <ref pred="common"/>
</tag>

<attList pred="common">
  <attribute name="lang" type="language"/>
</attList>
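
The clauses above can be read as ordinary boolean functions over an element. The following is a minimal sketch in Python, not part of RELAX itself. It assumes that Domain´(t) means "absent, or a value of datatype t", and it uses simplified regular expressions as stand-ins for the integer and language datatypes of XML Schema Part 2. The helper name in_domain_or_absent is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified stand-ins for the XML Schema Part 2 datatypes (assumptions).
INTEGER = re.compile(r"[+-]?\d+")
LANGUAGE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def in_domain_or_absent(value, pattern):
    """Assumed reading of Domain´: attribute absent, or value in the datatype."""
    return value is None or pattern.fullmatch(value) is not None

def common(x):
    # common(x) :- getAttr(x, "lang") ∈ Domain´(language).
    return in_domain_or_absent(x.get("lang"), LANGUAGE)

def foo(x):
    # foo(x) :- getTag(x)="foo", getAttr(x, "bar") ∈ Domain´(integer), common(x).
    return (x.tag == "foo"
            and in_domain_or_absent(x.get("bar"), INTEGER)
            and common(x))

print(foo(ET.fromstring('<foo bar="42" lang="en"/>')))  # True
print(foo(ET.fromstring('<foo bar="forty-two"/>')))     # False
```

Calling common from foo mirrors the <ref pred="common"/> inside the tag definition: the shared attribute conditions are written once and reused.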

(2) Condition on attribute values

divAsSection(x) :- 
  getTag(x)="div",
  getAttr(x, "class")="section".

divAsSubsection(x) :- 
  getTag(x)="div",
  getAttr(x, "class")="subsection".

<tag name="div" pred="divAsSection">
  <attribute name="class">
    <enumeration value="section"/>
  </attribute>
</tag>

<tag name="div" pred="divAsSubsection">
  <attribute name="class">
    <enumeration value="subsection"/>
  </attribute>
</tag>
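
The point of this example is that one tag name can carry several predicates, distinguished only by attribute values. A small Python sketch of that idea follows. The function names and the satisfied helper are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

def div_as_section(x):
    # divAsSection(x) :- getTag(x)="div", getAttr(x, "class")="section".
    return x.tag == "div" and x.get("class") == "section"

def div_as_subsection(x):
    # divAsSubsection(x) :- getTag(x)="div", getAttr(x, "class")="subsection".
    return x.tag == "div" and x.get("class") == "subsection"

PREDICATES = {
    "divAsSection": div_as_section,
    "divAsSubsection": div_as_subsection,
}

def satisfied(x):
    """Return the names of all predicates that hold for element x."""
    return [name for name, pred in PREDICATES.items() if pred(x)]

print(satisfied(ET.fromstring('<div class="section"/>')))     # ['divAsSection']
print(satisfied(ET.fromstring('<div class="subsection"/>')))  # ['divAsSubsection']
print(satisfied(ET.fromstring('<div class="note"/>')))        # []
```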

(3) Mutually exclusive attributes

aWithName(x) :- 
  getTag(x)="a",
  getAttr(x, "name") ∈ Domain(NMTOKEN),
  getAttr(x, "href") = φ.

aWithHref(x) :-
  getTag(x)="a",
  getAttr(x, "href") ∈ Domain(uri_reference),
  getAttr(x, "name") = φ.

<tag name="a" pred="aWithHref">
  <attribute
    name="href"
    type="uri_reference"
    required="true"/>
  <attribute name="name" type="none"/>
</tag>

<tag name="a" pred="aWithName">
  <attribute
    name="name"
    type="NMTOKEN"
    required="true"/>
  <attribute name="href" type="none"/>
</tag>
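
Mutual exclusion falls out of the two clauses: each requires one attribute and forbids the other, so no element can satisfy both. The Python sketch below illustrates this under stated assumptions: φ is read as "attribute absent", NMTOKEN is approximated by an ASCII-only pattern, and any string is accepted as a uri_reference.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of NMTOKEN (an assumption for illustration).
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def a_with_name(x):
    # Requires name (an NMTOKEN) and forbids href.
    name = x.get("name")
    return (x.tag == "a"
            and name is not None
            and NMTOKEN.fullmatch(name) is not None
            and x.get("href") is None)

def a_with_href(x):
    # Requires href and forbids name; every string is accepted as a
    # uri_reference here, which is a deliberate simplification.
    return (x.tag == "a"
            and x.get("href") is not None
            and x.get("name") is None)

both = ET.fromstring('<a name="top" href="#top"/>')
print(a_with_name(ET.fromstring('<a name="top"/>')))   # True
print(a_with_href(ET.fromstring('<a href="#top"/>')))  # True
print(a_with_name(both), a_with_href(both))            # False False
```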

2.3 Regular Hedge Grammar

A hedge is an ordered sequence of ordered trees.

(1) Regular (String) Grammars, revisited

n ::= ε
n ::= m

n ::= a
n ::= a m
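
The four rule forms above are exactly the transitions of a finite automaton, so a grammar written with them can be run directly. The Python sketch below is one possible encoding, not something taken from RELAX: each rule becomes a pair (symbol or None, next state), and FINAL is a hypothetical marker for "nothing left to derive". The sample grammar for the language a b* is also an assumption chosen for illustration.

```python
FINAL = object()  # hypothetical marker: derivation is complete

def epsilon_closure(states, rules):
    """Follow the rules n ::= ε and n ::= m without consuming input."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for symbol, nxt in rules.get(state, []):
            if symbol is None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(rules, start, word):
    """Decide whether `word` can be derived from non-terminal `start`."""
    current = epsilon_closure({start}, rules)
    for ch in word:
        moved = {nxt for state in current
                 for symbol, nxt in rules.get(state, [])
                 if symbol == ch}
        current = epsilon_closure(moved, rules)
    return FINAL in current

# Encoding of the four forms:
#   n ::= ε    -> (None, FINAL)      n ::= m    -> (None, "m")
#   n ::= a    -> ("a", FINAL)       n ::= a m  -> ("a", "m")
RULES = {
    "S": [("a", "T")],                  # S ::= a T
    "T": [("b", "T"), (None, FINAL)],   # T ::= b T   and   T ::= ε
}

print(accepts(RULES, "S", "abb"))  # True
print(accepts(RULES, "S", "ba"))   # False
```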

(2) Regular Hedge Grammars

n ::= r

n ::= a [r]

where r is a regular expression containing non-terminals.

(3) hedgeRule in RELAX Core

<hedgeRule
   label="non-terminal-name">
  ... hedge model ...
</hedgeRule>

LHS: a non-terminal.

RHS: a regular expression containing non-terminals.

(4) elementRule in RELAX Core

<elementRule
   pred="predicate-name"
   label="non-terminal-name">
  ... hedge model ...
</elementRule>

LHS: a non-terminal.

RHS: a predicate name followed by one of the following:

Recall the predicates divAsSection and divAsSubsection. The permissible structures can be described as follows:

<elementRule
   pred="divAsSection"
   label="section">
  <sequence>
    <ref label="paragraph" occurs="*"/>
    <ref label="subsection" occurs="+"/>
  </sequence>
</elementRule>

<elementRule
   pred="divAsSubsection"
   label="subsection">
  ... hedge model ...
</elementRule>
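
To show how predicates and hedge models combine, here is a rough Python sketch that assigns labels to elements bottom-up: an element receives a label when its predicate holds and the labels of its children match the rule's hedge model. Only the section model (paragraph*, subsection+) comes from the rules above. The subsection model is elided in the source, so the paragraph* used here is a placeholder assumption, as is the rule labelling an empty p element as paragraph. The encoding of hedge models as Python regular expressions over label names is likewise only one possible choice.

```python
import re
from itertools import product
import xml.etree.ElementTree as ET

def div_as_section(x):
    return x.tag == "div" and x.get("class") == "section"

def div_as_subsection(x):
    return x.tag == "div" and x.get("class") == "subsection"

def is_p(x):  # hypothetical predicate, not in the source
    return x.tag == "p"

# (predicate, label, hedge model over child labels, each followed by a space)
RULES = [
    (div_as_section, "section", re.compile(r"(paragraph )*(subsection )+")),
    (div_as_subsection, "subsection", re.compile(r"(paragraph )*")),  # assumed model
    (is_p, "paragraph", re.compile(r"")),                             # assumed rule
]

def labels(x):
    """Return every label that element x can receive under RULES."""
    child_label_sets = [labels(child) for child in x]
    found = set()
    for pred, label, model in RULES:
        if not pred(x):
            continue
        # Try each way of choosing one label per child.
        for choice in product(*child_label_sets):
            if model.fullmatch("".join(name + " " for name in choice)):
                found.add(label)
                break
    return found

good = ET.fromstring(
    '<div class="section"><p/><div class="subsection"><p/></div></div>')
bad = ET.fromstring('<div class="section"><p/></div>')  # no subsection

print(labels(good))  # {'section'}
print(labels(bad))   # set()
```

An empty result means that no rule applies, so the element is not permitted by this grammar. Text content is ignored in this sketch.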

3. Future work