How do you structure an XML database?




The initial incantation for XML files

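Just as an HTML file opens with a document type declaration, an XML file should open with an XML declaration. Strictly speaking it is optional in XML 1.0, but it is good practice to always include it. It looks like this: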

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Notice that the keywords in the declaration (xml, version, encoding, standalone) are written in lower case. XML is case sensitive, and the declaration is only valid in lower case. Just as you declare a character encoding in HTML using charset, you also declare one in XML; here it is simply called "encoding". You can of course choose an encoding other than UTF-8 if you want to, but, as with HTML, the encoding you declare must match the encoding the file is actually saved in.

After this you can go right ahead and define your tags.
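To illustrate why the declared encoding and the actual file encoding have to agree, here is a minimal sketch in Python, using the standard library module xml.etree.ElementTree. The Author tag and the sample name are made up for the example; they are not part of any real database.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text containing Danish letters.
name = "Søren Ærø"

# The same document stored in two encodings. In both cases the
# declaration names the encoding the bytes are really stored in.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f"<Author>{name}</Author>"
).encode("utf-8")

latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
    f"<Author>{name}</Author>"
).encode("iso-8859-1")

# Both parse to the same text, because declaration and bytes agree.
utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text
print(utf8_text == latin1_text == name)

# A mismatch: the declaration says UTF-8, but the bytes are ISO-8859-1.
mismatched_doc = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f"<Author>{name}</Author>"
).encode("iso-8859-1")

try:
    ET.fromstring(mismatched_doc)
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)
```

The point of the sketch is only that the parser trusts the declaration: when the bytes do not match it, the special characters can no longer be read reliably.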


XML tags

Unlike HTML, XML has no predefined tags; you define them yourself. In my experience, naming the tags after the type of content they hold is a smart strategy. When you later have to find and retrieve data using JavaScript, it is a lot easier to find your way around the fields when they have descriptive names. Like HTML tags, XML tags have a start and a stop, e.g.

<Books>
</Books>

Together with the XML declaration at the top of the file, it looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<Books>
</Books>

As with file names, you can in principle use special characters like the Danish æ, ø and å in tag names, but it is generally a bad idea. Browsers and JavaScript aren't always as good at handling special characters as their programmers claim. Also remember that XML tags are case sensitive, so <Books> and <books> are treated as two different tags.

The first tag you create is the one that names the database, not to be confused with the file name. It is called the root tag: a file must have exactly one, and every other tag goes inside it. It's one of the peculiar things about XML that you just have to live with.
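The two rules above, case-sensitive tag names and a single outermost tag, can be checked with a small Python sketch using the standard library module xml.etree.ElementTree. The variable names are my own and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# <Book> and <book> are two different tags, so a search for "Book"
# finds only one of them.
root = ET.fromstring("<Books><Book></Book><book></book></Books>")
matches = root.findall("Book")
print(root.tag, len(matches))

# A file may contain only one outermost tag. A second one after the
# first has been closed makes the parser give up.
try:
    ET.fromstring("<Books></Books><Books></Books>")
    second_root_accepted = True
except ET.ParseError:
    second_root_accepted = False
print(second_root_accepted)
```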

Right now we just have a shell for the database. Now we'll try making it useful. For this we need some fields to be filled out.

A tag is really just a field in a spreadsheet. Sticking with our books, we need, for example, a title and an author field for each book. We build this with a tag called Book, which is the entry, and two tags, Title and Author, which are the fields holding the information about each book. It looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<Books>

<Book>
<Title></Title>
<Author></Author>
</Book>

</Books>

So, when you need to add a new book, you just create another entry. With two entries filled in, it looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<Books>

<Book>
<Title>BookTitle 1</Title>
<Author>Author 1</Author>
</Book>

<Book>
<Title>BookTitle 2</Title>
<Author>Author 2</Author>
</Book>

</Books>
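To show how such entries behave like rows in a spreadsheet, here is a minimal sketch that reads them back. It uses Python's standard library module xml.etree.ElementTree rather than JavaScript, simply because it runs without a browser; the variable names are my own.

```python
import xml.etree.ElementTree as ET

# The two-entry database from above, held in a string for the example.
xml_text = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Books>

<Book>
<Title>BookTitle 1</Title>
<Author>Author 1</Author>
</Book>

<Book>
<Title>BookTitle 2</Title>
<Author>Author 2</Author>
</Book>

</Books>
"""

root = ET.fromstring(xml_text)

# Each Book is an entry; Title and Author are its fields.
books = [
    (book.findtext("Title"), book.findtext("Author"))
    for book in root.findall("Book")
]

for title, author in books:
    print(f"{title} by {author}")
```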

You don't need the blank lines I've used here. Writing it like this is perfectly acceptable:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Books>
<Book>
<Title>BookTitle 1</Title>
<Author>Author 1</Author>
</Book>
<Book>
<Title>BookTitle 2</Title>
<Author>Author 2</Author>
</Book>
</Books>

It just becomes a terribly messy text to read as soon as you have a couple of entries in your database. I therefore recommend inserting a few blank lines, so you can find your way around the code.
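That the spacing only matters to the human reader can be checked with a small sketch in Python, using the standard library module xml.etree.ElementTree. The helper read_books is a name I made up for the example.

```python
import xml.etree.ElementTree as ET

# The same single entry, once on one line and once with blank lines.
compact = (
    "<Books><Book><Title>BookTitle 1</Title>"
    "<Author>Author 1</Author></Book></Books>"
)

spaced = """<Books>

<Book>
<Title>BookTitle 1</Title>
<Author>Author 1</Author>
</Book>

</Books>"""


def read_books(xml_text):
    """Return a (title, author) pair for every Book entry."""
    root = ET.fromstring(xml_text)
    return [
        (book.findtext("Title"), book.findtext("Author"))
        for book in root.findall("Book")
    ]


# The layout differs, but the data read from the fields is identical.
same_data = read_books(compact) == read_books(spaced)
print(same_data)
```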


Subdivisions in XML tags (multivalued fields)

Structured as in the previous section, the XML database looks like a standard database as we know it from e.g. Access or OpenOffice Database, or like a spreadsheet for that matter. But you might need subdivisions, e.g. if some of the books have more than one author, or if you want to split the names into first name and surname. This is done by placing one field tag inside another. If we start with multiple author names and call the new tag AuthorName, it looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<Books>

<Book>
<Title>BookTitle 1</Title>
<Author>
<AuthorName>Author 1</AuthorName>
<AuthorName>Author 2</AuthorName>
</Author>
</Book>

<Book>
<Title>BookTitle 2</Title>
<Author>
<AuthorName>Author 3</AuthorName>
</Author>
</Book>

</Books>

When you have subdivisions, Author is what is called a parent tag and AuthorName is a child tag. If a parent has several child tags, the children are called siblings.

As you can see, you don't need the same number of subdivisions in each entry. In fact, this construction, also known as multivalued fields, is normally used precisely because the number of subdivisions varies.
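Here is a minimal sketch of how such multivalued fields can be read back, written in Python with the standard library module xml.etree.ElementTree. The dictionary name authors_by_title is my own choice for the illustration.

```python
import xml.etree.ElementTree as ET

# The database from above: one book with two authors, one with one.
xml_text = """<Books>

<Book>
<Title>BookTitle 1</Title>
<Author>
<AuthorName>Author 1</AuthorName>
<AuthorName>Author 2</AuthorName>
</Author>
</Book>

<Book>
<Title>BookTitle 2</Title>
<Author>
<AuthorName>Author 3</AuthorName>
</Author>
</Book>

</Books>"""

root = ET.fromstring(xml_text)

# For each book, collect every AuthorName child of the Author parent.
authors_by_title = {
    book.findtext("Title"): [
        name.text for name in book.findall("Author/AuthorName")
    ]
    for book in root.findall("Book")
}

for title, authors in authors_by_title.items():
    print(title, authors)
```

The list of authors is simply as long as the number of sibling tags, so a varying number of subdivisions needs no special handling.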

If instead we want the author name split up into first name and surname, it looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<Books>

<Book>
<Title>BookTitle 1</Title>
<Author>
<FirstName>FirstName 1</FirstName>
<LastName>LastName 1</LastName>
</Author>
</Book>

<Book>
<Title>BookTitle 2</Title>
<Author>
<FirstName>FirstName 2</FirstName>
<LastName>LastName 2</LastName>
</Author>
</Book>

</Books>

The two approaches can of course be combined if required. You design your own database according to the content you need to structure.
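As one possible way of combining the two, each AuthorName could itself hold a FirstName and a LastName. This layout is my own assumption for the sake of the example, not the only way to do it. The sketch below uses Python's standard library module xml.etree.ElementTree to read it back.

```python
import xml.etree.ElementTree as ET

# Assumed combined layout: several AuthorName tags per Author,
# each split into FirstName and LastName.
xml_text = """<Books>

<Book>
<Title>BookTitle 1</Title>
<Author>
<AuthorName>
<FirstName>FirstName 1</FirstName>
<LastName>LastName 1</LastName>
</AuthorName>
<AuthorName>
<FirstName>FirstName 2</FirstName>
<LastName>LastName 2</LastName>
</AuthorName>
</Author>
</Book>

</Books>"""

root = ET.fromstring(xml_text)
book = root.find("Book")

# Join first name and surname for every author of the book.
full_names = [
    f'{name.findtext("FirstName")} {name.findtext("LastName")}'
    for name in book.findall("Author/AuthorName")
]

print(full_names)
```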


If you need to use HTML codes in XML tags

In principle, the text inside an XML tag has to be plain text, i.e. text without anything resembling code or parts of code. In particular, the characters < and & are read as the start of markup. If you need something like HTML tags inside your XML tag, you can wrap the content in what is called CDATA. It works like an extra layer lining the tag. The start is written <![CDATA[ and the stop is written ]]>. If we use the Title tag from before, it now looks like this:

<Title><![CDATA[
]]></Title>

Note! CDATA has to go in the tag where the text itself is written. If your tag has subdivisions, you DON'T put CDATA on the parent tag; it goes on each child tag. Otherwise everything inside the CDATA, child tags included, is read as plain text, and JavaScript can't treat the child tags as separate fields.
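Both points can be seen in a small sketch, written in Python with the standard library module xml.etree.ElementTree. The HTML snippet and the variable names are made up for the example. It also shows the alternative of escaping the characters as &lt; and &gt;, which gives the same text without CDATA.

```python
import xml.etree.ElementTree as ET

# HTML inside CDATA arrives as plain text, not as child tags.
title = ET.fromstring("<Title><![CDATA[<b>Bold</b> title]]></Title>")
cdata_text = title.text
child_count = len(title)
print(cdata_text, child_count)

# The same text written with escaped characters instead of CDATA.
escaped = ET.fromstring("<Title>&lt;b&gt;Bold&lt;/b&gt; title</Title>")
escaped_text = escaped.text
print(escaped_text == cdata_text)

# CDATA on a parent tag swallows the child tag: AuthorName is no
# longer a separate field, only part of the parent's text.
author = ET.fromstring(
    "<Author><![CDATA[<AuthorName>Author 1</AuthorName>]]></Author>"
)
hidden_children = len(author)
print(hidden_children, author.text)
```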