Eclipse and Java Blog by Michael Scharf: I don't like XML! But what are the alternatives?

Monday, June 18, 2007

I don't like XML! But what are the alternatives?

I think XML is one of the worst formats for data. It is extremely ambiguous. There are so many ways to put a simple data structure into XML. For example:

<book 
    title="The Return of the King"
    author="J.R.R. Tolkien"/>

<book>
  <title>The Return of the King</title>
  <author>J.R.R. Tolkien</author>
</book>

In the second form, can there be more than one author? More than one title?
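To make the ambiguity concrete, here is a minimal Java sketch of a reader that accepts both encodings of the book. It assumes Java 17 and the JDK's built-in DOM parser; the names `BookReader` and `field` are hypothetical. Note that the reader has to try both the attribute form and the element form, and that it returns a list for every field, because nothing in the XML says whether `author` may repeat.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BookReader {
    // Parses a standalone XML string with the JDK's DOM parser
    // and returns its root element.
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Collects every value of a field, whether it was written as an
    // attribute or as (possibly repeated) descendant elements. The result
    // is a list because the XML alone does not say whether the field is
    // single-valued.
    public static List<String> field(Element book, String name) {
        List<String> values = new ArrayList<>();
        if (book.hasAttribute(name)) {
            values.add(book.getAttribute(name));
        }
        NodeList elements = book.getElementsByTagName(name);
        for (int i = 0; i < elements.getLength(); i++) {
            values.add(elements.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) {
        String asAttributes =
                "<book title=\"The Return of the King\" author=\"J.R.R. Tolkien\"/>";
        String asElements = "<book><title>The Return of the King</title>"
                + "<author>J.R.R. Tolkien</author></book>";
        for (String xml : new String[] {asAttributes, asElements}) {
            Element book = parse(xml);
            // Both encodings print: [The Return of the King] by [J.R.R. Tolkien]
            System.out.println(field(book, "title") + " by " + field(book, "author"));
        }
    }
}
```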

XML is not self-describing. Look at the XML below: unless you use one of the many schema languages (DTD, XML-Schema, XMI, DSD, ...), it is very ambiguous.

<data
 x="null"
 y="true"
 z="42">
 <a>NULL</a>
 <a>false</a>
 <b>TRUE</b>
</data>

What is a string? What is a boolean? What is a number? Is x a list? You simply can't infer it from the XML. And when you want to write the data back, which fields should be written as elements and which as attributes?
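The following Java sketch illustrates the point under one assumption: the data is read with the JDK's DOM API, which hands back every attribute value and every element text as a `String`. The class `UntypedXml` and its `guess` method are hypothetical and not part of any library; each rule in `guess` (is `NULL` null? is `TRUE` a boolean?) is a decision the XML itself does not make.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class UntypedXml {
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Every attribute value comes back as a String.
    public static Map<String, String> attributes(Element e) {
        Map<String, String> result = new LinkedHashMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    // Element text comes back as a String too.
    public static List<String> texts(Element e, String name) {
        List<String> result = new ArrayList<>();
        NodeList nodes = e.getElementsByTagName(name);
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Hypothetical heuristic: exact "null" -> null, exact "true"/"false" ->
    // Boolean, integer -> Long, anything else stays a String. Each rule is a
    // policy choice the XML does not make for you.
    public static Object guess(String text) {
        if (text.equals("null")) {
            return null;
        }
        if (text.equals("true") || text.equals("false")) {
            return Boolean.valueOf(text);
        }
        try {
            return Long.valueOf(text);
        } catch (NumberFormatException notANumber) {
            return text;
        }
    }

    public static void main(String[] args) {
        Element data = parse("<data x=\"null\" y=\"true\" z=\"42\">"
                + "<a>NULL</a><a>false</a><b>TRUE</b></data>");
        List<String> raw = new ArrayList<>(attributes(data).values());
        raw.addAll(texts(data, "a"));
        raw.addAll(texts(data, "b"));
        for (String s : raw) {
            Object v = guess(s);
            String type = v == null ? "null" : v.getClass().getSimpleName();
            System.out.println("\"" + s + "\" -> " + type);
        }
    }
}
```

With this particular heuristic, `x` becomes `null` while the `<a>NULL</a>` element stays the string `"NULL"`, and `TRUE` stays a string; a different, equally defensible heuristic would give different answers.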

There is also a huge problem with IDs and references. From a plain XML file there is no way to figure out which attribute holds an object's ID and which attributes are references. Eclipse plugin.xml files are a good example: they are full of IDs and references, but it is hard to know which string refers to which other XML element. Control-click does not work. Why? Because references are difficult to resolve!
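A small Java sketch shows why a generic tool cannot offer control-click on plain XML. It assumes a simplified, hypothetical plugin.xml-like document (real plugin.xml files use fully qualified IDs), and the two pieces of knowledge a resolver needs, that `id` holds identifiers and that `point` holds references, are hard-coded assumptions rather than anything the XML states. The DOM's own `Document.getElementById` does not help here either: it only finds attributes that have been declared with type ID, for example in a DTD. The class name `ReferenceResolver` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReferenceResolver {
    // Assumption: an attribute named "id" holds an identifier. Plain XML
    // does not say so; this is only a naming convention.
    static final String ID_ATTRIBUTE = "id";

    // Assumption: out-of-band knowledge that "point" holds a reference.
    // Nothing in the XML distinguishes it from any other string attribute.
    static final Set<String> REFERENCE_ATTRIBUTES = Set.of("point");

    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Maps each id value to the descendant element that declares it.
    static Map<String, Element> indexIds(Element root) {
        Map<String, Element> ids = new HashMap<>();
        NodeList all = root.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute(ID_ATTRIBUTE)) {
                ids.put(e.getAttribute(ID_ATTRIBUTE), e);
            }
        }
        return ids;
    }

    // Describes every reference found as "element.attribute -> target".
    public static List<String> resolve(Element root) {
        Map<String, Element> ids = indexIds(root);
        List<String> links = new ArrayList<>();
        NodeList all = root.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            for (String attr : REFERENCE_ATTRIBUTES) {
                if (e.hasAttribute(attr)) {
                    Element target = ids.get(e.getAttribute(attr));
                    String name = target == null ? "<unresolved>" : target.getTagName();
                    links.add(e.getTagName() + "." + attr + " -> " + name);
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        Element root = parse("<plugin>"
                + "<extension-point id=\"views\" name=\"Views\"/>"
                + "<extension point=\"views\"/>"
                + "</plugin>");
        System.out.println(resolve(root)); // [extension.point -> extension-point]
    }
}
```

Remove "point" from `REFERENCE_ATTRIBUTES` and the resolver finds no references at all, even though the document is unchanged.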

Are there any good alternatives?

JSON is much simpler, self-describing, less verbose and much better suited for data storage and exchange. But it has no notion of IDs and references, and it does not name object (record) types: it only has lists, maps, strings, numbers, booleans and null.
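To illustrate why JSON is easier to write back out, here is a minimal Java 17 sketch of a serializer from plain Java collections to JSON text. The class name `JsonWriter` is hypothetical and this is not a library API. Each in-memory type (map, list, string, number, boolean, null) has exactly one JSON form, so unlike the attribute-versus-element choice in XML, the writer never has to make a mapping decision.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonWriter {
    public static String write(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, out);
        return out.toString();
    }

    // One Java type, one JSON form. Non-finite doubles (NaN, Infinity)
    // are not handled and would produce invalid JSON.
    private static void write(Object value, StringBuilder out) {
        if (value == null) {
            out.append("null");
        } else if (value instanceof Boolean || value instanceof Number) {
            out.append(value);
        } else if (value instanceof String s) {
            quote(s, out);
        } else if (value instanceof List<?> list) {
            out.append('[');
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) out.append(',');
                write(list.get(i), out);
            }
            out.append(']');
        } else if (value instanceof Map<?, ?> map) {
            out.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : map.entrySet()) {
                if (!first) out.append(',');
                first = false;
                quote(String.valueOf(e.getKey()), out);
                out.append(':');
                write(e.getValue(), out);
            }
            out.append('}');
        } else {
            throw new IllegalArgumentException("not a JSON value: " + value.getClass());
        }
    }

    // Escapes quotes, backslashes and control characters.
    private static void quote(String s, StringBuilder out) {
        out.append('"');
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                default -> {
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
                }
            }
        }
        out.append('"');
    }

    public static void main(String[] args) {
        // One possible reading of the <data> example, now with explicit types.
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("x", null);
        data.put("y", true);
        data.put("z", 42);
        data.put("a", Arrays.asList(null, false));
        data.put("b", true);
        // Prints {"x":null,"y":true,"z":42,"a":[null,false],"b":true}
        System.out.println(write(data));
    }
}
```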

What else is out there? I want something like JSON + an ID/Reference model + named Records...
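What such a format might look like in memory can be sketched in Java 17. Everything here is hypothetical: `Rec` is a named record with a type name and an optional ID, and `Ref` is a reference value kept distinct from an ordinary string. Because references are typed, a tool can find and resolve all of them without a schema, which is exactly what plain XML cannot offer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecordModel {
    // Hypothetical reference value: names the id of another record.
    public record Ref(String id) {}

    // Hypothetical named record: a type name, an optional id (null if none),
    // and fields holding JSON-like values (String, Number, Boolean, List,
    // Map) or Refs.
    public record Rec(String type, String id, Map<String, Object> fields) {}

    // Indexes records by id and rejects duplicate ids.
    public static Map<String, Rec> index(List<Rec> records) {
        Map<String, Rec> byId = new HashMap<>();
        for (Rec r : records) {
            if (r.id() != null && byId.put(r.id(), r) != null) {
                throw new IllegalArgumentException("duplicate id: " + r.id());
            }
        }
        return byId;
    }

    // Finds every Ref in a record's fields, including inside lists and maps.
    public static List<Ref> references(Rec rec) {
        List<Ref> out = new ArrayList<>();
        for (Object v : rec.fields().values()) {
            collect(v, out);
        }
        return out;
    }

    private static void collect(Object value, List<Ref> out) {
        if (value instanceof Ref r) {
            out.add(r);
        } else if (value instanceof List<?> list) {
            for (Object o : list) collect(o, out);
        } else if (value instanceof Map<?, ?> map) {
            for (Object o : map.values()) collect(o, out);
        }
    }

    // Looks up the record a reference points at; dangling refs are errors.
    public static Rec resolve(Ref ref, Map<String, Rec> byId) {
        Rec target = byId.get(ref.id());
        if (target == null) {
            throw new IllegalArgumentException("dangling reference: " + ref.id());
        }
        return target;
    }

    public static void main(String[] args) {
        Rec point = new Rec("extension-point", "views", Map.of("name", "Views"));
        Rec ext = new Rec("extension", null, Map.of("point", new Ref("views")));
        Map<String, Rec> byId = index(List.of(point, ext));
        for (Ref ref : references(ext)) {
            // Prints: views -> extension-point
            System.out.println(ref.id() + " -> " + resolve(ref, byId).type());
        }
    }
}
```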

19 comments:

Anonymous, June 18, 2007 6:39 PM
What do you think of YAML?

It is nearly a superset of JSON. I am sure that it has internal references. I don't know about external ones, though.
Anonymous, June 18, 2007 6:59 PM
Maybe you should have a look at YAML: http://www.yaml.org/
AlBlue, June 18, 2007 7:03 PM
JSON is as self-describing as XML is; it's up to you to come up with good names:

{"data": {"x": "null", "y": 42, "a": ["null"]}}

Garbage in, garbage out. It doesn't matter what syntax you use for the garbage.
Anonymous, June 18, 2007 7:08 PM
Maybe UBF will meet your requirements.

Cheers!
SDiZ, June 18, 2007 7:17 PM
Try YAML? Its anchors and aliases work like IDs and references. See http://www.yaml.org/start.html
Ricky Clarkson, June 18, 2007 7:34 PM
S-expressions (what Lisp source is made of) used as data are more readable than XML or JSON, and easier to write parsers for.

(body
  (h1 "This is a heading")
  (p "This is a "
     (font :color "red" "normal paragraph ")
     "with some markup"))

You can add your own idea of what an ID is if you like.
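To back up the claim that S-expressions are easy to parse, here is a minimal Java 17 reader. It is a sketch rather than a full Lisp reader: it handles lists, double-quoted strings with backslash escapes, integers, and bare symbols such as `body` or `:color`, and deliberately leaves out comments, quote syntax and floating-point numbers. The names `SexpReader` and `Sym` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SexpReader {
    // Symbols (body, h1, :color) are kept distinct from string literals.
    public record Sym(String name) {}

    private final String src;
    private int pos;

    private SexpReader(String src) {
        this.src = src;
    }

    // Reads exactly one expression; lists become java.util.List,
    // integers become Long, strings stay String, everything else is a Sym.
    public static Object read(String src) {
        SexpReader r = new SexpReader(src);
        Object result = r.expr();
        r.skipWhitespace();
        if (r.pos != src.length()) {
            throw new IllegalArgumentException("trailing input at " + r.pos);
        }
        return result;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private Object expr() {
        skipWhitespace();
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        if (c == '(') {
            pos++;
            List<Object> items = new ArrayList<>();
            while (true) {
                skipWhitespace();
                if (pos >= src.length()) throw new IllegalArgumentException("unclosed list");
                if (src.charAt(pos) == ')') {
                    pos++;
                    return items;
                }
                items.add(expr());
            }
        }
        if (c == ')') throw new IllegalArgumentException("unexpected ')' at " + pos);
        if (c == '"') return string();
        return atom();
    }

    // A backslash escapes the next character (no \n-style escapes).
    private String string() {
        StringBuilder sb = new StringBuilder();
        pos++; // skip the opening quote
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c == '\\' && pos < src.length()) c = src.charAt(pos++);
            sb.append(c);
        }
        throw new IllegalArgumentException("unterminated string");
    }

    // An atom runs until whitespace, a parenthesis or a quote.
    private Object atom() {
        int start = pos;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c) || c == '(' || c == ')' || c == '"') break;
            pos++;
        }
        String text = src.substring(start, pos);
        try {
            return Long.parseLong(text);
        } catch (NumberFormatException notANumber) {
            return new Sym(text);
        }
    }

    public static void main(String[] args) {
        String markup = "(body\n"
                + "  (h1 \"This is a heading\")\n"
                + "  (p \"This is a \"\n"
                + "     (font :color \"red\" \"normal paragraph \")\n"
                + "     \"with some markup\"))";
        System.out.println(read(markup));
    }
}
```

Reading the markup example yields nested `List`s whose first elements are the `Sym`s `body`, `h1`, `p` and `font`; `:color` is just another symbol, so how keyword arguments or IDs are interpreted is left to the application.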
Michael Scharf, June 18, 2007 7:52 PM
Al,

>{"data": {"x": "null", "y": 42, "a": ["null"]}}

> Garbage in, garbage out.
> It doesn't matter what syntax you use for the garbage.

It's not the names, it's the ambiguity! With JSON I know that a is a list and y is a number. With XML I don't know this.
Anonymous, June 18, 2007 8:17 PM
UBF has two levels - A and B. A is just a transport mechanism. B includes support for contract checking. Also, UBF is easy to parse. You can find more about UBF here:

http://www.sics.se/~joe/ubf/site/home.html

Cheers!
Rebel1, June 19, 2007 12:34 AM
BTW, your so-called "XML" is not even well-formed.

Try again when you know something about what you trash...
Michael Scharf, June 19, 2007 1:42 AM
Rebel1,

you are right! I just fixed it :-D!

Michael
Doug Schaefer, June 19, 2007 6:43 AM
I am just reading Terence Parr's book on ANTLR. He agrees with you: "(I implore everyone to please stop using XML as a human interface!)". His solution: create a domain-specific language that is at least human-readable and write a translator for it (using ANTLR, of course :). But, again, it won't be self-describing; you'll still need a language spec. But at least you won't have to type in all the '<' | '>'s.
Anonymous, June 19, 2007 11:39 AM
Terence Parr just posted to the ANTLR list about an example config parser he built for a talk he's giving in Sydney.

This could be just what you are looking for:

Fig
Unknown, June 19, 2007 5:19 PM
Try regular JSON expressions:

http://laurentszyster.be/jsonr/

Regards,
AlBlue, June 19, 2007 10:10 PM
> It's not the names its the ambiguity! With JSON
> I know that a is a list and y is a number. With
> XML I don't know this.

Actually, attributes are always text, and child elements are a mixture of text and nested elements. So you do know something about the types from the format.

In any case, just knowing whether something is a list or an integer doesn't tell you much; you've only deciphered the container format. It says nothing about the meaning of the data itself, which was my point.

Then again, neither XML nor JSON ever claimed to support your concept of human readability. It's just a structural format. You can choose to organise a JSON data structure as a single string with some kind of separator, like a CSV file. It wouldn't be good, but it would still be JSON.

{"data": "1,2,3,4,5,6,7,a,b,2,x,6"}

So, my advice; don't confuse format of structure with the choice of how to arrange that structure :-)
Michael Scharf, June 20, 2007 2:22 AM
Al,
> Then again, neither XML nor JSON
> ever claimed to support your
> concept of human readability.
> It's just a structural format.

I was not talking about human readability! I am talking about machine readability: taking a file, converting it into a simple in-memory representation of records with list, string, boolean and number attributes, and then converting this data structure back into a file.

With a schema you can read XML, but there are so many schema formats, and an XML reader that can deal with all possible schemata gets *very* complex.

I believe that the data should be self describing and you should be able to infer the schema from the data.

The problem of references is not only a problem for humans, it's also a problem for programs.
Nitin Dahyabhai, June 26, 2007 2:55 AM
True, but when it comes to programs, few people should be trying to implement an XML parser themselves. The meat is mapping the output of the parser onto the objects you care about. The XML syntax itself and its handling are just gravy.
Michael Scharf, June 26, 2007 6:28 AM
Nitin,

But mapping the XML to data structures is the problem! Parsing the XML itself can be done by a library, but there are soooo many ways to map data into XML. In something like JSON, that mapping is very simple.

Michael
Michał Czardybon, December 27, 2007 7:33 PM
> I want something like JSON + an ID/Reference model + named Records...

Try Harpoon:
http://harpoon.sourceforge.net

It has named (tagged) records, as well as lists and tuples. It doesn't have an ID/reference model, but you can build your own by convention. I plan to add metadata in the future, which will solve that issue.