<book title="The Return of the King" author="J.R.R. Tolkien"/>
or
<book>
<title>The Return of the King</title>
<author>J.R.R. Tolkien</author>
</book>
And in the second case, can there be more than one author? And more than one title?
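A reader that accepts both encodings has to guess where each field lives and whether it can repeat. The Python sketch below uses only the standard library's xml.etree.ElementTree. Its rule ("read attributes, then child elements, and always collect values into lists") is an assumption made for this illustration. Nothing in the XML itself says so.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_FORM = '<book title="The Return of the King" author="J.R.R. Tolkien"/>'
ELEMENT_FORM = """<book>
<title>The Return of the King</title>
<author>J.R.R. Tolkien</author>
</book>"""


def read_book(xml_text: str) -> dict[str, list[str]]:
    """Normalize either encoding into {field: [values]}.

    Every field becomes a list because the XML does not say whether
    a book may have several authors or several titles.
    """
    root = ET.fromstring(xml_text)
    fields: dict[str, list[str]] = {}
    # Fields stored as attributes: at most one value each.
    for name, value in root.attrib.items():
        fields.setdefault(name, []).append(value)
    # Fields stored as child elements: these may repeat.
    for child in root:
        fields.setdefault(child.tag, []).append((child.text or "").strip())
    return fields


if __name__ == "__main__":
    print(read_book(ATTRIBUTE_FORM) == read_book(ELEMENT_FORM))  # True
```

The two documents only compare equal because the reader makes the mapping decision itself. A different program could just as reasonably treat title as a single string.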
XML is not self-describing. Look at the XML below. It is very ambiguous (unless you use one of the many schema descriptions: DTD, XML Schema, XMI, DSD, ...).
<data
x="null"
y="true"
z="42">
<a>NULL</a>
<a>false</a>
<b>TRUE</b>
</data>
What is a string? What is a boolean? What is a number? Is x a list? You simply can't infer it from the XML. And when you want to write the data, which fields are written as tags and which are written as attributes?
There is also a huge problem with IDs and references. From a plain XML file there's no way to figure out what the ID of an object is and what the references are. Good examples are plugin.xml files. They are full of IDs and references, but it is very hard to know which string refers to which other XML element. Control-click does not work. Why? Because references are difficult to resolve!
Are there any good alternatives?
JSON is much simpler, self-describing, less verbose and much better suited for data storage and exchange. But it has no notion of IDs and references. And it does not name object (record) types (it has only lists, maps, strings, numbers and booleans).
What else is out there? I want something like JSON + an ID/Reference model + named Records...
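One way to approximate "JSON + an ID/reference model + named records" is a convention layered on plain JSON. In this Python sketch the keys "$type", "$id" and "$ref" are hypothetical names chosen for illustration. The "$ref" idea is similar in spirit to the JSON Reference draft, but this is not an implementation of it. References are resolved in a second pass after parsing.

```python
import json
from typing import Any

# Hypothetical convention: "$type" names the record, "$id" declares an ID,
# and an object whose only key is "$ref" is a reference to that ID.
DOC = """
[
  {"$type": "Author", "$id": "tolkien", "name": "J.R.R. Tolkien"},
  {"$type": "Book", "$id": "rotk", "title": "The Return of the King",
   "authors": [{"$ref": "tolkien"}]}
]
"""


def index_ids(node: Any, table: dict[str, dict]) -> None:
    """First pass: record every object that declares a "$id"."""
    if isinstance(node, dict):
        if "$id" in node:
            table[node["$id"]] = node
        for value in node.values():
            index_ids(value, table)
    elif isinstance(node, list):
        for item in node:
            index_ids(item, table)


def resolve_refs(node: Any, table: dict[str, dict]) -> Any:
    """Second pass, in place: swap {"$ref": id} objects for the object they name."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return table[node["$ref"]]  # KeyError signals a dangling reference
        for key, value in list(node.items()):
            node[key] = resolve_refs(value, table)
        return node
    if isinstance(node, list):
        for i, item in enumerate(node):
            node[i] = resolve_refs(item, table)
        return node
    return node


def load(text: str) -> Any:
    data = json.loads(text)
    table: dict[str, dict] = {}
    index_ids(data, table)
    return resolve_refs(data, table)


if __name__ == "__main__":
    books = load(DOC)
    print(books[1]["authors"][0]["name"])     # J.R.R. Tolkien
    print(books[1]["authors"][0] is books[0])  # True: one shared object
```

Because references are replaced by the same Python object rather than a copy, the loaded data forms a graph instead of a tree.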
What do you think of YAML?
It is nearly a superset of JSON. I am sure that it has internal references. I don't know about external ones, though.
Maybe you should have a look at YAML: http://www.yaml.org/
JSON is as self-describing as XML is; it's up to you to come up with good names:
{"data": {"x": "null", "y": 42, "a": ["null"]}}
Garbage in, garbage out. It doesn't matter what syntax you use for the garbage.
Maybe UBF will meet your requirements.
Cheers!
Try YAML? Its anchors (&) and aliases (*) work like IDs and references. See http://www.yaml.org/start.html
S-expressions (what Lisp source is made of) used as data are more readable than XML or JSON, and easier to write parsers for.
(body
  (h1 "This is a heading")
  (p "This is a "
     (font :color "red" "normal paragraph ")
     "with some markup"))
You can add your own idea of what an ID is if you like.
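To back up the "easier to write parsers for" claim, here is a minimal S-expression reader in Python. It handles only parenthesised lists, double-quoted strings (without escape sequences) and bare atoms such as body or :color. Representing atoms as a small Symbol subclass of str, rather than giving keywords like :color their own type, is a simplification made for this sketch.

```python
import re

# One token per match: "(", ")", a double-quoted string (no escapes), or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')


class Symbol(str):
    """A bare atom such as body or :color, kept distinct from quoted strings."""


def read_all(text: str) -> list:
    """Parse every top-level S-expression in text into nested Python lists."""
    stack: list[list] = [[]]
    pos = 0
    while (match := TOKEN.match(text, pos)) is not None:
        pos = match.end()
        open_paren, close_paren, string, atom = match.groups()
        if open_paren:
            stack.append([])
        elif close_paren:
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' before position {pos}")
            finished = stack.pop()
            stack[-1].append(finished)
        elif string is not None:
            stack[-1].append(string)
        else:
            stack[-1].append(Symbol(atom))
    if text[pos:].strip():
        raise ValueError(f"cannot parse input at position {pos}")
    if len(stack) != 1:
        raise ValueError("unbalanced '(' at end of input")
    return stack[0]


DOC = '''
(body
  (h1 "This is a heading")
  (p "This is a "
     (font :color "red" "normal paragraph ")
     "with some markup"))
'''

if __name__ == "__main__":
    tree = read_all(DOC)[0]
    print(tree[0])  # body
    print(tree[1])  # ['h1', 'This is a heading']
```

The whole reader is one regular expression and a stack, which is roughly the argument being made. Deciding what :color means is still up to the program.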
Al,
> {"data": {"x": "null", "y": 42, "a": ["null"]}}
> Garbage in, garbage out.
> It doesn't matter what syntax you use for the garbage.
It's not the names, it's the ambiguity! With JSON I know that a is a list and y is a number. With XML I don't know this.
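The difference is visible as soon as both documents are parsed with Python's standard library. json.loads returns typed values, while xml.etree.ElementTree hands back only strings, so deciding that "42" is a number or that the repeated a elements form a list is left to the application. The JSON here encodes the values from the XML in the post under one chosen typed interpretation. It is an illustration, not a document from the thread.

```python
import json
import xml.etree.ElementTree as ET

JSON_TEXT = '{"data": {"x": null, "y": true, "z": 42, "a": [null, false]}}'
XML_TEXT = """<data x="null" y="true" z="42">
<a>NULL</a>
<a>false</a>
<b>TRUE</b>
</data>"""

parsed_json = json.loads(JSON_TEXT)["data"]
# The JSON parser already knows the types: None, bool, int and list.
json_types = {key: type(value).__name__ for key, value in parsed_json.items()}

root = ET.fromstring(XML_TEXT)
# Attribute values come back as str, whatever they look like.
xml_attr_types = {key: type(value).__name__ for key, value in root.attrib.items()}
# Element text is str too; "a list" is just two sibling elements with the same tag.
xml_a_texts = [child.text for child in root.findall("a")]

if __name__ == "__main__":
    print(json_types)      # {'x': 'NoneType', 'y': 'bool', 'z': 'int', 'a': 'list'}
    print(xml_attr_types)  # {'x': 'str', 'y': 'str', 'z': 'str'}
    print(xml_a_texts)     # ['NULL', 'false']
```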
UBF has two levels, A and B. A is just a transport mechanism; B adds support for contract checking. UBF is also easy to parse. You can find more about UBF here:
http://www.sics.se/~joe/ubf/site/home.html
Cheers!
BTW, your so-called "XML" is not even well-formed.
Try again when you know something about what you trash...
Rebel1,
You are right! I just fixed it :-D
Michael
I am just reading Terence Parr's book on ANTLR. He agrees with you: "(I implore everyone to please stop using XML as a human interface!)".
His solution: create a domain-specific language that is at least human-readable and write a translator for it (using ANTLR, of course :). But, again, it won't be self-describing; you'll still need a language spec. But at least you won't have to type in all the '<'s and '>'s.
Terence Parr just posted to the ANTLR list about an example config parser he built for a talk he's giving in Sydney.
This could be just what you are looking for.
Fig
Try regular JSON expressions:
http://laurentszyster.be/jsonr/
Regards,
> It's not the names its the ambiguity! With JSON
> I know that a is a list and y is a number. With
> XML I don't know this.
Actually, attributes are always text, and child elements are a mixture of text and nested elements. So you do know something about the types from the format.
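This matches how a generic XML API exposes a document. In Python's xml.etree.ElementTree, for example, attribute values are always str, and mixed content is split between an element's text, its child elements, and each child's tail. The p element below is adapted from the S-expression example earlier in the thread. Its class attribute is added purely for illustration.

```python
import xml.etree.ElementTree as ET

P = '<p class="intro">This is a <font color="red">normal paragraph </font>with some markup</p>'

root = ET.fromstring(P)


def content_parts(element: ET.Element) -> list:
    """List an element's mixed content in order: text runs and child tags."""
    parts: list = []
    if element.text:
        parts.append(element.text)
    for child in element:
        parts.append(f"<{child.tag}>")
        if child.tail:  # text that follows the child, still inside the parent
            parts.append(child.tail)
    return parts


if __name__ == "__main__":
    print(root.attrib)          # {'class': 'intro'}
    print(content_parts(root))  # ['This is a ', '<font>', 'with some markup']
```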
In any case, just knowing whether something is a list or an integer doesn't tell you much; you've only deciphered the container format. It says nothing about the data itself, which was my point.
Then again, neither XML nor JSON ever claimed to support your concept of human readability. It's just a structural format. You can choose to organise a JSON data structure as a single string with some kind of separator, like a CSV file. It wouldn't be good, but it would still be JSON.
{"data": "1,2,3,4,5,6,7,a,b,2,x,6"}
So, my advice: don't confuse the format of a structure with the choice of how to arrange that structure :-)
Al,
> Then again, neither XML nor JSON
> ever claimed to support your
> concept of human readability.
> It's just a structural format.
I was not talking about human readability! I am talking about machine readability: getting a file and converting it into a simple in-memory representation of records with list, string, boolean and number attributes, and then converting this data structure back into a file.
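This requirement can be checked as a round trip. In the Python sketch below, the book record and the to_xml/from_xml mapping (scalars as attributes, lists as repeated child elements) are hypothetical choices made for illustration. JSON gives back exactly the record it was given, while the XML round trip returns every scalar as a string.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record with a string, a number, a boolean and a list.
record = {
    "title": "The Return of the King",
    "volume": 3,
    "favourite": True,
    "authors": ["J.R.R. Tolkien"],
}


def to_xml(name: str, rec: dict) -> str:
    """Hypothetical mapping: scalars become attributes, lists repeated children."""
    element = ET.Element(name)
    for key, value in rec.items():
        if isinstance(value, list):
            for item in value:
                ET.SubElement(element, key).text = str(item)
        else:
            element.set(key, str(value))
    return ET.tostring(element, encoding="unicode")


def from_xml(text: str) -> dict:
    """Inverse mapping; the XML carries no record of the original types."""
    element = ET.fromstring(text)
    rec: dict = dict(element.attrib)
    for child in element:
        rec.setdefault(child.tag, []).append(child.text)
    return rec


json_copy = json.loads(json.dumps(record))
xml_copy = from_xml(to_xml("book", record))

if __name__ == "__main__":
    print(json_copy == record)  # True
    print(xml_copy == record)   # False
    print(repr(xml_copy["volume"]), repr(xml_copy["favourite"]))  # '3' 'True'
```

The same mapping also silently drops an empty list, because it produces no child elements at all.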
With a schema you can read XML, but there are so many schema formats, and an XML reader that can deal with all possible schemata gets *very* complex.
I believe that the data should be self-describing and that you should be able to infer the schema from the data.
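Inferring a schema from the data is straightforward when the format already carries types, as JSON does. The Python sketch below derives a simple field-to-type description from parsed JSON records. The output notation (names such as list[string], with types seen across records joined by "|") is an assumption of this sketch, not an existing schema language, and the sample records are made up. Run on ElementTree output instead, the same code would report "string" for every field.

```python
import json
from typing import Any


def type_name(value: Any) -> str:
    """Name the JSON type of a value; element types of a list are merged."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        inner = sorted({type_name(item) for item in value}) or ["unknown"]
        return f"list[{'|'.join(inner)}]"
    if isinstance(value, dict):
        return "record"
    raise TypeError(f"not a JSON value: {value!r}")


def infer_schema(records: list[dict]) -> dict[str, str]:
    """Map each field name to the union of types observed across records."""
    seen: dict[str, set[str]] = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(type_name(value))
    return {key: "|".join(sorted(types)) for key, types in seen.items()}


# Made-up sample records for illustration.
RECORDS = json.loads("""[
  {"title": "The Return of the King", "volume": 3, "authors": ["J.R.R. Tolkien"]},
  {"title": "Untitled draft", "volume": null, "authors": []}
]""")

if __name__ == "__main__":
    print(infer_schema(RECORDS))
    # {'title': 'string', 'volume': 'null|number', 'authors': 'list[string]|list[unknown]'}
```

An empty list shows the limit of inference: the data alone cannot say what the list would have contained.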
The problem of references is not only a problem for humans, it's also a problem for programs.
True, but when it comes to programs, few people should be trying to implement an XML parser themselves. The meat is mapping the output of the parser onto the objects you care about. The XML syntax itself and its handling are just gravy.
Nitin,
But mapping the XML to data structures is the problem! Parsing the XML itself can be done by a library, but there are soooo many ways to map data into XML. In something like JSON, that mapping is very simple.
Michael
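For contrast with the XML mapping problem, here is how short a JSON-to-record mapping can be in Python when field names and types line up. The Book dataclass is a hypothetical record type for illustration. The record type itself still has to come from the program, because plain JSON does not name it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Book:  # hypothetical record type; JSON itself carries no type name
    title: str
    authors: list[str]


TEXT = '{"title": "The Return of the King", "authors": ["J.R.R. Tolkien"]}'

# JSON already delivers a str and a list, so the mapping is one call each way.
book = Book(**json.loads(TEXT))
round_tripped = json.dumps(asdict(book))

if __name__ == "__main__":
    print(book)                   # Book(title='The Return of the King', authors=['J.R.R. Tolkien'])
    print(round_tripped == TEXT)  # True
```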
> I want something like JSON + an ID/Reference model + named Records...
Try Harpoon:
http://harpoon.sourceforge.net
It has named (tagged) records, as well as lists and tuples. It doesn't have an ID/reference model, but you can build your own by convention. I plan to add metadata in the future, which will solve that issue.