Friday, July 4, 2008

Common XML design mistakes and how to avoid them (a few tips for smart architecture with XML)

XML suffers from an all-too-common problem with new technologies: it can be called "buzzworditis." Like the C++ language and client-server architecture before it, XML has visibility at the executive level -- the nontechnical executive level -- which leads to corporate memos insisting that "entire systems" need to be somehow "converted" to XML for the good of the company. However, like C++ and client-server architecture, XML isn't an answer in and of itself; it's simply a tool you can use to help build your technical solution. By understanding the strengths and weaknesses of XML compared to other possible architectural choices, you can minimize or prevent major headaches later in the development (or maintenance) cycle. This column recommends four general design guidelines for the judicious use of XML in the data architecture of your systems.

Tip 1: If you don't need it, throw it away

One thing that many architects don't initially get about XML is that it's just a way to represent information. There's nothing magical about an XML document: It just shows how various pieces of information relate to one another. When you receive a document from an external source that has information you know you'll never use (such as internal reference numbers from that source that have no bearing on your system), toss it! Use an XSLT style sheet or some other mechanism to keep the information you want and drop the information you don't. Remember, it's always going to be more efficient to filter the data once (as it comes into your system) than every time you need to access it. Similarly, if you receive information about seven million customers in one monster document, but the information would be more useful to you in separate documents, break it into one document per customer. After all, if you received a fixed-width file from a mainframe system, you almost certainly wouldn't keep it around in that form because it wouldn't be particularly useful. Don't be afraid to dissect, reorganize, or otherwise modify XML documents to suit your needs.
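
As a rough illustration of filtering and splitting on the way in, here is a minimal Python sketch that uses the standard library's ElementTree in place of an XSLT style sheet. The feed layout, the element names (customers, customer, sourceRef), and the function name split_and_filter are all assumptions made up for this example; they don't come from any particular system.

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming feed; element names are illustrative only.
FEED = """<customers>
  <customer id="1001">
    <name>Ada Example</name>
    <state>NC</state>
    <sourceRef>INT-88231</sourceRef>
  </customer>
  <customer id="1002">
    <name>Bob Sample</name>
    <state>VA</state>
    <sourceRef>INT-88232</sourceRef>
  </customer>
</customers>"""

UNWANTED = {"sourceRef"}  # fields this system will never use


def split_and_filter(feed_xml):
    """Return {customer id: standalone XML string} with unwanted fields dropped."""
    documents = {}
    for customer in ET.fromstring(feed_xml).findall("customer"):
        for child in list(customer):  # iterate over a copy while removing
            if child.tag in UNWANTED:
                customer.remove(child)
        customer.tail = None  # drop whitespace that followed the element
        documents[customer.get("id")] = ET.tostring(customer, encoding="unicode")
    return documents


docs = split_and_filter(FEED)
```

For a document with millions of customers you would probably want a streaming parser (ElementTree's iterparse, for instance) rather than loading the whole feed at once; the in-memory version above just keeps the idea easy to read.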
....................................................................................
Tip 2: Don't use XML for searching

XML documents (by themselves) are not well suited to being searched. Because they're just flat text, any of XML's native searching mechanisms (such as XPath) must parse the entire document (or documents) to locate the piece (or pieces) you're interested in. If you're trying to work with that single document with information about seven million customers, searching would be extremely inefficient. If you break the document up into smaller documents -- say, one per customer -- the problem still occurs: To find the particular customer you're looking for, you need to parse each document until you find the appropriate one. The only good solution for searching XML documents is to introduce some sort of indexing mechanism -- either a relational database index or some sort of native XML indexing tool -- that significantly reduces the amount of information that has to be processed to locate the document (or document fragment) you're interested in. When you have data-oriented information (as opposed to text-oriented information such as a book manuscript), a relational database is well suited for this task, and it provides other benefits, as you'll see in the next tip.
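
One way to picture the indexing idea is sketched below in Python, with the standard library's sqlite3 module standing in for the relational database. Each document is parsed once as it arrives, the searchable fields are copied into indexed columns, and the XML itself is stored untouched. The table name, column names, and sample documents are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical per-customer documents.
DOCS = [
    '<customer id="1001"><name>Ada Example</name><state>NC</state></customer>',
    '<customer id="1002"><name>Bob Sample</name><state>VA</state></customer>',
    '<customer id="1003"><name>Cy Placeholder</name><state>NC</state></customer>',
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_docs ("
    " customer_id TEXT PRIMARY KEY,"
    " state TEXT NOT NULL,"
    " xml TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_customer_state ON customer_docs (state)")

for doc in DOCS:
    root = ET.fromstring(doc)  # parse once, on the way in
    conn.execute(
        "INSERT INTO customer_docs VALUES (?, ?, ?)",
        (root.get("id"), root.findtext("state"), doc),
    )


def find_customer(customer_id):
    """Fetch one stored document by key; no XML is parsed during the search."""
    row = conn.execute(
        "SELECT xml FROM customer_docs WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def ids_in_state(state):
    """List customer ids for a state using the indexed column."""
    rows = conn.execute(
        "SELECT customer_id FROM customer_docs WHERE state = ? ORDER BY customer_id",
        (state,),
    )
    return [r[0] for r in rows]
```

The lookup touches only the index and the one row it needs, so the cost of finding a customer no longer grows with the number of documents that would otherwise have to be parsed.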
....................................................................................

Tip 3: Don't use XML for summarization

Summarizing information stored in XML documents is also very inefficient. XPath provides only the bare minimum of aggregation functionality (functions such as count() and sum()), and even this is not easily usable if the information you want to summarize is found in more than one document. Also, summarization presents the same problem as searching: Each document must be parsed to discover and extract the information being summarized. Again, I recommend indexing the information, thus reducing the amount of information to sift through to find the pieces being operated on. Alternatively, you could generate an additional document that contains summary information as detail XML documents are introduced into the system. However, that would not allow you to do ad hoc summarization, and it can be a bit of a management chore. For the best flexibility for summarization tasks, a relational database is really the only good choice; most off-the-shelf XML indexers do not expose the indexes themselves for direct programmatic manipulation.
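
To make the summarization point concrete, the following Python sketch (again using sqlite3 as a stand-in relational database) copies the figures to be summarized into a table as each document arrives, then answers an ad hoc question with a single GROUP BY query. The order documents, their element names, and the table layout are invented for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order documents, one per order.
ORDERS = [
    '<order id="1"><customer>1001</customer><total>19.50</total></order>',
    '<order id="2"><customer>1001</customer><total>5.25</total></order>',
    '<order id="3"><customer>1002</customer><total>40.00</total></order>',
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_index ("
    " order_id TEXT PRIMARY KEY,"
    " customer_id TEXT NOT NULL,"
    " total_cents INTEGER NOT NULL)"
)

for doc in ORDERS:
    root = ET.fromstring(doc)  # extract the summarizable fields once
    cents = int(Decimal(root.findtext("total")) * 100)  # store money as integer cents
    conn.execute(
        "INSERT INTO order_index VALUES (?, ?, ?)",
        (root.get("id"), root.findtext("customer"), cents),
    )


def totals_by_customer():
    """Return (customer id, order count, total in cents) per customer."""
    return conn.execute(
        "SELECT customer_id, COUNT(*), SUM(total_cents)"
        " FROM order_index GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
```

Any other grouping or filter is just a different query against the same table, which is the ad hoc flexibility a pregenerated summary document can't offer.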
....................................................................................

Tip 4: Use XML to drive rendering

One real power of XML lies in its ability (via XSLT) to render its contents to various other forms. This is especially crucial if your system needs to support various means of data consumption -- through an HTML interface such as a desktop Web browser, through a portable device using WML, or through a data-transfer standard agreed upon by your industry. Relational data can drive rendering, too, but it's not as good at the job, because each possible rendering requires significant coding time. Also, if a request is received to render a piece of information that you have stored as an atomic XML document (such as a single customer), you can do so without touching the indexing system, which frees up cycles on that system to support the searching and summarization of the data as necessary.
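
The sketch below shows the shape of the idea: one stored customer document rendered to two different output forms without consulting any index. Note that it is a stand-in, not XSLT. Python's standard library has no XSLT processor, so the two renderings are written as plain Python functions; in a real system each would more likely be an XSLT style sheet. The document layout and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical atomic document for a single customer.
CUSTOMER = '<customer id="1001"><name>Ada &amp; Co.</name><state>NC</state></customer>'


def render_html(customer_xml):
    """Render the customer as an HTML fragment for a browser."""
    c = ET.fromstring(customer_xml)
    return (
        '<div class="customer">'
        f"<h1>{escape(c.findtext('name'))}</h1>"
        f"<p>State: {escape(c.findtext('state'))}</p>"
        "</div>"
    )


def render_text(customer_xml):
    """Render the same customer as a pipe-delimited record for data transfer."""
    c = ET.fromstring(customer_xml)
    return f"{c.get('id')}|{c.findtext('name')}|{c.findtext('state')}"
```

Both functions read only the single document they are handed, so rendering requests never compete with searching and summarization for time on the indexing system.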
....................................................................................

Conclusion

This column looked at some of the ways XML fits into an overall system architecture and where it does (and doesn't) make sense. You've seen that some sort of indexing mechanism -- ideally a relational database -- should be part of your overall architecture in most cases. In short, use XML to perform the tasks it excels at, such as driving a rendering system.

As you're architecting (or rearchitecting) your systems, remember that XML is just another tool in your development toolbox. You wouldn't use a screwdriver to hammer in a nail. Don't try to make XML do things it isn't designed to do well.
