WEBNOLOGY :: Web Development Technologies: xml


Wednesday, August 20, 2008

Some must know truths about Google indexing of Flash files

Google has been crawling the web for years and has helped many businesses grow further than they could have imagined. As more organisations came to rely on a web-based identity, some were content with a single portfolio page, which was enough in those days. That changed as businesses grew and the way they worked changed. Many now rely on highly dynamic sites and have started concentrating on Flash-based sites, where user interactivity is the main aim.

But bad news soon followed for these people: Google could not index Flash files. Google mainly crawls the raw content of a site, so static and partially dynamic sites got more priority than Flash-based sites. The crawler could even read images through their captions and through the “alt” attribute of <img> tags.

The truth remained that Google could not crawl Flash files. That has now changed, to the relief of many people who thought it would never happen. Google and Adobe have shaken hands, and Google has started indexing Flash files: it can crawl through them and capture some of the content they contain. We still need to make sure that Google can index our Flash files well.

So before leaving everything to Google, why not take some precautions to avoid this kind of issue? Google is fond of content, so it deserves some assistance from our side too.

First of all, apart from Adobe's announcement of this facility, let's concentrate on certain points that help a Flash-based site succeed among other sites:

• Make sure you embed your Flash using SWFObject so that you can display alternate HTML content. Make sure that the text content in the alternate HTML is as close as possible to the Flash content. Graphic elements can be described, just as you would describe a photo with a caption or an image alt attribute.
• Some Flash content can be built with XML as well, so if you generate your Flash content from an external XML file, use the same XML file to generate the alternate HTML content.
• Google can see text and links inside the Flash file, but it will not split up a Flash file into multiple pages and index them separately. That means that your Flash file will be the equivalent of one massive HTML page, unless you break it up into multiple HTML landing pages.
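
The second point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the content file layout (<site>, <section>, <text>, <link>) is hypothetical, and a real Flash project would have its own element names. The idea is simply that one XML source feeds both the Flash movie and the alternate HTML, so the two cannot drift apart.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical content file; the element names are assumptions for this sketch.
CONTENT_XML = """<site>
  <section title="Home">
    <text>Welcome to our studio.</text>
    <link href="/portfolio.html">Portfolio</link>
  </section>
  <section title="Contact">
    <text>Write to us any time.</text>
  </section>
</site>"""


def render_alternate_html(xml_source):
    """Build plain HTML from the same XML that feeds the Flash movie."""
    root = ET.fromstring(xml_source)
    parts = []
    for section in root.findall("section"):
        parts.append("<h2>" + html.escape(section.get("title", "")) + "</h2>")
        for child in section:
            label = html.escape(child.text or "")
            if child.tag == "text":
                parts.append("<p>" + label + "</p>")
            elif child.tag == "link":
                href = html.escape(child.get("href", ""), quote=True)
                parts.append('<p><a href="' + href + '">' + label + "</a></p>")
    return "\n".join(parts)


alternate_html = render_alternate_html(CONTENT_XML)
print(alternate_html)
```

The generated markup would go inside the element that SWFObject replaces, so visitors and crawlers without Flash still see the same text and links.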

Unfortunately, it is not yet time to start building Flash sites and rely solely on Adobe's announcement to get them indexed by Google.

Some of the main reasons are:

• We know that Google can now crawl text and links inside a Flash file. But a completely Flash-based site is usually a single page from which the user jumps dynamically to the other navigation sections. If a site has five HTML pages, Google crawls all of them and finds the relevant content on each. That is not possible with a Flash file, unless Google cuts the file into small chunks and crawls and indexes each of them.

• Another point: Google can crawl only static content, and possibly not dynamic content. Dynamic here means content loaded at run time from an external XML file; it is not certain that Google indexes such content.

Friday, July 4, 2008

Common XML design mistakes and how to avoid them(Few tips for smart architecture with XML)

XML suffers from an all-too-common problem with new technologies: It can be called "buzzworditis." Like the C++ language and client-server architecture before that, XML has visibility at the executive level -- the nontechnical executive level. This leads to corporate memos insisting that "entire systems" need to be somehow "converted" to XML for the good of the company. However, like C++ and client-server architecture, XML isn't an answer in and of itself; it's simply a tool you can use to help build your technical solution. By understanding the strengths and weaknesses of XML compared to other possible architectural choices, you can minimize or prevent major headaches later in the development (or maintenance) cycle. This column recommends following four general design guidelines for the judicious use of XML in the data architecture of your systems.

Tip 1: If you don't need it, throw it away

One thing that many architects don't initially get about XML is that it's just a way to represent information. There's nothing magical about an XML document: It just shows how various pieces of information relate to one another. When you receive a document from an external source that has information you know you'll never use (such as internal reference numbers from that source that have no bearing on your system), toss them! Use an XSLT style sheet or some other mechanism to filter out the information you want to keep and drop the information you don't want. Remember, it's always going to be more efficient to filter the data once (as it comes into your system) than every time you need to access it. Similarly, if you receive information about seven million customers in one monster document, but the information would be more useful to you in separate documents, break it into one document per customer. After all, if you received a fixed-width file from a mainframe system, you almost certainly wouldn't keep it around in that form because it wouldn't be particularly useful. Don't be afraid to dissect, reorganize, or otherwise modify XML documents to suit your needs.
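
As a rough illustration of filtering once on the way in and splitting a large feed into one document per customer, here is a minimal Python sketch. It uses the standard library's ElementTree rather than XSLT, and the feed layout (<customers>, <customer>, <internalRef>) is a made-up example, not a format from any real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming feed; <internalRef> stands for source-side data we never use.
INCOMING = """<customers>
  <customer id="1"><name>Ada</name><internalRef>X-991</internalRef></customer>
  <customer id="2"><name>Grace</name><internalRef>X-992</internalRef></customer>
</customers>"""


def filter_and_split(xml_source, unwanted=("internalRef",)):
    """Drop unwanted child elements once, then return one document per customer."""
    root = ET.fromstring(xml_source)
    documents = {}
    for customer in root.findall("customer"):
        for child in list(customer):  # iterate over a copy because we remove items
            if child.tag in unwanted:
                customer.remove(child)
        customer.tail = None  # discard whitespace that belonged to the big file
        documents[customer.get("id")] = ET.tostring(customer, encoding="unicode")
    return documents


per_customer = filter_and_split(INCOMING)
for customer_id, document in per_customer.items():
    print(customer_id, document)
```

For a feed with millions of customers you would probably stream it with `ET.iterparse` instead of loading it whole, but the filtering step stays the same.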
....................................................................................
Tip 2: Don't use XML for searching

XML documents (by themselves) are not well suited to being searched. Because they're just flat text, any of XML's native searching mechanisms (such as XPath) must parse the entire document (or documents) to locate the piece (or pieces) you're interested in. If you're trying to work with that single document with information about seven million customers, searching would be extremely inefficient. If you break the document up into smaller documents -- say, one per customer -- the problem still occurs: To find the particular customer you're looking for, you need to parse each document until you find the appropriate one. The only good solution for searching XML documents is to introduce some sort of indexing mechanism -- either a relational database index or some sort of native XML indexing tool -- that significantly reduces the amount of information that has to be processed to locate the document (or document fragment) you're interested in. When you have data-oriented information (as opposed to text-oriented information such as a book manuscript), a relational database is well suited for this task, and it provides other benefits, as you'll see in the next tip.
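
One way to picture the indexing approach is the sketch below, which uses Python's built-in sqlite3 module as a stand-in for whatever relational database you actually run. The table name, column names, and customer documents are all assumptions made for the example. Each document is parsed once, when it arrives, and later lookups go through the index instead of re-parsing every file.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical per-customer documents, keyed by an assumed customer id.
DOCUMENTS = {
    "1": '<customer id="1"><name>Ada</name><city>London</city></customer>',
    "2": '<customer id="2"><name>Grace</name><city>New York</city></customer>',
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_index "
    "(id TEXT PRIMARY KEY, name TEXT, city TEXT, document TEXT)"
)
conn.execute("CREATE INDEX idx_customer_name ON customer_index (name)")

# Parse each document once, on the way in, and store the searchable fields.
for doc_id, document in DOCUMENTS.items():
    root = ET.fromstring(document)
    conn.execute(
        "INSERT INTO customer_index VALUES (?, ?, ?, ?)",
        (doc_id, root.findtext("name"), root.findtext("city"), document),
    )


def find_by_name(name):
    """Locate a customer's document through the index, without parsing any XML."""
    row = conn.execute(
        "SELECT document FROM customer_index WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(find_by_name("Grace"))
```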
....................................................................................

Tip 3: Don't use XML for summarization

Summarizing information stored in XML documents is also very inefficient. The native language provided by XPath contains only the bare minimum of aggregation functionality, and even this is not easily usable if the information you want to summarize is found in more than one document. Also, summarization presents the same problem as searching: Each document must be parsed to discover and extract the information being summarized. Again, I recommend indexing the information, thus reducing the amount of information to sift to discover the pieces that are being operated on. Alternatively, you could generate an additional document that contains summary information as detail XML documents are introduced into the system. However, that would not allow you to do ad hoc summarization, and it can be a bit of a management chore. For the best flexibility for summarization tasks, a relational database is really the only good choice; most off-the-shelf XML indexers do not expose the indexes themselves for direct programmatic manipulation.
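
To make the contrast concrete, the following sketch loads a few order documents into an in-memory SQLite table and then asks an ad hoc summary question with plain SQL. The <order> documents, the table layout, and the choice to store amounts as integer cents are assumptions for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order documents; element and attribute names are assumptions.
ORDER_DOCUMENTS = [
    '<order customer="1"><total>20.00</total></order>',
    '<order customer="1"><total>5.50</total></order>',
    '<order customer="2"><total>12.25</total></order>',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total_cents INTEGER)")

# Extract the fields to summarize once, as each document enters the system.
for document in ORDER_DOCUMENTS:
    root = ET.fromstring(document)
    cents = round(float(root.findtext("total")) * 100)
    conn.execute("INSERT INTO orders VALUES (?, ?)", (root.get("customer"), cents))

# Ad hoc summarization: order count and total per customer.
summary = conn.execute(
    "SELECT customer, COUNT(*), SUM(total_cents) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(summary)
```

Changing the question (say, the average order per customer) means changing one SQL statement, with no need to re-parse any XML.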
....................................................................................

Tip 4: Use XML to drive rendering

One real power of XML lies in its ability (via XSLT) to render its contents to various other forms. This is especially crucial if your system needs to support various means of data consumption -- through an HTML interface such as a desktop Web browser, through a portable device using WML, or to a data-transfer standard agreed upon by your industry. Relational data can drive rendering, too, but it's not as good at the job. Each possible rendering requires significant coding time. Also, if a request is received to render a piece of information that you have stored as an atomic XML document (such as a single customer), you can do so without touching the indexing system, which frees up cycles on that system to support the searching and summarization of the data as necessary.
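
The sketch below illustrates the idea of one atomic document driving several renderings. Note the substitution: the text describes XSLT, but Python's standard library has no XSLT processor, so plain functions stand in for the style sheets here. The customer document and the renderer names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical atomic customer document.
CUSTOMER = '<customer id="7"><name>Ada &amp; Co</name><city>London</city></customer>'


def to_html(root):
    """Render for a desktop browser; text is escaped for HTML."""
    name = html.escape(root.findtext("name", ""))
    city = html.escape(root.findtext("city", ""))
    return '<div class="customer"><h1>' + name + "</h1><p>" + city + "</p></div>"


def to_text(root):
    """Render a one-line plain-text summary."""
    return root.findtext("name", "") + " (" + root.findtext("city", "") + ")"


# One entry per output format, playing the role of one style sheet per format.
RENDERERS = {"html": to_html, "text": to_text}


def render(document, target):
    """Parse the stored document and hand it to the renderer for the target format."""
    return RENDERERS[target](ET.fromstring(document))


print(render(CUSTOMER, "html"))
print(render(CUSTOMER, "text"))
```

Adding a new output format means adding one renderer; the stored document and the indexing system are not touched.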
....................................................................................

Conclusion

This column looked at some of the ways XML fits into an overall system architecture and where it does (and doesn't) make sense. You've seen that some sort of indexing mechanism -- ideally a relational database -- should be part of your overall architecture in most cases. In short, use XML to perform the tasks it excels at, such as driving a rendering system.

As you're architecting (or rearchitecting) your systems, remember that XML is just another tool in your development toolbox. You wouldn't use a screwdriver to hammer in a nail. Don't try to make XML do things it isn't designed to do well.

Thursday, July 3, 2008

Anatomy of an XML Document

Whether you're writing XML from scratch or writing a document from a pre-defined specification, there is a standard layout for XML. Here is a standard XML document:

<?xml version="1.0"?>
<workorder priority="high" datedue="09/30/2001">
  <submitter>
    <name first="Jennifer" last="Kyrnin" />
    <email>html.guide@about.com</email>
    <account number="11001100" />
  </submitter>
  <project title="update aa051198.htm article">
    <url>http://webdesign.about.com/library/weekly/aa051198.htm</url>
    <description>
      Please convert this article to the new article look and feel, with the side navigation and information.
    </description>
  </project>
</workorder>

If you look closely at this markup, you will be able to determine its structure. The first part of the structure is the XML declaration, <?xml version="1.0"?>. Everything after that is an element of the XML document. The container element is <workorder>. This element contains all the other elements and surrounds them all. Inside of that element are the specialized elements that describe the rest of the document, such as <submitter>, <project>, and <account>.

Here is a more visual tree view of the structure:

[Figure: tree view of the work order document. Elements are shown in red, attributes in dark blue, and contents in black.]

This tree can have many more branches and sub-branches. Each branch represents an element, which can have attributes or not, and content or not.
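
If you want to see the branches for yourself, a short Python sketch like the one below can print an indented outline of the work order document. It uses the standard library's ElementTree parser; the `tree_lines` helper and its output format are just one possible way to draw the tree, not part of any XML standard.

```python
import xml.etree.ElementTree as ET

WORKORDER = """<?xml version="1.0"?>
<workorder priority="high" datedue="09/30/2001">
  <submitter>
    <name first="Jennifer" last="Kyrnin" />
    <email>html.guide@about.com</email>
    <account number="11001100" />
  </submitter>
  <project title="update aa051198.htm article">
    <url>http://webdesign.about.com/library/weekly/aa051198.htm</url>
    <description>Please convert this article to the new article look and feel.</description>
  </project>
</workorder>"""


def tree_lines(element, depth=0):
    """Return one outline line per element: tag, then attributes, then text content."""
    attributes = " ".join(
        key + '="' + value + '"' for key, value in element.attrib.items()
    )
    text = (element.text or "").strip()
    line = "  " * depth + element.tag
    if attributes:
        line += " [" + attributes + "]"
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # each child element is a sub-branch
        lines.extend(tree_lines(child, depth + 1))
    return lines


outline = tree_lines(ET.fromstring(WORKORDER))
print("\n".join(outline))
```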

The Prolog
This is the first part of the document, and an important one. It tells the browser or parser that this document is marked up in XML. Strictly speaking, XML 1.0 lets you omit the XML declaration, but including it is good practice. HTML has a comparable construct, the document type declaration, though most HTML authors leave it out. In HTML it might look like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
This tells the browser that the document uses HTML 4.0 Transitional. The prolog in an XML document tells the computer that it's using XML and which version.

But the prolog for an XML document can also contain:

* the DTD or schema being used
* declarations of special pieces of text (entities)
* text encoding
* XML processing instructions
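
As a rough illustration of a fuller prolog, the sketch below parses a document whose prolog carries a text encoding, a processing instruction, and a document type declaration with one entity. The style sheet name workorder.xsl and the entity name `site` are made up for the example. Python's ElementTree parser skips the processing instruction by default and expands the internally declared entity.

```python
import xml.etree.ElementTree as ET

# Prolog: XML declaration with encoding, a processing instruction,
# and a DOCTYPE whose internal subset declares one entity.
# "workorder.xsl" and the entity "site" are hypothetical names.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="workorder.xsl"?>
<!DOCTYPE workorder [
  <!ENTITY site "webdesign.about.com">
]>
<workorder>
  <url>http://&site;/library/weekly/aa051198.htm</url>
</workorder>"""

root = ET.fromstring(DOCUMENT)
expanded_url = root.findtext("url")
print(root.tag)
print(expanded_url)
```

Everything before <workorder> is the prolog; the element tree that the parser returns starts at the container element.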

Elements
After the prolog comes the structure of the XML document: the elements. The most important thing to remember is that there must be a container element for your XML document. In fact, your document can be made up of one element alone:

<?xml version="1.0"?>
<topelement>
This XML document is a well-formed document, but it only has one element. This is OK. It is still a correct XML document.
</topelement>
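
The container-element rule is easy to check with a parser. In this minimal sketch, `is_well_formed` is a hypothetical helper built on Python's standard ElementTree module: a document with a single top element parses, while one with two top-level elements is rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if the parser accepts the document, False if it raises ParseError."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


ONE_ELEMENT = '<?xml version="1.0"?>\n<topelement>Only one element.</topelement>'
TWO_ROOTS = '<?xml version="1.0"?>\n<first/>\n<second/>'

print(is_well_formed(ONE_ELEMENT))
print(is_well_formed(TWO_ROOTS))
```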
