XML documents define their own markup, which makes it difficult to write software that understands all XML documents. That is why we rely on standards when exchanging certain types of information via XML. These include RSS, for distributing and exchanging feeds with Web site related news and information, and the Sitemap protocol, for publishing XML sitemaps that help search engines index Web sites.
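As an illustration, a minimal RSS 2.0 feed might look like the following sketch (the site name and URLs are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Site News</title>
    <link>http://www.example.com/</link>
    <description>Recent news from an example Web site.</description>
    <item>
      <title>A sample news item</title>
      <link>http://www.example.com/news/1</link>
      <description>A short summary of the news item.</description>
    </item>
  </channel>
</rss>
```

Because every conforming feed uses the same element names, aggregators and search engines can process feeds from any publisher without site-specific code.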
Web pages, as we know them, use HTML as markup; some use XML with style sheets, and such pages can appear the same as Web pages written in HTML. XML gives Web publishers a way to construct their own markup for Web publishing if they prefer. However, adhering to standards makes life easier for webmasters, and that includes using the Sitemap protocol for publishing sitemaps.
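A minimal sitemap following the Sitemap protocol is itself a small XML document; in this sketch the URL and dates are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

The `loc` element is the only required child of `url`; the optional `lastmod`, `changefreq`, and `priority` elements give search engines hints about how to schedule crawling.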
Web publishers can optimize XML for search engine indexing just as they do HTML files. Using XML, Web publishers have even more control over describing content than they do with conventional HTML code. That is why XML documents are the best format for the semantic Web.
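To illustrate the point, compare hypothetical markup for the same book listing. The HTML tags describe presentation, while the custom XML element names (chosen here purely for illustration) describe the meaning of each piece of content:

```xml
<!-- Conventional HTML: tags describe presentation -->
<p><b>An Example Book</b> by A. Author, $25.00</p>

<!-- Custom XML: tags describe meaning (element names are illustrative) -->
<book>
  <title>An Example Book</title>
  <author>A. Author</author>
  <price currency="USD">25.00</price>
</book>
```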
There are no obvious technological barriers to search engines indexing XML documents; it should be even easier than indexing conventional HTML Web pages. An XML document contains text structured as a tree that is easy to traverse. The difficult part, of course, is understanding each document, and the same applies to HTML documents and text files in general.
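The ease of traversal can be sketched with a few lines of Python using the standard library's `xml.etree.ElementTree`; the sample document and the `extract_text` helper are hypothetical, but the approach mirrors what an indexer's text-extraction step might do:

```python
# Sketch: walk an XML tree and collect its text content.
# The sample document and helper name are illustrative only.
import xml.etree.ElementTree as ET

SAMPLE = """<article>
  <title>XML and search engines</title>
  <body>
    <p>XML content is structured as a tree.</p>
    <p>Each element can be visited in turn.</p>
  </body>
</article>"""

def extract_text(xml_string):
    """Traverse every element in document order and gather non-empty text."""
    root = ET.fromstring(xml_string)
    return [elem.text.strip()
            for elem in root.iter()
            if elem.text and elem.text.strip()]

print(extract_text(SAMPLE))
# prints ['XML and search engines', 'XML content is structured as a tree.',
#         'Each element can be visited in turn.']
```

Understanding what the extracted text *means*, of course, remains as hard as it is for HTML.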
Of the Web sites we studied, we found that 27% of all sites had XML links, including links to RSS feeds. Among so-called Web 2.0 sites, 81% had XML links, compared with only 5% of business-to-business sites. Evidently, the Web 2.0 sites are embracing XML technology while the traditional business sites have yet to catch up.