XML FAQ

What is XML?
What is XML? XML is a way of making markup languages. A markup language is a way of taking regular old text and marking certain parts of it as 'special' in some way.

For example, take a look at this text:

Bristol, England

You probably recognize that that text is talking about a place. But to someone completely unfamiliar with geography, it's just so much nonsense. If I wanted to help such people out, I might try to describe to them why that text is special; for instance, I might say:

Bristol, England is a place.

In other words, I'd tack some extra stuff on to clarify why that text is special.

XML allows us to do something like that, but instead of adding our explanation on as parts of English sentences (which are sometimes imprecise and can therefore be hard to work with---especially using a computer), we make up tags that indicate why that text is special, and then wrap the text in those tags---for instance:

<place>Bristol, England</place>

<place> is a tag. It marks the 'Bristol, England' text off as special, and (if we happen to understand what the tag means---in this case, it's pretty easy to guess that it's used to mark places) the tag can also tell us why it's special. A group of such tags that's used together to describe things can be referred to as a markup language---XML gives you rules for creating such languages and the tags that go with them. Easy enough?

So, what is XML? Simply a way to describe text. XML is particularly useful to describe text for computers to process because it can be much more precise than our common, everyday ways of describing things, but usually people (with some patience) can understand XML, too.
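To make "text for computers to process" concrete, here is a minimal sketch of a program reading the tagged text from the example above. It assumes Python and its standard-library XML parser, neither of which is part of the ECEn templates; they are used here only for illustration.

```python
import xml.etree.ElementTree as ET

# Parse the one-tag document from the example above.
element = ET.fromstring("<place>Bristol, England</place>")

# The tag name tells the program why the text is special;
# the text is the part that was marked.
print(element.tag)   # place
print(element.text)  # Bristol, England
```

The program never has to guess what "Bristol, England" is: the tag says so explicitly.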

If you're familiar with HTML, you may be thinking, "This sounds kind of like HTML to me," and it should. HTML is a markup language that's very similar to those that follow the XML rules---in fact, the folks that define what HTML is are the same ones that made up XML, and the newest version of HTML has been changed to fit the XML rules. This new HTML, called XHTML, is a language that follows the XML rules---however, the old HTML (with which many people are most familiar) breaks quite a few of those rules. If you know HTML, it would be a good idea to find out what the differences between HTML and XML languages are.

Why are we using XML?
Why XML? Basically, because XML is a useful tool for organizing all kinds of content and works well for web pages too.

Using XML for web pages lets us make a simpler language than HTML (which can be quite complex) that also focuses on content rather than presentation of pages. Ideally, you'll be able to write your web pages in the XML language we provide without worrying about the nitty-gritty details of making your links, headings, or tables match the BYU templates' look and feel. If you use the ECEn XML web templates, you can concentrate on producing your content, and the templates will usually do the rest. XML is particularly suited to this kind of job because, as we discussed before, it is a tool to describe text, and text is the main content of most web pages. You describe your content using the ECEn XML language, and then the look and feel take care of themselves.

What are the differences between HTML and XML languages?
HTML vs. XML. If you know HTML (as opposed to the newer XHTML), it would be a good idea to understand the critical differences between XML languages and old-style HTML.

Although XML languages and HTML have many similarities, they differ in many respects. Perhaps the most fundamental is that HTML is a markup language itself, with a list of tags that browsers like Internet Explorer and Mozilla Firefox have been programmed to (usually) understand. But XML is a way of making markup languages and doesn't specify what the tags in those languages are (just how they're made).

When you're using the ECEn Web XML language, however, we've already defined the tags. So as far as the ECEn web pages' XML goes, the most noticeable difference from HTML is that there are different tags.

In addition to that major difference, there are several other minor differences that have to do with the rules XML imposes on XML-based markup languages. These differences can be a bit subtle: read over them if you are familiar with HTML and new to XML.

If you'd like to find out more about XML, have a look at General XML Fundamentals.

If you are already running into trouble using XML, you might be interested in the ECEn XML Troubleshooting Guide.

In the comparison below, each HTML behavior is followed by the corresponding rule for XML languages.

HTML: Case Insensitive

Case insensitive simply means you can type the tag in any case you want and HTML will know what you mean. For example:

<h1>My Heading</h1>

...and...

<H1>My Heading</H1>

...mean the same thing to HTML.

You can even do this:

<h1>My Heading</H1>

...and HTML won’t mind.
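As a rough illustration of case insensitivity, the sketch below feeds the headings above to Python's standard-library `html.parser`, which reports tag names converted to lower case. The `TagCollector` class and `collect_tags` function are hypothetical helpers written for this example; a real browser does far more than this tokenizer does.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records each tag name the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))


def collect_tags(markup):
    """Return the list of (kind, tag name) pairs found in the markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


lower = collect_tags("<h1>My Heading</h1>")
upper = collect_tags("<H1>My Heading</H1>")
mixed = collect_tags("<h1>My Heading</H1>")

# All three spellings are reported as the same lower-case tag.
print(lower == upper == mixed)  # True
```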

XML Languages: Case Sensitive

Case sensitive means that it matters what case I type my tag letters in. For example:

<heading>My Heading</heading>

...and...

<Heading>My Heading</Heading>

...do not mean the same thing to XML. As far as XML is concerned, <heading> and <Heading> are fundamentally different tags, which explains why you can’t do this:

<heading>My Heading</HeADinG>
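
The same point can be checked with a parser. This is a minimal sketch assuming Python's standard-library `xml.etree.ElementTree`; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

lower = ET.fromstring("<heading>My Heading</heading>")
upper = ET.fromstring("<Heading>My Heading</Heading>")

# The parser keeps the case exactly as typed, so these are two different tags.
print(lower.tag == upper.tag)  # False

# Opening with one spelling and closing with another is a parse error.
try:
    ET.fromstring("<heading>My Heading</HeADinG>")
    mismatch_error = None
except ET.ParseError as err:
    mismatch_error = err

print("rejected:", mismatch_error is not None)  # rejected: True
```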

HTML: Not all tags need both an opening and closing tag

HTML is perfectly happy to accept and often expects opening tags without closing tags. For example, creating a paragraph with only an opening p tag:

<p>This is a short paragraph with no closing tag.

...is perfectly valid HTML.

XML Languages: All tags must have an opening and closing tag

XML complains about any tag that is not properly closed. For example:

<text>This is a short line of text with no closing tag.

...would give an XML parser grief and cause it to report a "parse error" while trying to read your document.

Even empty tags, such as the <line-break> tag we've defined in the ECEn web XML language, must be closed correctly:

<line-break></line-break>

To save typing, you can try the XML shortcut for empty tags:

<line-break/>
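
Both claims can be tried out with a parser. The sketch below assumes Python's standard-library `xml.etree.ElementTree`; it shows the unclosed `<text>` tag being rejected and the two spellings of the empty `<line-break>` tag being read as the same thing.

```python
import xml.etree.ElementTree as ET

# A tag that is opened but never closed is a parse error.
try:
    ET.fromstring("<text>This is a short line of text with no closing tag.")
    unclosed_rejected = False
except ET.ParseError:
    unclosed_rejected = True

# The long form and the shortcut describe the same empty element.
long_form = ET.fromstring("<line-break></line-break>")
short_form = ET.fromstring("<line-break/>")

print(unclosed_rejected)                # True
print(long_form.tag == short_form.tag)  # True
print(len(long_form), len(short_form))  # 0 0
```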

HTML: Generally forgiving

Although this isn't necessarily a feature of HTML itself, most web browsers that understand HTML try to guess what you meant when you make mistakes (such as putting tags in strange places). The page may not render as you expect, but it usually does render.
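
As a small illustration of this forgiving style, Python's standard-library `html.parser` also accepts sloppy markup without raising an error. It is only a tokenizer, not a browser, so treat this as a sketch of the attitude rather than of how any particular browser behaves. `EventLog` and `events_for` are hypothetical names for this example.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Hypothetical helper: logs start and end tags in the order seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def events_for(markup):
    """Feed markup to the lenient parser and return the logged tags."""
    log = EventLog()
    log.feed(markup)
    log.close()
    return log.events


# An unclosed paragraph and mis-nested tags are both accepted silently.
unclosed = events_for("<p>This is a short paragraph with no closing tag.")
misnested = events_for("<i><b>Bold and italic text</i></b>")

print(unclosed)   # ['<p>']
print(misnested)  # ['<i>', '<b>', '</i>', '</b>']
```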

XML Languages: Generally strict

XML will not negotiate missing or mismatched tags---as a matter of fact, XML parsers are *required* by the XML specification to choke on mistakes, so even one misplaced tag can easily make your ECEn XML web page fail to render.

Using an XML-aware editor can help alleviate this burden by letting you know when you've done something that breaks the XML rules.
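
If no XML-aware editor is at hand, a few lines of code can do a similar basic check. The sketch below assumes Python's standard-library `xml.etree.ElementTree`; `check_well_formed` is a hypothetical helper, and the sample documents are invented and are not real ECEn pages. It only checks the general XML rules, not whether the tags are ones the ECEn templates understand.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if the text follows the XML rules,
    otherwise a (line, column, message) tuple describing the first problem."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


good_page = "<page>\n  <text>Hello</text>\n</page>"
bad_page = "<page>\n  <text>Hello</txt>\n</page>"  # closing tag is misspelled

print(check_well_formed(good_page))  # None

problem = check_well_formed(bad_page)
print("problem on line", problem[0])  # problem on line 2
```

The parser stops at the first mistake it finds, so fix that one and run the check again.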

HTML: No outermost tag required

Although it is good practice to wrap your HTML web page in <html> tags, most browsers wouldn’t mind this particular HTML web page:

<h1>My Web Page</h1>

<h2>My Favorite Recipe</h2>

XML Languages: One outermost tag is required to enclose all the others

XML expects all the tags in an XML document to be enclosed by one outermost tag, called the "root" tag. So the HTML example above, with its two top-level headings, would give XML grief, but you can fix the problem like this:

<html>

<h1>My Web Page</h1>

<h2>My Favorite Recipe</h2>

</html>

In an ECEn XML page's XML, the <page> tag is generally the outermost tag, enclosing all of the other XML tags.
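
The root-tag rule is easy to see with a parser. This is a minimal sketch assuming Python's standard-library `xml.etree.ElementTree`, using the two versions of the page shown above.

```python
import xml.etree.ElementTree as ET

no_root = "<h1>My Web Page</h1>\n<h2>My Favorite Recipe</h2>"
with_root = (
    "<html>\n"
    "<h1>My Web Page</h1>\n"
    "<h2>My Favorite Recipe</h2>\n"
    "</html>"
)

# Two top-level tags: the parser rejects everything after the first one.
try:
    ET.fromstring(no_root)
    no_root_rejected = False
except ET.ParseError:
    no_root_rejected = True

# One outermost tag enclosing the others: parses fine.
root = ET.fromstring(with_root)
child_tags = [child.tag for child in root]

print(no_root_rejected)  # True
print(root.tag)          # html
print(child_tags)        # ['h1', 'h2']
```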

HTML: No formal tag nesting required

All that means is that most browsers don’t really care what order you close your tags in:

<i><b>Bold and italic text</i></b>

Note that the <i> tag is closed before the <b> tag, even though the <b> tag was opened inside it. Proper nesting would close the tags in the reverse of the order in which they were opened.

XML Languages: All tags must be properly nested

XML cares that you close your tags in the proper order. For example:

<b><i>Bold and italic text</i></b>

Note that the <i> tag, which was opened last, is closed first, before the <b> tag.
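
To see the nesting rule enforced, the sketch below runs both versions of the bold-and-italic example through Python's standard-library `xml.etree.ElementTree` (an assumption made for illustration; any conforming XML parser should reject the first version).

```python
import xml.etree.ElementTree as ET

badly_nested = "<i><b>Bold and italic text</i></b>"
properly_nested = "<b><i>Bold and italic text</i></b>"

# Closing the outer tag before the inner one is a parse error.
try:
    ET.fromstring(badly_nested)
    bad_rejected = False
except ET.ParseError:
    bad_rejected = True

# Properly nested tags form a tree: <b> contains <i>, which contains the text.
outer = ET.fromstring(properly_nested)
inner = outer[0]

print(bad_rejected)          # True
print(outer.tag, inner.tag)  # b i
print(inner.text)            # Bold and italic text
```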

Maintained by The ECEn Web Team. Based on v. 3.8 of the ECEn web templates.
Copyright © 1994-2005. Brigham Young University. All Rights Reserved.