XML Overview and Tutorial
Introduction
This XML overview tutorial provides a summary of XML and how it can be used as a data-interchange format between browsers and servers. JavaScript Object Notation and eXtensible Markup Language, most commonly referred to as JSON and XML, are both designed to solve the same problem: preserving the structure of data while keeping it human-readable as it moves between browsers, servers, APIs and different programming languages. Unlike JSON, however, XML files use the .xml extension and have an HTML-like markup structure.
A Brief Overview of eXtensible Markup Language
XML is similar to HTML in many ways. It uses opening and closing tags, like HTML, and even uses the same <!-- --> syntax for comments. However, XML is not a programming language like JavaScript or Ruby, but a format for storing, accessing and transmitting data.
It is worth noting that XML is more verbose than JSON and therefore typically requires more memory and storage space for the same data. It also has several security problems unique to XML, such as entity-expansion attacks, that are not issues with JSON.
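To make the size difference concrete, here is a minimal sketch that serializes the same record both ways and compares the lengths. The record and its field names are made up for illustration, and the exact difference depends on the data and on how the XML is laid out.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record, used only for this comparison
record = {"id": "42", "name": "Ada", "city": "London"}

# Serialize the record as JSON
json_text = json.dumps(record)

# Serialize the same record as XML, one child element per field
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(json_text)
print(xml_text)
print(len(json_text), len(xml_text))
```

Every XML field repeats its name in both the opening and the closing tag, which is where most of the extra characters come from.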
The Syntax and Structure of XML Data
An XML document is composed of a single “root” element with other sub-elements nested inside of it that contain the data. XML elements use the same angle-bracket tag syntax as HTML.
Execute the touch command in a UNIX terminal, or echo in a Windows command prompt, to create an XML file with the .xml extension, as shown here:
    touch my-data.xml
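The file can also be created from Python. The following is a minimal sketch, assuming the standard-library xml.etree.ElementTree module; the element names and values are illustrative, and the file is written to a temporary directory so that the example cleans up after itself.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Build a root element with two nested sub-elements (names are illustrative)
root = ET.Element("root")
ET.SubElement(root, "elementuno").text = "first value"
ET.SubElement(root, "elementdos").text = "second value"

# Write the tree to my-data.xml inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    xml_path = Path(tmp_dir) / "my-data.xml"
    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)
    written = xml_path.read_text(encoding="utf-8")

print(written)
```

In a real project the path would point somewhere permanent rather than to a temporary directory.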
Namespaces and the XML Prolog
The XML prolog, or XML declaration, is optional. When it is included, it must be the very first line of the XML file, with nothing before it. It tells the parser which XML version and character encoding to expect. Refer to the following example:
    <?xml version="1.0" encoding="utf-8" standalone="yes" ?>
NOTE: The version number should be 1.0. XML 1.1 exists but is rarely used, and many parsers return an error for any other value.
It is also good practice to include the declaration with the "utf-8" encoding. This way the parser can process all of the Unicode characters, even those that fall outside of the 7-bit (128-character) ASCII range.
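As a rough illustration of the declaration and the encoding working together, the sketch below serializes an element containing non-ASCII text and parses it back. It assumes Python 3.8 or later, where ET.tostring accepts the xml_declaration argument; the element name and text are invented for the example.

```python
import xml.etree.ElementTree as ET

# An element whose text falls outside of the ASCII range
root = ET.Element("greeting")
root.text = "café ☕"

# Serialize to UTF-8 bytes with the XML declaration on the first line
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(xml_bytes)

# Parse the bytes back; the parser reads the encoding from the declaration
parsed = ET.fromstring(xml_bytes)
print(parsed.text)
```

The text survives the round trip unchanged because both the writer and the parser agree on the encoding.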
The XML namespaces
An XML namespace is identified by a uniform resource identifier, or URI, which is bound to a prefix with an xmlns attribute. Using namespaces helps avoid naming conflicts between XML elements that share a name but come from different vocabularies. Use a colon (:) to separate the namespace prefix from the element name, as shown in the following example, where the URI is only a placeholder:
    <data:elementname xmlns:data="http://example.com/data"> </data:elementname>
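The following sketch shows one way a namespaced document might be read in Python with xml.etree.ElementTree. The namespace URI and the element names are placeholders chosen for this example, not part of any real vocabulary.

```python
import xml.etree.ElementTree as ET

# The namespace URI below is a made-up placeholder
xml_string = '''
<data:root xmlns:data="http://example.com/data">
    <data:elementname>Hello World</data:elementname>
</data:root>
'''

root = ET.fromstring(xml_string)

# ElementTree expands the prefix into {URI}localname
print(root.tag)

# Map the prefix to the URI so that searches can use the short form
ns = {"data": "http://example.com/data"}
element = root.find("data:elementname", ns)
print(element.text)
```

The prefix itself carries no meaning to the parser; only the URI it is bound to identifies the namespace.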
How the ‘root’ tag for an XML block functions
A root tag is required for every XML file, unless the XML is embedded into an .html file. As shown in the following example, the tag must be the parent, or “outermost,” tag of all the tags nested inside of it:
    <root>
        <elementuno>
        </elementuno>
        <elementdos>
        </elementdos>
    </root>
NOTE: The tag itself does not have to be called “root.” However, it must open before all other elements and close after all other elements or it will result in an error.
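A quick way to see the root-tag rule in action is to hand a parser some input that breaks it. The sketch below, which assumes Python's xml.etree.ElementTree module, wraps the parse call in a hypothetical helper named is_well_formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_string):
    """Return True if the string parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_string)
    except ET.ParseError as error:
        print("not well-formed:", error)
        return False
    return True

one_root = "<root><elementuno/><elementdos/></root>"
two_roots = "<elementuno></elementuno><elementdos></elementdos>"
unclosed_root = "<root><elementuno></elementuno>"

print(is_well_formed(one_root))
print(is_well_formed(two_roots))
print(is_well_formed(unclosed_root))
```

Only the first string passes: the second has two outermost elements, and the third never closes its root.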
Naming conventions for XML elements
All of the other element tags for the XML document are nested inside of the root tag. The following rules must be observed when naming XML elements:
- Spaces, or any other whitespace like tabs, are not allowed in the name.
- Digits, letters, hyphens (-), underscores (_) and periods (.) are all permitted characters, but special characters, like the dollar sign ($), are not permitted. Diacritics and accent marks are also permitted.
- All elements must begin with either a letter or an underscore (_) and the element name cannot begin with XML, xml, etc.
- Bear in mind that all element names are interpreted as case-sensitive.
- While allowed in the element name, it is good practice to reserve colons (:) specifically for namespaces.
NOTE: While the data fields in JSON support spaces in the name, XML does not. Use PascalCase, also called “UpperCamelCase”, when naming XML elements with multiple words or whitespace. Alternatively, use underscores or hyphens to delimit words. Just remember it is critical to maintain consistency throughout the application.
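The naming rules above can be approximated with a short check. This is only a sketch: the regular expression is a simplification of the full XML name grammar, it rejects colons in line with the good-practice advice, and both function names are hypothetical helpers invented for this example.

```python
import re

# Simplified pattern: a letter or underscore first, then letters, digits,
# underscores, periods or hyphens. Colons are deliberately left out.
NAME_PATTERN = re.compile(r"[^\W\d][\w.\-]*")

def is_valid_element_name(name):
    """Approximate check of the element-naming rules listed above."""
    if name.lower().startswith("xml"):
        return False
    return NAME_PATTERN.fullmatch(name) is not None

def to_pascal_case(label):
    """Join whitespace-separated words, upper-casing the first letter of each."""
    return "".join(word[:1].upper() + word[1:] for word in label.split())

for candidate in ["first_name", "1st", "my name", "price$", "xmlData", "café"]:
    print(candidate, is_valid_element_name(candidate))

print(to_pascal_case("first name"))
```

A label with spaces, such as a JSON key, can be passed through to_pascal_case first and then validated.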
XML modules for the Python programming language
There are several Python modules available to iterate over and parse XML data. Two of the more popular ones are the xml.sax and xml.etree.ElementTree modules.
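For comparison with the tree-based approach, here is a minimal sketch of the event-driven style offered by xml.sax. The TagCounter class and the sample document are invented for this example; the handler simply counts how often each element name appears.

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Count how many times each element name is opened."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every opening tag
        self.counts[name] = self.counts.get(name, 0) + 1

xml_bytes = b"<root><item>a</item><item>b</item><note>c</note></root>"

handler = TagCounter()
xml.sax.parseString(xml_bytes, handler)
print(handler.counts)
```

Because the parser reports events as it reads, this style never holds the whole document in memory, which can help with very large files.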
It is important to note that the Python 3 documentation page for XML modules provides the following disclaimer: “XML modules are not secure against erroneous or maliciously constructed data”.
The following example shows how to parse an XML string in Python using the xml.etree.ElementTree module:
    # import the etree lib
    import xml.etree.ElementTree as etree_lib

    # declare an XML string
    xml_string = '''
    <root>
        <field1 id="field-id-1">
            <value1>Hello World</value1>
            <value2>Hello World</value2>
            <value3>Hello World</value3>
        </field1>
        <field2 id="field-id-2">
            <value1>Hello World</value1>
            <value2>Hello World</value2>
            <value3>Hello World</value3>
        </field2>
    </root>
    '''

    # get the root element of the XML tree from the string
    tree = etree_lib.fromstring(xml_string)

    # return a list for a specific element
    path_list = tree.findall("field1")
    print(path_list)

    # return a list of all the child elements
    # (getchildren() was removed in Python 3.9, so use list() instead)
    print(list(tree))
The results should resemble the following:
    [<Element 'field1' at 0x7f929b6c8ea8>]
    [<Element 'field1' at 0x7f929b6c8ea8>, <Element 'field2' at 0x7f929a073c78>]
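Printing the element objects only shows their tags and memory addresses. To get at the data itself, a sketch along the following lines reads each element's attributes and text; it uses a shortened copy of the XML string so that it can run on its own.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the XML string, so this sketch is self-contained
xml_string = '''
<root>
    <field1 id="field-id-1">
        <value1>Hello World</value1>
    </field1>
    <field2 id="field-id-2">
        <value1>Hello World</value1>
    </field2>
</root>
'''

tree = ET.fromstring(xml_string)

# Collect (parent tag, parent id attribute, child tag, child text) tuples
rows = []
for field in tree:
    for value in field:
        rows.append((field.tag, field.get("id"), value.tag, value.text))

for row in rows:
    print(row)
```

Iterating over an element yields its direct children, get() reads an attribute, and text holds the character data inside the tag.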
How to Embed XML Data Inside an HTML File
XML data can also be embedded inside of an .html file, similar to PHP, by using the <xml> tag as shown in the following example. Note that <xml> is not part of the HTML standard:
    <html>
    <h1>XML Test</h1>
    <xml id="my-xml-data">
    <root>
    <datarow>Hello, world</datarow>
    <moredata>ObjectRocket</moredata>
    </root>
    </xml>
    </html>
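If the embedded data ever needs to be read outside of a browser, one rough approach is to cut the XML out of the page and hand it to an XML parser. The sketch below assumes the exact markup of the example page and uses plain string slicing, which is fragile and would not cope with arbitrary real-world HTML.

```python
import xml.etree.ElementTree as ET

# The example page, reproduced here so the sketch can run on its own
html_text = '''<html>
<h1>XML Test</h1>
<xml id="my-xml-data">
<root>
<datarow>Hello, world</datarow>
<moredata>ObjectRocket</moredata>
</root>
</xml>
</html>'''

# Locate the embedded block by its opening and closing tags
open_tag = '<xml id="my-xml-data">'
close_tag = "</xml>"
start = html_text.index(open_tag) + len(open_tag)
end = html_text.index(close_tag, start)

# Parse only the XML found between the two tags
embedded_xml = html_text[start:end].strip()
root = ET.fromstring(embedded_xml)

print(root.find("datarow").text)
print(root.find("moredata").text)
```

For anything beyond a quick experiment, a proper HTML parser would be a safer way to find the embedded block.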
Conclusion
This XML overview tutorial provided an explanation of eXtensible Markup Language. The article covered the syntax and structure of XML data, namespaces and the XML prolog, and discussed the ‘root’ tag for an XML block. The tutorial also covered naming conventions for XML elements, the XML modules for Python and how to embed XML data inside of an HTML file. Remember that XML is more verbose than JSON, so it typically requires more memory and storage space, and it has some security issues not found with JSON.