com.microstar.xml
Class XmlParser

java.lang.Object
  extended by com.microstar.xml.XmlParser

public class XmlParser
extends java.lang.Object

Parse XML documents and return parse events through callbacks.

You need to define a class implementing the XmlHandler interface: an object belonging to this class will receive the callbacks for the events. (As an alternative to implementing the full XmlHandler interface, you can simply extend the HandlerBase convenience class.)

Usage (assuming that MyHandler is your implementation of the XmlHandler interface):

 XmlHandler handler = new MyHandler();
 XmlParser parser = new XmlParser();
 parser.setHandler(handler);
 try {
   parser.parse("http://www.host.com/doc.xml", null);
 } catch (Exception e) {
   // Handle the failure here, for example by reporting e.getMessage().
 }
 

Alternatively, you can use the standard SAX interfaces with the SAXDriver class as your entry point.
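
The callback style described above can be tried without this package on the classpath. The following is a minimal sketch that uses the SAX parser bundled with the JDK (javax.xml.parsers) in place of SAXDriver, so it illustrates the event model only and is not this class's own API. The class name SaxCallbackSketch and the helper trace are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCallbackSketch {

    /** Records each element callback it receives, in document order. */
    static class TraceHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    /** Parses XML text with the JDK's SAX parser and returns the recorded events. */
    static List<String> trace(String xml) {
        TraceHandler handler = new TraceHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // Prints [start:doc, start:item, end:item, end:doc]
        System.out.println(trace("<doc><item/></doc>"));
    }
}
```

As with XmlHandler, the parser drives the handler: the application never pulls events, it only reacts to them.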

Since:
Ptolemy II 0.2
Version:
1.1
Author:
Copyright (c) 1997, 1998 by Microstar Software Ltd., Written by David Megginson <dmeggins@microstar.com>
See Also:
XmlHandler, HandlerBase

Field Summary
static int ATTRIBUTE_CDATA
          Constant: the attribute value is a string value.
static int ATTRIBUTE_DEFAULT_FIXED
          Constant: the attribute was declared #FIXED.
static int ATTRIBUTE_DEFAULT_IMPLIED
          Constant: the attribute was declared #IMPLIED.
static int ATTRIBUTE_DEFAULT_REQUIRED
          Constant: the attribute was declared #REQUIRED.
static int ATTRIBUTE_DEFAULT_SPECIFIED
          Constant: the attribute has a literal default value specified.
static int ATTRIBUTE_DEFAULT_UNDECLARED
          Constant: the attribute is not declared.
static int ATTRIBUTE_ENTITIES
          Constant: the attribute value is a list of entity names.
static int ATTRIBUTE_ENTITY
          Constant: the attribute value is the name of an entity.
static int ATTRIBUTE_ENUMERATED
          Constant: the attribute value is a token from an enumeration.
static int ATTRIBUTE_ID
          Constant: the attribute value is a unique identifier.
static int ATTRIBUTE_IDREF
          Constant: the attribute value is a reference to a unique identifier.
static int ATTRIBUTE_IDREFS
          Constant: the attribute value is a list of ID references.
static int ATTRIBUTE_NMTOKEN
          Constant: the attribute value is a name token.
static int ATTRIBUTE_NMTOKENS
          Constant: the attribute value is a list of name tokens.
static int ATTRIBUTE_NOTATION
          Constant: the attribute value is the name of a notation.
static int ATTRIBUTE_UNDECLARED
          Constant: the attribute has not been declared for this element type.
private static java.util.Hashtable attributeTypeHash
          Hash table of attribute types.
private  java.io.InputStream baseInputStream
           
private  java.lang.String basePublicId
           
private  java.io.Reader baseReader
           
private  java.lang.String baseURI
           
private  int column
           
static int CONTENT_ANY
          Constant: the element has a content model of ANY.
static int CONTENT_ELEMENTS
          Constant: the element has element content.
static int CONTENT_EMPTY
          Constant: the element has declared content of EMPTY.
static int CONTENT_MIXED
          Constant: the element has mixed content.
static int CONTENT_UNDECLARED
          Constant: an element has not been declared.
private  int context
           
private static int CONTEXT_ATTRIBUTEVALUE
           
private static int CONTEXT_DTD
           
private static int CONTEXT_ENTITYVALUE
           
private static int CONTEXT_NONE
           
private  int currentByteCount
           
private  java.lang.String currentElement
           
private  int currentElementContent
           
private static int DATA_BUFFER_INITIAL
           
private  char[] dataBuffer
           
private  int dataBufferPos
           
private  java.util.Hashtable elementInfo
           
private  int encoding
           
private static int ENCODING_ISO_8859_1
           
private static int ENCODING_UCS_2_12
           
private static int ENCODING_UCS_2_21
           
private static int ENCODING_UCS_4_1234
           
private static int ENCODING_UCS_4_2143
           
private static int ENCODING_UCS_4_3412
           
private static int ENCODING_UCS_4_4321
           
private static int ENCODING_UTF_8
           
static int ENTITY_INTERNAL
          Constant: the entity is internal.
static int ENTITY_NDATA
          Constant: the entity is external, non-XML data.
static int ENTITY_TEXT
          Constant: the entity is external XML data.
static int ENTITY_UNDECLARED
          Constant: the entity has not been declared.
private  java.util.Hashtable entityInfo
           
private  java.util.Stack entityStack
           
private  int errorCount
           
private  java.net.URLConnection externalEntity
           
(package private)  XmlHandler handler
           
private static int INPUT_BUFFER
           
private static int INPUT_EXTERNAL
           
private static int INPUT_INTERNAL
           
private static int INPUT_NONE
           
private static int INPUT_READER
           
private static int INPUT_STREAM
           
private  java.util.Stack inputStack
           
private  java.io.InputStream is
           
private  int line
           
private static int LIT_CHAR_REF
           
private static int LIT_ENTITY_REF
           
private static int LIT_NORMALIZE
           
private static int LIT_PE_REF
           
private static int NAME_BUFFER_INITIAL
           
private  char[] nameBuffer
           
private  int nameBufferPos
           
private  java.util.Hashtable notationInfo
           
private  byte[] rawReadBuffer
           
private static int READ_BUFFER_MAX
           
private  char[] readBuffer
           
private  int readBufferLength
           
private  int readBufferOverflow
           
private  int readBufferPos
           
private  java.io.Reader reader
           
private  boolean sawCR
           
private  int sourceType
           
private static int SYMBOL_TABLE_LENGTH
           
private  java.lang.Object[] symbolTable
           
private  int tagAttributePos
           
private  java.lang.String[] tagAttributes
           
private static boolean USE_CHEATS
           
 
Constructor Summary
XmlParser()
          Construct a new parser with no associated handler.
 
Method Summary
(package private)  void checkEncoding(java.lang.String encodingName, boolean ignoreEncoding)
          Check that the encoding specified makes sense.
(package private)  void cleanupVariables()
          Clean up after the parse to allow some garbage collection.
(package private)  void copyIso8859_1ReadBuffer(int count)
          Convert a buffer of ISO-8859-1-encoded bytes into UTF-16 characters.
(package private)  void copyUcs2ReadBuffer(int count, int shift1, int shift2)
          Convert a buffer of UCS-2-encoded bytes into UTF-16 characters.
(package private)  void copyUcs4ReadBuffer(int count, int shift1, int shift2, int shift3, int shift4)
          Convert a buffer of UCS-4-encoded bytes into UTF-16 characters.
(package private)  void copyUtf8ReadBuffer(int count)
          Convert a buffer of UTF-8-encoded bytes into UTF-16 characters.
(package private)  void dataBufferAppend(char c)
          Add a character to the data buffer.
(package private)  void dataBufferAppend(char[] ch, int start, int length)
          Append (part of) a character array to the data buffer.
(package private)  void dataBufferAppend(java.lang.String s)
          Add a string to the data buffer.
(package private)  void dataBufferFlush()
          Flush the contents of the data buffer to the handler, if appropriate, and reset the buffer for new input.
(package private)  void dataBufferNormalize()
          Normalise whitespace in the data buffer.
(package private)  java.lang.String dataBufferToString()
          Convert the data buffer to a string.
 java.util.Enumeration declaredAttributes(java.lang.String elname)
          Get the declared attributes for an element type.
 java.util.Enumeration declaredElements()
          Get the declared elements for an XML document.
 java.util.Enumeration declaredEntities()
          Get declared entities.
 java.util.Enumeration declaredNotations()
          Get declared notations.
(package private)  void detectEncoding()
          Attempt to detect the encoding of an entity.
private  void doParse(java.lang.String systemId, java.lang.String publicId, java.io.Reader reader, java.io.InputStream stream, java.lang.String encoding)
           
(package private)  void encodingError(java.lang.String message, int value, int offset)
          Report a character encoding error.
(package private)  void error(java.lang.String message, char textFound, java.lang.String textExpected)
          Report a serious error.
(package private)  void error(java.lang.String message, java.lang.String textFound, java.lang.String textExpected)
          Report an error.
(package private)  java.lang.Object extendArray(java.lang.Object array, int currentSize, int requiredSize)
          Ensure the capacity of an array, allocating a new one if necessary.
(package private)  void filterCR()
          Filter carriage returns in the read buffer.
(package private)  java.lang.Object[] getAttribute(java.lang.String elName, java.lang.String name)
          Retrieve the three-member array representing an attribute declaration.
 java.lang.String getAttributeDefaultValue(java.lang.String name, java.lang.String aname)
          Retrieve the default value of a declared attribute.
 int getAttributeDefaultValueType(java.lang.String name, java.lang.String aname)
          Retrieve the default value type of a declared attribute.
 java.lang.String getAttributeEnumeration(java.lang.String name, java.lang.String aname)
          Retrieve the allowed values for an enumerated attribute type.
 java.lang.String getAttributeExpandedValue(java.lang.String name, java.lang.String aname)
          Retrieve the expanded value of a declared attribute.
 int getAttributeType(java.lang.String name, java.lang.String aname)
          Retrieve the declared type of an attribute.
 int getColumnNumber()
          Return the current column number.
 java.lang.String getCurrentElement()
          Return the current element.
(package private)  java.util.Hashtable getElementAttributes(java.lang.String name)
          Look up the attribute hash table for an element.
 java.lang.String getElementContentModel(java.lang.String name)
          Look up the content model of an element.
 int getElementContentType(java.lang.String name)
          Look up the content type of an element.
 java.lang.String getEntityNotationName(java.lang.String eName)
          Get the notation name associated with an NDATA entity.
 java.lang.String getEntityPublicId(java.lang.String ename)
          Return an external entity's public identifier, if any.
 java.lang.String getEntitySystemId(java.lang.String ename)
          Return an external entity's system identifier.
 int getEntityType(java.lang.String ename)
          Find the type of an entity.
 java.lang.String getEntityValue(java.lang.String ename)
          Return the value of an internal entity.
 int getLineNumber()
          Return the current line number.
(package private)  int getNextUtf8Byte(int pos, int count)
          Return the next byte value in a UTF-8 sequence.
 java.lang.String getNotationPublicId(java.lang.String nname)
          Look up the public identifier for a notation.
 java.lang.String getNotationSystemId(java.lang.String nname)
          Look up the system identifier for a notation.
(package private)  void initializeVariables()
          Re-initialize the variables for each parse.
 java.lang.String intern(char[] ch, int start, int length)
          Create an internalised string from a character array.
 java.lang.String intern(java.lang.String s)
          Return an internalised version of a string.
(package private)  boolean isWhitespace(char c)
          Test if a character is whitespace.
 void parse(java.lang.String systemId, java.lang.String publicId, java.io.InputStream stream, java.lang.String encoding)
          Parse an XML document from a byte stream.
 void parse(java.lang.String systemId, java.lang.String publicId, java.io.Reader reader)
          Parse an XML document from a character stream.
 void parse(java.lang.String systemId, java.lang.String publicId, java.lang.String encoding)
          Parse an XML document from a URI.
(package private)  void parseAttDef(java.lang.String elementName)
          Parse a single attribute definition.
(package private)  void parseAttlistDecl()
          Parse an attribute list declaration.
(package private)  void parseAttribute(java.lang.String name)
          Parse an attribute assignment.
(package private)  void parseCDSect()
          Parse a CDATA marked section.
(package private)  void parseCharRef()
          Read a character reference.
(package private)  void parseComment()
          Skip a comment.
(package private)  void parseConditionalSect()
          Parse a conditional section.
(package private)  void parseContent()
          Parse the content of an element.
(package private)  void parseContentspec(java.lang.String name)
          Parse a content specification.
(package private)  void parseCp()
          Parse a content particle.
(package private)  void parseDefault(java.lang.String elementName, java.lang.String name, int type, java.lang.String enumeration)
          Parse the default value for an attribute.
(package private)  void parseDoctypedecl()
          Parse a document type declaration.
(package private)  void parseDocument()
          Parse an XML document.
(package private)  void parseElement()
          Parse an element, with its tags.
(package private)  void parseElementdecl()
          Parse an element type declaration.
(package private)  void parseElements()
          Parse an element-content model.
(package private)  void parseEntityDecl()
          Parse an entity declaration.
(package private)  void parseEntityRef(boolean externalAllowed)
          Parse an entity reference.
(package private)  void parseEnumeration()
          Parse an enumeration.
(package private)  void parseEq()
          Parse an equals sign surrounded by optional whitespace.
(package private)  void parseETag()
          Parse an end tag.
(package private)  void parseMarkupdecl()
          Parse a markup declaration in the internal or external DTD subset.
(package private)  void parseMisc()
          Parse miscellaneous markup outside the document element and DOCTYPE declaration.
(package private)  void parseMixed()
          Parse mixed content.
(package private)  void parseNotationDecl()
          Parse a notation declaration.
(package private)  void parseNotationType()
          Parse a notation type for an attribute.
(package private)  void parsePCData()
          Parse PCDATA.
(package private)  void parsePEReference(boolean isEntityValue)
          Parse a parameter entity reference.
(package private)  void parsePI()
          Parse a processing instruction and do a call-back.
(package private)  void parseProlog()
          Parse the prolog of an XML document.
(package private)  void parseTextDecl(boolean ignoreEncoding)
          Parse the Encoding PI.
(package private)  void parseUntil(java.lang.String delim)
          Read all data until we find the specified string.
(package private)  void parseWhitespace()
          Parse whitespace characters, and leave them in the data buffer.
(package private)  void parseXMLDecl(boolean ignoreEncoding)
          Parse the XML declaration.
(package private)  void popInput()
          Restore a previous input source.
(package private)  void pushCharArray(java.lang.String ename, char[] ch, int start, int length)
          Push a new internal input source.
(package private)  void pushInput(java.lang.String ename)
          Save the current input source onto the stack.
(package private)  void pushString(java.lang.String ename, java.lang.String s)
          This method pushes a string back onto input.
(package private)  void pushURL(java.lang.String ename, java.lang.String publicId, java.lang.String systemId, java.io.Reader reader, java.io.InputStream stream, java.lang.String encoding)
          Push a new external input source.
(package private)  void read8bitEncodingDeclaration()
          Read just the encoding declaration (or XML declaration) at the start of an external entity.
(package private)  int readAttType()
          Parse the attribute type.
(package private)  char readCh()
          Read a single character from the readBuffer.
(package private)  void readDataChunk()
          Read a chunk of data from an external input source.
(package private)  java.lang.String[] readExternalIds(boolean inNotation)
          Try reading external identifiers.
(package private)  java.lang.String readLiteral(int flags)
          Read a literal.
(package private)  java.lang.String readNmtoken(boolean isName)
          Read a name or name token.
(package private)  void require(char delim)
          Require a character to appear, or throw an exception.
(package private)  void require(java.lang.String delim)
          Require a string to appear, or throw an exception.
(package private)  void requireWhitespace()
          Require whitespace characters.
(package private)  void setAttribute(java.lang.String elName, java.lang.String name, int type, java.lang.String enumeration, java.lang.String value, int valueType)
          Register an attribute declaration for later retrieval.
(package private)  void setElement(java.lang.String name, int contentType, java.lang.String contentModel, java.util.Hashtable attributes)
          Register an element.
(package private)  void setEntity(java.lang.String eName, int eClass, java.lang.String pubid, java.lang.String sysid, java.lang.String value, java.lang.String nName)
          Register an entity declaration for later retrieval.
(package private)  void setExternalDataEntity(java.lang.String eName, java.lang.String pubid, java.lang.String sysid, java.lang.String nName)
          Register an external data entity.
(package private)  void setExternalTextEntity(java.lang.String eName, java.lang.String pubid, java.lang.String sysid)
          Register an external text entity.
 void setHandler(XmlHandler handler)
          Set the handler that will receive parsing events.
(package private)  void setInternalEntity(java.lang.String eName, java.lang.String value)
          Register an entity declaration for later retrieval.
(package private)  void setNotation(java.lang.String nname, java.lang.String pubid, java.lang.String sysid)
          Register a notation declaration for later retrieval.
(package private)  void skipUntil(java.lang.String delim)
          Skip all data until we find the specified string.
(package private)  void skipWhitespace()
          Skip whitespace characters.
(package private)  boolean tryEncoding(byte[] sig, byte b1, byte b2)
          Check for a two-byte signature.
(package private)  boolean tryEncoding(byte[] sig, byte b1, byte b2, byte b3, byte b4)
          Check for a four-byte signature.
(package private)  void tryEncodingDecl(boolean ignoreEncoding)
          Check for an encoding declaration.
(package private)  boolean tryRead(char delim)
          Return true if we can read the expected character.
(package private)  boolean tryRead(java.lang.String delim)
          Return true if we can read the expected string.
(package private)  boolean tryWhitespace()
          Return true if we can read some whitespace.
(package private)  void unread(char c)
          Push a single character back onto the current input stream.
(package private)  void unread(char[] ch, int length)
          Push a char array back onto the current input stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

USE_CHEATS

private static final boolean USE_CHEATS
See Also:
Constant Field Values

CONTENT_UNDECLARED

public static final int CONTENT_UNDECLARED
Constant: an element has not been declared.

See Also:
getElementContentType(java.lang.String), Constant Field Values

CONTENT_ANY

public static final int CONTENT_ANY
Constant: the element has a content model of ANY.

See Also:
getElementContentType(java.lang.String), Constant Field Values

CONTENT_EMPTY

public static final int CONTENT_EMPTY
Constant: the element has declared content of EMPTY.

See Also:
getElementContentType(java.lang.String), Constant Field Values

CONTENT_MIXED

public static final int CONTENT_MIXED
Constant: the element has mixed content.

See Also:
getElementContentType(java.lang.String), Constant Field Values

CONTENT_ELEMENTS

public static final int CONTENT_ELEMENTS
Constant: the element has element content.

See Also:
getElementContentType(java.lang.String), Constant Field Values

ENTITY_UNDECLARED

public static final int ENTITY_UNDECLARED
Constant: the entity has not been declared.

See Also:
getEntityType(java.lang.String), Constant Field Values

ENTITY_INTERNAL

public static final int ENTITY_INTERNAL
Constant: the entity is internal.

See Also:
getEntityType(java.lang.String), Constant Field Values

ENTITY_NDATA

public static final int ENTITY_NDATA
Constant: the entity is external, non-XML data.

See Also:
getEntityType(java.lang.String), Constant Field Values

ENTITY_TEXT

public static final int ENTITY_TEXT
Constant: the entity is external XML data.

See Also:
getEntityType(java.lang.String), Constant Field Values

ATTRIBUTE_UNDECLARED

public static final int ATTRIBUTE_UNDECLARED
Constant: the attribute has not been declared for this element type.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_CDATA

public static final int ATTRIBUTE_CDATA
Constant: the attribute value is a string value.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_ID

public static final int ATTRIBUTE_ID
Constant: the attribute value is a unique identifier.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_IDREF

public static final int ATTRIBUTE_IDREF
Constant: the attribute value is a reference to a unique identifier.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_IDREFS

public static final int ATTRIBUTE_IDREFS
Constant: the attribute value is a list of ID references.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_ENTITY

public static final int ATTRIBUTE_ENTITY
Constant: the attribute value is the name of an entity.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_ENTITIES

public static final int ATTRIBUTE_ENTITIES
Constant: the attribute value is a list of entity names.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_NMTOKEN

public static final int ATTRIBUTE_NMTOKEN
Constant: the attribute value is a name token.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_NMTOKENS

public static final int ATTRIBUTE_NMTOKENS
Constant: the attribute value is a list of name tokens.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_ENUMERATED

public static final int ATTRIBUTE_ENUMERATED
Constant: the attribute value is a token from an enumeration.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_NOTATION

public static final int ATTRIBUTE_NOTATION
Constant: the attribute value is the name of a notation.

See Also:
getAttributeType(java.lang.String, java.lang.String), Constant Field Values

attributeTypeHash

private static java.util.Hashtable attributeTypeHash
Hash table of attribute types.


ENCODING_UTF_8

private static final int ENCODING_UTF_8
See Also:
Constant Field Values

ENCODING_ISO_8859_1

private static final int ENCODING_ISO_8859_1
See Also:
Constant Field Values

ENCODING_UCS_2_12

private static final int ENCODING_UCS_2_12
See Also:
Constant Field Values

ENCODING_UCS_2_21

private static final int ENCODING_UCS_2_21
See Also:
Constant Field Values

ENCODING_UCS_4_1234

private static final int ENCODING_UCS_4_1234
See Also:
Constant Field Values

ENCODING_UCS_4_4321

private static final int ENCODING_UCS_4_4321
See Also:
Constant Field Values

ENCODING_UCS_4_2143

private static final int ENCODING_UCS_4_2143
See Also:
Constant Field Values

ENCODING_UCS_4_3412

private static final int ENCODING_UCS_4_3412
See Also:
Constant Field Values

ATTRIBUTE_DEFAULT_UNDECLARED

public static final int ATTRIBUTE_DEFAULT_UNDECLARED
Constant: the attribute is not declared.

See Also:
getAttributeDefaultValueType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_DEFAULT_SPECIFIED

public static final int ATTRIBUTE_DEFAULT_SPECIFIED
Constant: the attribute has a literal default value specified.

See Also:
getAttributeDefaultValueType(java.lang.String, java.lang.String), getAttributeDefaultValue(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_DEFAULT_IMPLIED

public static final int ATTRIBUTE_DEFAULT_IMPLIED
Constant: the attribute was declared #IMPLIED.

See Also:
getAttributeDefaultValueType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_DEFAULT_REQUIRED

public static final int ATTRIBUTE_DEFAULT_REQUIRED
Constant: the attribute was declared #REQUIRED.

See Also:
getAttributeDefaultValueType(java.lang.String, java.lang.String), Constant Field Values

ATTRIBUTE_DEFAULT_FIXED

public static final int ATTRIBUTE_DEFAULT_FIXED
Constant: the attribute was declared #FIXED.

See Also:
getAttributeDefaultValueType(java.lang.String, java.lang.String), getAttributeDefaultValue(java.lang.String, java.lang.String), Constant Field Values

INPUT_NONE

private static final int INPUT_NONE
See Also:
Constant Field Values

INPUT_INTERNAL

private static final int INPUT_INTERNAL
See Also:
Constant Field Values

INPUT_EXTERNAL

private static final int INPUT_EXTERNAL
See Also:
Constant Field Values

INPUT_STREAM

private static final int INPUT_STREAM
See Also:
Constant Field Values

INPUT_BUFFER

private static final int INPUT_BUFFER
See Also:
Constant Field Values

INPUT_READER

private static final int INPUT_READER
See Also:
Constant Field Values

LIT_CHAR_REF

private static final int LIT_CHAR_REF
See Also:
Constant Field Values

LIT_ENTITY_REF

private static final int LIT_ENTITY_REF
See Also:
Constant Field Values

LIT_PE_REF

private static final int LIT_PE_REF
See Also:
Constant Field Values

LIT_NORMALIZE

private static final int LIT_NORMALIZE
See Also:
Constant Field Values

CONTEXT_NONE

private static final int CONTEXT_NONE
See Also:
Constant Field Values

CONTEXT_DTD

private static final int CONTEXT_DTD
See Also:
Constant Field Values

CONTEXT_ENTITYVALUE

private static final int CONTEXT_ENTITYVALUE
See Also:
Constant Field Values

CONTEXT_ATTRIBUTEVALUE

private static final int CONTEXT_ATTRIBUTEVALUE
See Also:
Constant Field Values

handler

XmlHandler handler

reader

private java.io.Reader reader

is

private java.io.InputStream is

line

private int line

column

private int column

sourceType

private int sourceType

inputStack

private java.util.Stack inputStack

externalEntity

private java.net.URLConnection externalEntity

encoding

private int encoding

currentByteCount

private int currentByteCount

errorCount

private int errorCount

READ_BUFFER_MAX

private static final int READ_BUFFER_MAX
See Also:
Constant Field Values

readBuffer

private char[] readBuffer

readBufferPos

private int readBufferPos

readBufferLength

private int readBufferLength

readBufferOverflow

private int readBufferOverflow

entityStack

private java.util.Stack entityStack

rawReadBuffer

private byte[] rawReadBuffer

DATA_BUFFER_INITIAL

private static int DATA_BUFFER_INITIAL

dataBuffer

private char[] dataBuffer

dataBufferPos

private int dataBufferPos

NAME_BUFFER_INITIAL

private static int NAME_BUFFER_INITIAL

nameBuffer

private char[] nameBuffer

nameBufferPos

private int nameBufferPos

elementInfo

private java.util.Hashtable elementInfo

entityInfo

private java.util.Hashtable entityInfo

notationInfo

private java.util.Hashtable notationInfo

currentElement

private java.lang.String currentElement

currentElementContent

private int currentElementContent

basePublicId

private java.lang.String basePublicId

baseURI

private java.lang.String baseURI

baseReader

private java.io.Reader baseReader

baseInputStream

private java.io.InputStream baseInputStream

context

private int context

symbolTable

private java.lang.Object[] symbolTable

SYMBOL_TABLE_LENGTH

private static final int SYMBOL_TABLE_LENGTH
See Also:
Constant Field Values

tagAttributes

private java.lang.String[] tagAttributes

tagAttributePos

private int tagAttributePos

sawCR

private boolean sawCR

Constructor Detail

XmlParser

public XmlParser()
Construct a new parser with no associated handler.

See Also:
setHandler(com.microstar.xml.XmlHandler), parse(java.lang.String, java.lang.String, java.lang.String)

Method Detail

setHandler

public void setHandler(XmlHandler handler)
Set the handler that will receive parsing events.

Parameters:
handler - The handler to receive callback events.
See Also:
parse(java.lang.String, java.lang.String, java.lang.String), XmlHandler

parse

public void parse(java.lang.String systemId,
                  java.lang.String publicId,
                  java.lang.String encoding)
           throws java.lang.Exception
Parse an XML document from a URI.

You may parse a document more than once, but only one thread may call this method for an object at one time.

Parameters:
systemId - The URI of the document.
publicId - The public identifier of the document, or null.
encoding - The suggested encoding, or null if unknown.
Throws:
java.lang.Exception - Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself.

parse

public void parse(java.lang.String systemId,
                  java.lang.String publicId,
                  java.io.InputStream stream,
                  java.lang.String encoding)
           throws java.lang.Exception
Parse an XML document from a byte stream.

The URI that you supply will become the base URI for resolving relative links, but Ælfred will actually read the document from the supplied input stream.

You may parse a document more than once, but only one thread may call this method for an object at one time.

Parameters:
systemId - The base URI of the document, or null if not known.
publicId - The public identifier of the document, or null if not known.
stream - A byte input stream.
encoding - The suggested encoding, or null if unknown.
Throws:
java.lang.Exception - Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself.
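
To make the role of the base URI concrete, here is a minimal sketch of how a relative reference resolves against a document's system identifier. It uses java.net.URI from the JDK purely as an illustration; how XmlParser resolves links internally is not documented here, and the class name BaseUriSketch, the helper resolve, and the example URLs are all assumptions.

```java
import java.net.URI;

public class BaseUriSketch {

    /** Resolves a possibly relative reference against a document's base URI. */
    static String resolve(String baseUri, String reference) {
        return URI.create(baseUri).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.host.com/docs/doc.xml";
        // A sibling file resolves within the same directory as the document.
        System.out.println(resolve(base, "chapter1.xml"));
        // Dot segments are interpreted relative to the document's directory.
        System.out.println(resolve(base, "../dtd/doc.dtd"));
        // An absolute reference ignores the base entirely.
        System.out.println(resolve(base, "http://other.example/x.ent"));
    }
}
```

If systemId is null, relative references in the document have nothing to resolve against, which is why supplying it is worthwhile even when the content comes from a stream.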

parse

public void parse(java.lang.String systemId,
                  java.lang.String publicId,
                  java.io.Reader reader)
           throws java.lang.Exception
Parse an XML document from a character stream.

The URI that you supply will become the base URI for resolving relative links, but Ælfred will actually read the document from the supplied character stream.

You may parse a document more than once, but only one thread may call this method for an object at one time.

Parameters:
systemId - The base URI of the document, or null if not known.
publicId - The public identifier of the document, or null if not known.
reader - A character stream.
Throws:
java.lang.Exception - Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself.

doParse

private void doParse(java.lang.String systemId,
                     java.lang.String publicId,
                     java.io.Reader reader,
                     java.io.InputStream stream,
                     java.lang.String encoding)
              throws java.lang.Exception
Throws:
java.lang.Exception

error

void error(java.lang.String message,
           java.lang.String textFound,
           java.lang.String textExpected)
     throws java.lang.Exception
Report an error.

Parameters:
message - The error message.
textFound - The text that caused the error (or null).
textExpected - The text that was expected (or null).
Throws:
java.lang.Exception
See Also:
XmlHandler.error(java.lang.String, java.lang.String, int, int), line

error

void error(java.lang.String message,
           char textFound,
           java.lang.String textExpected)
     throws java.lang.Exception
Report a serious error.

Parameters:
message - The error message.
textFound - The character that caused the error.
textExpected - The text that was expected (or null).
Throws:
java.lang.Exception

parseDocument

void parseDocument()
             throws java.lang.Exception
Parse an XML document.
 [1] document ::= prolog element Misc*
 

This is the top-level parsing function for a single XML document. As a minimum, a well-formed document must have a document element, and a valid document must have a prolog as well.

Throws:
java.lang.Exception

parseComment

void parseComment()
            throws java.lang.Exception
Skip a comment.
 [18] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* "-->"
 

(The <!-- has already been read.)

Throws:
java.lang.Exception
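
The comment production above can be read as "anything up to the closing delimiter, provided no double hyphen appears earlier". The following is a minimal sketch of that rule, not Ælfred's actual code: the class CommentSketch and the method skipComment are hypothetical names, and the sketch works on an in-memory String instead of the parser's read buffer.

```java
final class CommentSketch {
    /**
     * Skips a comment whose opening "<!--" has already been consumed.
     * Hypothetical helper: pos is the index of the first character after
     * "<!--"; the return value is the index just past the closing "-->".
     */
    static int skipComment(String text, int pos) {
        while (pos < text.length()) {
            if (text.startsWith("--", pos)) {
                // Per production [18], "--" may only occur as part of "-->".
                if (text.startsWith("-->", pos)) {
                    return pos + 3;
                }
                throw new IllegalArgumentException("\"--\" is not allowed inside a comment");
            }
            pos++;
        }
        throw new IllegalArgumentException("unterminated comment");
    }
}
```

A single hyphen inside the comment is accepted, while a double hyphen that is not immediately followed by '>' is rejected.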

parsePI

void parsePI()
       throws java.lang.Exception
Parse a processing instruction and do a call-back.
 [19] PI ::= '<?' Name (S (Char* - (Char* '?>' Char*)))? '?>'
 

(The <? has already been read.)

An XML processing instruction must begin with a Name, which is the instruction's target.

Throws:
java.lang.Exception

parseCDSect

void parseCDSect()
           throws java.lang.Exception
Parse a CDATA marked section.
 [20] CDSect ::= CDStart CData CDEnd
 [21] CDStart ::= '<![CDATA['
 [22] CData ::= (Char* - (Char* ']]>' Char*))
 [23] CDEnd ::= ']]>'
 

(The '<![CDATA[' has already been read.)

Note that this just appends characters to the dataBuffer, without actually generating an event.

Throws:
java.lang.Exception

parseProlog

void parseProlog()
           throws java.lang.Exception
Parse the prolog of an XML document.
 [24] prolog ::= XMLDecl? Misc* (Doctypedecl Misc*)?
 

There are a couple of tricks here. First, it is necessary to declare the XML default attributes after the DTD (if present) has been read. Second, it is not possible to expand general references in attribute value literals until after the entire DTD (if present) has been parsed.

We do not look for the XML declaration here, because it is handled by pushURL().

Throws:
java.lang.Exception
See Also:
pushURL(java.lang.String, java.lang.String, java.lang.String, java.io.Reader, java.io.InputStream, java.lang.String)

parseXMLDecl

void parseXMLDecl(boolean ignoreEncoding)
            throws java.lang.Exception
Parse the XML declaration.
 [25] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
 [26] VersionInfo ::= S 'version' Eq ('"1.0"' | "'1.0'")
 [33] SDDecl ::= S 'standalone' Eq "'" ('yes' | 'no') "'"
               | S 'standalone' Eq '"' ("yes" | "no") '"'
 [78] EncodingDecl ::= S 'encoding' Eq QEncoding
 

([80] to [82] are also significant.)

(The <?xml and whitespace have already been read.)

TODO: validate value of standalone.

Throws:
java.lang.Exception
See Also:
parseTextDecl(boolean), checkEncoding(java.lang.String, boolean)

parseTextDecl

void parseTextDecl(boolean ignoreEncoding)
             throws java.lang.Exception
Parse the Encoding PI.
 [78] EncodingDecl ::= S 'encoding' Eq QEncoding
 [79] EncodingPI ::= '<?xml' S 'encoding' Eq QEncoding S? '?>'
 [80] QEncoding ::= '"' Encoding '"' | "'" Encoding "'"
 [81] Encoding ::= LatinName
 [82] LatinName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
 

(The <?xml and whitespace have already been read.)

Throws:
java.lang.Exception
See Also:
parseXMLDecl(boolean), checkEncoding(java.lang.String, boolean)

checkEncoding

void checkEncoding(java.lang.String encodingName,
                   boolean ignoreEncoding)
             throws java.lang.Exception
Check that the encoding specified makes sense.

Compare what the author has specified in the XML declaration or encoding PI with what we have detected.

This is also important for distinguishing among the various 7- and 8-bit encodings, such as ISO-LATIN-1 (I cannot autodetect those).

Parameters:
encodingName - The name of the encoding specified by the user.
Throws:
java.lang.Exception
See Also:
parseXMLDecl(boolean), parseTextDecl(boolean)

parseMisc

void parseMisc()
         throws java.lang.Exception
Parse miscellaneous markup outside the document element and DOCTYPE declaration.
 [27] Misc ::= Comment | PI | S
 

Throws:
java.lang.Exception

parseDoctypedecl

void parseDoctypedecl()
                throws java.lang.Exception
Parse a document type declaration.
 [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
                      ('[' %markupdecl* ']' S?)? '>'
 

(The <!DOCTYPE has already been read.)

Throws:
java.lang.Exception

parseMarkupdecl

void parseMarkupdecl()
               throws java.lang.Exception
Parse a markup declaration in the internal or external DTD subset.
 [29] markupdecl ::= ( %elementdecl | %AttlistDecl | %EntityDecl |
                       %NotationDecl | %PI | %S | %Comment |
                       InternalPERef )
 [30] InternalPERef ::= PEReference
 [31] extSubset ::= (%markupdecl | %conditionalSect)*
 

Throws:
java.lang.Exception

parseElement

void parseElement()
            throws java.lang.Exception
Parse an element, with its tags.
 [33] STag ::= '<' Name (S Attribute)* S? '>' [WFC: unique Att spec]
 [38] element ::= EmptyElement | STag content ETag
 [39] EmptyElement ::= '<' Name (S Attribute)* S? '/>'
                       [WFC: unique Att spec]
 

(The '<' has already been read.)

NOTE: this method actually chains onto parseContent(), if necessary, and parseContent() will take care of calling parseETag().

Throws:
java.lang.Exception

parseAttribute

void parseAttribute(java.lang.String name)
              throws java.lang.Exception
Parse an attribute assignment.
 [34] Attribute ::= Name Eq AttValue
 

Parameters:
name - The name of the attribute's element.
Throws:
java.lang.Exception
See Also:
XmlHandler.attribute(java.lang.String, java.lang.String, boolean)

parseEq

void parseEq()
       throws java.lang.Exception
Parse an equals sign surrounded by optional whitespace.
 [35] Eq ::= S? '=' S?

Throws:
java.lang.Exception

parseETag

void parseETag()
         throws java.lang.Exception
Parse an end tag.
 [36] ETag ::= '</' Name S? '>'

NOTE: parseContent() chains to here.

Throws:
java.lang.Exception

parseContent

void parseContent()
            throws java.lang.Exception
Parse the content of an element.
 [37] content ::= (element | PCData | Reference | CDSect | PI | Comment)*
 [68] Reference ::= EntityRef | CharRef

Throws:
java.lang.Exception

parseElementdecl

void parseElementdecl()
                throws java.lang.Exception
Parse an element type declaration.
 [40] elementdecl ::= '<!ELEMENT' S %Name S (%S S)? %contentspec S? '>'
                      [VC: Unique Element Declaration]

NOTE: the '<!ELEMENT' has already been read.

Throws:
java.lang.Exception

parseContentspec

void parseContentspec(java.lang.String name)
                throws java.lang.Exception
Parse a content specification.
 [41] contentspec ::= 'EMPTY' | 'ANY' | Mixed | elements

Throws:
java.lang.Exception

parseElements

void parseElements()
             throws java.lang.Exception
Parse an element-content model.
 [42] elements ::= (choice | seq) ('?' | '*' | '+')?
 [44] cps ::= S? %cp S?
 [45] choice ::= '(' S? %ctokplus (S? '|' S? %ctoks)* S? ')'
 [46] ctokplus ::= cps ('|' cps)+
 [47] ctoks ::= cps ('|' cps)*
 [48] seq ::= '(' S? %stoks (S? ',' S? %stoks)* S? ')'
 [49] stoks ::= cps (',' cps)*

NOTE: the opening '(' and S have already been read.

TODO: go over parameter entity boundaries more carefully.

Throws:
java.lang.Exception

parseCp

void parseCp()
       throws java.lang.Exception
Parse a content particle.
 [43] cp ::= (Name | choice | seq) ('?' | '*' | '+')

NOTE: I actually use a slightly different production here:
 cp ::= (elements | (Name ('?' | '*' | '+')?))

Throws:
java.lang.Exception

parseMixed

void parseMixed()
          throws java.lang.Exception
Parse mixed content.
 [50] Mixed ::= '(' S? %( %'#PCDATA' (S? '|' S? %Mtoks)* ) S? ')*'
              | '(' S? %('#PCDATA') S? ')'
 [51] Mtoks ::= %Name (S? '|' S? %Name)*

NOTE: the S and '#PCDATA' have already been read.

Throws:
java.lang.Exception

parseAttlistDecl

void parseAttlistDecl()
                throws java.lang.Exception
Parse an attribute list declaration.
 [52] AttlistDecl ::= '<!ATTLIST' S %Name S? %AttDef+ S? '>'

NOTE: the '<!ATTLIST' has already been read.

Throws:
java.lang.Exception

parseAttDef

void parseAttDef(java.lang.String elementName)
           throws java.lang.Exception
Parse a single attribute definition.
 [53] AttDef ::= S %Name S %AttType S %Default

Throws:
java.lang.Exception

readAttType

int readAttType()
          throws java.lang.Exception
Parse the attribute type.
 [54] AttType ::= StringType | TokenizedType | EnumeratedType
 [55] StringType ::= 'CDATA'
 [56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' |
                        'NMTOKEN' | 'NMTOKENS'
 [57] EnumeratedType ::= NotationType | Enumeration

TODO: validate the type!!

Throws:
java.lang.Exception

parseEnumeration

void parseEnumeration()
                throws java.lang.Exception
Parse an enumeration.
 [60] Enumeration ::= '(' S? %Etoks (S? '|' S? %Etoks)* S? ')'
 [61] Etoks ::= %Nmtoken (S? '|' S? %Nmtoken)*

NOTE: the '(' has already been read.

Throws:
java.lang.Exception

parseNotationType

void parseNotationType()
                 throws java.lang.Exception
Parse a notation type for an attribute.
 [58] NotationType ::= %'NOTATION' S '(' S? %Ntoks (S? '|' S? %Ntoks)* S? ')'
 [59] Ntoks ::= %Name (S? '|' S? %Name)*

NOTE: the 'NOTATION' has already been read.

Throws:
java.lang.Exception

parseDefault

void parseDefault(java.lang.String elementName,
                  java.lang.String name,
                  int type,
                  java.lang.String enumeration)
            throws java.lang.Exception
Parse the default value for an attribute.
 [62] Default ::= '#REQUIRED' | '#IMPLIED' | ((%'#FIXED' S)? %AttValue)

Throws:
java.lang.Exception

parseConditionalSect

void parseConditionalSect()
                    throws java.lang.Exception
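Parse a conditional section.
 [63] conditionalSect ::= includeSect | ignoreSect
 [64] includeSect ::= '<![' %'INCLUDE' '[' (%markupdecl*)* ']]>'
 [65] ignoreSect ::= '<![' %'IGNORE' '[' ignoreSectContents* ']]>'
 [66] ignoreSectContents ::= ((SkipLit | Comment | PI) -(Char* ']]>'))
                           | ('<![' ignoreSectContents* ']]>')
                           | (Char - (']' | [<'"]))
                           | ('<!' (Char - ('-' | '[')))

NOTE: the '<![' has already been read.
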
Throws:
java.lang.Exception

parseCharRef

void parseCharRef()
            throws java.lang.Exception
Read a character reference.
 [67] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

NOTE: the '&#' has already been read.

Throws:
java.lang.Exception
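
To illustrate production [67], here is a minimal sketch of decoding the remainder of a character reference once the '&#' has been consumed. It is an assumption-laden illustration, not the parser's implementation: CharRefSketch and decode are hypothetical names, the input is a String instead of the read buffer, and the upper bound 0x10FFFF is the Unicode limit, not something stated in this documentation.

```java
final class CharRefSketch {
    /**
     * Decodes the text that follows "&#" (for example "65;" or "x41;")
     * and returns the referenced code point. Hypothetical helper.
     */
    static int decode(String rest) {
        int end = rest.indexOf(';');
        if (end < 0) {
            throw new IllegalArgumentException("missing ';' in character reference");
        }
        int pos = 0;
        int radix = 10;
        if (end > 0 && rest.charAt(0) == 'x') {
            radix = 16;   // '&#x' form: hexadecimal digits follow
            pos = 1;
        }
        if (pos == end) {
            throw new IllegalArgumentException("no digits in character reference");
        }
        int value = 0;
        for (; pos < end; pos++) {
            int digit = asciiDigit(rest.charAt(pos), radix);
            if (digit < 0) {
                throw new IllegalArgumentException("illegal character in character reference");
            }
            value = value * radix + digit;
            if (value > 0x10FFFF) {
                throw new IllegalArgumentException("character reference out of range");
            }
        }
        return value;
    }

    /** Accepts only the ASCII digits named in the grammar; returns -1 otherwise. */
    private static int asciiDigit(char c, int radix) {
        if (c >= '0' && c <= '9') return c - '0';
        if (radix == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (radix == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

The digit check is written by hand because Character.digit() also accepts non-ASCII digits, which the grammar does not allow.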

parseEntityRef

void parseEntityRef(boolean externalAllowed)
              throws java.lang.Exception
Parse a reference.
 [69] EntityRef ::= '&' Name ';'

NOTE: the '&' has already been read.

Parameters:
externalAllowed - External entities are allowed here.
Throws:
java.lang.Exception

parsePEReference

void parsePEReference(boolean isEntityValue)
                throws java.lang.Exception
Parse a parameter entity reference.
 [70] PEReference ::= '%' Name ';'

NOTE: the '%' has already been read.

Throws:
java.lang.Exception

parseEntityDecl

void parseEntityDecl()
               throws java.lang.Exception
Parse an entity declaration.
 [71] EntityDecl ::= '<!ENTITY' S %Name S %EntityDef S? '>'
                   | '<!ENTITY' S '%' S %Name S %EntityDef S? '>'
 [72] EntityDef ::= EntityValue | ExternalDef
 [73] ExternalDef ::= ExternalID %NDataDecl?
 [74] ExternalID ::= 'SYSTEM' S SystemLiteral
                   | 'PUBLIC' S PubidLiteral S SystemLiteral
 [75] NDataDecl ::= S %'NDATA' S %Name

NOTE: the '<!ENTITY' has already been read.

Throws:
java.lang.Exception

parseNotationDecl

void parseNotationDecl()
                 throws java.lang.Exception
Parse a notation declaration.
 [81] NotationDecl ::= '<!NOTATION' S %Name S %ExternalID S? '>'

NOTE: the '<!NOTATION' has already been read.

Throws:
java.lang.Exception

parsePCData

void parsePCData()
           throws java.lang.Exception
Parse PCDATA.
 [16] PCData ::= [^<&]*
 

The trick here is that the data stays in the dataBuffer without necessarily being converted to a string right away.

Throws:
java.lang.Exception

requireWhitespace

void requireWhitespace()
                 throws java.lang.Exception
Require whitespace characters.
 [1] S ::= (#x20 | #x9 | #xd | #xa)+

Throws:
java.lang.Exception

parseWhitespace

void parseWhitespace()
               throws java.lang.Exception
Parse whitespace characters, and leave them in the data buffer.

Throws:
java.lang.Exception

skipWhitespace

void skipWhitespace()
              throws java.lang.Exception
Skip whitespace characters.
 [1] S ::= (#x20 | #x9 | #xd | #xa)+

Throws:
java.lang.Exception

readNmtoken

java.lang.String readNmtoken(boolean isName)
                       throws java.lang.Exception
Read a name or name token.
 [5] Name ::= (Letter | '_' | ':') (NameChar)*
 [7] Nmtoken ::= (NameChar)+

NOTE: [6] is implemented implicitly where required.

Throws:
java.lang.Exception
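
The difference between the Name and Nmtoken productions is only the constraint on the first character. A rough sketch of that distinction follows; it is not the parser's code. NameTokenSketch and its methods are hypothetical names, the input is a String, and NameChar is approximated with Character.isLetterOrDigit() plus '.', '-', '_' and ':', which is narrower than the character classes of the XML specification.

```java
final class NameTokenSketch {
    /** First character allowed by production [5]: Letter, '_' or ':'. */
    static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    /** Approximation of NameChar (an assumption of this sketch). */
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_' || c == ':';
    }

    /**
     * Reads a Name (isName == true) or an Nmtoken (isName == false)
     * starting at index start and returns it as a String.
     */
    static String readNmtoken(String input, int start, boolean isName) {
        if (isName && (start >= input.length() || !isNameStart(input.charAt(start)))) {
            throw new IllegalArgumentException("name expected");
        }
        int pos = start;
        while (pos < input.length() && isNameChar(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("name token expected");
        }
        return input.substring(start, pos);
    }
}
```

With this sketch, a token such as "1st" is accepted as an Nmtoken but rejected as a Name, because a digit is not a valid first character of a Name.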

readLiteral

java.lang.String readLiteral(int flags)
                       throws java.lang.Exception
Read a literal.
 [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
                 | "'" ([^<&'] | Reference)* "'"
 [11] SystemLiteral ::= '"' URLchar* '"' | "'" (URLchar - "'")* "'"
 [13] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
 [9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
                   | "'" ([^%&'] | PEReference | Reference)* "'"

Throws:
java.lang.Exception

readExternalIds

java.lang.String[] readExternalIds(boolean inNotation)
                             throws java.lang.Exception
Try reading external identifiers.

The system identifier is not required for notations.

Parameters:
inNotation - Are we in a notation?
Returns:
A two-member String array containing the identifiers.
Throws:
java.lang.Exception

isWhitespace

final boolean isWhitespace(char c)
Test if a character is whitespace.
 [1] S ::= (#x20 | #x9 | #xd | #xa)+
 

Parameters:
c - The character to test.
Returns:
true if the character is whitespace.
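
The S production names exactly four characters, which is a smaller set than Java's own notion of whitespace. A minimal sketch of the test, with hypothetical names (WhitespaceSketch, skipWhitespace) and a CharSequence standing in for the parser's input:

```java
final class WhitespaceSketch {
    /** True for the four characters listed in production [1]. */
    static boolean isWhitespace(char c) {
        return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
    }

    /** Returns the index of the first non-whitespace character at or after pos. */
    static int skipWhitespace(CharSequence text, int pos) {
        while (pos < text.length() && isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }
}
```

Note that Character.isWhitespace() would also accept characters such as form feed, which the S production does not include, so the explicit comparison is the safer choice here.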

dataBufferAppend

void dataBufferAppend(char c)
Add a character to the data buffer.


dataBufferAppend

void dataBufferAppend(java.lang.String s)
Add a string to the data buffer.


dataBufferAppend

void dataBufferAppend(char[] ch,
                      int start,
                      int length)
Append (part of) a character array to the data buffer.


dataBufferNormalize

void dataBufferNormalize()
Normalise whitespace in the data buffer.
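
The documentation does not spell out what normalisation does. Assuming it follows the XML rule for tokenized attribute values (collapse each run of whitespace to a single space and drop leading and trailing whitespace), it could be sketched as below. NormalizeSketch and normalize are hypothetical names, and the sketch returns a new String instead of rewriting a buffer in place.

```java
final class NormalizeSketch {
    /**
     * Collapses runs of XML whitespace to one space and trims both ends.
     * This behaviour is an assumption, not taken from the parser's source.
     */
    static String normalize(CharSequence data) {
        StringBuilder out = new StringBuilder(data.length());
        boolean pendingSpace = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                // Remember the gap, but only once some text has been emitted,
                // so that leading whitespace is dropped.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // A pending space at the end is never written: trailing whitespace is dropped.
        return out.toString();
    }
}
```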


dataBufferToString

java.lang.String dataBufferToString()
Convert the data buffer to a string.

See Also:
intern(char[],int,int)

dataBufferFlush

void dataBufferFlush()
               throws java.lang.Exception
Flush the contents of the data buffer to the handler, if appropriate, and reset the buffer for new input.

Throws:
java.lang.Exception

require

void require(java.lang.String delim)
       throws java.lang.Exception
Require a string to appear, or throw an exception.

Throws:
java.lang.Exception

require

void require(char delim)
       throws java.lang.Exception
Require a character to appear, or throw an exception.

Throws:
java.lang.Exception

intern

public java.lang.String intern(java.lang.String s)
Return an internalised version of a string.

Ælfred uses this method to create an internalised version of all names and attribute values, so that it can test equality with == instead of String.equals().

If you want to be able to test for equality in the same way, you can use this method to internalise your own strings first:

 String PARA = parser.intern("PARA");
 

Note that this will not return the same results as String.intern().

Parameters:
s - The string to internalise.
Returns:
An internalised version of the string.
See Also:
intern(char[],int,int), String.intern()

intern

public java.lang.String intern(char[] ch,
                               int start,
                               int length)
Create an internalised string from a character array.

This is much more efficient than constructing a non-internalised string first, and then internalising it.

Note that this will not return the same results as String.intern().

Parameters:
ch - an array of characters for building the string.
start - the starting position in the array.
length - the number of characters to place in the string.
Returns:
an internalised string.
See Also:
intern(String), String.intern()
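
To show why interning from a character array can avoid an intermediate String, here is a minimal sketch of a private intern table. It is not Ælfred's implementation: InternTableSketch is a hypothetical class, the bucket layout and hash function are assumptions, and java.util collections are used for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class InternTableSketch {
    private final List<List<String>> buckets = new ArrayList<>();

    InternTableSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Returns the canonical String for the given character range. */
    String intern(char[] ch, int start, int length) {
        // Hash the characters directly, so no String is built for a lookup.
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + ch[start + i];
        }
        List<String> bucket = buckets.get((hash & 0x7fffffff) % buckets.size());
        for (String candidate : bucket) {
            if (matches(candidate, ch, start, length)) {
                return candidate;   // already interned: reuse the same instance
            }
        }
        String created = new String(ch, start, length);   // allocate only on a miss
        bucket.add(created);
        return created;
    }

    /** Convenience overload that delegates to the array version. */
    String intern(String s) {
        char[] ch = s.toCharArray();
        return intern(ch, 0, ch.length);
    }

    private static boolean matches(String s, char[] ch, int start, int length) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != ch[start + i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because every equal character sequence maps to one instance in the table, two interned strings can be compared with ==. The instances are private to the table, which is why the results differ from those of String.intern().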

extendArray

java.lang.Object extendArray(java.lang.Object array,
                             int currentSize,
                             int requiredSize)
Ensure the capacity of an array, allocating a new one if necessary.
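
A capacity check of this kind is commonly written with a doubling policy. The sketch below keeps the documented parameter list but is otherwise an assumption: ExtendArraySketch is a hypothetical class, the growth factor is a guess, and reflection is used so that one method can serve any array type.

```java
import java.lang.reflect.Array;

final class ExtendArraySketch {
    /**
     * Returns an array with room for requiredSize elements. The original
     * array is returned when it is already large enough; otherwise the
     * first currentSize elements are copied into a larger array.
     */
    static Object extendArray(Object array, int currentSize, int requiredSize) {
        int capacity = Array.getLength(array);
        if (requiredSize <= capacity) {
            return array;
        }
        // Doubling keeps the amortised cost of repeated appends low (assumed policy).
        int newCapacity = Math.max(capacity * 2, requiredSize);
        Object grown = Array.newInstance(array.getClass().getComponentType(), newCapacity);
        System.arraycopy(array, 0, grown, 0, currentSize);
        return grown;
    }
}
```

A caller would cast the result back to the concrete type, for example char[] buffer = (char[]) ExtendArraySketch.extendArray(buffer, used, used + 1).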


declaredElements

public java.util.Enumeration declaredElements()
Get the declared elements for an XML document.

The results will be valid only after the DTD (if any) has been parsed.

Returns:
An enumeration of all element types declared for this document (as Strings).
See Also:
getElementContentType(java.lang.String), getElementContentModel(java.lang.String)

getElementContentType

public int getElementContentType(java.lang.String name)
Look up the content type of an element.

Parameters:
name - The element type name.
Returns:
An integer constant representing the content type.
See Also:
getElementContentModel(java.lang.String), CONTENT_UNDECLARED, CONTENT_ANY, CONTENT_EMPTY, CONTENT_MIXED, CONTENT_ELEMENTS

getElementContentModel

public java.lang.String getElementContentModel(java.lang.String name)
Look up the content model of an element.

The result will always be null unless the content type is CONTENT_ELEMENTS or CONTENT_MIXED.

Parameters:
name - The element type name.
Returns:
The normalised content model, as a string.
See Also:
getElementContentType(java.lang.String)

setElement

void setElement(java.lang.String name,
                int contentType,
                java.lang.String contentModel,
                java.util.Hashtable attributes)
          throws java.lang.Exception
Register an element.

Array format:

  1. element type
  2. attribute hash table

Throws:
java.lang.Exception

getElementAttributes

java.util.Hashtable getElementAttributes(java.lang.String name)
Look up the attribute hash table for an element. The hash table is the second item in the element array.


declaredAttributes

public java.util.Enumeration declaredAttributes(java.lang.String elname)
Get the declared attributes for an element type.

Parameters:
elname - The name of the element type.
Returns:
An Enumeration of all the attributes declared for a specific element type. The results will be valid only after the DTD (if any) has been parsed.
See Also:
getAttributeType(java.lang.String, java.lang.String), getAttributeEnumeration(java.lang.String, java.lang.String), getAttributeDefaultValueType(java.lang.String, java.lang.String), getAttributeDefaultValue(java.lang.String, java.lang.String), getAttributeExpandedValue(java.lang.String, java.lang.String)

getAttributeType

public int getAttributeType(java.lang.String name,
                            java.lang.String aname)
Retrieve the declared type of an attribute.

Parameters:
name - The name of the associated element.
aname - The name of the attribute.
Returns:
An integer constant representing the attribute type.
See Also:
ATTRIBUTE_UNDECLARED, ATTRIBUTE_CDATA, ATTRIBUTE_ID, ATTRIBUTE_IDREF, ATTRIBUTE_IDREFS, ATTRIBUTE_ENTITY, ATTRIBUTE_ENTITIES, ATTRIBUTE_NMTOKEN, ATTRIBUTE_NMTOKENS, ATTRIBUTE_ENUMERATED, ATTRIBUTE_NOTATION

getAttributeEnumeration

public java.lang.String getAttributeEnumeration(java.lang.String name,
                                                java.lang.String aname)
Retrieve the allowed values for an enumerated attribute type.

Parameters:
name - The name of the associated element.
aname - The name of the attribute.
Returns:
A string containing the token list.
See Also:
ATTRIBUTE_ENUMERATED, ATTRIBUTE_NOTATION

getAttributeDefaultValue

public java.lang.String getAttributeDefaultValue(java.lang.String name,
                                                 java.lang.String aname)
Retrieve the default value of a declared attribute.

Parameters:
name - The name of the associated element.
aname - The name of the attribute.
Returns:
The default value, or null if the attribute was #IMPLIED or simply undeclared and unspecified.
See Also:
getAttributeExpandedValue(java.lang.String, java.lang.String)

getAttributeExpandedValue

public java.lang.String getAttributeExpandedValue(java.lang.String name,
                                                  java.lang.String aname)
Retrieve the expanded value of a declared attribute.

All general entities will be expanded.

Parameters:
name - The name of the associated element.
aname - The name of the attribute.
Returns:
The expanded default value, or null if the attribute was #IMPLIED or simply undeclared.
See Also:
getAttributeDefaultValue(java.lang.String, java.lang.String)

getAttributeDefaultValueType

public int getAttributeDefaultValueType(java.lang.String name,
                                        java.lang.String aname)
Retrieve the default value type of a declared attribute.

Parameters:
name - The name of the element.
aname - The name of the attribute.
Returns:
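ATTRIBUTE_DEFAULT_UNDECLARED if the attribute cannot be found; otherwise one of ATTRIBUTE_DEFAULT_SPECIFIED, ATTRIBUTE_DEFAULT_IMPLIED, ATTRIBUTE_DEFAULT_REQUIRED, or ATTRIBUTE_DEFAULT_FIXED.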
See Also:
ATTRIBUTE_DEFAULT_SPECIFIED, ATTRIBUTE_DEFAULT_IMPLIED, ATTRIBUTE_DEFAULT_REQUIRED, ATTRIBUTE_DEFAULT_FIXED

setAttribute

void setAttribute(java.lang.String elName,
                  java.lang.String name,
                  int type,
                  java.lang.String enumeration,
                  java.lang.String value,
                  int valueType)
            throws java.lang.Exception
Register an attribute declaration for later retrieval.

Format:

  - String type
  - String default value
  - int value type

TODO: do something with attribute types.

Throws:
java.lang.Exception

getAttribute

java.lang.Object[] getAttribute(java.lang.String elName,
                                java.lang.String name)
Retrieve the three-member array representing an attribute declaration.

Parameters:
elName - The name of the element.
name - The name of the attribute.

declaredEntities

public java.util.Enumeration declaredEntities()
Get declared entities.

Returns:
An Enumeration of all the entities declared for this XML document. The results will be valid only after the DTD (if any) has been parsed.
See Also:
getEntityType(java.lang.String), getEntityPublicId(java.lang.String), getEntitySystemId(java.lang.String), getEntityValue(java.lang.String), getEntityNotationName(java.lang.String)

getCurrentElement

public java.lang.String getCurrentElement()
Return the current element.

Returns:
The current Element.

getEntityType

public int getEntityType(java.lang.String ename)
Find the type of an entity.

Parameters:
ename - The name of the entity.
Returns:
An integer constant representing the entity type.
See Also:
ENTITY_UNDECLARED, ENTITY_INTERNAL, ENTITY_NDATA, ENTITY_TEXT

getEntityPublicId

public java.lang.String getEntityPublicId(java.lang.String ename)
Return an external entity's public identifier, if any.

Parameters:
ename - The name of the external entity.
Returns:
The entity's public identifier, or null if the entity was not declared, if it is not an external entity, or if no public identifier was provided.
See Also:
getEntityType(java.lang.String)

getEntitySystemId

public java.lang.String getEntitySystemId(java.lang.String ename)
Return an external entity's system identifier.

Parameters:
ename - The name of the external entity.
Returns:
The entity's system identifier, or null if the entity was not declared, or if it is not an external entity.
See Also:
getEntityType(java.lang.String)

getEntityValue

public java.lang.String getEntityValue(java.lang.String ename)
Return the value of an internal entity.

Parameters:
ename - The name of the internal entity.
Returns:
The entity's value, or null if the entity was not declared, or if it is not an internal entity.
See Also:
getEntityType(java.lang.String)

getEntityNotationName

public java.lang.String getEntityNotationName(java.lang.String eName)
Get the notation name associated with an NDATA entity.

Parameters:
eName - The NDATA entity name.
Returns:
The associated notation name, or null if the entity was not declared, or if it is not an NDATA entity.
See Also:
getEntityType(java.lang.String)

setInternalEntity

void setInternalEntity(java.lang.String eName,
                       java.lang.String value)
Register an entity declaration for later retrieval.


setExternalDataEntity

void setExternalDataEntity(java.lang.String eName,
                           java.lang.String pubid,
                           java.lang.String sysid,
                           java.lang.String nName)
Register an external data entity.


setExternalTextEntity

void setExternalTextEntity(java.lang.String eName,
                           java.lang.String pubid,
                           java.lang.String sysid)
Register an external text entity.


setEntity

void setEntity(java.lang.String eName,
               int eClass,
               java.lang.String pubid,
               java.lang.String sysid,
               java.lang.String value,
               java.lang.String nName)
Register an entity declaration for later retrieval.


declaredNotations

public java.util.Enumeration declaredNotations()
Get declared notations.

Returns:
An Enumeration of all the notations declared for this XML document. The results will be valid only after the DTD (if any) has been parsed.
See Also:
getNotationPublicId(java.lang.String), getNotationSystemId(java.lang.String)

getNotationPublicId

public java.lang.String getNotationPublicId(java.lang.String nname)
Look up the public identifier for a notation. You will normally use this method to look up a notation that was provided as an attribute value or for an NDATA entity.

Parameters:
nname - The name of the notation.
Returns:
A string containing the public identifier, or null if none was provided or if no such notation was declared.
See Also:
getNotationSystemId(java.lang.String)

getNotationSystemId

public java.lang.String getNotationSystemId(java.lang.String nname)
Look up the system identifier for a notation. You will normally use this method to look up a notation that was provided as an attribute value or for an NDATA entity.

Parameters:
nname - The name of the notation.
Returns:
A string containing the system identifier, or null if no such notation was declared.
See Also:
getNotationPublicId(java.lang.String)

setNotation

void setNotation(java.lang.String nname,
                 java.lang.String pubid,
                 java.lang.String sysid)
           throws java.lang.Exception
Register a notation declaration for later retrieval.

Format:

  - public id
  - system id

Throws:
java.lang.Exception

getLineNumber

public int getLineNumber()
Return the current line number.

Returns:
The current line number.

getColumnNumber

public int getColumnNumber()
Return the current column number.

Returns:
The current column number.

readCh

char readCh()
      throws java.lang.Exception
Read a single character from the readBuffer.

The readDataChunk() method maintains the buffer.

If we hit the end of an entity, try to pop the stack and keep going.

(This approach doesn't really enforce XML's rules about entity boundaries, but this is not currently a validating parser).

This routine also attempts to keep track of the current position in external entities, but it's not entirely accurate.

Returns:
The next available input character.
Throws:
java.lang.Exception
See Also:
unread(char), readDataChunk(), readBuffer, line

unread

void unread(char c)
      throws java.lang.Exception
Push a single character back onto the current input stream.

This method usually pushes the character back onto the readBuffer.

I don't think that this would ever be called with readBufferPos = 0, because the methods always read a character before unreading it, but just in case, I've added a boundary condition.

Parameters:
c - The character to push back.
Throws:
java.lang.Exception
See Also:
readCh(), unread(char[], int), readBuffer

unread

void unread(char[] ch,
            int length)
      throws java.lang.Exception
Push a char array back onto the current input stream.

NOTE: you must never push back characters that you haven't actually read: use pushString() instead.

Throws:
java.lang.Exception
See Also:
readCh(), unread(char), readBuffer, pushString(java.lang.String, java.lang.String)

pushURL

void pushURL(java.lang.String ename,
             java.lang.String publicId,
             java.lang.String systemId,
             java.io.Reader reader,
             java.io.InputStream stream,
             java.lang.String encoding)
       throws java.lang.Exception
Push a new external input source.

The source will be either an external text entity, or the DTD external subset.

TO DO: Right now, this method always attempts to autodetect the encoding; in the future, it should allow the caller to request an encoding explicitly, and it should also look at the headers with an HTTP connection.

Parameters:
ename -
publicId -
systemId -
reader -
stream -
encoding -
Throws:
java.lang.Exception
See Also:
XmlHandler.resolveEntity(java.lang.String, java.lang.String), pushString(java.lang.String, java.lang.String), sourceType, pushInput(java.lang.String), detectEncoding(), sourceType, readBuffer

tryEncodingDecl

void tryEncodingDecl(boolean ignoreEncoding)
               throws java.lang.Exception
Check for an encoding declaration.

Throws:
java.lang.Exception

detectEncoding

void detectEncoding()
              throws java.lang.Exception
Attempt to detect the encoding of an entity.

The trick here (as suggested in the XML standard) is that any entity not in UTF-8, or in UCS-2 with a byte-order mark, must begin with an XML declaration or an encoding declaration; we simply have to look for "<?XML" in various encodings.

This method has no way to distinguish among 8-bit encodings. Instead, it assumes UTF-8, then (possibly) revises its assumption later in checkEncoding(). Any ASCII-derived 8-bit encoding should work, but most will be rejected later by checkEncoding().

I don't currently detect EBCDIC, since I'm concerned that it could also be a valid UTF-8 sequence; I'll have to do more checking later.

Throws:
java.lang.Exception
See Also:
tryEncoding(byte[], byte, byte, byte, byte), tryEncoding(byte[], byte, byte), checkEncoding(java.lang.String, boolean), read8bitEncodingDeclaration()
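
The signature matching described above can be sketched as a short decision chain over the first four bytes. This is an illustration under stated assumptions, not the parser's code: EncodingSniffSketch is a hypothetical class, the helpers take int arguments instead of byte for readability, the returned labels are illustrative ("UCS-4BE" and "UCS-4LE" are not Java charset names), and the byte patterns are those for a leading '<' and '?' as described in the encoding-detection appendix of the XML specification.

```java
final class EncodingSniffSketch {
    /** True if the first four bytes of sig equal b1..b4. */
    static boolean tryEncoding(byte[] sig, int b1, int b2, int b3, int b4) {
        return (sig[0] & 0xFF) == b1 && (sig[1] & 0xFF) == b2
            && (sig[2] & 0xFF) == b3 && (sig[3] & 0xFF) == b4;
    }

    /** True if the first two bytes of sig equal b1 and b2 (byte-order mark check). */
    static boolean tryEncoding(byte[] sig, int b1, int b2) {
        return (sig[0] & 0xFF) == b1 && (sig[1] & 0xFF) == b2;
    }

    /** Guesses an encoding label from the first four bytes of an entity. */
    static String detect(byte[] sig) {
        if (sig.length < 4) {
            return "UTF-8";                       // too short to carry a signature
        }
        if (tryEncoding(sig, 0x00, 0x00, 0x00, 0x3C)) return "UCS-4BE";
        if (tryEncoding(sig, 0x3C, 0x00, 0x00, 0x00)) return "UCS-4LE";
        if (tryEncoding(sig, 0xFE, 0xFF)) return "UTF-16BE";              // byte-order mark
        if (tryEncoding(sig, 0xFF, 0xFE)) return "UTF-16LE";              // byte-order mark
        if (tryEncoding(sig, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";  // "<?" without a mark
        if (tryEncoding(sig, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";  // "<?" without a mark
        // Everything else, including an ASCII "<?xm", is provisionally UTF-8;
        // the encoding declaration may revise this guess later.
        return "UTF-8";
    }
}
```

The order of the checks matters: the four-byte patterns are tested before the two-byte byte-order marks so that a wider encoding is not mistaken for a narrower one.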

tryEncoding

boolean tryEncoding(byte[] sig,
                    byte b1,
                    byte b2,
                    byte b3,
                    byte b4)
Check for a four-byte signature.

Utility routine for detectEncoding().

Always looks for some part of "<?XML" in a specific encoding.

Parameters:
sig - The first four bytes read.
b1 - The first byte of the signature
b2 - The second byte of the signature
b3 - The third byte of the signature
b4 - The fourth byte of the signature
See Also:
detectEncoding()

tryEncoding

boolean tryEncoding(byte[] sig,
                    byte b1,
                    byte b2)
Check for a two-byte signature.

Looks for a UCS-2 byte-order mark.

Utility routine for detectEncoding().

Parameters:
sig - The first four bytes read.
b1 - The first byte of the signature
b2 - The second byte of the signature
See Also:
detectEncoding()

pushString

void pushString(java.lang.String ename,
                java.lang.String s)
          throws java.lang.Exception
This method pushes a string back onto input.

It is useful either as the expansion of an internal entity, or for backtracking during the parse.

Calls pushCharArray() to do the actual work.

Parameters:
s - The string to push back onto input.
Throws:
java.lang.Exception
See Also:
pushCharArray(java.lang.String, char[], int, int)

pushCharArray

void pushCharArray(java.lang.String ename,
                   char[] ch,
                   int start,
                   int length)
             throws java.lang.Exception
Push a new internal input source.

This method is useful for expanding an internal entity, or for unreading a string of characters. It creates a new readBuffer containing the characters in the array, instead of characters converted from an input byte stream.

I've added a couple of optimisations: don't push zero-length strings, and just push back a single character for 1-character strings; this should save some time and memory.

Parameters:
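ename - The name of the entity (if any) causing the new input.
ch - The char array to push.
start - The index of the first character in the array to push.
length - The number of characters to push.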
Throws:
java.lang.Exception
See Also:
pushString(java.lang.String, java.lang.String), pushURL(java.lang.String, java.lang.String, java.lang.String, java.io.Reader, java.io.InputStream, java.lang.String), readBuffer, sourceType, pushInput(java.lang.String)

pushInput

void pushInput(java.lang.String ename)
         throws java.lang.Exception
Save the current input source onto the stack.

This method saves all of the global variables associated with the current input source, so that they can be restored when a new input source has finished. It also tests for entity recursion.

The method saves the following global variables onto a stack using a fixed-length array:

  1. sourceType
  2. externalEntity
  3. readBuffer
  4. readBufferPos
  5. readBufferLength
  6. line
  7. encoding

Parameters:
ename - The name of the entity (if any) causing the new input.
Throws:
java.lang.Exception
See Also:
popInput(), sourceType, externalEntity, readBuffer, readBufferPos, readBufferLength, line, encoding

popInput

void popInput()
        throws java.lang.Exception
Restore a previous input source.

This method restores all of the global variables that pushInput() saved for the previous input source.

Throws:
java.io.EOFException - If there are no more entries on the input stack.
java.lang.Exception
See Also:
pushInput(java.lang.String), sourceType, externalEntity, readBuffer, readBufferPos, readBufferLength, line, encoding
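
The save and restore behaviour of pushInput() and popInput() can be pictured with a small sketch. It is only an approximation under stated assumptions: the class name InputStackSketch, the MAX_DEPTH limit, and the reduced set of saved fields are all hypothetical, and the recursion test shown here (comparing entity names already on the stack) is one plausible way to implement the documented check.

```java
import java.io.EOFException;

// Illustrative sketch; MAX_DEPTH and the reduced field set are assumptions.
final class InputStackSketch {
    private static final int MAX_DEPTH = 32;               // fixed-length stack
    private final String[] names = new String[MAX_DEPTH];  // entity name per frame
    private final char[][] buffers = new char[MAX_DEPTH][];
    private final int[][] ints = new int[MAX_DEPTH][];
    private int depth = 0;

    // State of the current input source (a subset of the documented fields).
    char[] readBuffer = new char[0];
    int readBufferPos = 0;
    int readBufferLength = 0;
    int line = 1;

    /** Save the current source; reject an entity that is already being expanded. */
    void pushInput(String ename) throws Exception {
        for (int i = 0; i < depth; i++) {
            if (ename != null && ename.equals(names[i])) {
                throw new Exception("recursive reference to entity " + ename);
            }
        }
        if (depth == MAX_DEPTH) {
            throw new Exception("input stack is full");
        }
        names[depth] = ename;
        buffers[depth] = readBuffer;
        ints[depth] = new int[] { readBufferPos, readBufferLength, line };
        depth++;
    }

    /** Restore the most recently saved source. */
    void popInput() throws EOFException {
        if (depth == 0) {
            throw new EOFException("no more input");
        }
        depth--;
        readBuffer = buffers[depth];
        readBufferPos = ints[depth][0];
        readBufferLength = ints[depth][1];
        line = ints[depth][2];
        names[depth] = null;      // release references for garbage collection
        buffers[depth] = null;
        ints[depth] = null;
    }
}
```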

tryRead

boolean tryRead(char delim)
          throws java.lang.Exception
Return true if we can read the expected character.

Note that the character will be removed from the input stream on success, but will be put back on failure. Do not attempt to read the character again if the method succeeds.

Parameters:
delim - The character that should appear next. For a case-insensitive match, you must supply this in upper case.
Returns:
true if the character was successfully read, or false if it was not.
Throws:
java.lang.Exception
See Also:
tryRead(String)

tryRead

boolean tryRead(java.lang.String delim)
          throws java.lang.Exception
Return true if we can read the expected string.

This is simply a convenience method.

Note that the string will be removed from the input stream on success, but will be put back on failure. Do not attempt to read the string again if the method succeeds.

This method will push back a character rather than an array whenever possible (probably the majority of cases).

NOTE: This method currently has a hard-coded limit of 100 characters for the delimiter.

Parameters:
delim - The string that should appear next.
Returns:
true if the string was successfully read, or false if it was not.
Throws:
java.lang.Exception
See Also:
tryRead(char)
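
The "consume on success, put back on failure" contract of both tryRead() variants can be illustrated with java.io.PushbackReader. This is a sketch under assumptions, not the parser's own buffering: the class name TryReadSketch is hypothetical, and the pushback capacity of 100 simply echoes the delimiter limit noted above.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Illustrative sketch; the real parser manages its own read buffer.
final class TryReadSketch {
    private final PushbackReader in;

    TryReadSketch(String text) {
        // Capacity 100 mirrors the documented delimiter limit (assumption).
        in = new PushbackReader(new StringReader(text), 100);
    }

    /** Consume delim if it is next; otherwise leave the input untouched. */
    boolean tryRead(char delim) throws IOException {
        int c = in.read();
        if (c == delim) {
            return true;
        }
        if (c != -1) {
            in.unread(c);
        }
        return false;
    }

    /** Consume the whole string if it is next; otherwise put back what was read. */
    boolean tryRead(String delim) throws IOException {
        char[] seen = new char[delim.length()];
        for (int i = 0; i < delim.length(); i++) {
            int c = in.read();
            if (c != delim.charAt(i)) {
                if (c != -1) {
                    in.unread(c);          // the mismatching character goes back first
                }
                in.unread(seen, 0, i);     // then the matched prefix, ahead of it
                return false;
            }
            seen[i] = (char) c;
        }
        return true;
    }
}
```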

tryWhitespace

boolean tryWhitespace()
                throws java.lang.Exception
Return true if we can read some whitespace.

This is simply a convenience method.

This method will push back a character rather than an array whenever possible (probably the majority of cases).

Returns:
true if whitespace was found.
Throws:
java.lang.Exception

parseUntil

void parseUntil(java.lang.String delim)
          throws java.lang.Exception
Read all data until we find the specified string.

This is especially useful for scanning marked sections.

This is a little inefficient right now, since it calls tryRead() for every character.

Parameters:
delim - The string delimiter
Throws:
java.lang.Exception
See Also:
tryRead(String), readCh()

skipUntil

void skipUntil(java.lang.String delim)
         throws java.lang.Exception
Skip all data until we find the specified string.

This is especially useful for scanning comments.

This is a little inefficient right now, since it calls tryRead() for every character.

Parameters:
delim - The string delimiter
Throws:
java.lang.Exception
See Also:
readCh()
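
The per-character loop that parseUntil() and skipUntil() describe can be sketched as below. The example works on an in-memory string for simplicity; the class name UntilSketch and the choice to return the collected data from parseUntil() are assumptions made for illustration (the documented method returns void).

```java
import java.io.EOFException;

// Illustrative sketch over an in-memory string; names are hypothetical.
final class UntilSketch {
    private final String input;
    private int pos = 0;

    UntilSketch(String input) {
        this.input = input;
    }

    /** Consume delim if it is next in the input. */
    boolean tryRead(String delim) {
        if (input.startsWith(delim, pos)) {
            pos += delim.length();
            return true;
        }
        return false;
    }

    /** Collect characters up to delim; the delimiter itself is consumed. */
    String parseUntil(String delim) throws EOFException {
        StringBuilder data = new StringBuilder();
        while (!tryRead(delim)) {             // one tryRead() per character
            if (pos >= input.length()) {
                throw new EOFException("delimiter not found: " + delim);
            }
            data.append(input.charAt(pos++));
        }
        return data.toString();
    }

    /** Same loop, but the characters before delim are discarded. */
    void skipUntil(String delim) throws EOFException {
        while (!tryRead(delim)) {
            if (pos >= input.length()) {
                throw new EOFException("delimiter not found: " + delim);
            }
            pos++;
        }
    }

    /** Text that has not been consumed yet. */
    String remaining() {
        return input.substring(pos);
    }
}
```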

read8bitEncodingDeclaration

void read8bitEncodingDeclaration()
                           throws java.lang.Exception
Read just the encoding declaration (or XML declaration) at the start of an external entity.

When this method is called, we know that the declaration is present (or appears to be). We also know that the entity is in some sort of ASCII-derived 8-bit encoding.

The idea is to let us read what the 8-bit encoding is before we've committed to converting any more of the file; the XML or encoding declaration must be in 7-bit ASCII, so we're safe as long as we don't go past it.

Throws:
java.lang.Exception

readDataChunk

void readDataChunk()
             throws java.lang.Exception
Read a chunk of data from an external input source.

This is simply a front-end that fills the rawReadBuffer with bytes, then calls the appropriate encoding handler.

Throws:
java.lang.Exception
See Also:
encoding, rawReadBuffer, readBuffer, filterCR(), copyUtf8ReadBuffer(int), copyIso8859_1ReadBuffer(int)

filterCR

void filterCR()
Filter carriage returns in the read buffer.

CRLF becomes LF; CR becomes LF.

See Also:
readDataChunk(), readBuffer, readBufferOverflow
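
The line-end rule above (CRLF becomes LF; CR becomes LF) can be sketched as an in-place pass over a char buffer. The class name CrFilter and the returned length are assumptions for the example, and the sketch does not handle a CR that ends one chunk with its LF arriving in the next.

```java
// Illustrative sketch; not the parser's actual buffer handling.
final class CrFilter {
    /** Rewrite buf[0..length) in place and return the new length. */
    static int filterCR(char[] buf, int length) {
        int out = 0;
        for (int in = 0; in < length; in++) {
            char c = buf[in];
            if (c == '\r') {
                if (in + 1 < length && buf[in + 1] == '\n') {
                    in++;                 // CRLF: skip the LF, emit one LF below
                }
                c = '\n';                 // lone CR also becomes LF
            }
            buf[out++] = c;
        }
        return out;
    }
}
```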

copyUtf8ReadBuffer

void copyUtf8ReadBuffer(int count)
                  throws java.lang.Exception
Convert a buffer of UTF-8-encoded bytes into UTF-16 characters.

When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.

The tricky part of this is dealing with UTF-8 multi-byte sequences, but it doesn't seem to slow things down too much.

Parameters:
count - The number of bytes to convert.
Throws:
java.lang.Exception
See Also:
readDataChunk(), rawReadBuffer, readBuffer, getNextUtf8Byte(int, int)

getNextUtf8Byte

int getNextUtf8Byte(int pos,
                    int count)
              throws java.lang.Exception
Return the next byte value in a UTF-8 sequence. If it is not possible to get a byte from the current entity, throw an exception.

Parameters:
pos - The current position in the rawReadBuffer.
count - The number of bytes in the rawReadBuffer
Returns:
The significant six bits of a non-initial byte in a UTF-8 sequence.
Throws:
java.io.EOFException - If the sequence is incomplete.
java.lang.Exception
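
The multi-byte handling in copyUtf8ReadBuffer() and getNextUtf8Byte() can be sketched as follows. This is a simplified illustration: the class name Utf8Sketch and the String return type are assumptions, only one-, two-, and three-byte sequences are handled, and an incomplete sequence at the end of the buffer is reported immediately rather than completed from further input.

```java
import java.io.EOFException;

// Illustrative sketch; four-byte sequences are deliberately left out.
final class Utf8Sketch {
    /** Return the significant six bits of the continuation byte at pos. */
    static int nextByte(byte[] raw, int pos, int count) throws Exception {
        if (pos >= count) {
            throw new EOFException("incomplete UTF-8 sequence");
        }
        int b = raw[pos] & 0xff;
        if ((b & 0xc0) != 0x80) {
            throw new Exception("bad UTF-8 continuation byte at offset " + pos);
        }
        return b & 0x3f;
    }

    /** Decode the first count bytes of raw into UTF-16 characters. */
    static String decode(byte[] raw, int count) throws Exception {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < count) {
            int b = raw[i++] & 0xff;
            if (b < 0x80) {                               // 0xxxxxxx
                out.append((char) b);
            } else if ((b & 0xe0) == 0xc0) {              // 110xxxxx 10xxxxxx
                int v = ((b & 0x1f) << 6) | nextByte(raw, i++, count);
                out.append((char) v);
            } else if ((b & 0xf0) == 0xe0) {              // 1110xxxx 10xxxxxx 10xxxxxx
                int v = ((b & 0x0f) << 12)
                        | (nextByte(raw, i++, count) << 6)
                        | nextByte(raw, i++, count);
                out.append((char) v);
            } else {
                throw new Exception("unsupported UTF-8 lead byte at offset " + (i - 1));
            }
        }
        return out.toString();
    }
}
```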

copyIso8859_1ReadBuffer

void copyIso8859_1ReadBuffer(int count)
Convert a buffer of ISO-8859-1-encoded bytes into UTF-16 characters.

When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.

This is a direct conversion, with no tricks.

Parameters:
count - The number of bytes to convert.
See Also:
readDataChunk(), rawReadBuffer, readBuffer
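
The direct conversion works because the 256 ISO-8859-1 code points coincide with the first 256 Unicode code points. A minimal sketch, with a hypothetical class name and a returned array in place of the shared readBuffer:

```java
// Illustrative sketch; the real method writes into readBuffer instead.
final class Latin1Sketch {
    /** Widen each byte to a char; the mask keeps bytes above 0x7f unsigned. */
    static char[] copyIso8859_1(byte[] raw, int count) {
        char[] out = new char[count];
        for (int i = 0; i < count; i++) {
            out[i] = (char) (raw[i] & 0xff);
        }
        return out;
    }
}
```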

copyUcs2ReadBuffer

void copyUcs2ReadBuffer(int count,
                        int shift1,
                        int shift2)
                  throws java.lang.Exception
Convert a buffer of UCS-2-encoded bytes into UTF-16 characters.

When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.

Parameters:
count - The number of bytes to convert.
shift1 - The number of bits to shift byte 1.
shift2 - The number of bits to shift byte 2
Throws:
java.lang.Exception
See Also:
readDataChunk(), rawReadBuffer, readBuffer
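
The two shift parameters select the byte order: shifting the first byte by 8 and the second by 0 reads big-endian data, and the reverse reads little-endian data. The sketch below illustrates this reading of the parameters; the class name and the returned array are assumptions.

```java
// Illustrative sketch; the real method writes into readBuffer instead.
final class Ucs2Sketch {
    /** Combine each byte pair into one char using the given shifts. */
    static char[] copyUcs2(byte[] raw, int count, int shift1, int shift2) {
        if (count % 2 != 0) {
            throw new IllegalArgumentException("odd number of bytes in UCS-2 data");
        }
        char[] out = new char[count / 2];
        for (int i = 0, j = 0; i < count; i += 2, j++) {
            out[j] = (char) (((raw[i] & 0xff) << shift1)
                           | ((raw[i + 1] & 0xff) << shift2));
        }
        return out;
    }
}
```

For example, copyUcs2(raw, n, 8, 0) would be used for big-endian input and copyUcs2(raw, n, 0, 8) for little-endian input.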

copyUcs4ReadBuffer

void copyUcs4ReadBuffer(int count,
                        int shift1,
                        int shift2,
                        int shift3,
                        int shift4)
                  throws java.lang.Exception
Convert a buffer of UCS-4-encoded bytes into UTF-16 characters.

When readDataChunk() calls this method, the raw bytes are in rawReadBuffer, and the final characters will appear in readBuffer.

Java has 16-bit chars, but this routine will attempt to use surrogates to encode values between 0x00010000 and 0x000fffff.

Parameters:
count - The number of bytes to convert.
shift1 - The number of bits to shift byte 1.
shift2 - The number of bits to shift byte 2.
shift3 - The number of bits to shift byte 3.
shift4 - The number of bits to shift byte 4.
Throws:
java.lang.Exception
See Also:
readDataChunk(), rawReadBuffer, readBuffer
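
The surrogate step can be sketched with the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into a high and a low surrogate. The class name and the String return type are assumptions, and this sketch accepts values up to 0x10ffff, which is wider than the range stated above.

```java
// Illustrative sketch using standard UTF-16 surrogate arithmetic.
final class Ucs4Sketch {
    /** Combine each group of four bytes into a code point, then emit UTF-16. */
    static String copyUcs4(byte[] raw, int count,
                           int shift1, int shift2, int shift3, int shift4) {
        if (count % 4 != 0) {
            throw new IllegalArgumentException("UCS-4 data must be a multiple of 4 bytes");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < count; i += 4) {
            int v = ((raw[i] & 0xff) << shift1)
                  | ((raw[i + 1] & 0xff) << shift2)
                  | ((raw[i + 2] & 0xff) << shift3)
                  | ((raw[i + 3] & 0xff) << shift4);
            if (v < 0 || v > 0x10ffff) {
                throw new IllegalArgumentException("value out of range at offset " + i);
            }
            if (v < 0x10000) {
                out.append((char) v);                       // fits in one char
            } else {
                v -= 0x10000;
                out.append((char) (0xd800 | (v >> 10)));    // high surrogate
                out.append((char) (0xdc00 | (v & 0x3ff)));  // low surrogate
            }
        }
        return out.toString();
    }
}
```

Shifts of 24, 16, 8, 0 would correspond to big-endian input and 0, 8, 16, 24 to little-endian input.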

encodingError

void encodingError(java.lang.String message,
                   int value,
                   int offset)
             throws java.lang.Exception
Report a character encoding error.

Throws:
java.lang.Exception

initializeVariables

void initializeVariables()
Re-initialize the variables for each parse.


cleanupVariables

void cleanupVariables()
Clean up after the parse to allow some garbage collection. Leave around anything that might be useful for queries.