Digest: Computer Related: DataGlyphs: Xerox's technology for embedding data into documents


	Basic DataGlyphs are a pattern of forward and backward slashes representing ones and zeroes. This pattern forms an evenly textured field.

PARC DataGlyphs® are a robust and unobtrusive method of embedding computer-readable data on paper surfaces.

Unlike most barcodes, DataGlyphs are flexible in shape and size. Their structure and robust error correction also make them suitable for curved surfaces and other situations where barcodes fail.

PARC invented DataGlyphs in 1993, and has licensed the basic software patents to Microglyph Technology GmbH to form the foundation of their microglyph® code. While PARC developed DataGlyphs for document management systems, Microglyph Technology has developed additional, proprietary code structures and algorithms for parts-marking in the manufacturing industry, which allow computer-readable data to be embedded on surfaces such as plastic, glass, or metal.


	Original size: 3.3" x 3.3" @ 600dpi. Glyphtones, DataGlyphs of varying weights, emulate the look of a grayscale image.

Features:

Flexibility -- adjustable size, shape, color
High data density
Robustness
Adjustable error correction
Compatible with cryptography

At 600dpi, DataGlyphs offer up to 1KB per square inch of data. At this density, the Gettysburg Address fits in a block the size of a small United States postage stamp.

Applications:

document management
fraud prevention
inventory tracking
ID cards
parts marking
product tagging

DataGlyphs have been used in several Xerox products, licensed to a major manufacturer of airplane parts, licensed to Progressive Casualty Insurance for use in turnaround documents, and more. Other markets include financial services, software, government, health care, and pharmaceuticals.


Original size: 5.9" x 4.8" @ 600dpi. Color DataGlyphs provide functionality similar to Glyphtones but extend the applications to color images.

"Invisible" DataGlyphs are fine yellow glyphs printed on white; they can be seen in the blue channel of a scanner. This drawing shows them at 200% and 1000% enlargement.

Combining different types of DataGlyphs increases the options for encoding digital information.

Technical Overview of DataGlyphs®

PARC DataGlyphs are a robust and unobtrusive method of embedding computer-readable data on surfaces such as paper, labels, plastic, glass, or metal.

How Data Is Encoded

DataGlyphs encode information - text, data, graphics - in thousands of tiny glyphs. Each glyph consists of a 45-degree diagonal line, as short as one one-hundredth of an inch or less, depending on the resolution of the printing and scanning that's used. Each glyph represents a single binary 0 or 1, depending on whether it slopes to the right or left.

Leaning either forward or back, the glyphs represent the ones and zeroes of binary digital data.
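The bit-to-glyph mapping described above can be illustrated with a minimal Python sketch. It is not PARC's encoder: the function names are hypothetical, the choice of "/" for 1 and "\" for 0 is an assumption (the text does not say which slope means which), and the synchronization lattice, error correction, and randomization are all left out.

```python
def bytes_to_glyphs(data: bytes, width: int = 16) -> str:
    """Render each bit as a slash: '/' for 1, '\\' for 0 (assumed mapping)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    glyphs = "".join("/" if bit == "1" else "\\" for bit in bits)
    # Break the glyph stream into rows to form a regular grid.
    return "\n".join(glyphs[i:i + width] for i in range(0, len(glyphs), width))


def glyphs_to_bytes(field: str) -> bytes:
    """Read the slashes back into bytes, ignoring row breaks."""
    bits = "".join("1" if ch == "/" else "0" for ch in field if ch in "/\\")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    field = bytes_to_glyphs(b"PARC DataGlyphs")
    print(field)
    print(glyphs_to_bytes(field))
```

Printed in a monospaced font, the output already shows the evenly textured look the text describes, since ones and zeroes occupy the same area and differ only in slope.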

The glyphs are laid down in groups on a regular, finely spaced grid forming unobtrusive, evenly textured gray areas. Even when individual glyphs are large enough to be resolved by the human eye, in groups they form a pleasing pattern that is not distracting.

Robustness and Error Correction

In addition to the data, each DataGlyph contains an embedded synchronization lattice or skeleton - a repeating fixed pattern of glyphs that marks the DataGlyph's boundaries and serves as a clocking track to improve the reliability of reading. Groups of glyphs, representing bytes of data, are laid down within this frame.

The data is grouped into blocks of a few dozen bytes each, and error-correction code is added to each block. Individual applications determine the amount of error correction necessary. Of course, higher levels of error correction require larger overall DataGlyphs for a given amount of data, but improve the reliability with which the information can be read back - often a worthwhile trade-off, especially when the DataGlyph will sustain a high level of image noise (for example, during fax transmissions) or when the glyph block will be subjected to rough handling.

For reliability, each DataGlyph contains a measure of error correction appropriate to the application. The data are also randomized, so that they survive damage to the document, and laid into a synchronization frame.

As a final step, the bytes of data are randomly dispersed across the entire area of the DataGlyph. Thus, if any part of the DataGlyph is severely damaged, the damage to any individual block of data will be slight, and the error-correcting code will easily be able to recover all the information encoded, despite the damage.
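A small Python simulation can show why dispersal helps. This is a sketch under stated assumptions: the source does not say how DataGlyphs randomize byte positions, so a seeded shuffle stands in for it, the glyph area is flattened to a one-dimensional list of positions, and the block sizes and function names are illustrative only.

```python
import random


def dispersal_order(n_bytes: int, seed: int = 1993) -> list[int]:
    """Return order[position] = index of the data byte stored at that position.

    A seeded shuffle is an assumed stand-in for the real randomization.
    """
    order = list(range(n_bytes))
    random.Random(seed).shuffle(order)
    return order


def damage_per_block(n_blocks, block_size, damaged_positions, order=None):
    """Count how many damaged bytes land in each error-correction block."""
    counts = [0] * n_blocks
    for pos in damaged_positions:
        data_index = order[pos] if order is not None else pos
        counts[data_index // block_size] += 1
    return counts


n_blocks, block_size = 16, 32          # 16 blocks of 32 bytes (illustrative)
total = n_blocks * block_size
burst = range(100, 132)                # one contiguous damaged region, 32 bytes

contiguous = damage_per_block(n_blocks, block_size, burst)
dispersed = damage_per_block(n_blocks, block_size, burst, dispersal_order(total))

if __name__ == "__main__":
    print("worst block without dispersal:", max(contiguous))
    print("worst block with dispersal:   ", max(dispersed))
```

Without dispersal the burst wipes out most of one block; with dispersal the same 32 damaged bytes are spread thinly over many blocks, so each block's error-correcting code has only a few errors to repair.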

Together, built-in error correction code and data randomization give DataGlyphs a very high level of reliability, even in the face of damage from ink marks, staples, coffee spills, and the other vicissitudes of a paper document's life.

Superior Data Density

The amount of data that can be encoded in a DataGlyph of a given size will vary with the quality of the imprinting and scanning equipment to be used.

DataGlyphs offer a data density nearly twice that of PDF417, one of the most popular forms of 2D barcodes.

For example, with one- and two-dimensional bar codes, the minimum feature size that can be used is 0.0075 inch - three dots at 400 dpi. At that density, and with minimal height, Code 39 (the most commonly used general-purpose linear bar code) can only achieve a density of about 25 binary bytes per square inch. Code 128 can achieve about 40 bytes per square inch.

The two-dimensional bar codes, such as PDF417, do much better. PDF417 achieves a maximum data density of 2,960 bits (or 370 binary bytes) per square inch, with no error correction, at 400 dpi. But with realistic error correction of 27%, the effective data density for PDF417 is about 270 bytes per square inch.

At the same resolution and level of error correction, DataGlyphs can carry nearly 500 bytes per square inch.
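The arithmetic behind these figures can be checked in a few lines of Python. The sketch assumes that "27% error correction" means 27% of the raw capacity is spent on redundancy, which is consistent with the 370 and 270 figures quoted above; the function name is illustrative, and 500 is used for the "nearly 500" DataGlyph figure.

```python
def effective_density(raw_bytes_per_sq_in: float, ecc_fraction: float) -> float:
    """Bytes of payload per square inch after reserving space for redundancy."""
    return raw_bytes_per_sq_in * (1.0 - ecc_fraction)


pdf417_raw = 2960 / 8                                    # 370 bytes per square inch
pdf417_effective = effective_density(pdf417_raw, 0.27)   # about 270
dataglyph_effective = 500                                # "nearly 500" per the text
ratio = dataglyph_effective / pdf417_effective           # about 1.85

if __name__ == "__main__":
    print(f"PDF417 raw:       {pdf417_raw:.0f} bytes/sq in")
    print(f"PDF417 effective: {pdf417_effective:.0f} bytes/sq in")
    print(f"DataGlyph/PDF417: {ratio:.2f}x")
```

A ratio of roughly 1.85 is what the text means by "nearly twice" the density of PDF417.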

As with other visual encoding schemes, the density of DataGlyph encoding is determined by four factors:

The resolution at which the encoding is created and scanned. High-resolution devices such as office laser printers and document scanners permit denser marking patterns, and thus denser encoding, than low-resolution devices such as dot-matrix printers and fax machines.
The amount of error correction used. The process of printing and scanning unavoidably degrades image-encoded data. In high-density encoding, where the print and scan defects are likely to be large compared to the encoding feature size, more of the encoding features will be lost or misread as a result of such degradation. As a countermeasure, some form of redundancy (that is, error correction) must be used to keep the failure rate within reasonable bounds. Redundant coding consumes extra space, reducing the effective data density. How much error correction must be employed will vary from application to application, but some is always needed in any real-world application of any encoding scheme. The data densities that can be achieved using no error correction are theoretical upper bounds, unlikely to be of practical use.
The data compression used. Data can be compressed as it's encoded. For example, if all the data is numeric, there's no need to use one byte (8 bits) per digit; 3.32 bits will suffice. When text is encoded, it can be compressed by factors of two or more by means of character encoding or other compression techniques. For example, the full text of the Gettysburg Address, often used to demonstrate high-density encoding, contains 268 words, or about 1,450 characters, but the entire speech can easily be represented in less than 900 bytes.
The fixed overhead of the synchronization frame and header. For DataGlyphs, the synchronization frame is a fixed proportion of the data area. DataGlyphs also have a very small fixed header.
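The 3.32 bits per digit mentioned in the compression factor is log2(10), the information content of one decimal digit. The Python sketch below shows one way to approach that bound by treating the whole digit string as a single integer. It illustrates the principle only; the source does not describe how DataGlyphs actually pack numeric data, and the function names are hypothetical.

```python
import math


def pack_digits(digits: str) -> bytes:
    """Pack a non-empty string of ASCII decimal digits at about 3.32 bits each."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of ASCII digits")
    n_bytes = math.ceil(len(digits) * math.log2(10) / 8)
    return int(digits).to_bytes(n_bytes, "big")


def unpack_digits(packed: bytes, n_digits: int) -> str:
    """Recover the digit string; n_digits restores any leading zeroes."""
    return str(int.from_bytes(packed, "big")).zfill(n_digits)


if __name__ == "__main__":
    number = "0123456789012345"            # 16 digits, 16 bytes as plain text
    packed = pack_digits(number)
    print(f"bits per digit: {math.log2(10):.2f}")
    print(f"{len(number)} digits -> {len(packed)} bytes")
    print(unpack_digits(packed, len(number)))
```

The decoder needs the digit count to restore leading zeroes, so a real format would have to carry that length somewhere, for instance in a header.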

The size of the DataGlyph required to encode 80 bytes of information depends on the device(s) to be used for printing and/or scanning. DataGlyphs for faxing are often drawn disproportionately large for added reliability in the face of the "noise" that frequently affects fax images.