anastigmatix.net

  • Extensible PostScript I/O
  • StreamIO: reference
  • StreamIO dictionary contents
  • copyfile
  • extfilter
  • flushn
  • flushthru
  • hold
  • holdfile
  • holdstring
  • nullsrc
  • nulltgt
  • rfile wfile afile r+file w+file a+file
  • stdin stdout stderr
  • Details of individual filters
  • SourceArrayDecode
  • SourceQueueDecode
  • TapDecode
  • StringQueueEncode
  • TeeEncode
  • Additional filters
  • DSCDecode
  • Defining new filters
  • Example
  • Filter naming
  • The .decodehelper procedure
  • Miscellany
  • filterCloseBug
  • filterFlushfileBug
  • StreamIO: I/O conveniences and definable filters in pure PostScript

    StreamIO is a small resource providing some convenience procedures for file and stream I/O built on the existing facilities in LanguageLevel 2 and later PostScript.

    The most significant is extfilter, which is analogous to PostScript's own filter, but allows new filters to be defined by ordinary PostScript code. The set of standard filters in PostScript can be enumerated (using resourceforall in the Filter category), but cannot be extended by any means in standard PostScript. Reportedly, some products do permit new filters to be added, but by product-specific means outside of PostScript proper. With extfilter, the same can be achieved in portable code.

    StreamIO supplies a small set of predefined filters useful for doing slightly more elaborate plumbing in PostScript programs. One of the motivating needs for StreamIO was a portable, systematic way to handle the common task of peeking part way into an inline data stream and then rereading it from the start. The task arises, for example, in a procedure to grovel the dimensions out of an inline image before placing it.

    The ReusableStreamDecode filter introduced by Adobe in LanguageLevel 3 would seem to be an obvious fit, but the fit is disappointing. Besides being available only in LanguageLevel 3, it defers so much behavior to the choice of implementation as to leave very little to rely on. It may (or may not) pre-read the entire inline data stream, even when AsyncRead is requested, and that is a large resource demand, with a risk of resource exhaustion, when typically only the header of an image needs to be groveled.

    A simple solution in a few steps is possible with the filters provided in StreamIO:

    1. Create a StringQueueEncode filter. To ensure that the queue keeps its place if save/restore are executed while reading the input, set global mode temporarily just to allocate the initial empty queue (currentglobal true setglobal 1 array exch setglobal). The filter will honor the queue's original allocation mode.
    2. Create a TapDecode filter on the original inline data stream, giving the filter from step 1 as the TapTarget.
    3. Grovel the header by reading the filter from step 2. All data read from the stream will also be tapped off to the StringQueueEncode filter.
    4. Close the StringQueueEncode filter to flush its data to the queue.
    5. The original file object now represents all the remaining data not yet read. If it is known to be global, simply add it to the end of the queue and then create a SourceQueueDecode filter, from which the entire stream can be read from the beginning. However, if the file object is not global—and you cannot safely assume an arbitrary file such as currentfile is—it cannot be stored in a global queue. For that possibility, it is better to create a SourceQueueDecode filter from the queue without adding the original file, and store that filter and the original file object in a two-element locally allocated array; create a SourceArrayDecode filter from that. SourceArrayDecode is specified to keep its place in the source array regardless of save/restore, whether the array is local or global.
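
    The five steps above can be sketched as a single procedure. This is only an illustration under stated assumptions: peekrewind and its local names are hypothetical (not part of StreamIO), the buffer size of 512 is arbitrary, the StreamIO dictionary is assumed to be on the dictionary stack when the procedure runs, and local allocation mode is assumed to be current so that the scratch dictionary and the two-element array are local.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin
userdict begin   % writable dictionary on top; StreamIO stays on the lookup stack

% src nbytes  peekrewind  header file
% Hypothetical helper (not part of StreamIO). Returns up to nbytes read from
% src, plus a file that rereads src from the point where peeking began.
% Assumes local allocation mode is current when it is called.
/peekrewind {
  5 dict begin
    /n exch def  /src exch def
    % Step 1: empty queue in global VM, then the StringQueueEncode filter.
    /q currentglobal true setglobal 1 array exch setglobal def
    /enc q << /BufferSize 512 >> /StringQueueEncode extfilter def
    % Step 2: everything read from src through tap is also copied to enc.
    /tap src << /BufferSize 512 /TapTarget enc >> /TapDecode extfilter def
    % Step 3: grovel the header; here we simply read up to n bytes.
    tap n string readstring pop
    % Step 4: flush the tapped data onto the queue.
    enc closefile
    % Step 5: replay the queue, then continue with the remainder of src.
    % src is not assumed global, so it goes in a local two-element array.
    [ q << /BufferSize 512 >> /SourceQueueDecode extfilter  src ]
    << /BufferSize 512 >> /SourceArrayDecode extfilter
  end
} def
```

    A caller might then write (image.dat) (r) file 32 peekrewind, leaving the header string and the rereadable file on the stack; the file name is again only a placeholder.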

    StreamIO: reference

    StreamIO is a ProcSet resource. To make it available to your own code, include in the setup section of your file:

    /net.anastigmatix.StreamIO /ProcSet findresource
    

    The findresource will succeed, leaving a dictionary on the operand stack, if you have made the StreamIO resource file [download] available in any of these ways:

    StreamIO relies on a few other resources, and you will need those files also. If you use the first method, directly including resources in your file's prolog, the prolog has to contain all of the needed resources, in any order so long as no resource comes before one it depends on, and categories come before resources that belong in them. Any of the other methods should just work as long as all the files are where they need to be. These are the resources you will need:

    Resource | Category | Description
    net.anastigmatix.MetaPre | ProcSet | Staged-programming extensions for PostScript
    net.anastigmatix.filter | Category | Category to contain filter resources usable with StreamIO.
    additional filters | net.anastigmatix.filter | The net.anastigmatix.filter resource category is not limited to the predefined filters described here. New filter types can be defined in your own PostScript code or made available as additional resources of type net.anastigmatix.filter, and would have to be downloaded or placed in a document prolog before use. The filters described below, however, are included in the net.anastigmatix.filter category without any additional download.
    net.anastigmatix.StreamIO | ProcSet | The main attraction.

    The resource files are in a compact form. That is for efficiency, not to keep you from viewing them; there is a script for that on the resource packaging page.

    The StreamIO dictionary may be placed on the lookup stack (with begin) for convenient access to the definitions in it, without the bother of get and exec. The dictionary is read-only, so before creating any definitions, you will want either userdict begin or your own dict begin so that you have a writable dictionary on top of the dictionary stack.
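
    A minimal sketch of the two ways of reaching a StreamIO definition, with and without begin. The name scratch is a hypothetical definition used only to show where new definitions land.

```postscript
% With the dictionary on the lookup stack:
/net.anastigmatix.StreamIO /ProcSet findresource begin
userdict begin                  % writable dictionary for our own definitions
  /scratch 128 string def       % hypothetical definition; it lands in userdict
  nullsrc closefile             % StreamIO names resolve by ordinary lookup
end                             % userdict
end                             % StreamIO

% Without begin, the same procedure is reached through get and exec:
/net.anastigmatix.StreamIO /ProcSet findresource /nullsrc get exec
closefile
```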

    StreamIO dictionary contents

    This section describes the contents of the read-only dictionary that is returned by /net.anastigmatix.StreamIO /ProcSet findresource.

    copyfile
    src tgt copyfile

    Reads all contents from the current position of src to the end, writing to tgt. If src or tgt is a string rather than a file object, it will be taken as a file name and opened for reading or writing, respectively. copyfile does not close either file on completion, which is reasonable when file objects are used (since the calling code can close them at the appropriate time) and also convenient with special named files, as in

    f (%stdout) copyfile

    to copy from f to standard output (which ought not to be closed). The file name form should not be used in other cases where the program needs control over when the file is closed.
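
    When the program does need control over closing, file objects can be used instead of names. A short sketch, assuming hypothetical file names input.dat and output.dat:

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin
userdict begin

/in  (input.dat)  (r) file def   % hypothetical file names
/out (output.dat) (w) file def
in out copyfile                  % copies from in's current position to its end
out closefile                    % copyfile leaves closing to the caller
in closefile
```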

    extfilter
    src|tgt dict name|proc extfilter file

    Creates and returns a filtered file, analogously to the native filter operator, except that filter parameters are always supplied in dictionary form (the form used for some native filters in LanguageLevel 2, and usable for all of them in LanguageLevel 3); the dictionary is never optional. The type of src|tgt is typically a file, string, or procedure, as for the native filters, but the documentation for some special-purpose filters may specify other types for src or tgt.

    The filter type may be specified by name, referring to a resource defined in the category net.anastigmatix.filter, just as native filter names are found in category Filter. Unlike Filter, however, which is an implicit category, net.anastigmatix.filter is an ordinary resource category and new filter types can be constructed with pure PostScript code and registered with defineresource, as described below.

    As a convenience in development, the filter may also be given as a proc in the form appropriate to register in the net.anastigmatix.filter category, without registering it.

    If the filter is given as name and not found in the net.anastigmatix.filter category, it is assumed to be a standard filter known to filter. This permits extfilter to be used to set up any filter whether standard or user-defined, and allows for user-defined filters to supersede standard ones or provide them if the implementation does not. For example, /FlateDecode extfilter could work either on a Level 3 interpreter with native FlateDecode, or on an older interpreter with a PostScript implementation of FlateDecode downloaded in the net.anastigmatix.filter category.

    Like filter, extfilter returns a globally-allocated file object if, and only if, the src|tgt and any composite objects retained from dict are globally allocated, without regard to the current allocation mode.

    Decoding filters created with extfilter do not have a CloseSource parameter, as there is no way in pure PostScript to implement it.
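
    As a small illustration of the calling convention, the sketch below sets up a standard filter through extfilter. It assumes that no filter named ASCIIHexDecode has been registered in net.anastigmatix.filter, so the name falls through to the native filter as described above.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin

% The parameter dictionary is always supplied, even when it is empty.
(48656C6C6F0A>) << >> /ASCIIHexDecode extfilter
(%stdout) copyfile      % the hex digits encode "Hello" and a newline
```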

    flushn
    src nbytes flushn

    Efficiently skips and discards the next nbytes bytes from src, or to the end of src if fewer than nbytes bytes remain.

    flushthru
    src string flushthru

    Efficiently skips and discards data from src through and including the next occurrence of string, or to the end of src if string does not occur.
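
    A brief sketch of both skipping procedures used on one file. The file name, the 512-byte header length, and the marker string are assumptions chosen only for illustration.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin
userdict begin

/f (report.ps) (r) file def       % hypothetical input file
f 512 flushn                      % discard 512 bytes, or everything if f is shorter
f (%%EndSetup) flushthru          % discard through the next "%%EndSetup"
% The next read from f begins with the byte just after the marker.
f closefile
```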

    hold
    proc int hold queue int

    Executes proc with a writable file object on top of stack, capturing everything proc writes on the file in a queue of strings, each string no larger than the int argument. Returns the queue and the total byte count written.

    holdfile
    proc int holdfile file int

    As for hold but returns a readable file instead of a queue of strings.

    holdstring
    proc int holdstring string

    As for hold but returns a string containing everything written by proc.
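
    A sketch of the hold family in use. The description above does not say whether proc is expected to consume the file object it is given; the procedures here do consume it, which is an assumption. The chunk size of 64 is arbitrary.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin

% Capture two writes as one string.
{ dup (Hello, ) writestring (world) writestring } 64 holdstring
% per the description, the captured text is now on the stack as a string
pop

% The same idea with hold, which returns a queue and a byte count.
{ (abc) writestring } 64 hold
% per the description: a queue of strings, then the number of bytes written
pop pop
```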

    nullsrc
    nullsrc file

    Produces a readable file with zero bytes available; as with the null device on some operating systems, there can be times it is convenient to supply such a file to an existing procedure.

    nulltgt
    nulltgt file

    Produces a writable file that accepts and discards all data written to it; as with the null device on some operating systems, there can be times it is convenient to supply such a file to an existing procedure.

    rfile wfile afile r+file w+file a+file
    string xxfile file

    The xxfile procedures are equivalent to the file operator with the corresponding access-mode string on the stack; for example, (f) rfile is equivalent to (f) (r) file, and (f) a+file to (f) (a+) file. They simplify writing code free of writable objects.

    stdin stdout stderr
    stdxx file

    These procedures push the corresponding standard file objects. They cannot simply be the standard file objects because they are defined in the resource dictionary, which may be global, and the file objects can be local. You have to execute the procedures to get the file objects. Still, //stderr exec is cleaner than (%stderr) (w) file when you want to write code free of writable objects.
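
    The conveniences above combine naturally. A short sketch, with input.dat as a hypothetical file name:

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin

(input.dat) rfile            % same as (input.dat) (r) file
dup stdout copyfile          % stdout is executed here and pushes the file object
closefile

(input.dat) rfile
dup nulltgt copyfile         % read everything and discard it
closefile

stderr (copy finished\n) writestring
stderr flushfile
```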

    Details of individual filters

    The following filters are predefined in the net.anastigmatix.filter resource category and always available to extfilter.

    SourceArrayDecode filter

    This filter requires not a single src but an array of sources, that is, an array whose elements are any combination of files, strings, or procedures acceptable as normal filter sources. As the resulting file is read, each source in the array supplies data until exhausted, and the resulting file reaches EOD when EOD is reached for the last source in the array.

    The filter dictionary has a single parameter BufferSize which is the size in bytes of the buffer to be used when reading from any element in the source array that has file type. There is no default and the parameter must always be supplied, but the buffer will not be allocated until an element of file type is encountered.

    The filter's progress through the array of sources will not be disturbed by save and restore, even if the array is allocated in local VM. The effect of save/restore on any individual source in the array, however, depends on that source.
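
    For instance, a file can be wrapped between two literal strings and read back as one stream. This is a sketch only; body.ps is a hypothetical file name and the buffer size is arbitrary.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin

[ (%!PS\n)  (body.ps) rfile  (showpage\n) ]    % string, file, string
<< /BufferSize 1024 >> /SourceArrayDecode extfilter
stdout copyfile                                % emits the three sources in order
```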

    SourceQueueDecode filter

    This filter has the same behavior as SourceArrayDecode but, instead of an array of sources, it accepts a queue of them, in the form used with the enq and deq operators in net.anastigmatix.MetaPre.

    As for SourceArrayDecode, there is one mandatory parameter BufferSize.

    An empty queue is created simply by 1 array. A source s is added to an existing queue q in one of the following ways:

    s q 2 array enq astore pop
    s q 2 array //enq exec astore pop

    The second form is recommended in resource code intended not to depend on the MetaPre resource dictionary being on the dictionary stack at run time.

    The queue should be allocated in global VM if there is any possibility that save/restore will be used while the filter is being read. The filter's progress through a locally-allocated queue can be altered by restore and, as the effect cannot be synchronized with the interpreter's own buffering, the result will not be predictable or useful.
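
    A sketch of building and reading a small queue, using the first of the two enq forms shown above exactly as given. For brevity the queue is allocated in local VM, so the sketch assumes that save and restore are not used while the filter is being read; it also assumes the MetaPre dictionary can simply be placed on the dictionary stack to make enq visible.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin
/net.anastigmatix.MetaPre  /ProcSet findresource begin   % for enq
userdict begin

/q 1 array def                          % empty queue, local VM
(first, )  q 2 array enq astore pop     % add two string sources
(second\n) q 2 array enq astore pop
q << /BufferSize 256 >> /SourceQueueDecode extfilter
stdout copyfile                         % reads the sources in queue order
```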

    TapDecode filter

    This filter copies all data that it reads from src into the file given as the TapTarget parameter. Its parameter dictionary may contain the following parameters.

    Key | Type | Semantics
    BufferSize | integer | The size of buffer to be allocated and used if src is of file type. No buffer is allocated if src is a string or procedure, but the parameter must still be present.
    CloseTarget | boolean | If true, the file given as TapTarget will be closed as soon as EOD is reached on src. The filter does not flush or close the TapTarget file if this parameter is false, or if src is not read all the way to EOD. In those cases, the calling program must be sure to retain a copy of the file object given with TapTarget and explicitly flush or close it before assuming that all data read from src can be recovered from the resulting file. This parameter has a default of false.
    TapTarget | file | A file to which all data read from src is to be written. If CloseTarget is false or if the filter is not read all the way to EOD on src, the program must be sure to flush or close this file before assuming that all data read from src can be recovered. This parameter must be present.

    StringQueueEncode filter

    This filter requires a queue as its tgt operand. See SourceQueueDecode for details on constructing a queue. 1 array creates an empty queue.

    Data written to the filtered file will be accumulated in string objects placed on the queue. As with any encoding filter, be sure to flush or close it before assuming that the queue contains all data written. The filter honors the allocation mode of the queue, and uses the same mode for allocating strings to place on it.

    The parameter dictionary may have the following entries.

    Key | Type | Semantics
    BufferSize | integer | The maximum size of any individual string to be placed on the queue. There is no default and this parameter must be present.
    Count | array | If this parameter is present, it must have length at least 1 and its first element must be an int. The int will be incremented by the number of bytes written through the filter. Only after a flush or close can its value be relied on.
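
    A sketch of StringQueueEncode with a Count array. The names cnt, q, and enc are hypothetical, and the queue is local for brevity.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin
userdict begin

/cnt [ 0 ] def                  % first element receives the byte count
/q 1 array def                  % empty queue (local VM)
/enc q << /BufferSize 256 /Count cnt >> /StringQueueEncode extfilter def
enc (some bytes) writestring    % ten bytes
enc closefile                   % only now are the queue and the count reliable
cnt 0 get                       % should be 10, since the count started at 0
```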

    TeeEncode filter

    This filter requires not a single tgt but an array whose elements may be any combination of files, strings, or procedures acceptable as data targets to ordinary filters. All data written to the filtered file will be replicated and written to all of the targets. Its parameter dictionary has a mandatory BufferSize limiting the maximum size of any individual write to the underlying targets, and the standard CloseTarget parameter which, if true, ensures that the underlying targets will all be closed when the filtered file is.
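
    A sketch of writing one stream to two files at once; the file names are hypothetical.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin

[ (copy-a.txt) wfile  (copy-b.txt) wfile ]
<< /BufferSize 512 /CloseTarget true >> /TeeEncode extfilter
dup (written once, stored twice\n) writestring
closefile          % with CloseTarget true, both underlying files are closed too
```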

    Additional filters

    The following filters are not bundled in the net.anastigmatix.filter category resource, but are available as separate resource files.

    DSCDecode filter

    This filter provides support for reading and processing PostScript input that conforms to Adobe TN 5001, the Document Structuring Conventions (DSC). It maintains a nesting level that is incremented by %%BeginDocument: comments and decremented by %%EndDocument comments. The filter reaches EOD upon reading an %%EndDocument comment that decrements the nesting level to zero, or upon reading at nesting level 1 any comment whose keyword was supplied in the Keywords parameter array, or on reaching the end of the header comments if the HeaderOnly parameter is true. %%BeginData: and %%EndData comments are honored, and the verbatim data passed without scanning for keywords. Lines longer than the DSC-specified maximum of 255 characters are passed without alteration but are not scanned for comments.

    Once reading from the filter has begun, the position of the underlying src is indeterminate until the filter reaches EOD. At that point, if EOD was reached because of a comment line and not EOD on src, then the position of src depends on whether a Pushback entry was present in the filter dictionary:

    No Pushback present

    src is positioned just past a single carriage-return or line-feed byte that terminated the comment line. If the line was terminated with a carriage-return/line-feed sequence, the line-feed remains to be read from src, while in any other case the next character read from src is the first of the following line. As line-feed is considered whitespace, the difference is unimportant for many purposes, but should be checked if lines must be distinguished accurately.

    Pushback present

    The comment line has been consumed along with any directly following %%+ lines. The newline sequence that ends the last line consumed has also been completely consumed, whether a one-byte cr or lf or a two-byte crlf sequence, so when Pushback is used there is no special attention needed for accurate line counting. src is positioned up to three bytes beyond the last byte consumed, and the buffer supplied to Pushback contains those zero to three bytes and their count. If a new DSCDecode filter is opened to resume reading from src, correct behavior requires only that the same Pushback buffer be supplied in its filter dictionary without alteration. If other code will resume reading from src, the zero to three bytes in the Pushback buffer must be treated as preceding the next byte read from src. (This is the price of getting %%+ lines read automagically and not having to manipulate CountLF.)

    It is recommended that src be a file object. The position in a string or procedure source cannot be accurately determined after this filter has reached EOD.

    DSCDecode can be used as a simple filter for reading from an inlined document until the balancing %%EndDocument, or can be used to simplify scanning for particular comments. Simply creating a DSCDecode filter with certain keywords given in the Keywords parameter and flushing it with flushfile will cause the file to be scanned until the next matching comment is encountered at nesting level 1 (or another EOD condition is matched). Another filter can be created to resume reading from that point. The next filter should be created with NestLevel specifying the nesting level at which the previous filter terminated, and CountLF set to false if and only if the last character read from the previous filter was carriage-return when no Pushback buffer was used.
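
    A sketch of scanning ahead to the next %%Page: comment. It assumes the separately distributed DSCDecode resource is available, that document.ps is a DSC-conformant file (the name is hypothetical), and that Unwrap is used so that an unwrapped document is scanned at nesting level 1. Pushback and Status are supplied so the matching comment can be inspected afterward.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin
userdict begin

/src (document.ps) rfile def      % hypothetical DSC-conformant input
/pb  4 string def                 % Pushback buffer, zero-filled when new
/st  4 array  def                 % receives the matching comment

src << /Unwrap true  /Keywords [ (%%Page:) ]  /Pushback pb  /Status st >>
/DSCDecode extfilter flushfile    % scan ahead to the next %%Page: comment

st 0 get null ne {                % still null if src ended before any match
  st 1 get stdout exch writestring   % the remainder of the comment line
  stdout (\n) writestring
} if
```

    To resume scanning from the same point, a new DSCDecode filter would be created on src with the same pb buffer passed unaltered as Pushback.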

    The parameter dictionary may have the following entries:

    Key | Type | Semantics
    CountLF | boolean | Whether the filter should consider a line-feed as the first character read to represent an actual newline. This should be set to false if and only if an immediately-prior read ended with a carriage-return and the Pushback entry is not present, as in that case an initial line-feed should be considered part of a single CRLF newline sequence. There is no use for this entry when Pushback is used. Default: true.
    HeaderOnly | boolean | Whether the filter should reach EOD at the end of the header comments, that is, either at an explicit %%EndComments line, or the first line that does not begin %X where X is a character value between 33 and 126 decimal. When using this feature to scan through a header, care should be taken to set CountLF correctly when Pushback is not used, as a line-feed misinterpreted as a newline would incorrectly be taken to end the header. Default: false.
    Keywords | array | An array of strings representing comment keywords. The filter will reach EOD as soon as it has read, at nesting level 1, any comment line whose keyword is included in this array. A keyword must be specified by all characters from the first % through the terminating : if any, which is part of the keyword per TN 5001. Default: empty.
    NestDepth | integer | Specifies the initial value for the nesting level. %%BeginDocument: comments increment the level, %%EndDocument comments decrement it, and an %%EndDocument comment that decrements the level to zero is an EOD condition. Default: 0 (but see Unwrap).
    Pushback | string | A four-byte string to be used as a pushback buffer. Should be supplied filled with zeros (the condition of a newly-allocated string) on the first call.

        When this entry is present, all forms of newline are handled transparently, there is no need to fuss with CountLF, and %%+ lines are handled automagically. On the other hand, the same pushback buffer must be passed to the next DSCDecode filter opened to resume reading from src; any other code that will resume reading from src must treat zero to three bytes stored in this buffer as logically preceding the next read from src. Things have to be kept track of one way or the other.

        The last byte contains the count cnt of pushed-back bytes. Those bytes immediately precede the count byte and should be read in increasing-index order. So the following code would convert the buffer to a string of the buffered bytes in the right order:

        dup length 1 sub 2 copy get exch 1 index sub exch getinterval

    Status | array | A four-element array for returning status to the caller. If the filter reaches EOD by reading a line that matches an EOD condition and this array is supplied, then the matching line is returned in this array and not passed through to the filter's reader. If the line is a DSC comment, the array's first element is the keyword and the second is the remainder of the line, less the terminating CR or LF. The third element is the entire matching line including the final CR or LF, and the fourth is the nesting level.

        When a Pushback entry is used, the third element contains the entire matching text including the complete newline sequence, whether it is a single-byte cr or lf or a two-byte crlf sequence. For a DSC comment when Pushback is used, the strings returned in this array may represent not just a single line, but the line and all immediately-following %%+ continuation lines. In this case, the third element is the exact text consumed with comment symbols and newlines intact. The second element is the logical content, with the initial DSC keyword and intermediate %%+ symbols stripped out, and the intervening newlines, whatever their original form, replaced with single lf bytes. The lf bytes can be treated as simple whitespace when the position of continuations does not matter, while making it possible to parse comments such as %%DocumentNeededResources: where the TN 5001 examples show that several resource keys can appear after one category, but category must be repeated after a continuation.

    Unwrap | boolean | Enables a feature useful for reading DSC-conformant input interchangeably from an external file or inlined between %%BeginDocument: and %%EndDocument in a larger file. If the first line read is a %%BeginDocument: comment, then the filter behaves normally except that this first line and its balancing %%EndDocument are not passed through to the filter's reader. If the first line read is anything else, the nesting level is incremented to 1 (just as if a %%BeginDocument: comment had been read), and the filter thereafter behaves normally.

    Defining new filters

    A new filter type is defined by writing a PostScript procedure and registering it in the net.anastigmatix.filter category with defineresource. When extfilter is invoked to set up an instance of the filter, it calls this procedure with the src|tgt and the parameter dictionary dict on the stack. The procedure must consume these two and return two items on the stack: an array and the name Encode or Decode to identify its filter direction.

    The first element of the array must be a procedure that will be called to handle reading or writing of the filter. If the array length is greater than one, the remaining elements are nobody's business but the filter's.

    For a decoding filter, the procedure in the array's first element will be called when data must be read. It is passed the entire array on the stack, which it may consult and modify, and must return a string. It can return a zero-length string to signal EOD.

    For an encoding filter, the procedure is called when data must be written, and is passed three items on the stack: the array, a string, and a boolean. Except for receiving the array as an additional argument, the procedure must behave exactly as described in the PLRM filter section for a procedure as data target.
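
    The decoding protocol can be seen in its barest form in a pass-through filter given directly as a proc, without registering it. This is a sketch under assumptions: src must be a file, the 256-byte read size is arbitrary, a fresh string is allocated on every call for simplicity, and input.dat is a hypothetical file name.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin

(input.dat) rfile  << >>
{ % src dict -> state-array /Decode
  pop                                        % this filter takes no parameters
  { 1 get 256 string readstring pop }        % service proc: state -> string
  exch 2 array astore                        % state array: [ proc src ]
  /Decode
} extfilter
stdout copyfile
```

    When src is exhausted, readstring yields a zero-length string, which the service procedure returns to signal EOD.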

    Example

    Let's implement a ROT13Decode filter. It will have just a single parameter, BufferSize, and will use a three-element array to keep its state. The first element will be the service procedure, as required. The second will be used to remember the src to read from, which for simplicity we will convert to a file (using a SubFileDecode filter) if it is a string or procedure. The array's third element will be used for a buffer that, again for simplicity, will be allocated unconditionally. The example is optimized for clarity rather than performance.

    currentglobal true setglobal
    /r13rd {
      dup 1 get exch 2 get readstring pop
      dup 0 1 2 index length 1 sub { %for   stack: buf buf i
        2 copy get                   %      stack: buf buf i c
        dup dup 16#41 ge exch 16#5A le and { 16#41 sub 13 add 26 mod 16#41 add } if
        dup dup 16#61 ge exch 16#7A le and { 16#61 sub 13 add 26 mod 16#61 add } if
        put dup
      } for % stack: buf buf
      pop
    } bind def
    setglobal
    
    /ROT13Decode { % src dict *ROT13Decode* state-array /Decode
      //r13rd
      3 1 roll /BufferSize get % stack: proc src size
      1 index type /filetype ne { exch 0 () /SubFileDecode filter exch } if
      1 index gcheck setglobal
      string % stack: proc src buf
      3 array astore /Decode          % stack: [proc src buf] /Decode
    }
    currentdict /r13rd undef
    bind /net.anastigmatix.filter defineresource pop

    Two techniques give the filter its correct memory-allocation properties. First, the reading procedure is factored out of the filter setup procedure so that it can be made unconditionally global even if the setup procedure is not. Its allocation occurs at the time the code is scanned. Second, the setup procedure, at the time it is used, checks whether the src parameter (for ROT13, the only composite parameter that will be retained) is global, and sets the allocation mode to match before proceeding to allocate space. It does not need to remember and restore the prior allocation mode, because extfilter itself looks after that.

    If the state array will be in global VM, the buffer must be also (or it can't be stored in the state array). On the other hand, if the state array is in local VM it's ok for the buffer to be too, because contents of strings aren't molested by save and restore. So it's enough to allocate the buffer in whatever arena the array is placed in.

    Here is an example of the filter in use:

    /net.anastigmatix.StreamIO /ProcSet findresource begin
    
    (How can you tell an extrovert from an
    introvert at NSA? Va gur ryringbef,
    gur rkgebiregf ybbx ng gur BGURE thl'f fubrf.)
    <</BufferSize 80>> /ROT13Decode extfilter (%stdout) copyfile

    A ROT13Encode filter would be nearly identical, except that the service procedure is called with three items on the stack rather than one, and the filter should accept and honor a CloseTarget parameter.

    Filter naming

    Anastigmatix-developed filters in the net.anastigmatix.filter resource category have simple names. To avoid naming conflicts, code from other sources that defines new filters should give them inverted-domain-style names (along the lines of com.example.ROT13Decode).

    The .decodehelper procedure

    To simplify writing decode filters a little more efficient than the example, a procedure called .decodehelper is defined in (where else?) the category implementation dictionary for net.anastigmatix.filter (that is, the dictionary obtained with /net.anastigmatix.filter /Category findresource). The procedure takes the same two stack arguments passed to a filter procedure, namely the source and the parameter dictionary, and it takes care of whether the source is a file, string, or procedure, and buffer allocation if necessary according to the BufferSize parameter in the dictionary. Like any filter procedure, it returns a state array (but it does not return the name Decode), and the first element is a service procedure. The filter being developed can remember this array in its own state array, and obtain data at read time by placing this array on the stack and calling the service procedure. A string is returned, zero length if EOD has been reached.

    To refer to the helper procedure, the category implementation dictionary should be temporarily pushed on the dictionary stack, and .decodehelper referenced with // and exec:

    /net.anastigmatix.filter /Category findresource begin
    
    /ROT13Decode {
    ... //.decodehelper exec ...
    }
    end bind /net.anastigmatix.filter defineresource pop
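
    As one possible way of filling in the outline above, here is a sketch of a byte-counting decode filter built on .decodehelper. The filter name com.example.CountDecode and its Count parameter are hypothetical; the sketch assumes the helper consumes src and dict as a filter procedure would, relies on the caller to supply BufferSize for the helper, and ignores VM allocation concerns (it would fail if a local Count array had to be stored in a global state array).

```postscript
/net.anastigmatix.filter /Category findresource begin

/com.example.CountDecode { % src dict -> state-array /Decode
  dup /Count get 3 1 roll         % cnt src dict
  //.decodehelper exec            % cnt helper-state
  { % service proc: state -> string
    dup 1 get dup 0 get exec      % state string   (ask the helper for data)
    exch 2 get                    % string cnt
    dup 0 get 2 index length add  % string cnt newtotal
    0 exch put                    % string
  }
  exch 3 -1 roll                  % proc helper-state cnt
  3 array astore /Decode          % [ proc helper-state cnt ] /Decode
}
end bind /net.anastigmatix.filter defineresource pop
```

    A caller would pass something like << /BufferSize 512 /Count [ 0 ] >> and read the first element of the Count array once the filter has reached EOD.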

    Miscellany

    filterCloseBug

    Some PostScript interpreters have been known to be buggy when it comes to closing an encoding filter. The result can be incomplete output if the filter never gets the signal to flush the data still in its buffers. Certain versions of Ghostscript were affected by the bug (it was their bug #688326). There may be other interpreters with similar bugs.

    The bug is easy to test for, and StreamIO tests for it when loaded. The boolean filterCloseBug in the StreamIO dictionary will be true if the interpreter is found to have such a bug. There is no fully transparent workaround. A non-transparent workaround can be added to StreamIO if there are enough buggy interpreters left to justify it. Calling code would have to take extra steps to use it. For now, only the boolean flag is there, which at least you can check if something doesn't seem to be working right. If the flag is true, it could explain a problem. If nothing prevents upgrading to a working PostScript interpreter, that's the best solution.

    filterFlushfileBug

    The boolean filterFlushfileBug in the StreamIO dictionary will be true on an interpreter where flushfile incurs an error if used on a decode-filter file object backed by a procedure. This has been seen in level 2 and level 3 versions of Hewlett-Packard's knockoff interpreter. On those interpreters a sufficient workaround is to wrap any decode filter backed by a proc in an extra trivial SubFileDecode filter. extfilter applies this workaround automatically if this flag is true.
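
    Both flags are ordinary booleans in the StreamIO dictionary, so a program can check them at startup. A minimal sketch; the message texts are only illustrative.

```postscript
/net.anastigmatix.StreamIO /ProcSet findresource begin

filterCloseBug {
  stderr (warning: closing an encoding filter may lose buffered data\n) writestring
} if
filterFlushfileBug {
  stderr (note: extfilter will wrap procedure-backed decode filters\n) writestring
} if
stderr flushfile
```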

    $Id: StreamIO.html,v 1.10 2009/12/03 05:28:32 chap Exp $