File-like objects are near-ubiquitous in Python. They provide a clear and easy-to-use interface for reading from or writing to any number of different types of data store. At the basic level, they are beautifully simple - for an object to be considered 'file-like' it only requires a read() method for reading data, and/or a write() method for writing data.
However, the full file-like interface is considerably more complex. On top of read(), there are methods to read a full line and to iterate over the lines in the file. The write() method can be accompanied by writelines(). It's nothing a little buffering and boilerplate wont solve, but who wants to do all that just to be considered "file-like"? And we haven't even started to consider support for seek() and tell()...
That's where filelike comes in.
filelike is designed to make it easy to support the entire rich file-like interface while writing as little code as possible. It provides convenient facilities for checking whether objects support the file-like interface, and for wrapping a variety of common objects in such an interface. And it provides a bunch of nifty classes for manipulating file-like objects: concatenating them together, accessing slices of them, encrypting, compressing and transforming them.
This software is licensed under the GNU Lesser General Public License (LGPL) version 2.1 or later. See the file LICENSE.txt in the source distribution for details. If the LGPL is a showstopper for you, please let me know and I'm sure we can work something out.
The following is a quick run-through of the current features of the filelike module. For a more complete overview, the pydoc-generated API documentation is also available.
The main class in the filelike module is FileLikeBase. This is an abstract base class that implements the entire rich file-like interface on top of four primitive methods: _read(), _write(), _seek() and _tell(). Subclasses can implement any or all of these methods to immediately obtain the related higher-level file behaviors. For more details, see the section on writing subclasses below.
The module also provides two handy functions when expecting file-like objects. is_filelike() checks whether a given object implements the rich file-like interface, while to_filelike() attempts to wrap an object in an appropriate file-like interface.
Using these building blocks, it is straightforward to create a variety of useful classes for manipulating file-like objects. The base filelike module provides two such functions: join concatenates several file-like objects together, while slice accesses a fixed portion of a file-like object:
>>> from StringIO import StringIO as SIO >>> f = filelike.join([SIO("hello"),SIO(" "),SIO("world")]) >>> f.read() 'hello world' >>> f = filelike.slice(SIO("hello there world"),6,11) >>> f.read() 'there' >>>
As long as the underlying file objects support it, these joins and slices can be read from, written to, seeked around and generally treated as if they were a standard file object.
Several more sophisticated "wrapper" classes have been implemented in the filelike.wrappers module, including:
- Translate: pass file contents through a translation function (think: compression, encryption, ...)
- Decrypt: on-the-fly reading and writing of an encrypted file (using the PEP 272 cipher API)
Here's a short example using the Decrypt class:
>>> # Create the decryption key >>> from Crypto.Cipher import DES >>> cipher = DES.new('abcdefgh',DES.MODE_ECB) >>> # Open the encrypted file >>> from filelike.wrappers import DecryptFile >>> f = DecryptFile(file("some_encrypted_file.bin","r"),cipher)
The object in f now behaves as a file-like object, transparently decrypting the file on-the-fly as it is read.
The module filelike.pipeline uses a bit of operator abuse to allow these wrappers to be composed in the style of a unix pipeline. Suppose that one wants to read only the first five lines of a decrypted file. The object f constructed below will do so:
>>> # Create the decryption key >>> from Crypto.Cipher import DES >>> cipher = DES.new('abcdefgh',DES.MODE_ECB) >>> # Open the encrypted file >>> from filelike.pipeline import DecryptFile, Head >>> f = file("some_encrypted_file.bin","r") > DecryptFile(cipher) | Head(lines=5)
Finally, the function filelike.open() acts much the same as the standard open() function, but does some clever things such as fetching URLs and decompressing on the fly. It's trivial to read from a remote, compressed file like so:
>>> f = filelike.open("http://www.rfk.id.au/static/test.txt.bz2")
As usual, the best way to demonstrate is by example. Here is a cheap imitation of StringIO using the facilities provided by FileLikeBase:
import filelike class MyStringIO(filelike.FileLikeBase): def __init__(self,string): super(MyStringIO,self).__init__() self._string = string self._pos = 0 def _read(self,sizehint=-1): if self._pos >= len(self._string): return None if sizehint < 0: newPos = len(self._string) else: newPos = min(self._pos + sizehint,len(self._string)) data = self._string[self._pos:newPos] self._pos = newPos return data def _write(self,data,flushing=False): newPos = self._pos + len(data) newString = self._string[:self._pos] newString = newString + data newString = newString + self._string[newPos:] self._string = newString self._pos = newPos _seek(self,offset,whence): if whence > 0: raise NotImplementedError self._pos = offset return None def _tell(self): return self._pos
The _read() method behaves like the normal read() method of the file object, but is not required to make any guarantees about the number of bytes it will return. The calling code may request a certain number of bytes, but this is only a hint - more or less bytes may actually be returned, and FileLikeBase takes care of the necessary buffering to ensure that the correct number of bytes is passed back to the calling code. To properly support this buffering it must return None instead of "" to indicate EOF.
The _write() method is similarly flexible in that it need not write all the data straight away. Any data that cannot be written should be returned, and it will be buffered by FileLikeBase for later writing. Since it must be possible to flush the write buffers, _write() must accept a keyword argument 'flushing' to indicate when this is occurring. If 'flushing' is set to True, all of the given data must be written.
The _seek() method should set the internal position of the file to approximately the specified position. If it cannot be set to exactly the position specified, it can be set to some smaller position. In this case _seek() must return the current data in the file between the requested and final position, which will be used at the next read or write to account for this discrepency. FileLikeBase knows how to simulate other seek modes (whence > 0) in terms of an absolute seek, and will do so if NotImplementedError is raised.
Finally, the _tell() method should return the current position of the internal file pointer.