Changeset - 18eebbc0ed28
Brett Smith - 6 years ago 2017-12-19 15:19:56
brettcsmith@brettcsmith.org
hooks: run() return value controls processing of entry data.

Instead of using in-band signaling with the entry_data dict.
I don't know why I didn't think of this in the first place.
4 files changed with 23 insertions and 11 deletions:
0 comments (0 inline, 0 general)
CODE.rst
...
 
@@ -6,93 +6,104 @@ Concepts
 

	
 
The main workflow of the program passes through three different types with different responsibilities.
 

	
 
Entry data
 
~~~~~~~~~~
 

	
 
Data for an output entry is kept and passed around in a dict with the following contents:
 

	
 
``date``
 
  A datetime.date object (if this is omitted, the ``default_date`` hook will fill in the default date from the user's configuration)
 

	
 
``payee``
 
  A string
 

	
 
``amount``
 
  A string or other object that can be safely converted to a decimal.Decimal
 

	
 
``currency``
 
  A string with a three-letter code, uppercase, identifying the transaction currency
 

	
 
It can optionally include additional keys for use as template variables.
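
As an illustration, an entry data dict with the four documented keys might look like the sketch below. The values and the extra ``entity`` key are assumptions chosen for the example, not data from the project.

```python
import datetime
import decimal

# Illustrative values only; the four core keys follow the list above.
entry_data = {
    'date': datetime.date(2017, 12, 19),
    'payee': 'Alex Smith',
    'amount': '25.00',   # a string that decimal.Decimal can parse
    'currency': 'USD',   # three-letter uppercase code
    # Hypothetical extra key, made available to templates as a variable.
    'entity': 'Smith-Alex',
}

# The amount only needs to be safely convertible to a Decimal.
amount = decimal.Decimal(entry_data['amount'])
```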
 

	
 
Importers
 
~~~~~~~~~
 

	
 
At a high level, importers read a source file, and generate data for output entries.
 

	
 
Class method ``can_handle(source_file)``
 
  Returns true if the importer can generate entries from the given source file object, false otherwise.
 

	
 
``__init__(source_file)``
 
  Initializes an importer to generate entries from the given source file object.
 

	
 
``__iter__()``
 
  Returns an iterator of entry data dicts.
 

	
 
Class attribute ``TEMPLATE_KEY``
 
  A string with the full key to load the corresponding template from the user's configuration (e.g., ``'template patreon income'``).
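
A minimal sketch of a class that follows the importer interface described above. The class name, the CSV column names, the template key, and the hard-coded currency are all hypothetical; they are not taken from any importer in the project.

```python
import csv
import datetime
import io


class ExampleCSVImporter:
    """Hypothetical importer for a CSV with Date, Name, and Amount columns."""
    TEMPLATE_KEY = 'template example income'  # assumed configuration key
    NEEDED_FIELDS = frozenset(['Date', 'Name', 'Amount'])

    @classmethod
    def can_handle(cls, source_file):
        # Reading fieldnames consumes only the header row.
        reader = csv.DictReader(source_file)
        return cls.NEEDED_FIELDS.issubset(reader.fieldnames or ())

    def __init__(self, source_file):
        self.reader = csv.DictReader(source_file)

    def __iter__(self):
        for row in self.reader:
            yield {
                'date': datetime.datetime.strptime(row['Date'], '%Y-%m-%d').date(),
                'payee': row['Name'],
                'amount': row['Amount'],
                'currency': 'USD',  # assumption: this source is always in USD
            }


source = io.StringIO("Date,Name,Amount\n2017-12-19,Alex Smith,25.00\n")
entries = []
if ExampleCSVImporter.can_handle(source):
    source.seek(0)  # rewind before handing the file to the importer
    entries = list(ExampleCSVImporter(source))
```

Because ``can_handle`` reads from the file, the caller rewinds it before constructing the importer.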
 

	
 
Hooks
 
~~~~~
 

	
 
Hooks make arbitrary transformations to entry data dicts.  Every entry data dict generated by an importer is run through every hook before being output.
 

	
 
``__init__(config)``
 
  Initializes the hook with the user's configuration.
 

	
 
``run(entry_data)``
 
-  This method makes the hook's transformations to the entry data dict, if any.  If this method sets ``entry_data['_hook_cancel']`` to a truthy value, that entry will not be output.
 
+  This method can make arbitrary transformations to the entry data, or filter it so it isn't output.
 

	
 
+  If this method returns ``None``, processing the entry data continues normally.  Most hooks should do this, and just transform entry data in place.
 

	
 
+  If this method returns ``False``, processing the entry data stops immediately.  The entry will not appear in the program output.
 

	
 
+  If this method returns any other value, the program replaces the entry data with the return value, and continues processing.
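
The three return conventions for ``run()`` can be sketched with three small hooks. All three classes are hypothetical examples written for this illustration; none of them exist in the project.

```python
import decimal


class UppercasePayeeHook:
    """Hypothetical hook: transforms in place and returns None."""
    def __init__(self, config):
        self.config = config

    def run(self, entry_data):
        entry_data['payee'] = entry_data['payee'].upper()
        # Falling off the end returns None: processing continues normally.


class SkipZeroAmountHook:
    """Hypothetical filter: returns False to stop processing the entry."""
    def __init__(self, config):
        self.config = config

    def run(self, entry_data):
        if decimal.Decimal(entry_data['amount']) == 0:
            return False


class AddMemoHook:
    """Hypothetical hook: returns a new dict that replaces the entry data."""
    def __init__(self, config):
        self.config = config

    def run(self, entry_data):
        return dict(entry_data, memo='imported')


entry = {'payee': 'Alex Smith', 'amount': '25.00'}
in_place_result = UppercasePayeeHook(None).run(entry)
filtered_result = SkipZeroAmountHook(None).run({'payee': 'x', 'amount': '0.00'})
replacement = AddMemoHook(None).run(entry)
```

Returning a fresh dict, as ``AddMemoHook`` does, leaves the original entry untouched; the caller is expected to use the returned value from then on.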
 

	
 
Templates
 
~~~~~~~~~
 

	
 
Templates receive entry data dicts and format them into final output entries.
 

	
 
``__init__(template_str)``
 
  Initializes the template from a single string, as read from the user's configuration.
 

	
 
``render(entry_data)``
 
  Returns a string with the output entry, using the given entry data.
 

	
 
Loading importers and hooks
 
---------------------------
 

	
 
Importers and hooks are both loaded and found dynamically when the program starts.  This makes it easy to extend the program: you just need to write the class following the established pattern, no registration needed.
 

	
 
import2ledger finds importers by looking at all ``.py`` files in the ``importers/`` directory, skipping files whose names start with ``.`` (hidden) or ``_`` (private).  It tries to import each file as a module.  If that succeeds, it looks for things in the module named ``*Importer``, and adds those to the list of importers.
 

	
 
Hooks follow the same pattern, searching the ``hooks/`` directory and looking for things named ``*Hook``.
 

	
 
Technically this is done by ``importers.load_all()`` and ``hooks.load_all()`` functions, but most of the code to do this is in the ``dynload`` module.
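
The ``dynload`` module itself is not shown here, so the following is only a guess at the general approach, written with the standard library's ``importlib`` and ``pathlib``. The function signature and the demo file names are assumptions.

```python
import importlib.util
import pathlib
import tempfile


def load_all(dir_path, suffix):
    """Yield objects whose names end with `suffix` from .py files in dir_path."""
    for path in sorted(pathlib.Path(dir_path).glob('*.py')):
        # Skip hidden and private files, as described above.
        if path.name.startswith(('.', '_')):
            continue
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception:
            continue  # assumption: files that fail to import are skipped
        for name, value in vars(module).items():
            if name.endswith(suffix):
                yield value


# Demonstration against a throwaway directory with one public and one
# private module.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    (tmp_path / 'upper.py').write_text('class UpperHook:\n    pass\n')
    (tmp_path / '_private.py').write_text('class HiddenHook:\n    pass\n')
    found = [cls.__name__ for cls in load_all(tmp_path, 'Hook')]
```

In this sketch only ``UpperHook`` is discovered, because ``_private.py`` is skipped by the filename check before it is ever imported.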
 

	
 
Main loop
 
---------
 

	
 
At a high level, import2ledger handles each input file this way::
 

	
 
  usable_importers = importers where can_handle(input_file) returns true
 
  for importer_class in usable_importers:
 
    template = built from importer_class.TEMPLATE_KEY
 
    input_file.seek(0)
 
    for entry_data in importer_class(input_file):
 
      for hook in all_hooks:
 
-        hook.run(entry_data)
 
-      if entry_data:
 
-        template.render(entry_data)
 
+        hook_return = hook.run(entry_data)
 
+        if hook_return is False:
 
+          break
 
+        elif hook_return is not None:
 
+          entry_data = hook_return
 
+      else:
 
+        if entry_data:
 
+          template.render(entry_data)
 

	
 
Note in particular that multiple importers can handle the same input file.  This helps support inputs like Patreon's earnings CSV, where completely different transactions are generated from the same source.
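
The new version of the loop relies on Python's ``for``/``else``: the ``else`` branch runs only when no hook broke out of the loop. A runnable sketch follows, with stand-in classes. ``process_file``, ``LineImporter``, ``DropBobHook``, ``PayeeTemplate``, and the ``get_template`` callable are all hypothetical names invented for this example.

```python
import io


def process_file(input_file, importers, hooks, get_template):
    """Return rendered entries for one input file, following the loop above."""
    output = []
    usable = [imp for imp in importers if imp.can_handle(input_file)]
    for importer_class in usable:
        template = get_template(importer_class.TEMPLATE_KEY)
        input_file.seek(0)
        for entry_data in importer_class(input_file):
            for hook in hooks:
                hook_return = hook.run(entry_data)
                if hook_return is False:
                    break  # entry filtered out: the else branch is skipped
                elif hook_return is not None:
                    entry_data = hook_return
            else:
                # Reached only when every hook ran without a break.
                if entry_data:
                    output.append(template.render(entry_data))
    return output


class LineImporter:
    """Stand-in importer: one entry per line of input."""
    TEMPLATE_KEY = 'template lines'

    @classmethod
    def can_handle(cls, source_file):
        return True

    def __init__(self, source_file):
        self.source_file = source_file

    def __iter__(self):
        for line in self.source_file:
            yield {'payee': line.strip()}


class DropBobHook:
    """Stand-in hook that filters out one payee."""
    def run(self, entry_data):
        if entry_data['payee'] == 'Bob':
            return False


class PayeeTemplate:
    """Stand-in template that renders just the payee."""
    def render(self, entry_data):
        return entry_data['payee'] + '\n'


rendered = process_file(
    io.StringIO('Alice\nBob\nCarol\n'),
    [LineImporter],
    [DropBobHook()],
    lambda key: PayeeTemplate(),
)
```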
 

	
 
Running tests
 
-------------
 

	
 
Run ``./setup.py test`` from the source directory.
import2ledger/__main__.py
...
 
@@ -4,103 +4,105 @@ import logging
 
import sys
 

	
 
from . import config, errors, hooks, importers
 

	
 
logger = logging.getLogger('import2ledger')
 

	
 
class FileImporter:
 
    def __init__(self, config, stdout):
 
        self.config = config
 
        self.importers = list(importers.load_all())
 
        self.hooks = [hook(config) for hook in hooks.load_all()]
 
        self.stdout = stdout
 

	
 
    def import_file(self, in_file, in_path=None):
 
        if in_path is None:
 
            in_path = pathlib.Path(in_file.name)
 
        importers = []
 
        for importer in self.importers:
 
            in_file.seek(0)
 
            if importer.can_import(in_file):
 
                try:
 
                    template = self.config.get_template(importer.TEMPLATE_KEY)
 
                except errors.UserInputConfigurationError as error:
 
                    if error.strerror.startswith('template not defined '):
 
                        have_template = False
 
                    else:
 
                        raise
 
                else:
 
                    have_template = not template.is_empty()
 
                if have_template:
 
                    importers.append((importer, template))
 
        if not importers:
 
            raise errors.UserInputFileError("no importers available", in_file.name)
 
        source_vars = {
 
            'source_abspath': in_path.absolute().as_posix(),
 
            'source_name': in_path.name,
 
            'source_path': in_path.as_posix(),
 
        }
 
        with contextlib.ExitStack() as exit_stack:
 
            output_path = self.config.get_output_path()
 
            if output_path is None:
 
                out_file = self.stdout
 
            else:
 
                out_file = exit_stack.enter_context(output_path.open('a'))
 
            for importer, template in importers:
 
                default_date = self.config.get_default_date()
 
                in_file.seek(0)
 
                for entry_data in importer(in_file):
 
-                    entry_data['_hook_cancel'] = False
 
                    for hook in self.hooks:
 
-                        hook.run(entry_data)
 
-                        if entry_data['_hook_cancel']:
 
+                        hook_retval = hook.run(entry_data)
 
+                        if hook_retval is None:
 
+                            pass
 
+                        elif hook_retval is False:
 
                            break
 
+                        else:
 
+                            entry_data = hook_retval
 
                    else:
 
-                        del entry_data['_hook_cancel']
 
                        render_vars = collections.ChainMap(entry_data, source_vars)
 
                        print(template.render(render_vars), file=out_file, end='')
 

	
 
    def import_path(self, in_path):
 
        if in_path is None:
 
            raise errors.UserInputFileError("only seekable files are supported", '<stdin>')
 
        with in_path.open(errors='replace') as in_file:
 
            if not in_file.seekable():
 
                raise errors.UserInputFileError("only seekable files are supported", in_path)
 
            return self.import_file(in_file, in_path)
 

	
 
    def import_paths(self, path_seq):
 
        for in_path in path_seq:
 
            try:
 
                retval = self.import_path(in_path)
 
            except (OSError, errors.UserInputError) as error:
 
                yield in_path, error
 
            else:
 
                yield in_path, retval
 

	
 

	
 
def setup_logger(logger, main_config, stream):
 
    formatter = logging.Formatter('%(name)s: %(levelname)s: %(message)s')
 
    handler = logging.StreamHandler(stream)
 
    handler.setFormatter(formatter)
 
    logger.addHandler(handler)
 

	
 
def main(arglist=None, stdout=sys.stdout, stderr=sys.stderr):
 
    try:
 
        my_config = config.Configuration(arglist)
 
    except errors.UserInputError as error:
 
        my_config.error("{}: {!r}".format(error.strerror, error.user_input))
 
        return 3
 
    setup_logger(logger, my_config, stderr)
 
    importer = FileImporter(my_config, stdout)
 
    failures = 0
 
    for input_path, error in importer.import_paths(my_config.args.input_paths):
 
        if error is None:
 
            logger.info("%s: imported", input_path)
 
        else:
 
            logger.warning("%s: failed to import: %s", input_path or error.path, error)
 
            failures += 1
 
    if failures == 0:
 
        return 0
 
    else:
 
        return min(10 + failures, 99)
 

	
 
if __name__ == '__main__':
import2ledger/hooks/filter_by_date.py
 
class FilterByDateHook:
 
    def __init__(self, config):
 
        self.config = config
 

	
 
    def run(self, entry_data):
 
        try:
 
            date = entry_data['date']
 
        except KeyError:
 
            pass
 
        else:
 
            if not self.config.date_in_want_range(date):
 
-                entry_data['_hook_cancel'] = True
 
+                return False
tests/test_hooks.py
...
 
@@ -12,89 +12,88 @@ def test_load_all():
 
    assert add_entity.AddEntityHook in all_hooks
 

	
 
@pytest.mark.parametrize('payee,expected', [
 
    ('Alex Smith', 'Smith-Alex'),
 
    ('Dakota D.  Doe', 'Doe-Dakota-D'),
 
    ('Björk', 'Bjork'),
 
    ('Fran Doe-Smith', 'Doe-Smith-Fran'),
 
    ('Alex(Nickname) Smith', 'Smith-Alex'),
 
    ('稲荷', '稲荷'),
 
    ('Pøweł', 'Powel'),
 
    ('Elyse Jan Smith', 'Smith-Elyse-Jan'),
 
    ('Jan van Smith', 'van-Smith-Jan'),
 
    ('Francis da Silva', 'da-Silva-Francis'),
 
])
 
def test_add_entity(payee, expected):
 
    data = {'payee': payee}
 
    hook = add_entity.AddEntityHook(argparse.Namespace())
 
    hook.run(data)
 
    assert data['entity'] == expected
 

	
 

	
 
class DateRangeConfig:
 
    def __init__(self, start_date=None, end_date=None):
 
        self.start_date = start_date
 
        self.end_date = end_date
 

	
 
    def date_in_want_range(self, date):
 
        return (
 
            ((self.start_date is None) or (date >= self.start_date))
 
            and ((self.end_date is None) or (date <= self.end_date))
 
        )
 

	
 

	
 
@pytest.mark.parametrize('entry_date,start_date,end_date,allowed', [
 
    (datetime.date(2016, 5, 10), datetime.date(2016, 1, 1), datetime.date(2016, 12, 31), True),
 
    (datetime.date(2016, 1, 1), datetime.date(2016, 1, 1), datetime.date(2016, 12, 31), True),
 
    (datetime.date(2016, 12, 31), datetime.date(2016, 1, 1), datetime.date(2016, 12, 31), True),
 
    (datetime.date(2016, 1, 1), datetime.date(2016, 1, 1), None, True),
 
    (datetime.date(2016, 12, 31), None, datetime.date(2016, 12, 31), True),
 
    (datetime.date(1999, 1, 2), None, None, True),
 
    (datetime.date(2016, 1, 25), datetime.date(2016, 2, 1), datetime.date(2016, 12, 31), False),
 
    (datetime.date(2016, 12, 26), datetime.date(2016, 1, 1), datetime.date(2016, 11, 30), False),
 
    (datetime.date(2016, 1, 31), datetime.date(2016, 2, 1), None, False),
 
    (datetime.date(2016, 12, 1), None, datetime.date(2016, 11, 30), False),
 
])
 
def test_filter_by_date(entry_date, start_date, end_date, allowed):
 
    entry_data = {'date': entry_date}
 
    hook = filter_by_date.FilterByDateHook(DateRangeConfig(start_date, end_date))
 
-    hook.run(entry_data)
 
-    assert entry_data.get('_hook_cancel', False) == (not allowed)
 
+    assert hook.run(entry_data) is (None if allowed else False)
 

	
 
class DefaultDateConfig:
 
    ONE_DAY = datetime.timedelta(days=1)
 

	
 
    def __init__(self, start_date=None):
 
        if start_date is None:
 
            start_date = datetime.date(2016, 3, 5)
 
        self.date = start_date - self.ONE_DAY
 

	
 
    def get_default_date(self, section_name=None):
 
        self.date += self.ONE_DAY
 
        return self.date
 

	
 

	
 
class TestDefaultDate:
 
    def test_simple_case(self):
 
        expect_date = datetime.date(2016, 2, 4)
 
        config = DefaultDateConfig(expect_date)
 
        data = {}
 
        hook = default_date.DefaultDateHook(config)
 
        hook.run(data)
 
        assert data['date'] == expect_date
 

	
 
    def test_no_caching(self):
 
        config = DefaultDateConfig()
 
        hook = default_date.DefaultDateHook(config)
 
        d1 = {}
 
        d2 = {}
 
        hook.run(d1)
 
        hook.run(d2)
 
        assert d1['date'] != d2['date']
 

	
 
    def test_no_override(self):
 
        expect_date = datetime.date(2016, 2, 6)
 
        config = DefaultDateConfig(expect_date + datetime.timedelta(days=300))
 
        hook = default_date.DefaultDateHook(config)
 
        data = {'date': expect_date}
 
        hook.run(data)
 
        assert data['date'] is expect_date