Review of project "SortedContainers"

  • An open source project by Grant Jenks.

    Link to source: https://github.com/grantjenks/sorted_containers
    Link to documentation: http://www.grantjenks.com/docs/sortedcontainers/

    Metric                    Score   Rationale
    Documentation             None
    Code Quality              None
    Ease of Use               None
    Ease of Contribution      None
    Project Infrastructure    None
    Overall (not an average)  None

    Review

    Review date: None

    Project

    "SortedContainers is an Apache2 licensed containers library, written in pure-Python, and fast as C-extensions." That last part, "fast as C-extensions," was difficult to believe, and I would need some sort of performance comparison to be convinced. The author includes exactly such a comparison in the docs, and the claim holds up.

    SortedContainers provides three new data structures, all of which are always in sorted order: SortedList, SortedDict, and SortedSet. In SortedDict, the ordering is defined by the relative ordering of the keys. Most of the time, you can get away with Python's built-in data structures. Sometimes, however, you really need to use a sorted data structure. The author lists a number of competing packages and asserts that SortedContainers makes the best compromise between speed and requirements: usually as fast as C-based counterparts and written in pure Python, so no C compiler necessary.
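
    To make "always in sorted order" concrete, here is a minimal sketch using only the standard library's bisect module. The class name TinySortedList and its methods are hypothetical, invented for illustration; this is not how SortedContainers is implemented, and it makes no attempt to match the library's performance.

```python
import bisect


class TinySortedList:
    """Hypothetical stand-in: a list that stays sorted after every insert."""

    def __init__(self, iterable=()):
        # Sort once up front; every later insert preserves the order.
        self._items = sorted(iterable)

    def add(self, value):
        # Binary-search for the slot and insert there, so no re-sort is needed.
        bisect.insort(self._items, value)

    def __contains__(self, value):
        # Membership test via binary search rather than a linear scan.
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


numbers = TinySortedList([5, 1, 3])
numbers.add(4)
print(list(numbers))
```

    A plain list gives you the same result only if you remember to call sort() after every change; a sorted container makes the ordering an invariant of the type.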

    After using the package, I have to agree with the author: the SortedContainers data structures are really fast. Why this is the case is especially interesting, but I won't ruin the fun. You can read about the implementation details in the project's documentation.

    The package is listed as being compatible with Python versions 2.6, 2.7, 3.2, 3.3, and 3.4. It does not, however, contain a tox.ini file, so testing against those versions is a time-consuming process. Thankfully, the absence of a tox.ini file is one of the few weaknesses I found.

    Code

    This is well-written code. Everything that can be documented is, and documented well. The code is PEP-8 compliant and generally well laid out. While SortedDict.__setitem__ clocks in at almost 100 lines, almost all other functions are short and readable. There are some names I would have avoided (idx, oldval, newval, etc.), but that's pretty nit-picky (though PEP-8 does back me up here).

    Generally, the code is a joy to read. It's well thought out, arranged well, and documented the appropriate amount. Reading the source would be a great exercise for those new to the language, as a number of slightly advanced algorithms are implemented in clear, easy-to-read (dare I say "idiomatic"?) Python.

    I ran pylint with the vanilla options and the score was an 8.83/10. Many of the issues pylint picked up were related to classes accessing the internals of nested classes, missing docstrings (on small wrapper functions), and superfluous parentheses. Decidedly minor, all of them.

    Test coverage was fantastic at 96%, but not quite the 100% advertised by the author (unless hitting 100% requires running the tests under different versions of the interpreter, which is certainly a possibility). In general, anything in the 90s puts you ahead of about 95% of the packages out there.

    The tests themselves are also well-written. Like the code, they are thoughtfully done and easy to read.

    Project Infrastructure

    Merely average, I'm afraid. There is no continuous integration to speak of (this is a minimum requirement, in my opinion), nor any automatic reporting of test coverage. The omission of tox is especially noticeable due to the author's claim regarding the versions of Python it works with. Overall, this is probably the area of the project that could use the most work.

    Documentation

    The documentation is thorough and well-written. As far as I can tell, every class and function was fully documented in a style akin to the official Python documentation for the associated data structure.

    I say, "as far as I can tell" because the documentation is not auto-generated. Rather, separate .rst files are written from scratch. While giving the author greater control over the documentation, this is a practice that always makes me a little uneasy. You just have to trust that the documentation in the code isn't out of date (or, worse, the main documentation wasn't updated with the code).

    Auto-documentation using Sphinx has few downsides for a project like this. As an added benefit, we would get more information (return values, argument types) out of the current documentation if it were written in this way. All in all, though, not a huge deal (after all, CPython is documented in this fashion).

    Summary

    SortedContainers is a solid, production-ready project with great documentation and tests. A bit of missing project infrastructure is all that the project needs to fix to be truly considered a "perfect" project.

    Recommendations

    Use tox, like, yesterday! If you say the project is compatible with different versions of the interpreter, make it easy for me to prove that to myself. Also, use a continuous integration service; it's free and gives prospective users much more confidence that the project is active and not broken. Finally, make it easier for the user to run the tests by augmenting setup.py to run them when python setup.py test is invoked. Otherwise, it's unclear what the user is supposed to do to run the tests.
