Additional Tools

Citations and Bibliography

Using Zotero’s Firefox plugin and Zotero Better BibTeX for:

  • automated .bib file updating
  • drag and drop cite keys \cite{kirkeminde_thermodynamic_2012}
  • latexmk -bibtex -pdf (in nbpublish.py) handles creation of the bibliography
  • \usepackage{doi} turns the DOI numbers into url links
    • in Zotero Better BibTeX, you can set the option to export only the DOI when both a DOI and a URL are present.
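As a rough illustration of the latexmk bullet above (a sketch, not the code actually in nbpublish.py), latexmk can be driven from Python with subprocess. `latexmk_command` and `build_pdf` are hypothetical helper names, and the sketch assumes latexmk and a LaTeX distribution are installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def latexmk_command(tex_name):
    """Return the latexmk call that builds the PDF, running BibTeX when needed."""
    return ["latexmk", "-bibtex", "-pdf", str(tex_name)]


def build_pdf(tex_path):
    """Hypothetical helper: compile tex_path in its own folder and return the PDF path."""
    if shutil.which("latexmk") is None:
        raise RuntimeError("latexmk was not found on the PATH")
    tex_path = Path(tex_path)
    # Run from the .tex file's directory so relative paths (e.g. to the .bib file) resolve there.
    subprocess.run(latexmk_command(tex_path.name), cwd=tex_path.parent, check=True)
    return tex_path.with_suffix(".pdf")
```

Because latexmk reruns pdflatex and BibTeX until references stabilise, a single call is enough to produce the PDF with its bibliography.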

You can also use:

<cite data-cite="kirkeminde_thermodynamic_2012">(Kirkeminde, 2012)</cite>

to make the citation render better in HTML, although this form is not available for drag and drop from Zotero.
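If you prefer to drag and drop `\cite{...}` keys but still want the HTML form, a small conversion step can rewrite one into the other. This is a sketch under assumptions: `latex_cites_to_html` is a hypothetical helper, and the `labels` mapping from cite key to display text (e.g. "(Kirkeminde, 2012)") is something you supply yourself. For the opposite direction, nbconvert's `citation2latex` filter turns `<cite data-cite=...>` tags into `\cite{...}` commands when exporting to LaTeX.

```python
import html
import re

# Matches \cite{key} or \cite{key1,key2}
CITE_PATTERN = re.compile(r"\\cite\{([^}]*)\}")


def latex_cites_to_html(text, labels):
    """Rewrite LaTeX cite commands as <cite data-cite="..."> tags.

    labels maps each cite key to its display text; unknown keys fall back
    to the key itself. Multiple keys in one command become adjacent tags.
    """
    def replace(match):
        keys = [key.strip() for key in match.group(1).split(",") if key.strip()]
        return "".join(
            f'<cite data-cite="{html.escape(key)}">{html.escape(labels.get(key, key))}</cite>'
            for key in keys
        )

    return CITE_PATTERN.sub(replace, text)


text = r"As shown by \cite{kirkeminde_thermodynamic_2012}, ..."
print(latex_cites_to_html(text, {"kirkeminde_thermodynamic_2012": "(Kirkeminde, 2012)"}))
```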

Live Slideshows

The Reveal.js - Jupyter/IPython Slideshow Extension (RISE) renders a notebook as a Reveal.js-based slideshow in which you can still execute code, or show the audience anything you could show or do inside the notebook itself. See the demo:

RISE Demo
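RISE decides how cells map onto slides from each cell's `slideshow.slide_type` metadata, the same field that the notebook's Slideshow cell toolbar edits. As a minimal sketch (the helper names are hypothetical, and the notebook is built as plain nbformat v4 JSON using only the standard library), this is what that metadata looks like:

```python
import json

# slide_type values used by the Slideshow toolbar:
# "slide", "subslide", "fragment", "skip", "notes", "-" (continue current slide)


def make_cell(cell_type, source, slide_type="slide"):
    """Build a minimal nbformat v4 cell carrying slideshow metadata."""
    cell = {
        "cell_type": cell_type,
        "metadata": {"slideshow": {"slide_type": slide_type}},
        "source": source,
    }
    if cell_type == "code":
        # Code cells additionally require these two fields
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def make_notebook(cells):
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


nb = make_notebook([
    make_cell("markdown", "# Title slide"),
    make_cell("code", "x = 1 + 1\nx", slide_type="fragment"),
    make_cell("markdown", "Speaker notes go here", slide_type="notes"),
])
notebook_json = json.dumps(nb, indent=1)
```

Writing `notebook_json` to a `.ipynb` file and opening it in Jupyter with RISE installed should give a title slide, with the code cell appearing as a fragment.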

Working With External Data

A goal of scientific publishing is automated reproducibility of analyses, something the Jupyter notebook excels at. Beyond that, it should be possible to efficiently reproduce the analysis with different data sets. This requires a single point of access to a data set within the notebook, rather than data copy-pasted into variables; i.e. this:

data = read_in_data('data_key')
variable1 = data.key1
variable2 = data.key2
...

rather than this:

variable1 = 12345
variable2 = 'something'
...
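`read_in_data` is not defined above. One minimal way to provide it, assuming each data set lives in a JSON file named `<data_key>.json` inside a `data` folder (both assumptions, as is the helper itself), is to load the file and expose its keys as attributes:

```python
import json
from pathlib import Path
from types import SimpleNamespace


def read_in_data(data_key, data_dir="data"):
    """Hypothetical loader: read <data_dir>/<data_key>.json with attribute access.

    Every JSON object (including nested ones) becomes a SimpleNamespace,
    so data.key1 and data.key1.subkey both work.
    """
    path = Path(data_dir) / f"{data_key}.json"
    with path.open() as f:
        return json.load(f, object_hook=lambda obj: SimpleNamespace(**obj))
```

Re-running the notebook on another data set then only means changing `'data_key'`.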

The best practice for accessing hierarchical data (in my opinion) is to use the JSON format (as long as the data isn’t relational), because it is:

  • able to represent arbitrarily nested data structures
  • lightweight and easy to read and edit
  • simple to map to and from Python objects (using the json module)
  • widely used (especially in web technologies)
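As a small illustration of that mapping, Python's standard json module converts dicts, lists, strings, numbers, booleans and None to JSON and back (the record below is made-up example data):

```python
import json

record = {
    "sample": "A1",
    "temperature_K": 300.0,
    "measurements": [1.2, 3.4, 5.6],
    "settings": {"repeats": 3, "calibrated": True, "notes": None},
}

text = json.dumps(record, indent=2)  # dict -> JSON text (True -> true, None -> null)
restored = json.loads(text)          # JSON text -> dict
assert restored == record            # the round trip is lossless for these types

# Tuples have no JSON equivalent and come back as lists
assert json.loads(json.dumps({"shape": (3, 4)})) == {"shape": [3, 4]}
```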

A good way to store multiple JSON documents is in a MongoDB database, accessed via pymongo. This also makes it easy to move all the data to a cloud server later, if required.

conda install pymongo
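A minimal sketch of storing and retrieving such JSON documents with pymongo. It assumes a MongoDB server at the default local address; the database name, collection name, `_key` field and helper names are all hypothetical. `replace_one(..., upsert=True)` and `find_one` are standard pymongo Collection methods.

```python
def get_collection(uri="mongodb://localhost:27017/", db_name="analysis", coll_name="datasets"):
    """Connect to MongoDB and return a collection (names are placeholders)."""
    from pymongo import MongoClient  # imported here so the helpers below work with any collection object

    client = MongoClient(uri)
    return client[db_name][coll_name]


def save_dataset(collection, key, data):
    """Insert, or overwrite, the JSON document stored under `key`."""
    document = dict(data, _key=key)
    collection.replace_one({"_key": key}, document, upsert=True)


def load_dataset(collection, key):
    """Return the stored document without MongoDB's _id or our _key field."""
    return collection.find_one({"_key": key}, projection={"_id": False, "_key": False})


# Usage (requires a running MongoDB server):
# coll = get_collection()
# save_dataset(coll, "run1", {"key1": 12345, "key2": "something"})
# data = load_dataset(coll, "run1")
```

Because the helpers only take a collection object, moving to a cloud-hosted database later should only mean passing a different URI to `get_collection`.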

If, however, the data comes from files output by various simulation or experimental codes, whose output format the user cannot control, then writing parsers that convert these files to JSON may be the way to go. This is where jsonextended comes in; it implements:

  • a lightweight plugin system to define bespoke classes for parsing different file extensions and data types.
  • a ‘lazy loader’ for treating an entire directory structure as a nested dictionary.

For example:

from jsonextended import plugins, edict
plugins.load_plugins_dir('path/to/folder_of_parsers','parsers')
data = edict.LazyLoad('path/to/data')
variable1 = data.folder1.file1_json.key1
variable2 = data[['folder1','file1.json','key2']]
variable3 = data[['folder1','file2.csv','key1']]
variable4 = data[['folder2','subfolder1','file3.other','key1']]
...
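To show what lazy loading with per-extension parsers amounts to, here is a standard-library sketch of the same idea. It is not jsonextended's implementation or plugin interface: `register_parser`, `PARSERS` and `LazyDir` are hypothetical names, only `.json` and `.csv` files are handled, attribute access such as `data.folder1` is not implemented, and a file is re-parsed each time it is accessed rather than cached.

```python
import csv
import json
from collections.abc import Mapping
from pathlib import Path

# Hypothetical parser registry: file extension -> function(path) returning a dict
PARSERS = {}


def register_parser(extension):
    """Decorator that registers a parser function for one file extension."""
    def decorator(func):
        PARSERS[extension] = func
        return func
    return decorator


@register_parser(".json")
def parse_json(path):
    with open(path) as f:
        return json.load(f)


@register_parser(".csv")
def parse_csv(path):
    """Read a CSV with a header row into {column_name: [values, ...]} (values stay strings)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: [] for name in reader.fieldnames or []}
        for row in reader:
            for name in columns:
                columns[name].append(row[name])
    return columns


class LazyDir(Mapping):
    """Read-only view of a directory tree as a nested mapping.

    Nothing is read until a key is accessed: sub-directories become new
    LazyDir objects and files with a registered extension are parsed on demand.
    Files without a parser are ignored.
    """

    def __init__(self, root):
        self._root = Path(root)

    def _entries(self):
        return {
            p.name: p
            for p in self._root.iterdir()
            if p.is_dir() or p.suffix in PARSERS
        }

    def __getitem__(self, key):
        # A list or tuple walks down the hierarchy: data[['folder1', 'file1.json', 'key1']]
        if isinstance(key, (list, tuple)):
            value = self
            for part in key:
                value = value[part]
            return value
        path = self._entries()[key]  # raises KeyError for unknown names
        if path.is_dir():
            return LazyDir(path)
        return PARSERS[path.suffix](path)

    def __iter__(self):
        return iter(sorted(self._entries()))

    def __len__(self):
        return len(self._entries())
```

Adding support for another file type is then a matter of decorating one more parsing function with `@register_parser(".ext")`.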

If you are dealing with numerical data arrays that are too large to load directly into memory, the h5py interface to the HDF5 binary data format lets you manipulate even multi-terabyte datasets stored on disk as if they were NumPy arrays. These files are also supported by jsonextended lazy loading.
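A minimal h5py sketch of that workflow, assuming h5py and NumPy are installed; the file name, dataset name, sizes and helper names are placeholders. The dataset is written and read slice by slice, so only the slices being processed are held in memory at any time.

```python
import h5py
import numpy as np


def write_large_array(path, name="signal", n_rows=1_000_000, chunk_rows=100_000):
    """Create an on-disk dataset and fill it one chunk at a time."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            name, shape=(n_rows,), dtype="f8", chunks=(min(chunk_rows, n_rows),)
        )
        for start in range(0, n_rows, chunk_rows):
            stop = min(start + chunk_rows, n_rows)
            dset[start:stop] = np.arange(start, stop, dtype="f8")


def mean_of_slice(path, name, start, stop):
    """Slicing an h5py Dataset reads only that range from disk into a NumPy array."""
    with h5py.File(path, "r") as f:
        return float(f[name][start:stop].mean())
```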

Miscellaneous

I also use the Firefox Split Panel extension to view the {name}_viewpdf.html page and monitor changes to the PDF.

bookbook is another package with some conversion capabilities, aimed at combining a collection of notebooks into a single HTML or PDF book.