Package Guide
The design of this package serves one purpose: to make exporting highlights from PDF files to a CSV file as simple as calling a single function. The format of the resulting file corresponds to the requirements defined by the Readwise service for the bulk import of CSV files. It makes it possible not only to extract and store highlights but also to benefit from them using spaced repetition.
Installation
The package is available in the General registry, so installation follows the standard procedure: from the Julia REPL, type ] to enter the Pkg REPL mode and run:
pkg> add PDFHighlights
Importing highlights
You can import highlights from a PDF file or a directory containing PDF files using the import_highlights function:
using PDFHighlights
import_highlights("highlights.csv", pdf)
This function prints to standard output. For example, for the PDF used for tests in this package, the output will be as follows:
CSV: "highlights.csv"
PDF: "TestPDF.pdf"
Highlights (found / added): 7 / 7
Every highlight and its associated metadata is represented by a row in the CSV file. The function generates these rows and discards any that are identical to rows already present in the target file. Therefore, calling the function again gives the following output:
CSV: "highlights.csv"
PDF: "TestPDF.pdf"
Highlights (found / added): 7 / 0
This is why the function name contains the verb import: it lets you update existing CSV files (presumably produced by this package) with new highlights. Third-party CSV files may also be supported if they match the format.
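The found / added counts can be illustrated with a minimal sketch of the deduplication idea. This is not the package's implementation: the function name `add_new_rows!` and the representation of a row as a tuple of strings are assumptions made for illustration only.

```julia
# Illustrative only: a row is modeled as a tuple of strings,
# which is an assumption and not the package's internal representation.
function add_new_rows!(existing::Vector{T}, candidates::Vector{T}) where {T}
    seen = Set(existing)   # rows already present in the target
    added = 0
    for row in candidates
        if !(row in seen)  # keep only rows that are not there yet
            push!(existing, row)
            push!(seen, row)
            added += 1
        end
    end
    return length(candidates), added  # (found, added)
end

rows = Tuple{String,String}[]
found = [
    ("First highlight", "A dummy PDF for tests"),
    ("Second highlight", "A dummy PDF for tests"),
]

println(add_new_rows!(rows, found))  # first call: prints (2, 2)
println(add_new_rows!(rows, found))  # second call: prints (2, 0)
```

The second call adds nothing because every candidate row is already present, which mirrors the 7 / 7 and 7 / 0 outputs shown above.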
An empty CSV file with a correct header can be created using the initialize function:
initialize("highlights.csv")
Retrieving pieces
For more elaborate workflows, you can use the remaining functions from the public interface. They allow you to retrieve pieces of data related to highlights. The terminology may be confusing here, as CSV files use slightly different field names than PDF files. Here is a table showing what you can extract from each file type:
CSV | Highlights | Titles | Authors | Notes | Locations |
---|---|---|---|---|---|
PDF | Highlights | Titles | Authors | Comments | Pages |
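To keep the two vocabularies straight in your own scripts, the table above can be written down as a small lookup. This is a sketch: the names `FIELD_NAMES` and `csv_name` are hypothetical helpers and are not part of the package.

```julia
# CSV-side and PDF-side names, in the column order of the table above
const FIELD_NAMES = (
    csv = ("Highlights", "Titles", "Authors", "Notes", "Locations"),
    pdf = ("Highlights", "Titles", "Authors", "Comments", "Pages"),
)

# Translate a PDF-side name into its CSV-side counterpart
function csv_name(pdf_name::AbstractString)
    i = findfirst(==(pdf_name), FIELD_NAMES.pdf)
    i === nothing && throw(ArgumentError("unknown PDF field: $pdf_name"))
    return FIELD_NAMES.csv[i]
end

println(csv_name("Comments"))  # prints Notes
println(csv_name("Pages"))     # prints Locations
```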
Each piece has its own function. For example, you can get a PDF title like this:
get_title(pdf)
"A dummy PDF for tests"
This and some other functions have recursive analogs that operate on a directory:
get_titles(dir)
1-element Array{String,1}:
 "A dummy PDF for tests"
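The relationship between a single-file function and its recursive analog can be sketched with `walkdir` from Julia's standard library. The helper `collect_recursively` is hypothetical, and filtering by the `.pdf` extension is an assumption; the package's own recursive functions may work differently.

```julia
# `per_file` is a stand-in for any function that accepts a path to a single file.
# Selecting files by the `.pdf` extension is an assumption made for this sketch.
function collect_recursively(per_file::Function, dir::AbstractString)
    results = []
    for (root, _, files) in walkdir(dir)  # visits `dir` and all of its subdirectories
        for file in sort(files)
            if endswith(lowercase(file), ".pdf")
                push!(results, per_file(joinpath(root, file)))
            end
        end
    end
    return results
end

# Demonstration on a temporary directory, using `basename` as the per-file function
mktempdir() do dir
    mkpath(joinpath(dir, "nested"))
    touch(joinpath(dir, "a.pdf"))
    touch(joinpath(dir, "nested", "b.pdf"))
    touch(joinpath(dir, "notes.txt"))     # ignored: not a PDF
    println(collect_recursively(basename, dir))
end
```

With the package loaded, passing `get_title` as `per_file` would be similar in spirit to calling `get_titles(dir)`, although the results are not guaranteed to match.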
There are also functions returning multiple pieces at once. For example, to get the author and the title:
get_author_title(pdf)
("Pavel Sobolev", "A dummy PDF for tests")
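Because the result is a tuple, it can be destructured directly into named variables. The sketch below uses the literal value from the output above as a stand-in for the function call, so it runs without a PDF at hand.

```julia
# Stand-in for the value returned by `get_author_title(pdf)` in the example above
author_title = ("Pavel Sobolev", "A dummy PDF for tests")

# Tuples destructure positionally: author first, title second
author, title = author_title

println("$title by $author")  # prints A dummy PDF for tests by Pavel Sobolev
```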
See the full list of functions in the Extracting data section in the public interface description.