PDF

Module

PDFHighlights.Internal.PDF — Module

This module contains functions that can only be applied to PDF files.

Constants

PDFHighlights.Internal.PDF.GET_AUTHOR_TITLE_OUTPUTS — Constant

A dictionary that is created at runtime to store the results of the get_author_title function.

PDFHighlights.Internal.PDF.GET_HIGHLIGHTS_COMMENTS_PAGES_OUTPUTS — Constant

A dictionary that is created at runtime to store the results of the get_highlights_comments_pages function.
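
The two constants above suggest a memoization pattern. The following is a minimal sketch of how such a runtime dictionary could work. It assumes, without confirmation from this documentation, that results are keyed by the path string. The names CACHE_SKETCH, CALLS, fake_extract, and cached_call are hypothetical and are not part of PDFHighlights.

```julia
# Hypothetical cache: path => (author, title). The real key and value types
# used by the package are an assumption here.
const CACHE_SKETCH = Dict{String, Tuple{String, String}}()

# Counter that shows how many times the expensive function actually runs.
const CALLS = Ref(0)

# Stand-in for an expensive extraction; it does not read any file.
function fake_extract(pdf::String)
    CALLS[] += 1
    return ("Some Author", "Title of " * basename(pdf))
end

# Return the stored result if present; otherwise compute it and store it.
cached_call(pdf::String) = get!(() -> fake_extract(pdf), CACHE_SKETCH, pdf)

first_result = cached_call("docs/example.pdf")
second_result = cached_call("docs/example.pdf")  # served from the dictionary
```

With a cache like this, the second call does not run the extraction again, which matters when the same PDF is queried by several functions in a row.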

Functions

PDFHighlights.Internal.PDF._concatenate — Method

_concatenate(
    highlights::Vector{String},
    comments::Vector{String},
    pages::Vector{Int32},
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}

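Concatenate highlights that their comments mark as parts of one chain (prefixes such as .c1, .c2 in the example below); merge the comments and pages accordingly.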

Arguments

highlights::Vector{String}: the highlights vector
comments::Vector{String}: the comments vector
pages::Vector{Int32}: the pages vector

Returns

Tuple{Vector{String}, Vector{String}, Vector{Int32}}: the concatenated arguments

Example

using PDFHighlights

PDFHighlights.Internal.PDF._concatenate(
    String["Highlight 1", "High-", "light 2"],
    String["Comment 1", ".c1 Comment 2", ".c2"],
    Int32[1, 2, 3],
) ==
(
    String["Highlight 1", "Highlight 2"],
    String["Comment 1", "Comment 2"],
    Int32[1, 2],
)
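
The rules behind this example can be sketched as follows. This is a hypothetical re-implementation inferred only from the documented inputs and outputs, not the package's code. It assumes that a comment starting with .cN marks the N-th piece of a chain, that a trailing hyphen is dropped when the next piece is appended, that other pieces are joined with a space, and that the page of the first piece is kept. The name concatenate_sketch is made up for illustration.

```julia
# Hypothetical sketch; the real _concatenate may differ in details.
function concatenate_sketch(
    highlights::Vector{String},
    comments::Vector{String},
    pages::Vector{Int32},
)
    out_h, out_c, out_p = String[], String[], Int32[]
    for (h, c, p) in zip(highlights, comments, pages)
        # Assumed marker format: ".c<number>" followed by an optional comment.
        m = match(r"^\.c(\d+)\s*(.*)$"s, c)
        if m === nothing
            # Ordinary highlight: copy everything as is.
            push!(out_h, h); push!(out_c, c); push!(out_p, p)
            continue
        end
        index = parse(Int, m.captures[1])
        rest = String(m.captures[2])
        if index == 1 || isempty(out_h)
            # First piece of a chain: start a new entry, strip the marker.
            push!(out_h, h); push!(out_c, rest); push!(out_p, p)
        else
            # Later piece: append to the previous entry.
            prev = out_h[end]
            out_h[end] = endswith(prev, "-") ? chop(prev) * h : prev * " " * h
            if !isempty(rest)
                out_c[end] = isempty(out_c[end]) ? rest : out_c[end] * " " * rest
            end
        end
    end
    return out_h, out_c, out_p
end

result = concatenate_sketch(
    String["Highlight 1", "High-", "light 2"],
    String["Comment 1", ".c1 Comment 2", ".c2"],
    Int32[1, 2, 3],
)
```

Under these assumptions the sketch reproduces the documented result for the inputs above; it makes no attempt to validate that chain indices are consecutive.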

PDFHighlights.Internal.PDF._get_authors_from_PDF — Method

_get_authors_from_PDF(dir::String) -> Vector{String}

Extract all authors from all PDFs found recursively in the passed directory.

Arguments

dir::String: a directory with PDF files

Returns

Vector{String}: the authors

Throws

Exceptions from: get_authors_titles

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")

PDFHighlights.Internal.PDF._get_authors_from_PDF(path_to_pdf_dir) == ["Pavel Sobolev"]

PDFHighlights.Internal.PDF._get_highlights_from_PDF — Method

_get_highlights_from_PDF(target::String; concatenate::Bool=true) -> Vector{String}

Extract all highlights from a passed PDF or all PDFs found recursively in the passed directory.

Arguments

target::String: a PDF file or a directory with PDF files

Keywords

concatenate::Bool=true: if true, concatenate the highlights

Returns

Vector{String}: the highlights

Throws

Exceptions from: get_highlights_comments_pages

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf_dir) ==
PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf) ==
String[
    "Highlight 1",
    "Highlight 2 Highlight 3",
    "Highlight 4",
    "Highhighlight 5",
    "6th Highhigh light-",
    "High light 7",
    "8th Highlight-",
]

PDFHighlights.Internal.PDF._get_titles_from_PDF — Method

_get_titles_from_PDF(dir::String) -> Vector{String}

Extract all titles from all PDFs found recursively in the passed directory.

Arguments

dir::String: a directory with PDF files

Returns

Vector{String}: the titles

Throws

Exceptions from: get_authors_titles

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")

PDFHighlights.Internal.PDF._get_titles_from_PDF(path_to_pdf_dir) ==
["A dummy PDF for tests"]

PDFHighlights.Internal.PDF._sort! — Method

_sort!(
    lines::Vector{String},
    lines_x_anchors::Vector{Float64},
    lines_yl_anchors::Vector{Float64},
    lines_yu_anchors::Vector{Float64},
) -> Vector{String}

Sort the lines vector using the arrays of anchors. Basic principle: if the rectangles overlap along the ordinate (vertical) axis, sort them by abscissa (left to right).

Arguments

lines::Vector{String}: the lines vector of the highlight (text found in the rectangles of the highlight)
lines_x_anchors::Vector{Float64}: the coordinate vector of the abscissa of the left side of the highlight rectangles
lines_yl_anchors::Vector{Float64}: the coordinate vector of the ordinate of the lower-left corner of the highlight rectangles
lines_yu_anchors::Vector{Float64}: the coordinate vector of the ordinate of the upper-left corner of the highlight rectangles

Returns

Vector{String}: the sorted lines vector of the highlight

Example

using PDFHighlights

lines = ["High", "high", "light"]
quad_x_anchors = [0.21, 0.15, 0.17]
quad_yl_anchors = [0.10, 0.10, 0.10]
quad_yu_anchors = [0.15, 0.12, 0.15]

PDFHighlights.Internal.PDF._sort!(
    lines,
    quad_x_anchors,
    quad_yl_anchors,
    quad_yu_anchors,
) == ["high", "light", "High"]
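
The basic principle can be illustrated with a small comparator. This is a sketch under stated assumptions, not the package's implementation: it assumes that rectangles that do not overlap vertically are ordered from the top of the page down (larger ordinate first, as in PDF coordinates), and it returns a new vector instead of sorting in place. The names overlaps and sort_lines_sketch are hypothetical.

```julia
# Two vertical ranges [yl, yu] overlap when each starts below the other's top.
overlaps(yl1, yu1, yl2, yu2) = yl1 < yu2 && yl2 < yu1

function sort_lines_sketch(lines, xs, yls, yus)
    # Rectangle i precedes rectangle j if they share a vertical range and i is
    # further left; otherwise (assumption) if i sits higher on the page.
    precedes(i, j) =
        overlaps(yls[i], yus[i], yls[j], yus[j]) ? xs[i] < xs[j] : yls[i] > yls[j]
    order = sort(collect(eachindex(lines)); lt = precedes)
    return lines[order]
end

sorted = sort_lines_sketch(
    ["High", "high", "light"],
    [0.21, 0.15, 0.17],
    [0.10, 0.10, 0.10],
    [0.15, 0.12, 0.15],
)
```

Note that an overlap-based comparison is not guaranteed to be transitive for arbitrary layouts, so a production version would need to handle chains of partially overlapping rectangles more carefully.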

PDFHighlights.Internal.PDF._sort! — Method

_sort!(
    highlights::Vector{String},
    comments::Vector{String},
    pages::Vector{Int32},
    highlights_x_anchors::Vector{Float64},
    highlights_yl_anchors::Vector{Float64},
    highlights_yu_anchors::Vector{Float64},
) -> Tuple{Vector{String}, Vector{String}}

Sort the highlights and comments vectors using the arrays of anchors and pages. Basic principle: if the highlights overlap along the ordinate (vertical) axis, sort them by abscissa (left to right).

Arguments

highlights::Vector{String}: the highlights
comments::Vector{String}: the comments
pages::Vector{Int32}: the pages
highlights_x_anchors::Vector{Float64}: the coordinate vector of the abscissa of the left side of the upper-left rectangle of the highlight
highlights_yl_anchors::Vector{Float64}: the coordinate vector of the ordinate of the lower-left corner of the lower-left rectangle of the highlight
highlights_yu_anchors::Vector{Float64}: the coordinate vector of the ordinate of the upper-left corner of the upper-left rectangle of the highlight

Returns

Tuple{Vector{String}, Vector{String}}: the sorted vectors of the highlights and comments

Example

using PDFHighlights

highlights = ["High", "high", "light"]
comments = ["Com", "com", "ment"]
pages = Int32[1, 1, 1]
highlights_x_anchors = [0.21, 0.15, 0.17]
highlights_yl_anchors = [0.10, 0.10, 0.10]
highlights_yu_anchors = [0.12, 0.15, 0.15]

PDFHighlights.Internal.PDF._sort!(
    highlights,
    comments,
    pages,
    highlights_x_anchors,
    highlights_yl_anchors,
    highlights_yu_anchors,
) ==
(
    ["high", "light", "High"],
    ["com", "ment", "Com"],
)

PDFHighlights.Internal.PDF.get_author — Method

get_author(pdf::String) -> String

Extract the author from the PDF.

Arguments

pdf::String: absolute or relative path to the PDF file

Returns

String: the author

Throws

Exceptions from: get_author_title

Example

using PDFHighlights

path_to_pdf = joinpath(
    pathof(PDFHighlights) |> dirname |> dirname,
    "test",
    "pdf",
    "TestPDF.pdf"
)

get_author(path_to_pdf) == "Pavel Sobolev"

PDFHighlights.Internal.PDF.get_author_title — Method

get_author_title(pdf::String) -> Tuple{String, String}

Extract the author and title from the PDF.

Arguments

pdf::String: absolute or relative path to the PDF file

Returns

Tuple{String, String}: the author and title

Throws

FileDoesNotExist: the specified file doesn't exist
NotPDF: the specified path does not end in .pdf

Example

using PDFHighlights

path_to_pdf = joinpath(
    pathof(PDFHighlights) |> dirname |> dirname,
    "test",
    "pdf",
    "TestPDF.pdf",
)

get_author_title(path_to_pdf) == ("Pavel Sobolev", "A dummy PDF for tests")

PDFHighlights.Internal.PDF.get_authors_titles — Method

get_authors_titles(dir::String) -> Tuple{Vector{String}, Vector{String}}

Extract the authors and titles from all PDFs found recursively in the passed directory.

Arguments

dir::String: a directory with PDF files

Returns

Tuple{Vector{String}, Vector{String}}: the authors and titles

Throws

DirectoryDoesNotExist: the specified directory doesn't exist
Exceptions from: get_author_title

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")

get_authors_titles(path_to_pdf_dir) == (["Pavel Sobolev"], ["A dummy PDF for tests"])

PDFHighlights.Internal.PDF.get_comments — Method

get_comments(target::String; concatenate::Bool=false) -> Vector{String}

Extract the comments from a passed PDF or all PDFs found recursively in the passed directory.

Arguments

target::String: a PDF file or a directory with PDF files

Keywords

concatenate::Bool=false: if true, concatenate the highlights

Returns

Vector{String}: the comments

Throws

Exceptions from: get_highlights_comments_pages

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_comments(path_to_pdf_dir; concatenate=true) ==
get_comments(path_to_pdf; concatenate=true) ==
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""]

PDFHighlights.Internal.PDF.get_comments_pages — Method

get_comments_pages(
    target::String;
    concatenate::Bool=false,
) -> Tuple{Vector{String}, Vector{Int32}}

Extract the comments and pages from a passed PDF or all PDFs found recursively in the passed directory.

Arguments

target::String: a PDF file or a directory with PDF files

Keywords

concatenate::Bool=false: if true, concatenate the highlights

Returns

Tuple{Vector{String}, Vector{Int32}}: the comments and pages

Throws

Exceptions from: get_highlights_comments_pages

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_comments_pages(path_to_pdf_dir; concatenate=true) ==
get_comments_pages(path_to_pdf; concatenate=true) ==
(
    String["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
    Int32[1, 2, 4, 6, 7, 8, 9],
)

PDFHighlights.Internal.PDF.get_highlights_comments — Method

get_highlights_comments(
    target::String;
    concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}}

Extract the highlights and comments from a passed PDF or all PDFs found recursively in the passed directory.

Arguments

target::String: a PDF file or a directory with PDF files

Keywords

concatenate::Bool=true: if true, concatenate the highlights

Returns

Tuple{Vector{String}, Vector{String}}: the highlights and comments

Throws

Exceptions from: get_highlights_comments_pages

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_highlights_comments(path_to_pdf_dir) ==
get_highlights_comments(path_to_pdf) ==
(
    [
        "Highlight 1",
        "Highlight 2 Highlight 3",
        "Highlight 4",
        "Highhighlight 5",
        "6th Highhigh light-",
        "High light 7",
        "8th Highlight-",
    ],
    ["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
)

PDFHighlights.Internal.PDF.get_highlights_comments_pages — Method

get_highlights_comments_pages(
    target::String;
    concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}

Extract the highlights, comments, and pages from a passed PDF or all PDFs found recursively in the passed directory.

Arguments

target::String: a PDF file or a directory with PDF files

Keywords

concatenate::Bool=true: if true, concatenate the highlights

Returns

Tuple{Vector{String}, Vector{String}, Vector{Int32}}: the highlights, comments, and pages

Throws

DoesNotExist: the specified file or directory doesn't exist
NotPDF: the specified path does not end in .pdf

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_highlights_comments_pages(path_to_pdf_dir) ==
get_highlights_comments_pages(path_to_pdf) ==
(
    [
        "Highlight 1",
        "Highlight 2 Highlight 3",
        "Highlight 4",
        "Highhighlight 5",
        "6th Highhigh light-",
        "High light 7",
        "8th Highlight-",
    ],
    ["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
    Int32[1, 2, 4, 6, 7, 8, 9],
)

PDFHighlights.Internal.PDF.get_highlights_pages — Method

get_highlights_pages(
    target::String;
    concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{Int32}}

Extract the highlights and pages from a passed PDF or all PDFs found recursively in the passed directory.

Arguments

target::String: a PDF file or a directory with PDF files

Keywords

concatenate::Bool=true: if true, concatenate the highlights

Returns

Tuple{Vector{String}, Vector{Int32}}: the highlights and pages

Throws

Exceptions from: get_highlights_comments_pages

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_highlights_pages(path_to_pdf_dir) ==
get_highlights_pages(path_to_pdf) ==
(
    [
        "Highlight 1",
        "Highlight 2 Highlight 3",
        "Highlight 4",
        "Highhighlight 5",
        "6th Highhigh light-",
        "High light 7",
        "8th Highlight-",
    ],
    Int32[1, 2, 4, 6, 7, 8, 9],
)

PDFHighlights.Internal.PDF.get_pages — Method

get_pages(
    target::String;
    concatenate::Bool=false
) -> Vector{Int32}

Extract the pages from a passed PDF or all PDFs found recursively in the passed directory.

Arguments

target::String: a PDF file or a directory with PDF files

Keywords

concatenate::Bool=false: if true, concatenate the highlights

Returns

Vector{Int32}: the pages

Throws

Exceptions from: get_highlights_comments_pages

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_pages(path_to_pdf_dir) == get_pages(path_to_pdf) == Int32[1, 2, 3, 4, 5, 6, 7, 8, 9]

PDFHighlights.Internal.PDF.get_title — Method

get_title(pdf::String) -> String

Extract the title from the PDF.

Arguments

pdf::String: absolute or relative path to the PDF file

Returns

String: the title

Throws

Exceptions from: get_author_title

Example

using PDFHighlights

path_to_pdf = joinpath(
    pathof(PDFHighlights) |> dirname |> dirname,
    "test",
    "pdf",
    "TestPDF.pdf",
)

get_title(path_to_pdf) == "A dummy PDF for tests"

Macros

PDFHighlights.Internal.PDF.@unsafe_wrap — Macro

@unsafe_wrap(array::Expr, len::Union{Symbol, Expr}) -> Expr

Wrap a Julia Array object around the data at the address given by the array pointer, with length equal to len.

Arguments

array::Expr: expression that will yield a pointer to the array data
len::Union{Symbol, Expr}: name of the variable which holds the length of this array, or an expression that will yield it

Returns

Expr: a wrapping expression

Example

using PDFHighlights

_array = :(array[index])
_len = :len

@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap(array[index], len)) ==
:(unsafe_wrap(Array, $(_array), $(_len); own = true))
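
To show what the expansion does at runtime, here is a minimal sketch of a macro with the same expansion, applied to memory allocated with Libc.malloc. The macro name unsafe_wrap_sketch is hypothetical, and its body is an assumption based only on the expansion shown above. Because own = true hands the memory over to Julia, the pointer must come from malloc and must not be freed manually afterwards.

```julia
# Hypothetical macro that mirrors the documented expansion.
macro unsafe_wrap_sketch(array, len)
    # `esc` keeps the caller's variable names untouched in the generated code.
    return esc(:(unsafe_wrap(Array, $array, $len; own = true)))
end

n = 3
# Allocate with malloc so that Julia may free the memory later (own = true).
ptr = convert(Ptr{Int32}, Libc.malloc(n * sizeof(Int32)))
for i in 1:n
    unsafe_store!(ptr, Int32(10i), i)
end

wrapped = @unsafe_wrap_sketch(ptr, n)  # a Vector{Int32} backed by `ptr`
```

The wrapped vector shares the memory behind ptr, so no copy is made; this is the usual reason for wrapping arrays returned by a C library.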