PDF

Module

Constants

Functions

PDFHighlights.Internal.PDF._concatenate (Method)
_concatenate(
    highlights::Vector{String},
    comments::Vector{String},
    pages::Vector{Int32},
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}

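Concatenate the highlights and comments that are marked for merging by markers in the comments (such as .c1 and .c2 in the example below); merge the corresponding pages.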

Arguments

  • highlights::Vector{String}: the highlights vector
  • comments::Vector{String}: the comments vector
  • pages::Vector{Int32}: the pages vector

Returns

  • Tuple{Vector{String}, Vector{String}, Vector{Int32}}: the highlights, comments, and pages after concatenation

Example

using PDFHighlights

PDFHighlights.Internal.PDF._concatenate(
    String["Highlight 1", "High-", "light 2"],
    String["Comment 1", ".c1 Comment 2", ".c2"],
    Int32[1, 2, 3],
) ==
(
    String["Highlight 1", "Highlight 2"],
    String["Comment 1", "Comment 2"],
    Int32[1, 2],
)
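The merging rule can be sketched in plain Julia. This is a minimal sketch inferred only from the example above, not the package's implementation. It assumes that a comment starting with `.c1` opens a merged group, that a comment starting with `.cN` (N > 1) continues it, that the marker is stripped from the comment text, that a trailing hyphen joins the next piece without a space, and that each group keeps the page of its first highlight. `concatenate_sketch` is a hypothetical name.

```julia
# Hypothetical sketch of marker-based merging; the marker syntax is inferred from the example
function concatenate_sketch(highlights::Vector{String}, comments::Vector{String},
                            pages::Vector{Int32})
    out_h, out_c, out_p = String[], String[], Int32[]
    for (h, c, p) in zip(highlights, comments, pages)
        m = match(r"^\.c(\d+)\s*(.*)$"s, c)  # ".c<N>" followed by optional text
        if m === nothing || m.captures[1] == "1" || isempty(out_h)
            # A plain comment or a ".c1" marker starts a new entry
            push!(out_h, h)
            push!(out_c, m === nothing ? c : String(m.captures[2]))
            push!(out_p, p)  # the entry keeps the page of its first highlight
        else
            # ".c2", ".c3", ... continue the previous entry
            if endswith(out_h[end], "-")
                out_h[end] = chop(out_h[end]) * h  # rejoin a hyphenated word
            else
                out_h[end] *= " " * h
            end
            text = String(m.captures[2])
            if !isempty(text)
                out_c[end] = isempty(out_c[end]) ? text : out_c[end] * " " * text
            end
        end
    end
    return out_h, out_c, out_p
end
```

Called with the three vectors from the example above, the sketch yields the same tuple as the documented call.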

PDFHighlights.Internal.PDF._get_authors_from_PDF (Method)
_get_authors_from_PDF(dir::String) -> Vector{String}

Extract all authors from all PDFs found recursively in the passed directory.

Arguments

  • dir::String: a directory with PDF files

Returns

  • Vector{String}: the authors

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")

PDFHighlights.Internal.PDF._get_authors_from_PDF(path_to_pdf_dir) == ["Pavel Sobolev"]
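Collecting the PDFs "found recursively" in a directory can be sketched with Julia's `walkdir`. The helper `find_pdfs_sketch` is hypothetical, and matching files by a case-sensitive `.pdf` suffix is an assumption borrowed from the NotPDF description further down this page.

```julia
# Hypothetical sketch: collect every file ending in ".pdf" under `dir`, recursively
function find_pdfs_sketch(dir::String)::Vector{String}
    paths = String[]
    for (root, _, files) in walkdir(dir)  # top-down traversal of all subdirectories
        for file in files
            endswith(file, ".pdf") && push!(paths, joinpath(root, file))
        end
    end
    return paths
end
```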

PDFHighlights.Internal.PDF._get_highlights_from_PDF (Method)
_get_highlights_from_PDF(target::String; concatenate::Bool=true) -> Vector{String}

Extract all highlights from the passed PDF, or from all PDFs found recursively in the passed directory.

Arguments

  • target::String: a PDF file or a directory with PDF files

Keywords

  • concatenate::Bool=true: if true, concatenate the highlights

Returns

  • Vector{String}: the highlights

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf_dir) ==
PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf) ==
String[
    "Highlight 1",
    "Highlight 2 Highlight 3",
    "Highlight 4",
    "Highhighlight 5",
    "6th Highhigh light-",
    "High light 7",
    "8th Highlight-",
]

PDFHighlights.Internal.PDF._get_titles_from_PDF (Method)
_get_titles_from_PDF(dir::String) -> Vector{String}

Extract all titles from all PDFs found recursively in the passed directory.

Arguments

  • dir::String: a directory with PDF files

Returns

  • Vector{String}: the titles

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")

PDFHighlights.Internal.PDF._get_titles_from_PDF(path_to_pdf_dir) ==
["A dummy PDF for tests"]

PDFHighlights.Internal.PDF._sort! (Method)
_sort!(
    lines::Vector{String},
    lines_x_anchors::Vector{Float64},
    lines_yl_anchors::Vector{Float64},
    lines_yu_anchors::Vector{Float64},
) -> Vector{String}

Sort the lines vector in place using the anchor vectors. Basic principle: if two rectangles overlap by ordinate (their vertical extents intersect), sort them by abscissa.

Arguments

  • lines::Vector{String}: the lines vector of the highlight (text found in the rectangles of the highlight)
  • lines_x_anchors::Vector{Float64}: the abscissas of the left sides of the highlight rectangles
  • lines_yl_anchors::Vector{Float64}: the ordinates of the lower-left corners of the highlight rectangles
  • lines_yu_anchors::Vector{Float64}: the ordinates of the upper-left corners of the highlight rectangles

Returns

  • Vector{String}: the sorted lines vector of the highlight

Example

using PDFHighlights

lines = ["High", "high", "light"]
quad_x_anchors = [0.21, 0.15, 0.17]
quad_yl_anchors = [0.10, 0.10, 0.10]
quad_yu_anchors = [0.15, 0.12, 0.15]

PDFHighlights.Internal.PDF._sort!(
    lines,
    quad_x_anchors,
    quad_yl_anchors,
    quad_yu_anchors,
) == ["high", "light", "High"]
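The ordering principle can be sketched as a pairwise comparison. The details are assumptions, not taken from the package: two rectangles "overlap by ordinate" when their `[yl, yu]` intervals intersect, in which case the one with the smaller abscissa comes first; otherwise the higher one (larger `yl`, since PDF user space has its origin at the bottom-left of the page) comes first. An insertion sort on indices is used because this relation is not guaranteed to be transitive, which `Base.sort!` expects of its `lt` function. `sort_lines_sketch!` is a hypothetical name.

```julia
# Hypothetical sketch of the "overlap by ordinate -> sort by abscissa" rule
function sort_lines_sketch!(lines::Vector{String}, xs::Vector{Float64},
                            yls::Vector{Float64}, yus::Vector{Float64})
    # Should rectangle i be placed before rectangle j?
    function before(i, j)
        overlap = yls[i] <= yus[j] && yls[j] <= yus[i]  # ordinate ranges intersect
        return overlap ? xs[i] < xs[j] : yls[i] > yls[j]  # else: higher on the page first
    end
    perm = collect(1:length(lines))
    # Insertion sort on indices using the pairwise rule
    for k in 2:length(perm)
        j = k
        while j > 1 && before(perm[j], perm[j - 1])
            perm[j], perm[j - 1] = perm[j - 1], perm[j]
            j -= 1
        end
    end
    lines .= lines[perm]  # reorder in place
    return lines
end
```

With the vectors from the example above, all three rectangles overlap vertically, so the sketch orders them purely by abscissa.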

PDFHighlights.Internal.PDF._sort! (Method)
_sort!(
    highlights::Vector{String},
    comments::Vector{String},
    pages::Vector{Int32},
    highlights_x_anchors::Vector{Float64},
    highlights_yl_anchors::Vector{Float64},
    highlights_yu_anchors::Vector{Float64},
) -> Tuple{Vector{String}, Vector{String}}

Sort the highlights and comments vectors in place using the pages and the anchor vectors. Basic principle: if two highlights overlap by ordinate (their vertical extents intersect), sort them by abscissa.

Arguments

  • highlights::Vector{String}: the highlights
  • comments::Vector{String}: the comments
  • pages::Vector{Int32}: the pages of the highlights
  • highlights_x_anchors::Vector{Float64}: the abscissas of the left sides of the upper-left rectangles of the highlights
  • highlights_yl_anchors::Vector{Float64}: the ordinates of the lower-left corners of the lower-left rectangles of the highlights
  • highlights_yu_anchors::Vector{Float64}: the ordinates of the upper-left corners of the upper-left rectangles of the highlights

Returns

  • Tuple{Vector{String}, Vector{String}}: the sorted vectors of the highlights and comments

Example

using PDFHighlights

highlights = ["High", "high", "light"]
comments = ["Com", "com", "ment"]
pages = Int32[1, 1, 1]
highlights_x_anchors = [0.21, 0.15, 0.17]
highlights_yl_anchors = [0.10, 0.10, 0.10]
highlights_yu_anchors = [0.12, 0.15, 0.15]

PDFHighlights.Internal.PDF._sort!(
    highlights,
    comments,
    pages,
    highlights_x_anchors,
    highlights_yl_anchors,
    highlights_yu_anchors,
) ==
(
    ["high", "light", "High"],
    ["com", "ment", "Com"],
)
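Extending the same idea to whole highlights mainly adds the page as the primary key. This is again a hypothetical sketch (`sort_highlights_sketch!`), assuming that earlier pages come first and that the rectangle rule applies only within a page; the comments are reordered with the same permutation so they stay aligned with their highlights.

```julia
# Hypothetical sketch: order by page, then by the rectangle rule within a page
function sort_highlights_sketch!(highlights::Vector{String}, comments::Vector{String},
                                 pages::Vector{Int32}, xs::Vector{Float64},
                                 yls::Vector{Float64}, yus::Vector{Float64})
    function before(i, j)
        pages[i] != pages[j] && return pages[i] < pages[j]  # earlier page first
        overlap = yls[i] <= yus[j] && yls[j] <= yus[i]      # ordinate ranges intersect
        return overlap ? xs[i] < xs[j] : yls[i] > yls[j]
    end
    perm = collect(1:length(highlights))
    for k in 2:length(perm)  # insertion sort on indices
        j = k
        while j > 1 && before(perm[j], perm[j - 1])
            perm[j], perm[j - 1] = perm[j - 1], perm[j]
            j -= 1
        end
    end
    highlights .= highlights[perm]  # apply the same permutation to both vectors
    comments .= comments[perm]
    return highlights, comments
end
```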

PDFHighlights.Internal.PDF.get_author (Method)
get_author(pdf::String) -> String

Extract the author from the PDF.

Arguments

  • pdf::String: absolute or relative path to the PDF file

Returns

  • String: the author

Throws

Example

using PDFHighlights

path_to_pdf = joinpath(
    pathof(PDFHighlights) |> dirname |> dirname,
    "test",
    "pdf",
    "TestPDF.pdf"
)

get_author(path_to_pdf) == "Pavel Sobolev"

PDFHighlights.Internal.PDF.get_author_title (Method)
get_author_title(pdf::String) -> Tuple{String, String}

Extract the author and title from the PDF.

Arguments

  • pdf::String: absolute or relative path to the PDF file

Returns

  • Tuple{String, String}: the author and title

Throws

Example

using PDFHighlights

path_to_pdf = joinpath(
    pathof(PDFHighlights) |> dirname |> dirname,
    "test",
    "pdf",
    "TestPDF.pdf",
)

get_author_title(path_to_pdf) == ("Pavel Sobolev", "A dummy PDF for tests")

PDFHighlights.Internal.PDF.get_authors_titles (Method)
get_authors_titles(dir::String) -> Tuple{Vector{String}, Vector{String}}

Extract the authors and titles from all PDFs found recursively in the passed directory.

Arguments

  • dir::String: a directory with PDF files

Returns

  • Tuple{Vector{String}, Vector{String}}: the authors and titles

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")

get_authors_titles(path_to_pdf_dir) == (["Pavel Sobolev"], ["A dummy PDF for tests"])

PDFHighlights.Internal.PDF.get_comments (Method)
get_comments(target::String; concatenate::Bool=false) -> Vector{String}

Extract the comments from the passed PDF, or from all PDFs found recursively in the passed directory.

Arguments

  • target::String: a PDF file or a directory with PDF files

Keywords

  • concatenate::Bool=false: if true, concatenate the highlights and, accordingly, their comments

Returns

  • Vector{String}: the comments

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_comments(path_to_pdf_dir; concatenate=true) ==
get_comments(path_to_pdf; concatenate=true) ==
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""]

PDFHighlights.Internal.PDF.get_comments_pages (Method)
get_comments_pages(
    target::String;
    concatenate::Bool=false,
) -> Tuple{Vector{String}, Vector{Int32}}

Extract the comments and pages from the passed PDF, or from all PDFs found recursively in the passed directory.

Arguments

  • target::String: a PDF file or a directory with PDF files

Keywords

  • concatenate::Bool=false: if true, concatenate the highlights and, accordingly, merge their comments and pages

Returns

  • Tuple{Vector{String}, Vector{Int32}}: the comments and pages

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_comments_pages(path_to_pdf_dir; concatenate=true) ==
get_comments_pages(path_to_pdf; concatenate=true) ==
(
    String["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
    Int32[1, 2, 4, 6, 7, 8, 9],
)

PDFHighlights.Internal.PDF.get_highlights_comments (Method)
get_highlights_comments(
    target::String;
    concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}}

Extract the highlights and comments from the passed PDF, or from all PDFs found recursively in the passed directory.

Arguments

  • target::String: a PDF file or a directory with PDF files

Keywords

  • concatenate::Bool=true: if true, concatenate the highlights

Returns

  • Tuple{Vector{String}, Vector{String}}: the highlights and comments

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_highlights_comments(path_to_pdf_dir) ==
get_highlights_comments(path_to_pdf) ==
(
    [
        "Highlight 1",
        "Highlight 2 Highlight 3",
        "Highlight 4",
        "Highhighlight 5",
        "6th Highhigh light-",
        "High light 7",
        "8th Highlight-",
    ],
    ["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
)

PDFHighlights.Internal.PDF.get_highlights_comments_pages (Method)
get_highlights_comments_pages(
    target::String;
    concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}

Extract the highlights, comments, and pages from the passed PDF, or from all PDFs found recursively in the passed directory.

Arguments

  • target::String: a PDF file or a directory with PDF files

Keywords

  • concatenate::Bool=true: if true, concatenate the highlights

Returns

  • Tuple{Vector{String}, Vector{String}, Vector{Int32}}: the highlights, comments, and pages

Throws

  • DoesNotExist: the specified file or directory doesn't exist
  • NotPDF: the specified path does not end in .pdf

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_highlights_comments_pages(path_to_pdf_dir) ==
get_highlights_comments_pages(path_to_pdf) ==
(
    [
        "Highlight 1",
        "Highlight 2 Highlight 3",
        "Highlight 4",
        "Highhighlight 5",
        "6th Highhigh light-",
        "High light 7",
        "8th Highlight-",
    ],
    ["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
    Int32[1, 2, 4, 6, 7, 8, 9],
)
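The two documented failure modes can be sketched as a small path check performed before any PDF is opened. The exception types below are hypothetical local stand-ins for DoesNotExist and NotPDF, whose actual definitions are not shown on this page, and the order of the checks is an assumption.

```julia
# Hypothetical stand-ins for the documented DoesNotExist and NotPDF exceptions
struct DoesNotExistSketch <: Exception
    path::String
end
struct NotPDFSketch <: Exception
    path::String
end

# Resolve `target` to the PDFs to read: the file itself, or every PDF under a directory
function pdf_targets_sketch(target::String)::Vector{String}
    ispath(target) || throw(DoesNotExistSketch(target))
    if isdir(target)
        return String[joinpath(root, f) for (root, _, files) in walkdir(target)
                      for f in files if endswith(f, ".pdf")]
    end
    endswith(target, ".pdf") || throw(NotPDFSketch(target))
    return String[target]
end
```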

PDFHighlights.Internal.PDF.get_highlights_pages (Method)
get_highlights_pages(
    target::String;
    concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{Int32}}

Extract the highlights and pages from the passed PDF, or from all PDFs found recursively in the passed directory.

Arguments

  • target::String: a PDF file or a directory with PDF files

Keywords

  • concatenate::Bool=true: if true, concatenate the highlights

Returns

  • Tuple{Vector{String}, Vector{Int32}}: the highlights and pages

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_highlights_pages(path_to_pdf_dir) ==
get_highlights_pages(path_to_pdf) ==
(
    [
        "Highlight 1",
        "Highlight 2 Highlight 3",
        "Highlight 4",
        "Highhighlight 5",
        "6th Highhigh light-",
        "High light 7",
        "8th Highlight-",
    ],
    Int32[1, 2, 4, 6, 7, 8, 9],
)

PDFHighlights.Internal.PDF.get_pages (Method)
get_pages(
    target::String;
    concatenate::Bool=false
) -> Vector{Int32}

Extract the pages from the passed PDF, or from all PDFs found recursively in the passed directory.

Arguments

  • target::String: a PDF file or a directory with PDF files

Keywords

  • concatenate::Bool=false: if true, concatenate the highlights and, accordingly, merge their pages

Returns

  • Vector{Int32}: the pages

Throws

Example

using PDFHighlights

path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")

get_pages(path_to_pdf_dir) == get_pages(path_to_pdf) == Int32[1, 2, 3, 4, 5, 6, 7, 8, 9]

PDFHighlights.Internal.PDF.get_title (Method)
get_title(pdf::String) -> String

Extract the title from the PDF.

Arguments

  • pdf::String: absolute or relative path to the PDF file

Returns

  • String: the title

Throws

Example

using PDFHighlights

path_to_pdf = joinpath(
    pathof(PDFHighlights) |> dirname |> dirname,
    "test",
    "pdf",
    "TestPDF.pdf",
)

get_title(path_to_pdf) == "A dummy PDF for tests"

Macros

PDFHighlights.Internal.PDF.@unsafe_wrap (Macro)
@unsafe_wrap(array::Expr, len::Union{Symbol, Expr}) -> Expr

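Wrap a Julia Array object around the data at the address given by the array pointer, with length equal to len. The expansion passes own = true, so Julia takes ownership of the memory and calls free on it once the array is no longer referenced.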

Arguments

  • array::Expr: expression that will yield a pointer to the array data
  • len::Union{Symbol, Expr}: name of the variable which holds the length of this array, or an expression that will yield it

Returns

  • Expr: a wrapping expression

Example

using PDFHighlights

_array = :(array[index])
_len = :len

@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap(array[index], len)) ==
:(unsafe_wrap(Array, $(_array), $(_len); own = true))

See also: @unsafe_wrap_strings
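A macro producing the expansion shown above can be sketched in a few lines. `@unsafe_wrap_sketch` is a hypothetical stand-in, not the package's definition; it escapes its arguments so they are evaluated in the caller's scope. Because of `own = true`, Julia calls `free` on the pointer once the array is no longer referenced, so the memory must come from `malloc` (here `Libc.malloc`), never from a Julia-managed buffer.

```julia
# Hypothetical re-creation of the Expr method's expansion
macro unsafe_wrap_sketch(array, len)
    # esc: evaluate the caller's expressions in the caller's scope
    return :(unsafe_wrap(Array, $(esc(array)), $(esc(len)); own = true))
end

n = 3
# Stand-in for C-allocated data: one malloc'd buffer of three Int32 values
ptrs = [Ptr{Int32}(Libc.malloc(n * sizeof(Int32)))]
for i in 1:n
    unsafe_store!(ptrs[1], Int32(10i), i)
end

# Julia now owns the buffer and frees it when `v` is garbage-collected
v = @unsafe_wrap_sketch(ptrs[1], n)
```

Here `ptrs[1]` plays the role of `array[index]` in the documented example.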

PDFHighlights.Internal.PDF.@unsafe_wrap (Macro)
@unsafe_wrap(array::Symbol, len::Symbol) -> Expr

Wrap a Julia Array object around the data at the address stored in array[], with length equal to len[]. As with the Expr method, Julia takes ownership of the memory (own = true).

Arguments

  • array::Symbol: name of the variable which holds the pointer to the array data
  • len::Symbol: name of the variable which holds the pointer to the length of this array

Returns

  • Expr: a wrapping expression

Example

using PDFHighlights

_array = :array
_len = :len

@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap(array, len)) ==
:(unsafe_wrap(Array, $(_array)[], $(_len)[]; own = true))

See also: @unsafe_wrap_strings
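The Symbol method fits the common `ccall` pattern in which a C function writes a pointer and a length through out-parameters held in `Ref`s, which is why both are dereferenced with `[]`. The sketch below is hypothetical (`@unsafe_wrap_ref_sketch` is not the package's code), with `Libc.malloc` and `unsafe_store!` standing in for the C side.

```julia
# Hypothetical re-creation of the Symbol method: both arguments are Refs filled by C
macro unsafe_wrap_ref_sketch(array::Symbol, len::Symbol)
    return :(unsafe_wrap(Array, $(esc(array))[], $(esc(len))[]; own = true))
end

# Simulate out-parameters that a C function would have written through
array = Ref{Ptr{Cint}}(Ptr{Cint}(Libc.malloc(2 * sizeof(Cint))))
len = Ref{Cint}(2)
unsafe_store!(array[], Cint(7), 1)
unsafe_store!(array[], Cint(8), 2)

# Expands to unsafe_wrap(Array, array[], len[]; own = true)
v = @unsafe_wrap_ref_sketch(array, len)
```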

PDFHighlights.Internal.PDF.@unsafe_wrap_strings (Macro)
@unsafe_wrap_strings(array::Union{Symbol, Expr}, len::Union{Symbol, Expr}) -> Expr

Wrap a Julia Array object around the array of C-style strings at the address given by the array (or array[]) pointer, with length equal to len (or len[]); then copy each NUL-terminated, UTF-8-encoded C string into a Julia String.

Arguments

  • array::Union{Symbol, Expr}: name of the variable which holds the pointer to the array data, or expression that will yield it
  • len::Union{Symbol, Expr}: name of the variable which holds the length of this array (or a pointer to it), or expression that will yield it

Returns

  • Expr: a wrapping expression

Example

using PDFHighlights
using PDFHighlights: Internal.PDF.@unsafe_wrap

_array = :array
_len = :len

@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap_strings(array, len)) ==
:(unsafe_string.(unsafe_wrap(Array, $(_array)[], $(_len)[]; own = true)))

See also: @unsafe_wrap
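The string variant adds a broadcast `unsafe_string.` over the wrapped array of `char` pointers. In this hypothetical sketch (`@unsafe_wrap_strings_sketch`, with the element type `Ptr{UInt8}` assumed), the C side is simulated with `Libc.malloc`. Note that `own = true` covers only the outer pointer array, so the sketch frees the individual string buffers explicitly after copying them.

```julia
# Hypothetical re-creation: wrap an array of C string pointers, then copy each string
macro unsafe_wrap_strings_sketch(array, len)
    return :(unsafe_string.(unsafe_wrap(Array, $(esc(array))[], $(esc(len))[]; own = true)))
end

words = ["Highlight 1", "Comment 1"]
n = length(words)
# Outer buffer of char pointers, as a C function might return through an out-parameter
array = Ref{Ptr{Ptr{UInt8}}}(Ptr{Ptr{UInt8}}(Libc.malloc(n * sizeof(Ptr{UInt8}))))
len = Ref{Cint}(n)
for (i, w) in enumerate(words)
    buf = Ptr{UInt8}(Libc.malloc(sizeof(w) + 1))  # room for the NUL terminator
    GC.@preserve w unsafe_copyto!(buf, pointer(w), sizeof(w))
    unsafe_store!(buf, 0x00, sizeof(w) + 1)
    unsafe_store!(array[], buf, i)
end

# Keep the inner pointers: own = true frees only the outer buffer
inner = [unsafe_load(array[], i) for i in 1:n]
strings = @unsafe_wrap_strings_sketch(array, len)  # copies into Julia Strings
foreach(Libc.free, inner)
```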
