Module
PDFHighlights.Internal.PDF — Module
This module contains functions that can only be applied to PDF files.
Constants
PDFHighlights.Internal.PDF.GET_AUTHOR_TITLE_OUTPUTS — Constant
A dictionary that is created at runtime to store the results of the get_author_title function.
PDFHighlights.Internal.PDF.GET_HIGHLIGHTS_COMMENTS_PAGES_OUTPUTS — Constant
A dictionary that is created at runtime to store the results of the get_highlights_comments_pages function.
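A dictionary that stores function results is a memoization cache. The sketch below shows the general pattern only: the names `AUTHOR_TITLE_CACHE`, `EXTRACTION_CALLS`, `extract_author_title`, and `cached_author_title` are hypothetical, and keying the cache by file path is an assumption, not something this page confirms about the package.

```julia
# Hypothetical stand-ins: none of these names are part of PDFHighlights.
const AUTHOR_TITLE_CACHE = Dict{String, Tuple{String, String}}()

# Counts how often the "expensive" extraction actually runs.
const EXTRACTION_CALLS = Ref(0)

# Placeholder for the real metadata extraction.
function extract_author_title(pdf::String)
    EXTRACTION_CALLS[] += 1
    return ("Author of " * pdf, "Title of " * pdf)
end

# Look the path up in the cache; compute and store the result on a miss.
function cached_author_title(pdf::String)
    return get!(AUTHOR_TITLE_CACHE, pdf) do
        extract_author_title(pdf)
    end
end

cached_author_title("a.pdf")  # runs the extraction
cached_author_title("a.pdf")  # served from the dictionary
```

With this pattern, repeated calls for the same file skip the extraction step entirely.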
Functions
PDFHighlights.Internal.PDF._concatenate — Method
_concatenate(
highlights::Vector{String},
comments::Vector{String},
pages::Vector{Int32},
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}
Concatenate highlights based on comments; merge pages.
Arguments
highlights::Vector{String}: the highlights vector
comments::Vector{String}: the comments vector
pages::Vector{Int32}: the pages vector
Returns
Tuple{Vector{String}, Vector{String}, Vector{Int32}}: the concatenated arguments
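The concatenation rule is not spelled out in this docstring. Judging only from the documented examples, a comment starting with `.c1` appears to open a chain and `.c2`, `.c3`, and so on continue it; a piece ending in a hyphen is joined without a space, and the page of the first piece is kept. The sketch below implements that inferred rule. `concatenate_sketch` is a hypothetical helper, not the package's implementation, and the marker handling is an assumption.

```julia
# Hypothetical helper, not the package's `_concatenate`.
# Assumed rule: ".c1" opens a chain, ".c2", ".c3", ... continue it.
function concatenate_sketch(
    highlights::Vector{String},
    comments::Vector{String},
    pages::Vector{Int32},
)
    out_h, out_c, out_p = String[], String[], Int32[]
    for (h, c, p) in zip(highlights, comments, pages)
        m = match(r"^\.c(\d+)\s*(.*)$"s, c)
        if m === nothing
            # No marker: keep the entry unchanged.
            push!(out_h, h)
            push!(out_c, c)
            push!(out_p, p)
            continue
        end
        index = parse(Int, m.captures[1])
        rest = String(m.captures[2])  # comment text after the marker
        if index == 1 || isempty(out_h)
            # Start of a chain: new entry, marker stripped from the comment.
            push!(out_h, h)
            push!(out_c, rest)
            push!(out_p, p)
        else
            # Continuation: merge into the previous entry, keep its page.
            prev = out_h[end]
            out_h[end] = endswith(prev, "-") ? chop(prev) * h : prev * " " * h
            if !isempty(rest)
                out_c[end] = isempty(out_c[end]) ? rest : out_c[end] * " " * rest
            end
        end
    end
    return out_h, out_c, out_p
end
```

Applied to the inputs of the docstring's own example, this sketch yields the same three vectors that the example expects.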
Example
using PDFHighlights
PDFHighlights.Internal.PDF._concatenate(
String["Highlight 1", "High-", "light 2"],
String["Comment 1", ".c1 Comment 2", ".c2"],
Int32[1, 2, 3],
) ==
(
String["Highlight 1", "Highlight 2"],
String["Comment 1", "Comment 2"],
Int32[1, 2],
)
PDFHighlights.Internal.PDF._get_authors_from_PDF — Method
_get_authors_from_PDF(dir::String) -> Vector{String}
Extract all authors from all PDFs found recursively in the passed directory.
Arguments
dir::String: a directory with PDF files
Returns
Vector{String}: the authors
Throws
- Exceptions from:
get_authors_titles
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
PDFHighlights.Internal.PDF._get_authors_from_PDF(path_to_pdf_dir) == ["Pavel Sobolev"]
PDFHighlights.Internal.PDF._get_highlights_from_PDF — Method
_get_highlights_from_PDF(target::String; concatenate::Bool=true) -> Vector{String}
Extract all highlights from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true: if true, concatenate the highlights
Returns
Vector{String}: the highlights
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf_dir) ==
PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf) ==
String[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
]
PDFHighlights.Internal.PDF._get_titles_from_PDF — Method
_get_titles_from_PDF(target::String) -> Vector{String}
Extract all titles from all PDFs found recursively in the passed directory.
Arguments
dir::String: a directory with PDF files
Returns
Vector{String}: the titles
Throws
- Exceptions from:
get_authors_titles
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
PDFHighlights.Internal.PDF._get_titles_from_PDF(path_to_pdf_dir) ==
["A dummy PDF for tests"]
PDFHighlights.Internal.PDF._sort! — Method
_sort!(
lines::Vector{String},
lines_x_anchors::Vector{Float64},
lines_yl_anchors::Vector{Float64},
lines_yu_anchors::Vector{Float64},
) -> Vector{String}
Sort the lines vector using the arrays of anchors. Basic principle: if rectangles cross by ordinate, sort them by abscissa.
Arguments
lines::Vector{String}: the lines vector of the highlight (text found in the rectangles of the highlight)
lines_x_anchors::Vector{Float64}: the coordinate vector of the abscissa of the left side of the highlight rectangles
lines_yl_anchors::Vector{Float64}: the coordinate vector of the ordinate of the lower-left corner of the highlight rectangles
lines_yu_anchors::Vector{Float64}: the coordinate vector of the ordinate of the upper-left corner of the highlight rectangles
Returns
Vector{String}: the sorted lines vector of the highlight
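The basic principle stated above can be sketched as a pairwise comparator. `sort_lines_sketch` is a hypothetical, non-mutating helper and not the package's `_sort!`. How rectangles that do not overlap vertically are ordered is not stated in the docstring; the sketch assumes that a larger ordinate means higher on the page and places those rectangles first.

```julia
# Hypothetical helper, not the package's `_sort!`.
function sort_lines_sketch(
    lines::Vector{String},
    x::Vector{Float64},
    yl::Vector{Float64},
    yu::Vector{Float64},
)
    # Two rectangles "cross by ordinate" when their vertical ranges overlap.
    overlaps(i, j) = yl[i] <= yu[j] && yl[j] <= yu[i]
    # Overlapping rectangles are ordered left to right. Otherwise the one with
    # the larger upper ordinate comes first (assumption: larger means higher).
    before(i, j) = overlaps(i, j) ? x[i] < x[j] : yu[i] > yu[j]
    order = sort(collect(eachindex(lines)); lt = before)
    return lines[order]
end
```

Vertical overlap is not transitive, so a comparator like this is only well behaved when the rectangles fall into clearly separated rows.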
Example
using PDFHighlights
lines = ["High", "high", "light"]
quad_x_anchors = [0.21, 0.15, 0.17]
quad_yl_anchors = [0.10, 0.10, 0.10]
quad_yu_anchors = [0.15, 0.12, 0.15]
PDFHighlights.Internal.PDF._sort!(
lines,
quad_x_anchors,
quad_yl_anchors,
quad_yu_anchors,
) == ["high", "light", "High"]
PDFHighlights.Internal.PDF._sort! — Method
_sort!(
highlights::Vector{String},
comments::Vector{String},
pages::Vector{Int32},
highlights_x_anchors::Vector{Float64},
highlights_yl_anchors::Vector{Float64},
highlights_yu_anchors::Vector{Float64},
) -> Tuple{Vector{String}, Vector{String}}
Sort the highlights and comments vectors using the arrays of anchors and pages. Basic principle: if highlights cross by ordinate, sort them by abscissa.
Arguments
highlights::Vector{String}: the highlights
comments::Vector{String}: the comments
pages::Vector{Int32}: the pages
highlights_x_anchors::Vector{Float64}: the coordinate vector of the abscissa of the left side of the upper-left rectangle of the highlight
highlights_yl_anchors::Vector{Float64}: the coordinate vector of the ordinate of the lower-left corner of the lower-left rectangle of the highlight
highlights_yu_anchors::Vector{Float64}: the coordinate vector of the ordinate of the upper-left corner of the upper-left rectangle of the highlight
Returns
Tuple{Vector{String}, Vector{String}}: the sorted vectors of the highlights and comments
Example
using PDFHighlights
highlights = ["High", "high", "light"]
comments = ["Com", "com", "ment"]
pages = Int32[1, 1, 1]
highlights_x_anchors = [0.21, 0.15, 0.17]
highlights_yl_anchors = [0.10, 0.10, 0.10]
highlights_yu_anchors = [0.12, 0.15, 0.15]
PDFHighlights.Internal.PDF._sort!(
highlights,
comments,
pages,
highlights_x_anchors,
highlights_yl_anchors,
highlights_yu_anchors,
) ==
(
["high", "light", "High"],
["com", "ment", "Com"],
)
PDFHighlights.Internal.PDF.get_author — Method
get_author(pdf::String) -> String
Extract the author from the PDF.
Arguments
pdf::String: absolute or relative path to the PDF file
Returns
String: the author
Throws
- Exceptions from:
get_author_title
Example
using PDFHighlights
path_to_pdf = joinpath(
pathof(PDFHighlights) |> dirname |> dirname,
"test",
"pdf",
"TestPDF.pdf"
)
get_author(path_to_pdf) == "Pavel Sobolev"
PDFHighlights.Internal.PDF.get_author_title — Method
get_author_title(pdf::String) -> Tuple{String, String}
Extract the author and title from the PDF.
Arguments
pdf::String: absolute or relative path to the PDF file
Returns
Tuple{String, String}: the author and title
Throws
FileDoesNotExist: the specified file doesn't exist
NotPDF: the specified path does not end in .pdf
Example
using PDFHighlights
path_to_pdf = joinpath(
pathof(PDFHighlights) |> dirname |> dirname,
"test",
"pdf",
"TestPDF.pdf",
)
get_author_title(path_to_pdf) == ("Pavel Sobolev", "A dummy PDF for tests")
PDFHighlights.Internal.PDF.get_authors_titles — Method
get_authors_titles(dir::String) -> Tuple{Vector{String}, Vector{String}}
Extract the authors and titles from all PDFs found recursively in the passed directory.
Arguments
dir::String: a directory with PDF files
Returns
Tuple{Vector{String}, Vector{String}}: the authors and titles
Throws
DirectoryDoesNotExist: the specified directory doesn't exist
- Exceptions from:
get_author_title
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
get_authors_titles(path_to_pdf_dir) == (["Pavel Sobolev"], ["A dummy PDF for tests"])
PDFHighlights.Internal.PDF.get_comments — Method
get_comments(target::String; concatenate::Bool=false) -> Vector{String}
Extract the comments from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=false: if true, concatenate the highlights
Returns
Vector{String}: the comments
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_comments(path_to_pdf_dir; concatenate=true) ==
get_comments(path_to_pdf; concatenate=true) ==
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""]
PDFHighlights.Internal.PDF.get_comments_pages — Method
get_comments_pages(
target::String;
concatenate::Bool=false,
) -> Tuple{Vector{String}, Vector{Int32}}
Extract the comments and pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=false: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{Int32}}: the comments and pages
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_comments_pages(path_to_pdf_dir; concatenate=true) ==
get_comments_pages(path_to_pdf; concatenate=true) ==
(
String["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
Int32[1, 2, 4, 6, 7, 8, 9],
)
PDFHighlights.Internal.PDF.get_highlights_comments — Method
get_highlights_comments(
target::String;
concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}}
Extract the highlights and comments from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{String}}: the highlights and comments
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_highlights_comments(path_to_pdf_dir) ==
get_highlights_comments(path_to_pdf) ==
(
[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
],
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
)
PDFHighlights.Internal.PDF.get_highlights_comments_pages — Method
get_highlights_comments_pages(
target::String;
concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}
Extract the highlights, comments, and pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{String}, Vector{Int32}}: the highlights, comments, and pages
Throws
DoesNotExist: the specified file or directory doesn't exist
NotPDF: the specified path does not end in .pdf
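The three returned vectors are parallel: element i of each describes the same highlight. A minimal sketch of consuming them together follows. The data are literal stand-ins copied from the documented test output, so no PDF is read, and `format_entries` is a hypothetical helper rather than part of the package.

```julia
# Literal stand-ins taken from the documented example output; no PDF is read.
highlights = ["Highlight 1", "Highlight 2 Highlight 3", "Highhighlight 5"]
comments = ["Comment 1", "Comment 2 Comment 3", ""]
pages = Int32[1, 2, 6]

# Hypothetical formatter: one line per highlight, comment appended when present.
function format_entries(highlights, comments, pages)
    return map(zip(highlights, comments, pages)) do (h, c, p)
        isempty(c) ? "p. $p: $h" : "p. $p: $h ($c)"
    end
end

entries = format_entries(highlights, comments, pages)
foreach(println, entries)
```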
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_highlights_comments_pages(path_to_pdf_dir) ==
get_highlights_comments_pages(path_to_pdf) ==
(
[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
],
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
Int32[1, 2, 4, 6, 7, 8, 9],
)
PDFHighlights.Internal.PDF.get_highlights_pages — Method
get_highlights_pages(
target::String;
concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{Int32}}
Extract the highlights and pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{Int32}}: the highlights and pages
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_highlights_pages(path_to_pdf_dir) ==
get_highlights_pages(path_to_pdf) ==
(
[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
],
Int32[1, 2, 4, 6, 7, 8, 9],
)
PDFHighlights.Internal.PDF.get_pages — Method
get_pages(
target::String;
concatenate::Bool=false
) -> Vector{Int32}
Extract the pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=false: if true, concatenate the highlights
Returns
Vector{Int32}: the pages
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_pages(path_to_pdf_dir) == get_pages(path_to_pdf) == Int32[1, 2, 3, 4, 5, 6, 7, 8, 9]
PDFHighlights.Internal.PDF.get_title — Method
get_title(pdf::String) -> String
Extract the title from the PDF.
Arguments
pdf::String: absolute or relative path to the PDF file
Returns
String: the title
Throws
- Exceptions from:
get_author_title
Example
using PDFHighlights
path_to_pdf = joinpath(
pathof(PDFHighlights) |> dirname |> dirname,
"test",
"pdf",
"TestPDF.pdf",
)
get_title(path_to_pdf) == "A dummy PDF for tests"
Macros
PDFHighlights.Internal.PDF.@unsafe_wrap — Macro
@unsafe_wrap(array::Expr, len::Union{Symbol, Expr}) -> Expr
Wrap a Julia Array object around the data at the address given by the array pointer, with length equal to len.
Arguments
array::Expr: expression that will yield a pointer to the array data
len::Union{Symbol, Expr}: name of the variable which holds the length of this array, or an expression that will yield it
Returns
Expr: a wrapping expression
Example
using PDFHighlights
_array = :(array[index])
_len = :len
@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap(array[index], len)) ==
:(unsafe_wrap(Array, $(_array), $(_len); own = true))
See also: @unsafe_wrap_strings
PDFHighlights.Internal.PDF.@unsafe_wrap — Macro
@unsafe_wrap(array::Symbol, len::Symbol) -> Expr
Wrap a Julia Array object around the data at the address given by the array[] pointer, with length equal to len[].
Arguments
array::Symbol: name of the variable which holds the pointer to the array data
len::Symbol: name of the variable which holds the pointer to the length of this array
Returns
Expr: a wrapping expression
Example
using PDFHighlights
_array = :array
_len = :len
@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap(array, len)) ==
:(unsafe_wrap(Array, $(_array)[], $(_len)[]; own = true))
See also: @unsafe_wrap_strings
PDFHighlights.Internal.PDF.@unsafe_wrap_strings — Macro
@unsafe_wrap_strings(array::Union{Symbol, Expr}, len::Union{Symbol, Expr}) -> Expr
Wrap a Julia Array object around the array of C-style strings at the address given by the array (or array[]) pointer, with length equal to len (or len[]); convert each string to a Julia string encoded as UTF-8.
Arguments
array::Union{Symbol, Expr}: name of the variable which holds the pointer to the array data, or expression that will yield it
len::Union{Symbol, Expr}: name of the variable which holds the length of this array (or a pointer to it), or expression that will yield it
Returns
Expr: a wrapping expression
Example
using PDFHighlights
using PDFHighlights: Internal.PDF.@unsafe_wrap
_array = :array
_len = :len
@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap_strings(array, len)) ==
:(unsafe_string.(unsafe_wrap(Array, $(_array)[], $(_len)[]; own = true)))
See also: @unsafe_wrap
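The expansions shown in the macro examples reduce to two Base calls, `unsafe_wrap` and `unsafe_string`. The sketch below exercises those calls directly on buffers allocated with `Libc.malloc`, as a stand-in for memory handed over by a C library. Both helper functions are hypothetical and are not part of the package.

```julia
# Hypothetical helper: copy values into a malloc'ed buffer and wrap it.
function wrap_numbers_sketch(values::Vector{Int32})
    n = length(values)
    ptr = convert(Ptr{Int32}, Libc.malloc(n * sizeof(Int32)))
    for (i, value) in enumerate(values)
        unsafe_store!(ptr, value, i)
    end
    # `own = true` hands the buffer to Julia, which calls `free` on it
    # once the array is garbage collected.
    return unsafe_wrap(Array, ptr, n; own = true)
end

# Hypothetical helper: build a C-style array of string pointers, wrap it,
# then copy every NUL-terminated string into a Julia String.
function wrap_strings_sketch(words::Vector{String})
    n = length(words)
    ptrs = convert(Ptr{Ptr{UInt8}}, Libc.malloc(n * sizeof(Ptr{UInt8})))
    # Keep `words` rooted while raw pointers into its elements are in use.
    result = GC.@preserve words begin
        for (i, word) in enumerate(words)
            unsafe_store!(ptrs, pointer(word), i)
        end
        unsafe_string.(unsafe_wrap(Array, ptrs, n; own = true))
    end
    return result
end

numbers = wrap_numbers_sketch(Int32[10, 20, 30])
strings = wrap_strings_sketch(["High", "light"])
```

Passing `own = true` is only safe when the buffer came from `malloc` (or a compatible allocator) and nothing else will free it.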