Module
PDFHighlights.Internal.PDF — Module
This module contains functions that can only be applied to PDF files.
Constants
PDFHighlights.Internal.PDF.GET_AUTHOR_TITLE_OUTPUTS — Constant
A dictionary that is created at runtime to store the results of the get_author_title function.
PDFHighlights.Internal.PDF.GET_HIGHLIGHTS_COMMENTS_PAGES_OUTPUTS — Constant
A dictionary that is created at runtime to store the results of the get_highlights_comments_pages function.
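A dictionary that stores function results is typically used as a cache keyed by the function's argument. The following is a minimal sketch of that pattern, not the package's actual code: `cached_lookup!`, `fake_extract`, and the `Dict{String, Tuple{String, String}}` key and value types are assumptions made for illustration.

```julia
# Hypothetical helper: look up `pdf` in `cache`; on a miss, call `compute`
# and store its result so later calls with the same key skip the work.
function cached_lookup!(
    cache::Dict{String, Tuple{String, String}},
    compute::Function,
    pdf::String,
)
    return get!(() -> compute(pdf), cache, pdf)
end

cache = Dict{String, Tuple{String, String}}()
calls = Ref(0)  # counts how often the (fake) extraction actually runs

# Stand-in for an expensive extraction; it does not read any file.
fake_extract(pdf) = (calls[] += 1; ("Some Author", "Title of " * pdf))

first_result = cached_lookup!(cache, fake_extract, "a.pdf")
second_result = cached_lookup!(cache, fake_extract, "a.pdf")  # served from the cache
```

Passing the dictionary explicitly keeps the sketch independent of any global state; a module-level constant would serve the same purpose.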
Functions
PDFHighlights.Internal.PDF._concatenate — Method
_concatenate(
highlights::Vector{String},
comments::Vector{String},
pages::Vector{Int32},
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}
Concatenate highlights based on comments; merge pages.
Arguments
highlights::Vector{String}
: the highlights vector
comments::Vector{String}
: the comments vector
pages::Vector{Int32}
: the pages vector
Returns
Tuple{Vector{String}, Vector{String}, Vector{Int32}}
: the concatenated arguments
Example
using PDFHighlights
PDFHighlights.Internal.PDF._concatenate(
String["Highlight 1", "High-", "light 2"],
String["Comment 1", ".c1 Comment 2", ".c2"],
Int32[1, 2, 3],
) ==
(
String["Highlight 1", "Highlight 2"],
String["Comment 1", "Comment 2"],
Int32[1, 2],
)
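The example above suggests a marker convention: a comment starting with `.c1` opens a chain, and comments starting with `.c2`, `.c3`, and so on continue it. The sketch below reproduces that example under assumed rules inferred from the documented outputs (a trailing hyphen is dropped when joining, otherwise a space is inserted, and the page of the first chain member is kept). `concat_sketch` is a hypothetical name and this is not the package's implementation.

```julia
# Hypothetical re-implementation inferred from the documented example.
function concat_sketch(
    highlights::Vector{String},
    comments::Vector{String},
    pages::Vector{Int32},
)
    out_h, out_c, out_p = String[], String[], Int32[]
    for (h, c, p) in zip(highlights, comments, pages)
        # A marker looks like ".c<number>" optionally followed by comment text.
        m = match(r"^\.c(\d+)\s*(.*)$"s, c)
        if m !== nothing && parse(Int, m[1]) > 1 && !isempty(out_h)
            # Continuation: glue the highlight onto the previous one.
            prev = out_h[end]
            out_h[end] = endswith(prev, "-") ? chop(prev) * h : prev * " " * h
            rest = String(m[2])
            if !isempty(rest)
                out_c[end] = isempty(out_c[end]) ? rest : out_c[end] * " " * rest
            end
        else
            # Start of a chain (".c1 ...") or an ordinary comment.
            push!(out_h, h)
            push!(out_c, m === nothing ? c : String(m[2]))
            push!(out_p, p)
        end
    end
    return out_h, out_c, out_p
end

result = concat_sketch(
    ["Highlight 1", "High-", "light 2"],
    ["Comment 1", ".c1 Comment 2", ".c2"],
    Int32[1, 2, 3],
)
```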
PDFHighlights.Internal.PDF._get_authors_from_PDF — Method
_get_authors_from_PDF(dir::String) -> Vector{String}
Extract all authors from all PDFs found recursively in the passed directory.
Arguments
dir::String
: a directory with PDF files
Returns
Vector{String}
: the authors
Throws
- Exceptions from:
get_authors_titles
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
PDFHighlights.Internal.PDF._get_authors_from_PDF(path_to_pdf_dir) == ["Pavel Sobolev"]
PDFHighlights.Internal.PDF._get_highlights_from_PDF — Method
_get_highlights_from_PDF(target::String; concatenate::Bool=true) -> Vector{String}
Extract all highlights from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String
: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true
: if true, concatenate the highlights
Returns
Vector{String}
: the highlights
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf_dir) ==
PDFHighlights.Internal.PDF._get_highlights_from_PDF(path_to_pdf) ==
String[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
]
PDFHighlights.Internal.PDF._get_titles_from_PDF — Method
_get_titles_from_PDF(target::String) -> Vector{String}
Extract all titles from all PDFs found recursively in the passed directory.
Arguments
target::String
: a directory with PDF files
Returns
Vector{String}
: the titles
Throws
- Exceptions from:
get_authors_titles
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
PDFHighlights.Internal.PDF._get_titles_from_PDF(path_to_pdf_dir) ==
["A dummy PDF for tests"]
PDFHighlights.Internal.PDF._sort! — Method
_sort!(
lines::Vector{String},
lines_x_anchors::Vector{Float64},
lines_yl_anchors::Vector{Float64},
lines_yu_anchors::Vector{Float64},
) -> Vector{String}
Sort the lines vector using the arrays of anchors. Basic principle: if rectangles overlap along the ordinate (vertical) axis, sort them by abscissa.
Arguments
lines::Vector{String}
: the lines vector of the highlight (text found in the rectangles of the highlight)
lines_x_anchors::Vector{Float64}
: the coordinate vector of the abscissa of the left side of the highlight rectangles
lines_yl_anchors::Vector{Float64}
: the coordinate vector of the ordinate of the lower-left corner of the highlight rectangles
lines_yu_anchors::Vector{Float64}
: the coordinate vector of the ordinate of the upper-left corner of the highlight rectangles
Returns
Vector{String}
: the sorted lines vector of the highlight
Example
using PDFHighlights
lines = ["High", "high", "light"]
quad_x_anchors = [0.21, 0.15, 0.17]
quad_yl_anchors = [0.10, 0.10, 0.10]
quad_yu_anchors = [0.15, 0.12, 0.15]
PDFHighlights.Internal.PDF._sort!(
lines,
quad_x_anchors,
quad_yl_anchors,
quad_yu_anchors,
) == ["high", "light", "High"]
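The stated principle can be sketched as a custom comparison over indices. This is an illustrative guess rather than the package's algorithm: `sort_by_anchors` is a hypothetical, non-mutating helper, and placing rectangles with a larger upper ordinate first when they do not overlap is an assumption. Because vertical overlap is not transitive, this comparison is only well behaved when rectangles group cleanly into rows.

```julia
# Hypothetical sketch: order lines left to right within a row; rows with a
# larger upper ordinate come first (assumed orientation).
function sort_by_anchors(
    lines::Vector{String},
    xs::Vector{Float64},
    yls::Vector{Float64},
    yus::Vector{Float64},
)
    # Two rectangles share a row if their vertical extents intersect.
    overlaps(i, j) = yls[i] < yus[j] && yls[j] < yus[i]
    lt(i, j) = overlaps(i, j) ? xs[i] < xs[j] : yus[i] > yus[j]
    order = sort(collect(eachindex(lines)); lt = lt)
    return lines[order]
end

sorted_lines = sort_by_anchors(
    ["High", "high", "light"],
    [0.21, 0.15, 0.17],
    [0.10, 0.10, 0.10],
    [0.15, 0.12, 0.15],
)
```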
PDFHighlights.Internal.PDF._sort! — Method
_sort!(
highlights::Vector{String},
comments::Vector{String},
pages::Vector{Int32},
highlights_x_anchors::Vector{Float64},
highlights_yl_anchors::Vector{Float64},
highlights_yu_anchors::Vector{Float64},
) -> Tuple{Vector{String}, Vector{String}}
Sort the highlights and comments vectors using the arrays of anchors and pages. Basic principle: if highlights overlap along the ordinate (vertical) axis, sort them by abscissa.
Arguments
highlights::Vector{String}
: the highlights
comments::Vector{String}
: the comments
pages::Vector{Int32}
: the pages
highlights_x_anchors::Vector{Float64}
: the coordinate vector of the abscissa of the left side of the upper-left rectangle of the highlight
highlights_yl_anchors::Vector{Float64}
: the coordinate vector of the ordinate of the lower-left corner of the lower-left rectangle of the highlight
highlights_yu_anchors::Vector{Float64}
: the coordinate vector of the ordinate of the upper-left corner of the upper-left rectangle of the highlight
Returns
Tuple{Vector{String}, Vector{String}}
: the sorted vectors of the highlights and comments
Example
using PDFHighlights
highlights = ["High", "high", "light"]
comments = ["Com", "com", "ment"]
pages = Int32[1, 1, 1]
highlights_x_anchors = [0.21, 0.15, 0.17]
highlights_yl_anchors = [0.10, 0.10, 0.10]
highlights_yu_anchors = [0.12, 0.15, 0.15]
PDFHighlights.Internal.PDF._sort!(
highlights,
comments,
pages,
highlights_x_anchors,
highlights_yl_anchors,
highlights_yu_anchors,
) ==
(
["high", "light", "High"],
["com", "ment", "Com"],
)
PDFHighlights.Internal.PDF.get_author — Method
get_author(pdf::String) -> String
Extract the author from the PDF.
Arguments
pdf::String
: absolute or relative path to the PDF file
Returns
String
: the author
Throws
- Exceptions from:
get_author_title
Example
using PDFHighlights
path_to_pdf = joinpath(
pathof(PDFHighlights) |> dirname |> dirname,
"test",
"pdf",
"TestPDF.pdf"
)
get_author(path_to_pdf) == "Pavel Sobolev"
PDFHighlights.Internal.PDF.get_author_title — Method
get_author_title(pdf::String) -> Tuple{String, String}
Extract the author and title from the PDF.
Arguments
pdf::String
: absolute or relative path to the PDF file
Returns
Tuple{String, String}
: the author and title
Throws
FileDoesNotExist
: the specified file doesn't exist
NotPDF
: the specified path does not end in .pdf
Example
using PDFHighlights
path_to_pdf = joinpath(
pathof(PDFHighlights) |> dirname |> dirname,
"test",
"pdf",
"TestPDF.pdf",
)
get_author_title(path_to_pdf) == ("Pavel Sobolev", "A dummy PDF for tests")
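The two documented failure conditions amount to a pair of path checks. A minimal sketch of such validation follows; `check_pdf_path` is a hypothetical helper, `ArgumentError` stands in for the package's `FileDoesNotExist` and `NotPDF` exception types, and the order of the checks is an assumption.

```julia
# Hypothetical validation helper; ArgumentError is a stand-in for the
# package's own exception types.
function check_pdf_path(pdf::String)
    isfile(pdf) || throw(ArgumentError("the specified file doesn't exist: $pdf"))
    endswith(pdf, ".pdf") ||
        throw(ArgumentError("the specified path does not end in .pdf: $pdf"))
    return pdf
end
```

A caller would run this before any extraction work, so that a bad path fails early with a clear message.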
PDFHighlights.Internal.PDF.get_authors_titles — Method
get_authors_titles(dir::String) -> Tuple{Vector{String}, Vector{String}}
Extract the authors and titles from all PDFs found recursively in the passed directory.
Arguments
dir::String
: a directory with PDF files
Returns
Tuple{Vector{String}, Vector{String}}
: the authors and titles
Throws
DirectoryDoesNotExist
: the specified directory doesn't exist
- Exceptions from:
get_author_title
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
get_authors_titles(path_to_pdf_dir) == (["Pavel Sobolev"], ["A dummy PDF for tests"])
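Finding PDFs recursively can be sketched with `walkdir` from Julia's standard library. `find_pdfs` is a hypothetical helper, `ArgumentError` stands in for `DirectoryDoesNotExist`, and sorting the result is an assumption made to keep the output deterministic; the package may traverse directories differently.

```julia
# Hypothetical helper: collect every path ending in ".pdf" below `dir`.
function find_pdfs(dir::String)
    isdir(dir) || throw(ArgumentError("the specified directory doesn't exist: $dir"))
    pdfs = String[]
    for (root, _, files) in walkdir(dir)
        for file in files
            endswith(file, ".pdf") && push!(pdfs, joinpath(root, file))
        end
    end
    return sort!(pdfs)
end
```

Each path returned this way could then be passed to a per-file function such as `get_author_title`.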
PDFHighlights.Internal.PDF.get_comments — Method
get_comments(target::String; concatenate::Bool=false) -> Vector{String}
Extract the comments from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String
: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=false
: if true, concatenate the highlights
Returns
Vector{String}
: the comments
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_comments(path_to_pdf_dir; concatenate=true) ==
get_comments(path_to_pdf; concatenate=true) ==
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""]
PDFHighlights.Internal.PDF.get_comments_pages — Method
get_comments_pages(
target::String;
concatenate::Bool=false,
) -> Tuple{Vector{String}, Vector{Int32}}
Extract the comments and pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String
: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=false
: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{Int32}}
: the comments and pages
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_comments_pages(path_to_pdf_dir; concatenate=true) ==
get_comments_pages(path_to_pdf; concatenate=true) ==
(
String["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
Int32[1, 2, 4, 6, 7, 8, 9],
)
PDFHighlights.Internal.PDF.get_highlights_comments — Method
get_highlights_comments(
target::String;
concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}}
Extract the highlights and comments from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String
: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true
: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{String}}
: the highlights and comments
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_highlights_comments(path_to_pdf_dir) ==
get_highlights_comments(path_to_pdf) ==
(
[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
],
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
)
PDFHighlights.Internal.PDF.get_highlights_comments_pages — Method
get_highlights_comments_pages(
target::String;
concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{String}, Vector{Int32}}
Extract the highlights, comments, and pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String
: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true
: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{String}, Vector{Int32}}
: the highlights, comments, and pages
Throws
DoesNotExist
: the specified file or directory doesn't exist
NotPDF
: the specified path does not end in .pdf
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_highlights_comments_pages(path_to_pdf_dir) ==
get_highlights_comments_pages(path_to_pdf) ==
(
[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
],
["Comment 1", "Comment 2 Comment 3", "Comment 4", "", "", "", ""],
Int32[1, 2, 4, 6, 7, 8, 9],
)
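The three returned vectors are parallel: element `i` of each describes the same highlight. One way to consume such a result is to zip the vectors into rows, as in this small sketch. The data is copied from the first entries of the example above so that the block runs without the package, and the `rows` layout is just an illustration.

```julia
# Stand-in data shaped like the documented return value.
highlights = ["Highlight 1", "Highlight 2 Highlight 3", "Highlight 4"]
comments = ["Comment 1", "Comment 2 Comment 3", "Comment 4"]
pages = Int32[1, 2, 4]

# Combine the parallel vectors into one named tuple per highlight.
rows = [
    (page = p, highlight = h, comment = c) for
    (h, c, p) in zip(highlights, comments, pages)
]
```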
PDFHighlights.Internal.PDF.get_highlights_pages — Method
get_highlights_pages(
target::String;
concatenate::Bool=true
) -> Tuple{Vector{String}, Vector{Int32}}
Extract the highlights and pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String
: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=true
: if true, concatenate the highlights
Returns
Tuple{Vector{String}, Vector{Int32}}
: the highlights and pages
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_highlights_pages(path_to_pdf_dir) ==
get_highlights_pages(path_to_pdf) ==
(
[
"Highlight 1",
"Highlight 2 Highlight 3",
"Highlight 4",
"Highhighlight 5",
"6th Highhigh light-",
"High light 7",
"8th Highlight-",
],
Int32[1, 2, 4, 6, 7, 8, 9],
)
PDFHighlights.Internal.PDF.get_pages — Method
get_pages(
target::String;
concatenate::Bool=false
) -> Vector{Int32}
Extract the pages from a passed PDF or all PDFs found recursively in the passed directory.
Arguments
target::String
: a PDF file or a directory with PDF files
Keywords
concatenate::Bool=false
: if true, concatenate the highlights
Returns
Vector{Int32}
: the pages
Throws
- Exceptions from:
get_highlights_comments_pages
Example
using PDFHighlights
path_to_pdf_dir = joinpath(pathof(PDFHighlights) |> dirname |> dirname, "test", "pdf")
path_to_pdf = joinpath(path_to_pdf_dir, "TestPDF.pdf")
get_pages(path_to_pdf_dir) == get_pages(path_to_pdf) == Int32[1, 2, 3, 4, 5, 6, 7, 8, 9]
PDFHighlights.Internal.PDF.get_title — Method
get_title(pdf::String) -> String
Extract the title from the PDF.
Arguments
pdf::String
: absolute or relative path to the PDF file
Returns
String
: the title
Throws
- Exceptions from:
get_author_title
Example
using PDFHighlights
path_to_pdf = joinpath(
pathof(PDFHighlights) |> dirname |> dirname,
"test",
"pdf",
"TestPDF.pdf",
)
get_title(path_to_pdf) == "A dummy PDF for tests"
Macros
PDFHighlights.Internal.PDF.@unsafe_wrap — Macro
@unsafe_wrap(array::Expr, len::Union{Symbol, Expr}) -> Expr
Wrap a Julia Array object around the data at the address given by the array pointer, with length equal to len.
Arguments
array::Expr
: expression that will yield a pointer to the array data
len::Union{Symbol, Expr}
: name of the variable which holds the length of this array, or an expression that will yield it
Returns
Expr
: a wrapping expression
Example
using PDFHighlights
_array = :(array[index])
_len = :len
@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap(array[index], len)) ==
:(unsafe_wrap(Array, $(_array), $(_len); own = true))
See also: @unsafe_wrap_strings
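For readers unfamiliar with the call that the macro expands to, the sketch below uses Base's `unsafe_wrap` directly on memory obtained from `Libc.malloc`. The buffer and its contents are made up for the illustration; in the package the pointer would presumably come from foreign code. With `own = true`, Julia takes over the buffer and frees it when the array is garbage collected, so it must not be freed manually.

```julia
n = 3
# Allocate room for three Int32 values outside of Julia's managed heap.
ptr = convert(Ptr{Int32}, Libc.malloc(n * sizeof(Int32)))
for i in 1:n
    unsafe_store!(ptr, Int32(10i), i)
end

# Wrap the raw buffer without copying; Julia now owns the memory.
wrapped = unsafe_wrap(Array, ptr, n; own = true)
```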
PDFHighlights.Internal.PDF.@unsafe_wrap — Macro
@unsafe_wrap(array::Symbol, len::Symbol) -> Expr
Wrap a Julia Array object around the data at the address given by the array[] pointer, with length equal to len[].
Arguments
array::Symbol
: name of the variable which holds the pointer to the array data
len::Symbol
: name of the variable which holds the pointer to the length of this array
Returns
Expr
: a wrapping expression
Example
using PDFHighlights
_array = :array
_len = :len
@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap(array, len)) ==
:(unsafe_wrap(Array, $(_array)[], $(_len)[]; own = true))
See also: @unsafe_wrap_strings
PDFHighlights.Internal.PDF.@unsafe_wrap_strings — Macro
@unsafe_wrap_strings(array::Union{Symbol, Expr}, len::Union{Symbol, Expr}) -> Expr
Wrap a Julia Array object around the array of C-style strings at the address given by the array (or array[]) pointer, with length equal to len (or len[]); convert each string to a Julia string encoded as UTF-8.
Arguments
array::Union{Symbol, Expr}
: name of the variable which holds the pointer to the array data, or expression that will yield it
len::Union{Symbol, Expr}
: name of the variable which holds the length of this array (or a pointer to it), or expression that will yield it
Returns
Expr
: a wrapping expression
Example
using PDFHighlights
using PDFHighlights: Internal.PDF.@unsafe_wrap
_array = :array
_len = :len
@macroexpand(PDFHighlights.Internal.PDF.@unsafe_wrap_strings(array, len)) ==
:(unsafe_string.(unsafe_wrap(Array, $(_array)[], $(_len)[]; own = true)))
See also: @unsafe_wrap
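The expanded expression combines `unsafe_wrap` with a broadcast `unsafe_string`. The sketch below builds a small array of NUL-terminated strings by hand to show what that combination does; the `words` data and the manual allocation are assumptions made for the demonstration. Note that `own = true` covers only the outer pointer array, so the individual string buffers are freed explicitly here.

```julia
words = ["High", "light"]

# Allocate the outer array of string pointers.
ptrs = convert(Ptr{Ptr{UInt8}}, Libc.malloc(length(words) * sizeof(Ptr{UInt8})))
for (i, w) in enumerate(words)
    # Copy each word into its own NUL-terminated C buffer.
    buf = convert(Ptr{UInt8}, Libc.malloc(sizeof(w) + 1))
    GC.@preserve w unsafe_copyto!(buf, pointer(w), sizeof(w))
    unsafe_store!(buf, 0x00, sizeof(w) + 1)
    unsafe_store!(ptrs, buf, i)
end

# Wrap the pointer array, then copy every C string into a Julia String.
wrapped_ptrs = unsafe_wrap(Array, ptrs, length(words); own = true)
strings = unsafe_string.(wrapped_ptrs)

# The Julia strings are copies, so the C buffers can be released now.
foreach(Libc.free, wrapped_ptrs)
```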