NEWS
ore 1.7.4
- Named groups would not be propagated to match matrices unless the regex was
pre-compiled using 'ore()'. This has been corrected.
- A compiler warning about a 'printf'-type format specification has been
resolved.
ore 1.7.3
- The package will now properly detect a plain locale like "UTF-8" on start-up.
- C prototype warnings have been resolved, and the problematic 'sprintf()'
function is now avoided.
ore 1.7.2
- A handful of small memory leaks have been plugged.
- The 'README' has been updated to detail the current divergence between CRAN
and mainline versions of the package.
ore 1.7.1
- The ‘binary' argument to 'ore_file()' is now stored in an attribute in its
return value. The 'ore_search()' function additionally uses this attribute
to determine whether or not to set the 'text' element of its return value.
Treating a binary file’s entire contents as a string is unwise, and may
include embedded nuls and other problem bytes.
- A minor tweak has been made to the 'es()' function, which appreciably
improves its performance in simple cases.
- A potential buffer overrun and protection bug have both been corrected.
ore 1.7.0 (2021-10-12)
- The Onigmo library has been updated to version 6.2.0.
- R connections are now supported as text sources, allowing URLs, gzipped files
and pipes (amongst others; see ‘?connections') to be straightforwardly
interfaced with the package. The package’s C backend has been substantially
refactored to support files and connections in all the core functions.
- The new 'ore_switch()' function selects between a number of possible outputs
by matching each element of its input against a series of regular
expressions. This can be a useful way to handle different possible forms of
the same information.
- The new 'ore_repl()' function is a relative of the existing 'ore_subst()',
but differs in how it is vectorised. Unlike 'ore_subst()', it will replicate
the source text if necessary to ensure that all specified replacements are
used. The 'es()' function now uses 'ore_repl()', and so can produce output
vectors longer than its input.
- 'ore_subst()' gains a "start" argument, like 'ore_search()', and now accepts
multiple replacements, which will be applied in sequence to different
matches.
- 'ore_match()' is a new alias of 'ore_search()'.
- Substitution group references can now include "\\0" for the whole match
string, and allow for group numbers higher than 9. Group numbers that are out
of range should now produce an error, rather than potentially leading to a
segfault. Named group references should also be more robust.
- A multilingual, multiscript dataset, "glass", is now included with the
package thanks to Frank da Cruz and many contributors.
- Names and 'NA' values are now propagated from text arguments that are
character vectors.
- There is a new option ('ore.keepNA') to propagate 'NA's in 'ore_ismatch()',
and the infix functions '%~%' and '%~~%', rather than convert them
implicitly to 'FALSE'. This is off by default for backwards compatibility.
- Printing has been improved, and in particular "orematches" objects now show
(just) the elements that matched.
- Underscore-separated names are now used in preference in the package
documentation, following the trend in other packages, but the period-
separated versions are still available.
- Detection of the platform's native encoding has been improved, and this will
be asserted in regular expressions created by 'ore()', rather than them being
marked as being in "unknown" encoding by default as before. Handling of
encodings has been refined elsewhere too.
ore 1.6.3 (2019-11-02)
- The ore.file() function now performs path expansion on its argument.
- The package is now more careful to check that "external" pointers are valid
before dereferencing them, thereby avoiding potential segfaults. (Reported
by Thomas Weise, #5).
- Multiarchitecture support is now explicitly declared, ensuring that 32-bit
and 64-bit versions are built when installing from source on Windows.
(Reported by Ioannis Mamalikidis, #6).
ore 1.6.2 (2018-08-30)
- Additional warnings from GCC 8 and rchk have been resolved.
ore 1.6.1 (2018-05-22)
- The Onigmo library has been updated to version 6.1.3.
- Various compiler and sanitiser warnings have been resolved.
ore 1.6.0 (2017-04-13)
- The Onigmo library has been updated to version 6.1.1.
- Files will now be searched incrementally, meaning that large files do not
need to be read in their entirety if the match is near the beginning. This is
disabled by default if all matches are requested.
- Additional arguments to ore.search() are now accessible through an ellipsis
argument to ore.ismatch().
- The results of a substitution function are now recycled across matches, and
can therefore be of any length.
- A handful of subtle garbage-collection heisenbugs have been corrected.
(Reported by Tomas Kalibera.)
ore 1.5.0 (2016-09-09)
- The object returned by ore.search() now generally has the class "orematches".
This class has its own indexing and print methods to avoid copious output and
allow quicker access to multiple matches.
- The ore.search() and ore.split() functions now accept zero-length text
vectors, returning an empty list rather than producing an error.
- The matches() and groups() functions now use the last match by default, if
called with no argument.
ore 1.4.0 (2016-05-07)
- The new ore.escape() function can be used to escape characters which would
usually have special significance in a regular expression. This can be
helpful when incorporating arbitrary strings into a regex using ore().
ore 1.3.0 (2016-03-02)
- It is now possible to search directly in text files, using their native
encoding, or in binary files, byte-by-byte. To do this, a call to the new
ore.file() function should replace the usual text argument to ore.search().
All of Onigmo's many encodings are supported for text files.
- A new infix operator, %~|%, can be used filter vectors by whether they match
a regular expression.
- The "matches" and "groups" methods for lists now drop nonmatching elements by
default. This helps avoid many NULLs in results.
- The "print" method for "orematch" objects now works correctly with
double-width characters.
ore 1.2.2 (2016-01-27)
- Zero-length matches could lead to segmentation faults. This has been
corrected.
- Calls to ore.dict() which include other function calls are now handled
properly. (Reported by Jon Olav Vik.)
- The es() function would previously only honour its rounding arguments if all
matches were numeric. This has been corrected.
ore 1.2.1 (2015-08-10)
- The es() function would previously only replace the first embedded
expression. That has been corrected.
- Attributes are now ignored when comparing "ore" objects in tests. This led to
a failure on versions of R prior to 3.2.0, at least on Windows.
ore 1.2.0 (2015-07-17)
- Oniguruma's support for alternative regular expression syntaxes is now
exposed by the ore() function. At present only "fixed" (i.e., literal)
patterns are supported in addition to the default, which has always been
"ruby".
- The ore.lastmatch() function gains a "simplify" argument.
- Underscore-separated function names are now available as aliases to their
period-separated equivalents. For example, you can use ore_search() instead
of ore.search(), if you prefer.
- The use of variables holding patterns could sometimes cause errors in ore().
This has been corrected.
ore 1.1.0 (2015-02-26)
- The package now provides functionality for creating and working with a
pattern dictionary, and a few predefined patterns are included. This allows
common regular expressions to be easily named, stored, and integrated into
larger expressions later. The key new function is ore.dict().
- The new es() function provides expression substitution, a convenient way to
embed R expressions within strings, and a good example of the package's
substitution functions in action.
- Printing of "orematch" objects is now much clearer, particularly for long or
multiline search texts.
- Group matches are now passed to substitution functions, in an attribute.
- The "ore.colour" (or "ore.color") option can now be used to determine whether
to print in colour, rather than using the crayon package (although the latter
is still the default).
ore 1.0.7
- Printing "orematch" objects no longer leads to errors. This was due to a
mismatch in function names between the C code and the calling R code.
ore 1.0.6 (2015-01-01)
- All of the core functions have now switched to primarily C implementations,
providing substantial performance gains. Minor changes in the handling of
text encodings have occurred alongside this.
- The limit of 128 matches per string has been removed, due to more
sophisticated memory management.
- The arguments to ore.subst() have been reordered slightly, to avoid
inadvertent partial matching when using a substitution function.
ore 1.0.5
- Calculation of match offsets within the C code have been made much more
efficient.
ore 1.0.4 (2014-11-25)
- Default methods have been added for matches() and groups(), which return NA.
- The "rex" test is now skipped if that package is not available.
- Various low-level warnings/errors from UBSAN have been addressed.
ore 1.0.3
- Almost all of the work of ore.search() is now done in C code rather than R,
for performance. In testing, ore is now appreciably faster than base R for
simple searches, when the regex is precompiled. Optimisation of other
functions will follow in later releases.
- The package no longer requires any specified minimum version of R (although
it has not been tested on old releases).
ore 1.0.2
- A bug in the ore.subst() function that could produce low-level errors or
segmentation faults has been fixed.
ore 1.0.1 (2014-10-22)
- Tests should no longer fail in locales which do not use a UTF-8 encoding.
- Documentation tweaks.
ore 1.0.0 (2014-10-20)