Title: | Dictionaries for the 'SemNetCleaner' Package |
---|---|
Description: | Implements dictionaries that can be used in the 'SemNetCleaner' package. Also includes several functions aimed at facilitating the text cleaning analysis in the 'SemNetCleaner' package. This package is designed to integrate and update word lists and dictionaries based on each user's individual needs by allowing users to store and save their own dictionaries. Dictionaries can be added to the 'SemNetDictionaries' package by submitting user-defined dictionaries to <https://github.com/AlexChristensen/SemNetDictionaries>. |
Authors: | Alexander P. Christensen [aut, cre] |
Maintainer: | Alexander P. Christensen <[email protected]> |
License: | GPL (>= 3.0) |
Version: | 0.2.0 |
Built: | 2024-10-27 03:34:38 UTC |
Source: | https://github.com/alexchristensen/semnetdictionaries |
Alexander Christensen <[email protected]>
Useful links:
Report bugs at https://github.com/AlexChristensen/SemNetDictionaries/issues
A database of possible animals responses (n = 1211)
data(animals.dictionary)
animals.dictionary (vector, length = 1211)
To add additional animals to the dictionary, please make an appendix dictionary (append.dictionary)
data("animals.dictionary")
A database of possible animals monikers and common spelling errors
data(animals.moniker)
animals.moniker (list, length = 236)
To add additional animals monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("animals.moniker")
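As a rough illustration of how a moniker list could be used during cleaning, the sketch below maps alternative names and misspellings back to a preferred word. The toy list, its layout (preferred word as the name, variants as the elements), and the helper `resolve_moniker` are all assumptions made for this example and are not taken from the package.

```r
# Toy moniker list (hypothetical layout): each name is a preferred word,
# each element holds alternative names and misspellings for that word.
toy.moniker <- list(
  alligator = c("gator", "aligator"),
  hippopotamus = c("hippo", "hipopotamus")
)

# Invert the list into a named character vector: variant -> preferred word.
variant_to_word <- setNames(
  rep(names(toy.moniker), lengths(toy.moniker)),
  unlist(toy.moniker, use.names = FALSE)
)

# Hypothetical helper: replace known variants, leave other responses untouched.
resolve_moniker <- function(responses, lookup) {
  hits <- responses %in% names(lookup)
  responses[hits] <- unname(lookup[responses[hits]])
  responses
}

resolve_moniker(c("gator", "dog", "hipopotamus"), variant_to_word)
```

Responses that do not appear among the variants (here "dog") pass through unchanged.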
A function designed to create post-hoc dictionaries in the SemNetDictionaries package. This allows for new semantic categories or word lists to be saved for future use (i.e., your own personal dictionary). Dictionaries created using this function can either be saved as an R object to your global environment or as an .rds file on your current computer. Open-source community-derived dictionaries can be uploaded to and downloaded from https://github.com/AlexChristensen/SemNetDictionaries
append.dictionary(..., dictionary.name = "appendix", save.location = c("envir", "wd", "choose", "path"), path = NULL, textcleaner = FALSE, package = FALSE)
... | Character vector. A vector of words to create or add to a dictionary |
dictionary.name | Character. Name of dictionary to create or add words to. Defaults to "appendix" |
save.location | Character. A choice for where to store the appendix dictionary. Defaults to "envir" |
path | Character. A path to an existing directory. Only necessary for save.location = "path" |
textcleaner | Boolean. Argument for skipping asking to save the dictionary twice. Defaults to FALSE |
package | Boolean. Argument not meant for user use. Allows me to update the package's dictionaries efficiently |
Appendix dictionaries are useful for storing spelling definitions that are not available in the SemNetDictionaries package. This function enables the storage of personalized dictionaries, which can be used in combination with other dictionaries to facilitate the cleaning of text data.
Dictionaries are either stored in R's global environment, where they will be deleted once R is closed (unless you save them), or in a directory you choose. A menu will pop up asking whether you would like to save or update your dictionary.
You have two options:
Yes (or 1): Gives this function permission to save (or update) your dictionary to a chosen directory. If save.location = "envir", your dictionary will be deleted after closing R
No (or 2): Does NOT give this function permission to save your dictionary to your computer. save.location = "envir" will always return your dictionary as a vector object to R's global environment
To save your dictionary file, you can either:
Manually save: Use saveRDS and save using the "*.dictionary" suffix
save.location = "choose": A file explorer menu will pop up and a directory can be manually selected
save.location = "path": The file will automatically be saved to the directory you provide
Note that save.location = "choose" and save.location = "path" will automatically update your dictionary if a file already exists with the same name as the one entered in the dictionary.name argument.
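The manual-save option can be sketched with base R alone. This is a minimal example, assuming a file name of the form `example.dictionary.rds` written to a temporary directory; the word list, the file name, and the `.rds` extension on top of the "*.dictionary" suffix are illustrative choices, not something the package prescribes here.

```r
# Hypothetical word list; in practice this would be your own dictionary vector.
example.dictionary <- sort(unique(c("words", "are", "fun")))

# Assumed file name using the "*.dictionary" suffix, stored in a temporary directory.
dictionary_file <- file.path(tempdir(), "example.dictionary.rds")

# Save the vector, then read it back to confirm the round trip.
saveRDS(example.dictionary, file = dictionary_file)
restored.dictionary <- readRDS(dictionary_file)

identical(example.dictionary, restored.dictionary)
```

Replacing `tempdir()` with a permanent directory keeps the file available after R is closed.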
To find where your dictionaries are stored, use the find.dictionaries function. These dictionaries are only stored on your private computer and must either be publicly shared or transferred to other computers in order to use them elsewhere. If you would like to share a dictionary for others to use, then please submit a pull request or post an issue with your dictionary on my GitHub: AlexChristensen/SemNetDictionaries.
Alexander Christensen <[email protected]>
find.dictionaries to find where dictionaries are stored, dictionaries to identify dictionaries in SemNetDictionaries
# Create a dictionary
new.dictionary <- append.dictionary(c("words","are","fun"), save.location = "envir")
A database to convert between British and US spellings (n = 780)
data(brit2us)
brit2us (list, length = 780)
data("brit2us")
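To show the kind of conversion such a database supports, here is a small sketch using a toy lookup list. The layout (British spelling as the name, US spelling as the value) and the helper `to_us` are assumptions for illustration only; the internal structure of brit2us is not described here and may differ.

```r
# Toy lookup table (hypothetical layout): names are British spellings,
# values are the corresponding US spellings.
toy_brit2us <- list(colour = "color", favourite = "favorite", grey = "gray")

# Hypothetical helper: convert words found in the lookup, keep the rest as-is.
to_us <- function(words, lookup) {
  hits <- words %in% names(lookup)
  words[hits] <- vapply(
    words[hits],
    function(w) lookup[[w]],
    character(1),
    USE.NAMES = FALSE
  )
  words
}

responses <- c("colour", "dog", "grey")
to_us(responses, toy_brit2us)
```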
A general dictionary of over 80,000 words from the Corpus of Contemporary American English derived from https://www.wordfrequency.info/samples.asp.
data(coca.dictionary)
coca.dictionary (vector, length = 80381)
To add additional words to the dictionary, please make an appendix dictionary (append.dictionary)
data("coca.dictionary")
A database of word forms for the Corpus of Contemporary American English dictionary
data(coca.moniker)
coca.moniker (list, length = 20267)
To add additional COCA monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("coca.moniker")
A general dictionary of over 109,000 words from the Corpus of Contemporary American English dictionary (coca.dictionary) and Hunspell dictionary (hunspell.dictionary).
data(cocaspell.dictionary)
cocaspell.dictionary (vector, length = 109169)
To add additional words to the dictionary, please make an appendix dictionary (append.dictionary)
data("cocaspell.dictionary")
A database of word forms for the Corpus of Contemporary American English and Hunspell dictionaries
data(cocaspell.moniker)
cocaspell.moniker (list, length = 29610)
To add additional COCA and Hunspell monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("cocaspell.moniker")
A wrapper function to identify all dictionaries included in SemNetDictionaries
dictionaries(quiet)
quiet | Boolean. Determines whether the return should be quiet (does not print dictionaries). Defaults to |
Returns the names of dictionaries in SemNetDictionaries
Alexander Christensen <[email protected]>
find.dictionaries to find where dictionaries are stored, append.dictionary to create a new dictionary
# List names of dictionaries in 'SemNetDictionaries'
dictionaries()
A wrapper function to identify the save location of appendix dictionaries from append.dictionary
find.dictionaries(..., add.path = NULL)
... | Vector. Appendix dictionary file names (if they are known). If left empty, the function will search across all files for files in folders on your desktop that end in |
add.path | Character. Path to additional dictionaries to be found. DOES NOT search recursively (through all folders in path) to avoid time intensive search. Set to |
names | Returns the names of the appendix dictionary file(s) found on your computer |
files | Returns the dictionary file(s) that are stored in each given path. If there is no output (e.g., |
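The non-recursive search described for add.path can be pictured with base R's `list.files`. This is only a sketch of the idea, not the function's actual implementation: the demo directory, the file names, and the `.dictionary.rds` pattern are assumptions chosen for the example.

```r
# Build a throwaway directory with one dictionary file at the top level
# and one inside a subfolder.
search_dir <- file.path(tempdir(), "dictionary_search_demo")
dir.create(file.path(search_dir, "nested"), recursive = TRUE, showWarnings = FALSE)
saveRDS(c("a", "b"), file.path(search_dir, "top.dictionary.rds"))
saveRDS(c("c", "d"), file.path(search_dir, "nested", "deep.dictionary.rds"))

# Assumed file pattern: names ending in ".dictionary.rds".
# recursive = FALSE means files inside subfolders are not visited.
found <- list.files(search_dir, pattern = "\\.dictionary\\.rds$", recursive = FALSE)
found
```

Only the top-level file is listed; the file inside `nested` is skipped, which is what keeps the search fast on large directory trees.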
Alexander Christensen <[email protected]>
append.dictionary to create a new dictionary, dictionaries to identify dictionaries in SemNetDictionaries, and load.dictionaries to load multiple dictionaries
# Make a dictionary
example.dictionary <- append.dictionary(c("words","are","fun"), save.location = "envir")
# Dictionary can now be found
find.dictionaries("example")
# No appendix dictionaries found
find.dictionaries()
# For your computer's timing to complete search
t0 <- Sys.time()
find.dictionaries()
Sys.time() - t0
A database of possible fruits responses (n = 488)
data(fruits.dictionary)
fruits.dictionary (vector, length = 488)
To add additional fruits to the dictionary, please make an appendix dictionary (append.dictionary)
data("fruits.dictionary")
A database of possible fruits monikers and common spelling errors
data(fruits.moniker)
fruits.moniker (list, length = 39)
To add additional fruits monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("fruits.moniker")
A general dictionary of over 370,000 words (n = 370,103) derived from https://github.com/dwyl/english-words. All punctuation has been removed.
data(general.dictionary)
general.dictionary (vector, length = 370103)
To add additional words to the dictionary, please make an appendix dictionary (append.dictionary)
data("general.dictionary")
A database of possible good synonym responses (n = 284)
To add additional good synonyms to the dictionary, please make an appendix dictionary (append.dictionary)
data(good.dictionary)
good.dictionary (vector, length = 284)
data("good.dictionary")
A database of possible good monikers and common spelling errors
data(good.moniker)
good.moniker (list, length = 4)
To add additional good monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("good.moniker")
A database of possible hot synonym responses (n = 281)
To add additional hot synonyms to the dictionary, please make an appendix dictionary (append.dictionary)
data(hot.dictionary)
hot.dictionary (vector, length = 281)
data("hot.dictionary")
A database of possible hot monikers and common spelling errors
data(hot.moniker)
hot.moniker (list, length = 15)
To add additional hot monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("hot.moniker")
A general dictionary of over 62,000 words from the Hunspell dictionary derived from http://wordlist.aspell.net/dicts/.
data(hunspell.dictionary)
hunspell.dictionary (vector, length = 62893)
To add additional words to the dictionary, please make an appendix dictionary (append.dictionary)
data("hunspell.dictionary")
A database of possible jobs and related words (n = 1471)
data(jobs.dictionary)
jobs.dictionary (vector, length = 1471)
To add additional jobs to the dictionary, please make an appendix dictionary (append.dictionary)
data("jobs.dictionary")
A database of possible jobs monikers and common spelling errors
data(jobs.moniker)
jobs.moniker (list, length = 117)
To add additional jobs monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("jobs.moniker")
A wrapper function to load dictionaries into the 'SemNetCleaner' package. Searches for dictionaries in R's global environment, the SemNetDictionaries package, and on your computer. Outputs a unique word list that is combined from all dictionaries entered in the dictionary argument
load.dictionaries(..., add.path = NULL)
... | Character. Dictionaries to load. Dictionaries in your global environment MUST be objects called |
add.path | Character. Path to additional dictionaries to be found. DOES NOT search recursively (through all folders in path) to avoid time intensive search. Set to |
Returns a vector of unique words that have been combined and alphabetized from the specified dictionaries
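The shape of that return value can be illustrated with two toy word lists. This sketch only shows what "combined, unique, and alphabetized" means in base R; the vectors are hypothetical and the code is not the function's actual implementation.

```r
# Two hypothetical dictionaries with overlapping entries.
first.dictionary <- c("dog", "cat", "zebra")
second.dictionary <- c("cat", "apple", "dog")

# Combine, drop duplicates, and sort alphabetically.
combined <- sort(unique(c(first.dictionary, second.dictionary)))
combined
```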
Alexander Christensen <[email protected]>
# Find dictionaries to load
dictionaries()
# Load "animals" dictionary
load.dictionaries("animals")
# Create a dictionary
new.dictionary <- append.dictionary("words", "are", "fun")
# Load created dictionary
load.dictionaries("new")
# Load animals and new dictionary
load.dictionaries("animals", "new")
# Single letter dictionary
load.dictionaries("d")
# Multiple letters dictionary
load.dictionaries("a", "d")
# Category and letters dictionary
load.dictionaries("animals", "a")
A wrapper function to load monikers into the 'SemNetCleaner' package. Searches for monikers in R's SemNetDictionaries package. Outputs a unique word list that is combined from all dictionaries entered in the moniker argument
load.monikers(moniker, vector = TRUE)
moniker | Character vector. Monikers to load (must be a dictionary in |
vector | Boolean. Should output be a vector? If |
Returns a vector of unique words that have been combined and alphabetized from the specified monikers
Alexander Christensen <[email protected]>
# Find dictionaries to load
dictionaries()
# Load "animals" monikers
load.monikers("animals")
A general dictionary of 10,000 of the most common U.S. English words derived from https://github.com/first20hours/google-10000-english.
data(most_common.dictionary)
most_common.dictionary (vector, length = 9329)
To add additional words to the dictionary, please make an appendix dictionary (append.dictionary)
data("most_common.dictionary")
An interactive Shiny application for playing https://www.powerlanguage.co.uk/wordle/
ShinyWoRdle()
if(interactive()) {ShinyWoRdle()}
A selection of stop words that can be removed from semantic responses (n = 56)
data(stop_words.dictionary)
stop_words.dictionary (vector, length = 56)
To add additional stop words to the dictionary, please make an appendix dictionary (append.dictionary)
data("stop_words.dictionary")
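Removing stop words from a set of responses is a simple filtering step. The sketch below uses a small hypothetical stop-word vector in place of the packaged data so that it runs without the package installed; the words chosen are illustrative and may not match the actual contents of stop_words.dictionary.

```r
# Hypothetical stand-in for a stop-word dictionary.
toy.stop_words <- c("a", "an", "the", "of")

# Example responses containing a mix of stop words and content words.
responses <- c("the", "dog", "a", "cat", "of", "course")

# Keep only the responses that are not stop words.
cleaned <- responses[!responses %in% toy.stop_words]
cleaned
```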
A database of possible vegetables responses (n = 284)
data(vegetables.dictionary)
vegetables.dictionary (vector, length = 284)
To add additional vegetables to the dictionary, please make an appendix dictionary (append.dictionary)
data("vegetables.dictionary")
A database of possible vegetables monikers and common spelling errors
data(vegetables.moniker)
vegetables.moniker (list, length = 35)
To add additional vegetables monikers to the database, please submit a pull request or issue to https://github.com/AlexChristensen/SemNetDictionaries
data("vegetables.moniker")