Package: text 1.2.3

Oscar Kjell

text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables to word embeddings. The word embeddings can then be used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.
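
As a rough illustration of what a "semantic similarity score" between two embeddings can look like, here is a minimal base-R sketch using cosine similarity. It does not call the package: the helper `cosine_similarity()` and the toy vectors are hypothetical, and this page does not state which similarity measure the package's functions use.

```r
# Cosine similarity between two numeric vectors of equal length.
# Hypothetical helper for illustration; not part of the 'text' package.
cosine_similarity <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy 4-dimensional "embeddings" (made-up numbers, not model output).
emb_happy   <- c(0.8, 0.1, 0.3, 0.5)
emb_joyful  <- c(0.7, 0.2, 0.4, 0.5)
emb_invoice <- c(-0.2, 0.9, -0.5, 0.1)

# Vectors pointing in similar directions score closer to 1.
sim_close <- cosine_similarity(emb_happy, emb_joyful)
sim_far   <- cosine_similarity(emb_happy, emb_invoice)

print(c(close = sim_close, far = sim_far))
```

Real embeddings from a transformer model typically have hundreds of dimensions, but the computation is the same.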

Authors: Oscar Kjell [aut, cre], Salvatore Giorgi [aut], Andrew Schwartz [aut]

text_1.2.3.tar.gz
text_1.2.3.zip (r-4.5) | text_1.2.3.zip (r-4.4) | text_1.2.3.zip (r-4.3)
text_1.2.3.tgz (r-4.4-any) | text_1.2.3.tgz (r-4.3-any)
text_1.2.3.tar.gz (r-4.5-noble) | text_1.2.3.tar.gz (r-4.4-noble)
text_1.2.3.tgz (r-4.4-emscripten) | text_1.2.3.tgz (r-4.3-emscripten)
text.pdf | text.html
text/json (API)
NEWS

# Install 'text' in R:
install.packages('text', repos = c('https://oscarkjell.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/oscarkjell/text/issues

deep-learning, machine-learning, nlp, transformers

54 exports, 130 stars, 5.45 score, 109 dependencies, 1 dependents, 9 mentions, 350 scripts, 1.9k downloads

Last updated 18 days ago from: a032855707. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Aug 31 2024
R-4.5-win | NOTE | Aug 31 2024
R-4.5-linux | NOTE | Aug 31 2024
R-4.4-win | NOTE | Aug 31 2024
R-4.4-mac | NOTE | Aug 31 2024
R-4.3-win | NOTE | Aug 31 2024
R-4.3-mac | NOTE | Aug 31 2024

Exports: find_textrpp, find_textrpp_env, textCentrality, textCentralityPlot, textClassify, textDescriptives, textDimName, textDistance, textDistanceMatrix, textDistanceNorm, textEmbed, textEmbedLayerAggregation, textEmbedRawLayers, textEmbedReduce, textEmbedStatic, textFineTuneDomain, textFineTuneTask, textGeneration, textModelLayers, textModels, textModelsRemove, textNER, textPCA, textPCAPlot, textPlot, textPredict, textPredictAll, textPredictTest, textProjection, textProjectionPlot, textQA, textrpp_initialize, textrpp_install, textrpp_install_virtualenv, textrpp_uninstall, textSimilarity, textSimilarityMatrix, textSimilarityNorm, textSum, textTokenize, textTopics, textTopicsReduce, textTopicsTest, textTopicsTree, textTopicsWordcloud, textTrain, textTrainLists, textTrainN, textTrainNPlot, textTrainRandomForest, textTrainRegression, textTranslate, textWordPrediction, textZeroShot

Dependencies: brio, callr, class, cli, clock, codetools, colorspace, cowplot, cpp11, crayon, data.table, desc, diagram, dials, DiceDesign, diffobj, digest, doFuture, dplyr, evaluate, fansi, farver, foreach, fs, furrr, future, future.apply, generics, ggplot2, ggrepel, globals, glue, gower, GPfit, gtable, hardhat, here, ipred, isoband, iterators, jsonlite, KernSmooth, labeling, lattice, lava, lhs, lifecycle, listenv, lubridate, magrittr, MASS, Matrix, mgcv, modelenv, munsell, nlme, nnet, numDeriv, overlapping, parallelly, parsnip, pillar, pkgbuild, pkgconfig, pkgload, png, praise, prettyunits, processx, prodlim, progressr, ps, purrr, R6, rappdirs, RColorBrewer, Rcpp, RcppTOML, recipes, rematch2, reticulate, rlang, rpart, rprojroot, rsample, scales, sfd, shape, slider, SQUAREM, stringi, stringr, survival, testthat, tibble, tidyr, tidyselect, timechange, timeDate, tune, tzdb, utf8, vctrs, viridisLite, waldo, warp, withr, workflows, yardstick

How to best manage computationally heavy analyses

Rendered from huggingface_in_r_and_computer_capacity.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2022-07-13
Started: 2022-07-13

Extended Installation Guide

Rendered from huggingface_in_r_extended_installation_guide.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2023-09-22
Started: 2022-07-13

Pre-registration and Researcher Degrees of Freedom

Rendered from pre_registration_and_transformers.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2024-07-29
Started: 2022-12-01

Pre-Trained Models

Rendered from pre_trained_models.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2024-04-22
Started: 2024-04-18

Psychological Methods: the Text Tutorial

Rendered from psychological_methods.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2022-12-01
Started: 2022-12-01

HuggingFace language models are downloaded in .cache

Rendered from removing_huggingface_transformers_cache_files.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2022-07-13
Started: 2022-07-13

Creating a Singularity Container to Run HuggingFace Transformers Models in R

Rendered from singularity_transformers_container.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2023-01-06
Started: 2022-07-13

Getting started

Rendered from text.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2024-02-14
Started: 2019-12-30

HuggingFace Transformers in R: Word Embeddings Defaults and Specifications

Rendered from huggingface_in_r.Rmd using knitr::rmarkdown on Aug 31 2024.

Last update: 2024-07-29
Started: 2022-07-13

Readme and manuals

Help Manual

Help page | Topics
Example data for plotting a Semantic Centrality Plot. | centrality_data_harmony
Data for plotting a Dot Product Projection Plot. | DP_projections_HILS_SWLS_100
Example text and numeric data. | Language_based_assessment_data_3_100
Text and numeric data for 10 participants. | Language_based_assessment_data_8
Example data for plotting a Principal Component Projection Plot. | PC_projections_satisfactionwords_40
Word embeddings from the textEmbedRawLayers function. | raw_embeddings_1
Compute semantic similarity scores between single words' word embeddings and the aggregated word embedding of all words. | textCentrality
Plot words according to semantic similarity to the aggregated word embedding. | textCentralityPlot
Predict label and probability of a text using a pretrained classifier language model. (experimental) | textClassify
Compute descriptive statistics of character variables. | textDescriptives
Change the names of the dimensions in the word embeddings. | textDimName
Compute the semantic distance between two text variables. | textDistance
Compute semantic distance scores between all combinations in a word embedding. | textDistanceMatrix
Compute the semantic distance between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct/concept). | textDistanceNorm
Extract layers and aggregate them to word embeddings, for all character variables in a given dataframe. | textEmbed
Select and aggregate layers of hidden states to form a word embedding. | textEmbedLayerAggregation
Extract layers of hidden states (word embeddings) for all character variables in a given dataframe. | textEmbedRawLayers
Pre-trained dimension reduction. (experimental) | textEmbedReduce
Apply word embeddings from a given decontextualized static space (such as from Latent Semantic Analyses) to all character variables. | textEmbedStatic
Domain Adapted Pre-Training. (EXPERIMENTAL - under development) | textFineTuneDomain
Task Adapted Pre-Training. (EXPERIMENTAL - under development) | textFineTuneTask
Predict the words that will follow a specified text prompt. (experimental) | textGeneration
Get the number of layers in a given model. | textModelLayers
Check downloaded, available models. | textModels
Delete a specified model and model associated files. | textModelsRemove
Named Entity Recognition. (experimental) | textNER
Compute 2 PCA dimensions of the word embeddings for individual words. | textPCA
Plot words according to 2-D plot from 2 PCA components. | textPCAPlot
Plot words from textProjection() or textWordPrediction(). | textPlot
Trained models created by e.g., textTrain() or stored on e.g., GitHub can be used to predict new scores or classes from embeddings or text using textPredict. | textPredict
Predict from several models, selecting the correct input. | textPredictAll
Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute error from yhat1-y1 and yhat2-y1. | textPredictTest
Compute Supervised Dimension Projection and related variables for plotting words. | textProjection
Plot words according to Supervised Dimension Projection. | textProjectionPlot
Question Answering. (experimental) | textQA
Initialize text required python packages. | textrpp_initialize
Install text required python packages in conda or virtualenv environment. | textrpp_install, textrpp_install_virtualenv
Uninstall textrpp conda environment. | textrpp_uninstall
Compute the semantic similarity between two text variables. | textSimilarity
Compute semantic similarity scores between all combinations in a word embedding. | textSimilarityMatrix
Compute the semantic similarity between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct). | textSimilarityNorm
Summarize texts. (experimental) | textSum
Tokenize according to different huggingface transformers. | textTokenize
This function creates and trains a BERTopic model (based on the bertopic python package) on a text variable in a tibble/data.frame. (EXPERIMENTAL) | textTopics
textTopicsReduce (EXPERIMENTAL) | textTopicsReduce
This function tests the relationship between a single topic or all topics and a variable of interest. Available tests include correlation, t-test, linear regression, binary regression, and ridge regression. (EXPERIMENTAL - under development) | textTopicsTest
textTopicsTest (EXPERIMENTAL) to get the hierarchical topic tree | textTopicsTree
This function plots wordclouds of topics from a Topic Model based on their significance determined by a linear or binary regression. | textTopicsWordcloud
Train word embeddings to a numeric (ridge regression) or categorical (random forest) variable. | textTrain
Individually trains word embeddings from several text variables to several numeric or categorical variables. | textTrainLists
(experimental) Compute cross-validated correlations for different sample sizes of a data set. The cross-validation process can be repeated several times to enhance the reliability of the evaluation. | textTrainN
(experimental) Plot cross-validated correlation coefficients across different sample sizes from the object returned by the textTrainN function. If the number of cross-validations exceeds one, then error bars will be included in the plot. | textTrainNPlot
Train word embeddings to a categorical variable using random forest. | textTrainRandomForest
Train word embeddings to a numeric variable. | textTrainRegression
Translation. (experimental) | textTranslate
Compute predictions based on single words for plotting words. The word embeddings of single words are trained to predict the mean value associated with that word. P-values do NOT work yet. (experimental) | textWordPrediction
Zero Shot Classification. (experimental) | textZeroShot
Word embeddings for 4 text variables for 40 participants. | word_embeddings_4
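
The textPredictTest entry in the table above mentions a t-test between the absolute errors of two sets of predictions against the same observed values. The following base-R sketch shows that idea on simulated data. It does not call the package: the variable names mirror the description (`y1`, `yhat1`, `yhat2`), the data are randomly generated, and the use of a paired test is an assumption, since this page does not say how the package configures the test.

```r
# Simulated data for illustration only (not from the 'text' package).
set.seed(1)
n     <- 50
y1    <- rnorm(n)                   # observed scores
yhat1 <- y1 + rnorm(n, sd = 0.3)    # predictions from a more accurate model
yhat2 <- y1 + rnorm(n, sd = 1.0)    # predictions from a less accurate model

# Absolute prediction errors of each model against the same observed values.
abs_err1 <- abs(yhat1 - y1)
abs_err2 <- abs(yhat2 - y1)

# ASSUMPTION: a paired t-test, because both error vectors refer to the
# same observations. The package's actual settings may differ.
res <- t.test(abs_err1, abs_err2, paired = TRUE)

print(res$estimate)  # mean of the paired differences in absolute error
print(res$p.value)
```

A small p-value here would suggest that one model's absolute errors are systematically smaller than the other's.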