Package: text 1.4.1

Oscar Kjell

text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions, etc. For more information see <https://www.r-text.org>.

Authors: Oscar Kjell [aut, cre], Salvatore Giorgi [aut], Andrew Schwartz [aut]

text_1.4.1.tar.gz
text_1.4.1.zip (r-4.5), text_1.4.1.zip (r-4.4), text_1.4.1.zip (r-4.3)
text_1.4.1.tgz (r-4.5-any), text_1.4.1.tgz (r-4.4-any), text_1.4.1.tgz (r-4.3-any)
text_1.4.1.tar.gz (r-4.5-noble), text_1.4.1.tar.gz (r-4.4-noble)
text_1.4.1.tgz (r-4.4-emscripten), text_1.4.1.tgz (r-4.3-emscripten)
text.pdf | text.html
text/json (API)
NEWS

# Install 'text' in R:

install.packages('text', repos = c('https://oscarkjell.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/oscarkjell/text/issues

Pkgdown site: https://r-text.org

Uses libs:

openjdk - OpenJDK Java runtime, using Hotspot JIT

Datasets:

DP_projections_HILS_SWLS_100 - Data for plotting a Dot Product Projection Plot.
Language_based_assessment_data_3_100 - Example text and numeric data.
Language_based_assessment_data_8 - Text and numeric data for 10 participants.
PC_projections_satisfactionwords_40 - Example data for plotting a Principal Component Projection Plot.
centrality_data_harmony - Example data for plotting a Semantic Centrality Plot.
raw_embeddings_1 - Word embeddings from textEmbedRawLayers function
word_embeddings_4 - Word embeddings for 4 text variables for 40 participants

On CRAN:

deep-learning machine-learning nlp transformers openjdk

13.21 score, 145 stars, 1 package, 436 scripts, 3.7k downloads, 9 mentions, 62 exports, 131 dependencies

Last updated 5 days ago from: af1b92d3f2. Checks: 4 OK, 5 NOTE. Indexed: yes.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 22 2025
R-4.5-win	OK	Mar 22 2025
R-4.5-mac	OK	Mar 22 2025
R-4.5-linux	OK	Mar 22 2025
R-4.4-win	NOTE	Mar 22 2025
R-4.4-mac	NOTE	Mar 22 2025
R-4.4-linux	NOTE	Mar 22 2025
R-4.3-win	NOTE	Mar 22 2025
R-4.3-mac	NOTE	Mar 22 2025

Exports: find_textrpp_env textAssess textCentrality textCentralityPlot textClassify textClean textCleanNonASCII textCleanNonASCIIinfo textDescriptives textDimName textDistance textDistanceMatrix textDistanceNorm textDomainCompare textEmbed textEmbedLayerAggregation textEmbedRawLayers textEmbedReduce textEmbedStatic textFindNonASCII textFineTuneDomain textFineTuneTask textGeneration textLBAM textModelLayers textModels textModelsRemove textNER textPCA textPCAPlot textPlot textPredict textPredictAll textPredictExamples textPredictTest textProjection textProjectionPlot textQA textrpp_initialize textrpp_install textrpp_install_virtualenv textrpp_uninstall textSimilarity textSimilarityMatrix textSimilarityNorm textSum textTokenize textTokenizeAndCount textTopics textTopicsReduce textTopicsTest textTopicsTree textTopicsWordcloud textTrain textTrainExamples textTrainLists textTrainN textTrainNPlot textTrainRandomForest textTrainRegression textTranslate textZeroShot

Dependencies: backports bit bit64 checkmate class cli clipr clock codetools colorspace commonmark cowplot cpp11 crayon curl data.table diagram dials DiceDesign digest doFuture dplyr fansi farver float foreach furrr future future.apply generics ggplot2 ggrepel ggwordcloud globals glue gower GPfit gridtext gtable gtools hardhat here hms ipred isoband ISOcodes iterators jpeg jsonlite KernSmooth labeling lattice lava lgr lhs lifecycle listenv lubridate magrittr mallet markdown MASS Matrix MatrixExtra mgcv mlapi modelenv munsell ngram nlme nnet numDeriv parallelly parsnip pillar pkgconfig png prettyunits prodlim progress progressr purrr R6 rappdirs RColorBrewer Rcpp RcppArmadillo RcppEigen RcppProgress RcppTOML readr recipes reticulate RhpcBLASctl rJava rlang rpart rprojroot rsample rsparse RSpectra scales sfd shape slider sparsevctrs SQUAREM stopwords stringi stringr survival text2vec textmineR tibble tidyr tidyselect timechange timeDate topics tune tzdb utf8 vctrs viridisLite vroom warp withr workflows xfun xml2 yardstick

How to best manage computationally heavy analyses

Rendered from huggingface_in_r_and_computer_capacity.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2022-07-13
Started: 2022-07-13

Extended Installation Guide

Rendered from huggingface_in_r_extended_installation_guide.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2024-11-06
Started: 2022-07-13

Implicit Motives Tutorial

Rendered from implicit_motives_tutorial.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2025-02-17
Started: 2024-11-25

The Language-Based Assessment Model (L-BAM) Library

Rendered from LBAM.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2025-03-07
Started: 2024-11-01

L-BAM Tutorial

Rendered from lbam_tutorial.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2024-11-26
Started: 2024-11-25

Pre-registration and Researcher Degrees of Freedom

Rendered from pre_registration_and_transformers.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2024-07-29
Started: 2022-12-01

Psychological Methods: the Text Tutorial

Rendered from psychological_methods.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2022-12-01
Started: 2022-12-01

HuggingFace language models are downloaded in .cache

Rendered from removing_huggingface_transformers_cache_files.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2022-07-13
Started: 2022-07-13

Creating a Singularity Container to Run HuggingFace Transformers Models in R

Rendered from singularity_transformers_container.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2023-01-06
Started: 2022-07-13

Getting started

Rendered from text.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2025-03-10
Started: 2019-12-30

HuggingFace Transformers in R: Word Embeddings Defaults and Specifications

Rendered from huggingface_in_r.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2025-03-10
Started: 2022-07-13

Readme and manuals

Help Manual

Help page	Topics
Example data for plotting a Semantic Centrality Plot.	centrality_data_harmony
Data for plotting a Dot Product Projection Plot.	DP_projections_HILS_SWLS_100
Example text and numeric data.	Language_based_assessment_data_3_100
Text and numeric data for 10 participants.	Language_based_assessment_data_8
Example data for plotting a Principal Component Projection Plot.	PC_projections_satisfactionwords_40
Word embeddings from textEmbedRawLayers function	raw_embeddings_1
Semantic similarity score between single words' embeddings and an aggregated word embedding	textCentrality
Plots words from textCentrality()	textCentralityPlot
Cleans text from standard personal information	textClean
Clean non-ASCII characters	textCleanNonASCII
Clean non-ASCII characters	textCleanNonASCIIinfo
Compute descriptive statistics of character variables.	textDescriptives
Change dimension names	textDimName
Semantic distance	textDistance
Semantic distance across multiple word embeddings	textDistanceMatrix
Semantic distance between a text variable and a word norm	textDistanceNorm
Compare two language domains	textDomainCompare
textEmbed() extracts layers and aggregates them into word embeddings, for all character variables in a given dataframe.	textEmbed
Aggregate layers	textEmbedLayerAggregation
Extract layers of hidden states	textEmbedRawLayers
Pre-trained dimension reduction (experimental)	textEmbedReduce
Apply static word embeddings	textEmbedStatic
Detect non-ASCII characters	textFindNonASCII
Domain Adapted Pre-Training (EXPERIMENTAL - under development)	textFineTuneDomain
Task Adapted Pre-Training (EXPERIMENTAL - under development)	textFineTuneTask
Text generation	textGeneration
The LBAM library	textLBAM
Number of layers	textModelLayers
Check downloaded, available models.	textModels
Delete a specified model	textModelsRemove
Named Entity Recognition. (experimental)	textNER
textPCA()	textPCA
textPCAPlot	textPCAPlot
Plot words	textPlot
textPredict, textAssess and textClassify	textAssess textClassify textPredict
Predict from several models, selecting the correct input	textPredictAll
Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1.	textPredictTest
Supervised Dimension Projection	textProjection
Plot Supervised Dimension Projection	textProjectionPlot
Question Answering. (experimental)	textQA
Initialize the Python packages required by text	textrpp_initialize
Install the Python packages required by text in a conda or virtualenv environment	textrpp_install textrpp_install_virtualenv
Uninstall textrpp conda environment	textrpp_uninstall
Semantic Similarity	textSimilarity
Semantic similarity across multiple word embeddings	textSimilarityMatrix
Semantic similarity between a text variable and a word norm	textSimilarityNorm
Summarize texts. (experimental)	textSum
Tokenize text-variables	textTokenize
Tokenize and count	textTokenizeAndCount
BERTopics	textTopics
textTopicsReduce (EXPERIMENTAL)	textTopicsReduce
Wrapper for topicsTest function from the topics package	textTopicsTest
textTopicsTest (EXPERIMENTAL) to get the hierarchical topic tree	textTopicsTree
Plot word clouds	textTopicsWordcloud
Trains word embeddings	textTrain
Show language examples (Experimental)	textPredictExamples textTrainExamples
Train lists of word embeddings	textTrainLists
Cross-validated accuracies across sample-sizes	textTrainN
Plot cross-validated accuracies across sample sizes	textTrainNPlot
Trains word embeddings using random forest	textTrainRandomForest
Train word embeddings to a numeric variable.	textTrainRegression
Translation. (experimental)	textTranslate
Zero Shot Classification (Experimental)	textZeroShot
Word embeddings for 4 text variables for 40 participants	word_embeddings_4