Package: text 1.2.17

Oscar Kjell

text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables into word embeddings. The word embeddings can then be used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words along various dimensions. For more information see <https://www.r-text.org>.
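
To make the idea of a semantic similarity score concrete, here is a minimal base-R sketch. It does not call the text package. It assumes cosine similarity as the measure, and the three 4-dimensional vectors are made-up toy values standing in for real word embeddings, which typically have hundreds of dimensions.

```r
# Toy "embeddings" (hypothetical values, for illustration only).
emb_a <- c(0.2, 0.1, 0.7, 0.0)
emb_b <- c(0.1, 0.1, 0.8, 0.1)
emb_c <- c(0.9, 0.0, 0.1, 0.3)

# Cosine similarity: dot product divided by the product of the vector norms.
cosine_similarity <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

sim_ab <- cosine_similarity(emb_a, emb_b)
sim_ac <- cosine_similarity(emb_a, emb_c)

# emb_a points in nearly the same direction as emb_b, so sim_ab is the larger score.
print(c(a_vs_b = sim_ab, a_vs_c = sim_ac))
```

In this sketch a score close to 1 means the two vectors point in nearly the same direction, which is read as the two texts being semantically close.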

Authors: Oscar Kjell [aut, cre], Salvatore Giorgi [aut], Andrew Schwartz [aut]

text_1.2.17.tar.gz
text_1.2.17.zip (r-4.5), text_1.2.17.zip (r-4.4), text_1.2.17.zip (r-4.3)
text_1.2.17.tgz (r-4.4-any), text_1.2.17.tgz (r-4.3-any)
text_1.2.17.tar.gz (r-4.5-noble), text_1.2.17.tar.gz (r-4.4-noble)
text_1.2.17.tgz (r-4.4-emscripten), text_1.2.17.tgz (r-4.3-emscripten)
text.pdf | text.html
text/json (API)
NEWS

# Install 'text' in R:
install.packages('text', repos = c('https://oscarkjell.r-universe.dev', 'https://cloud.r-project.org'))
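
After the R package is installed, the Python side also has to be set up. The sketch below uses `textrpp_install()` and `textrpp_initialize()`, both of which appear in the package's exports. The `save_profile = TRUE` argument and the exact calling pattern are assumptions based on the package's getting-started material and may differ between versions. The guard is this sketch's own addition, so the block does nothing in a non-interactive session or when the package is missing.

```r
# Only attempt the setup in an interactive session with 'text' installed
# (the guard is an addition of this sketch, not part of the package).
run_setup <- interactive() && requireNamespace("text", quietly = TRUE)

if (run_setup) {
  # Install the Python packages that 'text' needs into a dedicated environment.
  text::textrpp_install()

  # Point the R session at that environment; save_profile = TRUE is assumed
  # to store the setting so it is reused in later sessions.
  text::textrpp_initialize(save_profile = TRUE)
}
```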

Bug tracker: https://github.com/oscarkjell/text/issues

deep-learning, machine-learning, nlp, transformers

12.91 score 138 stars 1 packages 390 scripts 2.6k downloads 9 mentions 58 exports 108 dependencies

Last updated 2 days ago from: cbfe7f206f. Checks: OK: 1, ERROR: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Nov 21 2024
R-4.5-win | ERROR | Nov 21 2024
R-4.5-linux | ERROR | Nov 21 2024
R-4.4-win | ERROR | Nov 21 2024
R-4.4-mac | ERROR | Nov 21 2024
R-4.3-win | ERROR | Nov 21 2024
R-4.3-mac | ERROR | Nov 21 2024

Exports: find_textrpp_env, textAssess, textCentrality, textCentralityPlot, textClassify, textCleanNonASCII, textDescriptives, textDimName, textDistance, textDistanceMatrix, textDistanceNorm, textDomainCompare, textEmbed, textEmbedLayerAggregation, textEmbedRawLayers, textEmbedReduce, textEmbedStatic, textFindNonASCII, textFineTuneDomain, textFineTuneTask, textGeneration, textLBAM, textModelLayers, textModels, textModelsRemove, textNER, textPCA, textPCAPlot, textPlot, textPredict, textPredictAll, textPredictTest, textProjection, textProjectionPlot, textQA, textrpp_initialize, textrpp_install, textrpp_install_virtualenv, textrpp_uninstall, textSimilarity, textSimilarityMatrix, textSimilarityNorm, textSum, textTokenize, textTokenizeAndCount, textTopics, textTopicsReduce, textTopicsTest, textTopicsTree, textTopicsWordcloud, textTrain, textTrainLists, textTrainN, textTrainNPlot, textTrainRandomForest, textTrainRegression, textTranslate, textZeroShot

Dependencies: brio, callr, class, cli, clock, codetools, colorspace, cowplot, cpp11, crayon, data.table, desc, diagram, dials, DiceDesign, diffobj, digest, doFuture, dplyr, evaluate, fansi, farver, foreach, fs, furrr, future, future.apply, generics, ggplot2, ggrepel, globals, glue, gower, GPfit, gtable, hardhat, here, ipred, isoband, iterators, jsonlite, KernSmooth, labeling, lattice, lava, lhs, lifecycle, listenv, lubridate, magrittr, MASS, Matrix, mgcv, modelenv, munsell, nlme, nnet, numDeriv, overlapping, parallelly, parsnip, pillar, pkgbuild, pkgconfig, pkgload, png, praise, prettyunits, processx, prodlim, progressr, ps, purrr, R6, rappdirs, RColorBrewer, Rcpp, RcppTOML, recipes, reticulate, rlang, rpart, rprojroot, rsample, scales, sfd, shape, slider, SQUAREM, stringi, stringr, survival, testthat, tibble, tidyr, tidyselect, timechange, timeDate, tune, tzdb, utf8, vctrs, viridisLite, waldo, warp, withr, workflows, yardstick

How to best manage computationally heavy analyses

Rendered from huggingface_in_r_and_computer_capacity.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2022-07-13
Started: 2022-07-13

Extended Installation Guide

Rendered from huggingface_in_r_extended_installation_guide.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2024-11-06
Started: 2022-07-13

The Language-Based Assessment Model (L-BAM) Library

Rendered from LBAM.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2024-11-17
Started: 2024-11-01

Pre-registration and Researcher Degrees of Freedom

Rendered from pre_registration_and_transformers.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2024-07-29
Started: 2022-12-01

Psychological Methods: the Text Tutorial

Rendered from psychological_methods.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2022-12-01
Started: 2022-12-01

HuggingFace language models are downloaded in .cache

Rendered from removing_huggingface_transformers_cache_files.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2022-07-13
Started: 2022-07-13

Creating a Singularity Container to Run HuggingFace Transformers Models in R

Rendered from singularity_transformers_container.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2023-01-06
Started: 2022-07-13

Getting started

Rendered from text.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2024-02-14
Started: 2019-12-30

HuggingFace Transformers in R: Word Embeddings Defaults and Specifications

Rendered from huggingface_in_r.Rmd using knitr::rmarkdown on Nov 21 2024.

Last update: 2024-07-29
Started: 2022-07-13

Readme and manuals

Help Manual

Help page | Topics
Example data for plotting a Semantic Centrality Plot. | centrality_data_harmony
Data for plotting a Dot Product Projection Plot. | DP_projections_HILS_SWLS_100
Example text and numeric data. | Language_based_assessment_data_3_100
Text and numeric data for 10 participants. | Language_based_assessment_data_8
Example data for plotting a Principal Component Projection Plot. | PC_projections_satisfactionwords_40
Word embeddings from textEmbedRawLayers function | raw_embeddings_1
Semantic similarity score between single words and an aggregated word embedding | textCentrality
Plots words from textCentrality() | textCentralityPlot
Clean non-ASCII characters | textCleanNonASCII
Compute descriptive statistics of character variables. | textDescriptives
Change dimension names | textDimName
Semantic distance | textDistance
Semantic distance across multiple word embeddings | textDistanceMatrix
Semantic distance between a text variable and a word norm | textDistanceNorm
Compare two language domains | textDomainCompare
Embed text | textEmbed
Aggregate layers | textEmbedLayerAggregation
Extract layers of hidden states | textEmbedRawLayers
Pre-trained dimension reduction (experimental) | textEmbedReduce
Apply static word embeddings | textEmbedStatic
Detect non-ASCII characters | textFindNonASCII
Domain Adapted Pre-Training (EXPERIMENTAL - under development) | textFineTuneDomain
Task Adapted Pre-Training (EXPERIMENTAL - under development) | textFineTuneTask
Text generation | textGeneration
The LBAM library | textLBAM
Number of layers | textModelLayers
Check downloaded, available models. | textModels
Delete a specified model | textModelsRemove
Named Entity Recognition. (experimental) | textNER
textPCA() | textPCA
textPCAPlot | textPCAPlot
Plot words | textPlot
textPredict, textAssess and textClassify | textAssess, textClassify, textPredict
Predict from several models, selecting the correct input | textPredictAll
Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1. | textPredictTest
Supervised Dimension Projection | textProjection
Plot Supervised Dimension Projection | textProjectionPlot
Question Answering. (experimental) | textQA
Initialize text required python packages | textrpp_initialize
Install text required python packages in conda or virtualenv environment | textrpp_install, textrpp_install_virtualenv
Uninstall textrpp conda environment | textrpp_uninstall
Semantic Similarity | textSimilarity
Semantic similarity across multiple word embeddings | textSimilarityMatrix
Semantic similarity between a text variable and a word norm | textSimilarityNorm
Summarize texts. (experimental) | textSum
Tokenize text-variables | textTokenize
Tokenize and count | textTokenizeAndCount
BERTopics | textTopics
textTopicsReduce (EXPERIMENTAL) | textTopicsReduce
This function tests the relationship between a single topic or all topics and a variable of interest. Available tests include correlation, t-test, linear regression, binary regression, and ridge regression. (EXPERIMENTAL - under development) | textTopicsTest
textTopicsTree (EXPERIMENTAL) to get the hierarchical topic tree | textTopicsTree
Plots wordcloud (experimental) | textTopicsWordcloud
Trains word embeddings | textTrain
Train lists of word embeddings | textTrainLists
Cross-validated accuracies across sample sizes | textTrainN
Plot cross-validated accuracies across sample sizes | textTrainNPlot
Trains word embeddings using random forest | textTrainRandomForest
Train word embeddings to a numeric variable. | textTrainRegression
Translation. (experimental) | textTranslate
Zero Shot Classification (Experimental) | textZeroShot
Word embeddings for 4 text variables for 40 participants | word_embeddings_4