| Title: | Aligned Corpus Toolkit |
|---|---|
| Description: | The Aligned Corpus Toolkit (act) is designed for linguists who work with time-aligned transcription data. It offers functions to import and export various annotation file formats ('ELAN' .eaf, 'EXMARaLDA' .exb and 'Praat' .TextGrid files), create print transcripts in the style of conversation analysis, search transcripts (span searches across multiple annotations, searches in normalized annotations, concordances etc.), export and re-import search results (.csv and 'Excel' .xlsx format), create cuts for the search results (print transcripts, audio/video cuts using 'FFmpeg' and video subtitles in 'SubRip' .srt format), modify the data in a corpus (search/replace, delete, filter etc.), interact with 'Praat' using 'Praat' scripts, and exchange data with the 'rPraat' package. The package is itself written in R and may be extended by other users. |
| Authors: | Oliver Ehmer [aut, cre] |
| Maintainer: | Oliver Ehmer <[email protected]> |
| License: | GPL-3 |
| Version: | 2.3.0 |
| Built: | 2026-05-25 10:22:20 UTC |
| Source: | https://github.com/oliverehmer/act |
...
The package has numerous options that change the internal workings of the package.
Please see act::options_show() and the information given there.
    library(act)

    # ========== Example data set
    # There is an example data set consisting of annotation files and corresponding
    # media files.
    # While the annotation files are copied to your computer when installing the
    # package, the media files are not.
    # You can either download the full data set from GitHub or decide to work
    # only with the annotation files.

    # The example data set (only annotation files) is stored at the following location:
    path <- system.file("extdata", "examplecorpus", package="act")

    # Since this folder is quite difficult to access, you might consider copying the
    # contents of this folder to a more convenient location.
    # The following commands will create a new folder called 'examplecorpus' in the
    # folder 'path'.
    # You will find the data there.
    ## Not run: 
    path <- "EXISTING_FOLDER_ON_YOUR_COMPUTER"
    sourcepath <- system.file("extdata", "examplecorpus", package="act")
    if (!dir.exists(path)) {dir.create(path)}
    file.copy(sourcepath, dirname(path), recursive=TRUE)
    ## End(Not run)

    # To download the full data set (including media files) from GitHub
    # use the following code.
    # In the first line specify an existing folder on your computer.
    # The following lines will then download the example data set from GitHub
    # and copy it to a sub folder called 'examplecorpus' in the folder 'path'.
    # You will find the data there.
    ## Not run: 
    path <- "EXISTING_FOLDER_ON_YOUR_COMPUTER"
    sourceurl <- "https://github.com/oliverehmer/act_examplecorpus/archive/master.zip"
    temp <- tempfile()
    download.file(sourceurl, temp)
    unzip(zipfile=temp, exdir=path)
    path <- file.path(path, "act_examplecorpus-main")
    ## End(Not run)

    # ========== Create a corpus object and load data
    # Now that we have the example data accessible, we can create a corpus object.
    # The corpus object is a structured collection of all the information that you can
    # work with using act.
    # It will contain the information of each transcript, links to media files and further
    # meta data.

    # --- Locate folder with annotation files
    # When creating a corpus object you will need to specify where your annotation
    # files ('Praat' .TextGrid or 'ELAN' .eaf) are located.
    # We will use the example data that we have just located in 'path'.
    path
    # In case you want to use your own data, you can set the path here:
    ## Not run: 
    path <- "EXISTING_FOLDER_ON_YOUR_COMPUTER"
    ## End(Not run)

    # --- Create corpus object and load annotation files
    # The following command will create a corpus object with the name 'examplecorpus'.
    examplecorpus <- act::corpus_new(
      pathsAnnotationFiles = path,
      pathsMediaFiles = path,
      name = "examplecorpus"
    )
    # The act package assumes that annotation files and media files have the same base
    # name and differ only in the suffix (e.g. 'filename.TextGrid' and 'filename.wav'/
    # 'filename.mp4').
    # This allows act to automatically link media files to the transcripts.

    # --- Information about your corpus
    # The following command will give you a summary of the data contained in your corpus object.
    examplecorpus

    # More detailed information about the transcripts in your corpus object is available by
    # calling the function act::info()
    act::info(examplecorpus)

    # If you are working in RStudio, a nice way of inspecting this information is the following:
    ## Not run: 
    View(act::info(examplecorpus)$transcripts)
    View(act::info(examplecorpus)$tiers)
    ## End(Not run)

    # ========== All data
    # You can also get all data that is in the loaded annotation files in a data frame:
    all_annotations <- act::annotations_all(examplecorpus)
    ## Not run: 
    View(all_annotations)
    ## End(Not run)

    # ========== Search
    # Let's do some searches in the data.

    # Search for the Spanish first person singular pronoun 'yo' in the examplecorpus
    mysearch <- act::search_new(x=examplecorpus, pattern= "yo")

    # Have a look at the result:
    mysearch

    # Directly view all search results in the viewer
    ## Not run: 
    View(mysearch@results)
    ## End(Not run)

    # --- Search original vs. normalized content
    # You can either search in the original 'content' of the annotations,
    # or you can search in a 'normalized' version of the annotations.
    # Let's compare the two modes.
    mysearch.norm <- act::search_new(examplecorpus, pattern="yo", searchNormalized=TRUE)
    mysearch.org <- act::search_new(examplecorpus, pattern="yo", searchNormalized=FALSE)

    # There is a difference in the number of results.
    mysearch.norm@results.nr
    mysearch.org@results.nr

    # The difference arises because in the normalized version, for instance, capital
    # letters are converted to small letters.
    # In our case, one annotation in the example corpus contains a "yO" with a
    # capital letter:
    mysearch <- act::search_new(examplecorpus, pattern="yO", searchNormalized=FALSE)
    mysearch@results$hit

    # During normalization a range of normalization procedures will be applied, using a
    # replacement matrix. This matrix searches and replaces certain patterns that you want to
    # exclude from the normalized content.
    # By default, normalization gets rid of all transcription conventions of GAT.
    # You may, in addition, customize the replacement matrix to your own needs/transcription
    # conventions.

    # --- Search original content vs. full text
    # There are two search modes.
    # The 'fulltext' mode will find matches across annotations.
    # The 'content' mode will respect the temporal boundaries of the original annotations.

    # Let's define a search pattern with a certain span.
    myRegEx <- "\\bno\\b.{1,20}pero"
    # This regular expression matches the Spanish word "no" 'no' followed by a "pero" 'but'
    # at a distance ranging from 1 to 20 characters.

    # The 'content' search mode will not find any hit.
    mysearch <- act::search_new(examplecorpus, pattern=myRegEx, searchMode="content")
    mysearch@results.nr

    # The 'fulltext' search mode will find two hits that extend over several annotations.
    mysearch <- act::search_new(examplecorpus, pattern=myRegEx, searchMode="fulltext")
    mysearch@results.nr
    cat(mysearch@results$hit[1])
    cat(mysearch@results$hit[2])
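The difference between the two search modes described in the example above can be illustrated without act. The following is a minimal base R sketch, assuming three made-up annotations; joining them with a single space is only a stand-in for however act builds its full text, which is not specified here.

```r
# Hypothetical annotations in temporal order (not taken from the example corpus)
annotations <- c("no creo", "que sea asi", "pero bueno")
myRegEx <- "\\bno\\b.{1,20}pero"

# 'content'-like search: each annotation is matched on its own,
# so a pattern cannot cross an annotation boundary
hits_content <- grepl(myRegEx, annotations, perl = TRUE)
sum(hits_content)

# 'fulltext'-like search: the annotations are joined first (assumption: with
# a single space), so a match may span several annotations
fulltext <- paste(annotations, collapse = " ")
hits_fulltext <- regmatches(fulltext, gregexpr(myRegEx, fulltext, perl = TRUE))[[1]]
hits_fulltext
```

In this sketch no single annotation contains both "no" and "pero", so only the joined text produces a match.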
Merges annotations from all transcripts in a corpus and returns a data frame.
    annotations_all(x)

| x | Corpus object. |
|---|---|

data frame

    library(act)

    # Get data frame with all annotations
    allannotations <- act::annotations_all(examplecorpus)

    # Have a look at the number of annotations
    nrow(allannotations)
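Conceptually, merging the annotations of all transcripts amounts to stacking one table per transcript into a single data frame. The following base R sketch shows that idea under stated assumptions: the transcript names, the column names (`transcript.name`, `tier.name`, `content`) and the data are made up for illustration and are not necessarily those used by act.

```r
# Hypothetical per-transcript annotation tables; names and columns are made up
transcripts <- list(
  T1 = data.frame(tier.name = c("A", "B"), content = c("hola", "si"),
                  stringsAsFactors = FALSE),
  T2 = data.frame(tier.name = "A", content = "bueno",
                  stringsAsFactors = FALSE)
)

# Tag each table with the name of its transcript, then stack them row-wise
tagged <- Map(
  function(name, df) data.frame(transcript.name = name, df,
                                stringsAsFactors = FALSE),
  names(transcripts), transcripts
)
all_rows <- do.call(rbind, tagged)
rownames(all_rows) <- NULL

nrow(all_rows)
```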
Delete annotations in a corpus object.
To restrict the deletion to certain transcripts or tiers, set the parameters filterTranscriptNames and filterTierNames.
To select transcripts and/or tiers by regular expression, use the function act::search_makefilter first.

    annotations_delete(
      x,
      pattern = "",
      filterTranscriptNames = NULL,
      filterTierNames = NULL
    )

| x | Corpus object. |
|---|---|
| pattern | Character string; regular expression; all annotations that match this expression will be deleted. |
| filterTranscriptNames | Vector of character strings; names of the transcripts to be included. |
| filterTierNames | Character string; names of the tiers to be included. |

Corpus object.
    library(act)

    # Set the regular expression that selects the annotations to be deleted.
    # In this case: all annotations that contain the letter "a"
    myRegEx <- "a"

    # Have a look at all annotations in the first transcript
    examplecorpus@transcripts[[1]]@annotations$content

    # Some of them match the regular expression
    hits <- grep(pattern=myRegEx, x=examplecorpus@transcripts[[1]]@annotations$content)
    examplecorpus@transcripts[[1]]@annotations$content[hits]
    # Others don't match the regular expression
    examplecorpus@transcripts[[1]]@annotations$content[-hits]

    # Run the function and delete the annotations that match the regular expression
    test <- act::annotations_delete(x=examplecorpus, pattern=myRegEx)

    # Compare how many data rows are in the first transcript in
    # the example corpus and in the newly created test corpus:
    nrow(examplecorpus@transcripts[[1]]@annotations)
    nrow(test@transcripts[[1]]@annotations)

    # Only the annotations that did not match the regular expression are left:
    test@transcripts[[1]]@annotations$content
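How a pattern and a tier filter interact can be sketched in base R. This is an illustration only, not the package's implementation: the helper `delete_matching()`, the column names and the data are hypothetical.

```r
# Hypothetical annotation table; column names are made up for this sketch
annotations <- data.frame(
  tier.name = c("A", "B", "A", "B"),
  content   = c("hola", "si", "pero", "bueno"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows whose content matches 'pattern',
# optionally only within the tiers named in 'filterTierNames'
delete_matching <- function(df, pattern, filterTierNames = NULL) {
  in_scope <- if (is.null(filterTierNames)) {
    rep(TRUE, nrow(df))
  } else {
    df$tier.name %in% filterTierNames
  }
  matches <- grepl(pattern, df$content)
  # A row is removed only if it is in scope AND matches the pattern
  df[!(in_scope & matches), , drop = FALSE]
}

# Without a filter, every annotation containing "o" is removed
delete_matching(annotations, "o")$content

# With a tier filter, matching annotations on other tiers are kept
delete_matching(annotations, "o", filterTierNames = "B")$content
```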
Delete empty annotations in a corpus object.
To restrict the deletion to certain transcripts or tiers, set the parameters filterTranscriptNames and filterTierNames.
To select transcripts and/or tiers by regular expression, use the function act::search_makefilter first.

    annotations_delete_empty(
      x,
      trim = FALSE,
      filterTranscriptNames = NULL,
      filterTierNames = NULL
    )

| x | Corpus object. |
|---|---|
| trim | Logical; if |
| filterTranscriptNames | Vector of character strings; names of the transcripts to be included. |
| filterTierNames | Character string; names of the tiers to be included. |

Corpus object.
    library(act)

    # There are no empty annotations in the example corpus.
    # Empty annotations are deleted by default when annotation files are loaded.
    # So let's first make an empty annotation.

    # Check the first annotation in the first transcript
    examplecorpus@transcripts[[1]]@annotations$content[[1]]
    # Empty the contents of this annotation
    examplecorpus@transcripts[[1]]@annotations$content[[1]] <- ""

    # Run the function
    test <- act::annotations_delete_empty(x=examplecorpus)

    # Compare how many data rows are in the first transcript in
    # the example corpus and in the newly created test corpus:
    nrow(examplecorpus@transcripts[[1]]@annotations)
    nrow(test@transcripts[[1]]@annotations)
The function inserts the results of a search as annotations into a specified destination tier. Results from different tiers are all inserted into the same destination tier. You can specify how overlapping search results, or search results from the same annotation, are handled. Please note that hits with exactly the same start and end time will always be merged (i.e. they are not treated as overlapping).

    annotations_insert_from_search_to_tier(
      x,
      s,
      destTier = "destinationTier",
      destTierAddMissing = TRUE,
      contentFromColname = "hit",
      interruptOverlap = FALSE,
      filterTranscriptNames = NULL,
      filterTierNames = NULL,
      collapseString = " | "
    )

| x | Corpus object. |
|---|---|
| s | Search object. |
| destTier | Character string; name of the tier to which the hit should be copied (if no copying is intended set to NA). |
| destTierAddMissing | Logical; if |
| contentFromColname | Character string; name of the search results column from which the content of the new annotation shall be copied. Choose for example "resultID", "hit" or "content", or the name of any column that you have added to the results data frame. |
| interruptOverlap | Logical; how to proceed in case of overlapping search results: TRUE=insertion will be interrupted, FALSE=overlapping annotations will be merged. |
| filterTranscriptNames | Vector of character strings; names of the transcripts to be included from search results. |
| filterTierNames | Character string; names of the tiers to be included from search results. |
| collapseString | Character string; will be used to collapse multiple search results into one string. |

Corpus object.
    library(act)

    # Have a look at the first transcript in the example corpus:
    printtranscript <- act::export_txt(examplecorpus@transcripts[[1]])
    cat(printtranscript)
    # In line 01 there is the word "UN".

    # Replace this word by "XXX" in the entire corpus
    test <- act::annotations_replace_copy(x=examplecorpus,
                                          pattern="\\bUN\\b",
                                          replacement="XXX")

    # Have a look at the first transcript in the corpus object test:
    printtranscript <- act::export_txt(test@transcripts[[1]])
    cat(printtranscript)
    # In line 01 there is now "XXX" instead of "UN"

    # Insert a tier called "newTier" into all transcripts in the corpus:
    for (t in examplecorpus@transcripts) {
      sortVector <- c(t@tiers$name, "newTier")
      examplecorpus <- act::tiers_sort(x=examplecorpus,
                                       sortVector=sortVector,
                                       filterTranscriptNames=t@name,
                                       tiersAddMissing=TRUE)
    }
    # Check that the first transcript now contains the newTier
    examplecorpus@transcripts[[1]]@tiers

    # Now replace "UN" by "YYY" in the entire corpus and
    # copy the search hit to "newTier".
    test <- act::annotations_replace_copy(x=examplecorpus,
                                          pattern="\\bUN\\b",
                                          replacement="YYY",
                                          destTier = "newTier")

    # Have a look again at the first transcript in the corpus object test.
    printtranscript <- act::export_txt(test@transcripts[[1]])
    cat(printtranscript)
    # In line 01 you see that "UN" has been replaced by "YYY".
    # In line 02 you see that it has been copied to the tier "newTier".

    # If you only want to copy a search hit but not replace it in the original,
    # leave 'replacement' at its default
    test <- act::annotations_replace_copy(x=examplecorpus,
                                          pattern="\\bUN\\b",
                                          destTier = "newTier")
    printtranscript <- act::export_txt(test@transcripts[[1]])
    cat(printtranscript)
    # In line 01 you see that "UN" has been maintained.
    # In line 02 you see that "UN" has been copied to the tier "newTier".
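The merging of overlapping search results described for annotations_insert_from_search_to_tier (the behaviour with interruptOverlap = FALSE) can be sketched in base R. This is a simplified illustration, not the package's code: the helper `merge_overlapping()`, the column names `startsec`, `endsec` and `hit`, and the data are assumptions made for the example.

```r
# Hypothetical search hits with start/end times in seconds
hits <- data.frame(
  startsec = c(0.0, 1.0, 1.5, 4.0),
  endsec   = c(0.8, 2.0, 2.5, 5.0),
  hit      = c("uno", "dos", "tres", "cuatro"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: merge hits whose time spans overlap and join their
# contents with 'collapseString'
merge_overlapping <- function(df, collapseString = " | ") {
  df <- df[order(df$startsec, df$endsec), ]
  merged <- df[1, ]
  for (i in seq_len(nrow(df))[-1]) {
    last <- nrow(merged)
    if (df$startsec[i] < merged$endsec[last]) {
      # Overlap: extend the previous span and append the content
      merged$endsec[last] <- max(merged$endsec[last], df$endsec[i])
      merged$hit[last] <- paste(merged$hit[last], df$hit[i], sep = collapseString)
    } else {
      # No overlap: start a new span
      merged <- rbind(merged, df[i, ])
    }
  }
  rownames(merged) <- NULL
  merged
}

merge_overlapping(hits)
```

The second and third hits overlap in time, so they end up in one annotation whose content is joined by the collapse string; the other two hits stay separate.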
This function performs a search and replace in the contents of annotations. A simple matrix consisting of two columns is used: the first column must contain the search string, the second column the replacement string. The matrix must be stored in CSV format.

    annotations_matrix(x, pathReplacementMatrix, filterTranscriptNames = NULL)

| x | Corpus object. |
|---|---|
| pathReplacementMatrix | Character string; path to replacement matrix (a CSV file). |
| filterTranscriptNames | Vector of character strings; names of the transcripts to be included. |

Corpus object.
See matrix_load() for loading the matrix and matrix_save() for saving the matrix to a CSV file.
To restrict the replacement to certain transcripts, set the parameter filterTranscriptNames.
To select transcripts by regular expression, use the function act::search_makefilter first.
media_delete, media_path_to_existing_file
    library(act)

    # An example replacement matrix comes with the package.
    # It replaces most of the GAT conventions.
    path <- system.file("extdata", "normalization", "normalizationMatrix.csv", package="act")

    # Have a look at the matrix
    mymatrix <- act::matrix_load(path)
    mymatrix

    # Apply matrix to examplecorpus
    test <- act::annotations_matrix(x=examplecorpus, pathReplacementMatrix=path)

    # Compare some annotations in the original examplecorpus object and
    # in the modified corpus object test
    examplecorpus@transcripts[[1]]@annotations$content[[1]]
    test@transcripts[[1]]@annotations$content[[1]]

    examplecorpus@transcripts[[2]]@annotations$content[[3]]
    test@transcripts[[2]]@annotations$content[[3]]
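The idea of a two-column replacement matrix stored as CSV can be sketched in base R. The sketch below is an illustration under stated assumptions: the column names `search` and `replacement`, the three patterns and the helper `normalize()` are made up, and the file layout of the matrix shipped with act may differ.

```r
# Hypothetical two-column replacement matrix, written to a temporary CSV file
matrix_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(search      = c("\\(\\.\\)", "[?!]", "\\s+"),
             replacement = c("", "", " "),
             stringsAsFactors = FALSE),
  matrix_path, row.names = FALSE
)

# Read the matrix back as plain character columns, so that empty
# replacement strings are kept as "" and not converted
replacements <- read.csv(matrix_path, colClasses = "character")

# Hypothetical helper: apply each search/replace pair in row order
normalize <- function(x, m) {
  for (i in seq_len(nrow(m))) {
    x <- gsub(m$search[i], m$replacement[i], x, perl = TRUE)
  }
  x
}

normalize(c("si (.) claro?", "no  se!"), replacements)
```

Because the pairs are applied in row order, later rows see the output of earlier ones; here the final whitespace rule cleans up the gaps left by the first two.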
The function searches within the contents of annotations and replaces the search hits. In addition, the search hit may be copied to another tier. If there is no overlapping annotation in the destination tier, a new annotation will be created (based on the time values of the original annotation). If there is an overlapping annotation in the destination tier, the search result will be added at the end of that annotation.

    annotations_replace_copy(
      x,
      pattern,
      replacement = NULL,
      destTier = NULL,
      destTierAddMissing = TRUE,
      filterTranscriptNames = NULL,
      filterTierNames = NULL,
      collapseString = " | "
    )

| x | Corpus object. |
|---|---|
| pattern | Character string; search pattern as regular expression. |
| replacement | Character string; replacement. |
| destTier | Character string; name of the tier to which the hit should be copied (if no copying is intended set to NA). |
| destTierAddMissing | Logical; if |
| filterTranscriptNames | Vector of character strings; names of the transcripts to be included. |
| filterTierNames | Character string; names of the tiers to be included. |
| collapseString | Character string; will be used to collapse multiple search results into one string. |

To restrict the operation to certain transcripts or tiers, set the parameters filterTranscriptNames and filterTierNames.
To select transcripts and/or tiers by regular expression, use the function act::search_makefilter first.
Corpus object.
    library(act)

    # Have a look at the first transcript in the example corpus:
    printtranscript <- act::export_txt(examplecorpus@transcripts[[1]])
    cat(printtranscript)
    # In line 01 there is the word "UN".

    # Replace this word by "XXX" in the entire corpus
    test <- act::annotations_replace_copy(x=examplecorpus,
                                          pattern="\\bUN\\b",
                                          replacement="XXX")

    # Have a look at the first transcript in the corpus object test:
    printtranscript <- act::export_txt(test@transcripts[[1]])
    cat(printtranscript)
    # In line 01 there is now "XXX" instead of "UN"

    # Insert a tier called "newTier" into all transcripts in the corpus:
    for (t in examplecorpus@transcripts) {
      sortVector <- c(t@tiers$name, "newTier")
      examplecorpus <- act::tiers_sort(x=examplecorpus,
                                       sortVector=sortVector,
                                       filterTranscriptNames=t@name,
                                       tiersAddMissing=TRUE)
    }
    # Check that the first transcript now contains the newTier
    examplecorpus@transcripts[[1]]@tiers

    # Now replace "UN" by "YYY" in the entire corpus and
    # copy the search hit to "newTier".
    test <- act::annotations_replace_copy(x=examplecorpus,
                                          pattern="\\bUN\\b",
                                          replacement="YYY",
                                          destTier = "newTier")

    # Have a look again at the first transcript in the corpus object test.
    printtranscript <- act::export_txt(test@transcripts[[1]])
    cat(printtranscript)
    # In line 01 you see that "UN" has been replaced by "YYY".
    # In line 02 you see that it has been copied to the tier "newTier".

    # If you only want to copy a search hit but not replace it in the original,
    # leave 'replacement' at its default
    test <- act::annotations_replace_copy(x=examplecorpus,
                                          pattern="\\bUN\\b",
                                          destTier = "newTier")
    printtranscript <- act::export_txt(test@transcripts[[1]])
    cat(printtranscript)
    # In line 01 you see that "UN" has been maintained.
    # In line 02 you see that "UN" has been copied to the tier "newTier".
Exports all (or some) transcript objects in a corpus object to different annotation file formats.
To export only some transcripts or tiers, set the parameters filterTranscriptNames and filterTierNames.
To select transcripts and/or tiers by regular expression, use the function act::search_makefilter first.
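As a rough illustration of selecting by regular expression, the sketch below filters a vector of transcript names with base R. It is a stand-in for the idea only and does not call act::search_makefilter; the transcript names are hypothetical.

```r
# Hypothetical transcript names; with a real corpus object x they could be
# collected with: vapply(x@transcripts, function(t) t@name, character(1))
transcript_names <- c("ARG_I_PAR_Beto", "ARG_I_CHI_Santi", "BOL_CCBA_SP_MeryGaby1")

# Keep only the transcripts whose name starts with "ARG"
selected <- grep("^ARG", transcript_names, value = TRUE)
print(selected)

# The resulting vector could then be passed on, for example:
# act::corpus_export(x=x, folderOutput=folderOutput, filterTranscriptNames=selected)
```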
corpus_export(
  x,
  folderOutput,
  filterTranscriptNames = NULL,
  filterTierNames = NULL,
  formats = c("docx", "eaf", "exb", "edl", "srt", "textgrid", "txt"),
  createMediaLinks = TRUE,
  createFolderOutput = TRUE,
  l = NULL
)
| Argument | Description |
|---|---|
| x | Corpus object. |
| folderOutput | Character string; path to a folder where the transcription files will be saved. By default the folder will be created recursively if it does not exist. |
| filterTranscriptNames | Vector of character strings; names of transcripts to be included. If left unspecified, all transcripts will be exported. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| formats | Vector with one or more character strings; output formats, accepted values: 'docx', 'eaf', 'exb', 'edl', 'srt', 'textgrid' and 'txt'. If left unspecified, all supported formats will be exported. |
| createMediaLinks | Logical; if |
| createFolderOutput | Logical; if |
| l | Layout object; layout of print transcripts (affects only 'txt' and 'docx' files). |
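If you prefer to prepare the output folder yourself rather than rely on createFolderOutput, a minimal base R sketch could look as follows. The sub-folder names are hypothetical, and the call to corpus_export is only shown as a comment.

```r
# Hypothetical nested output folder below the session's temporary directory
folderOutput <- file.path(tempdir(), "act_export", "run1")

# Create the folder and any missing parent folders
if (!dir.exists(folderOutput)) {
  dir.create(folderOutput, recursive = TRUE)
}
print(dir.exists(folderOutput))

# The folder can then be used as export destination, for example:
# act::corpus_export(x=examplecorpus, folderOutput=folderOutput, formats="eaf")
```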
export_eaf, export_textgrid, export_exb, export_txt, export_docx, export_rpraat, export_srt
library(act)

# Set destination folder
folderOutput <- tempdir()

# It makes more sense, however, to define a folder
# that is easier to access on your computer
## Not run: 
folderOutput <- "PATH_TO_AN_EMPTY_FOLDER_ON_YOUR_COMPUTER"
## End(Not run)

# Exports all transcript objects in all supported formats
act::corpus_export(x=examplecorpus,
                   folderOutput=folderOutput)

# Exports all transcript objects in 'Praat' .TextGrid format
act::corpus_export(x=examplecorpus,
                   folderOutput=folderOutput,
                   formats="textgrid")

# Exports all transcript objects in 'ELAN' .eaf format.
# By default WITH media links
act::corpus_export(x=examplecorpus,
                   folderOutput=folderOutput,
                   formats="eaf")

# Same, but now WITHOUT media links.
# (Media links are only exported for the files listed in
# the '@media.path' attribute of the transcript object(s).)
act::corpus_export(x=examplecorpus,
                   folderOutput=folderOutput,
                   formats="eaf",
                   createMediaLinks=FALSE)

# Exports in 'ELAN' .eaf and 'Praat' .TextGrid format
act::corpus_export(x=examplecorpus,
                   folderOutput=folderOutput,
                   formats=c("eaf", "textgrid"))
Scans all paths specified in x@paths.annotation.files for annotation files.
Supported file formats will be loaded as transcript objects into the corpus object.
All previously loaded transcript objects will be deleted.
corpus_import(x, createFulltext = TRUE, assignMedia = TRUE)
| Argument | Description |
|---|---|
| x | Corpus object. |
| createFulltext | Logical; if |
| assignMedia | Logical; if |
If assignMedia=TRUE the paths defined in x@paths.media.files will be scanned for media files.
Media files and annotation files will be matched based on their file names.
Only the file types set in options()$act.fileformats.audio and options()$act.fileformats.video will be recognized.
You can modify these options to recognize other media types.
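The matching of media files to annotation files by name can be pictured with the following base R sketch. It assumes that files are matched on their base name without extension, which is an assumption and not a documented rule of the package; all file names and the list of recognized extensions are hypothetical.

```r
# Hypothetical annotation and media files
annotation_files <- c("corpus/rec01.eaf", "corpus/rec02.TextGrid")
media_files <- c("media/rec01.wav", "media/rec01.mp4",
                 "media/rec02.txt", "media/rec03.wav")

# Hypothetical list of recognized media extensions
media_ext <- c("wav", "mp4")

# File name without folder and without extension
stem <- function(p) tools::file_path_sans_ext(basename(p))

# Keep only recognized media types, then match on the file stem
recognized <- media_files[tolower(tools::file_ext(media_files)) %in% media_ext]
matches <- lapply(annotation_files, function(a) recognized[stem(recognized) == stem(a)])
names(matches) <- stem(annotation_files)
print(matches)
```

In this sketch "rec02" ends up without media because its only candidate has an unrecognized extension.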
See @import.results of the corpus object to check the results of importing the files.
To get a detailed overview of the corpus object use act::info(x), for a summary use act::info_summarized(x).
Corpus object.
library(act)

# The example files that come with the act library are located here:
path <- system.file("extdata", "examplecorpus", package="act")

# This is the examplecorpus object that comes with the library
examplecorpus

# Make sure that the input folder of the example corpus object is set correctly
examplecorpus@paths.annotation.files <- path
examplecorpus@paths.media.files <- path

# Load annotation files into the corpus object (again)
examplecorpus <- act::corpus_import(x=examplecorpus)

# Creating the full texts may take a long time.
# If you do NOT want to create the full texts immediately use the following command:
examplecorpus <- act::corpus_import(x=examplecorpus, createFulltext=FALSE)
Creates a new corpus object and loads annotation files. Currently 'ELAN' .eaf, 'EXMARaLDA' .exb and 'Praat' .TextGrid files are supported.
The parameter pathsAnnotationFiles defines where the annotation files are located.
If skipDoubleFiles=TRUE duplicated files will be skipped, otherwise they will be renamed.
If importFiles=FALSE the corpus object will be created but files will not be loaded. To load the files afterwards, call corpus_import.
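The two ways of handling duplicated names can be sketched in base R as follows. The file stems are hypothetical, and the suffix scheme produced by make.unique is only one possible way of renaming; the scheme the package actually uses is not stated here.

```r
# Hypothetical file stems, two of which collide
file_stems <- c("interview", "interview", "meeting")

# Comparable to skipDoubleFiles = TRUE: keep only the first file of each name
skipped <- file_stems[!duplicated(file_stems)]
print(skipped)

# Comparable to skipDoubleFiles = FALSE: keep all files, but make names unique
renamed <- make.unique(file_stems, sep = "_")
print(renamed)
```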
corpus_new(
  pathsAnnotationFiles = NULL,
  pathsMediaFiles = NULL,
  name = "New Corpus",
  importFiles = TRUE,
  skipDoubleFiles = TRUE,
  createFulltext = TRUE,
  assignMedia = TRUE,
  pathNormalizationMatrix = NULL,
  namesInclude = character(),
  namesExclude = character(),
  namesExtractPatterns = character(),
  namesSearchPatterns = character(),
  namesSearchReplacements = character(),
  namesToUpper = FALSE,
  namesToLower = FALSE,
  namesTrim = TRUE,
  namesDefault = "no_name"
)
| Argument | Description |
|---|---|
| pathsAnnotationFiles | Vector of character strings; paths to annotation files or folders that contain annotation files. |
| pathsMediaFiles | Vector of character strings; paths to media files or folders that contain media files. |
| name | Character string; name of the corpus to be created. |
| importFiles | Logical; if |
| skipDoubleFiles | Logical; if |
| createFulltext | Logical; if |
| assignMedia | Logical; if |
| pathNormalizationMatrix | Character string; path to the replacement matrix used for normalizing the annotations; if the argument is left open, the default normalization matrix of the package will be used. |
| namesInclude | Character strings; only files matching this regular expression will be imported into the corpus. |
| namesExclude | Character strings; files matching this regular expression will be skipped and not imported into the corpus. |
| namesExtractPatterns | Vector of character strings; only the part of the file name matching these expressions will be taken as transcript name. |
| namesSearchPatterns | Vector of character strings; search pattern as regular expression. Leave empty for no search-replace in the names. |
| namesSearchReplacements | Vector of character strings; replacements for search. Leave empty for no search-replace in the names. |
| namesToUpper | Logical; convert transcript names all to upper case. |
| namesToLower | Logical; convert transcript names all to lower case. |
| namesTrim | Logical; remove leading and trailing spaces in names. |
| namesDefault | Character string; default value for empty transcript names (e.g., resulting from search-replace operations). |
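The effect of the names* arguments can be pictured with a small base R sketch that derives transcript names from file names. The file names are hypothetical, and the order of the steps is an assumption for illustration, not a description of the package internals.

```r
# Hypothetical file names, some carrying a date suffix
file_names <- c("myFile_2020-09-21.TextGrid", " Other_2021-01-05.eaf ", "_2020-01-01.eaf")

# 1. Trim surrounding spaces (cf. namesTrim) and drop the file extension
transcript_names <- tools::file_path_sans_ext(trimws(file_names))

# 2. Search-replace: remove a trailing date
#    (cf. namesSearchPatterns and namesSearchReplacements)
transcript_names <- sub("_[0-9]{4}-[0-9]{2}-[0-9]{2}$", "", transcript_names)

# 3. Use a default for names that ended up empty (cf. namesDefault)
transcript_names[transcript_names == ""] <- "no_name"
print(transcript_names)
```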
The parameter pathsMediaFiles defines where the corresponding media files are located.
If assignMedia=TRUE the paths defined in x@paths.media.files will be scanned for media files, which will be matched to the transcript objects based on their names.
Only the file types set in options()$act.fileformats.audio and options()$act.fileformats.video will be recognized.
You can modify these options to recognize other media types.
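A minimal sketch of extending one of these options with base R is shown below. The extension "flac" is only an example, and whether the package expects extensions with or without a leading dot is an assumption; check the current value of the option after loading act before changing it.

```r
# Current value of the option (NULL if the act package has not been loaded)
old <- getOption("act.fileformats.audio")

# Add a further, hypothetical audio extension without creating duplicates
options(act.fileformats.audio = unique(c(old, "flac")))
print(getOption("act.fileformats.audio"))
```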
See @import.results of the corpus object to check the results of importing the files.
To get a detailed overview of the corpus object use act::info(x), for a summary use act::info_summarized(x).
Corpus object.
library(act)

# The example files that come with the act library are located here:
path <- system.file("extdata", "examplecorpus", package="act")

# The example corpus comes without media files.
# It is recommended to download a full example corpus also including the media files.
# You can use the following commands.
## Not run: 
path <- "EXISTING_FOLDER_ON_YOUR_COMPUTER/examplecorpus"
temp <- tempfile()
download.file(options()$act.examplecorpusURL, temp)
unzip(zipfile=temp, exdir=path)
## End(Not run)

# The following command creates a new corpus object
mycorpus <- act::corpus_new(name = "mycorpus",
                            pathsAnnotationFiles = path,
                            pathsMediaFiles = path)

# Get a summary
mycorpus
This is the main object the act package uses. It collects the annotations and meta data from loaded annotation files.
Some of the slots are defined by the user.
Some slots report results, such as @import.results and @history.
Other slots are settings and are used when performing functions on the corpus object.
| Slot | Description |
|---|---|
| name | Character string; name of the corpus. |
| transcripts | List of transcript objects; each annotation file that has been loaded is stored in this list as a transcript object. |
| paths.annotation.files | Vector of character strings; path(s) to one or several folders where your annotation files are located. |
| paths.media.files | Vector of character strings; path(s) to one or several folders where your media files are located. |
| normalization.matrix | Data.frame; replacement matrix used for normalizing the annotations. To change the normalization matrix use x@normalization.matrix <- act::matrix_load(path="...") |
| import.skip.double.files | Logical; if TRUE files with the same names will be skipped (only one of them will be loaded), if FALSE transcripts will be renamed to make the names unique. |
| import.names.include | Vector of character strings; only files matching this regular expression will be imported into the corpus. |
| import.names.exclude | Vector of character strings; files matching this regular expression will be skipped and not imported into the corpus. |
| import.names.modify | List; options how to modify the names of the transcript objects when they are added to the corpus. These options are useful, for instance, if your annotation files contain character sequences that you do not want to include in the transcript name in the corpus (e.g. if you regularly add a date to the file name of your annotation files as in 'myFile_2020-09-21.TextGrid'). |
| import.results | Data.frame; information about the import of the annotation files. |
| history | List; history of modifications made by any of the package functions to the corpus. |
library(act)
examplecorpus
Example corpus with data loaded from the example annotations files that come with the package
data(examplecorpus)
An object of class "corpus"
You can download the corresponding media files from
www.oliverehmer.de in the section "Digital Humanities" or
from GitHub: https://github.com/oliverehmer/act_examplecorpus/. Alternatively you can use the download commands in the example section below.
GAT: Ehmer, Oliver/Satti, Luis Ignacio/Martinez, Angelita/Pfaender, Stefan (2019): Un sistema para transcribir el habla en la interaccion: GAT 2.0 Gespraechsforschung - Online-Zeitschrift zur verbalen Interaktion (www.gespraechsforschung-ozs.de) 20, 64-114. http://www.gespraechsforschung-online.de/2019.html
SYNC: Ehmer, Oliver (2020, in press): Synchronization in demonstrations. Multimodal practices for instructing body knowledge. Linguistics Vanguard. https://www.degruyter.com/view/journals/lingvan/lingvan-overview.xml
library(act)

# Summary of the data in the corpus
examplecorpus

# Summary of the data in the second transcript in the corpus
examplecorpus@transcripts[[2]]

## Not run: 
# Download example corpus with media files to a temporary folder
temp <- tempfile(fileext = ".zip")
destinationpath <- tempfile()
download.file(options()$act.examplecorpusURL, temp, mode = "wb")
unzip(zipfile=temp, exdir=destinationpath)

# Alternatively, download it to a folder of your choice.
# Set the URL for the ZIP archive
url <- options()$act.examplecorpusURL

# Define output path
output_dir <- "/EXISTING_FOLDER_ON_YOUR_COMPUTER/examplecorpus"
output_path <- file.path(output_dir, "act_examplecorpus.zip")

# Download the ZIP file
download.file(url, output_path, mode = "wb")

# Unzip it to the output directory
unzip(output_path, exdir = output_dir)

# Rename the extracted folder
folder_old <- file.path(output_dir, "act_examplecorpus-main")
folder_new <- file.path(output_dir, "act_examplecorpus")
if (dir.exists(folder_new)) unlink(folder_new, recursive = TRUE)
file.rename(folder_old, folder_new)
## End(Not run)
LAYOUT Using the layout object you may
- adjust width, abbreviation of speakers, etc.
- set filters to include/exclude tiers matching regular expressions.
- assign template files for .docx formatting using format templates.
export_docx(
  t,
  l = NULL,
  pathOutput = NULL,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  insertArrowAnnotationID = "",
  headerPreface = NULL,
  headerTitle = NULL,
  headerSubtitle = NULL,
  headerDescription = NULL,
  headerInsertSource = TRUE,
  layerNames = NULL
)
| Argument | Description |
|---|---|
| t | Transcript object. |
| l | Layout object. |
| pathOutput | Character string; path where to save the transcript. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| insertArrowAnnotationID | Integer; ID of the annotation in front of which the arrow will be placed. |
| headerPreface | Character string; text used as preface before title. |
| headerTitle | Character string; text used as title. |
| headerSubtitle | Character string; text used as subtitle. |
| headerDescription | Character string; text used as description after subtitle. |
| headerInsertSource | Logical; if |
| layerNames | Vector of character strings; names of columns present in 't@annotations' to be exported (in addition to the column 'content'). |
FORMATTING
- adjust the default format templates in the default .docx template.
- define further templates and add them to a styles matrix.
The paths to both files need to be set in your layout object l. Please check the slots [email protected] and [email protected]. You can see the structure of the default styles matrix in each new layout object in l@ [email protected]. Use [email protected] <- act::export_docx_styles_load(...) to assign a custom styles matrix from a .csv file.
The default format templates are:
Header:
header.preface (formats: s@results$header.description)
header.title (formats: s@results$header.description)
header.subtitle (formats: s@results$header.description)
header.description (formats: s@results$header.description)
Transcript body
body.default (formats: any annotation in 't@annotations')
LAYERS In addition to the original transcript content in transcript@annotations$content you can output further layers. The layers need to be assigned to the data.frame transcript@annotations as new columns. To output the layers, pass the name(s) of the respective column(s) as a character vector in the parameter layerNames.
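How a layer relates to the annotations table can be sketched with a plain data frame. The data and the column name "translation" are hypothetical; with a real transcript object the column would be added to t@annotations instead.

```r
# Hypothetical stand-in for the data.frame in t@annotations
annotations <- data.frame(
  content = c("hola", "que tal"),
  stringsAsFactors = FALSE
)

# Add a further layer as a new column
annotations$translation <- c("hello", "how are you")

# Names of the layers to output; each must be an existing column
layerNames <- "translation"
print(all(layerNames %in% colnames(annotations)))

# With a transcript object t, the corresponding call could look like:
# t@annotations$translation <- ...
# act::export_docx(t=t, layerNames="translation")
```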
Officer doc; transcript as object from library officer.
corpus_export, export_eaf, export_exb, export_rpraat, export_srt, export_textgrid
library(act)

# Get a transcript
t <- examplecorpus@transcripts[[1]]

# Create print transcript
printtranscript <- act::export_docx(t=t)

# Display on screen
printtranscript
This function is only for checking how the export styles matrix (as .csv) will be loaded internally.
export_docx_styles_load(path = NULL, encoding = "UTF-8", path_docx = NA)
| Argument | Description |
|---|---|
| path | Character string; path to the export styles matrix .csv. If argument is left open, the default export styles data frame of the package will be returned. |
| encoding | Character string; encoding of the file. |
| path_docx | Character string; path to a template file. If given, it will be checked if all styles defined in exportStylesMatrix are present in the .docx file. if |
Data.frame
library(act)

# An example replacement matrix comes with the package.
path <- system.file("extdata", "normalization", "normalizationMatrix.csv", package="act")

# Load the matrix
mymatrix <- act::matrix_load(path)

# Have a look at the matrix
colnames(mymatrix)
mymatrix

# The original path of the matrix is stored in the attributes
attr(mymatrix, 'path')
Save styles matrix for .docx transcripts
export_docx_styles_save(exportStyles, path, encoding = "UTF-8")
| Argument | Description |
|---|---|
| exportStyles | Data frame; export styles matrix. |
| path | Character string; path where the matrix will be saved. |
| encoding | Character string; encoding of the file. |
nothing
library(act)

# An example replacement matrix comes with the package.
path <- system.file("extdata", "normalization", "normalizationMatrix.csv", package="act")

# Load the matrix
mymatrix <- act::matrix_load(path)

# Create temporary file path
path <- tempfile(pattern = "mymatrix", tmpdir=tempdir(), fileext = ".csv")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer:
## Not run: 
path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                  "mymatrix.csv")
## End(Not run)

# Save the matrix
act::matrix_save(mymatrix, path=path)
Advice: In most situations it is more convenient to use act::corpus_export for exporting annotation files.
export_eaf(
  t,
  pathOutput = NULL,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  createMediaLinks = TRUE
)
| Argument | Description |
|---|---|
| t | Transcript object; transcript to be exported. |
| pathOutput | Character string; path where .eaf file will be saved. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| createMediaLinks | Logical; if |
The .eaf file will be written to the file specified in pathOutput.
If pathOutput is left empty, the function will return the contents of the .eaf itself.
Contents of the .eaf file (only if pathOutput is left empty)
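The arguments filterSectionStartsec and filterSectionEndsec restrict the export to a time window. A minimal sketch of such a filter on a plain data frame is given below. It assumes that annotations overlapping the window are kept, which is an assumption about the behaviour and not confirmed by this documentation; the column names startsec and endsec and the data are hypothetical.

```r
# Hypothetical annotations with start and end times in seconds
annotations <- data.frame(
  startsec = c(0.0, 2.5, 6.0),
  endsec   = c(2.0, 5.5, 8.0),
  content  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Time window, comparable to filterSectionStartsec and filterSectionEndsec
section_start <- 2.0
section_end   <- 6.0

# Keep the annotations that overlap the window
keep <- annotations$endsec > section_start & annotations$startsec < section_end
section <- annotations[keep, ]
print(section)
```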
corpus_export, export_exb, export_txt, export_rpraat, export_srt, export_textgrid, export_docx
library(act)

# Get the transcript you want to export
t <- examplecorpus@transcripts[[1]]

# Create temporary file path
path <- tempfile(pattern = t@name, tmpdir = tempdir(), fileext = ".eaf")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer
## Not run: 
path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                  paste(t@name, ".eaf", sep=""))
## End(Not run)

# Export WITH media links
act::export_eaf(t=t, pathOutput=path)

# Export WITHOUT media links
act::export_eaf(t=t, pathOutput=path, createMediaLinks = FALSE)
Advice: In most situations it is more convenient to use act::corpus_export for exporting annotation files.
export_edl(
  t,
  pathOutput = NULL,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  fps = 50
)
| Argument | Description |
|---|---|
| t | Transcript object; transcript to be saved. |
| pathOutput | Character string; path where .edl will be saved. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| fps | Double; frame rate (frames per second) of your project, e.g. 60, 50, 30, 29. |
Creates an 'event list text' .edl file with definitions for Blackmagic DaVinci Resolve.
It will be written to the file specified in pathOutput.
If pathOutput is left empty, the function will return the contents of the .edl itself.
Contents of the .edl file (only if pathOutput is left empty)
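The fps argument matters because video editing software typically addresses positions as timecodes counted in frames. The helper below is a hypothetical illustration of converting seconds into an HH:MM:SS:FF timecode; it is not the function used by export_edl, and it does not handle drop-frame rates such as 29.97.

```r
# Hypothetical helper: convert seconds to a non-drop-frame timecode HH:MM:SS:FF
sec_to_timecode <- function(sec, fps = 50) {
  total_frames <- round(sec * fps)     # position expressed in whole frames
  frames  <- total_frames %% fps       # frames within the current second
  seconds <- total_frames %/% fps      # whole seconds
  sprintf("%02d:%02d:%02d:%02d",
          as.integer(seconds %/% 3600),
          as.integer((seconds %% 3600) %/% 60),
          as.integer(seconds %% 60),
          as.integer(frames))
}

print(sec_to_timecode(3661.5, fps = 50))
print(sec_to_timecode(0.5, fps = 30))
```

The same position in seconds yields a different frame count for each frame rate, so fps should match the settings of the editing project.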
corpus_export, export_eaf, export_exb, export_srt, export_txt, export_docx, export_rpraat, export_textgrid
library(act)

# Get the transcript you want to export
t <- examplecorpus@transcripts[[1]]

# Create temporary file path
path <- tempfile(pattern = t@name, tmpdir = tempdir(), fileext = ".edl")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer:
## Not run: 
path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                  paste(t@name, ".edl", sep=""))
## End(Not run)

# Export
act::export_edl(t=t, pathOutput=path)
Advice: In most situations it is more convenient to use act::corpus_export for exporting annotation files.
export_exb(
  t,
  pathOutput = NULL,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  createMediaLinks = TRUE
)
| Argument | Description |
|---|---|
| t | Transcript object; transcript to be exported. |
| pathOutput | Character string; path where .exb file will be saved. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| createMediaLinks | Logical; if |
The .exb file will be written to the file specified in pathOutput.
If pathOutput is left empty, the function will return the contents of the .exb itself.
Contents of the .exb file (only if pathOutput is left empty)
corpus_export, export_eaf, export_txt, export_docx, export_rpraat, export_srt, export_textgrid
library(act)

# Get the transcript you want to export
t <- examplecorpus@transcripts[[1]]

# Create temporary file path
path <- tempfile(pattern = t@name, tmpdir = tempdir(), fileext = ".exb")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer
## Not run: 
path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                  paste(t@name, ".exb", sep=""))
## End(Not run)

# Export WITH media links
act::export_exb(t=t, pathOutput=path)

# Export WITHOUT media links
act::export_exb(t=t, pathOutput=path, createMediaLinks = FALSE)
Advice: In most situations it is more convenient to use act::corpus_export for exporting annotation files.
export_rpraat(
  t,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL
)
| Argument | Description |
|---|---|
| t | Transcript object; transcript to be converted. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
This function is to create compatibility with the rPraat package.
It converts an act transcript to a rPraat TextGrid object.
Credits: Thanks to Tomáš Bořil, the author of the rPraat package, for commenting on the exchange functions.
rPraat TextGrid object
import_rpraat, corpus_export, export_eaf, export_exb, export_txt, export_docx, export_srt, export_textgrid
library(act)

# Convert
rpraat.tg <- act::export_rpraat(t=examplecorpus@transcripts[[1]])

# Now you can use the object in the rPraat package.
# For instance you can plot the TextGrid
## Not run: 
rPraat::tg.plot(rpraat.tg)
## End(Not run)
Advice: In most situations it is more convenient to use act::corpus_export for exporting annotation files.
export_srt(
  t,
  pathOutput = NULL,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  speakerShow = TRUE,
  speakerWidth = 3,
  speakerEnding = ":"
)
| Argument | Description |
|---|---|
| t | Transcript object; transcript to be saved. |
| pathOutput | Character string; path where .srt will be saved. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| speakerShow | Logical; if |
| speakerWidth | Integer; width of speaker abbreviation, -1 for full name without shortening. |
| speakerEnding | Character string; string that is added at the end of the speaker name. |
Creates a 'Subrip title' .srt subtitle file.
It will be written to the file specified in pathOutput.
If pathOutput is left empty, the function will return the contents of the .srt itself.
Contents of the .srt file (only if pathOutput is left empty)
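To give an idea of how speakerWidth and speakerEnding shape a subtitle entry, the sketch below builds one entry by hand. Both helper functions, the speaker name and the times are hypothetical; the exact layout written by export_srt (for instance whether short names are padded) may differ.

```r
# Hypothetical helper: shorten the speaker name and append the ending
format_speaker <- function(name, width = 3, ending = ":") {
  if (width >= 0) name <- substr(name, 1, width)  # -1 keeps the full name
  paste0(name, ending)
}

# Hypothetical helper: seconds to the timestamp form HH:MM:SS,mmm
srt_time <- function(sec) {
  ms <- round(sec * 1000)
  sprintf("%02d:%02d:%02d,%03d",
          as.integer(ms %/% 3600000),
          as.integer((ms %% 3600000) %/% 60000),
          as.integer((ms %% 60000) %/% 1000),
          as.integer(ms %% 1000))
}

# One subtitle entry: running number, time range, text
cue <- c("1",
         paste(srt_time(1.5), "-->", srt_time(3.25)),
         paste(format_speaker("Beto"), "hola"))
cat(cue, sep = "\n")
```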
corpus_export, export_eaf, export_exb, export_txt, export_docx, export_rpraat, export_textgrid
library(act)

# Get the transcript you want to export
t <- examplecorpus@transcripts[[1]]

# Create temporary file path
path <- tempfile(pattern = t@name, tmpdir = tempdir(), fileext = ".srt")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer:
## Not run: 
path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                  paste(t@name, ".srt", sep=""))
## End(Not run)

# Export
act::export_srt(t=t, pathOutput=path)
Advice: In most situations it is more convenient to use act::corpus_export for exporting annotation files.
export_textgrid(
  t,
  pathOutput = NULL,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL
)
| t | Transcript object; transcript to be saved. |
|---|---|
| pathOutput | Character string; path where .TextGrid will be saved. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
The .TextGrid file will be written to the file specified in pathOutput.
If pathOutput is left empty, the function will return the contents of the .TextGrid itself.
Contents of the .TextGrid file (only if pathOutput is left empty)
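The effect of the filter arguments (tier names plus a start and end time) can be pictured with a small base R sketch. It does not use the package itself: the annotation table, its column names and filter_annotations() are hypothetical, and keeping every annotation that overlaps the section (rather than only those fully inside it) is an assumption.

```r
# Hypothetical annotation table; the column names are assumptions for this sketch
annotations <- data.frame(
  tier.name = c("A", "B", "A", "B"),
  startSec  = c(0.0, 1.2, 4.0, 9.5),
  endSec    = c(1.0, 3.8, 6.5, 11.0),
  content   = c("hi", "hello", "how are you", "bye"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep annotations on selected tiers that overlap a time section
filter_annotations <- function(ann, tierNames = NULL, startSec = NULL, endSec = NULL) {
  keep <- rep(TRUE, nrow(ann))
  # NULL means "no restriction" for each of the three filters
  if (!is.null(tierNames)) keep <- keep & ann$tier.name %in% tierNames
  if (!is.null(startSec))  keep <- keep & ann$endSec   > startSec
  if (!is.null(endSec))    keep <- keep & ann$startSec < endSec
  ann[keep, , drop = FALSE]
}

# Annotations on tier "A" that overlap the section from 0.5 to 5 seconds
filter_annotations(annotations, tierNames = "A", startSec = 0.5, endSec = 5)
```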
corpus_export, export_eaf, export_exb, export_txt, export_docx, export_rpraat, export_srt
library(act)

# Get the transcript you want to export
t <- examplecorpus@transcripts[[1]]

# Create temporary file path
path <- tempfile(pattern = t@name, tmpdir = tempdir(), fileext = ".TextGrid")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer:
## Not run: 
path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                  paste(t@name, ".TextGrid", sep=""))
## End(Not run)

# Export
act::export_textgrid(t=t, pathOutput=path)
If you want to modify the layout of the print transcripts, create a new layout object with mylayout <- methods::new("layout"), modify the settings and pass it as argument l.
In the layout object you may also set additional filters to include/exclude tiers matching regular expressions.
export_txt(
  t,
  l = NULL,
  pathOutput = NULL,
  filterTierNames = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  insertArrowAnnotationID = "",
  headerPreface = NULL,
  headerTitle = NULL,
  headerSubtitle = NULL,
  headerDescription = NULL,
  headerInsertSource = TRUE,
  collapse = TRUE
)

export_printtranscript(...)
| t | Transcript object. |
|---|---|
| l | Layout object. |
| pathOutput | Character string; path where to save the transcript. |
| filterTierNames | Vector of character strings; names of tiers to be included. If left unspecified, all tiers will be exported. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| insertArrowAnnotationID | Integer; ID of the annotation in front of which the arrow will be placed. |
| headerPreface | Character string; text used as preface before title. |
| headerTitle | Character string; text used as title. |
| headerSubtitle | Character string; text used as subtitle. |
| headerDescription | Character string; text used as description after subtitle. |
| headerInsertSource | Logical; if |
| collapse | Logical; if |
| ... | Arguments passed to |
Character string; transcript as text.
corpus_export, export_eaf, export_exb, export_rpraat, export_srt, export_textgrid, export_docx
library(act)

# Get a transcript
t <- examplecorpus@transcripts[[1]]

# Create print transcript
printtranscript <- act::export_txt(t=t)

# Display on screen
cat(printtranscript)
Saves an 'FFmpeg' cut list for Mac or Windows. For Windows, the cut list is simply saved to a .cmd file. For Mac, it is saved as a shell script (with "#!/bin/sh" added as first line) and made executable.
helper_cutlist_save(
  cutlistMac = NULL,
  cutlistWin = NULL,
  outFolder,
  outFilename
)
| cutlistMac | Character string; Content of file, if |
|---|---|
| cutlistWin | Character string; Content of file, if |
| outFolder | Character string; Destination folder. |
| outFilename | Character string; Destination filename. |
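The Mac branch described above (write a shell script, add the shebang line, make the file executable) can be sketched with base R alone. save_shell_script() is a hypothetical helper and not the implementation of helper_cutlist_save; the ffmpeg command is only a placeholder line.

```r
# Hypothetical helper: write a command list as an executable shell script
save_shell_script <- function(commands, outFolder, outFilename) {
  path <- file.path(outFolder, outFilename)
  # Prepend the shebang line so the file can be run directly
  writeLines(c("#!/bin/sh", commands), con = path)
  # Mark the file as executable (file modes are largely ignored on Windows)
  Sys.chmod(path, mode = "0755")
  invisible(path)
}

# The ffmpeg command below is a placeholder for illustration only
script <- save_shell_script(
  commands    = 'ffmpeg -i "in.wav" -ss 1.5 -t 2.0 "cut.wav"',
  outFolder   = tempdir(),
  outFilename = "cutlist.sh"
)
readLines(script)
```

For the Windows variant, the same writeLines() call without the shebang line and with a .cmd file name would be sufficient in this sketch.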
library(act)

# --- Create two tier tables from scratch
tierTable1 <- act::helper_tiers_new_table(c("a","b","c","d"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable2 <- act::helper_tiers_new_table(c("a","b","x","y"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable3 <- act::helper_tiers_merge_tables(tierTable1,tierTable2)
tierTable3
Formats time as HH:MM:SS,mmm
helper_format_time(t, digits = 1, addHrsMinSec = FALSE, addSec = FALSE)
| t | Double; time in seconds. |
|---|---|
| digits | Integer; number of digits. |
| addHrsMinSec | Logical; if |
| addSec | Logical; if |
Character string.
library(act)

helper_format_time(12734.2322345)
helper_format_time(2734.2322345)
helper_format_time(34.2322345)
helper_format_time(0.2322345)

helper_format_time(12734.2322345, addHrsMinSec=TRUE)
helper_format_time(2734.2322345, addHrsMinSec=TRUE)
helper_format_time(34.2322345, addHrsMinSec=TRUE)
helper_format_time(0.2322345, addHrsMinSec=TRUE)

helper_format_time(12734.2322345, digits=3)
helper_format_time(2734.2322345, digits=3)
helper_format_time(34.2322345, digits=3)
helper_format_time(0.2322345, digits=3)

helper_format_time(12734.2322345, addHrsMinSec=TRUE, digits=3)
helper_format_time(2734.2322345, addHrsMinSec=TRUE, digits=3)
helper_format_time(34.2322345, addHrsMinSec=TRUE, digits=3)
helper_format_time(0.2322345, addHrsMinSec=TRUE, digits=3)

helper_format_time(12734.2322345, addHrsMinSec=TRUE, addSec=TRUE)
helper_format_time(2734.2322345, addHrsMinSec=TRUE, addSec=TRUE)
helper_format_time(34.2322345, addHrsMinSec=TRUE, addSec=TRUE)
helper_format_time(0.2322345, addHrsMinSec=TRUE, addSec=TRUE)

helper_format_time(12734.2322345, addHrsMinSec=TRUE, digits=3, addSec=TRUE)
helper_format_time(2734.2322345, addHrsMinSec=TRUE, digits=3, addSec=TRUE)
helper_format_time(34.2322345, addHrsMinSec=TRUE, digits=3, addSec=TRUE)
helper_format_time(0.2322345, addHrsMinSec=TRUE, digits=3, addSec=TRUE)
Helper: Set progress bar
helper_progress_set(title, total)
| title | Character string; Title of progress bar. |
|---|---|
| total | Integer; Number of items to tick. |
library(act)

# --- Create two tier tables from scratch
tierTable1 <- act::helper_tiers_new_table(c("a","b","c","d"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable2 <- act::helper_tiers_new_table(c("a","b","x","y"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable3 <- act::helper_tiers_merge_tables(tierTable1,tierTable2)
tierTable3
Helper: Advance progress bar by one tick
helper_progress_tick()
library(act)

# --- Create two tier tables from scratch
tierTable1 <- act::helper_tiers_new_table(c("a","b","c","d"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable2 <- act::helper_tiers_new_table(c("a","b","x","y"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable3 <- act::helper_tiers_merge_tables(tierTable1,tierTable2)
tierTable3
Creates a tier name filter based on a vector of character strings and the values 'filterTierIncludeRegEx' and 'filterTierExcludeRegEx' in a layout object.
helper_tiers_filter_create(
  tierNames,
  filterTierIncludeRegEx = "",
  filterTierExcludeRegEx = ""
)
| tierNames | Vector of character strings; names of the tiers. |
|---|---|
| filterTierIncludeRegEx | Character string; as regular expression, tiers matching the expression will be included in the print transcript. |
| filterTierExcludeRegEx | Character string; as regular expression, tiers matching the expression will be excluded from the print transcript. |
Vector of character strings; names of tiers
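The idea of combining an include and an exclude expression can be illustrated in base R. filter_tier_names() is a hypothetical stand-in, not the package function; treating an empty string as "no filter" and applying the exclusion after the inclusion are assumptions of this sketch.

```r
# Hypothetical helper: apply an include and an exclude regular expression to tier names
filter_tier_names <- function(tierNames, includeRegEx = "", excludeRegEx = "") {
  keep <- rep(TRUE, length(tierNames))
  # An empty pattern is treated as "no filter" in this sketch
  if (nzchar(includeRegEx)) keep <- keep & grepl(includeRegEx, tierNames)
  if (nzchar(excludeRegEx)) keep <- keep & !grepl(excludeRegEx, tierNames)
  tierNames[keep]
}

tiers <- c("A", "A_gesture", "B", "B_gesture", "comment")

# Keep tiers starting with "A" or "B", then drop those ending in "gesture"
filter_tier_names(tiers, includeRegEx = "^[AB]", excludeRegEx = "gesture$")
```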
library(act)

# --- Create a tier table from scratch
tierTable <- act::helper_tiers_new_table(c("a","b","c", "d"),
                                         c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable
Merges several tier tables into one tier table.
helper_tiers_merge_tables(...)
| ... | Accepts different kinds of objects: transcript objects, lists of transcript objects (as in @transcripts of a corpus object) and tier tables (as in @tiers of a transcript object). |
|---|---|
NOTE: To actually modify the tiers in a transcript object or a corpus object, use the functions of the package, e.g. act::transcripts_merge.
This function is only a helper function for people who like to experiment.
If tiers with the same name are of different types ('IntervalTier', 'TextTier'), an error will be raised.
In that case you can use, for example, act::tiers_convert to change the tier types.
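Before merging, it can help to see which tier names would clash. The following base R sketch finds tiers that occur in two tables with different types. The tables are built by hand, the column names 'name' and 'type' are assumptions about the tier table layout, and tier_type_conflicts() is a hypothetical helper.

```r
# Two hypothetical tier tables; the column names 'name' and 'type' are assumptions
tiers1 <- data.frame(name = c("a", "b", "c"),
                     type = c("IntervalTier", "TextTier", "IntervalTier"),
                     stringsAsFactors = FALSE)
tiers2 <- data.frame(name = c("a", "b", "x"),
                     type = c("IntervalTier", "IntervalTier", "TextTier"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: names of tiers that occur in both tables with different types
tier_type_conflicts <- function(t1, t2) {
  # Inner join on the tier name; the type columns become 'type.1' and 'type.2'
  both <- merge(t1, t2, by = "name", suffixes = c(".1", ".2"))
  both$name[both$type.1 != both$type.2]
}

# Tier "b" is a TextTier in the first table and an IntervalTier in the second
tier_type_conflicts(tiers1, tiers2)
```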
Data.frame
helper_tiers_sort_table, helper_tiers_merge_tables, tiers_convert, tiers_rename, tiers_sort, transcripts_merge
library(act)

# --- Create two tier tables from scratch
tierTable1 <- act::helper_tiers_new_table(c("a","b","c","d"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable2 <- act::helper_tiers_new_table(c("a","b","x","y"),
                                          c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable3 <- act::helper_tiers_merge_tables(tierTable1,tierTable2)
tierTable3
Creates a new tier table as necessary in @tiers of a transcript object.
helper_tiers_new_table(tierNames, tierTypes = NULL, tierPositions = NULL)
| tierNames | Vector of character strings; names of the tiers. |
|---|---|
| tierTypes | Vector of character strings; types of the tiers. Allowed values: "IntervalTier","TextTier". Needs to have the same length as 'tierNames'. |
| tierPositions | Vector of integer values; Sort order of the tiers. Needs to have the same length as 'tierNames'. |
NOTE: To actually modify the tiers in a transcript object or a corpus object, use the functions of the package. This function is only a helper function for people who like to experiment.
Data.frame
helper_tiers_sort_table, helper_tiers_merge_tables, tiers_convert, tiers_rename, tiers_sort
library(act)

# --- Create a tier table from scratch
tierTable <- act::helper_tiers_new_table(c("a","b","c", "d"),
                                         c("IntervalTier", "TextTier","IntervalTier","TextTier"))
tierTable
NOTE: To actually reorder the tiers in a transcript object or a corpus object, use act::tiers_sort.
This function is only a helper function for people who like to experiment.
helper_tiers_sort_table(
  tierTable,
  sortVector,
  tiersAddMissing = TRUE,
  tiersDelete = FALSE
)
| tierTable | Data frame; tiers as specified and necessary in |
|---|---|
| sortVector | Vector of character strings; regular expressions to match the tier names. The order within the vector represents the new order of the tiers. Use "\\*" (two backslashes and a star) to indicate where tiers that are not present in the sort vector but in the transcript should be inserted. |
| tiersAddMissing | Logical; if |
| tiersDelete | Logical; if |
Sorts a tier table by a predefined vector of regular expression strings. Tiers that are missing in the table but are present in the sort vector may be inserted. Tiers that are present in the table but not in the sort vector may be deleted or inserted. These tiers will be inserted by default at the end of the table. You may also use an element "\\*" in 'sortVector' to define the position where they should be placed.
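The ordering logic with a placeholder element can be sketched on a plain vector of tier names. sort_tier_names() is a hypothetical helper that only reorders names; it neither adds nor deletes tiers, works on a character vector instead of a tier table, and is not the implementation of helper_tiers_sort_table.

```r
# Hypothetical helper: order tier names by a vector of regular expressions.
# The element "\\*" marks where tiers not matched by any other pattern are inserted;
# without it they are appended at the end.
sort_tier_names <- function(tierNames, sortVector) {
  placeholder <- "\\*"
  patterns  <- setdiff(sortVector, placeholder)
  matched   <- function(p) tierNames[grepl(p, tierNames)]
  unmatched <- setdiff(tierNames, unlist(lapply(patterns, matched)))
  if (!placeholder %in% sortVector) sortVector <- c(sortVector, placeholder)
  ordered <- unlist(lapply(sortVector, function(p) {
    if (p == placeholder) unmatched else matched(p)
  }))
  # A name matched by several patterns keeps its first position
  unique(ordered)
}

# Tier "c" is not in the sort vector and is placed where the placeholder stands
sort_tier_names(c("a", "b", "c", "d"), c("a", "\\*", "b", "d"))
```

Because the elements are regular expressions, a pattern such as "a" matches every tier name that contains an "a"; anchors like "^a$" restrict the match to the exact name.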
Data.frame
tiers_sort, helper_tiers_new_table, helper_tiers_merge_tables
# This function applies to the tier tables that are necessary in @tiers of a transcript
# object. For clarity, we will create such a table from scratch.
library(act)

# --- Create a tier table from scratch
tierTable <- helper_tiers_new_table(c("a","b","c", "d"),
                                    c("IntervalTier", "TextTier","IntervalTier","TextTier"))

# --- Create a vector, defining the new order of the tiers.
sortVector <- c("c","a","d","b")

# Sort the table
tierTable.1 <- act::helper_tiers_sort_table(tierTable=tierTable, sortVector=sortVector)
tierTable.1

# --- Create a vector, in which the tier "c" is missing.
sortVector <- c("a","b","d")

# Sort the table, the missing tier will be inserted at the end.
tierTable.1 <- act::helper_tiers_sort_table(tierTable=tierTable, sortVector=sortVector)
tierTable.1

# --- Create a vector, in which the tier "c" is missing,
# but define the place, where missing tiers will be inserted by "\\*"
sortVector <- c("a","\\*", "b","d")

# Sort the table. The missing tier "c" will be inserted in second place.
tierTable.2 <- act::helper_tiers_sort_table(tierTable=tierTable, sortVector=sortVector)
tierTable.2

# Sort the table, but delete tiers that are missing in the sort vector
# Note: If 'tiersDelete=TRUE' tiers that are missing in the sort vector
# will be deleted, even if the 'sortVector' contains a "\\*".
tierTable.3 <- act::helper_tiers_sort_table(tierTable=tierTable,
                                            sortVector=sortVector,
                                            tiersDelete=TRUE)
tierTable.3

# --- Create a vector, which contains tier names that are not present in 'tierTable'.
sortVector <- c("c","a","x", "y", "d","b")
tierTable.4 <- act::helper_tiers_sort_table(tierTable=tierTable, sortVector=sortVector)
tierTable.4
Gets the names of all transcript objects in a corpus object, based on the @name slot of each transcript.
helper_transcript_names_get(x)
| x | Corpus object |
|---|---|
List
library(act)

act::helper_transcript_names_get(examplecorpus)
Makes valid names for all transcript objects in a corpus object based on the names passed in the 'transcriptNames' parameter. In particular, the function also corrects names, which have to be non-empty and unique. The following options are applied in the order in which they are listed.
helper_transcript_names_make(
  transcriptNames,
  extractPatterns = NULL,
  searchPatterns = NULL,
  searchReplacements = NULL,
  toUpper = FALSE,
  toLower = FALSE,
  trim = FALSE,
  defaultEmpty = "no_name"
)
| transcriptNames | Vector of character strings; Names of the transcripts to validate. |
|---|---|
| extractPatterns | Vector of character strings; Extract pattern as regular expression. Leave empty for no search-replace in the names. |
| searchPatterns | Vector of character strings; Search pattern as regular expression. Leave empty for no search-replace in the names. |
| searchReplacements | Vector of character strings; Replacements for search. Leave empty for no search-replace in the names. |
| toUpper | Logical; Convert transcript names all to upper case. |
| toLower | Logical; Convert transcript names all to lower case. |
| trim | Logical; Remove leading and trailing spaces in names. |
| defaultEmpty | Character string; Default value for empty transcript names (e.g., resulting from search-replace operations). |
List
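The two corrections named in the description (names must be non-empty and unique) can be sketched in base R. make_valid_names() is a hypothetical helper; in particular, the ".1", ".2" suffixes come from base R's make.unique() and the package may disambiguate duplicates differently.

```r
# Hypothetical helper: trim names, replace empty names and make duplicates unique
make_valid_names <- function(transcriptNames, trim = FALSE, defaultEmpty = "no_name") {
  if (trim) transcriptNames <- trimws(transcriptNames)
  # Empty strings receive the default name
  transcriptNames[!nzchar(transcriptNames)] <- defaultEmpty
  # make.unique() appends ".1", ".2", ... to repeated names
  make.unique(transcriptNames)
}

make_valid_names(c("a", "b", "", "d", "d"))
# returns "a" "b" "no_name" "d" "d.1"
```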
library(act)

# make some names with an empty value "" and a duplicate "d"
transcriptNames <- c("a", "b", "", "d", "d")
act::helper_transcript_names_make(transcriptNames)
Sets the names of all transcript objects in a corpus object both in the names of the list x@transcripts and in the slot @name of each transcript.
helper_transcript_names_set(x, transcriptNames)
| x | Corpus object |
|---|---|
| transcriptNames | Vector of character strings; new names. |
List
library(act)

# get current names of the transcripts
names.old <- act::helper_transcript_names_get(examplecorpus)

# rename giving numbers as names
names.test <- as.character(seq_along(names.old))
test <- act::helper_transcript_names_set(examplecorpus, names.test)
names(test@transcripts)

# create an error: empty name
## Not run: 
names.test <- names.old
names.test[2] <- " "
test <- act::helper_transcript_names_set(examplecorpus, names.test)
## End(Not run)

# create an error: double names
## Not run: 
names.test <- names.old
names.test[2] <- names.test[1]
test <- act::helper_transcript_names_set(examplecorpus, names.test)
## End(Not run)
Advice: In most situations it is more convenient to use act::corpus_new or act::corpus_import for importing annotation files.
import(..., transcriptName = NULL)
| ... | File path, contents of an annotation file or rPraat object; see the description. |
|---|---|
| transcriptName | Character string; name of the transcript. If this parameter is set, the default name of the transcript will be changed. |
Imports the contents of an annotation file and returns a transcript object.
The input to this function in the parameter '...' may be one of the following:
(1) the path to an annotation file (currently 'ELAN' .eaf, 'EXMARaLDA' .exb, 'Praat' .TextGrid and 'SubRip' .srt files),
(2) the contents of an annotation file, obtained from @file.content of a transcript object or by reading the file directly with readLines(), or
(3) an rPraat TextGrid object.
Only the first input to '...' will be processed.
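How the three kinds of input listed above might be told apart can be sketched in base R. classify_import_input() is purely hypothetical and only returns a label; the checks it uses (a list is taken to be an rPraat object, a single string with a known file extension is taken to be a path) are assumptions, not the logic of act::import.

```r
# Hypothetical helper: classify the first input passed in '...'
classify_import_input <- function(...) {
  input <- list(...)[[1]]   # only the first input is considered
  if (is.list(input)) return("rPraat TextGrid object")
  if (is.character(input) && length(input) == 1 &&
      !grepl("\n", input, fixed = TRUE)) {
    # A single line with a known extension is treated as a file path
    ext <- tolower(tools::file_ext(input))
    if (ext %in% c("eaf", "exb", "textgrid", "srt")) {
      return(paste0("path to .", ext, " file"))
    }
  }
  if (is.character(input)) return("file content")
  stop("Unsupported input")
}

classify_import_input("recordings/interview.TextGrid")
classify_import_input(c("<?xml version=\"1.0\"?>", "<ANNOTATION_DOCUMENT>"))
```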
Transcript object.
corpus_import, corpus_new, import_eaf, import_exb, import_rpraat, import_srt, import_textgrid
library(act)

# To import an annotation file of your choice:
## Not run: 
path <- "PATH_TO_AN_EXISTING_FILE_ON_YOUR_COMPUTER"
## End(Not run)

# Path to a .TextGrid file that you want to read
filePath <- system.file("extdata", "examplecorpus", "GAT",
                        "ARG_I_PAR_Beto.TextGrid", package="act")
t <- act::import(filePath=filePath)
t

# Path to an .eaf file that you want to read
filePath <- system.file("extdata", "examplecorpus", "SYNC",
                        "SYNC_rotar_y_flexionar.eaf", package="act")
t <- act::import(filePath=filePath)
t

# Content of a .TextGrid file, e.g. as stored in @file.content
# of a transcript object.
fileContent <- examplecorpus@transcripts[['ARG_I_CHI_Santi']]@file.content
t <- act::import(fileContent=fileContent)
t

# Content of an .eaf file, e.g. as stored in @file.content
# of a transcript object.
fileContent <- examplecorpus@transcripts[['SYNC_rotar_y_flexionar']]@file.content
t <- act::import(fileContent=fileContent)
t
Advice: In most situations it is more convenient to use act::corpus_new or act::corpus_import for importing annotation files.
Imports the contents of an 'ELAN' .eaf file and returns a transcript object.
The input to this function is either the path to an .eaf file or the contents of an .eaf file, obtained from @file.content of an existing transcript object or read with readLines().
If you pass 'fileContent' you need to pass 'transcriptName' as parameter, too.
import_eaf(filePath = NULL, fileContent = NULL, transcriptName = NULL)
| filePath | Character string; input path of a single 'ELAN' .eaf file. |
|---|---|
| fileContent | Vector of character strings; contents of an 'ELAN' .eaf file read by |
| transcriptName | Character string; name of the transcript. |
Please note:
'ELAN' offers a variety of tier types, some including dependencies from other tiers. Therefore not all annotations do actually have a time value. Missing values will be detected in the superordinate tier or will be interpolated. You will not be able to recognize interpolated values in the annotations.
Please also note that dependencies between tiers in your .eaf file are not reflected in the transcript object within the act package.
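What interpolating missing time values means can be illustrated with a small base R sketch. Linear interpolation between the nearest known time slots is an assumption made here for illustration; interpolate_times() is a hypothetical helper and the method used by the package may differ.

```r
# Time slots of a hypothetical tier; NA marks slots without a time value
slotTimes <- c(0.0, NA, NA, 3.0, NA, 5.0)

# Hypothetical helper: fill missing times by linear interpolation between known neighbours
interpolate_times <- function(times) {
  idx   <- seq_along(times)
  known <- !is.na(times)
  # approx() interpolates linearly over the slot index; it needs at least two known values
  stats::approx(x = idx[known], y = times[known], xout = idx)$y
}

interpolate_times(slotTimes)
```

In this sketch, missing values before the first or after the last known slot stay NA, because approx() does not extrapolate by default.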
Transcript object.
corpus_import, corpus_new, import, import_exb, import_rpraat, import_textgrid
library(act)

# Path to an .eaf file that you want to read
path <- system.file("extdata", "examplecorpus", "SYNC",
                    "SYNC_rotar_y_flexionar.eaf", package="act")

# To import a .eaf file of your choice:
## Not run: 
path <- "PATH_TO_AN_EXISTING_EAF_ON_YOUR_COMPUTER"
## End(Not run)

t <- act::import_eaf(filePath=path)
t

# Content of an .eaf file (already read by readLines),
# e.g. from an existing transcript object:
mycontent <- examplecorpus@transcripts[['SYNC_rotar_y_flexionar']]@file.content
t <- act::import_eaf(fileContent=mycontent, transcriptName="test")
t
Advice: In most situations it is more convenient to use act::corpus_new or act::corpus_import for importing annotation files.
import_exb(filePath = NULL, fileContent = NULL, transcriptName = NULL)
| filePath | Character string; input path of a single 'EXMARaLDA' .exb file. |
|---|---|
| fileContent | Vector of character strings; contents of an 'EXMARaLDA' .exb file. |
| transcriptName | Character string; name of the transcript. |
Imports the contents of an 'EXMARaLDA' .exb file and returns a transcript object.
The source is either the path to an .exb file or the contents of an .exb file obtained from the @file.content of an existing transcript object.
If you pass 'fileContent' you need to pass 'transcriptName' as parameter, too.
Please note:
'EXMARaLDA' allows for empty time slots without time values. Missing values will be interpolated during the import. You will not be able to recognize interpolated values in the data.
Meta data for tiers (such as the display name etc.) will not be imported.
Media files are referenced not by their path but only as file names in .exb files. The names will be imported but will not work as paths in act.
Transcript object.
corpus_import, corpus_new, import, import_eaf, import_rpraat, import_textgrid
library(act)

## Not run: 
# To import an .exb file of your choice:
filePath <- "PATH_TO_AN_EXISTING_EXB_ON_YOUR_COMPUTER"
t <- act::import_exb(filePath=filePath)
t
## End(Not run)
This function provides compatibility with the rPraat package.
It converts an 'rPraat' TextGrid object into an act transcript object.
import_rpraat(rpraatTextgrid, transcriptName = NULL)
| rpraatTextgrid | List; rPraat TextGrid object. |
|---|---|
| transcriptName | Character string; name of the transcript. |
Please note:
Time values of annotations in TextGrids may be below 0 seconds. Negative time values will be recognized correctly in the first place. When exporting a transcript object to other formats like 'ELAN' .eaf, 'EXMARaLDA' .exb etc., annotations that are completely before 0 sec will be deleted, and annotations that start before but end after 0 sec will be truncated. Please see also the function act::transcripts_cure_single.
TextGrids and contained tiers may start and end at different times. These times do not need to match each other. The act package does not support start and end times of TextGrids and tiers. The default start of a TextGrid will be 0 seconds or the lowest value in case that annotations start below 0 seconds.
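The treatment of annotations around 0 seconds described above (delete what lies completely before 0, truncate what crosses 0) can be sketched on a plain data frame. The table, its column names and clip_at_zero() are hypothetical, and treating an annotation that ends exactly at 0 sec as "completely before" is an assumption.

```r
# Hypothetical annotations, two of them starting before 0 seconds
ann <- data.frame(
  startSec = c(-2.0, -0.5, 1.0),
  endSec   = c(-1.0,  0.8, 2.0),
  content  = c("before", "across", "after"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop annotations ending at or before 0 sec,
# and move the start of annotations that cross 0 sec to 0
clip_at_zero <- function(ann) {
  ann <- ann[ann$endSec > 0, , drop = FALSE]
  ann$startSec <- pmax(ann$startSec, 0)
  ann
}

# "before" is removed, "across" now starts at 0, "after" is unchanged
clip_at_zero(ann)
```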
Credits: Thanks to Tomáš Bořil, the author of the rPraat package, for commenting on the exchange functions.
Transcript object.
corpus_import, corpus_new, import, import_eaf, import_exb, import_textgrid, export_rpraat
library(act)

# Path to the .TextGrid file that you want to read
path <- system.file("extdata", "examplecorpus", "GAT",
                    "ARG_I_PAR_Beto.TextGrid", package="act")

# To import a .TextGrid file of your choice:
## Not run: 
path <- "PATH_TO_AN_EXISTING_TEXTGRID_ON_YOUR_COMPUTER"
## End(Not run)

# Make sure to have rPraat installed before you try the following
## Not run: 
# Read TextGrid file with rPraat
rPraat.tg <- rPraat::tg.read(path)

# Convert to an act transcript
t <- act::import_rpraat(rPraat.tg)

# Change the name and add it to the examplecorpus
t@name <- "rpraat"
newcorpus <- act::transcripts_add(examplecorpus, t)

# Have a look
newcorpus@transcripts[["rpraat"]]

# Alternatively, you can use the general import function
t <- act::import(rPraat.tg)
## End(Not run)
Advice: In most situations it is more convenient to use act::corpus_new or act::corpus_import for importing annotation files.

import_srt(filePath, transcriptName = NULL, tierName = "subtitle")

| Argument | Description |
|---|---|
| filePath | Character string; input path of a single 'SubRip' .srt subtitle file. |
| transcriptName | Character string; name of the transcript. |
| tierName | Character string; name of the imported tier. |

Transcript object.
corpus_import, corpus_new, import, import_eaf, import_exb, import_textgrid, import_rpraat

    library(act)

    # Path to the .TextGrid file that you want to read
    path <- system.file("extdata", "examplecorpus", "GAT",
                        "ARG_I_PAR_Beto.TextGrid", package="act")

    # To import a .TextGrid file of your choice:
    ## Not run: 
    path <- "PATH_TO_AN_EXISTING_TEXTGRID_ON_YOUR_COMPUTER"
    ## End(Not run)

    # Make sure to have rPraat installed before you try the following
    ## Not run: 
    # Read TextGrid file with rPraat
    rPraat.tg <- rPraat::tg.read(path)

    # Convert to an act transcript
    t <- act::import_rpraat(rPraat.tg)

    # Change the name and add it to the examplecorpus
    t@name <- "rpraat"
    newcorpus <- act::transcripts_add(examplecorpus, t)

    # Have a look
    newcorpus@transcripts[["rpraat"]]

    # Alternatively, you can use the general import function
    t <- act::import(rPraat.tg)
    ## End(Not run)

Advice: In most situations it is more convenient to use act::corpus_new or act::corpus_import for importing annotation files.

import_textgrid(filePath = NULL, fileContent = NULL, transcriptName = NULL)

| Argument | Description |
|---|---|
| filePath | Character string; input path of a single 'Praat' .TextGrid file. |
| fileContent | Vector of character strings; contents of a 'Praat' .TextGrid file read with |
| transcriptName | Character string; name of the transcript. |

Imports the contents of a 'Praat' .TextGrid file and returns a transcript object.
The source is either the path to a .TextGrid file or the contents of a .TextGrid file, as read by readLines() or taken from the @file.content slot of an existing transcript object.
If you pass 'fileContent', you also need to pass 'transcriptName'.
Please note:
Time values of annotations in TextGrids may be below 0 seconds. Negative time values will be recognized correctly on import. When exporting a transcript object to other formats such as 'ELAN' .eaf or 'EXMARaLDA' .exb, annotations that lie completely before 0 sec will be deleted, and annotations that start before but end after 0 sec will be truncated. Please see also the function act::transcripts_cure_single.
TextGrids and the tiers they contain may start and end at different times. These times do not need to match each other. The act package does not support separate start and end times of TextGrids and tiers. The default start of a TextGrid will be 0 seconds, or the lowest annotation time if annotations start below 0 seconds.
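The export rule for negative time values described in the notes above can be pictured with a small base-R sketch. This is not the package's implementation: the data frame `annotations` and its column names `startsec` and `endsec` are hypothetical and only serve the illustration, and treating an annotation that ends exactly at 0 as "completely before 0" is an assumption.

```r
# Hypothetical annotations; the column names are assumptions, not act's internal format
annotations <- data.frame(
  content  = c("before", "spanning", "after"),
  startsec = c(-2.0, -0.5, 1.0),
  endsec   = c(-1.0,  0.5, 2.0),
  stringsAsFactors = FALSE
)

# Delete annotations that lie completely before 0 seconds
kept <- annotations[annotations$endsec > 0, ]

# Truncate annotations that start before 0 seconds but end after it
kept$startsec <- pmax(kept$startsec, 0)

kept
```

In this sketch the first annotation is dropped and the second one is shortened to start at 0 seconds.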
Transcript object.
corpus_import, corpus_new, import, import_eaf, import_exb, import_rpraat

    library(act)

    # Path to the .TextGrid file that you want to read
    path <- system.file("extdata", "examplecorpus", "GAT",
                        "ARG_I_PAR_Beto.TextGrid", package="act")

    # To import a .TextGrid file of your choice:
    ## Not run: 
    path <- "PATH_TO_AN_EXISTING_TEXTGRID_ON_YOUR_COMPUTER"
    ## End(Not run)

    t <- act::import_textgrid(filePath=path)
    t

    # Content of a .TextGrid (already read by readLines()),
    # e.g. from an existing transcript object:
    mycontent <- examplecorpus@transcripts[[1]]@file.content
    t <- act::import_textgrid(fileContent=mycontent, transcriptName="test")
    t

Gives detailed information about the contents of a corpus object or a transcript object that is passed as parameter to the function. If you want to pass a transcript object from a corpus object, make sure that you access the transcript using double brackets [[ ]].

info(...)

| Argument | Description |
|---|---|
| ... | object; either a corpus or a transcript object. |

To get summarized information about the transcript and corpus objects use act::info_summarized.
List.

    library(act)

    act::info(examplecorpus)
    act::info(examplecorpus@transcripts[[1]])

Gives summarized information about the contents of a corpus object or a transcript object that is passed as parameter to the function. If you want to pass a transcript object from a corpus object, make sure that you access the transcript using double brackets [[ ]].

info_summarized(...)

| Argument | Description |
|---|---|
| ... | object; either a corpus or a transcript object. |

To get more detailed information about the tiers in a corpus object use act::info.
List.

    library(act)

    act::info_summarized(examplecorpus)
    act::info_summarized(examplecorpus@transcripts[[1]])

You can create a new layout object with methods::new("layout").
This will give you a new layout object with the default settings used by act.
If you want to modify the layout of the print transcripts, create a new layout object with mylayout <- methods::new("layout"), modify the values in the @slots and pass it as argument l to the respective functions.
name: Character string; name of the layout.
filter.tier.includeRegEx: Character string; regular expression, tiers matching the expression will be included in the print transcript.
filter.tier.excludeRegEx: Character string; regular expression, tiers matching the expression will be excluded from the print transcript.
transcript.width: Integer; width of transcript, -1 for no line wrapping.
speaker.regex: Character string; regular expression to extract the speaker abbreviation from the tier name.
speaker.width: Integer; maximum width of speaker abbreviation, -1 for full name without shortening.
speaker.ending: Character string; string that is added at the end of the speaker name.
spacesbefore: Integer; number of spaces inserted before the line number.
brackets.align: Logical; if TRUE act will try to align brackets [] for parallel speaking (Attention: experimental function; results may not be satisfactory).
header.insert: Logical; if TRUE a transcript header is inserted.
arrow.insert: Logical; only used when transcripts are made based on search results; if TRUE an arrow will be inserted, highlighting the transcript line containing the search hit.
arrow.shape: Character string; shape of the arrow.
docx.template.path: Character string;
docx.styles: Data.frame; matrix with mappings of act variables and the format templates in the .docx template files. To change the styles matrix use mylayout@docx.styles <- act::export_docx_styles_load(path="...")
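As a minimal sketch of the workflow described above, the following lines create a layout object and change two of the slots listed here. The slot values are arbitrary examples, and the code only runs if the act package is installed; it is meant as an illustration, not as a recommended configuration.

```r
# Runs only if the act package is installed
if (requireNamespace("act", quietly = TRUE)) {
  library(act)

  # New layout object with the default settings
  mylayout <- methods::new("layout")

  # Modify some slots; the values are arbitrary examples
  mylayout@name <- "narrow"
  mylayout@header.insert <- FALSE

  mylayout
}
```

The modified object would then be passed as argument l to a function that accepts a layout, for instance act::search_cuts(x, s, l = mylayout).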
act::matrix_load
This function is only for checking how the normalization matrix will be loaded internally.
matrix_load(path = NULL, encoding = "UTF-8")

| Argument | Description |
|---|---|
| path | Character string; path to the replacement matrix (a .csv file). If the argument is left empty, the default replacement matrix of the package will be returned. |
| encoding | Character string; encoding of the file. |

Data.frame

    library(act)

    # An example replacement matrix comes with the package.
    path <- system.file("extdata", "normalization", "normalizationMatrix.csv", package="act")

    # Load the matrix
    mymatrix <- act::matrix_load(path)

    # Have a look at the matrix
    colnames(mymatrix)
    mymatrix

    # The original path of the matrix is stored in the attributes
    attr(mymatrix, 'path')

Save replacement matrix
matrix_save(replacementMatrix, path, encoding = "UTF-8")

| Argument | Description |
|---|---|
| replacementMatrix | Data frame; replacement matrix. |
| path | Character string; path where the matrix will be saved. |
| encoding | Character string; encoding of the file. |

Nothing.

    library(act)

    # An example replacement matrix comes with the package.
    path <- system.file("extdata", "normalization", "normalizationMatrix.csv", package="act")

    # Load the matrix
    mymatrix <- act::matrix_load(path)

    # Create temporary file path
    path <- tempfile(pattern = "mymatrix", tmpdir=tempdir(), fileext = ".csv")

    # It makes more sense, however, to define a destination folder
    # that is easier to access on your computer:
    ## Not run: 
    path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER", "mymatrix.csv")
    ## End(Not run)

    # Save the matrix
    act::matrix_save(mymatrix, path=path)

Searches for media files in folders and assigns the links to transcript objects in a corpus. The function uses the name of the transcript to find the media files, i.e. it assumes that the annotation files have the same name as the media files, except for the suffix (the file type).

media_assign(x, searchPaths = NULL, searchSubfolders = TRUE, filterFile = "", namesExtractPattern = "", transcriptNames = NULL, deleteExistingMedia = TRUE, onlyUniqueFiles = TRUE, audioAsFallback = TRUE)

| Argument | Description |
|---|---|
| x | Corpus object. |
| searchPaths | Vector of character strings; paths where media files should be searched; if path is not defined, the paths given in |
| searchSubfolders | Logical; if |
| filterFile | Character string; regular expression of files to look for. |
| namesExtractPattern | Character string; regular expression to match a part of the transcript name to search for media files. |
| transcriptNames | Vector of character strings; names of the transcripts for which you want to search media files; leave empty if you want to search media for all transcripts in the corpus object. |
| deleteExistingMedia | Logical; if |
| onlyUniqueFiles | Logical; if |
| audioAsFallback | Logical; if |

Only the file types set in options()$act.fileformats.audio and options()$act.fileformats.video will be recognized.
You can modify these options to recognize other media types.
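A minimal sketch of such a modification, using only base R: the current list of audio suffixes is read (falling back to the default documented in this manual if the option has not been set) and one further suffix is added. The suffix "flac" is an arbitrary example; whether the rest of the toolchain handles that format is not verified here.

```r
# Current value, or the default listed in this manual if the option is not set yet
audio_formats <- getOption("act.fileformats.audio",
                           default = c("wav", "aif", "mp3"))

# Add a further suffix ("flac" is an arbitrary example)
options(act.fileformats.audio = union(audio_formats, "flac"))

# Check the result
getOption("act.fileformats.audio")
```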
Corpus object.
media_delete, media_path_to_existing_file

    library(act)

    # Set the folder(s) where your media files are located in the corpus object.
    # Please be aware that the example corpus that comes with the package
    # does NOT contain media files. Please download the entire example corpus
    # with media files if you want to use this function reasonably.
    examplecorpus@paths.media.files <- c("", "")
    examplecorpus <- act::media_assign(examplecorpus)

Delete media file links from transcript objects.

media_delete(x, transcriptNames = NULL)

| Argument | Description |
|---|---|
| x | Corpus object. |
| transcriptNames | Vector of character strings; names of the transcripts for which you want to delete the media links; leave empty if you want to process all transcripts in the corpus object. |

Corpus object.
media_assign, media_path_to_existing_file

    library(act)

    examplecorpus <- act::media_delete(examplecorpus)

Format the names attribute of the media paths. Relevant for media cuts with search_cuts. The names attribute can be used to differentiate multiple cuts of the same media type (e.g. mp4 from different cameras).

media_format_names(x, recreateNames = TRUE, pattern, replacement = NULL, conditionalPreserve = FALSE)

| Argument | Description |
|---|---|
| x | Corpus object. |
| recreateNames | Logical; if |
| pattern | Character string; regular expression as search pattern. |
| replacement | Character string; replacement pattern. |
| conditionalPreserve | Logical; if |

Corpus object.
media_assign, media_delete, search_cuts

    library(act)

    # Set the folder(s) where your media files are located in the corpus object.
    # Please be aware that the example corpus that comes with the package
    # does NOT contain media files. Please download the entire example corpus
    # with media files if you want to use this function reasonably.
    examplecorpus@paths.media.files <- c("", "")
    examplecorpus <- act::media_assign(examplecorpus)

Gets the path of a media file for a transcript
media_path_to_existing_file(t, filterMediaFile = c(".*\\.(mp4|mov)", ".*\\.(aiff|aif|wav)", ".*\\.mp3"))

| Argument | Description |
|---|---|
| t | Transcript object; transcript for which you want to get the media path. |
| filterMediaFile | Vector of character strings; each element of the vector is a regular expression. Expressions will be checked consecutively. The first match with an existing media file will be used for playing. The default checking order is video > uncompressed audio > compressed audio. |

Character string; path to a media file, or NULL if no existing media file has been found.
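The consecutive checking of the filter expressions can be sketched in base R as follows. The helper `first_match()` and the media paths are hypothetical and only illustrate the ordering logic; unlike the package function, the sketch works on plain strings and does not test whether the files exist.

```r
# Hypothetical media paths linked to one transcript
media_paths <- c("/corpus/rec01.mp3", "/corpus/rec01.wav", "/corpus/rec01.mp4")

# Filters in the default checking order: video > uncompressed audio > compressed audio
filters <- c(".*\\.(mp4|mov)", ".*\\.(aiff|aif|wav)", ".*\\.mp3")

# Hypothetical helper: return the first path that matches the earliest filter,
# or NULL if no filter matches
first_match <- function(paths, filters) {
  for (f in filters) {
    hits <- grep(f, paths, value = TRUE)
    if (length(hits) > 0) return(hits[1])
  }
  NULL
}

first_match(media_paths, filters)      # the video file wins
first_match(media_paths, ".*\\.wav")   # restrict to wav files
```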

    library(act)

    # Please be aware that the example corpus that comes with the package
    # does NOT contain media files. Please download the entire example corpus
    # with media files if you want to use this function reasonably.

    # You can access the media files linked to a transcript directly using
    # the object properties.
    examplecorpus@transcripts[["SYNC_rotar_y_flexionar"]]@media.path

    # Get only media files of a certain type, e.g. a wav file, and return only the first match:
    act::media_path_to_existing_file(examplecorpus@transcripts[["SYNC_rotar_y_flexionar"]],
                                     filterMediaFile=".*\\.wav")

Loads all .docx files in a folder and merges them into a single .docx file.

merge_docx(folderInput, pathTemplateInput, pathOutput = NULL, recursive = TRUE)

| Argument | Description |
|---|---|
| folderInput | Character string; path to an existing folder containing the .docx files. |
| pathTemplateInput | Character string; path to the .docx file used as a template, where the other files will be inserted. |
| pathOutput | Character string; optional. Output path where to save the result. If the parameter is not set, the print transcripts will only be returned. |
| recursive | Logical; if |

Merged .docx document in officer format.
Deletes all options set by the package from the R options.

options_delete()

    library(act)

    act::options_delete()

Reset options to default values
options_reset()

    library(act)

    act::options_reset()

The package has numerous options that change the internal workings of the package.
options_show()
There are several options that change the way the package works. They are set globally.
Use options(name.of.option = value) to set an option.
Use options()$name.of.option to get the current value of an option.
Use act::options_reset to set all options to the default value.
Use act::options_delete to clean up and delete all option settings.
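The first two points can be tried out with base R alone, since the package options are ordinary R options. The sketch below uses the option act.showprogress from the list in this section; the value chosen is only an example.

```r
# Set an option; options() invisibly returns the previous value(s)
old <- options(act.showprogress = FALSE)

# Read the current value (two equivalent ways)
options()$act.showprogress
getOption("act.showprogress")

# Restore the previous state
options(old)
```

Saving the return value of options() makes it possible to undo a temporary change without resetting all package options.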
The package uses the following options.
Program
act.excamplecorpusURL Character string; URL from which to download the example media files.
act.updateX Logical; if TRUE the original corpus object 'x' passed to the search functions search_new and search_run will also be updated if full texts are created or the normalization is performed during the search.
act.showprogress Logical; if TRUE a progress bar will be shown during (possibly) time-consuming operations.
Paths
act.path.praat Character string; path to the 'Praat' executable on your computer. Only necessary if you use the functions to remote control Praat using Praat scripts.
act.path.sendpraat Character string; path to the 'sendpraat' executable on your computer. Only necessary if you use the functions to remote control Praat using Praat scripts.
act.path.elan Character string; path to the 'ELAN' executable on your computer. Only necessary if you want to open search results in ELAN.
File formats
act.fileformats.video Vector of character strings; Suffixes of video files that will be identified; default is 'c("mp4", "mov")'.
act.fileformats.audio Vector of character strings; Suffixes of audio files that will be identified; default is 'c("wav", "aif", "mp3")'.
FFMPEG commands and options
act.ffmpeg.command.main Character string; 'FFmpeg' command that is used for cutting media files (audio & video).
act.ffmpeg.command.main.fast Character string; 'FFmpeg' command that is used for cutting video files using the 'FFmpeg' option 'fast video positioning'. This is considerably faster when working with long video files.
act.ffmpeg.command.audioAsMP3 Character string; 'FFmpeg' command that is used for cutting/generating compressed audio files as mp3 (for other audio, the main command is used).
act.ffmpeg.command.images Character string; 'FFmpeg' command that is used for extracting still images.
act.ffmpeg.command.images.fast Character string; 'FFmpeg' command that is used for extracting still images using the 'FFmpeg' option 'fast video positioning'. This is considerably faster when working with long video files.
act.ffmpeg.exportchannels.fromColumnName Character string; name of the column in the data frame s@results from which the information about which audio channel to export will be taken.
Import annotation files
act.import.readEmptyIntervals Logical; if TRUE empty intervals in your annotation files will be read, if FALSE empty intervals will be skipped.
act.import.scanSubfolders Logical; if TRUE sub folders will also be scanned for annotation files; if FALSE only the main level of the folders specified in pathsAnnotationFiles of your corpus object will be scanned.
act.import.storefileContentInTranscript Logical; if TRUE the contents of the original annotation file will be stored in the @file.content slot of the transcript object. Set to FALSE if you want to keep your corpus object small.
Export
act.export.filename.fromColumnName Character string; Name of the column from which the file names for exported files will be taken.
act.export.folder.grouping1.fromColumnName Character string; Name of sub folders that will be created in the folder of the search result, level 1.
act.export.folder.grouping2.fromColumnName Character string; Name of sub folders that will be created in the folder of the search result, level 2.
Miscellaneous
act.separator_between_intervals Character; Single character that is used for separating intervals when creating the full text.
act.separator_between_tiers Character; Single character that is used for separating tiers when creating the full text.
act.separator_between_words Character string; regular expression with alternatives that count as separators between words. Used for preparing the concordance.
act.wordCountRegEx Character string; regular expression that is used to count words.
act.pauseIdentifierGATRegEx Character string; regular expression that is used to identify pauses in GAT transcription
Nothing.

    library(act)

    ## Not run: 
    act::options_show()
    ## End(Not run)

Make concordance for search results
search_concordance(x, s, searchNormalized = TRUE)

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| searchNormalized | Logical; if |

Search object.

    library(act)

    # Search for the first person singular pronoun in Spanish.
    # Search without creating the concordance immediately.
    # This is useful, for example, if you are working with a large corpus, since
    # making the concordance may take a while.
    mysearch <- act::search_new(examplecorpus, pattern="yo", concordanceMake=FALSE)
    mysearch@results[1,]

    # The results do not contain the concordance; there are only 15 columns
    ncol(mysearch@results)

    # Make the concordance
    mysearch.new <- act::search_concordance(x=examplecorpus, s=mysearch)
    ncol(mysearch.new@results)

This function will call the following functions:
act::search_cuts_printtranscript to create print transcripts,
act::search_cuts_media to create FFmpeg cut lists for making media snippets,
act::search_cuts_srt to create .srt subtitles,
for all search results.
For a detailed description including examples, please refer to the documentation of the individual functions. They also offer more parameters than this function. If you want to use those, call the functions individually.
search_cuts(x, s, cutSpanBeforesec = NULL, cutSpanAftersec = NULL, l = NULL, folderOutput = NULL)

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| cutSpanBeforesec | Double; start the cut some seconds before the hit to include some context; the default NULL will take the value as set in @cuts.span.beforesec of the search object. |
| cutSpanAftersec | Double; end the cut some seconds after the hit to include some context; the default NULL will take the value as set in @cuts.span.aftersec of the search object. |
| l | Layout object. |
| folderOutput | Character string; if parameter is not set, the print transcripts will only be inserted in |

Search object.

    library(act)

    # IMPORTANT: In the example corpus all transcripts are assigned media links.
    # The actual media files are, however, not included when installing the package
    # due to size limitations of CRAN.
    # But you may download the media files separately.
    # Please see the section 'examplecorpus' for instructions.
    # --> You will need the media files to execute the following example code.

    ## Not run: 
    # Search
    mysearch <- act::search_new(examplecorpus, pattern="yo")

    # Create print transcripts, media cutlists and .srt subtitles
    # for all search results
    test <- act::search_cuts(x=examplecorpus, s=mysearch)

    # Display all print transcripts on screen from @cuts.printtranscripts
    cat(test@cuts.printtranscripts)

    # Display cutlist on screen from @cuts.cutlist.mac
    cat(test@cuts.cutlist.mac)

    # Display .srt subtitles
    cat(test@results[, mysearch@cuts.column.srt])
    ## End(Not run)

This function creates FFmpeg commands to cut media files for each search result.
If you want to execute the commands (and cut the media files) you need to have FFmpeg installed on your computer. To install FFmpeg you can follow the instructions given in the vignette 'installation-ffmpeg'. Show the vignette with vignette("installation-ffmpeg").

search_cuts_media(x, s, exportMedia = TRUE, exportStills = TRUE, exportThumbnail = TRUE, cutSpanBeforesec = NULL, cutSpanAftersec = NULL, folderOutput = NULL, filterMediaInclude = "", videoFastPositioning = TRUE, videoCodecCopy = FALSE, audioAsMP3 = FALSE, panning = NULL, outputOS = c("mac", "win"), outputFileName = "FFMPEG_cutlist")

| Argument | Description |
|---|---|
| x | Corpus object; Please note: all media paths for a transcript need to be given as a list in the corpus object in |
| s | Search object. |
| exportMedia | Logical; If |
| exportStills | Logical; If |
| exportThumbnail | Logical; If |
| cutSpanBeforesec | Double; start the media cut some seconds before the hit to include some context; the default |
| cutSpanAftersec | Double; end the media cut some seconds after the hit to include some context; the default |
| folderOutput | Character string; path to folder where files will be written. |
| filterMediaInclude | Character string; regular expression to match only some of the media files in |
| videoFastPositioning | Logical; If |
| videoCodecCopy | Logical; If |
| audioAsMP3 | Logical; If |
| panning | Integer; 0=leave audio as is (ch1&ch2), 1=only channel 1 (ch1), 2=only channel 2 (ch2), 3=both channels separated (ch1&ch2), 4=all three versions (ch1&ch2, ch1, ch2). This setting will override the option made in 'act.ffmpeg.exportchannels.fromColumnName'. |
| outputOS | Vector of character strings; saves FFmpeg cut list in format for |
| outputFileName | Character string; name of the cut list. |

Cut lists
The commands are collected in cut lists.
The cut lists will be stored in different ways:
A cut list for ALL search results will be stored in s@cuts.cutlist.mac to be used on macOS and s@cuts.cutlist.win to be used on Windows.
Individual cut lists for EACH search result will be stored in additional columns in the data frame s@results.
The cut lists can be executed in the Terminal (Apple) or the Command Line Interface (Windows).
Input media files
The function will use all files in corpus@transcripts[[ ]]@media.path.
Therefore you will need to set the argument filterMediaInclude to select the input media files for which you want to create the cuts.
The filter is a regular expression, e.g. '\.(wav|aif)' for '.wav' and '.aif' audio files or '\.mp4' for '.mp4' video files.
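The effect of such a filter can be checked in advance with base R. The media paths below are hypothetical; note that inside an R string the backslash of the regular expression has to be doubled.

```r
# Hypothetical media paths of one transcript
media_paths <- c("/corpus/rec01.wav", "/corpus/rec01.aif", "/corpus/rec01.mp4")

# The regular expressions from the text, written as R strings
filter_audio <- "\\.(wav|aif)"
filter_video <- "\\.mp4"

# Paths that each filter would select
audio_selected <- media_paths[grepl(filter_audio, media_paths)]
video_selected <- media_paths[grepl(filter_video, media_paths)]

audio_selected
video_selected
```

The same string would then be passed as filterMediaInclude, for example filterMediaInclude = "\\.mp4".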
Output file names
The file names of the cuts are taken from the resultID column in the search results table. To change this, modify the contents of this column or define a different column using the option act.export.filename.fromColumnName.
By default, the name of the original media will be appended to the file name. The name of the media file will be taken from the names() attribute of each media file listed in corpus@transcripts[[ ]]@media.path. To change this, modify the names() attribute.
Output format
The output format is predefined in the options:
act.ffmpeg.command.main: defines the basic FFmpeg command (used for audio & video)
act.ffmpeg.command.main.fast: defines the FFmpeg command to be used with large video files.
act.ffmpeg.command.audioAsMP3: defines the FFmpeg command used for converting audio files to MP3.
act.ffmpeg.command.images: defines the FFmpeg command used for extracting still images.
For video, the default is to generate mp4 cuts. You can also use the following commands to change the output format:
MP4 video cuts with original video quality:
options(act.ffmpeg.command.main = 'ffmpeg -i "INFILEPATH" -ss TIMESTART -t TIMEDURATION OPTIONS -y "OUTFILEPATH.mp4" -hide_banner')
options(act.ffmpeg.command.main.fast = 'ffmpeg -ss TIMESTARTMINUS10SECONDS -i "INFILEPATH" -ss 10.000 -t TIMEDURATION OPTIONS -y "OUTFILEPATH.mp4" -hide_banner')
MP4 video cuts with reduced video quality:
options(act.ffmpeg.command.main = 'ffmpeg -i "INFILEPATH" -ss TIMESTART -t TIMEDURATION OPTIONS -vf scale=1920:-1 -b:v 1M -b:a 192k -y "OUTFILEPATH.mp4" -hide_banner')
options(act.ffmpeg.command.main.fast = 'ffmpeg -ss TIMESTARTMINUS10SECONDS -i "INFILEPATH" -ss 10.000 -t TIMEDURATION OPTIONS -vf scale=1920:-1 -b:v 6M -b:a 192k -y "OUTFILEPATH.mp4" -hide_banner')
Extract stills
To extract stills the time information and the file names need to be stored in the search object in the results data frame, e.g. in s@results.
The information needs to be contained in a vector.
The elements of the vector are the time values in seconds, the name attribute will be used as file name, e.g. myStills <- c(still1=1.0, still2=2.0).
The vector needs to be stored for each search result in the column called stills.values as a list, e.g. s@results$stills.values[[1]] <- myStills.
Stills will be stored in a sub folder called stills by default, if not otherwise defined in the column stills.folder.
Please see the section options_show to customize the behavior of the function.
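How the stills information might be prepared can be sketched with a plain data frame standing in for s@results. The data frame `results` and its resultID values are hypothetical; only the column name stills.values and the named vector follow the description above.

```r
# Hypothetical stand-in for s@results with two search hits
results <- data.frame(resultID = c("hit1", "hit2"), stringsAsFactors = FALSE)

# Create the list column, one (initially empty) element per search result
results$stills.values <- vector("list", nrow(results))

# Named vector: values are times in seconds, names are used as file names
myStills <- c(still1 = 1.0, still2 = 2.0)

# Store the vector for the first search result
results$stills.values[[1]] <- myStills

results$stills.values[[1]]
```

Creating the list column first keeps one entry per row, so that search results without stills simply hold an empty element.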
Advanced options
You can adjust the FFmpeg commands according to your needs.
The following options define the FFmpeg command that will be used by the package. The command needs to contain placeholders, which the package replaces with the actual values. If you want to define your own FFmpeg command, please make sure to use the following placeholders:
INFILEPATH path to the input media file.
OUTFILEPATH path where the output media file will be saved
OPTIONS FFmpeg options that will be applied additionally, in particular fast video positioning.
TIMESTART time in seconds where to begin the cutting
TIMESTARTMINUS10SECONDS time in seconds where to begin the cutting, in case that fast video positioning is being used.
TIMEDURATION duration of cuts.
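How such a template turns into a concrete command can be sketched in base R. The helper fill_ffmpeg_template() is hypothetical and not part of act; it only illustrates the placeholder idea and makes no claim about how the package performs the substitution internally. The file paths are made up.

```r
# Hypothetical helper (not an act function): replace each placeholder in a
# command template by a concrete value.
fill_ffmpeg_template <- function(template, infile, outfile, start, duration, options = "") {
  # Order matters: TIMESTARTMINUS10SECONDS has to be replaced before
  # TIMESTART, because TIMESTART is a prefix of the longer placeholder.
  values <- c(
    TIMESTARTMINUS10SECONDS = format(start - 10, nsmall = 3),
    TIMESTART               = format(start, nsmall = 3),
    TIMEDURATION            = format(duration, nsmall = 3),
    OPTIONS                 = options,
    INFILEPATH              = infile,
    OUTFILEPATH             = outfile
  )
  cmd <- template
  for (placeholder in names(values)) {
    cmd <- gsub(placeholder, values[[placeholder]], cmd, fixed = TRUE)
  }
  cmd
}

main <- 'ffmpeg -i "INFILEPATH" -ss TIMESTART -t TIMEDURATION OPTIONS -y "OUTFILEPATH.mp4" -hide_banner'
fast <- 'ffmpeg -ss TIMESTARTMINUS10SECONDS -i "INFILEPATH" -ss 10.000 -t TIMEDURATION OPTIONS -y "OUTFILEPATH.mp4" -hide_banner'

cmd      <- fill_ffmpeg_template(main, "/media/rec.mp4", "/cuts/result1", start = 12.5, duration = 3)
cmd_fast <- fill_ffmpeg_template(fast, "/media/rec.mp4", "/cuts/result1", start = 12.5, duration = 3)
cat(cmd, cmd_fast, sep = "\n")
```

Note that the fast template seeks to ten seconds before the hit and then skips a fixed 10.000 seconds, so this sketch assumes a start time of at least ten seconds.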
Search object; cut lists will be stored in s@cuts.cutlist.mac and s@cuts.cutlist.win.
```r
library(act)

# IMPORTANT: In the example corpus all transcripts are assigned media links.
# The actual media files are, however, not included when installing the package
# due to size limitations of CRAN.
# But you may download the media files separately.
# Please see the section 'examplecorpus' for instructions.
# --> You will need the media files to execute the following example code.

## Not run: 
# Search
mysearch <- act::search_new(examplecorpus, pattern="yo")

# Create cut lists
mysearch <- act::search_cuts_media(x=examplecorpus, s=mysearch)

# Check results for Mac:
# Get entire cut list for Mac and display on screen,
# so you can copy&paste this into the Terminal
mycutlist <- mysearch@cuts.cutlist.mac
cat(mycutlist)
# Cut list for first search result
mycutlist <- mysearch@results$cuts.cutlist.mac[[1]]
cat(mycutlist)

# Check results for Windows:
# Get entire cut list for Windows and display on screen,
# so you can copy&paste this into the CLI
mycutlist <- mysearch@cuts.cutlist.win
cat(mycutlist)
# Cut list for first search result
mycutlist <- mysearch@results$cuts.cutlist.win[[1]]
cat(mycutlist)

# It is, however, more convenient to specify the argument 'folderOutput' in order to get
# the cut list as a (executable) file/batch list.

## End(Not run)
```
Print transcripts in the style of conversation analysis will be created for each search result.
The transcripts will be inserted into the column defined in s@cuts.column.printtranscript.
All transcripts will be stored in s@cuts.printtranscripts.
```r
search_cuts_printtranscript(
  x,
  s,
  l = NULL,
  exportTxt = TRUE,
  exportDocx = TRUE,
  headerInsertSource = TRUE,
  cutSpanBeforesec = 0,
  cutSpanAftersec = 0,
  folderOutput = NULL
)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| l | Layout object. |
| exportTxt | Logical; If |
| exportDocx | Logical; If |
| headerInsertSource | Logical; if |
| cutSpanBeforesec | Double; Start the cut some seconds before the hit to include some context; the default NULL will take the value as set in @cuts.span.beforesec of the search object. |
| cutSpanAftersec | Double; End the cut some seconds after the hit to include some context; the default NULL will take the value as set in @cuts.span.aftersec of the search object. |
| folderOutput | Character string; if parameter is not set, the print transcripts will only be inserted in |
If you want to modify the layout of the print transcripts, create a new layout object with mylayout <- methods::new("layout"), modify the settings and pass it as argument l.
Search object;
```r
library(act)

# Search
mysearch <- act::search_new(examplecorpus, pattern="yo")

# Create print transcripts for all search results
test <- act::search_cuts_printtranscript(x=examplecorpus, s=mysearch)

# Display all print transcripts on screen from @cuts.printtranscripts
cat(test@cuts.printtranscripts)

# Display all print transcripts from results data frame
cat(test@results[,mysearch@cuts.column.printtranscript])

# Only single print transcript from results data frame
cat(test@results[1,mysearch@cuts.column.printtranscript])

# Create print transcript snippets including 1 sec before and 5 sec after
mysearch@cuts.span.beforesec <- 1
mysearch@cuts.span.aftersec <- 5
test <- act::search_cuts_printtranscript(x=examplecorpus, s=mysearch)

# Display all transcript snippets on screen
cat(test@results[,mysearch@cuts.column.printtranscript])
```
Subtitles in 'SubRip' .srt format will be created for each search result.
The subtitles will be inserted into the column defined in s@cuts.column.srt.
```r
search_cuts_srt(
  x,
  s,
  cutSpanBeforesec = NULL,
  cutSpanAftersec = NULL,
  folderOutput = NULL,
  speakerShow = TRUE,
  speakerWidth = 3,
  speakerEnding = ":"
)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| cutSpanBeforesec | Double; Start the cut some seconds before the hit to include some context; the default NULL will take the value as set in @cuts.span.beforesec of the search object. |
| cutSpanAftersec | Double; End the cut some seconds after the hit to include some context; the default NULL will take the value as set in @cuts.span.aftersec of the search object. |
| folderOutput | Character string; if parameter is not set, the srt subtitles will only be inserted in |
| speakerShow | Logical; if |
| speakerWidth | Integer; width of speaker abbreviation, -1 for full name without shortening. |
| speakerEnding | Character string; string that is added at the end of the speaker name. |
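To make the target format concrete, here is a base R sketch of a single .srt entry. The helpers sec_to_srt(), speaker_label() and srt_entry() are hypothetical and only mimic what speakerWidth and speakerEnding describe; they are not the functions act uses, and the speaker name and text are made up.

```r
# Convert seconds to the SubRip time stamp format HH:MM:SS,mmm.
sec_to_srt <- function(sec) {
  ms_total <- round(sec * 1000)
  h  <- ms_total %/% 3600000
  m  <- (ms_total %% 3600000) %/% 60000
  s  <- (ms_total %% 60000) %/% 1000
  ms <- ms_total %% 1000
  sprintf("%02d:%02d:%02d,%03d",
          as.integer(h), as.integer(m), as.integer(s), as.integer(ms))
}

# Hypothetical helper: shorten the speaker name to 'width' characters
# (a negative width keeps the full name) and add the ending.
speaker_label <- function(name, width = 3, ending = ":") {
  if (width >= 0) name <- substr(name, 1, width)
  paste0(name, ending)
}

# Assemble one subtitle entry: running number, time range, text.
srt_entry <- function(index, start, end, speaker, text) {
  paste(index,
        paste(sec_to_srt(start), "-->", sec_to_srt(end)),
        paste(speaker_label(speaker), text),
        sep = "\n")
}

cat(srt_entry(1, 0, 2.5, "Alejo", "yo creo que no"), "\n")
```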
Span
If you want to extend the cut before or after each search result, you can modify @cuts.span.beforesec and @cuts.span.aftersec in your search object.
If you want to modify the layout of the print transcripts, create a new layout object with mylayout <- methods::new("layout"), modify the settings and pass it as argument l.
Search object;
```r
library(act)

# Search
mysearch <- act::search_new(examplecorpus, pattern="yo")

# Create srt subtitles for all search results
test <- act::search_cuts_srt(x=examplecorpus, s=mysearch)

# Display srt subtitle of first three results
cat(test@results[1:3, mysearch@cuts.column.srt])

# Create srt subtitle including 1 sec before and 5 sec after
mysearch@cuts.span.beforesec <- 1
mysearch@cuts.span.aftersec <- 5
test <- act::search_cuts_srt(x=examplecorpus, s=mysearch)

# Display srt subtitle of first result
cat(test@results[1,mysearch@cuts.column.srt])
```
Search a corpus object and return the names of all transcripts and tiers that match the given parameters. You can define parameters to include and/or exclude transcripts and tiers based on their names. All parameters passed to the function will be combined.
```r
search_makefilter(
  x,
  filterTranscriptNames = NULL,
  filterTranscriptIncludeRegex = NULL,
  filterTranscriptExcludeRegex = NULL,
  filterTierNames = NULL,
  filterTierIncludeRegex = NULL,
  filterTierExcludeRegex = NULL
)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| filterTranscriptNames | Vector of character strings; Names of the transcripts that you want to include; to include all transcripts in the corpus object leave parameter empty or set to |
| filterTranscriptIncludeRegex | Character string; as regular expression, include transcripts matching the expression. |
| filterTranscriptExcludeRegex | Character string; as regular expression, exclude transcripts matching the expression. |
| filterTierNames | Vector of character strings; Names of the tiers that you want to include; to include all tiers in the corpus object leave parameter empty or set to |
| filterTierIncludeRegex | Character string; as regular expression, include tiers matching the expression. |
| filterTierExcludeRegex | Character string; as regular expression, exclude tiers matching the expression. |
This function is useful if you want to use functions of the package such as transcripts_update_normalization, transcripts_update_fulltexts or corpus_export and limit them to only some of the transcripts.
List of character vectors. $filterTranscriptNames contains all transcript names in the corpus matching the expressions, $filterTierNames contains all tier names in the corpus matching the expressions.
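The interplay of name lists and include/exclude expressions can be sketched in base R. The helper filter_names() is hypothetical; it assumes that all given criteria must hold at the same time and that Perl-style expressions such as "(?i)" are accepted. Neither assumption is confirmed by the text above, and the names in the demo are made up.

```r
# Hypothetical helper (not an act function): keep the names that satisfy
# every criterion that was actually supplied.
filter_names <- function(all_names, exact = NULL, include_regex = NULL, exclude_regex = NULL) {
  keep <- rep(TRUE, length(all_names))
  if (!is.null(exact))         keep <- keep & all_names %in% exact
  if (!is.null(include_regex)) keep <- keep & grepl(include_regex, all_names, perl = TRUE)
  if (!is.null(exclude_regex)) keep <- keep & !grepl(exclude_regex, all_names, perl = TRUE)
  all_names[keep]
}

# Made-up transcript names for demonstration.
transcripts <- c("ARG_one", "arg_two", "BOL_three")

filter_names(transcripts, include_regex = "(?i)arg")   # case-insensitive include
filter_names(transcripts, exclude_regex = "ARG")       # case-sensitive exclude
filter_names(transcripts, include_regex = "(?i)arg", exclude_regex = "two")
```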
search_new, search_run, search_sub
```r
library(act)

# Search all transcripts that have "ARG" (ignoring case sensitivity) in their name
myfilter <- act::search_makefilter(x=examplecorpus, filterTranscriptIncludeRegex="(?i)arg")
myfilter$transcriptNames

# Search all transcripts that don't have "ARG" in their name
myfilter <- act::search_makefilter(x=examplecorpus, filterTranscriptExcludeRegex="ARG")
myfilter$transcriptNames

# Search all tiers that have an "A" or an "a" in their name
myfilter <- act::search_makefilter(x=examplecorpus, filterTierIncludeRegex="(?i)A")
myfilter$tierNames

# Search all tiers that have a capital "A" in their name
myfilter <- act::search_makefilter(x=examplecorpus, filterTierIncludeRegex="A")
myfilter$tierNames

# In which transcripts do these tiers occur?
myfilter$transcriptNames

# Let's check the first of the transcripts, if this is really the case...
examplecorpus@transcripts[[myfilter$transcriptNames[1]]]@tiers
```
Creates a new search object and runs the search in a corpus object. Only 'x' and 'pattern' are obligatory. The other arguments can be left to their default values.
```r
search_new(
  x,
  pattern,
  searchMode = c("content", "fulltext", "fulltext.byTime", "fulltext.byTier"),
  searchNormalized = TRUE,
  name = "mysearch",
  resultidPrefix = "result",
  resultidStart = 1,
  filterTranscriptNames = NULL,
  filterTranscriptIncludeRegex = NULL,
  filterTranscriptExcludeRegex = NULL,
  filterTierNames = NULL,
  filterTierIncludeRegex = NULL,
  filterTierExcludeRegex = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  concordanceMake = TRUE,
  concordanceWidth = NULL,
  cutSpanBeforesec = 0,
  cutSpanAftersec = 0,
  runSearch = TRUE
)
```

| Argument | Description |
|---|---|
| x | Corpus object; basis in which will be searched. |
| pattern | Character string; search pattern as regular expression. |
| searchMode | Character string; takes the following values: |
| searchNormalized | Logical; if |
| name | Character string; name of the search. Will be used, for example, as name of the sub folder when creating media cuts. |
| resultidPrefix | Character string; search results will be numbered consecutively; This character string will be placed before the consecutive numbers. |
| resultidStart | Integer; search results will be numbered consecutively; This is the start number of the identifiers. |
| filterTranscriptNames | Vector of character strings; names of transcripts to be included. |
| filterTranscriptIncludeRegex | Character string; as regular expression, limit search to certain transcripts matching the expression. |
| filterTranscriptExcludeRegex | Character string; as regular expression, exclude certain transcripts matching the expression. |
| filterTierNames | Vector of character strings; names of tiers to be included in the search. |
| filterTierIncludeRegex | Character string; as regular expression, limit search to certain tiers matching the expression. |
| filterTierExcludeRegex | Character string; as regular expression, exclude certain tiers matching the expression. |
| filterSectionStartsec | Double; start time of region for search. |
| filterSectionEndsec | Double; end time of region for search. |
| concordanceMake | Logical; if |
| concordanceWidth | Integer; number of characters to the left and right of the search hit in the concordance, the default is |
| cutSpanBeforesec | Double; Start the media and transcript cut some seconds before the hit to include some context, the default is |
| cutSpanAftersec | Double; End the media and transcript cut some seconds after the hit to include some context, the default is |
| runSearch | Logical; if |
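What concordanceWidth controls can be pictured with a small keyword-in-context sketch in base R. The helper kwic() is hypothetical and not the package's implementation; it only shows how a width in characters delimits the left and right context of a hit. The sample text is made up.

```r
# Hypothetical helper (not an act function): split a text into the first
# hit of 'pattern' plus 'width' characters of context on each side.
kwic <- function(text, pattern, width = 10) {
  m <- regexpr(pattern, text, perl = TRUE)
  start <- as.integer(m)
  if (start == -1L) return(NULL)          # no hit in this text
  end <- start + attr(m, "match.length") - 1L
  c(left  = substr(text, max(1L, start - width), start - 1L),
    hit   = substr(text, start, end),
    right = substr(text, end + 1L, end + width))
}

kwic("bueno yo creo que no", "\\byo\\b", width = 6)
```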
Search object.
search_run, search_makefilter, search_sub
```r
library(act)

# Search for the first-person singular pronoun in Spanish.
mysearch <- act::search_new(examplecorpus, pattern= "yo")
mysearch

# Search in normalized content vs. original content
mysearch.norm <- act::search_new(examplecorpus, pattern="yo", searchNormalized=TRUE)
mysearch.org <- act::search_new(examplecorpus, pattern="yo", searchNormalized=FALSE)
mysearch.norm@results.nr
mysearch.org@results.nr

# The difference is because during normalization capital letters will be converted
# to small letters. One annotation in the example corpus contains a "yo" with a
# capital letter:
mysearch <- act::search_new(examplecorpus, pattern="yO", searchNormalized=FALSE)
mysearch@results$hit

# Search in full text vs. original content.
# Full text search will find matches across annotations.
# Let's define a regular expression with a certain span.
# Search for the word "no" 'no' followed by a "pero" 'but'
# in a distance ranging from 1 to 20 characters.
myRegEx <- "\\bno\\b.{1,20}pero"
mysearch <- act::search_new(examplecorpus, pattern=myRegEx, searchMode="fulltext")
mysearch
mysearch@results$hit
```
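The span expression from the example can be checked in isolation with base R, without a corpus. The test strings are made up, and perl = TRUE is an assumption about the regex flavour, not a statement about how act evaluates patterns.

```r
myRegEx <- "\\bno\\b.{1,20}pero"

# Made-up strings standing in for the full text of a transcript.
fulltext <- c(
  "no quiero pero bueno",   # 'no' and 'pero' within the allowed distance
  "nosotros pero",          # 'no' is only the beginning of a longer word
  "no pero"                 # the single space already fills the 1-character minimum
)

# Which strings contain a hit?
hits <- grepl(myRegEx, fulltext, perl = TRUE)
hits

# The matched spans themselves (only for strings with a hit).
matched <- regmatches(fulltext, regexpr(myRegEx, fulltext, perl = TRUE))
matched
```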
The function creates a temporary .eaf file and a .pfsx file that locates the search hit.
These files will then be opened in ELAN.
To make this function work you need to have 'ELAN' installed on your computer and tell the act package where ELAN is located.
To do so, set the path to the ELAN executable in the option 'act.path.elan' using options(act.path.elan='PATHTOYOURELANEXECUTABLE').
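Before calling the function it can help to verify that the option really points to something on disk. The check below uses only base R; the helper tool_path_is_set() is hypothetical and not part of act.

```r
# Hypothetical helper (not an act function): TRUE if the given option
# holds a single path that exists on this computer.
tool_path_is_set <- function(option_name) {
  path <- getOption(option_name, default = "")
  is.character(path) && length(path) == 1 && nzchar(path) && file.exists(path)
}

# The placeholder has to be replaced by the real location of ELAN.
options(act.path.elan = "PATHTOYOURELANEXECUTABLE")
tool_path_is_set("act.path.elan")
```

The same check could be applied to other path options mentioned in this manual, such as 'act.path.sendpraat'.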
```r
search_openresult_inelan(x, s, resultid, openOriginal = FALSE)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| resultid | Integer; Number of the search result (row in the data frame |
| openOriginal | Logical; if |
WARNING: This function will overwrite existing .pfsx files.
Credits: Thanks to Han Sloetjes for feedback on the structure of the temporary .pfsx files. He actually made the code work.
```r
library(act)

mysearch <- act::search_new(x=examplecorpus, pattern = "yo")

# You can only use this function if you have installed ELAN on your computer.
## Not run: 
options(act.path.elan='PATHTOYOURELANEXECUTABLE')
act::search_openresult_inelan(x=examplecorpus, s=mysearch, resultid=1, TRUE)

## End(Not run)
```
The function remote controls 'Praat' by using 'sendpraat' and a 'Praat' script. It opens a search result in the 'Praat' TextGrid Editor.
```r
search_openresult_inpraat(
  x,
  s,
  resultid,
  play = TRUE,
  close = FALSE,
  filterMediaFile = c(".*\\.(aiff|aif|wav)", ".*\\.mp3"),
  delay = 0.5
)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| resultid | Integer; Number of the search result (row in the data frame |
| play | Logical; If |
| close | Logical; If |
| filterMediaFile | Vector of character strings; Each element of the vector is a regular expression. Expressions will be checked consecutively. The first match with an existing media file will be used for playing. The default checking order is uncompressed audio > compressed audio. |
| delay | Double; Time in seconds before the section will be opened in Praat. This is useful if Praat opens but the section does not. In that case increase the delay. |
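The checking order described for filterMediaFile can be sketched in base R. The helper pick_media_file() is hypothetical and merely mirrors the description (expressions tried one after another, first match with an existing file wins); the demo files are created in a temporary folder.

```r
# Hypothetical helper (not an act function): return the first existing
# file that matches the earliest possible expression in 'filters'.
pick_media_file <- function(paths, filters) {
  existing <- paths[file.exists(paths)]
  for (f in filters) {
    hits <- existing[grepl(f, existing)]
    if (length(hits) > 0) return(hits[1])
  }
  NA_character_                           # nothing matched
}

# Demo files in a temporary folder; rec.mp4 is listed but never created.
demo_dir <- file.path(tempdir(), "act_media_demo")
dir.create(demo_dir, showWarnings = FALSE)
candidates <- file.path(demo_dir, c("rec.mp3", "rec.wav", "rec.mp4"))
invisible(file.create(candidates[1:2]))

# Default order of the function above: uncompressed audio before compressed audio.
picked <- pick_media_file(candidates, c(".*\\.(aiff|aif|wav)", ".*\\.mp3"))
basename(picked)
```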
To make this function work you need to do two things first:
Install 'sendpraat' on your computer. To do so follow the instructions in the vignette 'installation-sendpraat'. Show the vignette with vignette("installation-sendpraat").
Set the path to the 'sendpraat' executable correctly by using 'options(act.path.sendpraat = ...)'.
```r
library(act)

mysearch <- act::search_new(x=examplecorpus, pattern = "pero")

# You can only use this function if you have installed and
# located the 'sendpraat' executable properly in the package options.
## Not run: 
act::search_openresult_inpraat(x=examplecorpus, s=mysearch, resultid=1, TRUE, TRUE)

## End(Not run)
```
The function remote controls 'Quicktime' by using an Apple Script. It opens a search result in 'Quicktime' and plays it.
```r
search_openresult_inquicktime(
  x,
  s,
  resultid,
  play = TRUE,
  close = FALSE,
  bringToFront = TRUE,
  filterFile = c(".*\\.(mp4|mov)", ".*\\.(aiff|aif|wav)", ".*\\.mp3")
)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| resultid | Integer; Number of the search result (row in the data frame |
| play | Logical; If |
| close | Logical; if |
| bringToFront | Logical; if |
| filterFile | Vector of character strings; Each element of the vector is a regular expression. Expressions will be checked consecutively. The first match with an existing media file will be used for playing. The default checking order is video > uncompressed audio > compressed audio. |
Note: You need to be on a Mac to use this function.
Span
If you want to extend the cut before or after each search result, you can modify @cuts.span.beforesec and @cuts.span.aftersec in your search object.
Logical; TRUE if media file has been played, or FALSE if not.
```r
library(act)

mysearch <- act::search_new(x=examplecorpus, pattern = "pero")

# You can only use this function if you are on a Mac.
# In addition, you need to have downloaded the example media.

## Not run: 
# Assign media files
examplecorpus@paths.media.files <- c("FOLDERWHEREMEDIAFILESARELOCATED")
examplecorpus <- act::media_assign(examplecorpus)

# Play the media for the first search result
act::search_openresult_inquicktime(x=examplecorpus, s=mysearch, resultid = 1, play=TRUE, close=TRUE)

# Play all search results one after another.
# seq_len() also behaves correctly if there are no results at all.
for (i in seq_len(nrow(mysearch@results))) {
    print(mysearch@results$content[i])
    act::search_openresult_inquicktime(x=examplecorpus, s=mysearch, resultid = i, play=TRUE, close=TRUE)
}

## End(Not run)
```
The function remote controls 'Quicktime' by using an Apple Script. It opens consecutively all search results in 'Quicktime' and plays them.
```r
search_playresults_inquicktime(x, s, bringToFront = FALSE)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
| bringToFront | Logical; if |
Note: You need to be on a Mac to use this function.
No return value.
```r
library(act)

mysearch <- act::search_new(x=examplecorpus, pattern = "pero")

# You can only use this function if you are on a Mac.
# In addition, you need to have downloaded the example media files.

## Not run: 
# Assign media files
examplecorpus@paths.media.files <- c("FOLDERWHEREMEDIAFILESARELOCATED")
examplecorpus <- act::media_assign(examplecorpus)

# Create print transcripts. This is not necessary.
# But it's nice to see them when playing all results.
mysearch <- act::search_cuts_printtranscript(x=examplecorpus, s=mysearch)

# Play all search results
act::search_playresults_inquicktime(x=examplecorpus, s=mysearch)

## End(Not run)
```
Search results from a search object will be saved to an Excel XLSX or a CSV (comma separated values) file.
By default an XLSX file will be saved. If you want to save a CSV file, use saveAsCSV=TRUE.
Please note:
The function will replace '=' signs at the beginning of annotations by ".=". This is because the content would otherwise be interpreted as the beginning of a formula (leading to an error).
When writing to an Excel file, line breaks will be replaced by "|". This is because line breaks would lead to an error.
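The two replacements can be pictured with base R string functions. The helpers protect_for_excel() and revert_protection() are hypothetical and only illustrate the behaviour described above; they are not the code used by the package, and the sample annotations are made up.

```r
# Hypothetical helper (not an act function): make annotation text safe
# for a spreadsheet cell.
protect_for_excel <- function(x) {
  x <- sub("^=", ".=", x)        # a leading "=" would be read as a formula
  gsub("\r?\n", "|", x)          # line breaks are replaced by "|"
}

# Hypothetical helper: undo the protection of the leading "=".
revert_protection <- function(x) {
  sub("^\\.=", "=", x)
}

annotations <- c("=hm", "si\nclaro", "a=b")
protected <- protect_for_excel(annotations)
protected
revert_protection(protected)
```

In this sketch only the leading "=" is restored; a "|" that replaced a line break cannot be told apart from a "|" that was part of the original annotation.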
```r
search_results_export(
  s,
  path,
  sheetName = "data",
  saveAsCSV = FALSE,
  encoding = "UTF-8",
  separator = ";",
  overwrite = TRUE
)
```

| Argument | Description |
|---|---|
| s | Search object. Search object containing the results you wish to export. |
| path | Character string; path where file will be saved. Please add the suffix '.csv' or '.xlsx' to the file name. |
| sheetName | Character string, set the name of the excel sheet. |
| saveAsCSV | Logical; if |
| encoding | Character string; text encoding for CSV files. |
| separator | Character; single character that is used to separate the columns. |
| overwrite | Logical; if |
```r
library(act)

# Search
mysearch <- act::search_new(examplecorpus, pattern="yo")
nrow(mysearch@results)

# Create temporary file path
path <- tempfile(pattern = "searchresults", tmpdir = tempdir(), fileext = ".xlsx")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer:
## Not run: 
path <- tempfile(pattern = "searchresults",
                 tmpdir = "PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                 fileext = ".xlsx")

## End(Not run)

# Save search results
act::search_results_export(s=mysearch, path=path)

# Do your coding of the search results somewhere outside of act
# ...

# Load search results
mysearch.import <- act::search_results_import(path=path)
nrow(mysearch.import@results)
```
Search results will be imported from an Excel '.xlsx' file or a comma separated values '.csv' file into a search object.
```r
search_results_import(
  path,
  revertReplacements = TRUE,
  sheetName = "data",
  encoding = "UTF-8",
  separator = ";"
)
```

| Argument | Description |
|---|---|
| path | Character string; path to file from where data will be loaded. |
| revertReplacements | Logical, when exporting search results from act, '=' at the beginning of lines are replaced by '.=', and in numbers the decimal separator '.' is replaced by a ','. If |
| sheetName | Character string, set the name of the excel sheet containing the data. |
| encoding | Character string; text encoding in the case of CSV files. |
| separator | Character; single character that is used to separate the columns in CSV files. |
Search object.
```r
library(act)

# Search
mysearch <- act::search_new(examplecorpus, pattern="yo")
nrow(mysearch@results)

# Create temporary file path
path <- tempfile(pattern = "searchresults", tmpdir = tempdir(), fileext = ".xlsx")

# It makes more sense, however, to define a destination folder
# that is easier to access on your computer:
## Not run: 
path <- tempfile(pattern = "searchresults",
                 tmpdir = "PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                 fileext = ".xlsx")

## End(Not run)

# Save search results
act::search_results_export(s=mysearch, path=path)

# Do your coding of the search results somewhere outside of act
# ...

# Load search results
mysearch.import <- act::search_results_import(path=path)
nrow(mysearch.import@results)
```
Runs a search, based on an existing search object s, in a corpus object x.
```r
search_run(x, s)
```

| Argument | Description |
|---|---|
| x | Corpus object. |
| s | Search object. |
Search object.
search_new, search_makefilter, search_sub
```r
library(act)

# Search for the first-person singular pronoun in Spanish.
# Only create the search object without running the search.
mysearch <- act::search_new(x=examplecorpus, pattern= "yo", runSearch=FALSE)

# Run the search
mysearch <- act::search_run(x=examplecorpus, s=mysearch)
mysearch
mysearch@results$hit

# Search only in tiers called "A", in any transcript
mysearch@filter.tier.names <- "A"
mysearch@filter.transcript.names <- ""
mysearch <- act::search_run(x=examplecorpus, s=mysearch)
cbind(mysearch@results$transcriptName, mysearch@results$tierName, mysearch@results$hit)

# Search only in tiers called "A", only in transcript "ARG_I_PER_Alejo"
mysearch@filter.tier.names <- "A"
mysearch@filter.transcript.names <- "ARG_I_PER_Alejo"
mysearch <- act::search_run(x=examplecorpus, s=mysearch)
cbind(mysearch@results$transcriptName, mysearch@results$tierName, mysearch@results$hit)
```
This function remote-controls 'Praat' by using 'sendpraat' and a 'Praat' script. It first searches your corpus object and takes the first search hit. The corresponding TextGrid is then opened in the 'Praat' TextGrid Editor and the search hit is displayed.
    search_searchandopen_inpraat(x, pattern)
x |
Corpus object. |
pattern |
Character string; search pattern as regular expression. |
To make this function work you need to do two things first:
Install 'sendpraat' on your computer. To do so follow the instructions in the vignette 'installation-sendpraat'. Show the vignette with vignette("installation-sendpraat").
Set the path to the 'sendpraat' executable correctly by using 'options(act.path.sendpraat = ...)'.
    library(act)

    # You can only use this function if you have installed
    # and located the 'sendpraat' executable properly in the package options.
    ## Not run: 
    act::search_searchandopen_inpraat(x=examplecorpus, "pero")
    ## End(Not run)
Only x, s and pattern are obligatory.
The other arguments can be left to their default values.
    search_stills(
      x,
      s,
      pattern,
      searchMode = c("content", "fulltext", "fulltext.byTime", "fulltext.byTier"),
      searchNormalized = FALSE,
      filterTierNames = NULL,
      filterTierIncludeRegex = "still",
      filterTierExcludeRegex = NULL,
      resultids = NULL,
      stillsFolder = "stills"
    )
x |
Corpus object; the corpus to be searched. |
s |
Search object. |
pattern |
Character string; search pattern as regular expression. |
searchMode |
Character string; takes the following values: |
searchNormalized |
Logical; if |
filterTierNames |
Vector of character strings; names of tiers to be included. |
filterTierIncludeRegex |
Character string; as regular expression, limit search to certain tiers matching the expression. |
filterTierExcludeRegex |
Character string; as regular expression, exclude some tiers from search matching the expression. |
resultids |
Vector of Integer; By default all results in the search object will be processed. if |
stillsFolder |
Character string; name of the sub folder in which the stills will be stored. The folder will be created recursively. |
Search object.
    library(act)

    # Search for the 1. Person Singular Pronoun in Spanish.
    mysearch <- act::search_new(examplecorpus, pattern= "yo")
    mysearch

    # Search in normalized content vs. original content
    mysearch.norm <- act::search_new(examplecorpus, pattern="yo", searchNormalized=TRUE)
    mysearch.org <- act::search_new(examplecorpus, pattern="yo", searchNormalized=FALSE)
    mysearch.norm@results.nr
    mysearch.org@results.nr

    # The difference is because during normalization capital letters will be converted
    # to small letters. One annotation in the example corpus contains a "yo" with a
    # capital letter:
    mysearch <- act::search_new(examplecorpus, pattern="yO", searchNormalized=FALSE)
    mysearch@results$hit

    # Search in full text vs. original content.
    # Full text search will find matches across annotations.
    # Let's define a regular expression with a certain span.
    # Search for the word "no" 'no' followed by a "pero" 'but'
    # in a distance ranging from 1 to 20 characters.
    myRegEx <- "\\bno\\b.{1,20}pero"
    mysearch <- act::search_new(examplecorpus, pattern=myRegEx, searchMode="fulltext")
    mysearch
    mysearch@results$hit
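Why a span search needs the full text can be illustrated in base R without act. The sketch below uses two made-up annotations (not taken from the example corpus) and assumes, purely for illustration, that a full text is built by joining annotations with a single space; how act assembles its full texts internally is not shown here.

```r
# Two made-up annotations that follow each other in time
annotations <- c("no creo", "pero bueno")
myRegEx <- "\\bno\\b.{1,20}pero"

# Content-style search: every annotation is tested on its own, so the
# span from "no" to "pero" is never found
hits.content <- grepl(myRegEx, annotations, perl = TRUE)
hits.content

# Full-text-style search: the annotations are joined first (assumption:
# with a single space), so the pattern can match across the boundary
fulltext <- paste(annotations, collapse = " ")
hit.fulltext <- regmatches(fulltext, regexpr(myRegEx, fulltext, perl = TRUE))
hit.fulltext
```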
This function starts from the results of a prior search and performs a sub search for temporal co-occurrence. The sub search processes every result of the prior search: for each hit, it looks at the annotations in other tiers that temporally overlap with the hit and checks whether they match a search pattern. If so, the hit of the sub search is added to a new column in the original search results data frame.
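The idea of a temporal sub search can be sketched in base R on a small, made-up annotation table. The column names (`tier`, `startsec`, `endsec`, `content`) and the overlap test are assumptions chosen for illustration and do not describe act's internal data layout or implementation.

```r
# Made-up annotations; column names are illustrative only
annotations <- data.frame(
  tier     = c("A", "B", "B", "A"),
  startsec = c(1.0, 1.5, 4.0, 6.0),
  endsec   = c(2.0, 2.5, 5.0, 7.0),
  content  = c("rie", "riendo", "pero", "rie"),
  stringsAsFactors = FALSE
)
pattern <- "(\\brie\\b|\\briendo\\b)"

# Prior search: all annotations that match the pattern
hits <- annotations[grepl(pattern, annotations$content, perl = TRUE), ]

# Sub search: for every hit, collect matching annotations in OTHER tiers
# whose time span overlaps with the hit
hits$subsearch <- vapply(seq_len(nrow(hits)), function(i) {
  overlaps <- annotations$startsec < hits$endsec[i] &
              annotations$endsec   > hits$startsec[i] &
              annotations$tier    != hits$tier[i]
  matches <- overlaps & grepl(pattern, annotations$content, perl = TRUE)
  paste(annotations$content[matches], collapse = " | ")
}, character(1))

hits[, c("tier", "content", "subsearch")]
```

In this sketch the first two hits overlap with each other, so each one receives the other as its sub search hit, while the third hit has no overlapping partner and gets an empty string.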
    search_sub(
      x,
      s,
      pattern,
      searchMode = c("content", "fulltext", "fulltext.byTime", "fulltext.byTier"),
      searchNormalized = TRUE,
      filterTierIncludeRegex = "",
      filterTierExcludeRegex = "",
      destinationColumn = "subsearch",
      deleteEmptyLines = FALSE,
      excludeHitsInSameTier = TRUE,
      collapseString = " | "
    )
x |
Corpus object. |
s |
Search object. |
pattern |
Character string; search pattern as regular expression |
searchMode |
Character string; takes the following values: |
searchNormalized |
Logical; if |
filterTierIncludeRegex |
Character string; limit search to tiers that match the regular expression |
filterTierExcludeRegex |
Character string; exclude tiers that match the regular expression |
destinationColumn |
Character string; name of column where results of sub search will be stored |
deleteEmptyLines |
Logical; if |
excludeHitsInSameTier |
Logical; if |
collapseString |
Character string; Characters that will be used to separate multiple search hits |
Search object.
search_new, search_run, search_makefilter
    library(act)

    # Let's search for instances where participants laugh together
    # First search for annotations that contain laughter (in original content)
    myRegEx <- "(\\brie\\b|\\briendo\\b)"
    mysearch <- act::search_new(x=examplecorpus, pattern=myRegEx, searchNormalized = FALSE)
    mysearch@results.nr

    # Now perform sub search, also on laughs/laughing
    test <- act::search_sub(x=examplecorpus, s=mysearch, pattern=myRegEx)

    # Check the co-occurring search hits
    test@results$subsearch
Search in original content of a single transcript
    search_transcript_content(t, s)
t |
Transcript object; transcript to search in. |
s |
Search object. |
Data frame with search results.
    library(act)

    # Search for the 1. Person Singular Pronoun in Spanish.
    # Only create the search object without running the search.
    mysearch <- act::search_new(x=examplecorpus, pattern= "yo", runSearch=FALSE)

    # Run the search
    df <- act::search_transcript_content(t=examplecorpus@transcripts[[3]], s=mysearch)
    nrow(df)
Search in full text of a single transcript
    search_transcript_fulltext(t, s)
t |
Transcript object; transcript to search in. |
s |
Search object. |
Data frame with search results.
    library(act)

    # Search for the 1. Person Singular Pronoun in Spanish.
    # Only create the search object without running the search.
    mysearch <- act::search_new(x=examplecorpus, pattern= "yo", runSearch=FALSE)

    # Run the search
    df <- act::search_transcript_fulltext(t=examplecorpus@transcripts[[3]], s=mysearch)
    nrow(df)
This object defines the properties of a search in act. It also contains the results of this search in a specific corpus, if the search has already been run. (Note that you can also create a search without running it immediately). A search object can be run on different corpora.
Some of the slots are defined by the user.
Other slots are [READ ONLY], which means that they can be accessed by the user but
should not be changed. They contain values that are filled when you execute functions
on the object.
name: Character string; name of the search. Will be used, for example, as name of the sub folder when creating media cuts.
pattern: Character string; search pattern as a regular expression.
search.mode: Character string; defines if the original contents of the annotations should be searched or if the full texts should be searched. Slot takes the following values: content, fulltext (=default, includes both full text modes), fulltext.byTime, fulltext.byTier.
search.normalized: Logical; if TRUE, the normalized annotations will be used for searching.
resultid.prefix: Character string; search results will be numbered consecutively; this character string will be placed before the consecutive numbers.
resultid.start: Integer; search results will be numbered consecutively; this is the start number of the identifiers.
filter.transcript.names: Vector of character strings; names of transcripts to include in the search. If the value is character() or "", the filter will be ignored.
filter.transcript.includeRegEx: Character string; regular expression that defines which transcripts should be INcluded in the search (matching the name of the transcript).
filter.transcript.excludeRegEx: Character string; regular expression that defines which transcripts should be EXcluded from the search (matching the name of the transcript).
filter.tier.names: Vector of character strings; names of tiers to include in the search. If the value is character() or "", the filter will be ignored.
filter.tier.includeRegEx: Character string; regular expression that defines which tiers should be INcluded in the search (matching the name of the tier).
filter.tier.excludeRegEx: Character string; regular expression that defines which tiers should be EXcluded from the search (matching the name of the tier).
filter.section.startsec: Double; time value in seconds, limiting the search to a certain time span in each transcript, defining the start of the search window.
filter.section.endsec: Double; time value in seconds, limiting the search to a certain time span in each transcript, defining the end of the search window.
concordance.make: Logical; whether a concordance should be created when the search is run.
concordance.width: Integer; number of characters to include in the concordance.
cuts.span.beforesec: Double; number of seconds by which the cuts (media and print transcripts) should start before the start of the search hit.
cuts.span.aftersec: Double; number of seconds by which the cuts (media and print transcripts) should end after the end of the search hit.
cuts.column.srt: Character string; name of destination column in the search results data frame where the .srt subtitles will be inserted; column will be created if not present in data frame; set to "" for no insertion.
cuts.column.printtranscript: Character string; name of destination column in the search results data frame where the print transcripts will be inserted; column will be created if not present in data frame; set to "" for no insertion.
cuts.printtranscripts: Character string; [READ ONLY] All print transcripts for the search results (if generated previously).
cuts.cutlist.mac: Character string; [READ ONLY] 'FFmpeg' cut list for use on a Mac, to cut the media files for the search results.
cuts.cutlist.win: Character string; [READ ONLY] 'FFmpeg' cut list for use on Windows, to cut the media files for the search results.
results: Data.frame; Results of the search.
results.nr: Integer; [READ ONLY] Number of search results.
results.tiers.nr: Integer; [READ ONLY] Number of tiers over which the search results are distributed.
results.transcripts.nr: Integer; [READ ONLY] Number of transcripts over which the search results are distributed.
x.name: Character string; [READ ONLY] name of the corpus object on which the search has been run.
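The effect of the two cut span slots can be sketched as simple arithmetic on the start and end time of a search hit. The helper `cut_window` below is hypothetical and not part of act. In particular, clamping the start at 0 seconds is an assumption made for this illustration, and the sketch does not consider the end of the media file.

```r
# Hypothetical helper (not part of act): widen a search hit by a span
# before and after, as described for cuts.span.beforesec / cuts.span.aftersec
cut_window <- function(hit.startsec, hit.endsec, beforesec = 0, aftersec = 0) {
  data.frame(
    cut.startsec = pmax(0, hit.startsec - beforesec),  # assumption: never before 0
    cut.endsec   = hit.endsec + aftersec
  )
}

# Two made-up hits, widened by 1 second before and 2 seconds after
w <- cut_window(hit.startsec = c(0.4, 12.0),
                hit.endsec   = c(1.1, 13.5),
                beforesec = 1, aftersec = 2)
w
```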
act::export_docx
    library(act)

    # Search for the 1. Person Singular Pronoun in Spanish.
    mysearch <- act::search_new(examplecorpus, pattern= "yo")
    mysearch

    # Search in normalized content vs. original content
    mysearch.norm <- act::search_new(examplecorpus, pattern="yo", searchNormalized=TRUE)
    mysearch.org <- act::search_new(examplecorpus, pattern="yo", searchNormalized=FALSE)
    mysearch.norm@results.nr
    mysearch.org@results.nr

    # The difference is because during normalization capital letters will be converted
    # to small letters. One annotation in the example corpus contains a "yo" with a
    # capital letter:
    mysearch <- act::search_new(examplecorpus, pattern="yO", searchNormalized=FALSE)
    mysearch@results$hit

    # Search in full text vs. original content.
    # Full text search will find matches across annotations.
    # Let's define a regular expression with a certain span.
    # Search for the word "no" 'no' followed by a "pero" 'but'
    # in a distance ranging from 1 to 20 characters.
    myRegEx <- "\\bno\\b.{1,20}pero"
    mysearch <- act::search_new(examplecorpus, pattern=myRegEx, searchMode="fulltext")
    mysearch
    mysearch@results$hit
Adds a tier to all transcript objects of a corpus.
If the tier should be added only to certain transcripts, set the parameter filterTranscriptNames.
If you want to select transcripts by using regular expressions, use the function act::search_makefilter first.
    tiers_add(
      x,
      tierName,
      tierType = c("IntervalTier", "TextTier"),
      positionAbsolute = NULL,
      destinationTier = NULL,
      relativePositionToDestinationTier = 0,
      insertOnlyIfDestinationExists = FALSE,
      filterTranscriptNames = NULL,
      skipIfTierAlreadyExists = TRUE
    )
x |
Corpus object. |
tierName |
Character string; name of the tier to be added. |
tierType |
Character string; type of the tier to be added. |
positionAbsolute |
Integer; absolute position where the tier will be inserted. Value 1 and values below 1 will insert the tier in the first position. To insert the tier at the end, leave 'positionAbsolute' and 'destinationTier' open. |
destinationTier |
Character string; insert the tier relative to this tier. |
relativePositionToDestinationTier |
Integer; position relative to the destination tier; 1=immediately after; 0 and -1=immediately before; bigger numbers are also allowed. |
insertOnlyIfDestinationExists |
Logical; if |
filterTranscriptNames |
Vector of character strings; names of the transcripts to be modified. If left open, the tier will be added to all transcripts in the corpus. |
skipIfTierAlreadyExists |
Logical; if |
You can either insert the new tier at a specific position (e.g. 'positionAbsolute=1') or in relation to an existing tier (e.g. destinationTier='speaker1'). To insert a tier at the end, leave 'positionAbsolute' and 'destinationTier' open.
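The positioning rules can be illustrated on a plain vector of tier names in base R. The helper `insert_tier_name` is hypothetical and not part of act. It only covers the documented cases (absolute position, relative position 1 = immediately after, 0 and -1 = immediately before, and insertion at the end when nothing is specified or the destination tier is missing); larger relative offsets and the `insertOnlyIfDestinationExists` switch are left out of this sketch.

```r
# Hypothetical helper (not part of act): compute the new order of tier names
insert_tier_name <- function(tierNames, newTier,
                             positionAbsolute = NULL,
                             destinationTier = NULL,
                             relativePosition = 0) {
  if (!is.null(positionAbsolute)) {
    # value 1 and values below 1 put the new tier in first position
    after <- max(positionAbsolute, 1) - 1
  } else if (!is.null(destinationTier) && destinationTier %in% tierNames) {
    idx <- match(destinationTier, tierNames)
    # only the documented values: 1 = immediately after, 0 and -1 = immediately before
    after <- if (relativePosition >= 1) idx else idx - 1
  } else {
    # nothing specified, or destination tier missing: insert at the end
    after <- length(tierNames)
  }
  append(tierNames, newTier, after = min(after, length(tierNames)))
}

tiers <- c("Entrevistador", "A", "B")

insert_tier_name(tiers, "TEST")                        # at the end
insert_tier_name(tiers, "TEST", positionAbsolute = 2)  # as second tier
insert_tier_name(tiers, "TEST", destinationTier = "A", relativePosition = 1)  # after "A"
insert_tier_name(tiers, "TEST", destinationTier = "A", relativePosition = 0)  # before "A"
```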
Results will be reported in @history of the transcript objects.
Corpus object.
tiers_delete, tiers_rename, tiers_convert, tiers_sort
    library(act)

    # --- Add new interval tier.
    # Since no position is set it will be inserted at the end, by default.
    x <- act::tiers_add(examplecorpus,
                        tierName="TEST")
    # Check results
    x@history[length(x@history)]
    # Have a look at the first transcript
    x@transcripts[[1]]@tiers
    # --> New tier is inserted at the end.

    # --- Add new interval tier in position 2
    x <- act::tiers_add(examplecorpus,
                        tierName="TEST",
                        positionAbsolute=2)
    # Check results
    x@history[length(x@history)]
    # Have a look at the first transcript
    x@transcripts[[1]]@tiers
    # --> New tier is inserted as second tier.

    # --- Add new interval tier at the position of "Entrevistador", only if this tier exists.
    # If the destination tier does not exist, the new tier will NOT be inserted.

    # Have a look at the first and the second transcript.
    examplecorpus@transcripts[[1]]@tiers
    # Transcript 1 does contain a tier "Entrevistador" in the first position.
    examplecorpus@transcripts[[2]]@tiers
    # Transcript 2 does not contain a tier "Entrevistador".

    # Insert new tier
    x <- act::tiers_add(examplecorpus,
                        tierName="TEST",
                        destinationTier="Entrevistador",
                        relativePositionToDestinationTier=0,
                        insertOnlyIfDestinationExists=TRUE)
    # Check results
    x@history[length(x@history)]
    # Have a look at transcript 1:
    # Tier 'TEST' was inserted in first position (i.e. where 'Entrevistador' was before).
    x@transcripts[[1]]@tiers
    # Have a look at transcript 2:
    # Tier 'TEST' was not inserted, since there was no destination tier 'Entrevistador'.
    x@transcripts[[2]]@tiers

    # --- Add new interval tier AFTER tier="Entrevistador"
    # If the destination tier does not exist, the new tier will be inserted at the end in any case.
    x <- act::tiers_add(examplecorpus,
                        tierName="TEST",
                        destinationTier="Entrevistador",
                        relativePositionToDestinationTier=1,
                        insertOnlyIfDestinationExists=FALSE)
    # Check results
    x@history[length(x@history)]
    # Have a look at transcript 1:
    # Tier 'TEST' was inserted after the tier 'Entrevistador'.
    x@transcripts[[1]]@tiers
    # Have a look at transcript 2:
    # Tier 'TEST' was inserted at the end.
    x@transcripts[[2]]@tiers
Merges tiers from all transcripts in a corpus object into a data frame.
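Conceptually, such a merge stacks the tier tables of the individual transcripts and records which transcript each row came from. The base R sketch below shows this on made-up tier tables. The list `tiers_by_transcript` and its column names are assumptions for illustration and do not reflect the exact columns returned by act.

```r
# Made-up tier tables for two transcripts; names and columns are illustrative
tiers_by_transcript <- list(
  T1 = data.frame(name = c("A", "B"), type = "IntervalTier", stringsAsFactors = FALSE),
  T2 = data.frame(name = "A", type = "TextTier", stringsAsFactors = FALSE)
)

# Add the transcript name to every tier table, then stack the tables
alltiers <- do.call(rbind, lapply(names(tiers_by_transcript), function(n) {
  data.frame(transcriptName = n, tiers_by_transcript[[n]], stringsAsFactors = FALSE)
}))
rownames(alltiers) <- NULL
alltiers
```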
    tiers_all(x, compact = TRUE)
x |
Corpus object. |
compact |
Logical; if |
Data frame
    library(act)

    # Get data frame with all tiers
    alltiers <- act::tiers_all(examplecorpus)
    alltiers

    # Get data frame with a simplified version
    alltiers <- act::tiers_all(examplecorpus, compact=TRUE)
    alltiers
Converts tier types between 'interval' and 'point' tier.
Applies to all tiers in all transcript objects of a corpus.
If only certain transcripts or tiers should be affected, set the parameters filterTranscriptNames and filterTierNames.
If you want to select transcripts by using regular expressions, use the function act::search_makefilter first.
    tiers_convert(
      x,
      intervalToPoint = FALSE,
      pointToInterval = FALSE,
      filterTierNames = NULL,
      filterTranscriptNames = NULL
    )
x |
Corpus object. |
intervalToPoint |
Logical; if |
pointToInterval |
Logical; if |
filterTierNames |
Vector of character strings; names of the tiers to be included. |
filterTranscriptNames |
Vector of character strings; names of the transcripts to be checked. If left open, all transcripts will be checked |
Note: When converting from interval > point tier, the original end times of the annotations will be permanently lost.
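Why the end times cannot be recovered can be seen in a small base R sketch on made-up data. It assumes that a point annotation keeps only the start time of the original interval, and it rebuilds intervals with an invented rule (end = start of the next annotation). Both are assumptions for illustration only and not a description of what act does internally.

```r
# Made-up interval annotations
intervals <- data.frame(
  startsec = c(0.0, 1.2, 3.0),
  endsec   = c(1.0, 2.8, 3.5),
  content  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# interval > point: only one time value per annotation survives
# (assumption: the start time)
points <- intervals[, c("startsec", "content")]

# point > interval: the end times have to be invented. Illustrative rule:
# each annotation ends where the next one starts; the last one gets 1 second
rebuilt <- points
rebuilt$endsec <- c(points$startsec[-1], max(points$startsec) + 1)

# None of the rebuilt end times equals the original one
rebuilt$endsec == intervals$endsec
```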
Corpus object.
tiers_add, tiers_delete, tiers_rename, tiers_sort, helper_tiers_new_table, helper_tiers_sort_table
    library(act)

    # Check the names and types of the existing tiers in the first two transcripts
    examplecorpus@transcripts[[1]]@tiers
    examplecorpus@transcripts[[2]]@tiers

    # Convert interval tiers to point tiers
    newcorpus <- act::tiers_convert(examplecorpus, intervalToPoint=TRUE)

    # Check the names and types of the existing tiers
    newcorpus@transcripts[[1]]@tiers
    newcorpus@transcripts[[2]]@tiers

    # Convert point tiers to interval tiers
    newcorpus <- act::tiers_convert(newcorpus, pointToInterval=TRUE)

    # Note: In this round trip conversion from 'interval > point > interval tier'
    # the original end times of the annotations get lost (when converting from interval > point).
Deletes tiers in all transcript objects of a corpus.
If only tiers in certain transcripts should be affected set the parameter filterTranscriptNames.
In case that you want to select tiers and/or transcripts by using regular expressions use the function act::search_makefilter first.
Results will be reported in @history of the transcript objects.
    tiers_delete(x, tierNames, filterTranscriptNames = NULL)
x |
Corpus object. |
tierNames |
Character string; names of the tiers to be deleted. |
filterTranscriptNames |
Vector of character strings; names of the transcripts to be modified. If left open, all transcripts will be checked. |
Corpus object.
tiers_add, tiers_rename, tiers_convert, tiers_sort, helper_tiers_new_table, helper_tiers_sort_table
    library(act)

    # get info about all tiers
    all.tiers <- act::info(examplecorpus)$tiers

    # tiers 'A' and 'B' occur 6 times in 6 transcripts
    all.tiers["A", "tier.count"]
    all.tiers["B", "tier.count"]

    # delete tiers
    tierNames <- c("A", "B")
    x <- examplecorpus
    x <- act::tiers_delete(examplecorpus, tierNames=tierNames)
    x@history[length(x@history)]

    # tiers 'A' and 'B' do not occur anymore
    act::info(x)$tiers$tierName
Renames all tiers in all transcript objects of a corpus.
If only certain transcripts should be affected set the parameter filterTranscriptNames.
In case that you want to select transcripts by using regular expressions use the function act::search_makefilter first.
    tiers_rename(x, searchPattern, searchReplacement, filterTranscriptNames = NULL)
x |
Corpus object. |
searchPattern |
Character string; search pattern as regular expression. |
searchReplacement |
Character string; replacement string. |
filterTranscriptNames |
Vector of character strings; names of the transcripts to be included. |
The tiers will only be renamed if the resulting names preserve the uniqueness of the tier names.
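The uniqueness condition can be sketched in base R on a plain vector of tier names. The helper `rename_if_unique` is hypothetical and not part of act. Using `gsub` for the replacement is an assumption of this sketch, not a statement about act's implementation.

```r
# Hypothetical helper (not part of act): apply a regex rename only if the
# resulting tier names are still unique
rename_if_unique <- function(tierNames, searchPattern, searchReplacement) {
  renamed <- gsub(searchPattern, searchReplacement, tierNames)
  if (anyDuplicated(renamed) > 0) {
    tierNames   # renaming would create duplicates: keep the original names
  } else {
    renamed
  }
}

tiers <- c("Entrevistador", "A", "B")

rename_if_unique(tiers, "Entrevistador", "E")  # names stay unique: renamed
rename_if_unique(tiers, "Entrevistador", "A")  # "A" would occur twice: unchanged
```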
Results will be reported in @history of the transcript objects.
Please be aware that this function is not optimized for speed and may take quite a while to run, depending on the size of your corpus object.
Corpus object.
tiers_add, tiers_convert, tiers_delete, tiers_sort, helper_tiers_new_table, helper_tiers_sort_table
    library(act)

    # Check the names of the existing tiers in the first two transcripts
    examplecorpus@transcripts[[1]]@tiers$name
    examplecorpus@transcripts[[2]]@tiers$name

    x <- act::tiers_rename(examplecorpus, "Entrevistador", "E")

    x@transcripts[[1]]@tiers$name
    x@transcripts[[2]]@tiers$name
Reorder the positions of tiers in all transcripts of a corpus object.
The ordering of the tiers will be done according to a vector of regular expressions defined in 'sortVector'.
If only certain transcripts or tiers should be affected set the parameter filterTranscriptNames.
In case that you want to select transcripts by using regular expressions use the function act::search_makefilter first.
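Sorting by a vector of regular expressions can be sketched in base R on a plain vector of tier names. The helper `sort_tier_names` is hypothetical and not part of act. It treats the string "\\*" as the placeholder for tiers that match none of the other expressions, and it simply drops unmatched tiers when no placeholder is given, which is a simplification of this sketch and not a claim about act's behaviour.

```r
# Hypothetical helper (not part of act): order tier names by a vector of
# regular expressions; "\\*" marks the slot for all unmatched tiers
sort_tier_names <- function(tierNames, sortVector) {
  placeholder <- "\\*"
  patterns  <- setdiff(sortVector, placeholder)
  matched   <- unique(unlist(lapply(patterns, function(p) {
    grep(p, tierNames, value = TRUE)
  })))
  unmatched <- setdiff(tierNames, matched)

  ordered <- character()
  for (p in sortVector) {
    if (p == placeholder) {
      ordered <- c(ordered, unmatched)                       # fill the placeholder slot
    } else {
      ordered <- c(ordered, grep(p, tierNames, value = TRUE))
    }
  }
  unique(ordered)   # a tier matched by several expressions keeps its first position
}

tiers <- c("A", "B", "Entrevistador", "notes")
sort_tier_names(tiers, c("^Entrev", "\\*", "^B$"))
```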
    tiers_sort(
      x,
      sortVector,
      filterTranscriptNames = NULL,
      tiersAddMissing = FALSE,
      tiersDelete = FALSE
    )
x |
Corpus object. |
sortVector |
Vector of character strings; regular expressions to match the tier names. The order within the vector presents the new order of the tiers. Use "\\*" (=two backslashes and a star) to indicate where tiers that are not present in the sort vector but in the transcript should be inserted. |
filterTranscriptNames |
Vector of character strings; names of the transcripts to be included. |
tiersAddMissing |
Logical; if |
tiersDelete |
Logical; if |
Corpus object.
tiers_add, tiers_convert, tiers_delete, tiers_rename, helper_tiers_new_table, helper_tiers_sort_table
    library(act)

    # Check the order of the existing tiers in the first two transcripts
    examplecorpus@transcripts[[1]]@tiers$name[order(examplecorpus@transcripts[[1]]@tiers$position)]
    examplecorpus@transcripts[[2]]@tiers$name[order(examplecorpus@transcripts[[2]]@tiers$position)]

    # Get tier names to create the sort vector
    sortVector <- c(examplecorpus@transcripts[[1]]@tiers$name,
                    examplecorpus@transcripts[[2]]@tiers$name)

    # Reverse the vector for demonstration.
    sortVector <- sortVector[length(sortVector):1]

    # This will only reorder the tiers.
    examplecorpus <- act::tiers_sort(x=examplecorpus,
                                     sortVector=sortVector)

    # Check again the order of the tiers
    examplecorpus@transcripts[[1]]@tiers$name[order(examplecorpus@transcripts[[1]]@tiers$position)]
    examplecorpus@transcripts[[2]]@tiers$name[order(examplecorpus@transcripts[[2]]@tiers$position)]

    # This will reorder the tiers and additionally add tiers that are given
    # in the sort vector but not present in the transcript.
    examplecorpus <- act::tiers_sort(x=examplecorpus,
                                     sortVector=sortVector,
                                     tiersAddMissing=TRUE)

    # Check again the order of the tiers
    examplecorpus@transcripts[[1]]@tiers$name[order(examplecorpus@transcripts[[1]]@tiers$position)]
    examplecorpus@transcripts[[2]]@tiers$name[order(examplecorpus@transcripts[[2]]@tiers$position)]

    # Insert a tier called "newTier" into all transcripts in the corpus:
    for (t in examplecorpus@transcripts) {
      sortVector <- c(t@tiers$name, "newTier")
      examplecorpus <- act::tiers_sort(x=examplecorpus,
                                       sortVector=sortVector,
                                       filterTranscriptNames=t@name,
                                       tiersAddMissing=TRUE)
    }

    # Check for example the first transcript: it now contains a tier called "newTier"
    examplecorpus@transcripts[[1]]@tiers

    # To get more examples and information about sorting see 'helper_tiers_sort_table()'.
A transcript object contains the annotations of a loaded annotation file and some metadata. In addition, it contains information that is automatically generated by the act package and that some functions require (e.g. the full text search).
Some of the slots are defined by the user.
Other slots are [READ ONLY], which means that they can be accessed by the user but should not be changed. They contain values that are filled in when you execute functions on the object.
| name | Character string; [READ ONLY] Name of the transcript, generated from the annotation file name. |
|---|---|
| file.path | Character string; [READ ONLY] Original location of the annotation file. |
| file.encoding | Character string; [READ ONLY] Encoding applied to the file when reading. |
| file.type | Character string; [READ ONLY] Type of the original annotation file/object, e.g. 'eaf' or 'textgrid' for files and 'rpraat' for a rPraat .TextGrid object. |
| file.content | Character string; [READ ONLY] Content of the original annotation file/object. |
| import.result | Character string; [READ ONLY] Information about the success of the import of the annotation file. |
| load.message | Character string; [READ ONLY] Possible messages about errors that occurred on importing the annotation file. |
| length.sec | Double; [READ ONLY] Duration of the transcript in seconds. |
| tiers | Data.frame; [READ ONLY] Table with the tiers. To modify the tiers it is highly recommended to use functions of the package to ensure consistency of the data. |
| annotations | Data.frame; Table with the annotations. |
| media.path | Vector of character strings; Path(s) to the media files that correspond to this transcript object. |
| normalization.systime | POSIXct; Time of the last normalization. |
| fulltext.systime | POSIXct; [READ ONLY] Time of the last creation of the full texts. |
| fulltext.filter.tier.names | Vector of character strings; names of tiers that were included in the full text. |
| fulltext.bytime.orig | Character string; [READ ONLY] full text of the transcript based on the ORIGINAL content of the annotations, sorting the annotations by TIME |
| fulltext.bytime.norm | Character string; [READ ONLY] full text of the transcript based on the NORMALIZED content of the annotations, sorting the annotations by TIME |
| fulltext.bytier.orig | Character string; [READ ONLY] full text of the transcript based on the ORIGINAL content of the annotations, sorting the annotations first by TIERS and then by time |
| fulltext.bytier.norm | Character string; [READ ONLY] full text of the transcript based on the NORMALIZED content of the annotations, sorting the annotations first by TIERS and then by time |
| modification.systime | POSIXct; [READ ONLY] Time of the last modification of the transcript, i.e. modifications made after importing the annotation file by applying one or more of the package's functions. Manual changes of the transcript by the user are not tracked! |
| history | List; [READ ONLY] History of the modifications made to the transcript object. |
    library(act)
    examplecorpus@transcripts[[1]]
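Slots of a transcript object are read with the usual S4 `@` operator. The following is only a minimal sketch of that access pattern, assuming a hypothetical toy class `toyTranscript` with four of the slots described above; it is not the class definition used by act, and the `content` column is an illustrative assumption.

```r
library(methods)

# Hypothetical toy class for illustration; NOT the class defined by act.
setClass("toyTranscript",
         representation(name        = "character",
                        length.sec  = "numeric",
                        tiers       = "data.frame",
                        annotations = "data.frame"))

toy <- new("toyTranscript",
           name        = "demo",
           length.sec  = 12.5,
           tiers       = data.frame(name = c("A", "B"), position = 1:2,
                                    stringsAsFactors = FALSE),
           annotations = data.frame(tierName = c("A", "B", "A"),
                                    startsec = c(0, 2, 5),
                                    endsec   = c(2, 4, 7),
                                    content  = c("hello", "hi", "bye"),
                                    stringsAsFactors = FALSE))

# Slots are read with the @ operator
toy@name
toy@length.sec

# Data frame slots can be queried like any other data frame
annotations_per_tier <- table(toy@annotations$tierName)
tier_order <- toy@tiers$name[order(toy@tiers$position)]
```

Reading slots this way is harmless; writing to [READ ONLY] slots directly is what the description above advises against.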
Add one or more transcript objects to a corpus object.

    transcripts_add(
      x,
      ...,
      skipDuplicates = FALSE,
      createFulltext = TRUE,
      assignMedia = TRUE
    )
| x | Corpus object |
|---|---|
| ... | transcript object, list of transcript objects, corpus object. |
| skipDuplicates | Logical; If |
| createFulltext | Logical; if |
| assignMedia | Logical; if |
The names of the transcript objects in a corpus object have to be unique.
The @name attribute of each transcript object will be set as identifier in the list of transcripts in the corpus object.
By default, transcripts with non-unique names will be renamed.
If you prefer to skip such transcripts instead, set the parameter skipDuplicates=TRUE.
Skipped/renamed transcripts will be reported in
Corpus object
    library(act)

    # get one of the already existing transcript in the examplecorpus
    newtrans <- examplecorpus@transcripts[[1]]

    # add this transcript to the examplecorpus
    newcorpus <- act::transcripts_add(examplecorpus, newtrans)

    # compare the two corpus objects
    length(examplecorpus@transcripts)
    length(newcorpus@transcripts)

    names(examplecorpus@transcripts)
    names(newcorpus@transcripts)
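The difference between renaming and skipping transcripts with clashing names can be pictured with plain character vectors. This is a base R sketch only: `make.unique()` is a stand-in, and the suffix scheme that act actually applies when renaming is an assumption here.

```r
existing <- c("interview1", "interview2")
incoming <- c("interview2", "interview3")

# Option 1: keep everything and rename clashes.
# make.unique() is only a stand-in; act may build new names differently.
renamed <- make.unique(c(existing, incoming), sep = "_")

# Option 2: skip incoming names that already exist (cf. skipDuplicates=TRUE)
skipped <- incoming[incoming %in% existing]
kept    <- c(existing, incoming[!incoming %in% existing])
```

With the first option all four transcripts end up in the corpus under distinct names; with the second, the clashing transcript is left out.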
Transcript objects may contain errors, e.g. because of defective annotation input files or user modifications. This function may cure some of these errors in all transcript objects of a corpus.
Annotations with reversed times: annotations with endsec lower than startsec will be deleted.
Overlapping annotations: earlier annotations will end where the next annotation starts.
Annotations below 0 sec: Annotations that are starting and ending before 0 sec will be deleted; Annotations starting before but ending after 0 sec will be truncated.
Missing tiers: Tiers that are present in the annotations but missing in the list of tiers in @tiers of the transcript object will be added.
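The first and third rule above can be sketched on a plain data frame. This is an illustration in base R, assuming a toy annotation table with the columns `tierName`, `startsec` and `endsec`; it is not the code that act runs internally.

```r
ann <- data.frame(
  tierName = c("A", "A", "A", "A"),
  startsec = c(-6, -3, 6.9, 8),
  endsec   = c(-5,  2, 6.8, 9),
  stringsAsFactors = FALSE
)

# Reversed times: drop annotations whose end lies before their start
ann <- ann[ann$endsec >= ann$startsec, ]

# Below 0 sec: drop annotations that start AND end before 0 sec
ann <- ann[!(ann$startsec < 0 & ann$endsec < 0), ]

# Starting before but ending after 0 sec: truncate the start to 0
ann$startsec[ann$startsec < 0] <- 0
```

Of the four toy annotations, two survive: the one crossing 0 sec (now starting at 0) and the regular one from 8 to 9 sec.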
    transcripts_cure(
      x,
      filterTranscriptNames = NULL,
      annotationsTimesReversed = TRUE,
      annotationsOverlap = TRUE,
      annotationsTimesBelowZero = TRUE,
      tiersMissing = TRUE,
      warning = FALSE
    )
| x | Corpus object. |
|---|---|
| filterTranscriptNames | Vector of character strings; names of the transcripts to be included. |
| annotationsTimesReversed | Logical; If |
| annotationsOverlap | Logical; If |
| annotationsTimesBelowZero | Logical; If |
| tiersMissing | Logical; If |
| warning | Logical; If |
Corpus object;
    library(act)

    # The example corpus does not contain any errors.
    # But let's use the function anyway.
    x <- act::transcripts_cure(examplecorpus)
    x@history[[length(x@history)]]

    # See act::transcripts_cure_single for actual examples.
Transcript objects may contain errors, e.g. because of defective annotation input files or user modifications. This function may cure some of these errors.
Annotations with reversed times: annotations with endsec lower than startsec will be deleted.
Overlapping annotations: earlier annotations will end where the next annotation starts.
Annotations below 0 sec: Annotations that are starting and ending before 0 sec will be deleted; Annotations starting before but ending after 0 sec will be truncated.
Missing tiers: Tiers that are present in the annotations but missing in the list of tiers in @tiers of the transcript object will be added.
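The overlap and missing-tier rules can be sketched in base R as well. The toy tables below are assumptions for illustration, and so is the choice to check overlaps within each tier separately; this is not the implementation used by act.

```r
ann <- data.frame(
  tierName = c("A", "A", "A", "NEW"),
  startsec = c(0, 3, 6, 1),
  endsec   = c(4, 6, 7, 2),
  stringsAsFactors = FALSE
)
tiers <- data.frame(name = "A", stringsAsFactors = FALSE)

# Overlaps: within a tier, an annotation ends where the next one starts
ann <- ann[order(ann$tierName, ann$startsec), ]
for (i in seq_len(max(nrow(ann) - 1, 0))) {
  same_tier <- ann$tierName[i] == ann$tierName[i + 1]
  if (same_tier && ann$endsec[i] > ann$startsec[i + 1]) {
    ann$endsec[i] <- ann$startsec[i + 1]
  }
}

# Missing tiers: add tier names that occur in the annotations
# but not in the tier table
missing <- setdiff(unique(ann$tierName), tiers$name)
if (length(missing) > 0) {
  tiers <- rbind(tiers, data.frame(name = missing, stringsAsFactors = FALSE))
}
```

Here the first annotation on tier "A" is shortened to end at 3 sec, and the tier "NEW" is appended to the tier table.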
    transcripts_cure_single(
      t,
      annotationsTimesReversed = TRUE,
      annotationsOverlap = TRUE,
      annotationsTimesBelowZero = TRUE,
      tiersMissing = TRUE,
      warning = FALSE
    )
| t | Transcript object. |
|---|---|
| annotationsTimesReversed | Logical; If |
| annotationsOverlap | Logical; If |
| annotationsTimesBelowZero | Logical; If |
| tiersMissing | Logical; If |
| warning | Logical; If |
Transcript object;
    library(act)

    # --- annotationsTimesReversed: will be deleted
    # get example transcript and reverse the times of an annotation
    t <- examplecorpus@transcripts[[1]]
    t@annotations$startsec[1] <- 20
    t@annotations$endsec[1] <- 10
    t2 <- act::transcripts_cure_single(t)
    tail(t2@history, n=1)

    # --- annotationsTimesBelowZero: will be deleted or start at 0 sec
    t <- examplecorpus@transcripts[[1]]
    t@annotations$startsec[1] <- -2
    t@annotations$endsec[1] <- -1
    t2 <- act::transcripts_cure_single(t)
    tail(t2@history, n=1)

    t <- examplecorpus@transcripts[[1]]
    t@annotations$startsec[2] <- -5
    t2 <- act::transcripts_cure_single(t)
    tail(t2@history, n=1)

    # --- annotationsOverlap: will end where the next starts
    t <- examplecorpus@transcripts[[1]]
    t@annotations <- t@annotations[order(t@annotations$tierName, t@annotations$startsec), ]
    t@annotations$endsec[1] <- 8
    t2 <- act::transcripts_cure_single(t)
    tail(t2@history, n=1)

    # --- tiersMissing: will be added to @tiers in transcript object
    t <- examplecorpus@transcripts[[1]]
    t@annotations <- t@annotations[order(t@annotations$tierName, t@annotations$startsec), ]
    t@annotations$tierName[1] <- "NEW"
    t2 <- act::transcripts_cure_single(t)
    tail(t2@history, n=1)
    t2@tiers
    # compare with original tiers
    t@tiers

    # --- several things at once
    t <- examplecorpus@transcripts[[1]]
    t@annotations <- t@annotations[order(t@annotations$tierName, t@annotations$startsec), ]
    # annotation completely below 0 sec
    t@annotations$startsec[1] <- -6
    t@annotations$endsec[1] <- -5
    # annotation starts before but ends after 0 sec
    t@annotations$startsec[2] <- -3
    # annotation with reversed times
    t@annotations$startsec[3] <- 6.9
    t@annotations$endsec[3] <- -6.8
    # annotation overlaps with next annotation
    t@annotations$endsec[6] <- 9
    # new tier, missing tier list
    t@annotations$tierName[8] <- "NEW"

    t2 <- act::transcripts_cure_single(t, warning=TRUE)
    tail(t2@history, n=1)

    examplecorpus@transcripts[[1]]@history
Delete transcript objects from a corpus object.
You need to name the transcripts to delete directly in the parameter 'transcriptNames'.
If you want to delete transcripts based on a search pattern (regular expression), use act::search_makefilter first to obtain the matching transcript names.

    transcripts_delete(x, transcriptNames)
| x | Corpus object |
|---|---|
| transcriptNames | Vector of character strings; names of the transcript objects to be deleted. |
Corpus object
    library(act)

    # delete two transcripts by their name
    test <- act::transcripts_delete(examplecorpus,
                                    c("BOL_CCBA_SP_MeryGaby1", "BOL_CCBA_SP_MeryGaby2"))

    # compare the original and modified corpus object
    length(examplecorpus@transcripts)
    length(test@transcripts)
    setdiff(names(examplecorpus@transcripts), names(test@transcripts))
    test@history[length(test@history)]

    # delete transcripts that match a filter, e.g. all transcripts from Bolivia "BOL_"
    myfilter <- act::search_makefilter(examplecorpus,
                                       filterTranscriptIncludeRegex = "BOL_")
    test <- act::transcripts_delete(examplecorpus,
                                    myfilter$transcriptNames)

    # compare the original and modified corpus object
    length(examplecorpus@transcripts)
    length(test@transcripts)
    setdiff(names(examplecorpus@transcripts), names(test@transcripts))
Filter all transcript objects in a corpus and return the filtered corpus object.
It is possible to filter out temporal sections and tiers.
If you want to select tiers by regular expression, use the function act::search_makefilter first.

    transcripts_filter(
      x,
      filterTranscriptNames = NULL,
      filterTranscriptsRestict = NULL,
      filterTierNames = NULL,
      filterSectionStartsec = NULL,
      filterSectionEndsec = NULL,
      timesPreserve = TRUE,
      sort = c("none", "tier>startsec", "startsec>tier")
    )
| x | Corpus object; |
|---|---|
| filterTranscriptNames | Vector of character strings; names of transcripts to remain. If left unspecified, all transcripts will remain in the corpus. |
| filterTranscriptsRestict | Vector of character strings; names of transcripts to which filters will be applied. If left unspecified, all transcripts will be filtered. |
| filterTierNames | Vector of character strings; names of tiers to remain in the transcripts. If left unspecified, all tiers will remain in the transcripts. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| timesPreserve | Logical; Parameter is used if |
| sort | Character string; Annotations will be sorted: 'none' (=no sorting), 'tier>startsec' (=sort first by tier, then by startsec), 'startsec>tier' (=sort first by startsec, then by tier) |
Corpus object;
    library(act)

    # Filter corpus to only contain some tiers
    all.tierNames <- unique(act::tiers_all(examplecorpus)$name)
    some.tierNames <- all.tierNames[1:10]
    x <- act::transcripts_filter(examplecorpus, filterTierNames=some.tierNames)
    x@history[[length(x@history)]]
Filter a transcript object and return the filtered transcript object.
It is possible to REMOVE temporal sections and tiers.
If you want to select tiers by regular expression, use the function act::search_makefilter first.

    transcripts_filter_remove_single(
      t,
      filterTierNames = NULL,
      filterSectionStartsec = NULL,
      filterSectionEndsec = NULL,
      sort = c("none", "tier>startsec", "startsec>tier")
    )
| t | Transcript object. |
|---|---|
| filterTierNames | Vector of character strings; names of tiers to remain in the transcript. If left unspecified, all tiers will remain in the exported transcript. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| sort | Character string; Annotations will be sorted: 'none' (=no sorting), 'tier>startsec' (=sort first by tier, then by startsec), 'startsec>tier' (=sort first by startsec, then by tier) |
Transcript object;
    library(act)

    # get an example transcript
    t1 <- examplecorpus@transcripts[[1]]

    # --- Filter by tiers
    # The example transcript contains two tiers that contain four annotations each.
    t1@tiers
    table(t1@annotations$tierName)

    # Filter transcript to only contain annotations of the FIRST tier
    t2 <- act::transcripts_filter_single(t1, filterTierNames=t1@tiers$name[1])
    t2@tiers
    table(t2@annotations$tierName)

    # Use act::search_makefilter() first to get the tier names,
    # in this case search for tiers with a capital 'I',
    # which is the second tier, called 'ISanti'
    myfilter <- act::search_makefilter(examplecorpus,
                                       filterTranscriptNames=t2@name,
                                       filterTierIncludeRegex="I")
    t2 <- act::transcripts_filter_single(t1, filterTierNames=myfilter$tierNames)
    t2@tiers
    table(t2@annotations$tierName)

    # --- Filter by time section
    # only set start of section (until the end of the transcript)
    t2 <- act::transcripts_filter_single(t1, filterSectionStartsec=6)
    cbind(t2@annotations$startsec, t2@annotations$endsec)

    # only set end of section (from the beginning of the transcript)
    t2 <- act::transcripts_filter_single(t1, filterSectionEndsec=8)
    cbind(t2@annotations$startsec, t2@annotations$endsec)

    # set start and end of section
    t2 <- act::transcripts_filter_single(t1, filterSectionStartsec=6, filterSectionEndsec=8)
    cbind(t2@annotations$startsec, t2@annotations$endsec)

    # set start and end of section, start new times from 0
    t2 <- act::transcripts_filter_single(t1,
                                         filterSectionStartsec=6,
                                         filterSectionEndsec=8,
                                         timesPreserve=FALSE)
    cbind(t2@annotations$startsec, t2@annotations$endsec)
Filter a transcript object and return the filtered transcript object.
It is possible to EXTRACT temporal sections and tiers.
If you want to select tiers by regular expression, use the function act::search_makefilter first.
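Extracting a tier selection and a time section can be pictured on a plain data frame. The helper `extract_section()` below is hypothetical and written in base R; in particular, keeping whole annotations that overlap the section and shifting times by the section start when `times_preserve = FALSE` are assumptions about the behaviour, not a description of act's internals.

```r
ann <- data.frame(
  tierName = c("A", "B", "A", "B"),
  startsec = c(1, 4, 6.5, 9),
  endsec   = c(3, 6, 7.5, 11),
  stringsAsFactors = FALSE
)

# Hypothetical helper; argument names are illustrative only.
extract_section <- function(ann, tiers = NULL, from = NULL, to = NULL,
                            times_preserve = TRUE) {
  # keep only the requested tiers
  if (!is.null(tiers)) ann <- ann[ann$tierName %in% tiers, ]
  # keep annotations that reach into the section [from, to]
  if (!is.null(from))  ann <- ann[ann$endsec > from, ]
  if (!is.null(to))    ann <- ann[ann$startsec < to, ]
  # optionally let the extracted section start at 0 sec
  if (!times_preserve && !is.null(from)) {
    ann$startsec <- ann$startsec - from
    ann$endsec   <- ann$endsec - from
  }
  ann
}

sec    <- extract_section(ann, from = 6, to = 8)
sec0   <- extract_section(ann, from = 6, to = 8, times_preserve = FALSE)
only_a <- extract_section(ann, tiers = "A")
```

In this toy data only the annotation from 6.5 to 7.5 sec lies in the section; with shifted times it runs from 0.5 to 1.5 sec.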
    transcripts_filter_single(
      t,
      filterTierNames = NULL,
      filterSectionStartsec = NULL,
      filterSectionEndsec = NULL,
      timesPreserve = TRUE,
      sort = c("none", "tier>startsec", "startsec>tier")
    )
| t | Transcript object. |
|---|---|
| filterTierNames | Vector of character strings; names of tiers to remain in the transcript. If left unspecified, all tiers will remain in the exported transcript. |
| filterSectionStartsec | Double; start of selection in seconds. |
| filterSectionEndsec | Double; end of selection in seconds. |
| timesPreserve | Logical; Parameter is used if |
| sort | Character string; Annotations will be sorted: 'none' (=no sorting), 'tier>startsec' (=sort first by tier, then by startsec), 'startsec>tier' (=sort first by startsec, then by tier) |
Transcript object;
    library(act)

    # get an example transcript
    t1 <- examplecorpus@transcripts[[1]]

    # --- Filter by tiers
    # The example transcript contains two tiers that contain four annotations each.
    t1@tiers
    table(t1@annotations$tierName)

    # Filter transcript to only contain annotations of the FIRST tier
    t2 <- act::transcripts_filter_single(t1, filterTierNames=t1@tiers$name[1])
    t2@tiers
    table(t2@annotations$tierName)

    # Use act::search_makefilter() first to get the tier names,
    # in this case search for tiers with a capital 'I',
    # which is the second tier, called 'ISanti'
    myfilter <- act::search_makefilter(examplecorpus,
                                       filterTranscriptNames=t2@name,
                                       filterTierIncludeRegex="I")
    t2 <- act::transcripts_filter_single(t1, filterTierNames=myfilter$tierNames)
    t2@tiers
    table(t2@annotations$tierName)

    # --- Filter by time section
    # only set start of section (until the end of the transcript)
    t2 <- act::transcripts_filter_single(t1, filterSectionStartsec=6)
    cbind(t2@annotations$startsec, t2@annotations$endsec)

    # only set end of section (from the beginning of the transcript)
    t2 <- act::transcripts_filter_single(t1, filterSectionEndsec=8)
    cbind(t2@annotations$startsec, t2@annotations$endsec)

    # set start and end of section
    t2 <- act::transcripts_filter_single(t1, filterSectionStartsec=6, filterSectionEndsec=8)
    cbind(t2@annotations$startsec, t2@annotations$endsec)

    # set start and end of section, start new times from 0
    t2 <- act::transcripts_filter_single(t1,
                                         filterSectionStartsec=6,
                                         filterSectionEndsec=8,
                                         timesPreserve=FALSE)
    cbind(t2@annotations$startsec, t2@annotations$endsec)
Merges several transcript objects in a corpus object. One transcript is the destination transcript (the transcript that will be updated and receives the new data). The other transcripts are the update transcripts (they contain the data that will replace data in the destination transcript). The update transcripts need to contain a tier in which the update sections are marked with a specific character string.

    transcripts_merge(
      x,
      destinationTranscriptName,
      updateTranscriptNames,
      identifierTier = "update",
      identifierPattern = ".+",
      eraseCompletely = TRUE
    )
| x | Corpus object; |
|---|---|
| destinationTranscriptName | Character string; name of the transcript that will be updated. |
| updateTranscriptNames | Vector of character strings; names of transcripts that contain the updates. |
| identifierTier | Character string; regular expression that identifies the tier in which the sections that will be inserted into the destination transcript are marked. |
| identifierPattern | Character string; regular expression that identifies the sections that will be inserted into the destination transcript. |
| eraseCompletely | Logical; if |
You may choose between the following two options:
eraseCompletely=TRUE: The update sections in the destination transcript will first be erased completely and then the updates will be filled in.
eraseCompletely=FALSE: The update sections in the destination transcript will NOT be erased completely. Rather, only the contents of those tiers will be erased that are also present in the update transcripts. E.g. if your destination transcript contains more tiers than the update transcripts, the contents of those additional tiers will be preserved in the destination transcript during the update.
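The two erase options can be contrasted on toy annotation tables. The helper `merge_section()` below is a hypothetical base R sketch that ignores the identifier tier and takes the update section as given; it only illustrates which rows of the destination survive, and is not the algorithm implemented in act.

```r
dest <- data.frame(
  tierName = c("A", "B", "C"),
  startsec = c(1, 1, 1),
  endsec   = c(2, 2, 2),
  content  = c("old A", "old B", "old C"),
  stringsAsFactors = FALSE
)
update <- data.frame(
  tierName = c("A", "B"),
  startsec = c(1, 1),
  endsec   = c(2, 2),
  content  = c("new A", "new B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the update section runs from sec_start to sec_end
merge_section <- function(dest, update, sec_start, sec_end, erase_completely) {
  in_section <- dest$startsec >= sec_start & dest$endsec <= sec_end
  if (erase_completely) {
    # erase everything inside the section, on all tiers
    erase <- in_section
  } else {
    # erase only tiers that the update also contains
    erase <- in_section & dest$tierName %in% update$tierName
  }
  out <- rbind(dest[!erase, ], update)
  out <- out[order(out$tierName, out$startsec), ]
  rownames(out) <- NULL
  out
}

full    <- merge_section(dest, update, 0, 3, erase_completely = TRUE)
partial <- merge_section(dest, update, 0, 3, erase_completely = FALSE)
```

With complete erasure the annotation on tier "C" is lost; with partial erasure it is kept next to the two updated annotations.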
Corpus object
    library(act)

    # We need three transcripts to demonstrate the function transcripts_merge:
    # - the destination transcript: "update_destination"
    # - two transcripts that contain updates: "update_update1" and "update_update2"

    # Have a look at the annotations in the destination transcript first.
    # It contains 2 annotations:
    examplecorpus@transcripts[["update_destination"]]@annotations

    # Have a look at the annotations in the update_update1 transcript, too:
    # It contains 3 annotations:
    examplecorpus@transcripts[["update_update1"]]@annotations

    # Run the function with only one update:
    test <- act::transcripts_merge(x=examplecorpus,
                                   destinationTranscriptName="update_destination",
                                   updateTranscriptNames = "update_update1")

    # Have a look at the annotations in the destination transcript again.
    # It now contains 5 annotations:
    test@transcripts[["update_destination"]]@annotations

    # Run the function with two transcript objects for updates:
    test <- act::transcripts_merge(x=examplecorpus,
                                   destinationTranscriptName="update_destination",
                                   updateTranscriptNames = c("update_update1", "update_update2"))

    # Have a look at the annotations in the destination transcript again.
    # It now contains 8 annotations:
    test@transcripts[["update_destination"]]@annotations

    # Compare the transcript in the original and in the modified corpus object.
    # The update transcript objects are gone:
    act::info_summarized(examplecorpus)$transcriptNames
    act::info_summarized(test)$transcriptNames

    # Have a look at the history of the corpus object
    test@history
Merges several transcripts. One transcript is the destination transcript (the transcript that will be updated). The other transcripts are the update transcripts and contain the updates. The update transcripts need to contain a tier in which the update sections are marked with a specific character string.
    transcripts_merge2(
      destinationTranscript,
      updateTranscripts,
      identifierTier = "update",
      identifierPattern = ".+",
      eraseCompletely = TRUE
    )
| destinationTranscript | Transcript object; transcript that serves as destination (and will receive the updates). |
|---|---|
| updateTranscripts | List of transcript objects; transcript objects that will be inserted into the destination transcript (entirely or in part). |
| identifierTier | Character string; regular expression that identifies the tier in which the sections that will be inserted into destinationTranscript are marked. |
| identifierPattern | Character string; regular expression that identifies the sections that will be inserted into destinationTranscript. |
| eraseCompletely | Logical; if |
You may choose between the following two options:
eraseCompletely=TRUE: The update sections in the destination transcript will first be erased completely and then the updates will be filled in.
eraseCompletely=FALSE: The update sections in the destination transcript will NOT be erased completely. Rather, only the contents of those tiers will be erased that are also present in the update transcripts. E.g. if your destination transcript contains more tiers than the update transcripts, the contents of those additional tiers will be preserved in the destination transcript during the update.
Transcript object
    library(act)

    # We need three transcripts to demonstrate the function transcripts_merge2:
    # - the destination transcript
    destinationTranscript <- examplecorpus@transcripts[["update_destination"]]

    # - two transcripts that contain updates
    updateTranscripts <- c(examplecorpus@transcripts[["update_update1"]],
                           examplecorpus@transcripts[["update_update2"]])

    # Run the function
    test <- act::transcripts_merge2(destinationTranscript, updateTranscripts)

    # Save the transcript to a TextGrid file.
    # Set the destination file path
    path <- tempfile(pattern = "merge_test", tmpdir = tempdir(), fileext = ".TextGrid")

    # It makes more sense, however, to define a destination folder
    # that is easier to access on your computer:
    ## Not run:
    path <- file.path("PATH_TO_AN_EXISTING_FOLDER_ON_YOUR_COMPUTER",
                      paste(test@name, ".TextGrid", sep=""))
    ## End(Not run)

    # Export
    act::export_textgrid(t=test, pathOutput=path)
Rename transcript objects in a corpus object.
This function changes the names of the transcripts both in the list x@transcripts and in the @name slot of each transcript.
The function ensures that each transcript object keeps a unique name.
    transcripts_rename(
      x,
      newTranscriptNames = NULL,
      searchPatterns = NULL,
      searchReplacements = NULL,
      toUpper = FALSE,
      toLower = FALSE,
      trim = FALSE,
      stopNonUnique = TRUE
    )
| Argument | Description |
|---|---|
| x | Corpus object |
| newTranscriptNames | Vector of character strings; new names for the transcripts. If left open, the current names in the corpus object will be taken as basis. |
| searchPatterns | Character string; Search pattern as regular expression applied to the names of the transcripts. |
| searchReplacements | Character string; String to replace the hits of the search. |
| toUpper | Logical; Convert transcript names all to upper case. |
| toLower | Logical; Convert transcript names all to lower case. |
| trim | Logical; Remove leading and trailing spaces in names. |
| stopNonUnique | Logical; If |
Corpus object
    library(act)

    # get current names
    old.names <- names(examplecorpus@transcripts)

    # make vector of names with the same length
    new.names <- paste("transcript", seq_along(old.names), sep="")

    # rename the transcripts
    test <- act::transcripts_rename(examplecorpus, newTranscriptNames=new.names)

    # check
    names(test@transcripts)
    test@transcripts[[1]]@name
    test@history[length(test@history)]

    # convert to lower case
    test <- act::transcripts_rename(examplecorpus, toLower=TRUE)
    test@history[length(test@history)]

    # search replace
    test <- act::transcripts_rename(examplecorpus,
                                    searchPatterns=c("ARG", "BOL"),
                                    searchReplacements=c("ARGENTINA", "BOLIVIA"))
    test@history[length(test@history)]

    # search replace ignoring upper and lower case
    test <- act::transcripts_rename(examplecorpus,
                                    searchPatterns=c("(?i)arg", "(?i)bol"),
                                    searchReplacements=c("ARGENTINA", "BOLIVIA"))
    test@history[length(test@history)]

    # search replace too much
    test <- act::transcripts_rename(x=examplecorpus,
                                    searchPatterns="ARG_I_CHI_Santi",
                                    searchReplacements="")
    names(test@transcripts)[1]
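To make the interplay of search/replace, case conversion, trimming and the uniqueness requirement easier to follow, here is a minimal base R sketch that operates on a plain character vector. It is not the package's implementation. The helper `rename_sketch`, the sample names and the exact order of the steps are assumptions, as is the reading of `stopNonUnique` as "stop with an error if the resulting names contain duplicates".

```r
# Hypothetical transcript names, used only for illustration.
old_names <- c(" ARG_01", "BOL_01 ", "bol_02")

# Hypothetical helper working on a character vector instead of a corpus.
rename_sketch <- function(names_in, patterns = NULL, replacements = NULL,
                          toUpper = FALSE, toLower = FALSE, trim = FALSE,
                          stopNonUnique = TRUE) {
  out <- names_in
  # Apply each pattern/replacement pair in turn as a regular expression.
  for (i in seq_along(patterns)) {
    out <- gsub(patterns[i], replacements[i], out, perl = TRUE)
  }
  if (toUpper) out <- toupper(out)
  if (toLower) out <- tolower(out)
  if (trim) out <- trimws(out)
  # Assumed behavior: refuse to continue if names are no longer unique.
  if (stopNonUnique && anyDuplicated(out) > 0) {
    stop("Renaming would produce non-unique transcript names.")
  }
  out
}

new_names <- rename_sketch(old_names,
                           patterns = c("(?i)arg", "(?i)bol"),
                           replacements = c("ARGENTINA", "BOLIVIA"),
                           trim = TRUE)
print(new_names)
```

Checking for duplicates after all transformations, rather than after each one, lets intermediate steps collide temporarily as long as the final names are distinct.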
Creates/updates the full texts of the transcripts in a corpus. The full text may be created in two different ways:
- The contents of a transcription will be joined consecutively based on the time information.
- The contents of each tier will be joined consecutively, and then the next tier will be joined.
    transcripts_update_fulltexts(
      x,
      searchMode = c("fulltext", "fulltext.bytier", "fulltext.bytime"),
      transcriptNames = NULL,
      tierNames = NULL,
      forceUpdate = FALSE
    )
| Argument | Description |
|---|---|
| x | Corpus object. |
| searchMode | Character string; Which full text should be created; accepts the following values: `fulltext`, `fulltext.bytier`, `fulltext.bytime`. |
| transcriptNames | Vector of character strings; Names of the transcripts you want to update; leave empty if you want to process all transcripts that need an update. |
| tierNames | Vector of character strings; Names of the tiers to include in the fulltext. |
| forceUpdate | Logical; If |
Corpus object.
    library(act)

    examplecorpus <- act::transcripts_update_fulltexts(x=examplecorpus)
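The difference between the two ways of building a full text can be shown with a small base R sketch on toy data. It does not use the package. The data frame, its column names and the single space used as separator are assumptions made for illustration only.

```r
# Hypothetical annotations of one transcript with two tiers.
annotations <- data.frame(
  tier     = c("A", "B", "A", "B"),
  startSec = c(0, 1, 2, 3),
  content  = c("hello", "hi", "how are you", "fine"),
  stringsAsFactors = FALSE
)

# By time: all annotations in temporal order, regardless of their tier.
byTime <- annotations[order(annotations$startSec), ]
fulltext_bytime <- paste(byTime$content, collapse = " ")

# By tier: all annotations of one tier in temporal order,
# followed by all annotations of the next tier.
byTier <- annotations[order(annotations$tier, annotations$startSec), ]
fulltext_bytier <- paste(byTier$content, collapse = " ")

print(fulltext_bytime)
print(fulltext_bytier)
```

The time-based text keeps the turn-by-turn sequence of the interaction, so a search can span adjacent annotations of different speakers. The tier-based text keeps each tier contiguous, so a search can span consecutive annotations of the same tier.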
Normalizes the contents of the transcriptions in a corpus object using a normalization matrix. The function returns a corpus object with normalized transcriptions and updates the original corpus object passed as argument to x.
    transcripts_update_normalization(
      x,
      pathReplacementMatrix = "",
      transcriptNames = NULL,
      forceUpdate = FALSE
    )
| Argument | Description |
|---|---|
| x | Corpus object. |
| pathReplacementMatrix | Character string; path to replacement matrix in CSV format. If empty, the default replacement matrix that comes with the package will be used. |
| transcriptNames | Vector of character strings; Names of the transcripts you want to normalize; leave empty if you want to normalize all transcripts in the corpus object. |
| forceUpdate | Logical; If |
    library(act)

    examplecorpus <- act::transcripts_update_normalization(x=examplecorpus)
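How a replacement matrix can drive normalization is sketched below in base R. This is an illustration under stated assumptions, not the package's implementation. The column names `search` and `replace`, the sample patterns and the helper `normalize_sketch` are hypothetical, and the layout of the CSV file shipped with the package may differ.

```r
# Hypothetical replacement matrix: one regular expression and one
# replacement per row. In practice such a table could be read from CSV.
replacement_matrix <- data.frame(
  search  = c("\\(\\.\\)", ":+", "\\s+"),
  replace = c("", "", " "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply all rows of the matrix in order.
normalize_sketch <- function(text, replacements) {
  for (i in seq_len(nrow(replacements))) {
    text <- gsub(replacements$search[i], replacements$replace[i],
                 text, perl = TRUE)
  }
  # Remove leading and trailing whitespace left over by the replacements.
  trimws(text)
}

raw <- c("we::ll (.) yes", "no:::  way")
normalized <- normalize_sketch(raw, replacement_matrix)
print(normalized)
```

The rows are applied in order, so the whitespace rule is placed last to clean up gaps left by earlier deletions.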