Search Indexing tab

Overview
Use this tab to configure how specific MIME types are handled so that you can search within files (such as PDF files) that have been uploaded to the Tiki File Gallery.
To Access
From the File Gallery Admin page, click the Search Indexing tab.
Note
In order to search within uploaded files, your server may require additional applications, such as strings or pdftotext.




Option | Description | Default
Automatic indexing of file content | Uses command line tools to extract the information from the files based on their MIME types. | Disabled
Automatic indexing of emails stored as files | Parses message/rfc822 types of files (aka eml files) and stores individual email headers and content in search index. | Disabled
Asynchronous indexing | | Enabled
OCR Files | Extract and index text from supported file types. | Disabled
OCR Every File | Attempt to OCR every supported file. | Disabled
Allow file level OCR languages | Allow users to change the default languages that will be used to OCR a file. | Enabled
OCR limit languages | Limit the number of languages one can select from this list. Options: Auto detect languages; Afrikaans (Afrikaans); Albanian (Shqip); Amharic (አማርኛ); Arabic; Arabic (العربية); Armenian; Armenian (Հայերեն); Assamese (অসমীয়া); Azerbaijani (azərbaycan dili); Azerbaijani (azərbaycan dili) (cyrl); Basque (euskara, euskera); Belarusian (беларуская мова); Bengali; Bengali (বাংলা); Bosnian (bos... | None
tesseract path | Path to the location of the binary. Defaults to the $PATH location. If blank, the $PATH will be used, but will likely fail with scheduler. | /usr/bin/tesseract
pdfimages path | Path to the location of the binary. Defaults to the $PATH location. If blank, the $PATH will be used, but will likely fail with scheduler. | Pdfimages
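
The "Automatic indexing of file content" option describes choosing a command line tool by MIME type. As a rough illustration of that idea only, the Python sketch below guesses a file's MIME type from its name and builds the matching command. The `HANDLERS` mapping, the `{path}` placeholder, and the function names are all hypothetical and are not Tiki's configuration syntax; the tools are the two named in the note above.

```python
import mimetypes
import subprocess

# Hypothetical mapping from MIME type to an extraction command.
# "{path}" is a placeholder invented for this sketch, not Tiki syntax.
HANDLERS = {
    "application/pdf": ["pdftotext", "{path}", "-"],  # "-" writes the text to stdout
    "application/msword": ["strings", "{path}"],
}


def build_command(path):
    """Return the argv list for the file's MIME type, or None if unhandled."""
    mime_type, _encoding = mimetypes.guess_type(path)
    template = HANDLERS.get(mime_type)
    if template is None:
        return None
    return [path if part == "{path}" else part for part in template]


def extract_text(path):
    """Run the matching tool and return its output, or "" if no tool applies.

    Raises FileNotFoundError if the tool itself is not installed.
    """
    command = build_command(path)
    if command is None:
        return ""
    result = subprocess.run(
        command, capture_output=True, text=True, errors="replace", check=False
    )
    return result.stdout
```

Calling `build_command("report.pdf")` yields the `pdftotext` command without running it, which makes it easy to check the mapping before any tool is installed.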

Option | Description | Default
Automatic indexing of file content | Uses command line tools to extract the information from the files based on their MIME types. | Disabled
Automatic indexing of emails stored as files | Parses message/rfc822 types of files (aka eml files) and stores individual email headers and content in search index. | Disabled
Asynchronous indexing | | Enabled
OCR Files | Extract and index text from supported file types. | Disabled
OCR Every File | Attempt to OCR every supported file. | Disabled
Allow file level OCR languages | Allow users to change the default languages that will be used to OCR a file. | Enabled
OCR limit languages | Limit the number of languages one can select from this list. Options: Auto detect languages | None
tesseract path | Path to the location of the binary. Defaults to the $PATH location. If blank, the $PATH will be used, but will likely fail with scheduler. | Tesseract
pdfimages path | Path to the location of the binary. Defaults to the $PATH location. If blank, the $PATH will be used, but will likely fail with scheduler. | /usr/bin/pdfimages
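
To make the "Automatic indexing of emails stored as files" description more concrete, here is a minimal Python sketch of what splitting a message/rfc822 file into individual headers and content can look like. It uses Python's standard `email` package, not Tiki's own parser, and the function name and returned field names are hypothetical rather than Tiki's actual index fields.

```python
from email import message_from_string, policy


def email_to_fields(raw_message):
    """Split a raw RFC 822 message into a few headers and its plain-text body."""
    message = message_from_string(raw_message, policy=policy.default)
    # Prefer the text/plain part; returns None if the message has no such part.
    body = message.get_body(preferencelist=("plain",))
    return {
        "subject": str(message["Subject"] or ""),
        "from": str(message["From"] or ""),
        "to": str(message["To"] or ""),
        "body": body.get_content() if body is not None else "",
    }


if __name__ == "__main__":
    sample = (
        "From: alice@example.com\n"
        "To: bob@example.com\n"
        "Subject: Hello\n"
        "\n"
        "Body text\n"
    )
    for name, value in email_to_fields(sample).items():
        print(f"{name}: {value!r}")
```

Keeping each header as its own field is what allows a search to target, for example, only the subject or only the sender.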

Option | Description | Default
Automatic indexing of file content | Uses command line tools to extract the information from the files based on their MIME types. | Disabled
Asynchronous indexing | | Enabled
OCR Files | Extract and index text from supported file types. | Disabled
OCR Every File | Attempt to OCR every supported file. | Disabled
Allow file level OCR languages | Allow users to change the default languages that will be used to OCR a file. | Enabled
OCR limit languages | Limit the number of languages one can select from this list. Options: Auto detect languages | None
tesseract path | Path to the location of the binary. Defaults to the $PATH location. If blank, the $PATH will be used, but will likely fail with scheduler. | sh: 1: where: not found
pdfimages path | Path to the location of the binary. Defaults to the $PATH location. If blank, the $PATH will be used, but will likely fail with scheduler. | sh: 1: where: not found
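
Because a blank path falls back to $PATH, and the descriptions warn that this is likely to fail with the scheduler, it can help to find an absolute path to enter in the tesseract path and pdfimages path fields. The Python sketch below is one possible way to do that. The function name `resolve_binary` is hypothetical, and the lookup reflects the $PATH of whoever runs the script, which may differ from the scheduler's. The `sh: 1: where: not found` entries shown as defaults appear to be a failed lookup with `where`, which a POSIX shell does not normally provide; `shutil.which` works on both Linux and Windows.

```python
import os
import shutil


def resolve_binary(configured_path, name):
    """Return an absolute path to an executable, or None if none is found.

    configured_path: value typed into the path field (may be blank).
    name: binary name to look up on $PATH when the field is blank.
    """
    if configured_path:
        is_executable = os.path.isfile(configured_path) and os.access(
            configured_path, os.X_OK
        )
        return os.path.abspath(configured_path) if is_executable else None
    found = shutil.which(name)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    for binary in ("tesseract", "pdfimages"):
        print(binary, "->", resolve_binary("", binary))
```

A result of `None` for a blank field means the binary is not on the current $PATH, so an explicit path would be needed.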