Preview of version: 5

OCR Indexing

Since Tiki 20, file galleries can extract text from image-based files uploaded to Tiki by means of Optical Character Recognition (OCR) and feed the result into the search index.


Tiki relies on Tesseract (https://github.com/tesseract-ocr/tesseract), so you need to install it as described at https://tesseract-ocr.github.io/tessdoc/Installation.html

If you are using WikiSuite, Tesseract is installed by default: https://wikisuite.org/Differences-between-Virtualmin-and-WikiSuite

Server Check helps you confirm that Tesseract is working and available to Tiki.
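
Outside of Tiki, a quick way to see whether the binaries mentioned on this page can be found by the server is to look them up on the system $PATH. The following is a minimal sketch, not Tiki's Server Check itself; the helper name find_binaries is hypothetical, and the script should be run as the same user as the web server for the result to be meaningful.

```python
import shutil


def find_binaries(names=("tesseract", "pdfimages")):
    """Map each binary name to its resolved path on $PATH, or None if absent.

    Hypothetical helper for illustration; this is not part of Tiki.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_binaries().items():
        print(f"{name}: {path or 'not found on $PATH'}")
```

If a binary is reported as not found, either install it or configure a custom path for it in Tiki.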

Required Preferences

To enable OCR indexing in Tiki, make sure to activate the following preference:
ocr_enable: Enables Tiki to extract and index text from supported file types.

Optional Preferences

You can further customize OCR behavior with these optional settings:
ocr_every_file: If enabled, Tiki will attempt OCR on all supported files, regardless of other criteria.
ocr_file_level: Allows users to override the default OCR language settings on a per-file basis.
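
To illustrate what a per-file language choice means at the Tesseract level, the sketch below builds a Tesseract command line with an explicit language list. It is an assumption-laden illustration and not how Tiki invokes Tesseract internally; the function names build_ocr_command and ocr_text are hypothetical. It relies only on the documented Tesseract CLI form "tesseract IMAGE stdout -l LANG", where several languages are joined with "+".

```python
import shutil
import subprocess


def build_ocr_command(image_path, languages=("eng",)):
    """Build a Tesseract command that prints recognized text to stdout.

    Hypothetical helper: languages are joined with '+', as the Tesseract
    CLI expects for multi-language recognition (e.g. 'eng+fra').
    """
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr_text(image_path, languages=("eng",)):
    """Return the recognized text, or None if Tesseract is not on $PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_ocr_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # 'scan.png' is a placeholder file name for this example.
    print(build_ocr_command("scan.png", ("eng", "fra")))
```

The language data for each requested language must be installed alongside Tesseract, otherwise the command fails.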

Additional Customization

Tiki also offers several advanced customization options:

  • Display OCR status per file.
  • Set custom paths for the tesseract and pdfimages binaries, for cases where they cannot be found via the system $PATH.


The file gallery has two view modes: "Finder view" and the default "Page view". The OCR status of files can only be seen in the "Page view" mode of the file gallery.

Alias names for this page:
OCR | OCRIndexing | Optical Character Recognition

History

Information Version
Sammy Ndabo 16
Sammy Ndabo 15
Sammy Ndabo 14
Sammy Ndabo 13
Sammy Ndabo 12
Sammy Ndabo 11
Sammy Ndabo 10
Sammy Ndabo 9
Sammy Ndabo image Plugin modified by editor. 8
Sammy Ndabo 7
Sammy Ndabo 6
Sammy Ndabo 5
Sammy Ndabo Improve the OCR indexing documentation with new details about the OCR feature 4
Marc Laporte 3
Xavi (as xavidp - admin) 2
Xavi (as xavidp - admin) minimum doc better than nothing? 1