🔍 Ambar: Document Search Engine

Ilya Pirozhenko 2dcb7af0b4 Merge pull request #170 from ciscoriordan/patch-1 Fix typo in Install.md		8 years ago
.vscode	Push draft of 2.0.0rc	8 years ago
ElasticSearch	Push draft of 2.0.0rc	8 years ago
FrontEnd	Release v2.1.8 Sponsored by IFIC	8 years ago
LocalCrawler	Release v2.1.8 Sponsored by IFIC	8 years ago
MongoDB	Push draft of 2.0.0rc	8 years ago
Pipeline	Correct env var for ocrPdfMaxPageCount	8 years ago
Rabbit	Push draft of 2.0.0rc	8 years ago
Redis	Push draft of 2.0.0rc	8 years ago
ServiceApi	Release v2.1.8 Sponsored by IFIC	8 years ago
WebApi	Release v2.1.8 Sponsored by IFIC	8 years ago
.gitignore	Push draft of 2.0.0rc	8 years ago
CHANGELOG.md	Release v2.1.8 Sponsored by IFIC	8 years ago
Install.md	Update Install.md	8 years ago
License.txt	2.0.0 rc2	8 years ago
README.md	Update README.md	8 years ago
docker-compose.yml	Release v2.1.8 Sponsored by IFIC	8 years ago
privacy-policy.md	2.0.0 rc2	8 years ago

Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to bring full-text document search into your workflow:

Easily deploy Ambar with a single docker-compose file
Perform a Google-like search through the contents of your documents and images
Ambar supports all popular document formats and performs OCR if needed
Tag your documents
Use a simple REST API to integrate Ambar into your workflow

Features

Search

Tutorial: Mastering Ambar Search Queries

Fuzzy Search (John~3)
Phrase Search ("John Smith")
Search By Author (author:John)
Search By File Path (filename:*.txt)
Search By Date (when: yesterday, today, lastweek, etc)
Search By Size (size>1M)
Search By Tags (tags:ocr)
Search As You Type
Supported language analyzers: English (ambar_en), Russian (ambar_ru), German (ambar_de), Italian (ambar_it), Polish (ambar_pl), Chinese (ambar_cn), CJK (ambar_cjk)
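
The query forms listed above can be composed programmatically before being pasted into the search box. The following is a minimal sketch, not part of Ambar: the helper names `fuzzy` and `build_query` are hypothetical, and joining several filters with a single space is an assumption, since this README does not state how filters combine.

```python
def fuzzy(term, distance):
    """Return a fuzzy term in the form shown above, e.g. John~3."""
    return f"{term}~{distance}"


def build_query(*terms, phrase=None, author=None, filename=None,
                when=None, min_size=None, tag=None):
    """Assemble a query string from the filter forms listed in this README.

    Assumption: filters are separated by a single space. The README does
    not document how multiple filters are combined.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')           # Phrase Search ("John Smith")
    if author:
        parts.append(f"author:{author}")      # Search By Author
    if filename:
        parts.append(f"filename:{filename}")  # Search By File Path
    if when:
        parts.append(f"when: {when}")         # Search By Date, spaced as in the list
    if min_size:
        parts.append(f"size>{min_size}")      # Search By Size
    if tag:
        parts.append(f"tags:{tag}")           # Search By Tags
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(fuzzy("John", 3), filename="*.txt", tag="ocr"))
    # prints: John~3 filename:*.txt tags:ocr
```

Keeping each filter as a separate keyword argument makes it easy to check a generated query against the list above before relying on it.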

Crawling

Ambar 2.0 supports only local file system crawling. If you need to crawl an SMB share or an FTP location, just mount it using standard Linux tools. Crawling is automatic: no schedule is needed, since the crawler monitors file system events and processes new files as they appear.
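
To illustrate the idea of picking up new files without a schedule, here is a rough sketch that is not Ambar's crawler. It swaps in a simpler technique, comparing two directory snapshots, instead of subscribing to file system events, and the function names `snapshot` and `changed_files` are hypothetical.

```python
import os
import tempfile


def snapshot(root):
    """Map every file path under root to its (size, modification time)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[path] = (info.st_size, info.st_mtime_ns)
    return state


def changed_files(before, after):
    """Return paths that are new or whose size or mtime differs."""
    return sorted(path for path, meta in after.items()
                  if before.get(path) != meta)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory standing in for a mounted share.
    with tempfile.TemporaryDirectory() as root:
        before = snapshot(root)
        with open(os.path.join(root, "report.txt"), "w") as handle:
            handle.write("hello")
        after = snapshot(root)
        print(changed_files(before, after))  # one path, ending in report.txt
```

A real event-driven crawler reacts as soon as the operating system reports a change, while this snapshot comparison only notices changes when it is run again.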

Content Extraction

Ambar supports large files (>30MB)
ZIP archives
Mail archives (PST)
MS Office documents (Word, Excel, Powerpoint, Visio, Publisher)
OCR over images
Email messages with attachments
Adobe PDF (with OCR)
OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
OpenOffice documents
RTF, Plaintext
HTML / XHTML
Multithread processing

Installation

Notice: Ambar requires Docker and cannot run without it.

Just follow the installation instructions

Docker images can be found on Docker Hub

Support

Ambar is fully open-source and free to use; however, you can get dedicated support from our team for a fee:

Install & Configure Ambar on your machine - 999$
Mount external data source - 99$
Add automatic tagging rule - 299$
Add password protection to Ambar UI - 299$
Add custom file extractor - 599$
Dedicated support - 199$/hour
Custom features development - 299$/hour

FAQ

Is it open-source?

Yes, it's fully open-source now.

Is it free?

Yes, it is forever free.

Does it perform OCR?

Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please contact us at hello@ambar.cloud.

Does it support tagging?

Yes!

What about searching in PDF?

Yes, it can search through any PDF, even badly encoded ones or those with scans inside. We did our best to make search over any kind of PDF document smooth.

What is the maximum file size it can handle?

It's limited by the amount of RAM on your machine; typically it's 500MB. It's an awesome result, as typical document management systems offer a 30MB maximum file size for processing.

I have a problem. What should I do?

Request a dedicated support session by emailing us at hello@ambar.cloud

Change Log

Privacy Policy

License

MIT License
