About
The open source project Strus provides a collection of libraries and command line tools written in C++ for building a competitive, scalable full-text search engine.
The Strus search engine can be built on any key/value store database that provides an upper bound seek function for the stored key/value pairs. Currently, there is an implementation based on the LevelDB library.
Demo system
There is an online demo system for Strus: a search on the complete English Wikipedia collection, with a description of how to install the system and build the index.
Tutorials
A tutorial for building a simple search engine with PHP is available here.
Another tutorial, for building a nontrivial search engine with Python and the Tornado web framework and also covering the insert case, is available here.
An introduction that shows how to write dynamically loadable extensions of the Strus core in C++ can be found here.
An article that shows how scalable search engines can be built with Strus (distributing the search index) can be found here.
Installation
A short installation guide describes how to install the Strus packages, how to fetch the Strus sources from GitHub and build them yourself, and, alternatively, how to run Strus from Docker images.
Documentation
The documentation is a work in progress. What is available so far can be found here.
Story
Why build yet another search engine? Here I describe my motivation and the story of Strus. I also try to explain what distinguishes Strus from other search engine software.
News
Updated documentation of the domain specific languages used by the command line utilities to describe document and query analysis, query evaluation and the query language.

New GitHub project strusAll to simplify out-of-source builds of Strus.

Changed status from pre-Alpha to Alpha, as most of the features are available and stable. Strus is already used successfully in projects. However, the interface may still change, and the versioning does not yet follow strict rules.

Improved the Wikipedia demo search. Besides weighting the occurrence of query terms in documents, the search now takes into account references to documents that appear close to query terms. We also added a search for the vectors closest (by cosine similarity) to the entities appearing in the query. The vectors (about 10 million vectors of dimension 300) were created with word2vec on the Wikipedia collection.

We started a new project, StrusPattern, for deeper document analysis.

Work on the strusWebservice is ongoing. A slide show outlining where this project is heading can be found here.

A Travis build for Strus is now available at travis-ci.org.

We have implemented an NGRAM normalizer, as well as a tokenizer and a normalizer for terms defined by regular expressions on text. All functions implemented so far are listed here.

A document segmenter for JSON (based on the cJSON library) has been implemented for the Strus analyzer project. Selection expressions use the same abbreviated XPath syntax as for XML.

A new article has been published that shows how to create call traces of Strus for debugging, statistical analysis and a deeper understanding of the software.

Strus can now be built on OS X. Unfortunately, we cannot provide packages yet, but at least you can build the software yourself.

The project is now sponsored by Eurospider. Feedback from sophisticated retrieval projects and the development of new components such as the Strus webservice project will move Strus forward.

Published some query performance numbers for the Wikipedia demo system running on an Intel NUC.

The license of Strus changed from GPLv3 to MPLv2 (Mozilla Public License, Version 2.0). We were looking for a license that, on the one hand, protects the work done for Strus and, on the other hand, allows users of Strus to distribute their own work as they choose, even as closed source. We think that the Mozilla Public License, Version 2.0 meets these requirements best. Fortunately, Strus has never included or linked against any pervasively licensed code, so this change of the software license is possible with the agreement of all contributors.

The Wikipedia demo search engine is now hosted on an Intel NUC. Read more...

New article published on CodeProject about writing Strus extension modules in C++.

New article that shows how scalable search engines can be built with Strus.

A tutorial for building a search engine with Strus based on Python and the Tornado web framework has been published on CodeProject.

The NBLNK weighting scheme for the Wikipedia demo is now online. Instead of matching documents against the query, this weighting scheme ranks the links in documents by weighting the sentences in which the links appear against the query. It is a good example of the information extraction capabilities of Strus.

Packages of the latest build are now available.

Python bindings are now available.

Language bindings for Java are now available.

A Docker image and a tutorial are available for Strus.

Started advertising Strus to get some feedback and possibly some support.

The demo project of a search engine for the Wikipedia collection (English) is online.

The demo project of a search engine for the Wikipedia collection (English) is close to being finished.
© 2015 Patrick Frey