|| Vicaya Creator Guide || Online: Download || Guides: User Creator Developer ||
Documentation for content creators, covering the standard content file structure, running the server with your content, creating an index, testing, CD-ROM distribution, and maintaining updates. It is assumed that a creator has read the User Guide and is generally familiar with Vicaya from the user perspective.
As a content creator, you are primarily concerned with the 'vicaya' directory and any content directories you create. We will assume 'mycontent', though you may call it anything you like. In the standard basic release, the documentation directory serves as both the searchable content example and the documentation you are reading now. However, as you may have received Vicaya as a finished product serving more specific content (such as accesstoinsight), we will refer to 'mycontent' throughout the documentation. The release contains the following directories:
mycontent: contains the supporting data pages that the end user will see (perhaps the 'documentation' you are reading now). This directory may be named anything you choose, provided Vicaya is configured to point at it. Note that you likely received a copy of Vicaya with some other content name. You will recognize this directory because it very likely contains WEB-INF and seeds.html.
jre: contains the Java Runtime Environments for various platforms. We will not discuss this here.
vicaya: contains the Vicaya server and supporting data, such as templates, search index segments, and the crawler (Nutch).
Unless you have a pre-made content file structure, you can create a new content directory from an existing example. Remove all data from the mycontent directory **EXCEPT** WEB-INF (note again that in your case, mycontent will most likely have another name). Then place your HTML and web resources into the mycontent directory.
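The cleanup step above can be sketched as a small shell function. This is only a sketch and not part of Vicaya: the function name clear_content is hypothetical, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do). As a safety check, it refuses to delete anything unless the directory contains WEB-INF.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Vicaya): empty a content
# directory while keeping its WEB-INF subdirectory.
# Usage: clear_content ../mycontent
clear_content() {
    dir="$1"
    # Refuse to run unless the directory looks like a content directory.
    if [ ! -d "$dir/WEB-INF" ]; then
        echo "no WEB-INF in $dir; refusing to delete anything" >&2
        return 1
    fi
    # Delete every top-level entry except WEB-INF (hidden files included).
    find "$dir" -mindepth 1 -maxdepth 1 ! -name WEB-INF -exec rm -rf {} +
}
```

Work on a copy of the example directory rather than the original, since the deletion cannot be undone.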
It is recommended that you include an index.html file. This will be the default page opened when the user views your web application. In order to perform a crawl (for searching), you will need to include a seeds.html file. The seeds.html file should reference every page that you want to crawl and search (you may get away with relaxing that rule, but it is not recommended, and I will not describe the method here).
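For a large site, writing seeds.html by hand is tedious, so it may help to generate it. The following is a minimal sketch under stated assumptions: the function name make_seeds is hypothetical, and it assumes that a plain HTML page of relative links to every .html file is an acceptable seeds.html for the crawler, which this guide does not spell out. File names containing characters that are special in HTML are not escaped.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Vicaya): write a seeds.html that
# links to every .html page under the given content directory.
# Usage: make_seeds ../mycontent
make_seeds() {
    dir="$1"
    (
        cd "$dir" || exit 1
        {
            echo '<html><body>'
            # Skip seeds.html itself and anything under WEB-INF.
            find . -name '*.html' ! -name seeds.html ! -path './WEB-INF/*' |
            sort |
            while IFS= read -r page; do
                page=${page#./}   # strip the leading "./" from find's output
                printf '<a href="%s">%s</a><br>\n' "$page" "$page"
            done
            echo '</body></html>'
        } > seeds.html.tmp && mv seeds.html.tmp seeds.html
    )
}
```

Review the generated file before crawling, and remove any pages you do not want to appear in search results.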
Feel free to modify the images in vicaya/images. It is easiest to preserve the names and overwrite the files with images of your preference. Note that the icons are GIF and the splash image is a PNG. You may also want to modify the searchpage.xml and searchhit.xml templates (I hope they are self-explanatory).
Modify vicaya/context.properties such that context points to your web application. HOME refers to the vicaya directory. If you have your files set up as described above, then your context.properties file should look like this:
context=HOME/../mycontent
You may add further web applications, but only one may be the ROOT (context=ROOT). For example:
context=HOME/../mycontent
context.other=HOME/../othercontent
With this configuration, http://localhost:8108/index.html will refer to the index.html file found in mycontent, while http://localhost:8108/other/index.html will refer to the index.html file found in othercontent.
Note that Vicaya is based on Tomcat and as such it is fully capable of running many different J2EE web applications simultaneously. Search is only one such possibility. Care has been taken to ensure that you may use diverse applications with Vicaya.
Run the start.sh.bat file. See the User Guide if you need further instruction.
Running the crawl will delete the existing vicaya/segments directory. If that makes you nervous, make a backup. You'll need Cygwin installed if you are running Windows (Unix and Mac OS X users have what is needed already).
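If you do want a backup, the following sketch copies the existing segments directory to a timestamped sibling before you crawl. The function name backup_segments and the segments.bak.* naming are hypothetical choices for illustration, not something Vicaya provides.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Vicaya): copy vicaya/segments to a
# timestamped backup directory and print the backup's path.
# Usage: backup_segments /path/to/vicaya
backup_segments() {
    vicaya_dir="$1"
    src="$vicaya_dir/segments"
    # Nothing to do if no index has been built yet.
    if [ ! -d "$src" ]; then
        echo "nothing to back up: $src not found" >&2
        return 0
    fi
    dest="$vicaya_dir/segments.bak.$(date +%Y%m%d%H%M%S)"
    cp -R "$src" "$dest" && echo "$dest"
}
```

To restore, stop the server and copy the backup directory back to vicaya/segments.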
If you have your content set up with a mycontent/seeds.html file, then simply 'cd' to vicaya and run:
sh start_crawl.sh
For more on crawling, check out the Nutch 0.7 tutorial or the Nutch 0.8-dev tutorial.
A few errors in the crawl are to be expected (missing links, links to external web sites, unhandled types such as MP3, etc.). However, if you see constant errors, then things have probably gone terribly wrong. Try to view http://localhost:8108. If you cannot, the problem is with your server. If you can, the problem is with the crawl, and you'll need to seek help: the Vicaya website has links to a help forum, user mailing list, etc. If the crawl succeeds, you can perform a search at http://localhost:8108/search .
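The server check described above can also be done from the command line. This sketch assumes curl is installed, which this guide does not require; the function name server_up is hypothetical. The flags used are standard curl options: -s silences progress output, -f makes HTTP errors count as failure, -o /dev/null discards the page body, and --max-time bounds the wait.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Vicaya): succeed if the given URL
# answers with a non-error HTTP response within five seconds.
# Usage: server_up http://localhost:8108/
server_up() {
    curl -s -f -o /dev/null --max-time 5 "$1"
}

# Report which side the problem is likely on, following the rule of thumb
# in the text: unreachable server, or reachable server with a bad crawl.
diagnose() {
    if server_up "$1"; then
        echo "server is reachable; look at the crawl"
    else
        echo "server is not reachable; fix the server first"
    fi
}
```

For example, running diagnose http://localhost:8108/ while the server is stopped should report that the server is not reachable.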
For now, just stop the server, delete the vicaya/segments directory, modify your content, restart the server, and run the crawl again.
TODO
The Vicaya project began as a response to a request at AccessToInsight.org for a search engine that could run from a CD-ROM. The goal is to be able to duplicate the website on a CD-ROM, including a search engine. However, the Vicaya project is designed to support any offline website. In fact, it might be possible to support any read-only J2EE web application in addition to search (I'd love to learn about it and would be happy to offer suggestions).