<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Rather Insane</title>
	<atom:link href="http://ratherinsane.com/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://ratherinsane.com/blog</link>
	<description></description>
	<pubDate>Mon, 15 Dec 2008 04:34:11 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.8-bleeding-10187</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Filtering: Getting What You Want In O(n)</title>
		<link>http://ratherinsane.com/blog/2008/12/14/filtering-getting-what-you-want-in-on</link>
		<comments>http://ratherinsane.com/blog/2008/12/14/filtering-getting-what-you-want-in-on#comments</comments>
		<pubDate>Mon, 15 Dec 2008 04:26:02 +0000</pubDate>
		<dc:creator>Greg</dc:creator>
		
		<category><![CDATA[C++]]></category>

		<category><![CDATA[Computer Science]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[Lisp]]></category>

		<category><![CDATA[PHP]]></category>

		<category><![CDATA[filter]]></category>

		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=19</guid>
		<description><![CDATA[Web application development is currently a hot spot for the application of design patterns. Common patterns and development strategies all revolve around the separation of “business logic and display logic.” The focus has so far been on the separation of the interface code and the code that implements the functionality of the application, but almost [...]]]></description>
			<content:encoded><![CDATA[<p>Web application development is currently a hot spot for the application of design patterns. Common patterns and development strategies all revolve around the separation of “business logic and display logic.” The focus so far has been on separating the interface code from the code that implements the application&#8217;s functionality. Almost completely ignored is a separation that should be happening within the application code itself: keeping the code that produces or obtains data apart from the code that processes it.</p>
<p>Obtaining data is a trivial task, which is why it generally receives so little focus. However, obtaining data <i>correctly</i> is not a trivial task. Filtering addresses this directly, by providing the critical component between the data source and the data sink. Many abstraction layers exist for PHP, such as <a href="http://adodb.sourceforge.net/">ADOdb</a>, <a href="http://www.php.net/pdo">PDO – PHP Data Objects</a>, and <a href="http://framework.zend.com/manual/en/zend.db.html">Zend_Db</a>. However, all of these provide a simple transport layer for sending a query to a database and managing its result. They wrap around the real data source (a MySQL, PostgreSQL, etc. database) and provide an endpoint that, in theory, will allow a programmer to change data sources without modifying his code. As a result, none of these abstraction layers can uniformly filter through results, which leads to either clunky code using an abstracted interface or lots of handwritten raw queries that are non-portable anyway. Adding filtering as a means of controlling the data source is a solution to this problem.</p>
<p><i>Here, we will digress from web applications to the toolkit of black magic that is Lisp/Scheme. If you&#8217;re not interested in the theoretical build up, skip down two sections to where we return to more practical matters.</i></p>
<h2>Theoretical Reasoning</h2>
<p>I was first introduced to the concept of filtering while reading <a href="http://mitpress.mit.edu/sicp/full-text/book/book.html">Structure and Interpretation of Computer Programs</a>, the textbook for MIT&#8217;s introductory computer science course, 6.001. The basic idea is that a filter is a way to interleave the creation of a sequence of values and its processing. For example, let&#8217;s consider what happens if one wishes to find the first three prime Fibonacci numbers less than 10000 (which are 2, 3, and 5). There are two ways to do this: first, the generation and testing of numbers can be treated separately, which allows the methods for computing Fibonacci numbers to be separated from the method for testing for prime numbers. In this case, if we wanted to change our search from Fibonacci numbers to Catalan numbers, it&#8217;d be as simple as substituting a different generation method. With this strategy, 20 numbers must be generated, but only 6 are ultimately tested before the requirements are met. This means that 14, or 70%, of the generated numbers are discarded without being used, which is wasteful.</p>
<p>The second option is the much more natural choice: combine the generation of Fibonacci numbers with the test for primes. However, this means that the entire program must be changed in order to go from Fibonacci number to Catalan numbers. Most people would avoid this by creating a method that uses some sort of internal state to return the next Fibonacci number with each successive call, which improves portability and pluggability, but is not thread safe. This is, however, a start, and it can be formalized into the &#8220;stream&#8221; concept described in SICP.</p>
<p>A stream is simple: it is a sequence of values that are generated as requested (an instance of lazy evaluation). A filter is simply a function that takes a single value and returns a boolean indicating whether or not that value meets the requirements of the filter. It is then easy to build a stream filter, which takes a filter function and a stream, and returns a stream that is the result of applying the filter to every element of the stream. Recall, however, that streams are implemented with lazy evaluation, so the filter is not actually applied to the underlying stream until values are requested. This ensures low overhead in processing large streams where only the first few values are required: only those values will be computed. Naturally, filters can be stacked to produce a series of values that match a variety of different criteria, without compromising the integrity of any of the individual checks, and without giving care to the ultimate source of the data. A Fibonacci stream can easily be replaced with a Catalan stream by passing a different parameter into the prime filter stream. This achieves the modularity and independence of the first approach while maintaining the efficiency of the second approach: the first three values of the prime stream are requested before the appropriate conditions are met, which in turn only requests the first 6 values from the Fibonacci stream (4 values to find 2, 1 value to find 3, and 1 value to find 5). There are a lot of possibilities with filters. They can be constructed as simple functions, or as functors/closures with internal state, which allows for extended parameterization.</p>
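<p>As a rough illustration of the stream idea in C++ rather than Scheme, the sketch below models a stream as a callable that produces its next value on request. It assumes C++11 (<code>std::function</code> and lambdas); the names <code>fibonacci_stream</code>, <code>filter_stream</code>, and <code>take</code> are made up for this example and do not come from SICP. The optional <code>pulled</code> counter exists only to show how many values the filter actually requests from the source:</p>
<pre name="code" class="cpp">
#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;vector&gt;

// A "stream" here is just a callable that yields the next value on demand.
typedef std::function&lt;unsigned long()&gt; Stream;

// Fibonacci stream starting at 0: 0, 1, 1, 2, 3, 5, ...
// If `pulled` is non-null, it is incremented each time a value is requested.
Stream fibonacci_stream(std::size_t *pulled) {
	unsigned long a = 0, b = 1;
	return [a, b, pulled]() mutable -> unsigned long {
		unsigned long current = a;
		unsigned long next = a + b;
		a = b;
		b = next;
		if (pulled) ++*pulled;
		return current;
	};
}

// Trial-division primality test; adequate for the small values used here.
bool is_prime(unsigned long n) {
	if (n &lt; 2) return false;
	for (unsigned long d = 2; d * d &lt;= n; ++d)
		if (n % d == 0) return false;
	return true;
}

// Wraps a stream so that it yields only values accepted by `pred`.
// Nothing is pulled from `source` until the result is asked for a value.
// Caution: this loops forever if the source never produces a match.
Stream filter_stream(std::function&lt;bool(unsigned long)&gt; pred, Stream source) {
	return [pred, source]() mutable -> unsigned long {
		for (;;) {
			unsigned long v = source();
			if (pred(v)) return v;
		}
	};
}

// Requests the first n values of a stream.
std::vector&lt;unsigned long&gt; take(Stream s, std::size_t n) {
	std::vector&lt;unsigned long&gt; out;
	for (std::size_t i = 0; i &lt; n; ++i) out.push_back(s());
	return out;
}
</pre>
<p>With a <code>std::size_t pulled = 0;</code> in scope, <code>take(filter_stream(is_prime, fibonacci_stream(&amp;pulled)), 3)</code> yields 2, 3, and 5 and leaves <code>pulled</code> at 6, matching the count above. Switching to Catalan numbers would only mean passing a different source stream to <code>filter_stream</code>.</p>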
<h2>From Lisp to the Real World</h2>
<p>Now that filters have shown their theoretical promise in Lisp, it is time to take them to an imperative language that is commonly used to build applications: C, C++, or Java. This comes with a number of important changes: in Lisp, it is easy to implement lazy evaluation on a large scale; in C++, it is not; in Lisp, the primary data structure is a list; in C++, a number of data structures are going to be used. This means that the most reasonable definition for a stream and a stream filter must change. In C++, a stream can be any data structure, and a stream filter is any object or function that can take a filter predicate and a fully populated data structure, and return a data structure of the same type that contains all of the objects from the given structure that match the filter predicate. This could be extended to work with infinite streams or use some forms of lazy evaluation, but it has been much more common, in my experience, to be interested in extracting subsets of existing data.</p>
<p>We&#8217;ll start by declaring filters as objects, because this will enable us to take multiple filters and stack them on top of one another to create simple, yet flexible ways of matching multiple conditions as if with an AND clause. It would be possible to create filters that implement any logical operation on other filters (for example, finding items that match either of two filters (OR) or only one of two filters (XOR)), but we&#8217;ll concern ourselves with the most immediately useful (and therefore worth the most time investment—the rest is &#8220;interesting&#8221; from a theoretical standpoint, but until it comes up in practice, isn&#8217;t worth investigating as more than a conceptual notion). It is also possible to use templates/generics and other OO tricks to make filters work with any data structure via iterators, but we&#8217;ll start with a really simple type of filter: one that takes a vector of elements and returns a new vector of elements that match the filtering predicate. Templates will still be necessary, however, to ensure that the filter can interact with any type of object:</p>
<pre name="code" class="cpp">
namespace filters {

template &lt;typename Find&gt;
class Filter {

typedef Filter&lt;Find&gt; FILTER_TYPE;
typedef std::vector&lt;Find&gt; CONTAINER;
typedef typename CONTAINER::iterator ITER;

public:
	Filter(void) : filterFunc(NULL), subFilter(NULL) {}
	Filter(bool (*filterFunc)(Find item)) : filterFunc(filterFunc), subFilter(NULL) {}
	virtual ~Filter(void) {}	// virtual, because filters are subclassed and may be deleted through a base pointer

	void stack(FILTER_TYPE *f);
	bool test(Find item);
	CONTAINER *filter(CONTAINER *items);
	virtual bool operator()(Find item);
private:
	bool (*filterFunc)(Find item);
	FILTER_TYPE *subFilter;
};

};
</pre>
<p>In particular, this filter takes as a constructor parameter any one function that takes a single argument of the specified type (in practice, a pointer to an object) and returns a boolean indicating whether the object should be accepted by the filter. One additional feature is present, though: the filter object is actually a functor, which allows it not only to encode state information but also to serve as the predicate used to test items. This matters when implementing real filters, because they often represent comparisons to some standard value, while the filter definition calls for a test function with only one parameter (a consequence of the ability to curry variables through lambda functions in Lisp/Scheme). A plain external function cannot be used as such a filter, because the standard value would also have to be supplied to it. I&#8217;ll give an example of this as part of a practical filter demonstration.</p>
<p>Now the meat of the filter has to be filled in. Note that this is not a real filter! This is the mechanism by which all filters work. For completion&#8217;s sake, however, we&#8217;ll implement the functor nature of this filter to simply test that an item is not null. Also note that because this is a template definition, this code should actually replace the declaration lines above, within the class declaration, because it must be included with the header so that the compiler can generate the appropriate source code:</p>
<pre name="code" class="cpp">
void stack(FILTER_TYPE *f) {
	if (this->subFilter == NULL)
		this->subFilter = f;
	else
		this->subFilter->stack(f);
}
bool test(Find item) {
	bool status;
	if (this->filterFunc != NULL)
		status = (*this->filterFunc)(item);
	else
		status = (*this)(item);

	if (subFilter != NULL)
		status = status &#038;&#038; this->subFilter->test(item);	// short-circuit: skip sub-filters once one has failed
	return status;
}
CONTAINER *filter(CONTAINER *items) {
	CONTAINER *rtn = new CONTAINER;
	ITER iter;

	for (iter = items->begin(); iter != items->end(); iter++) {
		if (this->test(*iter))
			rtn->push_back(*iter);
	}

	return rtn;
}

virtual bool operator()(Find item) {
	if (item != NULL)
		return true;
	return false;
}
</pre>
<p>Overall this is pretty simple and implements all of the basic contracts that must be satisfied by a stream filter from Lisp. Note, however, that the filter function will suffer from the same performance issues as a naïve approach when trying to find the first n values that match some criteria. This is a different problem from normal filtering, and a subclass of Filter should be used to do this by overriding filter() to expect two parameters and only iterate until the desired number of elements has been found. The test() method is already designed to handle the situation by testing a value against all filters at once, rather than testing all values against a single filter, so it is unaffected by the change of strategy.</p>
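<p>To make the "first n matches" variant concrete, here is a minimal sketch written as a free function template rather than as the Filter subclass described above, so that it stands on its own. The name <code>filter_first_n</code> and the <code>is_even</code> predicate are hypothetical; the only point is that iteration stops as soon as enough matches have been collected:</p>
<pre name="code" class="cpp">
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Copies at most `limit` items accepted by `pred` into a new vector,
// and stops iterating as soon as the limit has been reached.
template &lt;typename T, typename Pred&gt;
std::vector&lt;T&gt; filter_first_n(const std::vector&lt;T&gt; &amp;items, Pred pred, std::size_t limit) {
	std::vector&lt;T&gt; matched;
	typename std::vector&lt;T&gt;::const_iterator it;

	for (it = items.begin(); it != items.end() &amp;&amp; matched.size() &lt; limit; ++it) {
		if (pred(*it))
			matched.push_back(*it);
	}

	return matched;
}

// Example predicate, for illustration only.
bool is_even(int n) { return n % 2 == 0; }
</pre>
<p>Given a vector holding 1 through 10, <code>filter_first_n(v, is_even, 2)</code> returns 2 and 4 and never looks at the elements after 4. Any callable with a compatible signature, including a stacked filter object, could be passed as <code>pred</code>.</p>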
<p>Finally, let me give a practical example. As part of a C++ library I&#8217;m developing for myself to use on some game-like projects, I&#8217;ve created a few filters to do useful searches for me. One of them is to locate all objects (positions represented by single points) within a specific distance from some reference point. This base class is generic enough that its only dependence is the Point class itself, but that should be simple enough to implement on your own:</p>
<pre name="code" class="cpp">
namespace filters {

using namespace geom;

template &lt;typename Find&gt;
class DistanceFilter : public Filter&lt;Find *&gt; {

typedef Point *(Find::*FindPointFunc)(void) const;

public:
	DistanceFilter(Point *p, double d, FindPointFunc getPosition, bool outer = false) :
			Filter&lt;Find *&gt;(), pFunc(getPosition), p(p), dist(d), invert(outer) {}
	~DistanceFilter(void) {}

	virtual bool operator()(Find *item) {
		Point *point = (item->*pFunc)();
		return (p->distance(point) < dist) ^ invert;
	}
private:
	FindPointFunc pFunc;
	Point *p;
	double dist;
	bool invert;
};

};
</pre>
<p>This particular class takes as state variables the reference point and the distance to test against. It also requires a boolean indicating whether the test is for less than or greater than the specified distance (i.e., whether the location of the object is inside or outside the circle of radius d). Finally, it takes a function pointer to the method that should be used to obtain the target object&#8217;s position. Most people would say that function pointers (and pointers in general) are quite dangerous, but I&#8217;m reasonably certain that in this case, because the compiler must expand all of the templates, it is able to remove the function pointer entirely. If not, then I&#8217;m sure that this could be rewritten in such a way that it would retain a generic application, but without the function pointer. The time investment in figuring this out seems wasted to me, because the solution I have works well for my purposes.</p>
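<p>For what it is worth, one possible way to drop the member function pointer is to make the accessor a template parameter, so that any callable returning a position can be plugged in. The sketch below is standalone rather than derived from Filter, and everything in it is an assumption made for illustration: the <code>Point</code> struct is a stand-in for the real geom::Point, and <code>DistancePredicate</code>, <code>Ship</code>, and <code>ShipPosition</code> are invented names:</p>
<pre name="code" class="cpp">
#include &lt;cmath&gt;

// Hypothetical stand-in for the article's geom::Point.
struct Point {
	double x, y;
	double distance(const Point &amp;other) const {
		double dx = x - other.x;
		double dy = y - other.y;
		return std::sqrt(dx * dx + dy * dy);
	}
};

// Distance test parameterized on an accessor type instead of a member
// function pointer. GetPosition is any callable that takes a const Find &amp;
// and returns a Point.
template &lt;typename Find, typename GetPosition&gt;
class DistancePredicate {
public:
	DistancePredicate(const Point &amp;p, double d, GetPosition getPosition, bool outer = false) :
			center(p), dist(d), getPos(getPosition), invert(outer) {}

	// True if the item lies inside the radius (or outside it, when inverted).
	bool operator()(const Find &amp;item) const {
		return (center.distance(getPos(item)) &lt; dist) != invert;
	}
private:
	Point center;
	double dist;
	GetPosition getPos;
	bool invert;
};

// Hypothetical game object and accessor, for illustration only.
struct Ship {
	Point position;
};

struct ShipPosition {
	Point operator()(const Ship &amp;s) const { return s.position; }
};
</pre>
<p>A <code>DistancePredicate&lt;Ship, ShipPosition&gt;</code> built with the origin and a radius of 10 accepts a ship at (1, 1) and rejects one at (30, 40); passing <code>true</code> as the last argument flips both answers. Because the accessor is an ordinary type known at compile time, the compiler is free to inline it, though that is not guaranteed.</p>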
<h2>From the Real World to PHP</h2>
<p>&#8220;That&#8217;s great,&#8221; you may be saying (if you read the section on C++), &#8220;but how does this apply to a situation involving a database abstraction layer? The DB already provides the data that matches the criteria that I want, so there is no point in doing additional filtering on the results, unless I&#8217;m doing something horribly wrong with my SQL queries!&#8221; And you would be correct. If your queries aren&#8217;t giving you exactly what you want, and it isn&#8217;t because your query would require complicated SQL as compared to some simple iterative parsing, I would agree that you should fix the queries, not introduce additional post-processing. But, what if you have a lot of different combinations of things to search for? There must be an easier way to specify combinations of searches than through explicitly written queries for each combination.</p>
<p>Imagine you&#8217;re implementing webmail using PHP. How many ways are there to search through emails? By sender, by subject, by date, by read status, etc. The list goes on. And what if you want an &#8220;advanced&#8221; search wherein you can specify multiple parameters at once? Attempting to write out queries for all possible combinations of search features would cause serious headaches: if there are five different things tagged to an email that one can search through, there are 2<sup>5</sup> = 32 ways to combine some, all, or none of those criteria into a search. Do you really want to write out 32 queries? The solution is to take the filtering concept described above and refactor it to work with databases. The only difference is that now, instead of acting on sets of data, the filter will produce a query to send to a database, in order to return the desired data set.</p>
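<p>Before moving to PHP, the core idea can be sketched in a few lines of C++: a single function that turns whatever criteria happen to be present into one query, so that all 32 combinations come from the same code path. This is only a sketch under stated assumptions. The function name <code>build_query</code>, the whitelist of allowed columns, and the use of <code>?</code> placeholders with a separate list of values to bind (instead of escaping values into the SQL string) are choices made for this example:</p>
<pre name="code" class="cpp">
#include &lt;map&gt;
#include &lt;set&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// SQL text with '?' placeholders, plus the values to bind, in order.
typedef std::pair&lt;std::string, std::vector&lt;std::string&gt; &gt; Query;

// Builds one SELECT from whichever criteria are present in `params`.
// Keys that are not in `allowed` are ignored, so column names never come
// straight from user input. `table` is assumed to be a trusted constant.
Query build_query(const std::string &amp;table,
		const std::set&lt;std::string&gt; &amp;allowed,
		const std::map&lt;std::string, std::string&gt; &amp;params) {
	std::string sql = "SELECT * FROM `" + table + "`";
	std::vector&lt;std::string&gt; values;
	std::map&lt;std::string, std::string&gt;::const_iterator it;

	for (it = params.begin(); it != params.end(); ++it) {
		if (allowed.count(it->first) == 0)
			continue;	// unknown column: skip it
		sql += values.empty() ? " WHERE " : " AND ";
		sql += "`" + it->first + "` = ?";
		values.push_back(it->second);
	}

	return Query(sql, values);
}
</pre>
<p>With <code>sender</code> and <code>date</code> supplied, the result is <code>SELECT * FROM `emails` WHERE `date` = ? AND `sender` = ?</code> (a <code>std::map</code> iterates its keys in sorted order), and with no criteria at all it degrades to a plain <code>SELECT * FROM `emails`</code>.</p>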
<p>In PHP, like in C++, a filter will be a class. Using knowledge about the structure of the table, it will create a way to easily specify the parameters to a query, provide security against SQL injections, and create combinatorial queries without any explicit query writing. Also, in keeping with the tradition of stream filters, these filters will be stackable, but that implementation is beyond the scope of this article (read: interesting but I haven&#8217;t done it yet). Due to limitations in PHP (resolved with late static bindings, added in 5.3), all filters will exist as objects with state, rather than only the ones that need state, even though state is not as commonly useful here. This has to be done in order to centralize some of the core functionality. Ideally, a query searching through email using a filter would look something like this:</p>
<pre name="code" class="php">
$params = array();
$params['sender'] = 'joe@smith.com';
$params['date'] = '20081117';	                 // YYYYMMDD
$filter = new EmailFilter();
$query = $filter->create_query($params);
$result = mysql_query($query);
while ($email = mysql_fetch_array($result)) {
        // ... Process here.
}
</pre>
<p>Options such as sender, date, subject, etc. could be added and removed at will, with the filter generating a correct query in any case, and, even more importantly, generating a reasonable, logical query. MySQL will be able to handle any correct, yet inane queries, but it would be nice if they resembled something that a human might create for debugging purposes, and possibly for optimization purposes (10% in parse time goes a long way if you&#8217;re hammering out thousands of queries). For more complicated queries, joins, etc. this becomes increasingly difficult, but a lot of queries that are commonly used are pretty simple.</p>
<p>In a stroke of luck, the filters are even easier to implement in PHP than they were in C++. 99% of the functionality is again in the base class, with the default behavior for derived classes specifying only the table that a query should use:</p>
<pre name="code" class="php">
abstract class Filter {
	protected $table = 'default';

	public function create_query($params) {
		$where = $this->decode_filter($params);
		$query = 'SELECT * FROM `'.$this->table.'` '.$where;
		return $query;
	}

	public function decode_filter($params) {
		$wheres = array();

		foreach ($params as $key => $name) {
			$wheres[] = '`'.mysql_real_escape_string($key).'` = "'.mysql_real_escape_string($name).'"';
		}

		if (!empty($wheres))
			return 'WHERE '.implode(' AND ', $wheres);
		return '';
	}
}

class EmailFilter extends Filter {
	public function __construct() {
		$this->table = 'emails';
	}
}
</pre>
<p>However, this implementation is really stupid. It does nothing with its data, and it provides none of the features I mentioned earlier, outside of basic processing of columns and match values. And, with things the way that they are, the EmailFilter class serves almost no purpose, as it provides a simple piece of configuration data that could actually exist as a parameter to create_query() and save a lot on processing overhead. A real filter should be able to handle more than just simple string data, and it should be capable of handling complex data types. The purpose of any library is to simplify the life of the user (the developer using the application), so if it is unable to provide functionality without lots of extra work on the programmer&#8217;s part, it is not worth using.</p>
<p>With that in mind, the base Filter class must be modified to become aware of the columns in some way, so that it can be smarter with data preparation, and it must provide hooks for subclasses to specify new columns and provide their own processing for those columns:</p>
<pre name="code" class="php">
abstract class Filter {
	protected $table = 'default';
	protected $special;
	protected $columns;
	protected $wheres;

	public function __construct() {
		$this->wheres = array();
		$this->special = array();
		$this->columns = array();
	}

	public function create_query($params) {
		$where = $this->decode_filter($params);
		$query = 'SELECT * FROM `'.$this->table.'` '.$where;
		return $query;
	}

	public function decode_filter($params) {
		$this->wheres = array();

		foreach ($params as $key => $value) {
			$this->process($key, $value);
		}

		if (!empty($this->wheres))
			return 'WHERE '.implode(' AND ', $this->wheres);
		return '';
	}

	public function process($key, $value) {
		if (!empty($this->special) &#038;&#038; in_array($key, array_keys($this->special))) {
			return call_user_func($this->special[$key]['callback'], $key, $value);
		}
		else if (is_array($value)) {
			$values = array();
			foreach ($value as $item) {
				$values[] = $this->handle_type($key, $item);
			}
			$this->wheres[] = '`'.mysql_real_escape_string($key).'` IN ('.implode(', ', $values).')';
		}
		else {
			$this->wheres[] = '`'.mysql_real_escape_string($key).'` = '.$this->handle_type($key, $value);
		}
	}

	protected function handle_type($key, $value) {
		if (isset($this->columns[$key]) &#038;&#038; $this->columns[$key]['type'] == 'int')
			return (int) $value;
		else
			return '"'.mysql_real_escape_string($value).'"';
	}
}

class EmailFilter extends Filter {
	public function __construct() {
		parent::__construct();	// initialize $wheres, $special, and $columns
		$this->table = 'emails';
		$this->special['recipient'] = array('callback' => array($this, 'process_recipient'));
	}

	public function process_recipient($key, $value) {
		$this->process('to', $value);
		$this->process('cc', $value);
		$this->process('bcc', $value);
	}
}
</pre>
<p>The end result is a very flexible class, Filter, that can accept data in a number of formats (literals or arrays of literals—objects are slightly more complicated to handle). Along with it is a very simple example &#8220;real&#8221; filter that implements one simplifying macro, &#8216;recipient,&#8217; which is used to populate values for the to, cc, and bcc fields of the email search with one phrase. The use of a child class is much more justified here, as it is now adding significant new functionality by providing a definition for a special key word processor that is not included with the normal output. Processing for object data types can be implemented readily in this way by manually taking the relevant object fields and passing them back up to a call to process. Remember, the end goal is ease of use for the developer who must use the filters to create SQL queries, and I think it is pretty easy to pass an array of values to the filter and have it implicitly iterate over it, processing each value according to the rules already specified:</p>
<pre name="code" class="php">
$filter = new EmailFilter();
$params = array('from' => array('john_smith@gmail.com', 'tj@monticello.org'), 'subject' => 'Hello');
echo $filter->create_query($params);
echo "\n";
$params = array('recipient' => array('t1@gmail.com', 't2@gmail.com'));
echo $filter->create_query($params);
</pre>
<p>Produces:</p>
<pre>
SELECT * FROM `emails` WHERE `from` IN ("john_smith@gmail.com", "tj@monticello.org") AND `subject` = "Hello"
SELECT * FROM `emails` WHERE `to` IN ("t1@gmail.com", "t2@gmail.com") AND `cc` IN ("t1@gmail.com", "t2@gmail.com") AND `bcc` IN ("t1@gmail.com", "t2@gmail.com")
</pre>
<h2>Conclusions</h2>
<p>Overall, the concept of a filter is an essential component of the database access and abstraction layer. I wouldn&#8217;t call this a design pattern (I hate that phrase), but more of an idiom. There&#8217;s no one filtering solution that&#8217;s correct for every situation, and they aren&#8217;t always the answer. If a handful of static queries are necessary, then that is what the code should do. If the code will be changing what it is looking for or creating complicated expressions with partial information, then some form of automated query generation based on filtering is essential. It abstracts the way that data is represented from the application programmer in the same way that functionality is abstracted away from the user interface developer, allowing him to focus more on how the data needs to be used than on how it should be obtained.</p>
<p>Don&#8217;t forget that filters do have a penalty associated with them. The processing time is never free, but it isn&#8217;t really significant, because of the scale on which filters operate (a handful of items to process, not thousands). In true CS spirit, though, because servers are getting faster, the important thing is development time, not absolute efficiency. Filters fit quite neatly into the rapid application development paradigm by reducing the amount of time spent creating and debugging queries (if the formulaic query is right once, when it creates a different query it&#8217;ll probably still be right unless you give it garbage). The O(n) claim of efficiency, of course, pertains to the C++ implementation of filters, where each item that is to be examined is checked O(n) times or fewer, if the filters are stacked properly. I could go on for pages about how to order them to try to maximize this type of efficiency, but it&#8217;s not the point here. Even in PHP, the generation of a query is still O(n), but it is a different type of penalty, because it is associated with string processing rather than comparisons.</p>
<p>Finally, remember that this is just a brief (?!) overview of the possibilities. I certainly haven&#8217;t explored all of the options, and there are a lot of different ways filters can be implemented, in both C++ and in PHP, and there are a number of things that I didn&#8217;t implement that may prove useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/12/14/filtering-getting-what-you-want-in-on/feed</wfw:commentRss>
		</item>
		<item>
		<title>In praise of Zend_Db</title>
		<link>http://ratherinsane.com/blog/2008/12/12/in-praise-of-zend_db</link>
		<comments>http://ratherinsane.com/blog/2008/12/12/in-praise-of-zend_db#comments</comments>
		<pubDate>Sat, 13 Dec 2008 03:27:39 +0000</pubDate>
		<dc:creator>Chris Beer</dc:creator>
		
		<category><![CDATA[Zend Framework]]></category>

		<category><![CDATA[php]]></category>

		<category><![CDATA[zend_db]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=115</guid>
		<description><![CDATA[Zend_Db is Zend Framework&#8217;s database abstraction layer and this week I learned of a poorly documented feature of Zend_Db::quote (while struggling to work around a Zend_Db bug).
It turns out that if you pass an array to quote(), it will escape each element of the array and return a comma separated string:

function example($id) {
   [...]]]></description>
			<content:encoded><![CDATA[<p>Zend_Db is Zend Framework&#8217;s database abstraction layer and this week I learned of a poorly documented feature of Zend_Db::quote (while struggling to work around a Zend_Db bug).</p>
<p>It turns out that if you pass an array to quote(), it will escape each element of the array and return a comma separated string:</p>
<pre name="code" class="php">
function example($id) {
    global $db;
    $db->query('SELECT * FROM `test` WHERE id IN (' . $db->quote($id, 'INTEGER') . ')');
}
</pre>
<p>This is great, because it means the application can treat a mixed parameter (either a value or an array of values) as identical and just let Zend_Db figure it all out.</p>
<p>The bug, by the way, is that if you are working with an array of strings of the form &#8220;ns:value&#8221;, quote each value individually (i.e., not as shown above), and use IN, the Mysqli adapter, at least, will try to evaluate :value as a named expression.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/12/12/in-praise-of-zend_db/feed</wfw:commentRss>
		</item>
		<item>
		<title>jQuery, OpenSearch and Autocomplete</title>
		<link>http://ratherinsane.com/blog/2008/11/20/jquery-opensearch-and-autocomplete</link>
		<comments>http://ratherinsane.com/blog/2008/11/20/jquery-opensearch-and-autocomplete#comments</comments>
		<pubDate>Thu, 20 Nov 2008 19:10:19 +0000</pubDate>
		<dc:creator>Chris Beer</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[autocomplete]]></category>

		<category><![CDATA[jQuery]]></category>

		<category><![CDATA[OpenSearch]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=106</guid>
		<description><![CDATA[Here&#8217;s a quick code snippet for making JQuery&#8217;s autocomplete ui element consume an OpenSearch resource:

    
    jQuery('#term').autocomplete('/tools/semanticweb/lcsh/lcsh_opensearch.php', {parse: opensearch});
    function opensearch(data) {
        data = eval(data);
        var parsed = [];

    [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a quick code snippet for making jQuery&#8217;s <a href="http://docs.jquery.com/UI/Autocomplete">autocomplete</a> UI element consume an <a href="http://www.opensearch.org/Home">OpenSearch</a> resource:</p>
<pre name="code" class="javascript">
    <script type="text/javascript">
    jQuery('#term').autocomplete('/tools/semanticweb/lcsh/lcsh_opensearch.php', {parse: opensearch});
    function opensearch(data) {
        data = eval(data);
        var parsed = [];

        for (var i=0; i < data[1].length; i++) {
            var row = jQuery.trim(data[1][i]);
            if (row) {
                parsed[parsed.length] = {
                    data: [row],
                    value: row,
                    result: row
                };
            }
        }
        return parsed;
    }      
    </script>
</pre>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/11/20/jquery-opensearch-and-autocomplete/feed</wfw:commentRss>
		</item>
		<item>
		<title>XSLT: The document() function</title>
		<link>http://ratherinsane.com/blog/2008/11/08/xslt-the-document-function</link>
		<comments>http://ratherinsane.com/blog/2008/11/08/xslt-the-document-function#comments</comments>
		<pubDate>Sat, 08 Nov 2008 12:51:59 +0000</pubDate>
		<dc:creator>Chris Beer</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=98</guid>
		<description><![CDATA[Earlier this morning, I was translating some PHP code (great for development) into XSLT (great for deployment). This particular PHP script intelligently merges a chunk of descriptive metadata with the technical and instantiation metadata of a media item. The first challenge was to fetch the descriptive metadata for the object. Fortunately, the designers of xslt, [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this morning, I was translating some PHP code (great for development) into XSLT (great for deployment). This particular PHP script intelligently merges a chunk of descriptive metadata with the technical and instantiation metadata of a media item. The first challenge was to fetch the descriptive metadata for the object. Fortunately, the designers of XSLT, in their wisdom, included the <em>document()</em> function, which allows the XSLT script to fetch external resources.</p>
<pre name="code" class="xml">
&lt;xsl:stylesheet xmlns:fedora-rels-ext="info:fedora/fedora-system:def/relations-external#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"&gt;
             &lt;xsl:output method="xml" omit-xml-declaration="yes" /&gt;

             &lt;xsl:template match="rdf:RDF"&gt;
                     &lt;xsl:apply-templates select="//fedora-rels-ext:isPartOf" /&gt;
             &lt;/xsl:template&gt;

             &lt;xsl:template match="//fedora-rels-ext:isPartOf"&gt;
                     &lt;xsl:variable name="pid" select="substring-after(@rdf:resource, '/')" /&gt;
                     &lt;xsl:copy-of select="document(concat('http://localhost:8080/fedora/get/', $pid, '/PBCore'))" /&gt;
             &lt;/xsl:template&gt;
     &lt;/xsl:stylesheet&gt;
</pre>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/11/08/xslt-the-document-function/feed</wfw:commentRss>
		</item>
		<item>
		<title>Federated/distributed digital repositories</title>
		<link>http://ratherinsane.com/blog/2008/10/26/federateddistributed-digital-repositories</link>
		<comments>http://ratherinsane.com/blog/2008/10/26/federateddistributed-digital-repositories#comments</comments>
		<pubDate>Sun, 26 Oct 2008 16:03:02 +0000</pubDate>
		<dc:creator>Chris Beer</dc:creator>
		
		<category><![CDATA[Digital Repositories]]></category>

		<category><![CDATA[bvault]]></category>

		<category><![CDATA[federated repositories]]></category>

		<category><![CDATA[Fedora]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=95</guid>
		<description><![CDATA[For the bVault project I am developing, one of our secondary goals is to create a replicable model for other digital media repositories. One of the ways we are pursuing this is to lay the foundations for an interface to a federated/distributed repository among other public broadcasters, which takes advantage of one of the architectural [...]]]></description>
			<content:encoded><![CDATA[<p>For the <a href="http://launchpad.net/bvault">bVault</a> project I am developing, one of our secondary goals is to create a replicable model for other digital media repositories. One of the ways we are pursuing this is to lay the foundations for an interface to a federated/distributed repository shared among public broadcasters. This takes advantage of one of the architectural features of public broadcasting in the US: the public broadcasting &#8220;network&#8221; is really a federation of individual stations that subscribe and contribute to a particular programming distribution service (PBS and NPR, among others).</p>
<p>A federated repository ultimately needs three things:</p>
<ol>
<li>A common API among the participating repositories,</li>
<li>A search index that covers all the repositories, and</li>
<li>A resolver to translate a search result back to the originating repository</li>
</ol>
<h3>Common API</h3>
<p>For bVault, the common API is the set of web services exposed by Fedora, and the metadata translation dissemination service behind that, which allows a client to receive a particular metadata format, regardless of the underlying schema. This is an important feature, because it allows individual repositories to use whichever metadata format is most natural to their needs, while seamlessly generating interoperable metadata.</p>
<h3>Search index</h3>
<p>The exact methods employed to generate a spanning search index are essentially arbitrary. Solr provides some <a href="http://wiki.apache.org/solr/DistributedSearch">distributed/sharded</a> search capabilities, but the index could also operate on a pub/sub model where repositories push content out to a master search index, or use a search-engine-style crawler that harvests each repository&#8217;s <a href="http://openarchives.org">OAI-PMH endpoints</a>. Because the search index is only loosely coupled to the rest of the system, the choice is ultimately an architectural decision rather than a technical one.</p>
<h3>Distributed Resolver</h3>
<p>Now that we have a way to discover items within a repository, the interface needs a way to extract the content from the origin. For this, we need a way to resolve a unique resource identifier (URI!) back to its source. Again, the method is somewhat arbitrary, but for this project, we elected to require unique namespaces for each repository (quite reasonable, considering the application).</p>
<p>To do this, I&#8217;ve slipped a namespace resolver into the client&#8217;s API call to allow the interface to act independently from the source of the content. For a simple API call, like listDatastreams, we have:</p>
<pre name="code" class="php">
public function listDatastreams($pid, $asOfDateTime = null) {
      return Fedora_Repository::get('API-A', $pid)->listDatastreams(array('pid' => $pid,
                    'asOfDateTime' => $asOfDateTime));
}
</pre>
<p>This requests the API-A binding appropriate to the current persistent identifier (pid):</p>
<pre name="code" class="php">
/**
  * Retrieves a Fedora Repository that can provide the $type endpoint for the PID/prefix $prefix
  *
  * @param string $type
  * @param string $prefix
  * @return Fedora_Repository
  */
static public function get($type, $prefix = '') {
     global $objManager;

     $arrRepository = $objManager->resolve($prefix);
     $objClient = new stdClass;

     if(count($arrRepository) == 1) {
           $objClient = $arrRepository[0]->getSoapClient($type);
     } else {
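           // array_rand() returns keys in their original order, so shuffle them
           // instead to spread requests across the available mirrors
           $arrKey = array_keys($arrRepository);
           shuffle($arrKey);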

           foreach($arrKey as $key) {
               $objClient = $arrRepository[$key]->getSoapClient($type);
               if($objClient !== false) {
                     break;
               }
           }
      }

      if($objClient instanceof SoapClient) {
            return $objClient;
      } else {
            return false;
      }
}
</pre>
</p>
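<p>The <em>resolve()</em> call is not shown above. As a rough illustration of the idea only (sketched in Java rather than PHP, and not taken from the bVault source), a resolver can key repositories on the namespace portion of the PID, which in Fedora is the part before the colon. The class name, method names, and URLs here are all hypothetical:</p>
<pre name="code" class="java">
import java.util.Properties;

public class NamespaceResolver {
    // Maps a PID namespace (e.g. "wgbh") to the base URL of the repository that owns it
    private final Properties endpoints = new Properties();

    public void register(String namespace, String baseUrl) {
        endpoints.setProperty(namespace, baseUrl);
    }

    // Fedora PIDs have the form "namespace:identifier"; everything before the colon is the namespace
    public static String namespaceOf(String pid) {
        int colon = pid.indexOf(':');
        return colon == -1 ? pid : pid.substring(0, colon);
    }

    // Returns the registered base URL, or null if no repository claims the namespace
    public String resolve(String pid) {
        return endpoints.getProperty(namespaceOf(pid));
    }

    public static void main(String[] args) {
        NamespaceResolver resolver = new NamespaceResolver();
        // Both endpoints below are made-up values for the example
        resolver.register("wgbh", "http://localhost:8080/fedora");
        resolver.register("example", "http://repo.example.org:8080/fedora");

        System.out.println(resolver.resolve("wgbh:100"));   // prints http://localhost:8080/fedora
        System.out.println(resolver.resolve("unknown:1"));  // prints null
    }
}
</pre>
<p>Because the interface only ever hands the resolver a PID, repositories can be added or moved by changing the registrations, without touching the calling code.</p>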
<p>Creating a distributed repository doesn&#8217;t cost much now, and if you design it right, you can benefit from the potential for redundancy and mirroring immediately, even before there is a federated network to tap into.
</p>
<p>The full source is available from the <a href="http://bazaar.launchpad.net/%7Echris-beer/bvault/wgbh/files/9?file_id=fedora-20080924210122-b7675owtu9oq690p-28">bVault Fedora PHP library</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/10/26/federateddistributed-digital-repositories/feed</wfw:commentRss>
		</item>
		<item>
		<title>Zend_Cache for Web Services</title>
		<link>http://ratherinsane.com/blog/2008/10/25/zend_cache-for-web-services</link>
		<comments>http://ratherinsane.com/blog/2008/10/25/zend_cache-for-web-services#comments</comments>
		<pubDate>Sat, 25 Oct 2008 17:38:07 +0000</pubDate>
		<dc:creator>Chris Beer</dc:creator>
		
		<category><![CDATA[Zend Framework]]></category>

		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[web services]]></category>

		<category><![CDATA[zend_cache]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=83</guid>
		<description><![CDATA[My current project involves a number of SOAP Web Services requests to retrieve information from our Fedora repository. To help minimize overhead from HTTP requests, I&#8217;m using Zend Framework&#8217;s Zend_Cache_Frontend_Class to wrap the whole Fedora/PHP interface class. Zend_Cache allows me to implement this style of caching with only a single line of code.

Our web services [...]]]></description>
			<content:encoded><![CDATA[<p>My current project involves a number of SOAP Web Services requests to retrieve information from our <a href="http://fedora-commons.org">Fedora repository</a>. To help minimize overhead from HTTP requests, I&#8217;m using Zend Framework&#8217;s <a href="http://framework.zend.com/manual/en/zend.cache.frontends.html#zend.cache.frontends.class">Zend_Cache_Frontend_Class</a> to wrap the whole Fedora/PHP interface class. Zend_Cache allows me to implement this style of caching with only a single line of code.</p>
<p>
Our web services consumer provides a couple of access methods that can be safely cached:</p>
<pre name="code" class="php">class Fedora_Object
{
/* .... */
        public function getDissemination($pid, $sDefPid, $methodName, $parameters, $asOfDateTime = null) {
                try {
                        return Fedora_Repository::get('API-A', $pid)-&gt;getDissemination(array('pid' =&gt; $pid,
                                                     'serviceDefinitionPid' =&gt; $sDefPid,
                                                     'methodName' =&gt; $methodName,
                                                     'parameters' =&gt; $parameters,
                                                     'asOfDateTime' =&gt; $asOfDateTime));
                } catch(SoapFault $s) {
                        return $s;
                }
        }
/* .... */
}</pre>
</p>
<p>
In the bootstrap file, instead of initializing the Fedora_Object class, I wrap it in a Zend_Cache instance:</p>
<pre name="code" class="php">$fedora = Zend_Cache::factory('Class', 'File', array('cached_entity' =&gt; new Fedora_Object(),
                          'cached_methods' =&gt; array('getObjectXML', 'getDatastreamDissemination', 'getDissemination'),
                           'cache_by_default' =&gt; false));</pre>
<p>This code tells Zend_Cache to cache only the specified <em>cached_methods</em> and pass everything else through. Easy.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/10/25/zend_cache-for-web-services/feed</wfw:commentRss>
		</item>
		<item>
		<title>Force Directed Positioning with PHP, SVG, and HTML</title>
		<link>http://ratherinsane.com/blog/2008/10/25/force-directed-positioning-with-php-svg-and-html</link>
		<comments>http://ratherinsane.com/blog/2008/10/25/force-directed-positioning-with-php-svg-and-html#comments</comments>
		<pubDate>Sat, 25 Oct 2008 15:12:45 +0000</pubDate>
		<dc:creator>Chris Beer</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[force directed graph]]></category>

		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=81</guid>
		<description><![CDATA[I recently released a proof-of-concept library for drawing force directed graphs with PHP. Right now, it is absurdly simple and has a few quirks, but it gets the job done. Its original purpose is for visualizing search results within a controlled vocabulary, although I can imagine other uses.
Right now, it will only properly function in [...]]]></description>
			<content:encoded><![CDATA[<p>I recently released a proof-of-concept library for drawing force directed graphs with PHP. Right now, it is absurdly simple and has a few quirks, but it gets the job done. Its original purpose is for visualizing search results within a controlled vocabulary, although I can imagine other uses.</p>
<p>Right now, it will only function properly in browsers that support the &lt;canvas&gt; tag, although that will change in a couple of weeks. I also hope to implement a few parsers to make it easier to create graphs (I&#8217;m thinking RDF and dot will be sufficient). Some performance enhancements are also needed before it can be considered production-ready.</p>
<p>You can download the files <a href="http://d-archive.com/~chris/graph.tar.gz">here</a> or see the library in context at the <a href="http://bazaar.launchpad.net/%7Echris-beer/bvault/wgbh/files/9?file_id=graph-20081024202937-ckq4r5lpe3lemvw2-1">bvault repository on launchpad</a>.</p>
<p>This work is inspired by the <a href="http://www.jsviz.org/blog/">JSViz library</a>, which renders force directed graphs on the client side using JavaScript.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/10/25/force-directed-positioning-with-php-svg-and-html/feed</wfw:commentRss>
		</item>
		<item>
		<title>vi and the mouse</title>
		<link>http://ratherinsane.com/blog/2008/08/22/vi-and-the-mouse</link>
		<comments>http://ratherinsane.com/blog/2008/08/22/vi-and-the-mouse#comments</comments>
		<pubDate>Sat, 23 Aug 2008 00:34:24 +0000</pubDate>
		<dc:creator>Greg</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[ssh]]></category>

		<category><![CDATA[vi]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=67</guid>
		<description><![CDATA[Some people probably consider this sacrilege, but I like to use a mouse with my vi sessions, because the scrolling facilities provided by a mouse wheel far outweigh anything that is possible with the keyboard. When working in a console though, perhaps through ssh, gvim is obviously not available, unless one is using X11 forwarding, [...]]]></description>
			<content:encoded><![CDATA[<p>Some people probably consider this sacrilege, but I like to use a mouse with my vi sessions, because the scrolling facilities provided by a mouse wheel far outweigh anything that is possible with the keyboard. When working in a console though, perhaps through ssh, gvim is obviously not available, unless one is using X11 forwarding, which requires more time than it merits on Windows. What most people don&#8217;t know, however, is that vi supports a console mouse, and ssh supports the transmission of mouse information and events. The best Windows SSH client, <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/">PuTTY</a>, also supports this feature.</p>
<p>It&#8217;s really simple to turn on mouse support in vi, and it requires no ssh client (or server, so far as I know) configuration:</p>
<pre>
:set mouse=a
</pre>
<p>If you want to enable the feature permanently and by default (which is a good idea when you may use it more than once&#8211;vi will print a message, but start normally if mouse support is unavailable), just add the line to ~/.vimrc.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/08/22/vi-and-the-mouse/feed</wfw:commentRss>
		</item>
		<item>
		<title>Reflection: The Next Generation of Pointer?</title>
		<link>http://ratherinsane.com/blog/2008/08/19/reflection-the-next-generation-of-pointer</link>
		<comments>http://ratherinsane.com/blog/2008/08/19/reflection-the-next-generation-of-pointer#comments</comments>
		<pubDate>Tue, 19 Aug 2008 21:42:22 +0000</pubDate>
		<dc:creator>Greg</dc:creator>
		
		<category><![CDATA[C++]]></category>

		<category><![CDATA[Computer Science]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[pointers]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[reflection]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=59</guid>
		<description><![CDATA[Recently, I discussed some basic usage of Java&#8217;s reflection capabilities and also how to build a framework similar to JavaBeans, etc. After starting a project in C++ and implementing several filters (article pending) using templates and function pointers, I started thinking: exactly how similar are pointers to reflection? What can I do with pointers that [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I discussed some basic usage of <a href="http://ratherinsane.com/blog/2008/06/28/using-javas-reflection-api-part-i">Java&#8217;s reflection capabilities</a> and also how to <a href="http://ratherinsane.com/blog/2008/08/02/using-javas-reflection-api-part-ii">build a framework</a> similar to JavaBeans, etc. After starting a project in C++ and implementing several filters (article pending) using templates and function pointers, I started thinking: exactly how similar are pointers to reflection? What can I do with pointers that I can&#8217;t do with reflection, and vice versa? Note that I&#8217;m only really considering pointers when used in the same context as reflection. This list ignores the other possibilities with pointers that make them a truly formidable language feature, one that Java unfortunately lacks (perhaps &#8220;fortunately,&#8221; in order to <a href="http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html">make the language easier</a>).</p>
<p><b>Common to both</b><br />
Pros:</p>
<ol>
<li>Apply to functions (methods) and data</li>
<li>Work with static and instance functions/data</li>
<li>References/Pointers to methods are independent of the instance they are used with</li>
<li>Have some form of type safety*</li>
<li>Follow the rules of polymorphism correctly, when used on an appropriate method.</li>
</ol>
<p>* In C++, this can be voluntarily violated. In Java, this is in the form of thrown exceptions.</p>
<p><b>Pointers</b><br />
Pros:</p>
<ol>
<li>Allow ultimate freedom</li>
</ol>
<p>Cons:</p>
<ol>
<li>Protection can be circumvented for more difficult usage</li>
<li>Require prior knowledge of the structure in question</li>
</ol>
<p><b>Reflection</b><br />
Pros:</p>
<ol>
<li>Allows for named access to unknown information</li>
<li>Allows for complete enumeration of available functions/fields</li>
</ol>
<p>Cons:</p>
<ol>
<li>Does not facilitate ease of access due to annoying oft-empty try/catch blocks and lots of empty code that must be repeated for each use</li>
<li>High overhead, not really a first-class language feature. It, like the String class, does things that are not possible with normally written Java, yet are not language features.</li>
<li>All reflected method invocations return Objects, not specific types.</li>
</ol>
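<p>As a minimal sketch of the two reflection pros listed above (named access and enumeration), here is a small self-contained Java example. The <em>Greeter</em> class and the <em>callByName()</em> helper are made up for illustration:</p>
<pre name="code" class="java">
import java.lang.reflect.Method;

public class ReflectionTour {
    // Hypothetical class to inspect; it stands in for any type unknown to the caller
    public static class Greeter {
        public String greet() { return "hello"; }
        public String shout() { return "HELLO"; }
    }

    // Look up a no-argument method by name at runtime and report failure instead of crashing
    public static String callByName(Object target, String name) {
        try {
            Method m = target.getClass().getMethod(name);
            return String.valueOf(m.invoke(target));
        } catch (ReflectiveOperationException e) {
            return "no such call: " + name;
        }
    }

    public static void main(String[] args) {
        // Enumerate the methods declared directly on Greeter
        for (Method m : Greeter.class.getDeclaredMethods()) {
            System.out.println(m.getName());
        }
        System.out.println(callByName(new Greeter(), "greet"));    // prints hello
        System.out.println(callByName(new Greeter(), "missing"));  // prints no such call: missing
    }
}
</pre>
<p>The order in which <em>getDeclaredMethods()</em> returns methods is not specified, so the listing may print the two names in either order.</p>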
<p>Really, reflection is the next step in the evolution of function pointers. A pointer is ultimately a pretty dumb tool: it can point at something and call it, but it cannot detect errors at runtime. Reflection expands on that functionality and allows much more careful examination: all methods can be dynamically enumerated, and try/catch blocks can be used to prevent bad calls from crashing the application with a segmentation fault. Obviously this comes at an expense, and it brings to mind a single question: is this really necessary?</p>
<p>The obvious answer from Dodgeball aside, I don&#8217;t think it is terribly necessary. Expanding the simple function pointer syntax from C, C++ added method pointers with a somewhat obscure syntax, yet one that is ultimately easier to use than Java&#8217;s reflection capabilities:</p>
<p>In C++:</p>
<pre name="code" class="cpp">
// Just typedef the pointer-to-member type so that it's really easy to use
typedef char *(SomeClass::*TypeNameHere)(void);

// Create a pointer to the toCharArray() method of SomeClass
TypeNameHere ptr = &#038;SomeClass::toCharArray;

// Print out the result of calling it on two instances
SomeClass *sc1 = new SomeClass(1);
SomeClass *sc2 = new SomeClass(2);
std::cout &lt;&lt; (sc1-&gt;*ptr)() &lt;&lt; std::endl &lt;&lt; (sc2-&gt;*ptr)() &lt;&lt; std::endl;
</pre>
<p>Something equivalent in Java:</p>
<pre name="code" class="java">
try {
	Method m = SomeClass.class.getMethod("toCharArray", null);
	SomeClass sc1 = new SomeClass(1);
	SomeClass sc2 = new SomeClass(2);
	System.out.println((String) m.invoke(sc1, null) + (String) m.invoke(sc2, null));
} catch (Exception e) {
	// We know the function exists, so there should be no exception
}
</pre>
<p>This looks like pretty much the same amount of code. However, the typedef line is written once in the header file that defines SomeClass, which means that it is never rewritten by the modules that use SomeClass. This is always desirable, when the recreation would be a carbon copy every time. The try/catch blocks, however, must always be included, or else the method that uses reflection must have a &#8220;throws&#8221; clause added to its declaration, which only has the effect of deferring where the try/catch block must be placed, not omitting it entirely. Thus the Java version has an additional cost of two extra lines in the source code + all of the generated exception handling code that must be created for the VM for <b>each instance of reflection</b>, even though it is almost a certainty that the method will exist in many cases, because the method is a part of the interface that the class must implement, and this is enforced through some other means (interface methods, methods declared in a parent class). </p>
<p>There is also the extra cost of going through the Method object to reach the method, both in terms of space and speed. In C++, the cost of using a function pointer is almost negligible: a plain function pointer is 4 bytes on a 32-bit system (a pointer to a member function can be somewhat larger, depending on the compiler), and a call through it uses essentially the same procedure as an ordinary method call. Beyond that, in certain instances (templates with function pointers), a function pointer can be used to simplify source code, but is removed entirely by the compiler. Because of the complete decoupling between compile time and run time provided by the reflection interface, this is simply not possible with Java. The one problem that could be solved is that a Method invocation returns a result of type Object, even if the underlying method that was called returns a more specific type. The result can easily be typecast, but that is more useless overhead code that must be typed out even though the return type of the method is <b>explicitly</b> known (isn&#8217;t that the entire point of a strongly-typed language?). Java&#8217;s generics and type erasure could mitigate this problem, but the lack of a typedef statement makes my fingers hurt whenever I encounter generics as well.</p>
<p>Method provides some flexibility, sure, but in every reasonable situation, all of these flexibility options add up to useless overhead. Extra source code, extra generated code, and useless runtime checks that are never failed, unless the programmer goes to great lengths to make them fail. But, if you want to do that, you should be using C++ anyway, because it makes that easy to do too.</p>
<p>Of course, reflection has some nice features on paper: when creating debuggers, it has a decided advantage, because it can be used to list the actual fields and methods available, but gdb and others have already found ways around this by processing the symbol information available in compiled libraries and executables. But in practice it has (at least in Java) a number of shortcomings that must be solved before it can functionally surpass its C++ counterpart and predecessor, the function pointer, as a useful language feature that simplifies and enhances code.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/08/19/reflection-the-next-generation-of-pointer/feed</wfw:commentRss>
		</item>
		<item>
		<title>Fedora, GSearch, Solr [UPDATED]</title>
		<link>http://ratherinsane.com/blog/2008/08/12/fedora-gsearch-solr</link>
		<comments>http://ratherinsane.com/blog/2008/08/12/fedora-gsearch-solr#comments</comments>
		<pubDate>Tue, 12 Aug 2008 20:59:09 +0000</pubDate>
		<dc:creator>Chris Beer</dc:creator>
		
		<category><![CDATA[Fedora]]></category>

		<category><![CDATA[GSearch]]></category>

		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://ratherinsane.com/blog/?p=57</guid>
		<description><![CDATA[I&#8217;m at the Fedora Commons&#8217; Red Island Repository Institute this week to learn more about the Fedora Commons repository to help in my digital repository work at WGBH. For the last week or so, I have been struggling with getting Fedora and Solr to play nice with each other, so we can do really interesting [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m at the Fedora Commons&#8217; Red Island Repository Institute this week to learn more about the <a href="http://fedora-commons.org">Fedora Commons repository</a> to help in my digital repository work at WGBH. For the last week or so, I have been struggling with getting Fedora and <a href="http://lucene.apache.org/solr">Solr</a> to play nice with each other, so we can do really interesting searches, faceted browsing, and more.</p>
<p>The secret, I have discovered, is to start with a pre-populated Solr index (I don&#8217;t know if this is strictly necessary, but it solved one of our major errors), and to run Solr from its included Jetty engine (again, possibly not necessary, but I don&#8217;t have the Java application experience to delve into complex configurations).</p>
<p>[UPDATE]</p>
<p>Here are some step-by-step directions to get GSearch and Fedora to play nice together:</p>
<pre>
###gsearch###
1) Download GSearch 2.1.1 and copy the fedoragsearch.war file to $TOMCAT_HOME/webapps
2) Restart tomcat to unpack WAR
3) configvalues.xml :
       Update soap.deploy.hostport, .user, and .pass
&lt;property name="soap.deploy.hostport" value="localhost:8080" /&gt;
&lt;property name="soap.deploy.user" value="fedoraAdmin" /&gt;
&lt;property name="soap.deploy.pass" value="fedora" /&gt;
       :29,31s/basic/solr/g
&lt;property name="default.config.path" location="${solr.config.path}" /&gt;
&lt;property name="default.config.prefix" value="${solr.config.prefix}" /&gt;
&lt;property name="default.index.1" value="${solr.index.1}" /&gt;
       Line 242: set solr.index.1.indexbase and .indexdir
&lt;property name="solr.index.1.indexbase" value="http://localhost:8080/solr" /&gt;
&lt;property name="solr.index.1.indexdir" value="${fedora.home}/solr_data/index" /&gt;

4) ant -f configvalues.xml configOnWebServer
5) cd $TOMCAT_HOME/webapps/fedoragsearch/WEB-INF/classes/
6) cp -R configBase/updater configDemoOnSolr
7) configDemoOnSolr/fedoragsearch.properties
       append: fedoragsearch.updaterNames = BasicUpdaters
8) Edit configDemoOnSolr/index/DemoOnSolr/demoFoxmlToSolr.xslt as appropriate. Copy this file to config/index/DemoOnSolr.

###SOLR###
9) Download solr 1.2 and unpack to $FEDORA_HOME/solr-1.2
10) mkdir $FEDORA_HOME/solr
11) cp -R solr-1.2/example/solr/* $FEDORA_HOME/solr
12) cp solr-1.2/dist/apache-solr-1.2.0.war $FEDORA_HOME/solr/solr.war
13) Edit conf/schema.xml to reflect the schema choices you made in demoFoxmlToSolr.xslt
14) Create $TOMCAT_HOME/conf/Catalina/localhost/solr.xml
  &lt;Context docBase="($FEDORA_HOME)/solr/solr.war" debug="1" crossContext="true"&gt;
      &lt;Environment name="solr/home" type="java.lang.String" value="($FEDORA_HOME)/solr" override="true" /&gt;
   &lt;/Context&gt;
15) Restart Tomcat to start solr
16) Use solr to initialize the indexes
    a) Create a compatible solr ingest xml file by running one of your foxml files through your demoFoxmlToSolr.xslt file (maybe not necessary?)
             e.g. : xsltproc -o $FEDORA_HOME/tmp.xml tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/DemoOnSolr/demoFoxmlToSolr.xslt data/objects/2008/0808/14/47/wgbh_100
    b) cp $FEDORA_HOME/solr-1.2/example/exampledocs/post.sh $FEDORA_HOME/solr
    c) Edit post.sh to change the URL (e.g. URL=http://localhost:8080/solr/update)
    d) ./post.sh $FEDORA_HOME/tmp.xml &#038;&#038; rm $FEDORA_HOME/tmp.xml

###DOES IT WORK?###
17) Go to http://localhost:8080/fedoragsearch/rest -> browseIndex; Does your object exist?
18) Check gsearch -> solr integration works [updateIndex fromPid]
19) Import your current foxml files with [updateIndex fromFoxmlFiles]
</pre>
<p>This still doesn&#8217;t take advantage of the JMS capabilities of Fedora 3.0, unfortunately; that&#8217;s the next challenge.</p>
]]></content:encoded>
			<wfw:commentRss>http://ratherinsane.com/blog/2008/08/12/fedora-gsearch-solr/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
