Don Box and Tennessee Williams
Jon Udell has joined the club of those wanting to expose remote XQuery over the Internet.
I have a feeling that Jon may not have read my security concerns over exposing raw XQuery (and XPath) over a public access point.
The reason I have this feeling is because it looks like Jon's engine has already melted down from too many //* queries (it's 11:36 PST and the site is effectively wedged).
When I was on the site earlier today, I did notice that Jon's engine was putting an upper bound on the size of the result set. Unfortunately, it looks as if it is not putting an upper bound on the amount of compute resources a given query can consume.
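Don's distinction between capping the size of the result set and capping the compute a query consumes can be made concrete. The following is a minimal sketch, not Jon's engine: it uses `xml.etree.ElementTree` as a stand-in for an XQuery engine, treats a full walk of the tree as the equivalent of a `//*` query, and enforces two separate budgets. The names `find_all`, `QueryBudgetExceeded`, `max_results`, and `max_steps` are hypothetical. A result cap alone only limits output; a query that visits many nodes while matching few still runs unbounded, which is what the step budget catches.

```python
import xml.etree.ElementTree as ET


class QueryBudgetExceeded(Exception):
    """Raised when a query exceeds one of its (hypothetical) resource budgets."""


def find_all(root, predicate, max_results=100, max_steps=10_000):
    """Walk every element under root (a //*-style traversal) and collect
    the ones matching predicate, enforcing separate caps on the number of
    results returned and the number of nodes visited."""
    results = []
    steps = 0
    for elem in root.iter():  # root plus all descendants, in document order
        steps += 1
        if steps > max_steps:
            # Compute budget: stop even if few results have matched so far.
            raise QueryBudgetExceeded(f"visited more than {max_steps} nodes")
        if predicate(elem):
            results.append(elem)
            if len(results) > max_results:
                # Result-set budget: the kind of cap Jon's engine already had.
                raise QueryBudgetExceeded(f"more than {max_results} results")
    return results


# Usage: 1001 <item> elements, only one of which matches.
doc = ET.fromstring("<feed>" + "<item/>" * 1000 + '<item kind="x"/></feed>')
try:
    # The result cap would never trigger here, but the step budget does.
    find_all(doc, lambda e: e.get("kind") == "x", max_steps=500)
except QueryBudgetExceeded as exc:
    print(exc)  # → visited more than 500 nodes
```

Checking a counter inside the loop is cooperative: it only works if the engine's inner loops consult the budget, which is why a hard external limit is often layered on top.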
When I tried a //*-style query earlier this afternoon, the HTTP infrastructure between my house and Jon's server wouldn't let a single HTTP request go that long without returning.
If it was my one query that sent Jon's server over the edge, I'm very sorry.
[Don Box's spoutlet: On the Kindness of Strangers]
Not to worry, Don. I'm aware of the concern, and part of this experiment is about exploring its implications. In fact, the queries that are timing out don't seem to be expensive at all. One possibility was my single-threaded use of Python's minimal BaseHTTPServer class.
So I switched from:
class myHTTPServer (BaseHTTPServer.HTTPServer):
To:
class myHTTPServer (SocketServer.ThreadingMixIn,
                    BaseHTTPServer.HTTPServer):
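On Python 3, where `BaseHTTPServer` and `SocketServer` became `http.server` and `socketserver`, the same switch looks roughly like the sketch below. The `SlowHandler` class and its `/slow` path are hypothetical, added only to show that one long-running request no longer blocks the others. (Python 3.7 and later also ship a ready-made `http.server.ThreadingHTTPServer`.)

```python
import socketserver
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class MyHTTPServer(socketserver.ThreadingMixIn, HTTPServer):
    # The mix-in comes first so its process_request() overrides the serial
    # one in the base class; each request then runs in its own thread.
    daemon_threads = True  # handler threads won't block interpreter exit


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /slow simulates an expensive query."""

    def do_GET(self):
        if self.path == "/slow":
            time.sleep(2)
        body = b"done\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet
```

To serve it, something like `MyHTTPServer(("127.0.0.1", 8000), SlowHandler).serve_forever()` would do. Threads keep a slow query from blocking cheap ones, but they do not bound how much CPU the slow query itself consumes.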
However, I think the problem may have been even more basic than that: failing to set Content-Length when reporting that a query has exceeded the maximum result-set size. We'll see how it goes now.
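Why a missing Content-Length header can stall a client: when the server keeps the connection open (for example with `protocol_version = "HTTP/1.1"`), a response that has neither Content-Length nor chunked encoding gives the client no way to tell where the body ends, so it may sit waiting for bytes that never arrive. Below is a minimal sketch of an error path that always sets the header. The `send_text` helper, the stub `run_query`, the cap of 100, and the choice of a 400 status are all assumptions, not what Jon's server actually does.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_query(path):
    """Hypothetical stand-in for the query engine: /big returns many hits."""
    count = 1000 if path == "/big" else 3
    return [f"item {i}" for i in range(count)]


def send_text(handler, status, text):
    """Send a complete plain-text response, always including Content-Length
    so the client knows exactly where the body ends."""
    body = text.encode("utf-8")
    handler.send_response(status)
    handler.send_header("Content-Type", "text/plain; charset=utf-8")
    handler.send_header("Content-Length", str(len(body)))
    handler.end_headers()
    handler.wfile.write(body)


class QueryHandler(BaseHTTPRequestHandler):
    # Persistent connections: the socket stays open between requests, so the
    # client depends on Content-Length to know a response is finished.
    protocol_version = "HTTP/1.1"
    MAX_RESULTS = 100  # hypothetical cap on result-set size

    def do_GET(self):
        results = run_query(self.path)
        if len(results) > self.MAX_RESULTS:
            send_text(self, HTTPStatus.BAD_REQUEST,
                      f"Query exceeded the maximum result-set size ({self.MAX_RESULTS}).\n")
        else:
            send_text(self, HTTPStatus.OK, "\n".join(results) + "\n")

    def log_message(self, format, *args):
        pass  # keep the demo quiet
```

Serving it with `HTTPServer(("127.0.0.1", 8000), QueryHandler).serve_forever()` lets a client issue several queries over one connection, including ones that hit the cap, without hanging on the error response.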
As an aside, I've added a canned query that finds blog items written using InfoPath, based on its unique HTML coding signature :-)
The general question of how to constrain an engine's use of resources when exposed to arbitrary queries is, of course, extremely interesting.
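One blunt answer to that question, sketched here under assumptions rather than as Jon's design, is to evaluate each query in a separate OS process and kill it once a wall-clock deadline passes. Unlike a budget checked inside the engine, this also stops a query stuck in code that never checks a counter. The `run_with_deadline` helper and the inline `CHILD` program are hypothetical; a real server would hand the query to its actual engine.

```python
import subprocess
import sys


def run_with_deadline(child_code, query, timeout_s=2.0):
    """Run child_code in a fresh Python process, feeding it the query on
    stdin. Return its stdout, or None if it missed the deadline."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", child_code],
            input=query,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry the child is killed and reaped
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


# Hypothetical stand-in for a query engine: echoes the query back,
# but spins forever on a query it treats as pathological.
CHILD = """
import sys
q = sys.stdin.read()
while q == "//*":
    pass
print("result for " + q)
"""
```

Spawning a process per query is expensive, so this trades throughput for isolation; it fits a public endpoint where a single runaway query must not wedge the whole site.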