Modules:

pykhtml

class Browser (inherits object)

A Browser is the main class you use to navigate around and visit different pages. See Browser.load and Browser.document for basic use.

__init__(self)

Create a new Browser.

document (read-only property)

Get a reference to the document (see dom.Document) for the currently loaded page. It contains all the tasty methods for walking the DOM tree like getElementById/getElementsByTagName, and methods for browsing to other linked pages.

eval(self, script, this=None)

Evaluate a piece of JavaScript. The 'this' parameter, if specified, must be of type dom.Node; it is the DOM node used when the JavaScript refers to 'this'. The return value should be cast to the appropriate Python type. If not, it will be a Qt QVariant.

load(self, uri, callback)

Load a webpage in the browser. It takes as parameters the URI of the page to load, and a callable object to call when the page has loaded. This callback will be given a reference to the browser object unless you set Browser.passReferenceToCallbacks to False.

location (property)

Browse to a new location. You probably don't want to set this directly as you'll receive no notification when the page has loaded. Have a look at Browser.load instead.

onAlert(self, s)

Set this to any callable that you want to receive JavaScript alert messages. It will be called passing the message. The default implementation does nothing.

onConfirm(self, message)

Set this to any callable that you want to receive JavaScript confirm messages. It will be called passing the message. Return True or False from your callable accordingly.

onNextLoad (property)

If you're going to do something that will indirectly cause PyKHTML to browse to a new page, and you want a function to be called when that page has loaded, set onNextLoad to the function.

onPrompt(self, message, defaultText)

Set this to any callable that you want to receive JavaScript prompt messages. It will be called passing the message and the defaultText (if specified; None otherwise). Return the text you would like to be passed back to the JS interpreter or None for JavaScript null. The default implementation just returns None.
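
As a rough illustration of the onConfirm and onPrompt contracts described above, the sketch below defines two plain Python callables with the documented signatures. The handler names and the "refuse anything mentioning delete" policy are made up for the example; only the parameter lists and return conventions come from this reference.

```python
# Hypothetical handlers; the names and policies are illustrative only.

def confirm_handler(message):
    # Return True to accept the confirm() dialog, False to cancel it.
    # Here we cancel any dialog whose text mentions deletion.
    return "delete" not in message.lower()


def prompt_handler(message, defaultText):
    # Return the text to hand back to the JS interpreter.
    # defaultText is None when the page gave no default, and returning
    # None is passed back as JavaScript null.
    return defaultText


# With a real Browser instance these would be attached as attributes:
#   browser.onConfirm = confirm_handler
#   browser.onPrompt = prompt_handler
```

Because the handlers are ordinary callables, they can be exercised directly without a running browser.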

passReferenceToCallbacks (property)

Set whether callbacks passed to functions such as Browser.load or dom.Document.visit will have a reference to this browser object passed as a parameter. Default is True.

screenshot(self, fileName, callback, width=800, format=None, quality=None)

Take a screenshot of the current webpage and save it to the given file name. Once the screenshot has been taken and saved, the given callback parameter will be called. You can specify the width (the default is 800) to resize the page to. File type will be determined by extension or by the optional format parameter (one of "PNG", "BMP", "XBM", "XPM", or "JPG"). You can also specify the optional quality parameter, a value from 1-100 (leave as None for default values).

setHtml(self, source, url=None)

Set the HTML of the browser. Parses the HTML and generates the DOM tree so you can navigate it as usual. As well as the `source` parameter, a `url` parameter allows you to specify a URL with which this source code is linked so that, e.g., any scripts/images referenced in the HTML will be found.

class partial

Partial application of parameters. This is used internally but is also very useful with Browser.load as it allows you to pass data to other functions.
Usage is as follows:

>>> def func(a, b):
...     print "func:", a, b
...
>>> func2 = pykhtml.partial(func, "foo")
>>> func2("bar!")
func: foo bar!

__init__(self, func, *args, **kwargs)

Create a new functor that – when called – will call the given function, passing any extra arguments / keyword-arguments that you specify.
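
To make the idea of such a functor concrete, here is a minimal sketch of how a partial-application class can be written. It is not the pykhtml.partial source: the class name Partial, the helper describe, and the rule that call-time keyword arguments override stored ones are assumptions for illustration. Putting the stored positional arguments first matches the func2 example above.

```python
class Partial(object):
    """Illustrative stand-in; not the actual pykhtml.partial implementation."""

    def __init__(self, func, *args, **kwargs):
        # Remember the function and the arguments to pre-fill.
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # Stored positional arguments come first, then call-time ones.
        # Call-time keywords override stored keywords (an assumption).
        merged = dict(self.kwargs)
        merged.update(kwargs)
        return self.func(*(self.args + args), **merged)


def describe(prefix, value, sep=": "):
    # Hypothetical helper used only to demonstrate the functor.
    return prefix + sep + value


bound = Partial(describe, "title", sep=" = ")
result = bound("Home")  # describe("title", "Home", sep=" = ")
```

The same pattern is what makes it possible to hand extra data to a callback that will later be invoked with only the arguments the caller supplies.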

startEventLoop()

Start the PyKHTML event loop. PyKHTML works with an asynchronous callback mechanism, a little like Twisted. Calls to open a new webpage aren't synchronous, unlike those made with urllib, for example.
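
The asynchronous style can be easier to follow with a toy model. The sketch below is not PyKHTML code and makes no claim about how its event loop is built: ToyLoop, fake_load and on_loaded are hypothetical names, and unlike a real event loop this one simply exits when its queue is empty. It only shows the ordering that matters: the load call returns immediately, and the callback fires later, once the loop is running.

```python
from collections import deque


class ToyLoop(object):
    """Toy event loop for illustration only."""

    def __init__(self):
        self.pending = deque()
        self.running = False

    def call_later(self, func, *args):
        # Queue the work instead of running it right away.
        self.pending.append((func, args))

    def start(self):
        # Run queued callbacks until stopped or out of work.
        self.running = True
        while self.running and self.pending:
            func, args = self.pending.popleft()
            func(*args)

    def stop(self):
        self.running = False


loop = ToyLoop()
events = []


def fake_load(uri, callback):
    # Returns at once; the callback is only queued here.
    events.append("requested " + uri)
    loop.call_later(callback, uri)


def on_loaded(uri):
    events.append("loaded " + uri)
    loop.stop()


fake_load("http://example.com/", on_loaded)
events.append("load returned")
loop.start()
```

After this runs, events records the request first, then the return from the load call, and only then the callback.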

stopEventLoop()

Stop the event loop and hence exit the scraper.

timer(time, func)

Call the given function after the allotted time. The PyKHTML event loop needs to be running.

init(display=1, registerExceptionHandler=True, _sleep=1, _stopKWallet=True, _supressQtDebug=True, _stopCookieDialogs=True)

Initialise the system if necessary (start Xvfb if it's not running, connect to it, start our program instance). This is called automatically when you create a Browser instance, so you shouldn't have to worry about it unless you want to change the values of some of the arguments. You can specify use of a certain X display by setting the `display` parameter, and can stop pykhtml registering its exception handler (the excepthook function) by setting `registerExceptionHandler` to False.

excepthook(type, value, trace)

Our exception hook that prints out the traceback, powers down the pykhtml engine, and then exits.

pathSearch(name)

Utility function to search for and get the full path of a file in $PATH.
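
For readers unfamiliar with this kind of lookup, a $PATH search can be sketched as follows. This is an independent illustration, not the pathSearch source: the function name path_search_sketch, the optional path argument, and returning None when nothing is found are all assumptions, since this reference does not say what pathSearch does on a miss.

```python
import os


def path_search_sketch(name, path=None):
    """Return the full path of `name` on the search path, or None.

    Hypothetical stand-in for illustration; returning None on a miss
    is an assumption, not documented pathSearch behaviour.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    # $PATH is a list of directories joined by os.pathsep (":" on Unix).
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Called with only a file name, the sketch searches the directories in the current $PATH in order and returns the first match.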

stopEventLoopImmediately()

Stops the event loop immediately, ignoring any pending processing.