Add-on:Parsedom for xbmc plugins
Testing/Status
Integration and unit tests are run continuously by Jenkins:
http://tc.tobiasussing.dk/jenkins/view/Common%20Functions/
Developers
This DOM parser is a fast replacement for Beautiful Soup.
The module also provides a few other useful functions.
Setup
To use the parsedom functions, edit your addon.xml like this.
<requires>
  <import addon="xbmc.python" version="2.0"/>
  <import addon="script.module.parsedom" version="0.9.0"/> <!-- Add this -->
</requires>
Then add the following to your .py file.
import xbmc, xbmcvfs, xbmcaddon
import CommonFunctions
common = CommonFunctions.CommonFunctions()
common.plugin = plugin # The name of your plugin; it prefixes the log lines
Debugging
To enable debugging set the following values in default.py
common.dbg = True # Default
common.dbglevel = 3 # Default
Whenever you debug your own code you should also debug in the cache. Otherwise you should remember to DISABLE it.
parseDOM(self, html, name = "", attrs = {}, ret = False)
- html(string or list) - String to parse, or list of strings to parse.
- name(string) - Element to match ( for instance "span" )
- attrs(dictionary) - Dictionary with attributes you want matched in the element ( for instance { "id": "span3", "class": "oneclass.*anotherclass", "attribute": "a random tag" } )
- ret(string or False) - Attribute in element to return value of. If not set(or False), returns content of DOM element.
Returns list
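The attrs example above ("oneclass.*anotherclass") suggests that attribute values are matched as regular expressions. The following is a minimal sketch of that idea under that assumption, not the module's actual implementation. attr_matches is a hypothetical helper, and the block is written to run on a plain Python interpreter outside XBMC.

```python
import re

def attr_matches(tag_html, attrs):
    """Return True if every pattern in attrs matches an attribute of the opening tag.

    Hypothetical helper for illustration only; parseDOM itself may work differently.
    """
    for key, pattern in attrs.items():
        # Look for key="value" or key='value', where the value is a regex pattern.
        regex = r"""\b%s\s*=\s*(["'])(?:%s)\1""" % (re.escape(key), pattern)
        if re.search(regex, tag_html) is None:
            return False
    return True

tag = '<span id="span3" class="oneclass x anotherclass">'
matched = attr_matches(tag, {"id": "span3", "class": "oneclass.*anotherclass"})
```

Because the values are patterns, characters such as "." or "?" in an attribute value would need escaping if they are meant literally.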
Getting element content.
link_html = "<a href='bla.html'>Link Test</a>"
ret = common.parseDOM(link_html, "a")
print repr(ret) # Prints ['Link Test']
Getting an element attribute.
link_html = "<a href='bla.html'>Link Test</a>"
ret = common.parseDOM(link_html, "a", ret = "href")
print repr(ret) # Prints ['bla.html']
Get element with matching attribute.
link_html = "<a href='bla1.html' id='link1'>Link Test1</a><a href='bla2.html' id='link2'>Link Test2</a><a href='bla3.html' id='link3'>Link Test3</a>"
ret1 = common.parseDOM(link_html, "a", attrs = { "id": "link1" }, ret = "href")
ret2 = common.parseDOM(link_html, "a", attrs = { "id": "link2" })
ret3 = common.parseDOM(link_html, "a", attrs = { "id": "link3" }, ret = "id")
print repr(ret1) # Prints ['bla1.html']
print repr(ret2) # Prints ['Link Test2']
print repr(ret3) # Prints ['link3']
When scraping sites it is prudent to scrape in steps, since real websites are often complicated.
Take this example where you want to get all the user uploads.
<div id="content">
  <div id="sidebar">
    <div id="latest">
      <a href="/video?8wxOVn99FTE">Miley Cyrus - When I Look At You</a><br />
      <a href="/video?46">Puppet theater</a><br />
      <a href="/video?98">VBLOG #42</a><br />
      <a href="/video?11">Fourth upload</a><br />
    </div>
  </div>
  <div id="user">
    <div id="uploads">
      <a href="/video?12">First upload</a><br />
      <a href="/video?23">Second upload</a><br />
      <a href="/video?34">Third upload</a><br />
      <a href="/video?41">Fourth upload</a><br />
    </div>
  </div>
</div>
The first step is to limit your search to the correct area.
Always find the innermost DOM element that contains the needed data.
ret = common.parseDOM(html, "div", attrs = { "id": "uploads" })
The variable ret now contains
['<a href="/video?12">First upload</a><br /> <a href="/video?23">Second upload</a><br /> <a href="/video?34">Third upload</a><br /> <a href="/video?41">Fourth upload</a><br />']
And now we get the video URLs.
videos = common.parseDOM(ret, "a", ret = "href")
print repr(videos) # Prints ['/video?12', '/video?23', '/video?34', '/video?41']
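The two steps above can be wrapped into one small helper. This is only a sketch: list_uploads is a hypothetical name, and the parser is passed in as an argument (for instance common.parseDOM) so that the block does not depend on XBMC being available.

```python
def list_uploads(parse_dom, html):
    """Return (title, url) pairs for the links inside the 'uploads' div.

    parse_dom is expected to behave like common.parseDOM (an assumption of this sketch).
    """
    # Step 1: narrow the search to the innermost element holding the data.
    uploads = parse_dom(html, "div", attrs={"id": "uploads"})
    # Step 2: read link targets and link texts from that fragment only.
    urls = parse_dom(uploads, "a", ret="href")
    titles = parse_dom(uploads, "a")
    return list(zip(titles, urls))
```

Inside a plugin it would be called as list_uploads(common.parseDOM, html). Narrowing first keeps the links in the "latest" sidebar out of the result.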
_fetchPage(self, dict)
WARNING: This call will be changed from "_fetchPage" to "fetchPage" in the future.
Fetches a page from the internet.
Returns a dict; its "status" entry is an int.
A status of 200 indicates success, and 500 an error.
The dict returned contains
- content: HTML content
- new_url: Redirect url
- header: Header information
result = common._fetchPage({"url": "http://www.example.com/index.html"})
if result["status"] == 200:
    print "content: %s" % result["content"]

result = common._fetchPage({"url": "http://www.example.com/doesnotexist.html"})
if result["status"] == 500:
    print "redirect url: %s" % result["new_url"]
    print "header: %s" % result["header"]
    print "content: %s" % result["content"]
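A plugin will usually want the page content or nothing. The sketch below wraps that check, assuming (as in the example above) that the result is a dict with "status" and "content" keys. fetch_content is a hypothetical helper, and the fetch function is passed in so the block runs without XBMC.

```python
def fetch_content(fetch_page, url):
    """Return the page content on success, or None on failure.

    fetch_page is expected to behave like common._fetchPage (an assumption).
    """
    result = fetch_page({"url": url})
    # 200 signals success; any other status is treated as an error here.
    if result["status"] == 200:
        return result["content"]
    return None
```

It would be called as fetch_content(common._fetchPage, "http://www.example.com/index.html"). Passing the function in also means only the call site changes if the method is renamed to fetchPage.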
log(self, str, level = 0)
Sends the string to the xbmc.log function if the level provided is less than the level set in dbglevel.
Returns None
import CommonFunctions
common = CommonFunctions.CommonFunctions()
common.plugin = "PluginName"
common.dbg = True
common.dbglevel = 3

def helloWorld():
    common.log("Ran this")
    common.log("Ignored this", 4)
    common.log("Ran this as well", 2)

helloWorld()
This gives the following output in xbmc.log:
[PluginName] helloWorld : 'Ran this'
[PluginName] helloWorld : 'Ran this as well'
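The filtering rule described above (a message is logged only when its level is less than dbglevel) can be sketched as a small predicate. should_log is a hypothetical name used for illustration; it is not part of the module.

```python
def should_log(level, dbg=True, dbglevel=3):
    """Return True when a message at this level would be passed on to the log."""
    # Debugging must be switched on, and the level must be below the threshold.
    return bool(dbg) and level < dbglevel

# The three calls from the helloWorld example, with dbglevel = 3:
decisions = [should_log(0), should_log(4), should_log(2)]
```

With the defaults, levels 0, 1 and 2 are logged and levels 3 and above are dropped.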
openFile(self, filepath, options = "w")
Opens a binary or text file handle.
Returns filehandle
file = common.openFile("myfile.txt", "wb")
file.write("my data")
file.close()
getUserInput raises an on-screen keyboard for user input.
Returns string
# Asks the user to enter an artist to search for.
search = common.getUserInput("Artist", "")
# Defaults to Miley Cyrus if the user doesn't enter another artist.
def_search = common.getUserInput("Artist", "Miley Cyrus")
Warning: This function will return a string in the future.
This function raises a numeric keypad for user input.
Returns int
# Asks the user to enter a pin.
pin = common.getUserInput("Userpin", "")
# Defaults to 1234 if the user doesn't enter another pin.
def_pin = common.getUserInput("Userpin", "1234")
getParameters(self, dict)
Converts the request url passed on by xbmc to the plugin into a dict of key-value pairs
Returns dict
import sys
# sys.argv[2] would be something like "?path=/root/favorites&login=true"
params = common.getParameters(sys.argv[2])
print repr(params) # Prints {'path': '/root/favorites', 'login': 'true'} (key order may vary)
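For comparison, the same conversion can be sketched with the standard library alone. This is not how getParameters is implemented. parse_plugin_params is a hypothetical name, and unlike what is documented above, parse_qsl also decodes percent-encoded characters.

```python
try:
    from urllib.parse import parse_qsl  # Python 3
except ImportError:
    from urlparse import parse_qsl  # Python 2

def parse_plugin_params(query):
    """Turn '?key=value&key2=value2' into a dict of strings."""
    # Drop the leading '?', then split into (key, value) pairs.
    return dict(parse_qsl(query.lstrip("?")))

params = parse_plugin_params("?path=/root/favorites&login=true")
```

All values come back as strings, so a flag such as login has to be compared with "true" rather than tested as a boolean.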
replaceHtmlCodes(self, str)
WARNING: This call will be changed from "replaceHtmlCodes" to "replaceHTMLCodes" in the future.
Replaces html codes with ascii.
Returns string
clean_string = common.replaceHtmlCodes("&amp;&quot;&hellip;&gt;&lt;&#39;")
print clean_string # Prints &"...><'
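The standard library offers a comparable conversion, sketched here for reference. It is not the module's implementation and its output differs in places: for example, it turns the &hellip; entity into a single Unicode ellipsis character rather than three dots.

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2
    unescape = HTMLParser().unescape

# Entities are replaced by the characters they stand for.
decoded = unescape("&amp;&quot;&gt;&lt;&#39;")
```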
stripTags(self, str)
Removes all DOM elements.
Returns string
clean_string = common.stripTags("I want this text <img src= alt='without this'>")
print clean_string # Prints "I want this text"
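A rough equivalent can be sketched with a regular expression. strip_tags_sketch is a hypothetical stand-in, not the module's code, and the trailing whitespace trim is an assumption made so that it reproduces the output shown above.

```python
import re

def strip_tags_sketch(text):
    """Remove anything that looks like <...> and trim the leftover whitespace."""
    return re.sub(r"<[^>]*>", "", text).strip()

clean = strip_tags_sketch("I want this text <img src= alt='without this'>")
```

A pattern this simple would be confused by a ">" inside an attribute value, so it only suits markup that is known to be well behaved.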
makeAscii(self, str)
This function implements a horrible hack to work around Python 2.4's terrible unicode handling.
Returns string
clean_string = common.makeAscii("test נלה מהי test")
print clean_string # Prints "test test"
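On a current Python interpreter the same effect can be approximated by encoding to ASCII and ignoring whatever does not fit. This is a sketch of the idea, not the hack the module uses. make_ascii_sketch is a hypothetical name, and it expects a unicode string.

```python
def make_ascii_sketch(text):
    """Drop every character that cannot be represented in ASCII."""
    # 'ignore' silently skips characters outside the ASCII range.
    return text.encode("ascii", "ignore").decode("ascii")

clean = make_ascii_sketch(u"test \u05e0\u05dc\u05d4 test")
```

Only the non-ASCII characters are removed. The spaces around them are kept, so the result may contain doubled spaces.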