Linux-Script To Find Not Scraped Movies

This script offers Linux users a convenient way to check which media items have not yet been added to the library. It scans the library (video only at the moment) and creates three lists: a) the paths that are used by XBMC, b) all files that are already in the library and c) the movie files that exist in those paths. Lists b) and c) are then compared, and the difference between them is, more or less, the list of files that are not in the library.
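The comparison of lists b) and c) can be illustrated with a minimal sketch. It is only an illustration: it uses comm on two sorted lists instead of the diff pipeline the script itself uses, and all paths and file names are made up.

```shell
#!/bin/sh
# Hypothetical sample data; nothing here touches a real XBMC database.
export LC_ALL=C            # make sort and comm agree on the ordering
workdir=$(mktemp -d)

# List b): files the library already knows about (sorted).
printf '%s\n' /media/a.avi /media/b.mkv /media/gone.avi | sort > "$workdir/db_files.lst"
# List c): movie files that actually exist on disk (sorted).
printf '%s\n' /media/a.avi /media/b.mkv /media/new.mp4 | sort > "$workdir/find.lst"

# Lines that appear only in the file system list: on disk, not in the library.
fs_only=$(comm -13 "$workdir/db_files.lst" "$workdir/find.lst")
# Lines that appear only in the database list: in the library, not on disk.
db_only=$(comm -23 "$workdir/db_files.lst" "$workdir/find.lst")

echo "not in library: $fs_only"
echo "not on disk:    $db_only"
rm -r "$workdir"
```

Both input lists must be sorted the same way, which is why the sketch fixes the locale before sorting.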

Because of the way the file lists are gathered, the script must be run on the computer on which XBMC is installed. If you don't know how to get access to a command line on the XBMC machine, search this website or the internet. Useful keywords: bash, ssh, telnet, shell

Future versions of this script might make it possible to embed it in XBMC's user interface and call it directly, e.g. from the script menu.

Warning
The script does not change the database at all; it only queries it. The only files that are written are the ones described below. Nevertheless, there might be conditions under which the script does not behave as intended, and in that case it might do anything. Therefore please check all settings carefully (most of them are described below) and make sure that all required programs are installed and working. Because the output files are deleted first and then written again, you should also carefully check the path and file names of the output files. If files with these names already exist and don't "belong" to this script, they will be deleted and their content will be lost.

Again: check all settings carefully!

Furthermore, this script is work in progress. I'll try to improve it and will also try to include some more specific database queries. If you have any suggestions, let me know. Also do so if you experience unwanted behaviour or don't understand the results of the script.

Settings
Within the settings there are the following variables that you are urged to look at and change to values that fit your environment:


 * DBPATH points to the video-database (the file itself, not just the directory)
 * PREFIX sets a prefix that is used for all files created while the script runs. It may consist of an absolute or relative path and a filename prefix. A good choice is something like your home folder or a temporary folder, plus a filename prefix. Any folder given here must already exist.
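Both settings can be checked before the script is run. The following is a minimal sketch, assuming a helper named check_settings that is not part of the script; the two values in the usage line are the defaults from the source code and will probably differ on your system.

```shell
#!/bin/sh
# check_settings is a hypothetical helper, not part of the script itself.
# It succeeds if DBPATH is a readable file and the folder part of PREFIX exists.
check_settings() {
    dbpath=$1
    prefix=$2
    if [ ! -f "$dbpath" ] || [ ! -r "$dbpath" ]; then
        echo "DBPATH is not a readable file: $dbpath" >&2
        return 1
    fi
    # PREFIX is a folder plus a filename prefix, so only the folder is tested.
    # Appending a character makes a trailing slash (folder only) work as well.
    prefix_dir=$(dirname "${prefix}x")
    if [ ! -d "$prefix_dir" ]; then
        echo "folder for PREFIX does not exist: $prefix_dir" >&2
        return 1
    fi
    return 0
}

# Usage with the default values from the script's settings.
check_settings "/home/xbmc/.xbmc/userdata/Database/MyVideos34.db" "/home/xbmc/xbmc_" \
    || echo "please adjust DBPATH and PREFIX first"
```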

All required programs are usually installed already. If not, install the corresponding packages and, if needed, adjust the entry for the command. The command most likely to be missing is sqlite3. The script might also work with older versions of the sqlite command line tool, but don't rely on this: the database that XBMC creates and uses has version 3, and a short test with sqlite 2.8.17 resulted in an error while reading the database. So only change this value if the sqlite tool you point it to has an appropriate version.
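Whether the required programs are available can be tested up front. This is a rough sketch, assuming a helper named have_cmd that is not part of the script; the command names are the defaults from the script's settings.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper, not part of the script: it succeeds
# if the shell can find the given command.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# The programs the script calls with its default settings.
missing=""
for cmd in sqlite3 find sort grep rm uniq diff; do
    have_cmd "$cmd" || missing="$missing $cmd"
done

if [ -n "$missing" ]; then
    echo "not found:$missing" >&2
fi

# The database needs version 3 of the sqlite tool, so print what is installed.
if have_cmd sqlite3; then
    sqlite3 -version
fi
```

If sqlite3 is reported as missing, install the package that provides it, or point SQLITECMD to a suitable binary.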

Limitations
This script does not check whether an entry in the library contains useful information. There might be entries that only contain the filename and a bookmark, for example. These entries should also be considered "missing in the library", but it's quite difficult to distinguish between items that were scraped without getting useful information and items that didn't get any result at all and are in the library for a different reason. For the conditions under which an entry is added to the library, consult the documentation.

Results
The script produces three files (each preceded by the prefix, if set):


 * db-only.lst

This file contains entries that are in the library but not in the file system. If you want to get rid of them, a library clean-up (within XBMC) should usually do the job.


 * fs-only.lst

This is the most relevant file: the files listed in it exist in the file system but not in the library. There may be different reasons why they are not (yet) scraped, but at least now you know which files your library is missing.


 * db-stacked.lst

This file shows which entries in the library are stacked entries, i.e. entries that are considered a single media item but consist of multiple files.

Four more files are written while the script runs and deleted afterwards. They are also affected by the PREFIX setting.
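Judging from the source code further down, the lines in fs-only.lst and db-only.lst still carry the leading - or + of the diff output they were taken from. A small sketch of how such a list could be cleaned up and counted, using invented sample content in place of a real result file:

```shell
#!/bin/sh
# Sample content standing in for a real fs-only.lst; the paths are made up.
list=$(mktemp)
printf '%s\n' '-/media/movies/new one.mkv' '-/media/movies/other.avi' > "$list"

# Drop the leading diff marker so that only the plain paths remain.
clean=$(sed 's/^[-+]//' "$list")
# Count the entries (every line is one file).
count=$(grep -c '' "$list")

echo "$count files are missing in the library:"
echo "$clean"
rm "$list"
```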

Execution
All you need to do is copy the code into an empty text file, adjust the settings to your needs, save the file and run it with the following command, followed by the name of the file:

sh

You can also set the executable bit on the file (chmod +x, likewise followed by the file name) and run it directly, whichever you prefer.
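Both ways of starting the script are sketched below. The file name xbmc-orphans.sh is made up, and a one-line stand-in is written in place of the real script so that the example runs on its own.

```shell
#!/bin/sh
# "xbmc-orphans.sh" is a hypothetical name; use the name you saved the script under.
dir=$(mktemp -d)
script="$dir/xbmc-orphans.sh"
# Stand-in for the real script, just to have something to run.
printf '%s\n' '#!/bin/sh' 'echo "script ran"' > "$script"

# Variant 1: pass the file to the shell.
out1=$(sh "$script")

# Variant 2: set the executable bit once, then call the file directly.
chmod +x "$script"
out2=$("$script")

echo "$out1 / $out2"
rm -r "$dir"
```

The real script names bash in its first line, so on systems where sh is a different shell it may be safer to start it with bash or to use the second variant.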

Alternatives And Supplementals
Alternatively you might want to look at the log file of the last XBMC run(s). It contains some information that might be useful for detecting unsuccessfully scraped items, but it is not obvious which entries are relevant; at least, I didn't find much useful information. If you know what you are looking for and want to dig deeper into the reasons why a particular item isn't in the library, the log file will help.

Additionally there are several tools to work on the database itself which might help you in improving the stored information. You might have a look at the "XBMC Web Media Manager": http://forum.xbmc.org/showthread.php?t=60643 or scroll down the subforum with "Supplemental Tools": http://forum.xbmc.org/forumdisplay.php?f=116

Sourcecode

#!/bin/bash

# XBMC Orphans and Widows
# v1.1
# created by BaerMan for XBMC-community
# includes improvements from deathinator

# This script may be used for any purposes.
# You may change, sell, print or even sing it
# but you have to use it at your own risk!
# This script is ugly and may under certain circumstances crash your
# computer, kill your cat and/or drink your beer.
# Use it at your own risk!

# This script searches for media files (actually video files only) and
# checks for
# 1) files that are not in the library
# 2) files that are in library only
# 3) entries in the library that are 'stacked' ones

# To Do:
# * examine whether a path is marked as defined content or excluded from
#   scanning (strContent=None)
# * rewrite the whole thing in python

# Discussion and latest version:
# http://forum.xbmc.org/showthread.php?t=62058
# http://wiki.xbmc.org/?title=Linux-Script_To_Find_Not_Scraped_Movies

### Settings ###

DBPATH="/home/xbmc/.xbmc/userdata/Database/MyVideos34.db"
# Full path to the video-database ; may be absolute (preceded by a
# slash "/") or relative from the current directory

PREFIX="/home/xbmc/xbmc_"
DBPATHLIST="${PREFIX}db_path.lst"
DBFILESLIST="${PREFIX}db_files.lst"
FINDLIST="${PREFIX}find.lst"
DIFFLIST="${PREFIX}diff.lst"
DBONLYLIST="${PREFIX}db-only.lst"
FSONLYLIST="${PREFIX}fs-only.lst"
STACKEDLIST="${PREFIX}db-stacked.lst"
# Filenames for results and intermediate data
# You may change these to any name and place you like but beware not to
# overwrite or delete files you may still need

SQLITECMD="sqlite3"
FINDCMD="find"
SORTCMD="sort"
GREPCMD="grep"
RMCMD="rm"
UNIQCMD="uniq"
DIFFCMD="diff -a -b -B -U 0 -d"
# Programs used ; either absolute path or command only if path to the
# binary is in variable $PATH ; each command may be extended by optional
# arguments - refer to the specific manpage for details

### Changes within the working code ###

# There is a list of suffixes, that we will search for. You may add,
# delete or modify any entry to fit your needs, but respect the
# correct escaping of newlines

# We don't want to descend into subdirectories as they are usually
# represented by their own path-entry in the database. Deep scans would
# lead to multiple hits on the same file. But if for some reason not all
# path elements are represented in the database, you may find and delete
# the following string and force $FINDCMD to look into all subdirectories
# in any given path
# "-maxdepth 1"

${RMCMD} "${DBPATHLIST}" "${DBFILESLIST}" "${FINDLIST}" "${DIFFLIST}" \
  "${STACKEDLIST}" "${FSONLYLIST}" "${DBONLYLIST}" 2>/dev/null

${SQLITECMD} -list -separator '' "${DBPATH}" \
  "select strPath from path order by strPath;" \
  | ${SORTCMD} > "${DBPATHLIST}"

${SQLITECMD} -list -separator '' "${DBPATH}" \
  "select strPath, strFilename from path, files where path.idPath = files.idPath order by strPath, strFilename;" \
  | ${SORTCMD} > "${DBFILESLIST}"

# split the path list at newlines only, so paths may contain spaces
IFS='
'
for fPATH in $(<"${DBPATHLIST}") ; do
  ${FINDCMD} "${fPATH}" -maxdepth 1 \( \
    -name '*.avi' -o \
    -name '*.divx' -o \
    -name '*.iso' -o \
    -name '*.m2v' -o \
    -name '*.mkv' -o \
    -name '*.mp4' -o \
    -name '*.mpeg' -o \
    -name '*.mpg' -o \
    -name '*.ogm' -o \
    -name '*.vob' \
  \) | ${SORTCMD} >> "${FINDLIST}"
done
unset IFS

${DIFFCMD} "${FINDLIST}" "${DBFILESLIST}" \
  | ${GREPCMD} -v "^@@" \
  | ${GREPCMD} -v '[+-]\{3\}' \
  | ${SORTCMD} -k 1.2 \
  | ${UNIQCMD} -s 1 > "${DIFFLIST}"

${GREPCMD} '^+' < "${DIFFLIST}" | ${GREPCMD} -v '://' | ${GREPCMD} -v '^+/$' > "${DBONLYLIST}"
${GREPCMD} '^-' < "${DIFFLIST}" > "${FSONLYLIST}"
${GREPCMD} "stack:///" < "${DIFFLIST}" > "${STACKEDLIST}"

${RMCMD} "${DBPATHLIST}" "${DBFILESLIST}" "${FINDLIST}" "${DIFFLIST}" 2>/dev/null

### working code ###
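To see what the working code does without touching a real database, the find loop and the diff pipeline can be tried on throwaway data. The following is a sketch under several assumptions: the two sqlite3 queries are replaced by hand-written lists, all file names (including the stack:// line) are invented, the suffix list is shortened, and GNU versions of find and diff are assumed.

```shell
#!/bin/sh
export LC_ALL=C
work=$(mktemp -d)
mkdir -p "$work/movies/sub"
: > "$work/movies/a.avi"          # on disk and in the library
: > "$work/movies/new.mp4"        # on disk only
: > "$work/movies/notes.txt"      # no video suffix, ignored by find
: > "$work/movies/sub/deep.mkv"   # skipped because of -maxdepth 1

# Stand-ins for the results of the two database queries.
printf '%s\n' "$work/movies/" > "$work/db_path.lst"
printf '%s\n' "$work/movies/a.avi" "$work/movies/gone.avi" \
    "stack://$work/movies/c-cd1.avi , $work/movies/c-cd2.avi" > "$work/db_files.lst"

# Walk the path list line by line and collect the video files of each path.
IFS='
'
for fPATH in $(cat "$work/db_path.lst"); do
    find "${fPATH}" -maxdepth 1 \( -name '*.avi' -o -name '*.mkv' -o -name '*.mp4' \) \
        | sort >> "$work/find.lst"
done
unset IFS

# Same filter chain as in the script: drop hunk headers and file headers,
# sort by everything after the +/- marker, remove duplicates.
diff -a -b -B -U 0 -d "$work/find.lst" "$work/db_files.lst" \
    | grep -v '^@@' \
    | grep -v '[+-]\{3\}' \
    | sort -k 1.2 \
    | uniq -s 1 > "$work/diff.lst"

db_only=$(grep '^+' "$work/diff.lst" | grep -v '://' | grep -v '^+/$')
fs_only=$(grep '^-' "$work/diff.lst")
stacked=$(grep 'stack:///' "$work/diff.lst")

echo "db-only:    $db_only"
echo "fs-only:    $fs_only"
echo "db-stacked: $stacked"
rm -r "$work"
```

With this sample data, gone.avi ends up in the db-only list, new.mp4 in the fs-only list and the stack:// line in the stacked list, while deep.mkv is not reported at all because its folder is not in the path list.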

Discussion
Discussion about this script is in the Forum: http://forum.xbmc.org/showthread.php?t=62058

Changelog
2009-12-17 v1.1

includes improvements contributed by deathinator

added mediatypes .iso and .vob

stabilized (hopefully) variables by quoting them

generalized the matching string for exclusion of special paths

2009-11-22 v1

initial release