For the Master's course "Data Mining and Data Warehousing" we really have to get our hands dirty with C++. My group's task is to develop a spectral clustering algorithm and to integrate it into a system called XVDM (developed by the DIS Centre at the Free University of Bolzano). We have already presented a first prototype in which the clusters are simply printed on the command line. Once we started the real integration into the big system (XVDM), we ran into quite a lot of difficulties, and that is when you really get a feeling for all this theoretical stuff. For instance, the integration seemed to work well on my colleague's Debian installation (running inside a virtual machine on top of Windows), but the SAME code produced a segmentation fault on my Ubuntu machine (installed natively).
Using "valgrind" helped to identify the problem (thanks to Prof. Arturas Mazeika for this hint). You start your program through it, roughly like this:
    valgrind ./SpectralClustering < ./data.csv
where "data.csv" contains the data to be clustered, which is passed to our program via redirection. Valgrind then reports memory leaks and, perhaps even more importantly, variables that are used without being initialized, which was exactly the problem in our case. It turned out that our module, once integrated into the XVDM system, did not initialize things in the right order, meaning we had something like
    init glut
    create cluster objects
    create DBScenery object
(this is just some sort of pseudocode). The problem was that the addAlgorithm(...) method made use of variables that only get instantiated inside the initOpenGL() call, which of course produced a segmentation fault. That seems clear enough, but how was the program then able to run on my colleague's machine while using a variable that had not been instantiated yet? I don't know.
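One likely explanation for the "works on one machine" effect: reading an uninitialized variable is undefined behaviour in C++, so the program may appear to run when the memory happens to hold a usable value and crash when it does not. Below is a minimal sketch of how such an ordering mistake could be made to fail loudly on every machine. The Viewer and RenderContext names are hypothetical stand-ins and not the real XVDM classes; only addAlgorithm and initOpenGL are taken from the description above.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for state that only exists after OpenGL setup.
struct RenderContext {
    std::vector<std::string> algorithms;
};

// Hypothetical stand-in for the host application (assumption, not the XVDM API).
class Viewer {
public:
    // Creates the state that addAlgorithm() depends on.
    void initOpenGL() { context_ = std::make_unique<RenderContext>(); }

    // Checks the precondition instead of dereferencing a pointer
    // that was never set up.
    void addAlgorithm(const std::string& name) {
        if (!context_) {
            throw std::logic_error("addAlgorithm() called before initOpenGL()");
        }
        context_->algorithms.push_back(name);
    }

    // Number of registered algorithms; 0 while the context does not exist yet.
    std::size_t algorithmCount() const {
        return context_ ? context_->algorithms.size() : 0;
    }

private:
    // A unique_ptr default-constructs to nullptr, so the "not yet
    // initialized" state is well defined and can be tested for.
    std::unique_ptr<RenderContext> context_;
};
```

With this sketch, calling addAlgorithm("spectral") on a fresh Viewer throws a std::logic_error on every platform, while calling initOpenGL() first lets the same call succeed. The design choice is to hold the late-initialized state in a smart pointer rather than a raw pointer or plain member, so that a wrong call order produces the same clear error everywhere instead of depending on whatever happens to be in memory.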
Anyway, the clustering integration works now as you can see :)