  • Too much errors parsing XML files

    Posted by daggett on December 19, 2006 at 19:07

    Hi,
    The Pandora agents and server have been running for 4 days now.
    They’re running fine.

    BUT there are many, many files flagged as _BADXML, so the content of those files is not stored in the database.

    I investigated a bit, and it seems that the parser doesn’t much like accented letters such as “é”, “à” or “è” in French, which can appear in the text data of modules like last_syslog.
    So the server simply drops those files, which are perfectly correct…
    And if the text string doesn’t change often, hours (even days) of data can be lost.
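
    One plausible explanation, offered as an assumption rather than a confirmed diagnosis: an XML file with no encoding declaration is read as UTF-8 by a conforming parser, and a lone ISO-8859-1 byte such as 0xE9 (“é”) is not valid UTF-8. The sketch below uses iconv to check whether a data file is valid UTF-8. The function name is_valid_utf8 and the path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Returns 0 if the file given as $1 is valid UTF-8, non-zero otherwise.
# iconv exits with an error when it meets an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage: report data files that a UTF-8-only parser would likely reject.
# for f in /path/to/data_in/*.data; do
#     is_valid_utf8 "$f" || echo "not valid UTF-8: $f"
# done
```

    A file that fails this check is not necessarily broken; it may simply be in another encoding that the file does not announce.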

    So is there any way to make it work even when there are accented letters?
    I will try to change this behaviour.

    bye

  • 11 Replies
  • daggett

    Member
    December 19, 2006 at 19:50

    OK: after adding an XML header at the top of each _BADXML file and renaming the file back to its original name, the Pandora server doesn’t complain anymore.

    I added this header in pandora_agent.sh, line 154:
    [code:1]
    # Makes data packet
    echo "" > $DATA
    echo "" >> $DATA
    if [ "$DEBUG_MODE" == "1" ]
    then

    This corrects the problem very well, but what about other character sets?

    Or is it possible to actually generate UTF-8 encoded XML files, so we won’t have to worry about which character set is installed on the machine?
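
    The exact header text did not survive in the excerpt above. As a sketch only, assuming it was a standard XML declaration that names the encoding, a helper could look like this. The function name xml_declaration is hypothetical and is not part of pandora_agent.sh.

```shell
#!/bin/sh
# Print an XML declaration that names the encoding of the data that follows.
# $1: encoding label, e.g. ISO-8859-1 or UTF-8 (defaults to UTF-8).
xml_declaration() {
    printf '<?xml version="1.0" encoding="%s"?>\n' "${1:-UTF-8}"
}

# Usage (DATA is the agent's output file, as in the excerpt above):
# xml_declaration "ISO-8859-1" > "$DATA"
```

    The declaration only labels the bytes; the data written after it still has to be in the encoding it names.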

    bye

  • daggett

    Member
    December 19, 2006 at 21:36

    After a bit of testing, the fix I proposed above does fix one thing:
    => no more _BADXML files due to character set errors when parsing the XML data files

    But it now produces garbage characters, because the server doesn’t seem to know that the text data is ISO-8859-1 encoded. In the Pandora console I can’t see the accented characters, only things like “DÃ�©marrage” instead of “Démarrage”.
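
    Garbage of this kind usually appears when UTF-8 bytes are interpreted as ISO-8859-1 at some stage; which stage does it here is not established in this thread. The sketch below reproduces the effect with iconv on a sample string and then reverses it. The variable names are illustrative only.

```shell
#!/bin/sh
# "Démarrage" as UTF-8 bytes: the "é" is the two bytes 0xC3 0xA9.
utf8_text=$(printf 'D\303\251marrage')

# Mislabel those bytes as ISO-8859-1 and convert them to UTF-8 again:
# each byte of the "é" becomes its own character ("Ã" and "©").
garbled=$(printf '%s' "$utf8_text" | iconv -f ISO-8859-1 -t UTF-8)

# Reversing the mistaken step recovers the original bytes.
restored=$(printf '%s' "$garbled" | iconv -f UTF-8 -t ISO-8859-1)

printf '%s\n%s\n' "$garbled" "$restored"
```

    If the reverse conversion restores readable text from stored data, that points to one extra conversion somewhere between the agent and the console.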

    So the problem remains entirely.

    It should probably be added to the bug reports.

    bye for now

  • raul

    Member
    December 20, 2006 at 03:44

    Hi Denis, try to use CDATA sections (also read http://www.w3schools.com/xml/xml_cdata.asp).

    EDIT: Well, this should be changed in the Pandora agent config file. 😀 I haven’t tested it, but the XML parser “should” not have errors :-D

    About the .data, yes we were thinking about adding the

    file pandora_agent.dtd:
    [code:1]

    file pandora_agent.xsl:
    [code:1]
    Pandora Agent
    Name Type Data

    If you do that, you can see the data in a browser, just for fun 😀

    Raul
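
    As a rough sketch of the CDATA idea from the linked page, the helper below wraps module output in a CDATA section. The function name cdata_wrap is hypothetical and is not taken from the agent.

```shell
#!/bin/sh
# Wrap the text read from stdin in a CDATA section.
# A literal "]]>" would end the section early, so it is split across
# two adjacent sections ("]]" closes the first, ">" opens the second).
cdata_wrap() {
    body=$(sed 's/]]>/]]]]><![CDATA[>/g')
    printf '<![CDATA[%s]]>\n' "$body"
}

# Usage:
# printf '%s\n' 'load < 5 & rising' | cdata_wrap
```

    CDATA only protects markup characters such as < and &. It does not change how the bytes are decoded, so accented characters still have to match the encoding the file declares.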

  • daggett

    Member
    December 20, 2006 at 20:00

    Hi, thanks for the hint, but I can’t get rid of all the BADXML errors.
    The agent already uses CDATA sections.

    It’s a big conflict between character sets: my .data files are encoded in ISO-8859-1, say English, so none of the accented characters can be displayed.

    I tried to iconv them to UTF-8. There are no more errors from those characters, but they come out as a total mess: “DÃ�©marrage” instead of “Démarrage” in the MySQL db.

    The problem is that I don’t know much about XML and even less about DTD stuff, so I will simply search and replace special chars before putting them in the data file.

    That will do the job for now, but it doesn’t kill the problem, which will remain uncorrected.
    I will try to work on this later.

    thanks again, bye for now

  • daggett

    Member
    December 21, 2006 at 17:45

    OK, I worked on the XML DTD and XSL stuff; it’s quite nice to be able to view the data files in a table.
    Good job!

    When I use your code and open the .data file in Firefox, all is OK, with no charset error.

    There must be an error somewhere in the processing chain (agent’s machine? Pandora agent? rsync? server’s machine? Pandora server? MySQL? Console?), and I think it comes from the parser: I saw somewhere on the Internet that the Perl method XMLin() can be configured with a specific character set.
    Maybe we could work on character sets for the next version?

    I added a character set option to the pandora_agent.conf file, so it can now be configured, and the agent places it in the header of the .data XML files.
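
    The option itself is not shown in this post. A minimal sketch of how such a setting might be read, assuming a hypothetical line of the form “encoding ISO-8859-1” in the config file and a hypothetical function name read_encoding:

```shell
#!/bin/sh
# Print the value of an "encoding" line from the config file $1,
# or the fallback $2 (default UTF-8) when no such line exists.
# Lines starting with "#" are comments and never match.
read_encoding() {
    value=$(sed -n 's/^encoding[[:space:]][[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1)
    printf '%s\n' "${value:-${2:-UTF-8}}"
}

# Usage:
# ENCODING=$(read_encoding /path/to/pandora_agent.conf)
```

    Falling back to a default keeps agents with an older config file working without changes.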

    I will send all my files to “manu”, so he can make a package with all the installation, start/stop and enhanced scripts I wrote.

    bye for now!
    Denis

  • Sancho

    Administrator
    December 22, 2006 at 04:17

    I’d bet on the default encoding of your MySQL database. Try running mysqldump --no-data -u root -p pandora. Pandora Server uses libxml, and this library is able to process XML encoded in different character sets. We use it in Babel Enterprise and we had some problems related to UTF and Windows-encoded texts, but it depends on the MySQL encoding.
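
    To read the result of that dump quickly, a small filter can list the character sets the tables declare. This is a sketch that assumes the dump contains “DEFAULT CHARSET=...” clauses, which depends on the MySQL version; the function name list_table_charsets is hypothetical.

```shell
#!/bin/sh
# Read a schema dump on stdin and list the distinct default character
# sets declared by its CREATE TABLE statements.
list_table_charsets() {
    sed -n 's/.*DEFAULT CHARSET=\([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Usage, with the command suggested above:
# mysqldump --no-data -u root -p pandora | list_table_charsets
```

    An empty result means the dump declares no table-level character set, in which case the server or database default applies.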

    By the way, it seems an interesting problem… encodings are always a source of problems 🙁

  • daggett

    Member
    December 22, 2006 at 18:58

    Well, I finally added some code to pandora_agent.sh that eliminates EVERY special character just after the data is captured, before it is put in the CDATA XML fields.

    So there are no more errors at all now on _one_ machine; other machines are still sending bad data (special characters are not detected and still cause BADXML). I included all of this in the packages I will send tonight. It’s easy to configure:

    then
    execution=`echo $a | cut -c 13- `
    res=`eval $execution`
    if [ -z "$flux_string" ]
    then
    res=`eval expr $res 2> /dev/null`
    fi
    # special chars removing (avoiding charset conflicts)
    OLD_IFS=$IFS # we change the field separator, as we don't need '\n'
    IFS=$' ' # but we need a space
    SPEC_CHARS_LIST='» « à ç é è ê ë î ï ô ö ù ü Â Ç É È Ê Ë Î Ï Ô Ö Ù Ü'
    EQUIV_CHAR_LIST='" " a c e e e e i i o o u u A C E E E E I I O O U U'
    NB_SPEC_CHAR=1
    # we look in the list for each special char to remove
    for SPEC_CHAR in $SPEC_CHARS_LIST
    do
    EQUIV_CHAR=`echo $EQUIV_CHAR_LIST | cut -d' ' -f$NB_SPEC_CHAR`
    # we convert them just below:
    res=`echo $res | sed "s/$SPEC_CHAR/$EQUIV_CHAR/g"`
    ((NB_SPEC_CHAR++))
    done
    # we then eliminate some unprintable characters
    res=`echo $res | tr -d "\302" | tr -d "\240"`
    IFS=$OLD_IFS # we put it back into '\n'
    echo "" >> $DATA2
    fi

    bye for now

  • Sancho

    Administrator
    December 22, 2006 at 19:09

    WHOW!, this is fantastic. We’re preparing an upgrade for the next agents, and we will SURELY include this cleaning code in the next version. Thanks for submitting it.

    I suppose that the “problems” in this code are due to HTML rendering, so please send the code in a tarball to marostegui ASAP, to test it and integrate it into the development code.

  • daggett

    Member
    December 22, 2006 at 20:27

    O, erh.. I just installed the new version on another server, and it began to generate BADXML… because of some accented chars….

    Well, it seems that the “solution” I found works for just _one_ particular configuration.
    Sorry about that, and now it’s vacation time!! So I won’t be working on Pandora until January 3rd, 2007.

    But after that, we will surely need to work hard on this feature, as we will lose hours and hours, even days, of monitoring data because of that charset problem.

    I sent the files to Manu (not the console, as there I only changed the 12-char limit to infinite when displaying data_strings).

    have a merry christmas and happy new year! 😀
    bye

  • Sancho

    Administrator
    January 3, 2007 at 03:30

    We have the same problem in Babel (I entered it as bug id #1626237). I’m working on a simple workaround based on detecting the local codepage, adding the XML header with the correct code table, and an option to override it on systems where correct detection is not possible. The solution/patch/new version will be simple and could possibly be ready for a 1.2.1 security update in the next few days.
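
    The workaround described above could be sketched roughly as follows. This is not the actual patch: the override variable PANDORA_ENCODING and the function name detect_encoding are hypothetical, and the detection relies on the standard “locale charmap” command.

```shell
#!/bin/sh
# Print the encoding to declare in the XML header.
# PANDORA_ENCODING is a hypothetical override variable; when it is unset
# or empty, fall back to the character map of the current locale.
detect_encoding() {
    if [ -n "${PANDORA_ENCODING:-}" ]; then
        printf '%s\n' "$PANDORA_ENCODING"
    else
        locale charmap
    fi
}

# Usage:
# printf '<?xml version="1.0" encoding="%s"?>\n' "$(detect_encoding)"
```

    The override matters because the locale may report a name that an XML parser does not recognize, or one that does not match the encoding of the log files being read.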

    Thanks for giving more info about this.

  • daggett

    Member
    January 3, 2007 at 13:49

    Hi all!
    Happy new year!

    Thank you for paying attention to this! The Pandora agents ran during the entire vacation, and I got A LOT of BADXML data files because of the charset issue… so I lost A LOT of data, and the Pandora server just ignores the BADXML files, pretending there was no contact at all with the agent (when in fact it is just a badly encoded XML file).

    So thank you very much for helping to fix this!

    bye