Sony Arouje

a programmer's log

A Microsoft KB Article–Example of How not to write code

leave a comment »

Couple of weeks back I was doing some parsing of xml string and want to replace the special characters in the data. I didn’t remember the list of special characters, like others I fire up a search in Google. One of the google search result leads to a Microsoft KB article. I got what I wanted, list of special character I have to deal with. The article also provided a special character replacement logic, to my surprise the code was very lengthy and looks odd to me.

I decided to scan the code in little more detail. Below is the code from the article.

Dim filepath As String = "C:\customers.xml"
Private Sub ReplaceSpecialChars(ByVal linenumber As Long)
        Dim strm As StreamReader
        Dim strline As String
        Dim strreplace As String

        Dim tempfile As String = "C:\temp.xml"
        Try
            FileCopy(filepath, tempfile)
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try

        Dim strmwriter As New StreamWriter(filepath)
        strmwriter.AutoFlush = True

        strm = New StreamReader(tempfile)


        Dim i As Long = 0
        While i < linenumber - 1
            strline = strm.ReadLine
            strmwriter.WriteLine(strline)
            i = i + 1
        End While

        strline = strm.ReadLine
        Dim lineposition As Int32
        lineposition = InStr(strline, "&")
        If lineposition > 0 Then
            strreplace = "&amp;"
        Else
            lineposition = InStr(2, strline, "<")
            If lineposition > 0 Then
                strreplace = "<"
            End If
        End If
        strline = Mid(strline, 1, lineposition - 1) + strreplace + Mid(strline, lineposition + 1)
        strmwriter.WriteLine(strline)

        strline = strm.ReadToEnd
        strmwriter.WriteLine(strline)

        strm.Close()
        strm = Nothing

        strmwriter.Flush()
        strmwriter.Close()
        strmwriter = Nothing

    End Sub

    Public Function LoadXMLDoc() As XmlDocument
        Dim xdoc As XmlDocument
        Dim lnum As Long
        Dim pos As Long
        Dim Newxml As String
        Try
            xdoc = New XmlDocument()
            xdoc.Load(filepath)
        Catch ex As XmlException
            MessageBox.Show(ex.Message)
            lnum = ex.LineNumber
            ReplaceSpecialChars(lnum)

            xdoc = LoadXMLDoc()

        End Try
        Return (xdoc)
    End Function
 

The user calls the LoadXMLDoc() function, the function loads the xml file located in the disk. The weird thing is, the function does a recursive call in the catch block until all the special characters are replaced. Have a look at the line in red.

Let’s go to the ReplaceSpecialChars() function. It creates a temp copy of the original xml file and does the character replacement with several number of IO operation. Think a scenario where we need to deal with an xml data that has some thousand line and each line has special character, we end up doing IO operation thousands of time. Wow what a logic.

Refactored Approach

I rewrote the logic, I used C# as my choice. My approach is simple, read the xml file then format it the way we need it and load it to XMLDocument. Let see my piece of code.

public XmlDocument LoadXMLDoc()
{
    XmlDocument doc = new XmlDocument();
    string xmlToLoad = this.ParseXMLFile();
    doc.LoadXml(xmlToLoad);
    return doc;
}
private string ParseXMLFile()
{
    StringBuilder formatedXML = new StringBuilder();
    StreamReader xmlReader = new StreamReader("CustomerData.xml");
    while (xmlReader.Peek() >= 0)
        formatedXML.Append(this.ReplaceSpecialChars(xmlReader.ReadLine()));
    return formatedXML.ToString();
}
private string ReplaceSpecialChars(string xmlData)
{
    int grtrPosAt = xmlData.IndexOf(">");
    int closePosAt = xmlData.IndexOf("</");
    int lenthToReplace = 0;
    if (grtrPosAt>closePosAt) return xmlData;

    lenthToReplace= (closePosAt<=0 && grtrPosAt<=0)?xmlData.Length:(closePosAt-grtrPosAt)-1;
    //get the string between xml element. e.g. <ContactName>Hanna Moos</ContactName>, 
    //you will get 'Hanna Moos'
    string data = xmlData.Substring(grtrPosAt + 1, lenthToReplace);
    string formattedData = data.Replace("&", "&amp;").Replace("<", "&lt;")
                               .Replace(">", "&gt;").Replace("'", "&apos;");
    xmlData = xmlData.Replace(data, formattedData);
    return xmlData;
}

I took reactive approach rather than waiting the XMLDocument to throw error and rectify it then try loading. ReplaceSpecialChars() function formats the xml string and return it. I spend just 10 minute to come up with this function, so may not be the best one but definitely better than what mentioned in KB article.

Point to note In my approach

  • The IO operation is only once, first time to read the xml file.
  • XMLDocument will get a clean formatted xml to do it’s operation.
  • No recursive call in catch block.
  • LOC is less, so easy to understand.

If you find any better approach, update that in the comment section.

Note: I send a feedback to Microsoft couple of weeks back about the issues in the code but no response yet.

Happy coding …..

Written by Sony Arouje

October 21, 2011 at 2:12 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: