A Microsoft KB Article–Example of How not to write code
Couple of weeks back I was doing some parsing of xml string and want to replace the special characters in the data. I didn’t remember the list of special characters, like others I fire up a search in Google. One of the google search result leads to a Microsoft KB article. I got what I wanted, list of special character I have to deal with. The article also provided a special character replacement logic, to my surprise the code was very lengthy and looks odd to me.
I decided to scan the code in little more detail. Below is the code from the article.
Dim filepath As String = "C:\customers.xml" Private Sub ReplaceSpecialChars(ByVal linenumber As Long) Dim strm As StreamReader Dim strline As String Dim strreplace As String Dim tempfile As String = "C:\temp.xml" Try FileCopy(filepath, tempfile) Catch ex As Exception MessageBox.Show(ex.Message) End Try Dim strmwriter As New StreamWriter(filepath) strmwriter.AutoFlush = True strm = New StreamReader(tempfile) Dim i As Long = 0 While i < linenumber - 1 strline = strm.ReadLine strmwriter.WriteLine(strline) i = i + 1 End While strline = strm.ReadLine Dim lineposition As Int32 lineposition = InStr(strline, "&") If lineposition > 0 Then strreplace = "&" Else lineposition = InStr(2, strline, "<") If lineposition > 0 Then strreplace = "<" End If End If strline = Mid(strline, 1, lineposition - 1) + strreplace + Mid(strline, lineposition + 1) strmwriter.WriteLine(strline) strline = strm.ReadToEnd strmwriter.WriteLine(strline) strm.Close() strm = Nothing strmwriter.Flush() strmwriter.Close() strmwriter = Nothing End Sub Public Function LoadXMLDoc() As XmlDocument Dim xdoc As XmlDocument Dim lnum As Long Dim pos As Long Dim Newxml As String Try xdoc = New XmlDocument() xdoc.Load(filepath) Catch ex As XmlException MessageBox.Show(ex.Message) lnum = ex.LineNumber ReplaceSpecialChars(lnum) xdoc = LoadXMLDoc() End Try Return (xdoc) End Function
The user calls the LoadXMLDoc() function, the function loads the xml file located in the disk. The weird thing is, the function does a recursive call in the catch block until all the special characters are replaced. Have a look at the line in red.
Let’s go to the ReplaceSpecialChars() function. It creates a temp copy of the original xml file and does the character replacement with several number of IO operation. Think a scenario where we need to deal with an xml data that has some thousand line and each line has special character, we end up doing IO operation thousands of time. Wow what a logic.
Refactored Approach
I rewrote the logic, I used C# as my choice. My approach is simple, read the xml file then format it the way we need it and load it to XMLDocument. Let see my piece of code.
public XmlDocument LoadXMLDoc() { XmlDocument doc = new XmlDocument(); string xmlToLoad = this.ParseXMLFile(); doc.LoadXml(xmlToLoad); return doc; } private string ParseXMLFile() { StringBuilder formatedXML = new StringBuilder(); StreamReader xmlReader = new StreamReader("CustomerData.xml"); while (xmlReader.Peek() >= 0) formatedXML.Append(this.ReplaceSpecialChars(xmlReader.ReadLine())); return formatedXML.ToString(); } private string ReplaceSpecialChars(string xmlData) { int grtrPosAt = xmlData.IndexOf(">"); int closePosAt = xmlData.IndexOf("</"); int lenthToReplace = 0; if (grtrPosAt>closePosAt) return xmlData; lenthToReplace= (closePosAt<=0 && grtrPosAt<=0)?xmlData.Length:(closePosAt-grtrPosAt)-1; //get the string between xml element. e.g. <ContactName>Hanna Moos</ContactName>,
//you will get 'Hanna Moos' string data = xmlData.Substring(grtrPosAt + 1, lenthToReplace); string formattedData = data.Replace("&", "&").Replace("<", "<")
.Replace(">", ">").Replace("'", "'"); xmlData = xmlData.Replace(data, formattedData); return xmlData; }
I took reactive approach rather than waiting the XMLDocument to throw error and rectify it then try loading. ReplaceSpecialChars() function formats the xml string and return it. I spend just 10 minute to come up with this function, so may not be the best one but definitely better than what mentioned in KB article.
Point to note In my approach
- The IO operation is only once, first time to read the xml file.
- XMLDocument will get a clean formatted xml to do it’s operation.
- No recursive call in catch block.
- LOC is less, so easy to understand.
If you find any better approach, update that in the comment section.
Note: I send a feedback to Microsoft couple of weeks back about the issues in the code but no response yet.
Happy coding …..
Leave a Reply