Monday, 16 January 2012

Excluding nodes from XML data before returning from SQL Server

This post follows on from a question I recently replied to, for how to exclude a specific node from an XML column value before returning it, using TSQL.

Example setup

CREATE TABLE Example
(
ID INTEGER IDENTITY(1,1) PRIMARY KEY,
XmlField XML
)

INSERT Example (XmlField) VALUES ('<Root><ChildA>Value I want to see</ChildA><ChildB>Value I do not want to see</ChildB></Root>')
So if you want to return the XML minus the ChildB node, how do you do it?

Modify it

Literally, using the modify method that is supported on the XML data type. To quote MSDN, modify:
Modifies the contents of an XML document. Use this method to modify the content of an xml type variable or column. This method takes an XML DML statement to insert, update, or delete nodes from XML data. The modify() method of the xml data type can only be used in the SET clause of an UPDATE statement.
That last statement, to me at least, seems slightly open for misinterpretation as strictly speaking it isn't only valid in an UPDATE statement - you can use it as part of a standalone SET statement on an XML variable (though yes, to UPDATE that variable).

So, while you can't do something like this:
SELECT XmlField.modify('delete /Root/ChildB')
FROM Example
WHERE ID = 1
as it results in the following error:
Msg 8137, Level 16, State 1, Line 1
Incorrect use of the XML data type method 'modify'. A non-mutator method is expected in this context.
you can instead do this:
DECLARE @xml XML
SELECT @xml = XmlField
FROM Example
WHERE ID = 1

-- Delete the node from the XML in the variable
SET @xml.modify('delete /Root/ChildB')

SELECT @xml
That does the trick.

Dealing with multiple rows

If you wanted to do something like this from the database side when returning multiple rows then you'd need to either:
  1. create a scalar UDF to perform the removal. e.g.
    CREATE FUNCTION dbo.RemoveNode(@xml XML)
     RETURNS XML
    AS
    BEGIN
     SET @xml.modify('delete /Root/ChildB')
     RETURN @xml
    END
    GO
    
    SELECT dbo.RemoveNode(XmlField) AS XmlField
    FROM Example
    
    Note the parameter to the modify method must be a literal string, so you can't parameterise the UDF to make it a bit more generic by using that parameter in the call to modify().

  2. SELECT the xml values into a temp table/table variable, do an UPDATE like this:
    UPDATE #Temp
    SET XmlField.modify('delete /Root/ChildB')
    
    and then SELECT from the temp table/table variable.

  3. Use SQLCLR
Of course, the typical question should be asked - do you need to do this on the database side, or can you return the XML as-is from SQL Server and modify it accordingly in your calling code? In the original case acting on a single XML value, it's OK but when you start looking into scalar UDF's etc for multi-row requirements then you should be careful to check performance.

2 comments:

  1. Good post. What would be wrong with updating the Example table directly, rather than using an XML variable?

    Like this:

    UPDATE Example
    SET XmlField.modify('delete /Root/ChildB')
    WHERE ID = 1

    ReplyDelete
    Replies
    1. Well you would be removing data permanently from your source table. So this would only be OK if you didn't need to store that particular node at all (in which case, the question is why store it at all in the first place).

      Delete