Friday 23 March 2012

Recursive CTE (Common Table Expression)

There is sometimes a problem of wanting to remove data (email addresses in this example) from within a string which are delimited.

For example, if you want to remove all non bybox email addresses from "some.name@bybox.com; simon@hicrest.net; fred.bloggs@bybox.com"
and do this for every table, without having to create functions to break apart the string first, how are you going to do it?

Here's how:

Our example table looks like this
CREATE TABLE data_export
(
 data_export_id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
 email_address VARCHAR(255) NOT NULL
 -- ... Other fields left out for brevity
)

Insert some test data
INSERT INTO data_export (email_address)
VALUES  ('some.name@bybox.com; simon@hicrest.net; fred.bloggs@bybox.com'),
        ('neo@matrix.com; me@bybox.com'),
        ('fred@b.com; xxx@bybox.com'),
        ('fred@bbc.com'),
        ('an.other@bybox.com')

Next is the recursive CTE SQL is in several sections:
  • split_by_delimeter - Breaking apart the string by delimeters.
  • just_bybox - Keep the @bybox emaill addresses.
  • distrinct_set - The ID's we want to update.
  • STUFF - The rebuild section is the final select/update statement which containst the STUFF keyword.

;WITH   split_by_delimeter
          AS (
              SELECT    data_export_id,
                        email_address,
                        CHARINDEX(';', email_address + ';') AS n,
                        CAST('' AS VARCHAR(255)) AS result
              FROM      data_export
              UNION ALL
              SELECT    data_export_id,
                        SUBSTRING(email_address,CHARINDEX(';',email_address)+1, 255),
                        CHARINDEX(';', email_address),
                        LTRIM(RTRIM(SUBSTRING(email_address, 0,
                            CASE WHEN CHARINDEX(';', email_address) = 0
                            THEN 255
                            ELSE CHARINDEX(';', email_address)
                            END)))
              FROM      split_by_delimeter
              WHERE     n > 0
             ),
        just_bybox
          AS (
              SELECT    data_export_id,
                        result
              FROM      split_by_delimeter
              WHERE     result <> ''
                        AND result LIKE '%@bybox%'
             ),
        distinct_set
          AS (
              SELECT DISTINCT data_export_id
              FROM just_bybox
             )
    -- For checking
    SELECT  data_export_id,
            STUFF((
                   SELECT ';' + result FROM just_bybox y
                   WHERE y.data_export_id= x.data_export_id
                  FOR XML PATH('')
                  ), 1, 1, '') AS email_address
    FROM    distinct_set x
    /*
    -- For actual update
    UPDATE  data_export
    SET     email_address = STUFF((
                                   SELECT ';' + result FROM just_bybox y
                                   WHERE y.data_export_id= x.data_export_id
                                   FOR XML PATH('')
                                  ), 1, 1, '')
    FROM    distinct_set x
            JOIN data_export
                ON data_export.data_export_id = x.data_export_id
 */

This returns the following results:
data_export_idemail_address
1some.name@bybox.com;fred.bloggs@bybox.com
2me@bybox.com
3xxx@bybox.com
5an.other@bybox.com

Perfect!