Challenging Date Spans Problem:

A health insurance plan stores information about covered members in the following table:

CREATE TABLE enrollment
(
ssn CHAR(9) NOT NULL,
startdate DATETIME NOT NULL,
enddate DATETIME NOT NULL,
policy INTEGER NOT NULL,
CONSTRAINT pk_enrollment PRIMARY KEY CLUSTERED (ssn, startdate) ,
CONSTRAINT ck_noreversedate CHECK (enddate>=startdate)
)

INSERT enrollment
VALUES (‘111111111’, ‘2000-01-01’, ‘2000-04-30’, 1)

INSERT enrollment
VALUES (‘111111111’, ‘2000-06-01’, ‘2001-12-31’, 1)

INSERT enrollment
VALUES (‘222222222’, ‘2000-01-01’, ‘2000-06-30’, 1)

INSERT enrollment
VALUES (‘222222222’, ‘2000-07-01’, ‘2001-12-31’, 2)

INSERT enrollment
VALUES (‘333333333’, ‘2000-01-01’, ‘2000-06-30’, 1)

INSERT enrollment
VALUES (‘333333333’, ‘2000-07-01’, ‘2001-12-31’, 1)

INSERT enrollment
VALUES (‘444444444’, ‘2000-01-01’, ‘2000-06-30’, 1)

INSERT enrollment
VALUES (‘444444444’, ‘2000-07-01’, ‘2000-11-30’, 1)

INSERT enrollment
VALUES (‘444444444’, ‘2001-03-01’, ‘2001-06-30’, 1)

INSERT enrollment
VALUES (‘444444444’, ‘2001-07-01’, ‘2001-12-31’, 1)

Assume that we can enforce a rule that there are no overlapping enrollments; i.e., a member can only have one insurance policy at a time.

Members can have multiple enrollments. These enrollments can be “adjacent” (e.g, member 222-22-2222 has one enrollment which terminates on 06/30/2000 and another that begins on 07/01/2000) or there can be gaps (e.g., member 111-11-1111 has one enrollment which terminates on 04/30/2000, the subsequent enrollment begins 06/01/2000).

Our task is to write a SELECT statement that will coalesce all “adjacent” enrollments where the policy is the same. E.g., member 333-33-3333 has two records in the source table, these records would be combined into a single enrollment with start date = 01/01/2000 and end date = 12/31/2001.


SELECT e.ssn, e.policy, MIN(e.startdate) AS startdate, MAX(e.enddate) AS enddate FROM enrollment e GROUP BY e.ssn, e.policy WHERE EXISTS (SELECT * FROM enrollment e2 WHERE e.ssn = e2.ssn AND e.policy = e2.policy AND e.startdate = DATEADD(y,1,e2.enddate)) OR EXISTS (SELECT * FROM enrollment e3 WHERE e.ssn = e3.ssn AND e.policy = e3.policy AND e.enddate = DATEADD(y,-1,e3.startdate))

UNION

SELECT e4.ssn, e4.policy, e4.startdate AS startdate, e4.enddate FROM enrollment e4 WHERE NOT EXISTS (SELECT * FROM enrollment e5 WHERE e4.ssn = e5.ssn AND e4.policy = e5.policy AND e4.startdate = DATEADD(y,1,e5.enddate)) AND NOT EXISTS (SELECT * FROM enrollment e6 WHERE e4.ssn = e6.ssn AND e4.policy = e6.policy AND e4.enddate = DATEADD(y,-1,e6.startdate))

This works for member 333-33-3333 who has only one set of “adjacent” enrollments.

However it fails for member 444-44-4444 who has two sets of “adjacent” enrollments. Can anyone suggest an approach that would return two rows for 444-44-4444, one with start date 01/01/2000 and end date 11/30/2000 and another with start date 03/01/2001 and end date 12/31/2001?