Watchin' you watch

Published on Thu 10 March 2016 in Tech

#datenschutz #iptv #scapy #sniffing

First off: when I use the word sniffing in this article, I mean plugging in a cable and starting Wireshark on my own machine. The packets the network distributes to IPTV receivers are multicast and reach every device by default. This needs neither man-in-the-middle hardware nor any software routing.

Intro
What's in the packets?
Dumping the traffic
Analysing collected data
Going further

Intro

It all began when I was upgrading this network. I plugged in some new switches and sorted out old cables. Testing went well at first. But sometimes the lights on the switches blinked like crazy and some connections gained latency. Even inside the network!

As I was investigating, it dawned on me: the owner had just recently installed a triple-play IP package sold by Telekom. It's the new generation IP-everything package with phone, internet and television, all routed over one shared connection.

The cause of the flood was the IPTV receivers, which share the LAN with all other devices in the house.

What's in the packets?

The packets clogging the network are UDP packets coming from a Telekom server and going to a multicast address in the 239.0.0.0/8 range. Every TV channel has its own address in this range. You can find a full list of channels with their addresses at grinch.itg-em.de.
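To make the address range concrete, here is a minimal sketch that checks whether a destination address falls into 239.0.0.0/8. The helper name is_iptv_multicast and both sample addresses are made up for illustration; they are not taken from the channel list.

```python
import ipaddress

# The multicast block the IPTV streams are sent to (from the text above)
IPTV_RANGE = ipaddress.ip_network('239.0.0.0/8')


def is_iptv_multicast(address):
    """Return True if the given IPv4 address lies inside 239.0.0.0/8."""
    return ipaddress.ip_address(address) in IPTV_RANGE


if __name__ == '__main__':
    # Both addresses are hypothetical examples
    for candidate in ('239.35.10.4', '192.168.1.20'):
        print(candidate, is_iptv_multicast(candidate))
```

A check like this could be used to ignore any UDP traffic on port 10000 that is not an IPTV stream.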

To parse the VLC playlist (PLS, downloaded as senderlist.txt) and save it as JSON, you can use this parsing script.

#!/usr/bin/env python3

from configparser import ConfigParser
import json
import re

# SafeConfigParser was removed in Python 3.12; ConfigParser replaces it
p = ConfigParser()
p.read('senderlist.txt')

pl = p['playlist']
sender_dict = {}

identifier = 1  # For later; replace the IP with a smaller, but still unique, id
for key in pl:
    if key.startswith('file'):
        no = key[4:]
        address = pl[key]
        sender = pl['Title'+no]

        # Strip the leading "(number) " from the title
        m = re.match(r'\(\d*\) (.*)', sender)
        sender = m.group(1)

        # Extract the multicast address from the RTP URL
        m = re.match(r'rtp://@([0-9.]*):10000', address)
        address = m.group(1)

        sender_dict[address] = (identifier, sender)

        identifier += 1

with open('senderlist.json', 'w') as fh:
    json.dump(sender_dict, fh)
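For orientation, the regular expressions in the script above imply playlist entries of roughly the following shape. The sketch below parses a tiny in-memory sample instead of the real file. The two addresses in SAMPLE_PLS are invented for illustration, and parse_playlist is a hypothetical helper, not part of the original script. Unlike the script, it skips entries that do not match instead of failing, and it disables interpolation so a stray % in a title cannot raise an error.

```python
from configparser import ConfigParser
import re

# Hypothetical two-entry playlist; the addresses are made up
SAMPLE_PLS = """\
[playlist]
NumberOfEntries=2
File1=rtp://@239.35.10.4:10000
Title1=(1) Das Erste
File2=rtp://@239.35.10.5:10000
Title2=(2) ZDF
"""


def parse_playlist(text):
    """Map multicast address -> channel name for a PLS playlist string."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(text)
    playlist = parser['playlist']

    channels = {}
    for key in playlist:  # configparser lower-cases option names
        if not key.startswith('file'):
            continue
        number = key[4:]
        address = re.match(r'rtp://@([0-9.]+):10000', playlist[key])
        title = re.match(r'\(\d*\) (.*)', playlist.get('title' + number, ''))
        if address and title:  # skip malformed entries
            channels[address.group(1)] = title.group(1)
    return channels


if __name__ == '__main__':
    print(parse_playlist(SAMPLE_PLS))
```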

Dumping the traffic

Since I have a non-IPTV device in the network as well, I can log all UDP packets on port 10000 and see which channels the IPTV devices are currently subscribed to.

At first I built a script which just captured all traffic from the network. You can do this for example with tcpdump -i eth0 'udp and port 10000' -w iptv.pcap. This file grows extremely large and contains mostly useless, redundant data. We can shrink the file by limiting the captured packet length to 34 bytes (14 bytes of Ethernet header plus 20 bytes of IP header), which cuts each packet right after the destination address: tcpdump -s 34 -i eth0 'udp and port 10000' -w iptv.pcap. But even this file is not free of uninteresting entries. So I switched to Python.
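To illustrate why 34 bytes are enough, here is a rough sketch that pulls the destination address out of such a truncated frame by hand. It assumes a plain Ethernet II frame carrying IPv4 without a VLAN tag and without IP options; a tagged frame would shift the offsets. The function name and the hand-built sample frame are made up for this example.

```python
import socket
import struct

ETH_HEADER_LEN = 14       # destination MAC, source MAC, EtherType
IPV4_MIN_HEADER_LEN = 20  # IPv4 header without options
ETHERTYPE_IPV4 = 0x0800


def destination_from_snap(frame):
    """Return the IPv4 destination address from the first 34 bytes of a frame."""
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        raise ValueError('frame too short')
    ethertype = struct.unpack('!H', frame[12:14])[0]
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError('not an IPv4 frame')
    # The destination address is the last 4 bytes of the 20-byte IPv4 header
    start = ETH_HEADER_LEN + 16
    return socket.inet_ntoa(frame[start:start + 4])


if __name__ == '__main__':
    # Hand-built sample frame; all addresses are hypothetical
    eth = bytes(6) + bytes(6) + struct.pack('!H', ETHERTYPE_IPV4)
    ip = (bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 16, 17, 0, 0])
          + socket.inet_aton('192.0.2.1')
          + socket.inet_aton('239.35.10.4'))
    print(destination_from_snap(eth + ip))
```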

With this Python script (which should run unsupervised) our sniffing client first takes a snapshot of a few packets. It then parses the raw packets and dumps the time and channel into a CSV file. To access the pcap interface I use the Scapy library.

Running the following script also prints a list of currently distributed channels in the network.

#!/usr/bin/env python2

"""IPTV-sniffer
If you have an IPTV provider and share the network with your receiver devices,
chances are your network gets flooded with multicast UDP packets.

This tool is for demonstration purposes to show people how easy it is to
capture what they watch on television.
"""
import csv
from datetime import datetime
import json
import os
from time import sleep
from scapy.all import sniff, IP

# You need senderlist.txt and the parser to load the senderlist json
senderlist = {}
if os.path.exists('senderlist.json'):
    with open('senderlist.json', 'r') as fh:
        senderlist = json.load(fh)

collected_data = []  # (Datetime, [..Destinations..]) tuples

while True:
    try:
        now = datetime.now()
        timestamp_text = now.strftime('%Y-%m-%d %H:%M.%S')

        # Take snapshot with max 10 packets of udp traffic on port 10000
        # Times out after 1 second (if there are no packets)
        stats = sniff(iface='eth0', filter='udp and port 10000',
            count=10, store=True, timeout=1)

        # If no packets were logged, skip
        if len(stats) == 0:
            print('No packets captured at {}'.format(timestamp_text))
            continue

        targets = []  # Multiple iptv devices may use the same network
        for packet in stats:
            dest = packet[IP].dst
            if dest not in targets:
                sender = 'unknown'
                if dest in senderlist:
                    sender = senderlist[dest][1]

                print('{}: {}'.format(timestamp_text, sender))
                targets.append(dest)

        collected_data.append((now, targets))

        # Append to csv file
        with open('sniffed.csv', 'a') as fh:
            csvwriter = csv.writer(fh)
            for target in targets:  # if multiple channels at the same time
                identifier = 0
                if target in senderlist:
                    identifier = senderlist[target][0]
                csvwriter.writerow([now.strftime('%s'), identifier])

        # Sleep for a period of time.
        # When capturing with '10 packets and 1 second timeout' it logs multiple
        # times per second. If the capture has no packet limit but a timeout,
        # the 'stats' set grows really big.
        # So 'sleep' seems to be the most resource-friendly solution.
        sleep(5)
    except KeyboardInterrupt:
        # KeyboardInterrupt was needed before 'sleep' was introduced because
        # the 'while True' loop did not recognize single keyboard commands
        print(collected_data)
        break

This way the read/write load on the disk (or SD card) stays low. That's a bonus if you want to try this on a Raspberry Pi in your network.

Analysing collected data

Now that we have a CSV file with periodically logged IPTV subscriptions, it is time to analyze this data.
This script uses both senderlist.json and sniffed.csv. The output is a list of timespans, each with the channel watched during that timespan.

#!/usr/bin/env python3

from collections import namedtuple
import csv
from datetime import datetime
import json


# First: get a dict of identifiers with their channel name
senderlist = {}
with open('senderlist.json') as fh:
    senderlist = json.load(fh, parse_int=str)
sender_identifiers = {}
for v in senderlist.values():
    sender_identifiers[v[0]] = v[1]


data = []
Slice = namedtuple('Slice', 'start end channelid')
with open('sniffed.csv', 'r') as fh:
    csvreader = csv.reader(fh)

    first = None  # begin of slice
    last = None  # end of slice
    channel = None  # channel id

    for row in csvreader:
        dt = datetime.fromtimestamp(float(row[0]))
        chan = row[1]

        # Initialize values
        if not first:
            first = dt
        if not channel:
            channel = chan
        if not last:
            last = dt

        if channel != chan:
            # Channel switched
            sl = Slice(first, last, channel)
            data.append(sl)

            first = dt
            last = dt
            channel = chan
            continue

        last = dt  # Update end of slice

    # Last entry (skipped if the CSV file was empty)
    if first is not None:
        sl = Slice(first, last, channel)
        data.append(sl)

timeformat = '%Y-%m-%d %H:%M.%S'
for e in data:
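    # The sniffer writes identifier 0 for channels missing from the
    # senderlist, so fall back to 'unknown' instead of raising a KeyError
    print('From {} to {} channel {}'.format(
        e.start.strftime(timeformat),
        e.end.strftime(timeformat),
        sender_identifiers.get(e.channelid, 'unknown')
    ))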

Output example:

From 2016-03-10 14:16.27 to 2016-03-10 15:27.20 channel Kabel 1
From 2016-03-10 15:27.28 to 2016-03-10 15:28.10 channel VOX
From 2016-03-10 15:28.23 to 2016-03-10 15:29.05 channel ProSieben
From 2016-03-10 15:29.10 to 2016-03-10 15:29.31 channel Sat.1
From 2016-03-10 15:29.38 to 2016-03-10 15:30.25 channel RTL
From 2016-03-10 15:30.35 to 2016-03-10 15:31.11 channel Das Erste

Going further

If you want to present this to someone to demonstrate how easy it is to snoop on them, there are some graphing tools for Python!
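Before reaching for a real plotting library, a plain-text summary may already get the point across. The following is only a sketch using the standard library: it sums up the watch time per channel and renders crude text bars. The function names are invented for this example, and the sample slices are copied from the output example above.

```python
from collections import defaultdict
from datetime import datetime

# (start, end, channel name) tuples, taken from the output example
SAMPLE_SLICES = [
    (datetime(2016, 3, 10, 14, 16, 27), datetime(2016, 3, 10, 15, 27, 20), 'Kabel 1'),
    (datetime(2016, 3, 10, 15, 27, 28), datetime(2016, 3, 10, 15, 28, 10), 'VOX'),
    (datetime(2016, 3, 10, 15, 28, 23), datetime(2016, 3, 10, 15, 29, 5), 'ProSieben'),
]


def watch_time_per_channel(slices):
    """Sum the watched seconds per channel name."""
    totals = defaultdict(float)
    for start, end, channel in slices:
        totals[channel] += (end - start).total_seconds()
    return dict(totals)


def text_bars(totals, width=40):
    """Return one text line per channel, longest watch time first."""
    if not totals:
        return []
    longest = max(totals.values())
    lines = []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    for channel, seconds in ranked:
        # Scale each bar relative to the most-watched channel
        length = int(round(width * seconds / longest)) if longest > 0 else 0
        lines.append('{:<12} {:>6.0f}s {}'.format(channel, seconds, '#' * length))
    return lines


if __name__ == '__main__':
    for line in text_bars(watch_time_per_channel(SAMPLE_SLICES)):
        print(line)
```

The same totals could be handed to any plotting library instead of text_bars.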

And even better: load a TV magazine's catalog and add the show name to each timespan! (I looked at tvbrowser and at some online magazine APIs, but that went nowhere.)

I am going back to fixing this clogging now. I hope to find something with OpenWRT, see UDP multicast.