Week 9 - Censorship
Additional reading
Important Readings
Towards a Comprehensive Picture of the Great Firewall’s DNS Censorship
https://www.usenix.org/system/files/conference/foci14/foci14-anonymous.pdf
Ignoring the Great Firewall of China
https://www.cl.cam.ac.uk/~rnc1/ignoring.pdf
Global Measurement of DNS Manipulation
https://www.cc.gatech.edu/~pearce/papers/dns_usenix_2017.pdf
Analysis of Country-wide Internet Outages Caused by Censorship
https://www.caida.org/publications/papers/2011/outages_censorship/outages_censorship.pdf
Augur: Internet-Wide Detection of Connectivity Disruptions
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7958591
Adapting Social Spam Infrastructure for Political Censorship
https://www.icir.org/vern/papers/kremlin-bots.leet11.pdf
Five incidents, one theme: Twitter spam as a weapon to drown voices of protests
https://www.usenix.org/system/files/conference/foci13/foci13-verkamp.pdf
DNS censorship
DNS censorship is a large-scale network traffic filtering strategy adopted by a network operator to enforce control over Internet infrastructure and suppress material that it deems objectionable.
Great Firewall of China (GFW)
China runs a massive firewall that covers the whole of the country. It uses DNS censorship by injecting fake DNS record responses into the network.
Researchers have tried to reverse engineer the GFW to understand how it works, and have identified some of its properties:
- Locality of GFW nodes: There are two differing views on whether the GFW nodes are present only at the edge ISPs or whether they are also present in non-bordering Chinese ASes. The majority view is that censorship nodes are present at the edge.
- Centralized management: Since the blocklists obtained from two distinct GFW locations are the same, there is a high possibility of a central management (GFW Manager) entity that orchestrates blocklists.
- Load balancing: GFW load balances between processes based on source and destination IP address. The processes are clustered together to collectively send injected DNS responses.
DNS injection
This is a form of DNS censorship. It uses a ruleset that defines which DNS entries are unacceptable and fakes the replies to a DNS lookup so the requester cannot resolve the real IP address. It works in the following way:
- A DNS probe is sent to the open DNS resolvers
- The probe is checked against the blocklist of domains and keywords
- For domain-level blocking, a fake DNS A record response is sent back. There are two levels of blocking domains: the first one is by directly blocking the domain, and the second one is by blocking it based on keywords present in the domain
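The matching step above can be sketched as follows. This is a minimal illustration, not the GFW's actual logic: the blocklist entries, the fake address (taken from the TEST-NET-3 documentation range) and the function name `injected_answer` are all hypothetical, and treating subdomains of a blocked domain as blocked is an assumption.

```python
import ipaddress

# Hypothetical blocklist entries, for illustration only.
BLOCKED_DOMAINS = {"blocked.example"}
BLOCKED_KEYWORDS = {"forbidden"}
# Documentation-range address standing in for the fake A record.
FAKE_ADDRESS = ipaddress.ip_address("203.0.113.1")


def injected_answer(qname):
    """Return the fake address to inject for qname, or None to leave the query alone."""
    name = qname.rstrip(".").lower()
    # Level 1: the domain itself (or, by assumption, a subdomain) is blocked directly.
    for blocked in BLOCKED_DOMAINS:
        if name == blocked or name.endswith("." + blocked):
            return FAKE_ADDRESS
    # Level 2: a blocked keyword appears anywhere in the queried name.
    if any(keyword in name for keyword in BLOCKED_KEYWORDS):
        return FAKE_ADDRESS
    return None
```

A name such as `my-forbidden-site.example` is caught by the keyword rule even though the domain itself is not on the blocklist, which is why keyword blocking is the broader of the two levels.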
To detect DNS injection, you can use probing techniques to search for paths on which responses are injected.
Censors have several different blocking techniques available, each with its own strengths and weaknesses:
- Packet dropping: All packets going to or from a specific set of IP addresses are dropped.
  - Strengths:
    - Easy to implement
    - Low cost
  - Weaknesses:
    - Maintaining a blocklist can be hard if users rotate their IP addresses.
    - Overblocking if multiple services share the same IP address.
- DNS poisoning: The resolver either does not respond to a DNS lookup or responds with a fake address.
  - Strength: No overblocking, since blocking is based on the domain name.
  - Weakness: Blocks the whole domain; it is not possible to allow only some protocols through.
- Content inspection
  - Proxy-based content inspection: This censorship technique is more sophisticated in that all network traffic passes through a proxy, where it is examined for content, and the proxy rejects requests for objectionable content.
    - Strengths:
      - Precise censorship: A very precise level of censorship can be achieved, down to the level of single web pages or even objects within a web page.
      - Flexible: Works well in hybrid systems, e.g. in combination with other censorship techniques like packet dropping and DNS poisoning.
    - Weakness:
      - Not scalable: It is expensive to implement on a large-scale network, as passing everything through a proxy has a large processing overhead.
  - Intrusion detection systems (IDS) based content inspection: An alternative approach is to use parts of an IDS to inspect network traffic. An IDS is easier and more cost-effective to implement than a proxy-based system because it reacts to the traffic it observes rather than handling every request itself: what it sees informs the firewall rules used for future censorship.
- Blocking with Resets: This technique sends a TCP reset (RST) to block individual connections that contain requests with objectionable content. We can see this by capturing the packets of a normal request and of a request that contains potentially flaggable keywords. Let's look at one such example of packet capture.
OK request
cam(53382) → china(http) [SYN]
china(http) → cam(53382) [SYN, ACK]
cam(53382) → china(http) [ACK]
**cam(53382) → china(http) GET / HTTP/1.0**
**china(http) → cam(53382) HTTP/1.1 200 OK (text/html) etc...
china(http) → cam(53382) ..._more of the web page_**
**cam(53382) → china(http) [ACK]
..._and so on until the page was complete_**
Blocked request
cam(54190) → china(http) [SYN]
china(http) → cam(54190) [SYN, ACK] TTL=39
cam(54190) → china(http) [ACK]
cam(54190) → china(http) GET /?falun HTTP/1.0
**china(http) → cam(54190) [RST] TTL=47, seq=1, ack=1**
**china(http) → cam(54190) [RST] TTL=47, seq=1461, ack=1**
**china(http) → cam(54190) [RST] TTL=47, seq=4381, ack=1**
china(http) → cam(54190) HTTP/1.1 200 OK (text/html) _etc..._
cam(54190) → china(http) [RST] TTL=64, seq=25, ack zeroed
china(http) → cam(54190) ..._more of the web page_
cam(54190) → china(http) [RST] TTL=64, seq=25, ack zeroed
china(http) → cam(54190) [RST] TTL=47, seq=2921, ack=25
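After the client (cam(54190)) sends the request containing flaggable keywords, it receives 3 TCP RSTs for that one request, possibly to ensure that the sender accepts at least one reset. The three resets carry sequence numbers 1, 1461 and 4381, i.e. offsets of 0, 1460 and 3 × 1460 bytes, where 1460 bytes is the payload of a full-size TCP segment. The differing values presumably cover the case where the client has already acknowledged one or more full-size packets from the server.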
- Immediate Reset of Connections: In addition to inspecting content, the firewall can immediately suspend all traffic coming from a source for a short period of time.
A new connection opened after the blocked request above
cam(54191) → china(http) [SYN]
china(http) → cam(54191) [SYN, ACK] TTL=41
cam(54191) → china(http) [ACK]
china(http) → cam(54191) [RST] TTL=49, seq=1
The reset packet received by the client is from the firewall. It does not matter that the client sends out legitimate GET requests following one “questionable” request. It will continue to receive resets from the firewall for a particular duration. Running different experiments suggests that this blocking period is variable for “questionable” requests.
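In both blocked captures above, the resets arrive with a different TTL (47 and 49) than the SYN-ACK that completed the handshake (39 and 41), which hints that they were sent by a device other than the server. The sketch below applies that observation as a simple heuristic to capture text in the format used above. The function name `suspicious_resets` and the line format it parses are assumptions for illustration, and a TTL mismatch alone is only a hint, since TTLs can also differ when routes change.

```python
import re

# Matches lines such as "china(http) → cam(54190) [RST] TTL=47, seq=1, ack=1".
LINE = re.compile(r"^(?P<src>\w+)\(\w+\) → \w+\(\w+\) \[(?P<flags>[^\]]+)\] TTL=(?P<ttl>\d+)")


def suspicious_resets(capture, server="china"):
    """Return the TTLs of resets claiming to come from server whose TTL differs from its SYN-ACK."""
    handshake_ttl = None
    flagged = []
    for line in capture.splitlines():
        # Drop surrounding whitespace and any bold markers around the line.
        match = LINE.match(line.strip().strip("*"))
        if not match or match["src"] != server:
            continue
        flags = {flag.strip() for flag in match["flags"].split(",")}
        ttl = int(match["ttl"])
        if flags == {"SYN", "ACK"}:
            handshake_ttl = ttl  # TTL seen on the server's handshake reply
        elif "RST" in flags and handshake_ttl is not None and ttl != handshake_ttl:
            flagged.append(ttl)
    return flagged
```

Run on the first blocked capture, this flags the three resets with TTL=47 that follow the GET request, as well as the later reset with the same TTL.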
Measuring DNS manipulation
It is believed that over 60 countries are affected by some form of DNS censorship, but there is little comprehensive knowledge of what is blocked in which country because of the following issues:
- Diverse Measurements
  - Geographic and Political Variation:
    - Different geographic regions, ISPs, and countries exhibit diverse political dynamics affecting censorship.
    - Censorship techniques can vary even within regions of the same country.
  - Different Filtering Techniques:
    - ISPs may employ various methods, such as IP address blocking or keyword-based web request blocking.
  - Need for Longitudinal Studies:
    - Continuous and widespread measurements are necessary to understand the global scope and diversity of DNS manipulation.
- Need for Scale
  - Limitations of Volunteer-Based Methods:
    - Initial methods relied on volunteers installing and running measurement software.
    - This approach lacks the scale needed for comprehensive analysis.
  - Automation and Independence:
    - There is a need for automated measurement tools that do not depend on human intervention.
- Identifying Intent to Restrict Content Access
  - Complexity in Detection:
    - Inconsistent or anomalous DNS responses may be due to various causes, including misconfigurations.
  - Intent Detection:
    - Detecting DNS manipulation involves discerning intent to block access, which is inherently challenging.
  - Reliance on Multiple Indications:
    - It is essential to identify multiple signals to infer deliberate DNS manipulation.
- Ethics and Minimizing Risks
  - Risks to Citizens:
    - Participation in censorship measurement can pose risks, especially in countries penalising access to censored content.
  - Safer Alternatives:
    - Avoid using home network DNS resolvers or forwarders.
    - Prefer open DNS resolvers within Internet infrastructure, such as those hosted by ISPs or cloud providers.
Good methods for measuring censorship require diverse vantage points on the internet. Some systems, such as CensMon, used rented servers; others, such as OpenNet, relied on volunteers, though volunteers can be hard to find in exactly the places where you most want to measure.
Iris
This is a system that detects DNS censorship. It does this by comparing the responses of open DNS resolvers on the internet, in a multi-step process.
This first looks for open DNS resolvers that are part of the internet infrastructure (i.e. not home routers that are sometimes open due to misconfiguration).
Then we query them all for the same set of domains and compare the responses.
- Perform global DNS queries - establish a baseline using 3 resolvers under the control of the Iris team.
- Annotate DNS responses with auxiliary information to assist classification.
- Additional PTR and TLS scanning - this is to allow inconsistencies due to virtual hosting to be resolved.
After the dataset is gathered we then calculate two types of metrics:
- Consistency metrics: Checking whether the same lookup from different locations gives different responses for IP address, AS, HTTP content, etc.
- Independent verifiable metrics: These are metrics that use other datasets to verify they are correct such as HTTPS certificates.
If a response satisfies either kind of metric it is considered correct; if it satisfies neither, it is labelled as incorrect (manipulated).
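The classification step can be sketched as below. This is a simplified illustration rather than Iris's implementation: the `Response` fields, the choice of IP and AS as the only consistency checks, and the certificate flag as the only verifiability check are assumptions. The sketch accepts a response as soon as either kind of metric is satisfied, which is how the Iris paper describes the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One annotated DNS answer; the fields are illustrative, not Iris's schema."""
    ip: str
    asn: int
    cert_valid_for_domain: bool  # did a TLS scan of `ip` return a certificate valid for the domain?


def is_consistent(response, controls):
    """Consistency metric: the answer shares an IP or AS with some control answer."""
    return any(response.ip == c.ip or response.asn == c.asn for c in controls)


def is_verifiable(response):
    """Independent verifiability metric: here, only the HTTPS certificate check."""
    return response.cert_valid_for_domain


def classify(response, controls):
    """Label a response 'correct' if either kind of metric is satisfied, else 'incorrect'."""
    if is_consistent(response, controls) or is_verifiable(response):
        return "correct"
    return "incorrect"
```

Accepting on either metric keeps false positives down: a CDN that legitimately returns a different IP in each region still passes through the AS or certificate check.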
Censorship through connectivity disruptions
The most direct form of censorship is to block access to all or part of the internet at the IP level. The main methods are:
- Physically disconnecting infrastructure: If the network is sufficiently small, the access points to the internet can be taken down. This is hard, however, as this infrastructure is normally distributed.
- Routing disruption: Abusing BGP to change the routes that are advertised or to withdraw them completely. This is fairly easy to detect, as the change in routing behaviour is visible.
- Packet filtering: Similar to what a firewall or a switch does, but at the level of the whole network. This can be harder to detect, as you need to probe the affected IP addresses or the paths that packets follow.
Augur
This is a system that monitors for censorship through connectivity disruptions. It relies on two features of the Internet protocols:
- IP ID: The IPv4 identification field is a 16-bit field used to identify the fragments of a packet. (IPv6 has an analogous field in its fragment extension header, though in IPv6 routers never fragment packets; only the sender may.) Normally a host keeps a count of the packets it has sent and increments the IP ID by one for each subsequent packet.
- TCP RST: When an unexpected TCP packet, such as a SYN-ACK without a previous SYN, is sent to a host, the host sends back a reset packet. (This assumes that no more complex behaviour, such as filtering on the host, interferes.)
The system aims to detect if filtering exists between two hosts, a reflector and a site. A reflector is a host which maintains a global IP ID. A site is a host that may be potentially blocked. To identify if filtering exists, it makes use of a third machine called the measurement machine.
There are two techniques it will then use:
- Probing: The measurement machine sends a SYN-ACK to the reflector and records the IP ID of the RST that the reflector sends back.
- Perturbation: The measurement machine sends a spoofed SYN to the site with the reflector's IP address as the source. The site responds to the reflector with a SYN-ACK. The reflector then returns a TCP RST to the site, incrementing its IP ID by 1.
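The perturbation increases the reflector's IP ID by 1 only if communication from the site to the reflector is not censored. Comparing the IP IDs returned by successive probes therefore distinguishes three cases:
- No filtering: The reflector receives the SYN-ACK and answers with a RST, so the IP ID seen by the next probe has increased by 2 (one for the RST to the site, one for the reply to the probe).
- Site-to-reflector traffic blocked: The SYN-ACK never reaches the reflector, so the IP ID increases by only 1 between probes.
- Reflector-to-site traffic blocked: The reflector's RST never reaches the site. This case relies on the site resending its SYN-ACK when it gets no response from the reflector; the retransmission triggers another RST, so the IP ID increases by 2 again on a third probe.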
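The IP ID patterns can be turned into a small decision rule. With three probe readings of the reflector's IP ID (before the spoofed packet, shortly after it, and once the site has had time to retransmit), the gaps between readings are 2 then 1 with no filtering, 1 then 1 when site-to-reflector traffic is blocked, and 2 then 2 when reflector-to-site traffic is blocked. The sketch below is a noise-free simplification under the assumption that the reflector sends no other traffic during the measurement; the function names are hypothetical, and Augur itself repeats measurements and applies statistical tests to cope with background traffic.

```python
IP_ID_MODULUS = 1 << 16  # the IPv4 identification field is 16 bits wide

# (gap between readings 1 and 2, gap between readings 2 and 3) -> verdict
PATTERNS = {
    (2, 1): "no filtering",
    (1, 1): "site-to-reflector blocked",
    (2, 2): "reflector-to-site blocked",
}


def ip_id_delta(earlier, later):
    """Difference between two IP ID readings, allowing for 16-bit wrap-around."""
    return (later - earlier) % IP_ID_MODULUS


def classify(before, after, third):
    """Classify filtering from three probe readings of the reflector's IP ID."""
    gaps = (ip_id_delta(before, after), ip_id_delta(after, third))
    # Any other pattern means unrelated traffic disturbed the counter.
    return PATTERNS.get(gaps, "inconclusive")
```

For example, readings of 100, 102 and 103 indicate no filtering, while 65535, 1 and 3 indicate reflector-to-site blocking once the counter's wrap-around is taken into account.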