SNE Master Research Projects 2017 - 2018


Contact

Cees de Laat, room: C.3.152
And the OS3 staff.
Course Codes:

Research Project 1 53841REP6Y
Networking Research Project 2 53842NRP6Y    
Security Research Project 2 53842SRP6Y

TimeLine

RP1 (January):
  • Wednesday xxx xxx, 2017, 10h15-13h00: Introduction to the Research Projects.
  • Wednesday xxx xxx, 2017, 10h15-13h00: Detailed discussion on chosen subjects for RP1.
  • Monday Jan 9th - Friday Feb 3rd 2018: Research Project 1.
  • Friday Jan 13th: (updated) research plan due.
  • Monday Jan 23, 16h00, progress meeting (not mandatory).
  • Monday Feb 6, 2017 13h00-17h00: Presentations RP1 in B1.23 at SP 904.
  • Tuesday Feb 7, 2017 10h00 - 17h00: Presentations RP1 in B1.23 at SP 904.
  • Sunday Feb 12th 24h00: RP1 - reports due
RP2 (June):
  • Wednesday XXXXX, 2018, 10h15-12h15, B1.23 Detailed discussion on chosen subjects for RP2.
  • Tuesday XXXXX, 2018, 16h00-17h00, B1.23 Detailed discussion on chosen subjects for RP2.
  • Tuesday Jun 6th - Friday Jun 30, 2018: Research Project 2.
  • Friday Jun 9th: (updated) research plan due.
  • Monday Jun 19, 16h00 progress meeting (not mandatory).
  • Monday Jul 3 2018, 13h00-17h00: presentations in C0.110 @ SP904.
  • Tuesday Jul 4 2018, 13h00-17h00: presentations in C0.110 @ SP904.
  • Sunday July 9th 24h00 2018: RP2 - reports due.

Projects

Here is a list of student projects. The left-over projects for this year can be found here: LeftOvers.
In a futile attempt to prevent spam, "@" is replaced by "=>" in the table.
Color of cell background:
  • Project available
  • Currently chosen project
  • Project plan received
  • Presentation received
  • Report received
  • Completed project
  • Confidentiality was requested
  • Blocked, not available
  • Report but no presentation
  • Outside normal RP timeframe




(Table columns: title / summary / supervisor contact / students / RP 1 or 2)

1
Real-time Video Stream filtration for Data and Facial Anonymization.

The NI-1772C is a camera frequently used in healthcare settings. It is often required that streamed video frames be sent only after all metadata has been removed and facial characteristics have been anonymised. This research project aims at building a lightweight solution that investigates novel methods of anonymising video stream data and facial information.

This project can be of great value if implemented correctly.

The supervisor is available full time over Skype for consultation for students.
Junaid Chaudhry <xunayd=>gmail.com>

Sandino Moeniralam <Sandino.Moeniralam=>os3.nl>
RP 2

2

Automated migration testing.

Unattended content management systems are a serious risk factor for internet security and for end users, as they allow trustworthy information sources on the web to be easily infected with malware and turn evil.
  • How can we use well-known software testing methodologies (e.g. continuous integration) to automatically test whether available updates that fix security weaknesses in software running on a website can be safely applied, with as little involvement of the end user as possible?
  • How would such a migration work in a real world scenario?
In this project you will look at the technical requirements for automated migration testing and, if possible, design a working prototype.
Michiel Leenaars <michiel=>nlnet.nl>




3

Virtualization vs. Security Boundaries.

Traditionally, security defenses are built upon a classification of the sensitivity and criticality of data and services. This leads to a logical layering into zones, with an emphasis on command and control at the point of inter-zone traffic. The classical "defense in depth" approach applies a series of defensive measures to network traffic as it traverses the various layers.

Virtualization erodes the natural edges, and this affects guarding system and network boundaries. In turn, additional technology is developed to add instruments to virtual infrastructure. The question that arises is the validity of this approach in terms of fitness for purpose, maintainability, scalability and practical viability.
Jeroen Scheerder <Jeroen.Scheerder=>on2it.net>




4

Efficient delivery of tiled streaming content.

HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up in numerous small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between different encodings (e.g. qualities) of the same content.
The technique known as Tiled Streaming builds on this concept by splitting up content not only temporally, but also spatially, allowing specific areas of a video to be independently encoded and requested. This method allows navigation within ultra-high-resolution content without requiring the entire video to be transmitted.
An open question is how these numerous spatial tiles can be distributed and delivered most efficiently over a network, reducing both unnecessary overhead as well as latency.
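The spatial side of this can be made concrete with a small sketch (function name and tile grid are illustrative, not part of any tiled-streaming standard): a client that knows the tile grid can compute which tiles intersect its current viewport and request only those chunks.

```python
# Sketch: given a video split into a grid of spatial tiles, find the
# tiles that overlap the viewer's current viewport.

def tiles_for_viewport(viewport, tile_w, tile_h, grid_cols, grid_rows):
    """Return (col, row) indices of tiles overlapping the viewport.

    viewport is (x, y, width, height) in pixels of the full-resolution video.
    """
    x, y, w, h = viewport
    first_col = max(0, x // tile_w)
    last_col = min(grid_cols - 1, (x + w - 1) // tile_w)
    first_row = max(0, y // tile_h)
    last_row = min(grid_rows - 1, (y + h - 1) // tile_h)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 3840x2160 video split into a 4x4 grid of 960x540 tiles; the viewer
# looks at a 1280x720 window whose top-left corner is at (800, 400).
needed = tiles_for_viewport((800, 400, 1280, 720), 960, 540, 4, 4)
print(len(needed))  # → 9 of the 16 tiles need to be fetched
```

The open question above is then how to deliver exactly these per-viewer tile subsets efficiently at the network level.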
Omar Niamut <omar.niamut=>tno.nl>



5

What is the effectiveness of monitoring darknet fora to predict possible hacking attempts against, for example, Dutch targets (banks, critical infrastructure, etc)?

The premise of the research is that, in theory, a well-built system might have foreseen the DDoS attack against Ziggo's nameservers a few months ago, based on chatter about hiring a botnet to "target a Dutch ISP". That might have been enough to at least prepare for such an attack.

We reference the OS3 paper of Diana Rusu, which is titled "Forum post classification to support forensic investigations of illegal trade on the Dark Web".

As for an introduction of both of us: we are not experts on the machine learning part, but are enthusiastic to learn new subjects. Machine learning is becoming more important these days due to the growth of data, so we think learning this skill is a good investment. We both like to program in different languages.

The exact research question could be slightly changed if need be, for example if it seems that the research question is too broad.
<martijn.spitters=>tno.nl>
<stefan.verbruggen=>tno.nl>




6

System Security Monitoring using Hadoop.

It involves looking into data mining of system and network logs using Hadoop, with a focus on system security. This research will investigate a real-time 'streaming' approach to monitoring system security: streaming data through Hadoop (e.g. via Spark Streaming) and then identifying and storing possible incidents. As an example of visualization, you could think of a real-time map of the world displaying both failed and successful login attempts. In any case, an important first part of the project would be investigating what others have done in this field and which systems and techniques they used, to get an overview of the possibilities. Finally, implementing a small proof of concept based on best-practice or cutting-edge tools/APIs would be a great final result.
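As a Hadoop-free toy of the streaming idea (log format, threshold and names are invented for this sketch; a real deployment would run the same logic over a Spark Streaming DStream): consume auth log lines one at a time, keep per-source failure counts, and flag bursts as incidents.

```python
# Toy incident detector over a stream of login events.
from collections import Counter

FAIL_THRESHOLD = 3  # flag a source after this many failed logins

def monitor(lines, threshold=FAIL_THRESHOLD):
    failures = Counter()
    incidents = []
    for line in lines:              # in Spark Streaming: a micro-batch of lines
        status, src = line.split()[:2]
        if status == "FAILED":
            failures[src] += 1
            if failures[src] == threshold:
                incidents.append(src)
    return incidents

stream = [
    "FAILED 10.0.0.1", "OK 10.0.0.2", "FAILED 10.0.0.1",
    "FAILED 10.0.0.3", "FAILED 10.0.0.1", "OK 10.0.0.1",
]
print(monitor(stream))  # → ['10.0.0.1']
```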
Machiel Jansen <machiel.jansen=>surfsara.nl>
Mathijs Kattenberg <mathijs.kattenberg=>surfsara.nl>




7

Qualitative analysis of Internet measurement methods and bias.

In the past year NLnet Labs and other organisations have run a number of measurements on DNSSEC deployment and validation. We used the RIPE Atlas infrastructure for our measurements, while others used Google ads in which Flash code runs the measurements. The results differ because the measurement points (or observation points) differ: RIPE Atlas measurement points are mainly located in Europe, while the Google ads Flash measurements run globally (with a somewhat stronger representation of East Asia).

The question is whether we can quantify the bias in the Atlas measurements, or qualitatively compare the measurements, so that we can correlate the results of both measurement platforms. This would greatly help in interpreting our results and the results of others based on the Atlas infrastructure. The results are highly relevant, as many operational discussions on DNS and DNSSEC deployment are supported or falsified by these kinds of measurements.
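One possible starting point, sketched below with made-up numbers (this is not the method NLnet Labs uses): treat regions as strata and re-weight per-region validation rates by an assumed true share of the measured population. The gap between the naive probe-weighted rate and the corrected rate is then an explicit measure of the vantage-point bias.

```python
# Post-stratification sketch: correct a probe-weighted rate for the
# over-representation of one region in the measurement platform.

def reweighted_rate(per_region_rate, population_share):
    """Weight each region's observed rate by its assumed true share."""
    return sum(rate * population_share[region]
               for region, rate in per_region_rate.items())

# Toy numbers: Atlas-style probes over-represent Europe.
rates  = {"EU": 0.30, "NA": 0.20, "AS": 0.10}   # observed validation rates
probes = {"EU": 800,  "NA": 150,  "AS": 50}     # biased sample sizes
share  = {"EU": 0.25, "NA": 0.25, "AS": 0.50}   # assumed true shares

naive = sum(rates[r] * probes[r] for r in rates) / sum(probes.values())
corrected = reweighted_rate(rates, share)
print(naive, corrected)  # → 0.275 0.175 — the gap quantifies the bias
```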
Willem Toorop <willem=>nlnetlabs.nl>




8

Leader election and logical organization in inter-cloud virtual machines.

The objective of the project is to create a service that is deployed on every VM of a distributed cluster which allows the cluster of VMs to elect a leader. This can be extended further so that the cluster of VMs can have different groups with each group having its own leader.
 
When considering highly distributed, volatile systems such as those that can be created using virtual machines from different cloud providers, mapping distributed applications to the virtual machines is not a trivial task. The basic necessities for having a functioning distributed system cannot be taken for granted; e.g. networking between nodes on different providers can quickly get out of hand. Another issue is the logical organization of the nodes into groups with leaders. Many application mapping scenarios require temporary leaders to, for example, coordinate replication or coalesce monitoring information. Without any central node, this leader needs to be elected just-in-time in a distributed fashion. It is convenient for virtual machines to be equipped with a service that helps them organize themselves into logical groups, where every group elects a leader. In the volatile cloud environment virtual machines come and go, so the service must be dynamic enough to ensure a leader is elected in every scenario. With such a service running on each VM, an application can query the service for the leader's IP or other ID information and use this to further optimize application scheduling.
 
This area has been studied for a long time and there are various algorithms for achieving consensus and leader election. The most common are Paxos and Raft, of which Raft is the simpler. Also of research interest is blockchain consensus, popularized by Bitcoin, which aims at achieving consensus among untrusted nodes.
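To make the intended service concrete, a deliberately minimal sketch (interface and election rule are illustrative; real Raft adds terms, logs, heartbeats and randomized election timeouts): each group deterministically elects the live member with the highest ID, and re-elects whenever membership changes.

```python
# Minimal leader-election rule of the kind Bully-style algorithms
# converge on: the highest-ID live member wins.

def elect_leader(members, alive):
    """Return the highest-ID member that is still alive, or None."""
    candidates = [m for m in members if m in alive]
    return max(candidates) if candidates else None

group = ["vm-a", "vm-b", "vm-c"]
assert elect_leader(group, alive={"vm-a", "vm-b", "vm-c"}) == "vm-c"
# vm-c disappears (volatile cloud) — the group must re-elect:
assert elect_leader(group, alive={"vm-a", "vm-b"}) == "vm-b"
```

The hard part the project addresses is everything around this rule: agreeing on `members` and `alive` without a central node.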
Yuri Demchenko <y.demchenko=>uva.nl>
Reggie Cushing <r.s.cushing=>uva.nl>


9

Building an open-source, flexible, large-scale static code analyzer.

Background information
Data drives business, and maybe even the world. Businesses that make it their business to gather data are often aggregators of client-side generated data. Client-side generated data, however, is inherently untrustworthy. Malicious users can construct their data to exploit careless, or naive, programming and use this malicious, untrusted data to steal information or even take over systems.
It is no surprise that large companies such as Google, Facebook and Yahoo spend considerable resources on securing their own systems against would-be attackers. Generally, many methods have been developed to make untrusted data cross the trust boundary to trusted data, and effectively make malicious data harmless. However, securing your systems against malicious data often requires expertise beyond what even skilled programmers might reasonably possess.
Problem description
Ideally, tools that analyze code for vulnerabilities would be used to detect common security issues. Such tools, or static code analyzers, exist, but are either outdated (http://rips-scanner.sourceforge.net/) or part of very expensive commercial packages (https://www.checkmarx.com/ and http://armorize.com/). Next to the need for an open-source alternative to the previously mentioned tools, we also need to look at increasing our scope. Rather than focusing on a single codebase, the tool would ideally be able to scan many remote, large-scale repositories and report the findings back in an easily accessible way.
An interesting target for this research would be very popular, open-source (at this stage) Content Management Systems (CMSs), and specifically plug-ins created for these CMSs. CMS cores are held to a very high coding standard and are often relatively secure. Plug-ins, however, are necessarily less so, but are generally as popular as the CMSs they're created for. This is problematic, because an insecure plug-in is as dangerous as an insecure CMS. Experienced programmers and security experts generally audit the most popular plug-ins, but this is: a) very time-intensive, b) prone to errors and c) of limited scope, i.e. not every plug-in can be audited. For example, if it were feasible to audit all aspects of a CMS repository (CMS core and plug-ins), the DigiNotar debacle could easily have been avoided.
Research proposal
Your research would consist of extending our proof-of-concept static code analyzer written in Python and using it to scan code repositories, possibly those of some major CMSs and their plug-ins, for security issues, and finding innovative ways of reporting on the massive number of possible issues you are sure to find. Help others keep our data that little bit safer.
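As a toy illustration of what such an analyzer does (this is not the DongIT prototype; the sink list and heuristic are invented): walk a Python AST and flag calls to dangerous sinks whose first argument is not a constant, i.e. potentially attacker-influenced.

```python
# Minimal AST-based sink scanner.
import ast

SINKS = {"eval", "exec", "system"}  # illustrative sink list

def find_issues(source):
    """Return (lineno, sink) for sink calls with a non-constant argument."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = getattr(func, "id", getattr(func, "attr", None))
            if (name in SINKS and node.args
                    and not isinstance(node.args[0], ast.Constant)):
                issues.append((node.lineno, name))
    return issues

code = 'import os\nuser = input()\nos.system("ls")\nos.system(user)\n'
print(find_issues(code))  # → [(4, 'system')] — only the variable argument
```

A real analyzer adds taint tracking across assignments and functions; this sketch only shows the parse-and-match core.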
Patrick Jagusiak <patrick.jagusiak=>dongit.nl>
Wouter van Dongen <wouter.vandongen=>dongit.nl>


10

Mobile app fraud detection framework.

How can fraud in mobile banking applications be prevented? Applications for smartphones are commodity goods used for retail (and other) banking purposes. Leveraging this type of technology for money transfers attracts criminal organisations trying to commit fraud. One of many possible security controls is the detection of fraudulent transactions or other types of activity. Detection can be implemented at many levels within the payment chain; one of these is the application level itself. This assignment entails researching the information that would be required to detect fraud from within mobile banking applications, and turning fraud around by building a client-side fraud detection framework within mobile banking applications.
Steven Raspe <steven.raspe=>nl.abnamro.com>


11

Malware analysis of NFC-enabled smartphones with payment capability.

The risk of mobile malware is rising rapidly. Combined with the development of new techniques, this opens up many new attack scenarios. One of these techniques is the use of mobile phones for payments. In this research project you will look at how resistant these systems are against malware on the device. We would like to look at the theoretical threats, but also perform hands-on testing.
NOTE: timing on this project might be a challenge since the testing environment is only available during the pilot from August 1st to November 1st.
Steven Raspe <steven.raspe=>nl.abnamro.com>




12

Research MS Enhanced Mitigation Experience Toolkit (EMET).

Every month new security vulnerabilities are identified and reported. Many of these vulnerabilities rely on memory corruption to compromise the system. For most vulnerabilities a patch is released after the fact to remediate the vulnerability. Nowadays there are also preventive security measures that can stop vulnerabilities from becoming exploitable without a patch being available for the specific issue. One of these technologies is Microsoft's Enhanced Mitigation Experience Toolkit (EMET), which adds additional protection to Windows, preventing many vulnerabilities from becoming exploitable. We would like to research whether this technology is effective in practice and can indeed prevent exploitation of a number of vulnerabilities without the specific patch being applied. We would also like to research whether running EMET has other impact on the system, for example a noticeable performance drop, or common software that no longer functions properly once EMET is installed. If time permits, it would also be interesting to see whether existing exploits can be modified to work in an environment protected by EMET.
Henri Hambartsumyan <HHambartsumyan=>deloitte.nl>


13

Triage software.

In previous research, a remote acquisition and storage solution was designed and built that allows sparse acquisition of disks over a VPN using iSCSI. This system allows sparse reading of remote disks; the triage software should decide which parts of the disk must be read. The initial goal is to use metadata to retrieve first the blocks that are assumed to be most relevant. This is in contrast to techniques that perform triage by running a full disk scan remotely (e.g. run bulk_extractor remotely, do a keyword scan, or do a hash-based file scan remotely).

The student is asked to:
  1. Define criteria that can be used for deciding which (parts of) files to acquire
  2. Define a configuration document/language that can be used to order based on these criteria
  3. Implement a prototype for this acquisition
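One possible shape for steps 1 and 2, with invented field names and rules: express the criteria as ordered (predicate, priority) pairs over file metadata, and acquire files in priority order.

```python
# Sketch of a criteria configuration for sparse acquisition ordering.
# Lowest priority number is acquired first; rules and fields are made up.

RULES = [
    (lambda f: f["path"].endswith((".doc", ".xls", ".pdf")), 0),  # user documents
    (lambda f: f["size"] < 1024 * 1024, 1),                       # small = cheap
    (lambda f: True, 9),                                          # everything else
]

def priority(file_meta, rules=RULES):
    return min(p for match, p in rules if match(file_meta))

def acquisition_order(files):
    return sorted(files, key=priority)

files = [
    {"path": "/swapfile", "size": 2**30},
    {"path": "/home/u/report.doc", "size": 40960},
    {"path": "/etc/hosts", "size": 300},
]
print([f["path"] for f in acquisition_order(files)])
# → ['/home/u/report.doc', '/etc/hosts', '/swapfile']
```

A real configuration language would serialize such rules declaratively; the point here is only the ordering semantics.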
"Ruud Schramp (DT)" <schramp=>holmes.nl>
"Zeno Geradts (DT)" <zeno=>holmes.nl>
"Erwin van Eijk (DT)" <eijk=>holmes.nl>



14

Parsing CentralTable.accdb from Office file cache and restoring cached office documents.

The Microsoft Office suite uses a file cache for several reasons, one of them being delayed uploading and caching of documents from a SharePoint server.
These cache files may contain partial or complete Office documents that have been opened on a computer. The master database in the file cache folder also contains document metadata from SharePoint sites. In this project you are asked to research the use of the Office file cache and deliver a proof of concept for the extraction and parsing of metadata from the database file, as well as for decoding or parsing document contents from the cache files (.FSD).
Yonne de Bruijn <yonne.debruijn=>fox-it.com>
Rick van Gorp <Rick.vanGorp=>os3.nl>
RP 1

15

UsnJrnl parsing for Microsoft Office activity.

In modern Windows versions, the NTFS filesystem keeps a log (the UsnJrnl file) of all operations that take place on files and folders. This can include interesting information about read- and write-operations on files. Microsoft Office programs perform a lot of file operations in the background while a user is working on a file (think of autosave, back-up copies, copy-paste operations, etc.). While a lot of this activity leaves short-term traces on the file system, after a while they can often only be found in the UsnJrnl. Little research has been done on the forensic implications of these traces. In this project, you are requested to research which traces are left in the UsnJrnl when using Office applications like Word and Excel, and how these traces can be combined into a hypothesis about what activity was performed on a document.
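For orientation, the on-disk USN_RECORD_V2 layout is simple enough that a minimal reader fits in a few lines. The record below is fake test data built for the sketch ('~WRD0001.tmp' mimics the temporary files Word creates while autosaving); only a handful of reason flags are decoded.

```python
# Minimal USN_RECORD_V2 reader: a fixed 60-byte header followed by the
# UTF-16LE file name.
import struct

USN_V2 = struct.Struct("<IHHQQqqIIIIHH")  # 60 bytes up to the file name
REASONS = {0x100: "FILE_CREATE", 0x200: "FILE_DELETE",
           0x1000: "RENAME_OLD_NAME", 0x2000: "RENAME_NEW_NAME"}

def parse_record(buf):
    (length, major, minor, frn, parent_frn, usn, timestamp, reason,
     source, sec_id, attrs, name_len, name_off) = USN_V2.unpack_from(buf)
    name = buf[name_off:name_off + name_len].decode("utf-16-le")
    flags = [label for bit, label in REASONS.items() if reason & bit]
    return name, flags

# Build a fake record for testing: a Word-style temp file being created.
name = "~WRD0001.tmp".encode("utf-16-le")
record = USN_V2.pack(60 + len(name), 2, 0, 1, 5, 4096, 0,
                     0x100, 0, 0, 0x20, len(name), 60) + name
print(parse_record(record))  # → ('~WRD0001.tmp', ['FILE_CREATE'])
```

Correlating sequences of such records (create temp, rename old, rename new, delete) is what turns raw journal entries into a hypothesis about user activity.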
Gina Doekhie <gina.doekhie=>fox-it.com>

Joost van Oorschot <Joost.vanOorschot=>os3.nl>
RP 1

16

The Serval Project.

Here are a few projects from the Serval Project. Not everything is equally appropriate for the SNE master, but it may give ideas for RPs.

1. Porting Serval Project to iOS

The Serval Project (http://servalproject.org, http://developer.servalproject.org/wiki) is looking to port to iOS.  There are a variety of activities to be explored in this space, including how to provide interoperability with Android and explore user interface issues.

3. C65GS FPGA-Based Retro-Computer

The C65GS (http://c65gs.blogspot.nl, http://github.com/gardners/c65gs) is a reimplementation of the Commodore 65 computer in FPGA, plus various enhancements.  The objective is to create a fun 8-bit computer for the 21st century, complete with 1920x1200 display, ethernet, accelerometer and other features -- and then adapt it to make a secure 8-bit smart-phone.  There are various aspects of this project that can be worked on.

4. FPGA Based Mobile Phone

One of the long-term objectives of the Serval Project (http://servalproject.org, http://developer.servalproject.org/wiki) is to create a fully-open mobile phone.  We believe that the most effective path to this is to use a modern FPGA, like a Zynq, that contains an ARM processor and sufficient FPGA resources to directly drive cellular communications, without using a proprietary baseband radio.  In this way it should be possible to make a mobile phone that has no binary blobs, and is built using only free and open-source software.  There are considerable challenges to this project, not the least of which is implementing 2G/3G handset communications in an FPGA.  However, if successful, it raises the possibility of making a mobile phone that has long-range UHF mobile mesh communications as a first-class feature, which would be an extremely disruptive innovation.
Paul Gardner-Stephen <paul.gardner-stephen=>flinders.edu.au>

17

SURFdrive security.

SURFdrive is a personal cloud storage service for the Dutch higher education and research community, offering staff, researchers and students an easy way to store, synchronise and share files in the secure and reliable SURF community cloud.

SURFdrive is based on ownCloud, an open-source personal cloud storage product. Our challenge is to make the software environment as safe and secure as possible. The question is:
  • How can we make the environment resistant to future 0-day attacks?
Anomaly detection techniques might be helpful here. The research task is to examine which techniques are effective against 0-day attacks.
Rogier Spoor <Rogier.Spoor=>surfnet.nl>




18

Comparison of security features of major Enterprise Mobility Management solutions

For years, Gartner has identified the major EMM (formerly known as MDM) vendors. These vendors are typically rated on performance and features; security is often not addressed in detail.
This research concerns an in-depth analysis of the security features of major EMM solutions (such as MobileIron, Good, AirWatch, XenMobile, InTune, and so forth) on major mobile platforms (iOS, Android, Windows Phone). Points of interest include: protection of data at rest (containerization and encryption), protection of data in transit (i.e. VPN), local key management, and vendor-specific security features (added to platform APIs).
Paul van Iterson <vanIterson.Paul=>kpmg.nl>




19

Partitioning of big graphs.

Distributed graph processing and GPU processing of graphs that are bigger than GPU memory both require that a graph be partitioned into sections that are small enough to fit in a single machine/GPU. Having fair partitions is crucial to obtaining good workload balance, however, most current partitioning algorithms either require the entire graph to fit in memory or repeatedly process the same nodes, causing the partitioning to be a very computationally intensive process.

Since a good partitioning scheme depends on both the number of machines used (i.e., the number of partitions) and the graph itself, precomputing a partitioning is unhelpful: it would make incrementally updating the graph impossible. We therefore need to do the partitioning on-the-fly, preferably in a distributed fashion. This project involves investigating one or more possible partitioning schemes and developing prototypes. Possible starting points:
  • Partitioning that minimises cross-partition communication
  • Fine-grained partitioning that allows easy recombining of partitions to scale to the appropriate number of machines.
  • Distributed edge-count based partitioning that minimises communication.
Expected deliverables:
  • One or more partitioning prototypes
  • Write-up of the partitioning scheme and its benefits
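As one concrete on-the-fly candidate (a sketch in the spirit of greedy streaming edge partitioners such as PowerGraph's, not a tuned implementation): assign each incoming edge to the partition that already holds most of its endpoints, breaking ties by current load. This needs no precomputation and handles incremental updates naturally.

```python
# Greedy one-pass streaming edge partitioner.
from collections import defaultdict

def stream_partition(edges, k):
    """Assign each edge to one of k partitions in a single pass."""
    replicas = defaultdict(set)   # vertex -> partitions where it already lives
    load = [0] * k                # edges per partition
    assignment = []
    for u, v in edges:
        ru, rv = replicas[u], replicas[v]
        # Prefer partitions already holding both endpoints, then one,
        # then the least-loaded partition.
        best = min(range(k), key=lambda p: (-((p in ru) + (p in rv)), load[p]))
        assignment.append(best)
        ru.add(best)
        rv.add(best)
        load[best] += 1
    return assignment, load

edges = [(1, 2), (2, 3), (1, 3), (4, 5)]
print(stream_partition(edges, 2))  # → ([0, 0, 0, 1], [3, 1])
```

The triangle 1-2-3 lands entirely on one partition (no cross-partition communication), and the unrelated edge goes to the emptier one.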
Merijn Verstraaten <M.E.Verstraaten=>uva.nl>




20

Analysing ELF binaries to find compiler switches that were used.

The Binary Analysis Tool is an open source tool that can automate analysis of binary files by fingerprinting them. For ELF files this is done by extracting string constants, function names and variable names from the various ELF sections. Sometimes compiler optimisations move the string constants to different ELF sections and extraction will fail in the current implementation.

Your task is to find out whether it is possible, by looking at the binary, to detect that optimisation flags causing constants to be moved to different ELF sections were passed to the compiler, and to report them. The scope of this project is limited to Linux.

Armijn Hemel - Tjaldur Software Governance Solutions
Armijn Hemel <armijn=>tjaldur.nl>


21

Designing structured metadata for CVE reports.

Vulnerability reports such as MITRE's CVE are currently free format text, without much structure in them. This makes it hard to machine process reports and automatically extract useful information and combine it with other information sources. With tens of thousands of such reports published each year, it is increasingly hard to keep a holistic overview and see patterns. With our open source Binary Analysis Tool we aim to correlate data with firmware databases.

Your task is to analyse how we can use the information from these reports, determine what metadata is relevant, and propose a useful metadata format for CVE reports. In your research you will make an inventory of tools that can be used to convert existing CVE reports with minimal effort.
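As a strawman to anchor the discussion (the field names are a proposal for illustration, not an existing standard), a structured record and a completeness check might look like this:

```python
# Strawman structured CVE metadata record, using Heartbleed as an example.
record = {
    "id": "CVE-2014-0160",
    "affected": [{"product": "openssl", "version_range": ">=1.0.1 <1.0.1g"}],
    "weakness": "CWE-126",          # buffer over-read
    "component": "ssl/t1_lib.c",    # where a firmware scan should look
    "references": ["https://heartbleed.com/"],
}

def is_complete(rec, required=("id", "affected", "weakness")):
    """A correlation tool could reject records lacking the fields
    needed to match against firmware contents."""
    return all(rec.get(field) for field in required)

print(is_complete(record))  # → True
```

Machine-checkable `affected` ranges and `component` paths are exactly what free-format CVE text lacks and what firmware correlation would need.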

Armijn Hemel - Tjaldur Software Governance Solutions
Armijn Hemel <armijn=>tjaldur.nl>

22

RedStar OS reverse engineering.

During the 32C3 conference, two researchers showed that RedStar OS - North Korea's operating system - implements custom cryptography in the pilsung.ko kernel module. Reverse engineer this module and understand how the pilsung implementation of AES differs from normal AES. Is there some kind of backdoor or weakness in pilsung?
Note that we expect that deep understanding of assembly/reverse engineering and the Linux kernel is required to successfully research this topic.

See
for more info on RedStar OS reversing.
Tim van Essen <TvanEssen=>deloitte.nl>
Henri Hambartsumyan <hhambartsumyan=>deloitte.nl>


23

Efficient networking for clouds-on-a-chip.

The “Cloud” is a way to organize business where the owners of physical servers rent their resources to software companies to run their application as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.
Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970's for single-processor systems, namely the use of shared memory to communicate data between processes running on the same processor.
As multi-core chips become prevalent, we can do better and use more modern techniques. In particular, the direct connections between cores on the chip can be used to implement a faster network than the off-chip shared memory. This is what this project is about: demonstrate that direct use of on-chip networks yields better networking between VMs on the same chip than using shared memory.
The challenge in this project is that the on-chip network is programmatically different than "regular" network adapters like Ethernet, so we cannot use existing network stacks as-is.
The project candidate will thus need to explore the adaptation and simplification of an existing network stack to use on-chip networking.
The research should be carried out either on a current multi-core product or simulations of future many-core accelerators. Simulation technology will be provided as needed.

Raphael 'kena' Poss <r.poss=>uva.nl>


24

Secure on-chip protocols for clouds-on-a-chip.

The “Cloud” is a way to organize business where the owners of physical servers rent their resources to software companies to run their application as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.
Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970's for single-processor systems, namely the virtualization of shared memory using virtual address translation within the core.
The problem with this old technique is that it assumes that the connection between cores is "secure". The physical memory accesses are communicated over the chip without any protection: if a VM running on core A exchanges data with off-chip memory, a VM running on core B that runs malicious code can exploit hardware errors or hardware design bugs to snoop and tamper with the traffic of core A.
To make Clouds-on-a-chip viable from a security perspective, further research is needed to harden the on-chip protocols, in particular the protocols for accessing memory, virtual address translation and the routing of I/O data and interrupts.
The candidate for this project should perform a thorough analysis of the various on-chip protocols required to implement VMs on individual cores, then design protocol modifications that provide resistance against snooping and tampering by other cores on the same chip, together with an analysis of the corresponding overheads in hardware complexity and operating costs (extra network latencies and/or energy usage).
The research will be carried out in a simulation environment so that inspection of on-chip network traffic becomes possible. Simulation tools will be provided prior to the start of the project.
Raphael 'kena' Poss <r.poss=>uva.nl>

25

Multicast delivery of HTTP Adaptive Streaming.

HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up in numerous small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between different encodings (e.g. qualities) of the same content.
There is a growing interest from both content parties as well as operators and CDNs to not only be able to deliver these chunks over unicast via HTTP, but to also allow for them to be distributed using multicast. The question is how current multicast technologies could be used, or adapted, to achieve this goal.
Ray van Brandenburg <ray.vanbrandenburg=>tno.nl>



26

Generating test images for forensic file system parsers.

Traditionally, forensic file system parsers (such as The Sleuthkit and the ones contained in Encase/FTK etc.) have been focused on extracting as much information as possible. The state of software in general is lamentable — new security vulnerabilities are found every day — and forensic software is not necessarily an exception. However, software bugs that affect the results used for convictions or acquittals in criminal court are especially damning. As evidence is increasingly being processed in large automated bulk analysis systems without intervention by forensic researchers, investigators unversed in the intricacies of forensic analysis of digital materials are presented with multifaceted results that may be incomplete, incorrect, imprecise, or any combination of these.

There are multiple stages in an automated forensic analysis. The file system parser is usually one of the earlier analysis phases, and errors (in the form of faulty or missing results) produced here will influence the results of the later stages of the investigation, and not always in a predictable or detectable manner. It is relatively easy (modulo programmer quality) to create strict parsers that bomb-out on any unexpected input. But real-world data is often not well-formed, and a parser may need to be able to resync with input data and resume on a best-effort basis after having reached some unexpected input in the format. While file system images are being (semi-) hand-generated to test parsers, when doing so, testers are severely limited by their imagination in coming up with edge cases and corner cases. We need a file system chaos monkey.

The assignment consists of one of the following (each may also be spawned as a separate RP):
  1. Test image generator for NTFS. Think of it as some sort of fuzzer for forensic NTFS parsers. NTFS is a complex filesystem which offers interesting possibilities to trip a parser or trick it into yielding incorrect results. For this project, familiarity with C/C++ and the use of the Windows API is required (but only as much as is necessary to create function wrappers). The goal is to automatically produce "valid" — in the sense of "the bytes went by way of ntfs.sys" — but hopefully quite bizarre NTFS images.
  2. Another interesting research avenue lies in the production of "subtly illegal" images. For instance, in FAT, it should be possible, in the data format, to double-book clusters (akin to a hard link). It may also be possible to create circular structures in some file systems. It will be interesting to see if and how forensic filesystem parsers deal with such errors.
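The "chaos monkey" idea above can be illustrated with a minimal mutation sketch (hypothetical, not an NTFS-aware generator): instead of hand-crafting edge cases, randomly corrupt a known-good image and feed the result to the parser under test.

```python
import random

def mutate_image(data: bytes, n_mutations: int = 16, seed: int = 0) -> bytes:
    """Return a copy of a file system image with random byte-level corruptions."""
    rng = random.Random(seed)
    buf = bytearray(data)
    for _ in range(n_mutations):
        offset = rng.randrange(len(buf))
        choice = rng.random()
        if choice < 0.5:
            buf[offset] ^= 1 << rng.randrange(8)   # flip a single bit
        elif choice < 0.8:
            buf[offset] = rng.randrange(256)       # overwrite with a random byte
        else:
            buf[offset] = 0                        # zero out (simulate a wiped sector)
    return bytes(buf)

# Feed the mutated image to the parser under test and watch for crashes
# or, worse, silently wrong results.
original = bytes(range(256)) * 16   # stand-in for a real image
mutated = mutate_image(original)
```

A real test-image generator would of course mutate structures (MFT records, FAT chains) rather than raw bytes, so that the result stays "valid enough" to reach deep parser code paths.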
"Wicher Minnaard (DT)" <wicher=>holmes.nl>
Zeno Geradts <zeno=>holmes.nl>




27

Large scale Log Analytics.

Central log analysis is a "Big Data" challenge at Vancis. We have thousands of servers, devices and applications logging data. We'd like to retrieve intelligence (or preferably, actionable information) from logs by applying machine learning techniques. We expect that you select and apply methods that should (substantiated by research) deliver a tangible result. The initial business question is intentionally broad. We expect you to narrow the scope such that you are left with a final research question that can be answered in the limited time you are given. You can focus on a particular type of data (e.g. system, audit, network or application logs) or combine different sets.

We expect you to demo your solution (algorithm, code pieces) on both a small data set (<1TiB) and a large data set (>1TiB) and prove that the solution scales. A big bonus would be if the chosen method delivers a tangible business outcome (e.g. security is improved, or the time to find the cause of a failing service is reduced).

We provide a ready-to-use cluster including Hadoop/Spark, ElasticSearch, LogStash and related technologies. During the project you are free to add applications if necessary to execute your task. We are more than happy to interact with you to scope the research question and to supply the data that you need to execute the case.
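As one illustration of the kind of method that could be applied (a hypothetical pure-Python sketch; at scale this would run as a Spark job on the provided cluster): reduce log lines to coarse templates and flag rare templates as anomaly candidates.

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Reduce a log line to a coarse template by masking hex and numeric tokens."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line)

def rare_lines(lines, threshold=1):
    """Return lines whose template occurs at most `threshold` times (candidate anomalies)."""
    counts = Counter(template(l) for l in lines)
    return [l for l in lines if counts[template(l)] <= threshold]

logs = [
    "sshd[1023]: Accepted password for alice from 10.0.0.5",
    "sshd[1187]: Accepted password for alice from 10.0.0.9",
    "kernel: BUG: unable to handle page fault at 0xdeadbeef",
]
print(rare_lines(logs))   # only the kernel line survives the frequency filter
```

The two sshd lines collapse to the same template, so only the unusual kernel message is flagged; actionable alerting would require much richer features, but the scaling behaviour of this counting step is what the demo should prove.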
Anthony Potappel <Anthony.Potappel=>vancis.nl>
Patrick Beitsma <Patrick.beitsma=>vancis.nl>




28

Android Application Security.

Recent Android releases have significantly improved support for full-disk encryption, which is enabled by default as of version 5.0. As we have seen on iOS, full-disk encryption is not fully effective (powering on the device decrypts the disk). With disk encryption potentially not fully effective, there may be a need for encryption at the application level that developers can include in their apps. Research the possibilities for secure encryption per app, either via loadable libraries in the app, or perhaps an encryption layer between the OS and the app. Make a proof-of-concept implementation if time allows. Note that dynamic code loading comes with its own set of application security tradeoffs.
  • Sufficient programming skills are needed.
Rick van Galen <vanGalen.Rick=>kpmg.nl>


RP

29

(In)security of java usage in large software frameworks and middleware.

Java is used in almost all large software application packages. Examples of such packages are middleware (Tomcat, JBoss and WebSphere) and products like SAP and Oracle. The goal of this research is to investigate the possible attacks that exist on Java (e.g. RMI) as used in such large software packages, and to develop a framework to securely deploy (or attack) them.
Martijn Sprengers <Sprengers.Martijn=>kpmg.nl>



30

Text mining on the basis of Natural Language Processing.

This project involves using Natural Language Processing  (NLP) to analyse registrant data, e.g. to identify false information and other abuses promptly when a new domain name is registered.
More info:
Marco Davids <marco.davids=>sidn.nl>
Cristian Hesselman <cristian.hesselman=>sidn.nl>


31

Virtual reality interface for data analysis.

This project involves designing and developing a virtual reality (VR) interface for the analysis of large volumes of DNS data. The virtual world should enable the user to explore the data on an intuitive basis. The VR interface should also aid the recognition of irregularities and interrelationships.
More info:
Marco Davids <marco.davids=>sidn.nl>
Cristian Hesselman <cristian.hesselman=>sidn.nl>


32

Usage Control in the Mobile Cloud.

Mobile clouds [1] aim to integrate mobile computing and sensing with rich computational resources offered by cloud back-ends. They are particularly useful in services such as transportation and healthcare when used to collect, process and present data from the physical world. In this thesis, we will focus on the usage control, in particular privacy, of the collected data pertinent to mobile clouds. Usage control [2] differs from traditional access control by enforcing security requirements not only on the release of data but also on what happens afterwards. The thesis will involve the following steps:
  • Propose an architecture over cloud for "usage control as a service" (extension of authorization as a service) for the enforcement of usage control policies
  • Implement the architecture (compatible with OpenStack [3] and Android) and evaluate its performance.
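The distinction between release-time and post-release enforcement can be sketched as follows (a toy model with hypothetical names, not the proposed architecture): access control decides whether data may be released at all, while usage control keeps evaluating conditions after release.

```python
import time

class UsagePolicy:
    """Toy usage-control policy: a release condition plus an ongoing obligation."""
    def __init__(self, allowed_purposes, max_age_seconds):
        self.allowed_purposes = allowed_purposes
        self.max_age_seconds = max_age_seconds

    def may_release(self, purpose):
        # classic access control: a one-time decision at release
        return purpose in self.allowed_purposes

    def still_permitted(self, released_at, now=None):
        # usage control: the decision is re-evaluated for as long as the data is used
        now = time.time() if now is None else now
        return now - released_at <= self.max_age_seconds

policy = UsagePolicy({"treatment"}, max_age_seconds=3600)
assert policy.may_release("treatment")
assert not policy.may_release("marketing")
# an hour and a second after release, continued use must be revoked:
assert not policy.still_permitted(released_at=0, now=3601)
```

A "usage control as a service" back-end would evaluate such policies centrally and push revocations to the mobile clients holding the data.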
References
[1] https://en.wikipedia.org/wiki/Mobile_cloud_computing
[2] Jaehong Park, Ravi S. Sandhu: The UCONABC usage control model. ACM Trans. Inf. Syst. Secur. 7(1): 128-174 (2004)
[3] https://en.wikipedia.org/wiki/OpenStack
[4] Slim Trabelsi, Jakub Sendor: "Sticky policies for data control in the cloud" PST 2012: 75-80
Fatih Turkmen <F.Turkmen=>uva.nl>

33

Detection of DDoS Mitigation.

The recent rise in DDoS attacks has prompted a wide range of mitigation approaches.

An attacker that seeks to maximize impact could be interested in predicting potential success: is a potential target "protected" or not? Answering this question probably involves measurements, and reasoning about measurement results (heuristics), among other things. To what extent can an attacker expect to succeed in detecting the presence or absence of protective layers on the intermediate network path?

For more information in Dutch: SURFnet Project DDoS
Jeroen Scheerder <js=>on2it.net>

Kenneth van Rijsbergen <Kenneth.vanRijsbergen=>os3.nl>
RP2
34

Automated asset identification in large organizations.

Many large organizations are struggling to remain in control of their IT infrastructure. What would help these organizations is automated asset identification: given an internal IP range, scan the network and, based on certain heuristics, identify each server's role (e.g. a web server, a database, an ERP system, an end-user workstation, or a VoIP device).
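The heuristic idea can be sketched as follows (illustrative Python with hypothetical port-to-role hints; a real tool would also combine service banners, TLS certificates and traffic patterns):

```python
# Hypothetical heuristic: map well-known open ports to a likely server role.
ROLE_HINTS = {
    80: "web server", 443: "web server",
    3306: "database", 5432: "database", 1433: "database",
    5060: "VoIP device", 5061: "VoIP device",
    3200: "ERP system",          # SAP dispatcher port
}

def classify(open_ports):
    """Guess a host's role from its open ports; 'unknown' if no hint matches."""
    votes = {}
    for port in open_ports:
        role = ROLE_HINTS.get(port)
        if role:
            votes[role] = votes.get(role, 0) + 1
    return max(votes, key=votes.get) if votes else "unknown"

print(classify({22, 80, 443}))   # -> web server
print(classify({22, 5060}))      # -> VoIP device
print(classify({22}))            # -> unknown
```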
Rick van Galen <vanGalen.Rick=>kpmg.nl>

35

Automatic phishing mail identification based on language markers.

Phishing mails are still a large threat to organizations, and they are hard to identify from the end user's perspective. Quite often, even internal departments send mails that are very similar to phishing mails. Security operations centers often miss these emails as they are not caught by spam filters. What identifiers are included in phishing mails that can be used to automatically alert security teams in organizations?
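A crude illustration of such language markers (hypothetical word lists and scoring, for illustration only): count urgency vocabulary and mismatches between a link's display text and its actual target.

```python
import re

# Hypothetical marker list; a real system would learn markers from labeled mail.
URGENCY = {"urgent", "immediately", "suspended", "verify", "expire"}

def phishing_score(subject: str, body: str) -> int:
    """Count crude language markers: urgency words and display-text/URL mismatches."""
    text = (subject + " " + body).lower()
    score = sum(1 for w in URGENCY if w in text)
    # anchor text claims one host but the href points elsewhere
    for href, shown in re.findall(r'<a href="https?://([^/"]+)[^>]*>\s*https?://([^<\s]+)', body):
        if not shown.startswith(href):
            score += 2
    return score

mail = '<a href="http://evil.example">http://bank.example/login</a>'
print(phishing_score("Urgent: verify your account", mail))   # -> 4
```

The research question would then be which markers are discriminative enough to drive alerting without flooding the security team with false positives.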
Rick van Galen <vanGalen.Rick=>kpmg.nl>

36

Forensic investigation of smartwatches.

Smartwatches are an unknown area in information risk. They are an additional display for certain sensitive data (e.g. executive mail, calendars and other notifications), but are not necessarily covered by organizations' existing mobile security products. In addition, it is often much easier to steal a watch than it is to steal a phone. What data gets 'left behind' on smartwatches in case of theft, and what information risks do they pose?
Rick van Galen <vanGalen.Rick=>kpmg.nl>


37

Pentest auditability 2.0: Digging into the network.

During security tests, it is often difficult to achieve good accountability of actions. Systems may be disrupted by a security test, or by unrelated bugs and administration within the organization. To prove accountability of certain actions, one must keep good records of pentest activities. One such method is to simply log and analyze network traffic. But is this feasible? Does one log all network traffic, or only meta-information? And what does this mean for storage requirements?
Rick van Galen <vanGalen.Rick=>kpmg.nl>

Marko Spithoff <mspithoff=>os3.nl>
RP1
38

WhatsApp end-to-end encryption: show us the money.

WhatsApp has recently switched to using the Signal protocol for their messaging, which should provide greatly enhanced security and privacy over their earlier, non-end-to-end-encrypted proprietary protocol. Of course, since WhatsApp is closed source, one has to trust WhatsApp to actually use the Signal protocol, since one cannot review the source code. What other (automated) methods are there to verify that WhatsApp actually employs this protocol? This research is about reverse engineering Android and/or iOS apps.
Rick van Galen <vanGalen.Rick=>kpmg.nl>

39



40



41

Various projects @ Deloitte.

Please follow the link below and look specifically for the one-month projects. Inform me (CdL) which one you want to do and we will create a separate project number for it.

Topic: Adding some new tests to our existing QuickScan vulnerability scanner.
Area of expertise: Development / Hacking.
Abstract: We are in the process of updating our existing QuickScan vulnerability scanner. It currently scans for issues such as improperly configured certificates, the existence of admin interfaces, and vulnerabilities such as Heartbleed. We would like to add some tests, such as checks for Shellshock and httpoxy, and for support of Perfect Forward Secrecy and Secure Renegotiation.
Duration: 1 month
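One of the proposed checks, Perfect Forward Secrecy support, can be sketched as follows (a simplified illustration, not QuickScan code; a real scanner would attempt handshakes against the target with each suite rather than inspect local cipher lists):

```python
import ssl

def is_pfs_cipher(name: str) -> bool:
    """Forward secrecy requires an ephemeral key exchange: ECDHE or DHE.
    TLS 1.3 suites (TLS_AES_*, TLS_CHACHA20_*) are always ephemeral."""
    return name.startswith(("ECDHE", "DHE", "TLS_AES", "TLS_CHACHA20"))

def local_pfs_ciphers(ctx=None):
    """List the PFS suites the local OpenSSL would offer; a scanner would
    instead complete a handshake with the target for each candidate suite."""
    ctx = ctx or ssl.create_default_context()
    return [c["name"] for c in ctx.get_ciphers() if is_pfs_cipher(c["name"])]

print(is_pfs_cipher("ECDHE-RSA-AES256-GCM-SHA384"))  # True
print(is_pfs_cipher("AES256-GCM-SHA384"))            # False: static RSA key exchange
```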

Topic: Evaluating various executable packers (MS Windows) and understanding how A/V products behave
Area of expertise: Red Teaming Operations
Abstract: An executable packer is software that modifies executable code while maintaining the file's behavior. Packers are commonly used to reduce the file size of large executables for portability or, most commonly, to obfuscate them and make reverse engineering a complicated and costly process. There are multiple legitimate and underground software packers. The purpose of this research is to identify the most common of them and evaluate them against a number of common Antivirus (A/V) products, in order to understand the differences between A/V products, signature-based detection and heuristic algorithms.
Duration: 1 month

Topic: Building an A/V assessment platform
Area of expertise: Red Teaming Operations
Abstract: Using common tools such as Puppet, Docker or other mass-deployment solutions, create a blended Windows and Linux solution that enables the automatic creation of a virtualized test lab for evaluating potential malware across multiple Antivirus (A/V) products concurrently and securely. This does not involve analysis of the potential malware in a sandbox such as the Cuckoo sandbox, but the evaluation of an executable across multiple free and commercial A/V products.
Duration: 1 month

Topic: How to remain undetected in an environment with Microsoft Advanced Threat Analytics (ATA)
Area of expertise: Red Teaming Operations
Abstract: In 2015 Microsoft launched an on-premises platform that protects Microsoft-driven environments from advanced targeted attacks by automatically analyzing, learning and identifying normal and abnormal behavior of users, devices and resources. This platform can detect a number of attacks commonly used during Red Teaming engagements, such as Pass-the-Hash and abnormal usage of the Kerberos Golden Ticket within a domain. The purpose of this research is to figure out how to identify one or more of the following: the usage of ATA within a network; the location of the "beacons" that can be used to detect an attack; and what specific Windows events, network signatures or other events (could) trigger an alert.
Duration: 1 month
"van Essen, Tim (NL - Amsterdam)" <TvanEssen=>deloitte.nl>



42

Developing a public permissioned blockchain.

Blockchain technology is getting much attention, triggered by the popularity of the bitcoin cryptocurrency. Ethereum (https://ethereum.org/) is a blockchain-based computer that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third-party interference. However, the unlimited openness of Ethereum poses risks. For example, bad actors can permanently put illegal content or applications on such a blockchain. This risk, and the associated legal liability, will deter legitimate businesses from running applications on or supporting such an infrastructure. Permissioned blockchains (see e.g. the technologies referenced below) allow certain parties to have more control over who can do what and can therefore help mitigate this risk. The hypothesis is that such permissioned blockchains can retain many of the benefits of blockchain technology.

In this project, you will investigate the hypothesis by
  • Performing a brief risk analysis, identifying the most prominent risks of permissionless blockchains
  • Performing a brief analysis of the main (quantifiable) benefits of permissionless blockchains
  • Developing permission requirements for managing the identified risks and relating those to the (potential) loss of benefits (e.g., openness, censorship resistance).
  • Implementing a permissioned blockchain (for example by using technology provided by Eris (https://erisindustries.com/) or Tendermint (http://tendermint.com/)).
  • Demonstrating the functionality of the system with a test application
  • Evaluating the system
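A toy sketch of the core permissioning idea (hypothetical; not how Eris or Tendermint work internally): an append-only hash chain where only whitelisted identities may write blocks.

```python
import hashlib

class PermissionedChain:
    """Toy permissioned blockchain: only whitelisted identities may append blocks."""
    def __init__(self, writers):
        self.writers = set(writers)
        self.blocks = [("genesis", "writer0", "0" * 64)]

    @staticmethod
    def _hash(block):
        return hashlib.sha256(repr(block).encode()).hexdigest()

    def append(self, data, writer):
        if writer not in self.writers:
            raise PermissionError(f"{writer} is not an authorized writer")
        prev_hash = self._hash(self.blocks[-1])
        self.blocks.append((data, writer, prev_hash))

    def verify(self):
        """Check that every block commits to the hash of its predecessor."""
        return all(
            self.blocks[i][2] == self._hash(self.blocks[i - 1])
            for i in range(1, len(self.blocks))
        )

chain = PermissionedChain(writers={"hospital-A", "notary-B"})
chain.append("record-1", "hospital-A")
assert chain.verify()
```

Real permissioned systems replace the whitelist check with signature verification and a consensus protocol among the permitted validators, but the trade-off to evaluate is the same: permissioning mitigates the illegal-content risk at the cost of openness.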
Oskar van Deventer <oskar.vandeventer=>tno.nl>
Maarten Everts <maarten.everts=>tno.nl>

Peter Bennink <Peter.Bennink=>os3.nl>
RP1
43

Mobile device fingerprinting from App land.

Mobile device registration/binding can be an effective security measure in securing mobile Apps.

In this project you will investigate the technical possibilities to fingerprint mobile devices from within the App sandbox.

Can a solution be designed that Apps can use to uniquely identify Android and/or iOS devices?

Such a solution might be used to strengthen authentication solutions or in the fight against mobile banking malware.
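The fingerprinting idea can be sketched as follows (illustrative Python with hypothetical property names; a real App would gather whatever stable properties the sandbox exposes, e.g. model, OS build, screen metrics, locale, and must expect some churn across OS updates):

```python
import hashlib

def device_fingerprint(properties: dict) -> str:
    """Derive a stable identifier by hashing device properties in canonical order."""
    canonical = "|".join(f"{k}={v}" for k, v in sorted(properties.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = device_fingerprint({"model": "Pixel 4", "os": "Android 10", "dpi": 440})
b = device_fingerprint({"dpi": 440, "os": "Android 10", "model": "Pixel 4"})
assert a == b   # property order must not change the fingerprint
```

The research challenge is which properties are both stable over time and sufficiently distinguishing to uniquely identify a device, and whether malware could spoof them.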
David Vaartjes <david.vaartjes=>securify.nl>
Jurgen Kloosterman
<jurgen.kloosterman=>securify.nl>


44

A hooking framework for security research on apps built on the Xamarin framework.

Xamarin (https://www.xamarin.com) is a popular framework for building cross-platform mobile applications using C#.

In this project you will investigate the security internals of the Xamarin framework via reverse engineering.

More specifically, how to hook/patch the Xamarin framework to speed up security research (pentesting) of Apps built with Xamarin.

Some examples: disable Certificate Pinning, log all Crypto, Keychain, Storage and Network operations.
David Vaartjes <david.vaartjes=>securify.nl>
Jurgen Kloosterman
<jurgen.kloosterman=>securify.nl>


45

Hijack video stream via malicious Thunderbolt adapter (mimicking a display).

Apple MacBooks (amongst others) have multiple Thunderbolt ports which can be used to attach Ethernet adapters, displays, storage devices etc. When a display is attached, it will be automatically mirrored without any visible indication to the user.

In this project you will investigate if this behavior can be used to silently hijack (mirror) the video output of a target laptop/user by plugging in a crafted Thunderbolt device, which internally mimics a display and silently forwards the video output to an external (attacker-controlled) system.
David Vaartjes <david.vaartjes=>securify.nl>
Jurgen Kloosterman
<jurgen.kloosterman=>securify.nl>


46

Detection of Android apps with Drammer/Rowhammer payload.

Recent bug discoveries such as Drammer (https://vvdveen.com/publications/drammer.pdf) and Rowhammer raise multiple questions:
  • Will vendors issue a patch?
  • And will this patch actually solve the underlying problem?

It will just be a matter of time before an app will be released that will actively try to exploit a device for malicious purposes.

In this research project we want you to find a way to create such a rogue app and, secondly, to find (forensic) patterns in order to discover and detect the presence of such an application.
Jurgen Kloosterman
<jurgen.kloosterman=>securify.nl>


47

Windows Security log manipulation.

In-depth knowledge of Windows internals is required.
On Windows, not even administrators can clean or modify the logs without it being noticed. Defenders trust the integrity of the Windows event logs and use them as an important part of tracking down an attacker's actions. We want students to research possibilities for generic (low-level) manipulation of Windows security event logs. This could be alteration, deletion or addition of log records, without using the Microsoft-provided API to do so. Possible ways for manipulation are raw file access and kernel patching, but we encourage any other way that gets the job done.
Marc Smeets <marc=>outflank.nl>
Rick Lahaye <rick.lahaye=>os3.nl>
RP2
48

Apple File System (APFS).

Apple recently introduced APFS with the latest version of macOS, Sierra. The new file system comes with some interesting new features that pose either challenges or opportunities for digital forensics. The goal of this project is to pick one or more relevant features (e.g. encryption, nanosecond timestamps, flexible space allocation, snapshots/cloning) and reverse engineer their inner workings to come up with a proof-of-concept parsing tool that provides useful input for forensic investigations of Apple systems.
Yonne de Bruijn <yonne.debruijn=>fox-it.com>


49

vmdk snapshot support for DfVFS.

DfVFS is a back-end library that provides read-only access to file system objects from various storage media types. In this day and age, virtual machines are becoming more and more common in company infrastructures. This project aims at analyzing the vmdk snapshot structure and the implementation of the vmdk snapshot structure in DfVFS.
Yonne de Bruijn <yonne.debruijn=>fox-it.com>

50

Virtual infrastructure partitioning and provisioning under nearly real-time constraints.

A complex cloud application often requires resources from different data centres or providers, e.g., because of the geographical location of some specific components, particular physical elements in the Internet of Things or a sensor network, or because of limits on the available resources for optimizing system performance or for balancing workloads. Instead of letting cloud providers do the provisioning, some developers need to plan infrastructure directly, and oversee the provisioning in order to optimize system performance or cost based on their own requirements and understanding of their application. Mapping a complex infrastructure onto different data centres or providers basically involves several steps: partitioning the graph of the infrastructure, provisioning sub-graphs, and connecting the interstitial network. This project focuses on the first phase of the problem: how to effectively partition an infrastructure graph based on the constraints of data centres, application characteristics, and locations of non-Cloud components, i.e. the graph-partitioning problem.

The students will:
  1. Review the state of the art of the problem and the existing algorithms.
  2. Evaluate the key algorithms based on characteristics of specific applications, cloud providers and quality of service (QoS) constraints.
  3. Test a prototype with the parallel provisioning components developed by a researcher in another concurrent project.
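As a starting point for step 1, a naive baseline (a hypothetical pure-Python sketch) against which the surveyed algorithms could be compared: greedily place each node in the least-constrained part where it has the most neighbours, then measure the edge cut.

```python
def greedy_partition(edges, nodes, k=2):
    """Greedy balanced k-way partition: visit nodes by descending degree and
    place each one in a non-full part containing the most of its neighbours."""
    parts = [set() for _ in range(k)]
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cap = (len(nodes) + k - 1) // k          # balance constraint
    for n in sorted(nodes, key=lambda n: -len(adj[n])):
        candidates = [p for p in parts if len(p) < cap]
        best = max(candidates, key=lambda p: len(p & adj[n]))
        best.add(n)
    return parts

def cut_size(edges, parts):
    """Number of edges crossing partition boundaries (provisioning cost proxy)."""
    index = {n: i for i, p in enumerate(parts) for n in p}
    return sum(1 for u, v in edges if index[u] != index[v])

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "d"), ("a", "c")]
parts = greedy_partition(edges, nodes)
```

State-of-the-art partitioners (e.g. multilevel schemes) would add weighted nodes/edges for resource demands and bandwidth, and data-centre-specific capacity constraints.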
Zhiming Zhao <z.zhao=>uva.nl>
Arie Taal <a.taal=>uva.nl>


52

IoT DOS prevention and corporate responsibility.

The Dyn DDoS attack shows a fundamental problem with internet-connected devices: huge swathes of unpatched and improperly configured devices with access to high bandwidth are misused to bring down internet services. What technical prevention and detection methods can organizations employ to make sure that they are not a contributor to this problem? And what can they do once it does appear they are inadvertently contributing to this problem? This would focus on literature research combining DoS prevention, asset management, patch management and network monitoring.
Rick van Galen <vanGalen.Rick=>kpmg.nl>



53

Practical blockchain on mobile devices.

Blockchain technology currently requires a steady source of internet connectivity and power, since staying synchronized with the wider blockchain requires frequently receiving and processing blockchain data. This currently prevents blockchain technology from being effective on mobile devices, somewhat limiting its use. How would a practical blockchain on mobile actually look? How can this be accomplished? And what are the relevant security aspects? The goal of this research is to provide a literature overview of the different aspects of making blockchain technology practical on mobile.
Rick van Galen <vanGalen.Rick=>kpmg.nl>

Sander Lentink <Sander.Lentink=>os3.nl>
RP1
54

Penetration test dashboarding.

A penetration test is a difficult thing for both the penetration tester and the party being tested. How does the tested party really know what is going on in their penetration test, and how do they stay in control? How can the tester stay up to date on what his/her team members are actually doing?
 
Penetration testing is a creative process, but it can be dashboarded to some degree. The data is there – mostly in log files – but it requires an extraordinary amount of effort to make this log data understandable in human terms. Making it understandable can, however, be automated using penetration test tooling. But what information should in fact be displayed in such a dashboard, and what is the best way of actually showing this data? This research would combine literature research and interviews with the development of a small proof-of-concept.
Rick van Galen <vanGalen.Rick=>kpmg.nl>

55

Forensic investigation of wearables.

Wearables and especially smartwatches are an unfamiliar area in information risk because of their novelty. At the moment, primary concerns are aimed at privacy issues, not at information risk issues. However, these devices are an additional display for certain sensitive data (e.g. executive mail, calendars and other notifications), but are not necessarily covered by organizations' existing mobile security processes and technology. In addition, it is often simply much easier to steal a watch or another wearable than it is to steal a phone.

This research focuses on the following question: what value could a wearable have to cyber criminals when it is stolen? What is the data that gets 'left behind' on smartwatches in case of theft, and what information risks do they pose?
Rick van Galen <vanGalen.Rick=>kpmg.nl>

56

Does a healthy lifestyle harm a cyber healthy lifestyle?

With the "recent" health trend of fitness apps and hardware such as Fitbits, combined with the need to share results with friends, family and the world through Facebook, Runkeeper, Strava and other sites, we have entered an era of potential cyber unhealthiness. What potentially valuable information could be retrieved from the web or via Bluetooth about people, and used to influence individuals' health insurance rates? Note: this is a broad question, and it is up to the student to choose a focus to his/her own liking (e.g. Bluetooth security of Fitbits/Mi's; identification of individuals through Strava/Runkeeper posts; quantifying the public sharing of health information; etc.).
Ruud Verbij <Verbij.Ruud=>kpmg.nl>

RP

57

Internet transport protocol: mdtmFTP across distance at 100Gb/s.

mdtmFTP is a middleware solution developed by Fermilab to transfer large volumes of data, possibly contained in lots of small files, across long distances using the concept of a Data Transfer Node (DTN). At KLM a DTN has been connected to Netherlight, allowing experiments with national and international institutes. A research project deploying DTNs to share Big Data is, for example, the Pacific Research Platform project in which UvA participates. This project should evaluate the capabilities of mdtmFTP across short and long distances and compare it with other FTP implementations (e.g. GridFTP). It should also investigate whether the middleware could be adapted to serve other Big Data applications, e.g. to allow data replication in a Hadoop File System across distance.

For more info on mdtmFTP see:
Leon Gommans <Leon.Gommans=>klm.com>

Kees de Jong <kees.dejong=>os3.nl>
RP1
58

Inventory of smartcard-based healthcare identification solutions in Europe and beyond: technology and adoption.

For potential international adoption of Whitebox technology in the future, in particular the technique of patients carrying authorization codes with them to authorize healthcare professionals, we want to make an inventory of the current status of healthcare PKIs and smartcard technology in Europe and if possible also outside Europe.

Many countries have developed health information exchange systems over the last 1-2 decades, most of them without much regard for what other countries are doing, or for international interoperability. However, common to most systems developed today is a (per-country) PKI for credentials, typically smartcards, that are provided to healthcare professionals to allow the health information exchange system to identify these professionals and to establish their 'role' (or rather: the speciality of a doctor, such as GP, pharmacist, gynaecologist, etc.). We know a few of these smartcard systems, e.g., in Austria and France, but not all of them, and we do not know their degree of adoption.

In this project, we would like students to enquire about and report on the state of the art of healthcare smartcard systems in Europe and possibly outside Europe (e.g., Asia, Russia):
  • what products are rolled out by what companies, backed by what CAs (e.g., governmental, as is the case with the Dutch "UZI" healthcare smartcard)?
  • Is it easy to obtain the relevant CA keys?
  • And what is the adoption rate of these smartcards among GPs, emergency care wards and hospitals in different countries?
  • What are relevant new developments (e.g., contactless solutions) proposed by major stakeholders or industry players in the market?
Note that this project is probably less technical than usual for an SNE student, although it is technically interesting. For comparison, this project may also be fitting for an MBA student.

For more information, see also (in Dutch): https://whiteboxsystems.nl/sne-projecten/#project-2-onderzoek-adoptie-health-smartcards-in-europa-en-daarbuiten
General introduction
Whitebox Systems is a UvA spin-off company working on a decentralized system for health information exchange. Security and privacy protection are key concerns for the products and standards provided by the company. The main product is the Whitebox, a system owned by doctors (GPs) that is used by the GP to authorize other healthcare professionals so that they - and only they - can retrieve information about a patient when needed. Any data transfer is protected end-to-end; central components and central trust are avoided as much as possible. The system will use a published source model, meaning that although we do not give away copyright, the code can be inspected and validated externally.

The Whitebox is currently transitioning from an authorization model that started with doctor-initiated static connections/authorizations, to a model that includes patient-initiated authorizations. Essentially, patients can use an authorization code (a kind of token) that is generated by the Whitebox, to authorize a healthcare professional at any point of care (e.g., a pharmacist or a hospital). Such a code may become part of a referral letter or a prescription. This transition gives rise to a number of interesting questions, and thus to possible research projects related to the Whitebox design, implementation and use. Two of these projects are described below. If you are interested in these projects or have questions about other possibilities, please contact <guido=>whiteboxsystems.nl>.

For a more in-depth description of the projects below (in Dutch), please see https://whiteboxsystems.nl/sne-projecten/
Guido van 't Noordende <g.j.vantnoordende=>uva.nl>


59

Decentralized trust and key management.

Currently, the Whitebox provides a means for doctors (general practitioners, GPs) to establish static trusted connections with parties they know personally. These connections (essentially, authenticated TLS connections with known, validated keys), once established, can subsequently be used by the GP to authorize the party in question to access particular patient information. Examples are static connections to the GP post which takes care of evening/night and weekend shifts, or to a specific pharmacist. In this model, trust management is intuitive and direct. However, with dynamic authorizations established by patients (see the general description above), the question comes up whether the underlying (trust) connections between the GP practice (i.e., the Whitebox) and the authorized organization (e.g., a hospital or pharmacist) may be reusable as a 'trusted' connection by the GP in the future.

The basic question is:
  • what is the degree of trust a doctor can place in (trust) relations that are established by this doctor's patients, when they authorize another healthcare professional?
More generally:
  • what degree of trust can be placed in relations/connections established by a patient, also in view of possible theft of authorization tokens held by patients?
  • What kind of validation methods can exist for a GP to increase or validate a given trust relation implied by an authorization action of a patient?
Perhaps the problem can also be raised to a higher level: can (public) auditing mechanisms -- for example, using blockchains -- be used to help establish and validate trust in organizations (technically: the keys of such organizations), in systems that implement decentralized trust-based transactions, as the Whitebox system does?

In this project, the student(s) may either implement part of a solution or design, or model the behavior of a system inspired by the decentralized authorization model of the Whitebox.

As an example: reputation based trust management based on decentralized authorization actions by patients of multiple doctors may be an effective way to establish trust in organization keys, over time. Modeling trust networks may be an interesting contribution to understanding the problem at hand, and could thus be an interesting student project in this context.

NB: this project is a rather advanced/involved design and/or modelling project. Students should be confident of their ability to understand and design/model a complex system in the relatively short timeframe provided by an RP2 project -- this project is not for the faint of heart. Once completed, an excellent implementation or evaluation may become the basis for a research paper.

See also (in Dutch): https://whiteboxsystems.nl/sne-projecten/#project-2-ontwerp-van-een-decentraal-vertrouwensmodel
General introduction
Whitebox Systems is a UvA spin-off company working on a decentralized system for health information exchange. Security and privacy protection are key concerns for the products and standards provided by the company. The main product is the Whitebox, a system owned by doctors (GPs) that is used by the GP to authorize other healthcare professionals so that they - and only they - can retrieve information about a patient when needed. Any data transfer is protected end-to-end; central components and central trust are avoided as much as possible. The system will use a published source model, meaning that although we do not give away copyright, the code can be inspected and validated externally.

The Whitebox is currently transitioning from an authorization model that started with doctor-initiated static connections/authorizations, to a model that includes patient-initiated authorizations. Essentially, patients can use an authorization code (a kind of token) that is generated by the Whitebox, to authorize a healthcare professional at any point of care (e.g., a pharmacist or a hospital). Such a code may become part of a referral letter or a prescription. This transition gives rise to a number of interesting questions, and thus to possible research projects related to the Whitebox design, implementation and use. Two of these projects are described below. If you are interested in these projects or have questions about other possibilities, please contact <guido=>whiteboxsystems.nl>.

For a more in-depth description of the projects below (in Dutch), please see https://whiteboxsystems.nl/sne-projecten/
Guido van 't Noordende <g.j.vantnoordende=>uva.nl>




60

Behavioral analysis through the hypervisor

Dynamic analysis is often used to gain more insight into the functionality and behavior of malicious (or not-so-malicious) samples. Most sandboxes and dynamic analysis solutions use various hooking techniques or the OS debugging APIs to, for example, monitor API calls or interact with the execution flow. Two issues arise:
  • the confidentiality and integrity of the analysis tooling and its output can’t be guaranteed
  • analyzing kernel-mode code from tooling running on top of that kernel doesn’t work too well
The goal is therefore to research and develop a monitoring and instrumentation API using hypervisor-level debugging functionality.
Mitchel Sahertian <sahertian=>fox-it.com>

61

Bitcoin intelligence collection.

Intelligence collected from a large number of sources helps to provide context and insight in various scenarios, for example:
•    Contextual querying in (forensic) investigations
•    Activity of malicious actors is tracked and subsequently turned into indicators of compromise that can be used to detect and counter malicious activity.
The decentralized and anonymous Bitcoin currency is exploited by actors with malicious intentions. The goal is to research the metadata that is available on a node within the Bitcoin network, and to develop code that structures and provides a real-time feed of this metadata.
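For orientation: such a collector would speak the Bitcoin P2P wire protocol, in which every message carries a 24-byte header (network magic, command name, payload length, and the first four bytes of a double-SHA256 checksum). A minimal sketch of building and parsing that header:

```python
import hashlib
import struct

MAINNET_MAGIC = 0xD9B4BEF9  # Bitcoin mainnet network magic

def checksum(payload: bytes) -> bytes:
    """First 4 bytes of double-SHA256, as used in the P2P message header."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def build_message(command: str, payload: bytes) -> bytes:
    """Serialize a Bitcoin P2P message: magic, 12-byte padded command,
    payload length, checksum, then the payload itself."""
    header = struct.pack("<I12sI4s",
                         MAINNET_MAGIC,
                         command.encode().ljust(12, b"\x00"),
                         len(payload),
                         checksum(payload))
    return header + payload

def parse_header(raw: bytes):
    """Split the fixed 24-byte header back into its fields."""
    magic, cmd, length, csum = struct.unpack("<I12sI4s", raw[:24])
    return magic, cmd.rstrip(b"\x00").decode(), length, csum
```

A real feed would of course also need the version/verack handshake and payload parsers for inv, tx and addr messages.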
Mitchel Sahertian <sahertian=>fox-it.com>
Tim Dijkhuizen <tim.dijkhuizen=>os3.nl>
RP1
62

Research into darkweb scraping.

The darkweb contains, among other things, a wealth of information about illegal activity, which might be interesting from an intelligence perspective. This intelligence can be used to monitor activity related to specific high-profile organizations, or to specific threat actors. Since there are many different types of websites, sometimes with unique subscriber requirements, it is hard to scrape these websites. In some cases an existing member has to vouch for a new member, users have to post a message on the website at least once a month (otherwise they will be banned), access has to be paid for in bitcoin, etc.
The goal of this research project is to come up with a theoretical framework for scraping (potentially) interesting darkweb websites, taking into account the different kinds of subscription models and subscriber requirements. For this research project it is not required to develop a PoC.
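Such a framework could start from an explicit model of each site's subscriber requirements. The sketch below (all names and fields are invented for illustration, not a proposed taxonomy) encodes a few requirement types and derives the next action an operator would need to take:

```python
from dataclasses import dataclass
from typing import Optional
import datetime as dt

@dataclass
class ForumProfile:
    """Illustrative model of one darkweb forum's subscriber requirements."""
    name: str
    needs_vouch: bool = False                  # existing member must vouch
    entry_fee_btc: float = 0.0                 # pay-to-enter forums
    min_posts_per_days: Optional[int] = None   # e.g. 30: post once per 30 days
    last_post: Optional[dt.date] = None

def next_action(profile: ForumProfile, today: dt.date) -> str:
    """Decide what the scraper operator must do next to gain or keep access."""
    if profile.needs_vouch:
        return "acquire vouch"
    if profile.entry_fee_btc > 0:
        return "pay entry fee"
    if profile.min_posts_per_days is not None:
        stale = (profile.last_post is None or
                 (today - profile.last_post).days >= profile.min_posts_per_days)
        if stale:
            return "post to retain membership"
    return "scrape"
```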
Krijn de Mik <mik=>fox-it.com>

63

LDBC Graphalytics.

LDBC Graphalytics is a mature, industrial-grade benchmark for graph-processing platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference outputs that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance.

Until recently, graph processing used only common big data infrastructure, that is, with much local and remote memory per core and storage on disk. However, operating separate HPC and big data infrastructures is increasingly unsustainable: the energy and (human) resource costs far exceed what most organizations can afford. Instead, we see a convergence between big data and HPC infrastructure.
For example, next-generation HPC infrastructure includes more cores and hardware threads than ever before. This leads to a large search space for application developers to explore when adapting their workloads to the platform.

To take a step towards a better understanding of performance for graph processing platforms on next-generation HPC infrastructure, we would like to work together with 3-5 students on the following topics:
  1. How to configure graph processing platforms to efficiently run on many/multi-core devices, such as the Intel Knights Landing, which exhibits configurable and dynamic behavior?
  2. How to evaluate the performance of modern many-core platforms, such as the NVIDIA Tesla?
  3. How to set up a fair, reproducible experiment to compare and benchmark graph-processing platforms?
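One of the six Graphalytics algorithms is PageRank; a small reference implementation such as the sketch below (illustrative only, not the official Graphalytics driver) can be used to validate a platform's output on toy inputs:

```python
def pagerank(edges, n, d=0.85, iters=50):
    """Power-iteration PageRank over an edge list for n nodes,
    redistributing the rank of dangling nodes uniformly."""
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n
        dangling = sum(rank[u] for u in range(n) if out_deg[u] == 0)
        for u, v in edges:
            nxt[v] += d * rank[u] / out_deg[u]
        for v in range(n):
            nxt[v] += d * dangling / n
        rank = nxt
    return rank
```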
Alex Uta <a.uta=>vu.nl>
Marc X. Makkes <m.x.makkes=>vu.nl>


64

The security of ICS/SCADA in Rail.

Interconnected ICS/SCADA systems around the world are exposed to risk due to a lack of security countermeasures or misconfiguration issues. This project aims to regularly perform online scanning of a country (i.e., the Netherlands) to identify permanently or mistakenly interconnected ICS/SCADA systems by recognizing default ICS ports, vendors' interfaces and online search engines' results.
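Recognizing "default ICS ports" can start from a simple port-to-protocol map applied to scan results; a sketch (the port list covers a few well-known ICS protocols and would need extending for a real survey):

```python
# Well-known default ports of common ICS/SCADA protocols.
ICS_PORTS = {
    102: "Siemens S7",      # S7comm over ISO-TSAP
    502: "Modbus/TCP",
    20000: "DNP3",
    44818: "EtherNet/IP",
    47808: "BACnet",
}

def classify(open_ports):
    """Map a host's open ports (e.g. from an nmap/masscan result)
    to likely ICS protocols."""
    return [ICS_PORTS[p] for p in open_ports if p in ICS_PORTS]
```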

Required area of expertise: Hacking

More info: http://werkenbijdeloitte.nl/cyber-graduate

Dima van de Wouw
<DvandeWouw=>deloitte.nl>

Peter.Prjevara=>os3.nl
RP1
65

Normal traffic flow information distribution to detect malicious traffic.

In an era of increasingly encrypted communication it is getting harder to distinguish normal from malicious traffic. Deep packet inspection is no longer an option, unless the trusted certificate store of the monitored clients is altered. However, NetFlow data might still provide relevant information about the parties involved in the communication and the traffic volumes they exchange. So would it be possible to tell ill-intentioned traffic apart by looking only at the flows, with a little help from content providers such as website owners and mobile application vendors?

The basic idea is to research a framework or a data interchange format between the content providers described above and the monitoring devices. Both for a website and for a mobile application, such a description can be used to list the authorised online resources that should be used and the relative distribution of the traffic between them. If such a framework proves successful, it can help in alerting on covert-channel malware communication, cross-site scripting and all other types of network communication not initially intended by the original content provider.
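A minimal sketch of what such an interchange format and flow check could look like (the JSON layout, host names and tolerance are invented for illustration; a real format is exactly what the project would design):

```python
import json

# Hypothetical manifest a content provider could publish: the endpoints
# its app legitimately contacts, with expected traffic shares.
MANIFEST = json.loads("""
{
  "app": "example-news-app",
  "endpoints": [
    {"host": "cdn.example.com", "share": 0.80},
    {"host": "api.example.com", "share": 0.15},
    {"host": "ads.partner.com", "share": 0.05}
  ]
}
""")

def check_flows(manifest, observed, tolerance=0.10):
    """Flag hosts absent from the manifest, and manifest hosts whose
    observed traffic share deviates more than `tolerance` from the
    declared share. `observed` maps host -> bytes seen in NetFlow."""
    total = sum(observed.values()) or 1
    declared = {e["host"]: e["share"] for e in manifest["endpoints"]}
    alerts = []
    for host, nbytes in observed.items():
        share = nbytes / total
        if host not in declared:
            alerts.append(("unknown-endpoint", host))
        elif abs(share - declared[host]) > tolerance:
            alerts.append(("share-deviation", host))
    return alerts
```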
TBD




66

Profiling critical path related algorithms for different graph families and deadlines distributions.

Critical path based algorithms are effective means of scheduling tasks with deadlines, but it is difficult to determine which algorithm variants work in which scenarios. Customization of the critical path algorithms for different input graphs and deadline distributions is needed. The purpose of this project is to determine which variants are most appropriate for which graph structures. The work will be done in the context of the EU project SWITCH [1]; initial algorithms developed in SWITCH will be used as part of the test.

The student will:
  1. Review the state of the art, and identify a set of properties to characterize application graphs (workflows)
  2. Prepare a set of critical path related algorithms or strategies for testing
  3. Collect workflow graphs with certain characteristics
  4. Schedule experiments for different configurations and collect results
  5. Classify the results and discover the correlation among algorithm configuration and application characteristics
  6. (Optional) Prototype software tool for automating such profiling 
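For step 4, a reference computation of the critical path length of a workflow DAG is useful for sanity-checking scheduling experiments; a minimal sketch (task/edge layout is an assumption of this sketch):

```python
from collections import defaultdict

def critical_path_length(tasks, deps):
    """Longest execution-time-weighted path through a DAG of tasks.
    tasks: {name: duration}; deps: list of (before, after) edges."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for a, b in deps:
        succ[a].append(b)
        indeg[b] += 1
    # Kahn-style topological order
    order, ready = [], [t for t in tasks if indeg[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    # Earliest finish time per task; the max is the critical path length.
    finish = {}
    for t in order:
        start = max((finish[a] for a, b in deps if b == t), default=0.0)
        finish[t] = start + tasks[t]
    return max(finish.values())
```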
Reference:
  1.   http://www.switchproject.eu
  2.   Wang, J., Taal, A., Martin, P., Hu, Y., Zhou, H., Pang, J., de Laat, C., Zhao, Z. (2017) Planning Virtual Infrastructures for Time Critical Applications with Multiple Deadline Constraints, Future Generation Computer Systems, volume 75, pages 365-375.
More info: Arie Taal, Zhiming Zhao
Zhiming Zhao <z.zhao=>uva.nl>

67

Smart performance information discovery for Cloud resources.

The selection of virtual machines (VMs) must account for the performance requirements of applications (or application components) to be hosted on them. The performance of components on a specific type of VM can be predicted based on static information (e.g. CPU, memory and storage) provided by cloud providers; however, the provisioning overhead for different VM instances and the network performance within one data centre or across different data centres are also important. Moreover, application-specific performance cannot always be easily derived from this static information.

An information catalogue is envisaged that provides a service delivering the most up-to-date cloud resource information to cloud customers, to help them use the Cloud better. The goal of this project is to extend earlier work [1], focusing on smart performance information discovery. The student will:
  1. Investigate the state of the art for cloud performance information retrieval and cataloguing.
  2. Propose Cloud performance metadata, and prototype a performance information catalogue.
  3. Customize and integrate an (existing) automated performance collection agent with the catalogue.
  4. Enable smart query of performance information from the catalogue using certain metadata.
  5. (Optional) Test the results with the use cases in on-going EU projects like SWITCH.
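A minimal sketch of steps 2 and 4, with an invented metadata layout: catalogue entries carry a measured metric plus a collection timestamp, and a query filters on both a performance floor and freshness:

```python
import time

# Hypothetical catalogue entries; a real schema is what step 2 designs.
CATALOGUE = [
    {"vm_type": "small", "provider": "cloud-a", "cpu": 1, "mem_gb": 2,
     "measured_iops": 3000, "collected_at": time.time() - 60},       # 1 min old
    {"vm_type": "large", "provider": "cloud-a", "cpu": 8, "mem_gb": 32,
     "measured_iops": 20000, "collected_at": time.time() - 7200},    # 2 h old
]

def query(catalogue, min_iops=0, max_age_s=3600):
    """Return entries that meet a performance floor and are fresh enough:
    a 'smart query' in its most minimal form."""
    now = time.time()
    return [e for e in catalogue
            if e["measured_iops"] >= min_iops
            and now - e["collected_at"] <= max_age_s]
```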
Some reading material:
  1. Elzinga, O., Koulouzis, S., Hu, Y., Wang, J., Zhou, H., Martin, P., Taal, A., de Laat, C., and Zhao, Z. (2017), Automatic collector for dynamic cloud performance information, IEEE Networking, Architecture and Storage (NAS), Shenzhen, China, August 7-8, 2017 https://doi.org/10.1109/NAS.2017.8026845
More info: Arie Taal, Paul Martin, Zhiming Zhao
Zhiming Zhao <z.zhao=>uva.nl>

68

Network aware performance optimization for Big Data applications using coflows.

Optimizing data transmission is crucial to improving the performance of data-intensive applications. In many cases, network traffic control plays a key role in optimising data transmission, especially when data volumes are very large. Data-intensive jobs can often be divided into multiple successive computation stages, e.g., in MapReduce-type jobs. A computation stage relies on the outputs of the previous stage and cannot start until all its required inputs are in place. Inter-stage data transfer involves a group of parallel flows, which share the same performance goal, such as minimising the flows' completion time.

CoFlow is an application-aware network control model for cluster-based data centric computing. The CoFlow framework is able to schedule the network usage based on the abstract application data flows (called coflows). However, customizing CoFlow for different application patterns, e.g., choosing proper network scheduling strategies, is often difficult, in particular when the high level job scheduling tools have their own optimizing strategies.

The project aims to profile the behavior of CoFlow on different computing platforms, e.g., Hadoop and Spark. The student will:
  1. Review the existing CoFlow scheduling strategies and related work.
  2. Prototype test applications using big data platforms (including Apache Hadoop, Spark, Hive, Tez).
  3. Set up a coflow test bed (Aalo, Varys, etc.) using existing CoFlow installations.
  4. Benchmark the behavior of CoFlow under different application patterns, and characterise that behavior.
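For intuition, the sketch below implements the smallest-bottleneck-first coflow ordering popularized by Varys, plus a toy single-link model of average coflow completion time (the data layout is an assumption of this sketch):

```python
def sebf_order(coflows):
    """Order coflows by their bottleneck transfer size
    (smallest-effective-bottleneck-first heuristic).
    coflows: {name: {port: bytes_to_move}}."""
    return sorted(coflows, key=lambda c: max(coflows[c].values()))

def avg_completion(coflows, rate=1.0):
    """Average coflow completion time when coflows run back to back on a
    single bottleneck link of the given rate (toy model, ignores overlap)."""
    t, total = 0.0, 0.0
    for name in sebf_order(coflows):
        t += sum(coflows[name].values()) / rate
        total += t
    return total / len(coflows)
```

Running small coflows first lowers the average completion time, which is exactly the effect a benchmark on Aalo/Varys should make visible.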
Background reading:
  1. CoFlow introduction: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-211.pdf
  2. Junchao Wang, Huan Zhou, Yang Hu, Cees de Laat and Zhiming Zhao, Deadline-Aware Coflow Scheduling in a DAG, in NetCloud 2017, Hong Kong, to appear [upon request]
More info: Junchao Wang, Spiros Koulouzis, Zhiming Zhao
Zhiming Zhao <z.zhao=>uva.nl>

69

Elastic data services for time critical distributed workflows.

Large-scale observations over extended periods of time are necessary for constructing and validating models of the environment. This requires advanced computational networked infrastructure for transporting large datasets and performing data-intensive processing. Data infrastructures manage the lifecycle of observation data and provide services for users and workflows to discover, subscribe to and obtain data for different application purposes. In many cases, applications have high performance requirements, e.g., disaster early warning systems.

This project focuses on data aggregation and processing use-cases from European research infrastructures, and investigates how to optimise infrastructures to meet critical time requirements of data services, in particular for different patterns of data-intensive workflow. The student will use some initial software components [1] developed in the ENVRIPLUS [2] and SWITCH [3] projects, and will:
  1. Model the time constraints for the data services and the characteristics of data access patterns found in given use cases.
  2. Review the state of the art technologies for optimising virtual infrastructures.
  3. Propose and prototype an elastic data service solution based on a number of selected workflow patterns.
  4. Evaluate the results using a use case provided by an environmental research infrastructure.
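As a trivial example of the kind of time-constraint model step 1 asks for, a bandwidth-only estimate of how many data-service nodes are needed to move a dataset within a deadline (assuming ideal scaling, which a real model must refine):

```python
import math

def replicas_needed(dataset_gb, deadline_s, per_node_gbps=1.0):
    """Minimum number of parallel data-service nodes needed to deliver a
    dataset within a deadline, under a bandwidth-only toy model that
    assumes transfers scale linearly with the number of nodes."""
    gbits = dataset_gb * 8  # convert gigabytes to gigabits
    return max(1, math.ceil(gbits / (per_node_gbps * deadline_s)))
```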
Reference:
  1. https://staff.fnwi.uva.nl/z.zhao/software/drip/
  2. http://www.envriplus.eu
  3. http://www.switchproject.eu
More info: Spiros Koulouzis, Paul Martin, Zhiming Zhao
Zhiming Zhao <z.zhao=>uva.nl>

70

Contextual information capture and analysis in data provenance.

Tracking the history of events and the evolution of data plays a crucial role in data-centric applications for ensuring reproducibility of results, diagnosing faults, and performing optimisation of data-flow. Data provenance systems [1] are a typical solution, capturing and recording the events generated in the course of a process workflow using contextual metadata, and providing querying and visualisation tools for use in analysing such events later.

Conceptual models such as W3C PROV (and extensions such as ProvONE), OPM and CERIF have been proposed to describe data provenance, and a number of different solutions have been developed. Choosing a suitable provenance solution for a given workflow system or data infrastructure requires consideration of not only the high-level workflow or data pipeline, but also performance issues such as the overhead of event capture and the volume of provenance data generated.

The project will be conducted in the context of EU H2020 ENVRIPLUS project [1, 2]. The goal of this project is to provide practical guidelines for choosing provenance solutions. This entails:
  1. Reviewing the state of the art for provenance systems.
  2. Prototyping sample workflows that demonstrate selected provenance models.
  3. Benchmarking the results of sample workflows, and defining guidelines for choosing between different provenance solutions (considering metadata, logging, analytics, etc.).
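A minimal sketch of PROV-style capture and lineage querying (simplified JSON-like records, not the full W3C PROV-JSON schema):

```python
import time

def prov_activity(activity_id, used, generated):
    """Emit a W3C-PROV-flavoured record for one workflow step:
    which artifacts the activity used and which it generated."""
    return {
        "activity": activity_id,
        "used": list(used),
        "generated": list(generated),
        "endTime": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

def lineage(records, artifact):
    """Walk records backwards to find every artifact the given one
    (transitively) derives from."""
    upstream, frontier = set(), {artifact}
    while frontier:
        nxt = set()
        for r in records:
            if frontier & set(r["generated"]):
                nxt |= set(r["used"])
        nxt -= upstream | {artifact}
        upstream |= nxt
        frontier = nxt
    return upstream
```

Benchmarking would then measure the overhead of emitting such records at scale, per provenance backend.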
References:
  1. About project: http://www.envriplus.eu
  2. Provenance background in ENVRIPLUS: https://surfdrive.surf.nl/files/index.php/s/uRa1AdyURMtYxbb
  3. Michael Gerhards, Volker Sander, Torsten Matzerath, Adam Belloum, Dmitry Vasunin, and Ammar Benabdelkader. 2011. Provenance opportunities for WS-VLAM: an exploration of an e-science and an e-business approach. In Proceedings of the 6th workshop on Workflows in support of large-scale science (WORKS '11). http://dx.doi.org/10.1145/2110497.2110505
More info: Zhiming Zhao, Adam Belloum, Paul Martin
Zhiming Zhao <z.zhao=>uva.nl>

71

Container deployment scheduling in Kubernetes/Swarm.

Operating system (OS) containers are becoming increasingly popular in the cloud and DevOps communities thanks to emerging open-source container management technologies (e.g., Docker). Orchestration tools such as Kubernetes and Swarm automate the deployment, scaling, and management of containerized applications, and have been adopted by many enterprises, such as eBay, Philips and Samsung.
This research concerns an in-depth analysis of the schedulers in Kubernetes and Swarm. These schedulers significantly impact the availability, performance, and capacity of the container cluster: for example, they ensure that containers are only placed on nodes that have sufficient free resources, and they try to balance out the resource utilization of nodes.

The students will:
  1. Review the state of the art of container orchestration and deployment scheduling technologies
  2. Investigate which kinds of schedulers these two systems support
  3. Compare the performance of different schedulers to understand the characteristics of those schedulers
  4. Implement a new scheduler to enhance certain properties of the system
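For intuition about step 3, the sketch below mimics a "least-requested" spreading scheduler: filter out nodes without room, then score the rest by remaining capacity (illustrative only; the real Kubernetes scheduler combines many filtering and scoring plugins):

```python
def score_node(node, pod):
    """Score a node for a pod: prefer nodes with the most free CPU and
    memory remaining after placement; None means the node is filtered out."""
    free_cpu = node["cpu_free"] - pod["cpu"]
    free_mem = node["mem_free"] - pod["mem"]
    if free_cpu < 0 or free_mem < 0:
        return None  # insufficient resources
    # Average of remaining capacity fractions (least-requested style).
    return (free_cpu / node["cpu_cap"] + free_mem / node["mem_cap"]) / 2

def pick_node(nodes, pod):
    """Choose the feasible node with the highest score, or None."""
    scored = [(score_node(n, pod), n["name"]) for n in nodes]
    scored = [(s, name) for s, name in scored if s is not None]
    return max(scored)[1] if scored else None
```

A bin-packing scheduler would simply invert the score; comparing the two behaviours is one concrete experiment for step 3.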
Reading material:
  1. Kubernetes: https://kubernetes.io/
  2. SWARM: https://docs.docker.com/swarm/
  3. Deployment scheduling: Hu, Y., Wang, J., Zhou, H., Martin, P., Taal, A., de Laat, C., and Zhao, Z. (2017) Deadline-aware Deployment for Time Critical Applications in Clouds, proceedings of the Euro-Par 2017 Conference in Santiago de Compostela, August 30- September 1, 2017 https://doi.org/10.1007/978-3-319-64203-1_25
More info: Yang Hu, Spiros Koulouzis, Zhiming Zhao
Zhiming Zhao <z.zhao=>uva.nl>

72

Optimizing data services using system logs.

Environmental research infrastructures such as Euro-Argo [1], EPOS [2] and LTER [3] provide data services for managing the lifecycle of environmental observation data and platforms for scientific experimentation in a number of different research domains. Services such as subscription, discovery and retrieval are frequently used by researchers, and the performance of such services is crucial for the user experience. The quality of the services is influenced by the volume of the data, the number of concurrent requests, the available resources (e.g. bandwidth) of the infrastructure, etc. One way to optimise service quality is to characterise usage patterns by analysing service logs, and then predicting the best resource allocation for data services in advance.

The goal of this project is therefore to investigate such an approach in the context of the Euro-Argo research infrastructure. Euro-Argo is an infrastructure for managing ocean observation data and providing data services as part of the global Argo system. In the project, the student will start from existing work [4] but will specifically focus on the issue of service optimisation using system logs. They will:
  1. Review the state of the art of log analytics based service optimisation.
  2. Analyse the logs of Euro-Argo data download services from the past two years (more than 10 GB of text files), and profile data access patterns of the service using a selected method.
  3. Propose optimisation recommendations for Euro-Argo based on the analysis results.
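A first cut at step 2 can be as simple as bucketing requests per hour of day; the sketch below assumes one request per line with a leading ISO timestamp (the log format is an assumption, the real Euro-Argo logs will differ):

```python
from collections import Counter

def busiest_hours(log_lines, top=3):
    """Profile access patterns from download-service logs. Assumes one
    request per line starting with an ISO timestamp, e.g.
    '2017-06-01T14:03:22 GET /float/6901234/profile.nc 200'."""
    hours = Counter(line[11:13] for line in log_lines if len(line) >= 13)
    return [h for h, _ in hours.most_common(top)]
```

On 10+ GB of logs the same idea would run as a streaming pass, feeding a capacity-planning model.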
Some background material:
  1. http://www.euro-argo.eu/
  2. https://epos-ip.org/
  3. http://www.lter-europe.net/
  4. Earlier work on data subscription service: https://www.youtube.com/watch?v=PKU_JcmSskw

More info: Spiros Koulouzis, Paul Martin, Zhiming Zhao

Zhiming Zhao <z.zhao=>uva.nl>

73

Profiling Partitioning Mechanisms for Graphs with Different Characteristics.

In computer systems, a graph is an important model for describing many things, such as workflows, virtual infrastructures, and ontological models. Partitioning is a frequently used graph operation in contexts like parallelizing workflow execution, mapping networked infrastructures onto distributed data centers [1], and controlling the load balance of resources. However, developing an effective partition solution is often not easy; it is often a complex optimization issue involving constraints such as system performance and cost.

A comprehensive benchmark of graph partitioning mechanisms is helpful when choosing a partitioning solver for a specific model. Such a portfolio can also give advice on how to partition based on the characteristics of the graph. This project aims at benchmarking the existing partitioning algorithms for graphs with different characteristics, and profiling their applicability to specific types of graphs.
This project will be conducted in the context of the EU SWITCH [2] project. The students will:
  1. Review the state of the art of the graph partitioning algorithms and related tools, such as Chaco, METIS and KaHIP, etc.
  2. Investigate how to define the characteristics of a graph, such as sparse graphs, skewed graphs, etc. This can also be discussed in terms of different graph models, like planar graphs, DAGs, hypergraphs, etc.
  3. Build a benchmark for different types of graphs with various partitioning mechanisms and identify the relationships behind the results.
  4. Discuss how to choose a partitioning mechanism based on the graph characteristics.
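Two standard quality metrics such a benchmark would report are edge cut and balance; minimal reference implementations (the dict-based representation is an assumption of this sketch):

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges crossing partition boundaries: the basic quality
    metric partitioners like METIS and KaHIP minimize.
    part maps node -> block id."""
    return sum(1 for u, v in edges if part[u] != part[v])

def balance(part, k):
    """Largest block size relative to the ideal size n/k
    (1.0 means perfectly balanced)."""
    sizes = Counter(part.values())
    return max(sizes.values()) / (len(part) / k)
```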
Reading material:
  1. Zhou, H., Hu Y., Wang, J., Martin, P., de Laat, C. and Zhao, Z., (2016) Fast and Dynamic Resource Provisioning for Quality Critical Cloud Applications, IEEE International Symposium On Real-time Computing (ISORC) 2016, York UK http://dx.doi.org/10.1109/ISORC.2016.22
  2. SWITCH: www.switchproject.eu

More info: Huan Zhou, Arie Taal, Zhiming Zhao

Zhiming Zhao <z.zhao=>uva.nl>

Presentations-rp2

I hereby invite you to the annual RP2 presentations, where the SNE students will present their research. Considering the wide variety of presentations, the days promise to be very interesting, and we hope you will join us.
Program (Printer friendly version: HTML, PDF): The event is stretched over two days: Monday-Tuesday July 3-4, 2018.
Monday July 3, 2018, Auditorium C0.110, FNWI, Science Park 904, Amsterdam.
Time D #RP Title Name(s) LOC RP #stds
13h00     Welcome, introduction. Cees de Laat
13h00 20
13h20 20
13h40 25
14h05 25  bio break
14h30 25
14h55 25
15h20 20
15h40 20  break
16h00 20
16h20 20
16h40     End

Tuesday July 4, 2018, Auditorium C0.110, FNWI, Science Park 904, Amsterdam.
Time D #RP Title Name(s) LOC RP #stds
13h00     Welcome, introduction. Cees de Laat
13h00 20
13h20 20
13h40 20
14h00 20  bio break
14h20 20
14h40 20
15h00 20
15h20 20  break
15h40 20
16h00 25
16h25     End



Presentations-rp1

Program (Printer friendly version: HTML, PDF): Monday Feb 6th 2018, 13h00 - 17h00 in B1.23 at Science Park 904, NL-1098XH Amsterdam.
(all presentations are 20 minutes for single and 25 minutes for pairs of students.)

Time D #RP Title Name(s) LOC RP #stds
13h00     Welcome, introduction. Cees de Laat
13h05
13h30
13h55
14h20 20  bio break
14h40 25
15h05 25
15h30 25
15h55 15  break
16h10 25
16h35 25
17h00     End



Tuesday Feb 7th 2018, 11h00 - 15h05 in room B1.23
at Science Park 904 NL-1098XH Amsterdam.
Program:
Time D #RP Title Name(s) LOC RP #stds
11h00     Welcome, introduction. Cees de Laat
11h05 25
11h30 25
11h55     Lunch
13h00 20
13h20 20
13h40 25
14h05 15  bio break
14h20 20
14h40 25
15h05     End



Out of normal schedule presentations:
Room B1.23
at Science Park 904 NL-1098XH Amsterdam.
Program:
Date Time Place D #RP Title Name(s) LOC RP #stds
          B1.23 20
          B1.23 20
          B1.23 20
End