I am a consulting software engineer based in San Francisco. I have also been an architect and manager, a data scientist, and a researcher (my Ph.D. is in theoretical computer science). This site shows some of my open-source, independent, and academic projects. It also serves as personal archive.
See my LinkedIn profile if interested in my resume.
GitHub LinkedIn Facebook Twitter Email
The following two projects I have worked on as part of my daytime jobs. For other smaller-scale projects, please see my GitHub repositories.
CloudKeeper is a domain-specific language and runtime system for implementing and running dataflows on the Java Virtual Machine. Designed to facilitate “programming in the large”, CloudKeeper abstracts away concerns such as data transfer, serialization, scheduling, checkpointing, and package/dependency management. It is written in Java 8 and uses Akka Actors for the distributed runtime system.
I originally designed and implemented CloudKeeper while at Lifecode, a molecular-diagnostics startup that helped oncologists make treatment decisions for cancer patients. CloudKeeper was the core infrastructure piece at Lifecode, used by computational biologists to write genome-analysis workflows in a high-level, modular, and portable fashion.
CloudKeeper has been open source since January 2016. Currently, work on the project is geared towards the first public release, version 2.0. See the web site for more information, including slide decks and examples (also note manuscript S16 below, which gives a formal description of CloudKeeper’s check-pointing algorithm).
MADlib is an open-source library for running machine-learning algorithms at scale within relational databases such as PostgreSQL and Greenplum. I co-started the project after joining Greenplum in 2010 (now Pivotal) and was the MADlib lead engineer. MADlib is written in C, C++, Python, and SQL.
One of my main contributions to MADlib included designing and writing a C++ layer that eases the burden of writing high-performance user-defined function (UDFs) and that encapsulates DBMS-specific logic inside the abstraction layer. In brief, the MADlib C++ abstraction provides three classes of functionality: type bridging, resource management shims, and math library integration. See the MADlib design document (publication M12 below) and chapters 3 and 4 in “The MADlib Analytics Library” (publication HRSW+12 below) for more information.
Using the abstraction layer, I implemented core algorithms such as linear/logistic regression, k-means, random sampling, statistical tests, etc. I also owned other essential parts of the project, including the build system, continuous integration, release management, and documentation (using a custom SQL parser for Doxygen, which I wrote in flex and bison). In order to formally document the machine-learning algorithms implemented in MADlib, I started the design document (manuscript M12 below).
After leaving Greenplum, I no longer had time to contribute much to MADlib. Luckily, the project is in good shape and is currently in the Apache Incubator.
Apache Incubator Site GitHub (official) GitHub (my old fork)
The Latin-German dictionary is indispensable when searching for words and when translating Latin writings into German. It allows to instantly look up entries in all conjugated/declined forms, in order to obtain the base form and its translation. In total, the software can recognize almost one million distinct word forms stemming (pun intended) from over 11,000 dictionary entries. For each base form, it can also show overviews of all inflected forms.
I started developing the Latin-Germany while in 10th grade in high school and released the first version in 1998. Over the years, the Latin-German dictionary found more than 2,000 paying customers. Today, it is available as freeware.
I implemented the Latin-German dictionary in C++, using initially the PowerPlant framework for the Mac version (those glorious 1990’s!) and the Microsoft Foundation Classes for the Windows version. After the advent of Mac OS X, I rewrote the user interface code of the Mac version in Objective-C.
I developed an inventory control system called “WinCOWAS” (German for “computergestütztes Warenwirtschaftssystem”), tailored and simplified for educational use. From 1999 to the mid-2000’s, I sold this software to vocational schools. It was later acquired by the Europa-Lehrmittel Verlag and became part of a text book (when following the link, see the table of contents). I also developed a likewise simplified accounting software for education purposes, called “WinCOBUS”. I discontinued distribution and development of this software.
RS15 | Tim Roughgarden and Florian Schoppmann: Local Smoothness and the Price of Anarchy in Atomic Splittable Congestion Games. In: Journal of Economic Theory. Volume 156, March 2015. Pages 317–342. Supersedes conference version (RS11). Article DOI |
HRSW+12 | Joe Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, Arun Kumar: The MADlib Analytics Library (or MAD Skills the SQL). In: Proceedings of the 38th Very Large Data Bases Endowment (VLDB’12). My part of the VLDB talk were slides 10–23 (corresponding to chapters 3 and 4 in the paper). Article Slides DOI |
ADGMS11 | Sebastian Aland, Dominic Dumrauf, Martin Gairing, Burkhard Monien, and Florian Schoppmann: Exact Price of Anarchy for Polynomial Congestion Games. In: SIAM Journal on Computing. Volume 40, Issue 5, 15 September 2011. Pages 1211–1233. Supersedes conference version (ADGMS06). Article DOI |
RS11 | Tim Roughgarden and Florian Schoppmann: Local Smoothness and the Price of Anarchy in Atomic Splittable Congestion Games. In: Proceedings of the 22th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’11). Superseded by extended version (RS15). Article Slides DOI |
PSSW09 | Paolo Penna, Florian Schoppmann, Riccardo Silvestri, and Peter Widmayer: Pseudonyms in cost-sharing games. In: Proceedings of the 5th International Workshop on Internet and Network Economics (WINE’09). Article Slides DOI |
S08 | Florian Schoppmann: The Power of Small Coalitions in Cost Sharing. In: Proceedings of the 4th International Workshop on Internet and Network Economics (WINE’08). Article Slides DOI |
MMPS08 | Marios Mavronicolas, Burkhard Monien, Vicky G. Papadopoulou, and Florian Schoppmann: Voronoi Games on Cycle Graphs. In: Processings of the 33rd International Symposium on Mathematical Foundations of Computer Science (MFCS’08). Article Slides DOI |
BS08b | Yvonne Bleischwitz and Florian Schoppmann: Group-Strategyproof Cost Sharing for Metric Fault Tolerant Facility Location. In: Proceedings of the 1st International Symposium on Algorithmic Game Theory (SAGT’08). Article DOI |
BS08a | Yvonne Bleischwitz and Florian Schoppmann: New Efficiency Results for Makespan Cost Sharing. In: Information Processing Letters. Volume 107, Issue 2, 16 July 2008. Pages 64–70. Elsevier B.V. Article DOI |
BMS07 | Yvonne Bleischwitz, Burkhard Monien, and Florian Schoppmann: To be or not to be (served). In: Proceedings of the 3rd International Workshop On Internet and Network Economics (WINE’07). Superseded by extended manuscript (BMS10). Article DOI |
GS07 | Martin Gairing and Florian Schoppmann: Total Latency in Singleton Congestion Games. In: Proceedings of the 3rd International Workshop On Internet And Network Economics (WINE’07). Article Slides DOI |
BMST07 | Yvonne Bleischwitz, Burkhard Monien, Florian Schoppmann, and Karsten Tiemann: The Power of Two Prices: Beyond Cross-Monotonicity. In: Proceedings of the 32nd International Symposium on Mathematical Foundation of Computer Science (MFCS’07). Superseded by extended manuscript (BMST09). Article Slides DOI |
MMST06 | Vladimir Mazalov, Burkhard Monien, Florian Schoppmann, and Karsten Tiemann: Wardrop Equilibria and Price of Stability for Bottleneck Games with Splittable Traffic. In: Proceedings of the 2nd Workshop on Internet and Network Economics (WINE’06). Article Slides DOI |
ADGMS06 | Sebastian Aland, Dominic Dumrauf, Martin Gairing, Burkhard Monien, and Florian Schoppmann: Exact Price of Anarchy for Polynomial Congestion Games. In: Proceedings of the 23rd International Symposium on Theoretical Aspects of Computer Science (STACS’06). Superseded by extended version (ADGMS11). Article DOI |
The following articles contain original and novel work but are so far unpublished in this form.
S16 | Florian Schoppmann: The CloudKeeper dataflow language and runtime system, January 2016. This document (currently) gives a description of the interpreter component, including a formal proof of the check-pointing algorithm. To be extended into a full article about all aspects of CloudKeeper. Document |
M12 | The MADlib project (multiple authors): MADlib Design Document, my contributions from 2012. The design document gives technical details about architecture decisions as well as mathematical details about implemented machine-learning algorithms. While leading the MADlib project (see above), I started the design document in order to ensure formal correctness in mathematical code. Document |
S10 | Florian Schoppmann: Generalizing the Smoothness framework, June 2010. The purpose of this note is to further generalize the smoothness framework introduced in “Intrinsic robustness of the price of anarchy” (Roughgarden, 2009), in order to provide a unified proof for many existing price-of-anarchy results. Document |
BMS10 | Yvonne Bleischwitz, Burkhard Monien, and Florian Schoppmann: To Be or Not to Be (Served): Cost-Sharing Without Indifferences, March 2010. Supersedes conference version (BMS07). Roughly equivalent to chapter 4 of my Ph.D. thesis (S09). Document |
BMST09 | Yvonne Bleischwitz, Burkhard Monien, Florian Schoppmann, and Karsten Tiemann: The Power of Lexicographic Maximization: Beyond Cross-Monotonicity, August 2009. Supersedes conference version (BMST07). Roughly equivalent to chapter 3 of my Ph.D. thesis (S09). Document |
The following notes and slides provide introductions, summaries, simplified proofs, and opinions.
Dynamic Programming and Sequence Alignment, March 2015.
I was invited to give a talk at Lowell High School, San Francisco, and decided to give students an idea of a university-level computer-science class, real-world applications of algorithms, and career paths in computer science.
SlidesParallel k-Means in Theory and Practice, August 2012.
I gave this informal talk to coworkers at Greenplum to explain how I had implemented k-Means in the MADlib open-source library and how this implementation could be improved. My personal notes go beyond that and summarize the key results from the papers “k-means++: The Advantages of Careful Seeding” (Arthur and Vassilvitskii, 2007) and “Scalable K-Means++” (Bahmani et al., 2012).
Notes SlidesRandom Sampling, August 2010.
I gave this talk during a job interview at Greenplum after being told that random sampling in the Greenplum database required workarounds that were not performing up to the expectation.
Notes SlidesDifferential Privacy, March 2010.
I presented “Differential Privacy: A Survey of Results” (Dwork, 2008) in the Stanford Theory reading group.
NotesThe Multiplicative Weights Update Method, February 2010.
I presented (a preliminary version of) the paper “The Multiplicative Weights Update Method: A Meta Algorithm and its Applications” (Arora et al., 2012) in the Stanford Theory group.
NotesUniqueness of Nash equilibria in atomic splittable congestion games, September 2009
These notes give a more explicit proof of the uniqueness result in the paper “Competitive routing in networks with polynomial costs” by Altman et al. (2002).
NotesCost Sharing (a small and biased summary of results), June 2009
These notes summarize some results in the area of mechanism design and cost sharing, the topic of my Ph.D. thesis.
NotesPresent-day information and communication technologies crucially rely on large-scale networks that are created and maintained by a huge number of autonomous players with heterogeneous goals. Evidently, this includes the Internet as the most prominent example, but no less also peer-to-peer, logistics, and social networks. Research on (distributed) algorithms has responded to the rapid emergence of these new decentralised networks by a corresponding paradigm shift: Starting shortly before the new millennium, computer scientists have been increasingly focusing on optimization in networks where the self-interested players’ behaviour cannot be directly controlled (Nisan and Ronen, 2001; Koutsoupias and Papadimitriou, 2009; Papadimitriou, 2001). The lack of central coordination ensuing from players’ self-interested behavior is instead regarded as an inevitable constraint, similar to the lack of computing power when devising approximation algorithms or the lack of information about the future when designing online algorithms.
Located at the intersection of computer science with mathematics and economics, this field of research is known today as algorithmic game theory. Bringing together algorithms and game theory has given rise to several important questions that were not rigorously considered before: Can the loss due to selfish behaviour be quantified? In a more catchy term: What is the “price of anarchy”? What is the complexity to compute stable states (“equilibria”) or approximations of them? How should new systems be designed in order to give players an incentive to act in a particular fashion? How can the “right” incentives be reconciled with computational tractability? The following sections briefly outline three topics from my research in algorithmic game theory.
In a cost-sharing problem, finitely many players have an unknown valuation for some non-rivalrous but excludable service (e.g., network connectivity). The challenge is to design mechanisms that elicit truthful reports of the players’ valuations, determine which set of players Q to serve, and decide how to distribute the incurred service cost C(Q). Providing players with an incentive to reveal truthful information is key, because it removes the possibility to game the system. Further constraints for cost-sharing problems include budget balance (i.e., recovery of the service cost with the prices charged) and economic efficiency (i.e., a reasonable trade-off between the service cost and the excluded players’ valuations). Practical applications moreover require that the outcome of cost-sharing mechanisms be computable in polynomial time.
Cost-sharing problems are fundamental in economics and have a broad area of applications; e.g., distributing volume discounts in electronic commerce, sharing the cost of public infrastructure projects, allocating development costs of low-volume built-to-order products, etc. Despite this fundamental nature, general techniques for solving cost-sharing problems used to be rare. When requiring group-strategyproofness – meaning that not even coordinated manipulation must ever be profitable – essentially only one technique was known, due to Moulin (1999). Unfortunately, there are several natural cost-sharing problems for which any Moulin mechanism inevitably suffers poor budget balance and economic efficiency (Immorlica et al., 2008).
My research in this area therefore focused on finding alternative general techniques that perform better than Moulin mechanisms (BMST07, BMS07), on characterising the inherent limitations of group-strategyproofness with respect to the other desirable properties of cost-sharing mechanisms (BMST07, BS08b, S08), on investigating new incentive-compatibility desiderata that arise particularly in large anonymous settings (PSSW09), and on extending the cost-sharing model to a more general fault-tolerant setting where players may have demand for multiple levels of service (BS08b).
So-called congestion games play a central role for modeling self-interested resource usage. There is a ground set of resources, and each player selects a subset of them. For concreteness, we assume here that resources are edges in a network, and each player needs to route traffic from some source to a destination node. A natural behavioral assumption is that each player simply chooses the fastest route when taking into account the current network congestion – while being oblivious to the effect her own action has on the system. In fact, similar models have already been considered in the 1950’s in the context of road traffic. Today, selfish routing models gain further motivation from the necessity of efficient routing protocols in the Internet, in logistics networks, etc.
My research in this area primarily focused on quantifying the loss due to selfishness. In two works (ADGMS06, GS07), we considered a model with players who each have a non-negligible effect on the system. For the case that the travel time over each edge is described by a polynomial of the total traffic on this edge, we gave exact values for the price of anarchy. In a different work (MMST06), we considered a selfish routing scenario where not the total travel time (e.g., of data packets) is an issue but much more the throughput. The latter is clearly a question of the bottleneck, i.e., the most congested edge on a path. We gave results on the minimum loss any equilibrium inevitably incurs, together with a characterisation of special network topologies (series-parallel graphs).
We also precisely determined the price of anarchy for the case when players can arbitrarily split their traffic over several routes (RS11) – a problem that had been open for several years (see, e.g., “Algorithmic Game Theory”, chapter 18, 2007).
Competitive location games provide simple scenarios of competitive sellers: The market is modeled as some metric measurable space, and a seller’s market share is assumed to be proportional to the measure of the points that are closer to it than to any other seller. How will the sellers behave in order to maximise their market shares? There is a huge body of literature on location theory (Eiselt et al., 1993), with contributions also from operations researchers, geographers, political scientist etc. However, articles dealing with discrete location models – such as graphs – are rare.
Motivated by applications in computer networks, my research on competitive location made some first steps towards understanding discrete models: We considered competitive location games on cycle graphs and give a comprehensive examination of their equilibria. This then allowed us to also determine the exact price of anarchy (MMPS08).
From 2006–2009, I was a Ph.D. student at the International Graduate School Dynamic Intelligent Systems. Before that, from 2001–2005, I studied computer science, with mathematics as minor subject, at the University of Paderborn.
S09 | Collusion-Resistant Cost-Sharing Mechanisms: Design Techniques, Analyses, Trade-Offs. Ph.D. thesis. University of Paderborn, Germany, June 2009. Thesis Slides URN:NBN |
S05b | Price of Anarchy for Congestion Games with Polynomial Latency Functions. Diploma thesis (Diplomarbeit). University of Paderborn, Germany, December 2005. I was awarded the prize of the computer-science department for the best Diplom (first among 76 graduates during the academic year 10/2005–09/2006). Thesis Slides |
S05a | Online Occlusion Culling. Bachelor’s thesis. University of Paderborn, Germany, January 2005. Thesis Slides |
Included are many of my summaries, term papers, and slide decks for talks. The computer-science department at the University of Paderborn distinguished five areas of study. Undergraduate/Diplom classes are tagged accordingly.
Please note: While the summaries are for specific classes, and while they may even incorporate official lecture material, they are otherwise entirely my own work and have not been reviewed. Beware of mistakes. Most of the documents below are in German.
Methodologies of Scientific Work, winter semester 2006/2007, Prof. Dr.-Ing. Jörg Wallaschek
Our talk was titled “Organization and Management of Research and Development Projects”.
SlidesScientific Computation: Data Science and Classification, summer semester 2007, Dr. Robert Preis
My talk was titled “Analysis of Music data”.
SlidesProjectgruppe SEROSE, winter and summer semesters 2004/2005, Dr. Rainer Feldmann and Yvonne Bleischwitz
This was a year-long project with 8 students working on “Selfish Routing in Sokoban Environments”. The goal was to use theoretical work for practical software engineering, creating a system that would: assign transport tasks to players bidding for contracts, compute both macro- and microeconomically optimal solutions, and have a graphical-user interface with convenience features such as generating test instances.
The summary below is for the papers “Bidding and allocation in combinatorial auctions” (Nisan, 2000) and “Selfish Routing in Capacitated Networks” (Correa et al., 2004). The term paper and the slides are about the latter of the two papers.
The project also included a talk series, in which I gave a talk about the latter of the previous papers.
Summary Term Paper SlidesApproximationsalgorithmen, summer semester 2005, Dr. Ulf-Peter Schröder
Topics included graph coloring, non-approximability results, algorithms with relative approximation guarantees, approximation schemes, pseudo-polynomial-time algorithms, complexity classes and hierarchy, randomized algorithms, semidefinite programming.
SummaryNicht-kooperative Netze, summer semester 2005, Prof. Dr. Burkhard Monien
Introduction to non-cooperative game theory, congestion games, self-interested load balancing, computation and computational complexity of Nash equilibria, the Wardrop model.
Our solution (together with Sebastian Aland and Dominic Dumrauf) to a problem set in this class was the basis for publication ADGMS06.
SummarySeminar Real-Time Operating Systems, summer semester 2005, Prof. Dr. Franz Rammig and Marcelo Götz
This talk series was about real-time operating systems. We gave a talk about Enea OSE.
Term Paper SlidesStochastik I, winter semester 2004/2005, Prof. Dr. Björn Schmalfuß
Topics included set systems, algebras, measure theory with Lebesgue integration and product spaces, probability theory, limit theorems, random-number generation, stochastic processes, Black–Scholes, and statistical tests.
SummarySeminar Zahlen: Berechnung von π, winter semester 2004/2005, Prof. Dr. Joachim von zur Gathen and Dr. Michael Nüsken
This talk series was about the computation of π. I gave a talk about the paper “A Spigot Algorithm for the Digits of π” (Rabinowitz and Wagon, 1995).
Term Paper SlidesSeminar Perspektiven der Software-Ergonomie, summer semester 2004, Prof. Dr. Keil-Slawik and Joanna Slawik.
This talk series was about software ergonomics. We gave a talk about usability in the software development lifecycle.
Term Paper SlidesComputergrafik II, summer semester 2004, Prof. Dr. Gitta Domik
Topics included: Mathematical modelling, splines, lighting, animation, morphing, texture mapping, radiosity, ray tracing, volume rendering.
SummaryDigitale Bildverarbeitung, summer semester 2004, Prof. Dr. Gitta Domik
Topics included, among others: Color schemes, rasterization, Fourier Transform, filters, image restauration, compression.
SummaryAlgebra I, summer semester 2004, Prof. Dr. Henning Krause
Introduction to abstract algebra, from group theory over Galois theory to proofs of the impossibility of ruler-and-compass constructions.
Summary Overview of DependenciesOnline Algorithmen, winter semester 2003/2004, Jun. Prof. Dr. Christian Sohler
Online algorithms and formal proofs establishing their competitive ratios. Amortized analysis. Examples included list access, paging, load balancing, and self-adjusting search trees.
SummaryProgramming Languages and Compilers, winter semester 2003/2004, Prof. Dr. Uwe Kastens
Compiler structure, tokenization, parsing, hierarchy of grammars and their “parsability”, attribute grammars, name binding, type resolution.
SummaryKryptographie I / II, winter semester 2003/2004, Prof. Dr. Johannes Blömer
Number theory basics, ciphers, AES, asymmetric cryptography, Miller-Rabin test, Rabin cryptosystem, ElGamal encryption, and others. Morever, digital signatures, cryptographic hash functions, message authentication codes, undeniable signatures, identification protocols, and semantic security.
SummaryBetriebssysteme, winter semester 2003/2004, Prof. Dr. Odej Kao
Operating systems: History, processes, threads, scheduling, memory management, I/O, drivers, file systems, security. Term project was modifying Minix.
SummaryComputergrafik I, winter semester 2003/2004, Prof. Dr. Gitta Domik
Introduction to computer graphics: from geometry and linear algebra to OpenGL, occlusion culling, and rasterization.
SummaryNumerik I, winter semester 2003/2004, Prof. Dr. Sönke Hansen
Introduction to numerical analysis, from floating point (IEEE-754) over matrix decompositions and numerical solutions for systems of equations to numeric integration and splines.
SummaryKonzepte und Methoden der Systemsoftware, summer semester 2003, Prof. Dr. Odej Kao
Computer architecture, system software, scheduling, synchronization, resource management, and inter-process communication.
SummaryBerechenbarkeit und formale Sprachen, winter and summer semesters 2002/2003, Prof. Dr. Johannes Blömer
Models of computation, formal languages, Chomsky hierarchy, pumping lemma. Moreover, complexity classes, non-determinism, and introduction to approximation algorithms.
SummaryStochastik für Informatiker, winter semester 2002/2003, Prof. Dr. Hans M. Dietz
Introductory probability theory: Combinatorics, algebras, measure spaces, distributions, random variables, limit theorems, stochastic processes, etc.
SummaryTechniken des Softwareentwurfs II, winter semester 2002/2003, Dr. Reiko Heckel
Software development as an engineering discipline, part II: data models, the relational model and relational algebra, normal forms, database management systems
SummaryTechniken des Softwareentwurfs I, winter semester 2002/2003, Dr. Reiko Heckel
Software development as an engineering discipline, part I: functional specification, architecture, design, organizational planning. UML.
SummaryAnalysis I / II, winter and summer semesters 2002/2003, Prof. Dr. Sönke Hansen.
Introduction to mathematical analysis, with an axiomatic deduction of basic topics including: complex numbers, compactness, completeness, the Bolzano–Weierstrass theorem, series, continuity, Taylor series, multi-dimensional differentiation and integration, extrema with side constraints.
Summary (only Analysis II) Overview of DependenciesDatenstrukturen und Algorithmen, summer semester 2002, Prof. Dr. Burkhard Monien
Big O notation and master theorem, fundamental algorithms and data structures, algorithm design patterns.
SummaryGrundlagen der Programmiersprachen, summer semester 2002, Prof. Dr. Uwe Kastens
Foundations of programming languages, touching on different programming paradigms (imperative, functional, logic) as well as basic responsibilities of compilers.
SummarySoftwareentwicklung, winter and summer semesters 2001/2002, Prof. Dr. Gerd Szwillus.
Introduction to Java as well as fundamental aspects of imperative and object-oriented programming.
SummaryPraxis der Systemgestaltung, winter semester 2001/2002, Prof. Dr. Johannes Magenheim.
Practical aspects of engineering, such as the project dvelopment lifecycle, human-computer interaction, ergonomics, privacy laws, etc.
SummaryLineare Algebra I / II, winter and summer semesters 2001/2002, Dr. Christian Nelius.
Introduction to linear algebra, from vector spaces to the fundamental theorem of algebra and unitary spaces.
Summary Overview of Dependencies