Summary

Research and development of systems that involve natural language understanding and knowledge. Previously, worked on systems software, including research on operating systems, computer architecture, distributed systems and cloud computing.

Currently, I am a Software Engineer at Google Research, where I work with the Google Research Language team. Previously, I was a Research Staff Member at IBM Watson Group as well as at IBM Research. I worked on the Watson Concept Insights cloud service, having co-founded the research project and later serving as lead software engineer and system architect.
I received a Ph.D. in Computer Engineering from the University of Toronto in 2012, advised by Prof. Michael Stumm. For my Ph.D., I invented and developed mechanisms to improve the performance of network and I/O applications, operating systems and run-time systems.

Projects

Natural Language Understanding at Google Research (2016-present)

Working on a variety of projects related to Natural Language Understanding at Google Research (NYC). In conjunction with Google Cloud teams, extended language support for syntactic and entity analysis from three initial languages to eleven. Also worked internally to improve the quality of entity recognition, entity typing and entity linking. More recently, I have been working on information extraction and knowledge graphs, specifically learning how to generate neural representations of information directly from text, with minimal or no direct supervision from existing knowledge graphs.

Relevant publications: [17], [18], [19], [20], [21] and [22].

Watson Concept Insights (2013-2016)

System architect, lead software engineer and co-founder of “Watson Concept Insights” cloud service. Took the project from a research idea all the way to production. Worked on all aspects of research and development including algorithm and data research, system architecture and development, API design, pricing, dashboard UI, testing, and devops.
  • Co-authored over 15 patents in areas related to cognitive computing, information retrieval and graph analysis.
  • Designed and developed REST API sustaining over 500,000 API calls per day in production.
  • Developed micro-service-based distributed system with over 10 independent services, spanning hundreds of machines.
  • Deployed and managed large installations of storage clusters, including Cassandra, MongoDB, Ceph/Rados Object Store and Redis.
  • Developed novel storage library allowing transparent stacking ("union") of distributed object stores, as well as on-demand object retrieval for distributed computational kernels.
  • Significant portion of system code written in Go; some portions of algorithmic kernels in C/C++.
  • Optimized C kernel from over 2 minutes of execution down to under 10 seconds. Optimized conceptual query latencies from over 30 seconds down to under 1 second.
  • Developed monitoring solution of services and machines encompassing metrics, logs and health checks.
  • Service was used for the author recommendations shown on papers hosted in ACM’s Digital Library (dl.acm.org).

Service-oriented peer-to-peer middleware (2012-2013)

Researched and prototyped a novel peer-to-peer distributed system offering core services for building service-oriented applications. The main goal of the project was to produce a suite of tools and micro-services that make it easier for developers to write robust, large distributed applications or cloud services. The project brought together a distributed key-value store with dynamic consistency and availability guarantees, a membership/directory service, a topology-aware messaging bus, a deployment system based on Linux containers (LXC), and integrated monitoring.
  • Designed and developed 6 different scale-out peer-to-peer microservices in Go.
  • Designed and developed fully distributed in-memory key-value store with variable consistency and availability guarantees (per-collection). Collections could be configured to be fully consistent (Raft-based consensus) or fully available (peer-to-peer protocol).
  • Designed and developed highly-available distributed DHCP service for dynamically assigning IPs to VMs and containers within a data-center.
  • Designed and developed fully distributed (peer-to-peer) membership and directory service for storing cluster and service-level membership information.
  • Designed and developed distributed network monitoring micro-service, allowing clients to query network failures within different regions (machine, rack, zone or data-center).
  • Designed and developed distributed messaging service.
  • Designed and developed distributed deployment system for micro-services using distributed fleet of agents, responsible for launching container or virtual-machine based run-times (using libvirt).

Cloud management resiliency (2012-2013)

Explored resiliency of cloud management systems, focusing on OpenStack. Developed mechanism for monitoring and tracking distributed requests within OpenStack services. This mechanism was used to log specific distributed flows in the presence of faults or crashes. Extended this basic mechanism to introduce artificial faults at specific events, along with automatically validating expected outputs from specific series of requests.

Relevant publications: [14] and [15].

FlexSC (exception-less system calls) (2010-2011)

Research in the area of operating systems and computer architecture, focusing on run-time performance. Created a novel operating system interface for traditional monolithic kernels (e.g., Linux), called exception-less system calls, that enables applications to communicate with the operating system via asynchronous messages. Created a new POSIX-compatible threading library that lets multi-threaded server applications (e.g., Apache, MySQL and BIND) use exception-less system calls efficiently. Created a new event-driven library to support explicitly asynchronous application/OS execution, and ported memcached and nginx to this new library. Demonstrated that exception-less system calls lead to efficient execution on multi-core processors.

Relevant publications: [10] and [11].

Runtime systems using hardware performance counters (2007-2010)

Created tools for exploring the performance of applications, run-time systems and operating systems using hardware performance counters. The tooling and analysis I worked on allowed our research group to design novel techniques for improving processor cache utilization. The technique I pioneered, called the "software pollute buffer", is an operating-system-level mechanism for improving processor cache performance. It profiles application cache performance at run-time through hardware performance counters and dynamically remaps application pages with poor cache utilization using standard page coloring. I implemented the prototype within the Linux kernel, using the hardware performance counters of a PowerPC processor.

Relevant publications: [7], [8], [5], [6], [9] and [3].

Publications

  22. Entities as Experts: Sparse Memory Access with Entity Supervision
    Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, Tom Kwiatkowski
    arXiv preprint 2020: https://arxiv.org/abs/2004.07202
  21. Collecting Entailment Data for Pretraining: New Protocols and Negative Results
    Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler
    arXiv preprint 2020: https://arxiv.org/abs/2004.11997
  20. Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking
    Thibault Févry, Nicholas FitzGerald, Livio Baldini Soares, Tom Kwiatkowski
    AKBC 2020 Automated Knowledge Base Construction
  19. Learning Cross-Context Entity Representations from Text
    Jeffrey Ling, Nicholas FitzGerald, Zifei Shan, Livio Baldini Soares, Thibault Févry, David Weiss, Tom Kwiatkowski
    arXiv preprint 2020: https://arxiv.org/abs/2001.03765
  18. Matching the Blanks: Distributional Similarity for Relation Learning
    Livio Baldini Soares, Nicholas Arthur FitzGerald, Jeffrey Ling, Tom Kwiatkowski
    ACL 2019 - The 57th Annual Meeting of the Association for Computational Linguistics (2019)
  17. Learning Entity Representations for Few-Shot Reconstruction of Wikipedia Categories
    Jeffrey Ling, Nicholas FitzGerald, Livio Baldini Soares, David Weiss, Tom Kwiatkowski
    The 2nd Learning from Limited Labeled Data (LLD 2019) Workshop
  16. Watson Concept Insights: A Conceptual Association Framework
    Michele M Franceschini, Livio Soares, Luis A Lastras Montaño
    Proceedings of the 25th International Conference Companion on World Wide Web (Demo WWW 2016)
  15. On fault resilience of OpenStack
    Xiaoen Ju, Livio Soares, Kang G Shin, Kyung Dong Ryu, Dilma Da Silva
    4th ACM Symposium on Cloud Computing (SOCC 2013)
  14. Towards a Fault-Resilient Cloud Management Stack
    Xiaoen Ju, Livio Soares, Kang G. Shin, Kyung Dong Ryu
    5th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 2013)
  13. Pointy: a hybrid pointer prefetcher for managed runtime systems
    Ioana Burcea, Livio Soares, Andreas Moshovos
    Proceedings of the 21st international conference on Parallel architectures and compilation techniques, (PACT 2012)
  12. Mind the gap: reconnecting architecture and OS research
    Jeffrey C Mogul, Andrew Baumann, Timothy Roscoe, Livio Soares
    Proceedings of the 13th USENIX conference on Hot topics in Operating Systems (HotOS 2011)
  11. Exception-less System Calls for Event-driven Servers
    Livio Soares, Michael Stumm
    Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC 2011)
  10. FlexSC: Flexible System Call Scheduling with Exception-less System Calls
    Livio Soares, Michael Stumm
    Proceedings of the 9th USENIX conference on Operating Systems Design and Implementation (OSDI 2010)
  9. Enhancing operating system support for multicore processors by using hardware performance monitoring
    Reza Azimi, David K Tam, Livio Soares, Michael Stumm
    SIGOPS Operating Systems Review, 2009
  8. RapidMRC: Approximating L2 Miss Rate Curves on Commodity Systems for Online Optimizations
    David K Tam, Reza Azimi, Livio Soares, Michael Stumm
    Proceedings of the 14th international conference on Architectural support for programming languages and operating systems (ASPLOS 2009)
  7. Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer
    Livio Soares, David Tam, Michael Stumm
    Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2008)
  6. Experiences understanding performance in a commercial scale-out environment
    Robert W Wisniewski, Reza Azimi, Mathieu Desnoyers, Maged M Michael, Jose Moreira, Doron Shiloach, Livio Soares
    Euro-Par 2007 Parallel Processing (EuroPar 2007)
  5. Managing shared L2 caches on multicore systems in software
    David Tam, Reza Azimi, Livio Soares, Michael Stumm
    Workshop on the Interaction between Operating Systems and Computer Architecture (WIOSCA 2007)
  4. Experience distributing objects in an SMMP OS
    Jonathan Appavoo, Dilma Da Silva, Orran Krieger, Marc Auslander, Michal Ostrowski, Bryan Rosenburg, Amos Waterland, Robert W Wisniewski, Jimi Xenidis, Michael Stumm, Livio Soares
    ACM Transactions on Computer Systems (TOCS) 25(3), 6, ACM, 2007
  3. PATH: Page Access Tracking to Improve Memory Management
    Reza Azimi, Livio Soares, Michael Stumm, Thomas Walsh, Angela Demke Brown
    Proceedings of the 6th International Symposium on Memory Management (ISMM), pp. 31--42, ACM, 2007
  2. KFS: Exploring Flexibility in File System Design
    Dilma M da Silva, Livio B Soares, Orran Krieger
    Brazilian Workshop on Operating Systems (WSO'2004)
  1. Meta-data Snapshotting: A Simple Mechanism for File System Consistency
    Livio Soares, Orran Y Krieger, Dilma Da Silva
    Proceedings of the International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI 2003)