Build Infrastructure

From MeeGo wiki
Revision as of 11:39, 18 February 2010 by Tjyrinki (Talk | contribs)

MeeGo Build Infrastructure

Overview

MeeGo uses the openSUSE Build Service (OBS), a complete distribution development platform that provides the infrastructure required to develop MeeGo releases. Moblin started using OBS in June 2008 and released Moblin 2.0, Moblin 2.1, and many derivative releases from it. OBS has proven to be a very reliable, fast-evolving infrastructure with features covering every aspect of distribution building and maintenance.

MeeGo is using OBS 1.7, the latest version, released in early February 2010.

The roadmap for OBS can be found here.

Some of the interesting things on the roadmap:

  • Project source linking
  • User group handling as stable feature
  • Complete patchinfo support may be a candidate
  • Build dependency cycles are visible via the API
  • Integrated QA support to run all kinds of additional test suites

More details soon....

Missing Features

For all the great features OBS provides, many features we would like to have now are on the long-term roadmap, are specific to our needs, or are not seen as high priority. Some of the things we wish to see implemented in OBS are:

  • Proxy support for inter-OBS communication (HTTP, HTTPS)
  • ACL support and support for hidden projects/packages and their resulting binaries and build information
  • Better user/group/role management interface, probably an overhaul of the current user management system (active rbac)
  • LDAP backend support for user/group management
  • Linking of projects with smaller set of packages (for updates)
  • Better build job allocation and improvement of the dispatcher

Some details about the wish list above:

Proxy Support

OBS offers a very nice feature that allows different instances to be linked. This works well if you are not behind a firewall. When a proxy is needed, various workarounds are necessary to create a usable link. While experimenting with adding proxy support, we found that it is more complicated than just changing the code to point to a proxy: OBS appears to require certain headers and features that proxies such as Squid do not support.

ACL

A proposal exists to add ACL support at http://en.opensuse.org/Build_Service/Concepts/AccessControl.

Better User Management and LDAP support

Since Novell started using iChain, the built-in user management back-end has not been maintained or improved. At the moment it is not even possible to change passwords, and the interface for managing users is lacking. OBS has two authentication methods:

  • iChain
  • Built in user management (active rbac)

Session management and authentication/authorization code is spread throughout the API and needs to be cleaned up before it can be replaced with something that also supports LDAP and other authentication back-ends. We also need to work out how to handle the webui <-> api interface, since all authentication goes through the API and is implemented there.

As a replacement for the current user management system, AuthLogic seems to have all the ingredients we need, and it would let us add LDAP support at the same time. iChain could probably be implemented as an AuthLogic add-on to make integration easier and to help get this accepted upstream.

To summarize, we need an authentication system that supports at least the following three scenarios:

  • LDAP - Implemented on 17 Feb 2010 and being backported to 1.7
  • Database (needed by all, but also with the passwords)
  • iChain (this needs to be coordinated with the OBS team, since the internals are probably not publicly available)

In addition, the above needs to support role-based access control (RBAC), which is currently available in OBS through active_rbac, and it should also integrate with the ACL proposal above.
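The requirement of several interchangeable authentication back-ends plus a role check can be sketched roughly as follows. This is only an illustration in Python, not OBS or AuthLogic code: the class names (`DatabaseBackend`, `StubDirectoryBackend`, `Authenticator`) are hypothetical, and the directory back-end is a plain in-memory stand-in for a real LDAP bind.

```python
import hashlib
import hmac
import os
from typing import Protocol


class AuthBackend(Protocol):
    """Anything that can verify a login/password pair (hypothetical interface)."""

    def authenticate(self, login: str, password: str) -> bool: ...


class DatabaseBackend:
    """Database-style back-end that stores salted password hashes itself."""

    def __init__(self):
        self._users = {}  # login -> (salt, derived key)

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def add_user(self, login, password):
        salt = os.urandom(16)
        self._users[login] = (salt, self._hash(password, salt))

    def authenticate(self, login, password):
        entry = self._users.get(login)
        if entry is None:
            return False
        salt, expected = entry
        # Constant-time comparison of the stored and the recomputed hash.
        return hmac.compare_digest(expected, self._hash(password, salt))


class StubDirectoryBackend:
    """Stand-in for LDAP; a real back-end would bind against a directory server."""

    def __init__(self, directory):
        self._directory = dict(directory)  # login -> password (demo only)

    def authenticate(self, login, password):
        return self._directory.get(login) == password


class Authenticator:
    """Tries each back-end in order and answers role questions separately."""

    def __init__(self, backends, roles):
        self.backends = list(backends)
        self.roles = roles  # login -> set of role names

    def login(self, login, password):
        return any(b.authenticate(login, password) for b in self.backends)

    def allowed(self, login, role):
        return role in self.roles.get(login, set())
```

Keeping the role lookup separate from the back-ends means an additional back-end, such as one for iChain, would only need to implement `authenticate`.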

Improving Job Allocation

Looking at the dispatcher code in OBS, the dispatcher uses a very basic shuffling algorithm to distribute jobs among the workers in the build system. This might be a good approach for specific OBS configurations (perhaps the opensuse.org setup), but observing dispatcher behavior over time shows that it can be improved, especially for small instances. It is often annoying to see the kernel from different projects being dispatched to the same host while all other workers are idle. A few packages in MeeGo need a lot of time to build (kernel, Qt, WebKit, browsers, ...), and stacking them on one build host often takes a very long time. The dispatcher does not take into account the current load of the hosts or what is being built on them. Typically every build host has 8 workers, depending on the memory and cores available.

A few ideas on how this can be improved:

  • Use a different job distribution algorithm based on load status and other information collected from previous builds
  • Maintain an up-to-date repository of host status, using agents that report current load and other metrics, and make this data available to the dispatcher. For example, it would help to know the current utilization of the hosts and the busy status of the workers before dispatching a new job to the same host. We need to check whether there are other hosts with less utilization.
  • If a host has 8 worker instances and 4 are busy and there is another host with only 2/8 busy, then the dispatcher should prefer the latter.
  • Avoid building the same package (from different projects) on the same host. Always try first to find another, less utilized host that is not building the same package.
  • Since most of the build time for the majority of packages is spent on creating the build environment, it would help to collect data about where the actual time is being spent:
    • Creation of Build Environment
    • Compilation
    • Testing
    • Creating the package binaries
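The host selection ideas above (prefer the less utilized host, avoid stacking the same package on one host) can be sketched as follows. This is a minimal Python illustration, not the OBS dispatcher: the `Host` record, its fields, and the `pick_host` and `dispatch` functions are hypothetical, and the load figures are assumed to come from some metrics agent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """Hypothetical view of a build host as reported by a metrics agent."""
    name: str
    total_workers: int
    busy_workers: int = 0
    building: set = field(default_factory=set)  # package names in progress

    @property
    def utilization(self) -> float:
        return self.busy_workers / self.total_workers

    @property
    def has_free_worker(self) -> bool:
        return self.busy_workers < self.total_workers


def pick_host(hosts, package) -> Optional[Host]:
    """Pick a host with a free worker, least utilized first.

    Hosts that already build `package` are only used when no other host
    has a free worker.
    """
    candidates = [h for h in hosts if h.has_free_worker]
    if not candidates:
        return None
    # Sort key: same-package hosts last, then utilization, then name
    # so that ties are resolved deterministically.
    return min(candidates, key=lambda h: (package in h.building, h.utilization, h.name))


def dispatch(hosts, package) -> Optional[Host]:
    """Assign `package` to the chosen host and update its bookkeeping."""
    host = pick_host(hosts, package)
    if host is not None:
        host.busy_workers += 1
        host.building.add(package)
    return host
```

With one host at 4/8 busy and another at 2/8 busy, `pick_host` returns the second. If that second host is already building the kernel, a new kernel job goes to the first host instead, even though it is more heavily loaded.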

An easier approach would be to use a round-robin mechanism, assisted by information about host load, that skips busy hosts and tries to find a less loaded one. All of the above requires a very robust metrics collection system, with agents running on all hosts that deliver timely information about load, memory, and other system data useful for deciding where to dispatch the next build job.
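A rough sketch of the simpler round-robin variant, again in Python and under stated assumptions: the `RoundRobinDispatcher` class, the `max_load` threshold, and the `load` dictionary (host name to utilization between 0 and 1, as an agent might report it) are all hypothetical.

```python
class RoundRobinDispatcher:
    """Round-robin over hosts, skipping any host at or above a load threshold."""

    def __init__(self, hosts, max_load=0.75):
        self.hosts = list(hosts)
        self.max_load = max_load
        self._next = 0  # index of the host to try first on the next pick

    def pick(self, load):
        """Return the next host below the threshold, or None if all are busy.

        `load` maps host name to utilization in the range 0..1. Hosts with
        no reported load are treated as fully busy.
        """
        n = len(self.hosts)
        for offset in range(n):
            idx = (self._next + offset) % n
            host = self.hosts[idx]
            if load.get(host, 1.0) < self.max_load:
                # Resume after the chosen host so jobs keep rotating.
                self._next = (idx + 1) % n
                return host
        return None
```

Treating hosts without fresh load data as busy is a conservative choice for the sketch; a real system would need to decide how stale a report may be before it is ignored.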
