Camera subsystem overview for i.MX Gingerbread

Published on April 15, 2011

That's kind of how I felt as a naive software intern thumbing through the vast amount of information in the Android source code. There's a lot to take in: Android covers an enormous set of use cases, and it follows a secure and extensible model that requires function calls to dive through layer after layer of abstraction to reach the bottom. Perhaps not so complicated as some heavyweight operating systems that we know of, but the user-focused nature of Android demands that the experience be seamless, so it's important for engineers involved in porting to a new device to understand how it all fits together.

There's documentation, of course, but most of Google's material is focused on the applications-level developer. Whether or not you're interested in messing with the lower layers, it helps to have an understanding of what happens beneath the surface of your application.

In this post, we're going to try to provide a brief overview of the Android 2.3 (aka “Gingerbread”) camera subsystem on the i.MX51, which we're in the process of configuring for our Nitrogen boards. If you don't already have it, get it here.

First, an architectural diagram:

It's color-coded by origin, by the way. Blue is Boundary Devices (either software we've written or pieces of the Nitrogen board itself), orange is Freescale, yellow is Linux, green is Google, and purple is ALSA.

Anyway, now you have some idea of how it's all laid out from a bird's eye view. Next we'll go over the pieces.

Application layer

The heart of any camera app is the Camera class. This is an abstraction for the camera device itself and everything that sits between it and the Android API. It provides methods to set parameters, auto-focus, and, of course, take a picture. Once you have a Camera object, you call setPreviewDisplay() to hand it the SurfaceHolder of a SurfaceView, which represents the area of the UI that will contain the camera preview image. Start the preview, and you're ready to take pictures with the takePicture() method. Pretty self-explanatory!

To make your app record video, you'll need a MediaRecorder object as well. This hooks into the media framework (which we'll discuss below) and abstracts the task of recording audio and video. It expects a Camera object and a file descriptor for the output file, as well as a number of parameters, but again, using it is very simple. When an application developer needs to use the Camera, he grabs a Camera object; when he needs to record something, he grabs a MediaRecorder object, and that's that. Works out well for an independent app developer working on his own time.

We'll show some examples of how to use these classes in the next post. What we're really interested in right now is what's underneath. For brevity's sake, we should mention that source files in the following sections (until we get to Freescale proprietary components) are listed under frameworks/base/, unless otherwise stated. Also, if we link to the implementation but not the header, you can assume that the header is somewhere under include/.

JNI layer

The Java Native Interface (JNI) is what allows Android classes to use the native C++ libraries from within the Dalvik virtual machine. Look in frameworks/base/core/jni/ or frameworks/base/media/jni/ and you'll see C++ implementation files corresponding to many of the Android-specific Java classes, such as android_hardware_Camera.cpp. These contain methods for passing messages between their Java counterparts running inside a Dalvik virtual machine and a native C++ implementation of that class.

Native layer glue classes

Okay, here's where things get really interesting. Below the JNI layer, there is a complex set of processes that does the dirty work of the Android operating system as a whole. You can actually see these processes if you open up an adb shell and run ps to find the list of running processes.

Binder interface

The way these processes communicate is through the Binder system. Binder is the custom inter-process communication system that Android uses. With adb shell, you can see that there is a binder driver under /dev/ (it shows up as /dev/binder). This driver is sort of like a mailbox that every process hooks into, and is responsible for passing messages between service providers and service clients.

You will frequently see objects in the native libraries with names like ICamera, ICameraService, IMediaRecorder, and so on (look here for the ICamera example). These are objects that implement the Binder interfaces and thus represent proxy objects that marshal data across process boundaries. Each header/implementation pair usually contains an interface class, IObject, as well as a BpObject class and a BnObject class. Bp stands for binder proxy, the class that sits in the application process, and Bn stands for binder native, the class that sits in the remote process (such as the MediaServer or SurfaceFlinger) that is basically invisible to the end user. Binder proxy objects call a transact() method which passes messages to the binder native object, which handles them with an onTransact() callback. By default, transact() does not return until onTransact() returns, so these calls are synchronous.
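To make the Bp/Bn split a little more concrete, here's a minimal sketch in plain Java. It is not the real Binder code: the names ICameraSketch, BpCameraSketch and BnCameraSketch are made up for illustration, a List of strings stands in for the parcel that carries the data, and the "process boundary" is just a direct method call. What it does show is the shape of the pattern: the proxy packs arguments and calls transact(), the native side unpacks them in onTransact(), and the caller blocks until the reply comes back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface in the style of ICamera (not the real one).
interface ICameraSketch {
    int TAKE_PICTURE = 1; // transaction code, value chosen for this sketch
    String takePicture(String name);
}

// "Binder native" side: sits in the service process and does the real work.
abstract class BnCameraSketch implements ICameraSketch {
    // Unpacks the request, calls the implementation, packs the reply.
    boolean onTransact(int code, List<String> data, List<String> reply) {
        if (code == TAKE_PICTURE) {
            reply.add(takePicture(data.get(0)));
            return true;
        }
        return false; // unknown transaction code
    }
}

// "Binder proxy" side: sits in the application process and only marshals data.
class BpCameraSketch implements ICameraSketch {
    private final BnCameraSketch remote; // stands in for the remote handle

    BpCameraSketch(BnCameraSketch remote) {
        this.remote = remote;
    }

    // The real thing goes through the binder driver; here it is a direct,
    // blocking call, which mimics the synchronous behaviour.
    private boolean transact(int code, List<String> data, List<String> reply) {
        return remote.onTransact(code, data, reply);
    }

    @Override
    public String takePicture(String name) {
        List<String> data = new ArrayList<>();
        List<String> reply = new ArrayList<>();
        data.add(name);
        if (!transact(TAKE_PICTURE, data, reply)) {
            throw new IllegalStateException("transaction failed");
        }
        return reply.get(0);
    }
}

class BinderSketch {
    static String demo() {
        BnCameraSketch service = new BnCameraSketch() {
            @Override
            public String takePicture(String name) {
                return "saved:" + name;
            }
        };
        ICameraSketch proxy = new BpCameraSketch(service);
        return proxy.takePicture("IMG_0001.jpg");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Notice that the caller only ever sees the interface; whether the object behind it is local or a proxy for something in another process is hidden, which is the whole point.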

Native proxies

The Camera and MediaRecorder objects, as well as the Surface object that is held by the SurfaceHolder you may have passed to the Camera object, have native proxy objects with the same name sitting below the JNI layer; for example, the Camera object. One of these will be created for each Java object you instantiate on the top layer. They implement the native methods in the Java object, but in this case it really means that they wrap a binder proxy object and call methods on it when told to by the JNI layer, which in turn become transactions across the process boundary to the binder native object. Once the binder proxy receives a response from the other side, it passes it back up to the native proxy, which passes it back up through the JNI layer to you.

MediaServer

The MediaServer process is the heart of Android's media framework. It wraps up everything that is necessary to make media playback and recording possible, such as codecs, file authoring, and connections to hardware abstraction layers (HALs).

Upon startup, MediaServer launches server threads for each of its major functions, including the CameraService and the MediaPlayerService. Within the media server, our binder native objects correspond to client objects, which connect to their corresponding services.

The camera service handles clients which sit on top of the hardware abstraction layer and correspond to Camera objects. Actually, they sit on top of an interface which itself abstracts the HAL (since the HAL will be different depending on the target system's own hardware). Its primary purpose is to manage synchronization among clients – in other words, while several can be connected to the service, only one can actually use the camera at any given time. The classes for both client and server are in the CameraService header and implementation under services/camera/libcameraservice/.
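The "only one at a time" rule is easy to picture with a toy model. The sketch below is plain Java, not the CameraService source: ExclusiveCameraService and its Client class are hypothetical names, and the policy shown (turn the newcomer away while someone else holds the camera) is an assumption made to keep the example short.

```java
// Hypothetical model of a service that many callers can ask for the camera,
// but that hands it to only one client at a time.
class ExclusiveCameraService {
    // Handle given to the caller that currently owns the camera.
    final class Client {
        private final int ownerPid;

        private Client(int ownerPid) {
            this.ownerPid = ownerPid;
        }

        int ownerPid() {
            return ownerPid;
        }

        void disconnect() {
            release(this);
        }
    }

    private Client active; // null while the camera is free

    // Returns a client if the camera is free, or null if it is already held.
    synchronized Client connect(int callingPid) {
        if (active != null) {
            return null; // rejected: there is an existing client
        }
        active = new Client(callingPid);
        return active;
    }

    private synchronized void release(Client client) {
        if (active == client) {
            active = null;
        }
    }
}

class CameraServiceSketch {
    static String demo() {
        ExclusiveCameraService service = new ExclusiveCameraService();
        ExclusiveCameraService.Client first = service.connect(100);
        ExclusiveCameraService.Client second = service.connect(200); // camera busy
        first.disconnect();
        ExclusiveCameraService.Client third = service.connect(200); // free again
        return (first != null) + "," + (second != null) + "," + (third != null);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The synchronized methods are what make this safe when requests arrive from several binder threads at once.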

We'll mention that MediaPlayerService actually creates clients that connect to MediaRecorders, but the MediaPlayerService doesn't actually handle any of the recording duties. That's handled in the next layer within the MediaServer process.

Stagefright, OpenCore, and OpenMAX

Stagefright is a new addition to the Android source – although pieces of it have been in the source since Eclair, it was only fully implemented in Gingerbread. The job of Stagefright is to abstract (again with the abstraction!) the codec library. There are a number of other interesting parts and pieces that are more or less outside of our scope here, but the idea is that Stagefright is where all the encoding and decoding takes place above the platform-specific level. For some good block diagrams detailing the Stagefright architecture, look here and here (the last one is in Chinese, so use Google Translate). Most of the Stagefright files are in media/libstagefright/ and include/media/stagefright/.

Prior to Gingerbread, the primary vehicle for encoding/decoding was PacketVideo's OpenCORE framework. Unlike Stagefright, it was particularly well-documented and if you need to know how it works, the information isn't nearly as difficult to find. Stagefright is basically Google's in-house version of OpenCORE, created with help from PV. It is not well-documented at all, mainly because the average app developer doesn't really need to know how it works. However, my understanding is that it is simpler and more extensible than OpenCORE, and with Google's attention it will evolve quickly and efficiently.

By the way, it is possible to swap Stagefright or OpenCORE out for an alternate media framework, such as GStreamer, which Eric has covered in previous blog posts. That's a serious undertaking, though, and definitely out of the scope of this post!

The central class in the recording subsystem is StagefrightRecorder (header and implementation, confusingly under media/libmediaplayerservice/). A reference to a StagefrightRecorder object is bundled into initialized MediaRecorderClient objects. The job of StagefrightRecorder is to take the short list of calls passed in from upper layers and dissect them. Given an encoding format, output format, and list of parameters, it selects an encoder and a MediaWriter object, as well as MediaSources representing data sources like the camera and microphone, then manages them as the MediaRecorder object tells it to (or complains if a bad combination of codec and file format was supplied at any point in the call stack).

MediaWriter is actually an interface that we use as a simplification for a wide array of file authoring classes which implement it. These classes call on the codec library to encode media coming in from the camera and microphone and then write it to file in a particular format. There is an MPEG4Writer, an AMRWriter, an ARTPWriter, and so on, one for each of the implemented file formats. Notice that this is the endpoint for file authoring: each of these classes has all it needs to write a video file in the correct format and store it to the SD card.
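Here's a rough sketch of that "pick a writer, or complain" step, in plain Java rather than the real C++. ContainerWriter and RecorderSketch are hypothetical names, and the table of accepted format/encoder combinations is illustrative only; it isn't copied from StagefrightRecorder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the MediaWriter interface; one method is enough here.
interface ContainerWriter {
    String describe();
}

class RecorderSketch {
    // Picks a writer for the requested output format, or throws if the encoder
    // cannot go into that container. The accepted combinations are illustrative.
    static ContainerWriter selectWriter(String outputFormat, String encoder) {
        String writerName;
        List<String> accepted;
        switch (outputFormat) {
            case "MPEG_4":
                writerName = "MPEG4Writer";
                accepted = Arrays.asList("H263", "H264", "AAC", "AMR_NB");
                break;
            case "AMR_NB":
                writerName = "AMRWriter";
                accepted = Arrays.asList("AMR_NB");
                break;
            default:
                throw new IllegalArgumentException(
                        "unsupported output format: " + outputFormat);
        }
        if (!accepted.contains(encoder)) {
            throw new IllegalArgumentException(
                    encoder + " cannot be written to " + outputFormat);
        }
        String label = writerName + "(" + encoder + ")";
        return () -> label;
    }

    public static void main(String[] args) {
        System.out.println(selectWriter("MPEG_4", "H264").describe());
        try {
            selectWriter("AMR_NB", "H264");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because every writer sits behind the same interface, the code that drives the recording doesn't need to care which container it ended up with.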

The great thing about Stagefright (and OpenCORE, for that matter) is that it makes it easy for codecs to be implemented. The codecs used by Stagefright are expected to adhere to the OpenMAX (OMX) standard, a system for developing codecs that are interchangeable between media frameworks. So, in theory, if you have an OMX-compliant codec, you should be able to directly plug it into Stagefright and expect it to work. More information about that here.

Freescale proprietary components

Android is portable, so it doesn't provide the actual glue to the hardware. Instead, it has a handful of interfaces, which the hardware designer can write implementations for. Notice on the diagram that none of the actual Google code interfaces directly with the kernel layer; this has to be done by Freescale-provided components. (By the way, the source for proprietary components isn't available via AOSP's gitweb, so follow along with your own local copy of the Freescale release.)

The camera, for example, is hidden under a hardware abstraction layer, the CameraHal class (under hardware/mx5x/libcamera/). The HAL actually contains the name of the camera driver (as well as other important system files it needs to query), and calls IOCTLs on it to set it up and use its functions. It also performs other important functions, such as encoding the picture in JPEG format or converting between color schemes.

Then there are the codec libraries. If you look on your Android device in /system/lib/ you'll see a pile of .so files. These are shared libraries, compiled when you first built Android for your device, and are referenced by running processes such as MediaServer. You can look in /proc/<pid>/maps (where <pid> is the process ID that ps reports) to see which libraries a particular process is using. Among these are a handful of precompiled libraries provided by Freescale – back on your host machine, these are under device/fsl/proprietary/. You won't be able to see the source code for what's in these libraries, because they're closed-source; however, you can get an idea of who is calling on whom with objdump.
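If you'd rather not eyeball the maps file, pulling the library names out of it takes only a few lines. The sketch below is a small host-side helper in plain Java; MapsSketch is a made-up name, and it assumes the usual maps layout of address range, permissions, offset, device, inode and an optional path. You could feed it a copy of the file saved from an adb shell session, or run it as-is on any Linux machine, where it defaults to /proc/self/maps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MapsSketch {
    // Returns the distinct shared-library paths mentioned in a maps listing.
    static Set<String> sharedLibraries(List<String> mapsLines) {
        Set<String> libs = new TreeSet<>();
        for (String line : mapsLines) {
            // Layout: address perms offset dev inode [pathname]
            String[] fields = line.trim().split("\\s+", 6);
            // Anonymous mappings have no sixth field; we keep only paths
            // ending in .so, which is how Android names its libraries.
            if (fields.length == 6 && fields[5].endsWith(".so")) {
                libs.add(fields[5]);
            }
        }
        return libs;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, inspect this process; otherwise read the given file.
        String path = args.length > 0 ? args[0] : "/proc/self/maps";
        for (String lib : sharedLibraries(Files.readAllLines(Paths.get(path)))) {
            System.out.println(lib);
        }
    }
}
```

Each library usually appears several times in the file (once per mapped segment), which is why the helper collects the paths into a set.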

Ultimately, each of the OMX codecs is linked within these libraries to a VPU codec, which itself connects to libvpu.so. The codecs are hardware-accelerated, meaning that since they perform a complex job, it's faster to offload that job to the VPU. The proprietary code in libvpu does exactly that. We'll talk more about the VPU later.

Audio subsystem

Audio input is handled via an AudioRecord object that is abstracted within Stagefright as an AudioSource object. AudioRecord manages the buffers and parameters for recording on a single audio channel.

The system's audio channels themselves are managed by AudioFlinger, which is another service exposed by the MediaServer. AudioFlinger manages and mixes all of the audio channels in the system. This is more important for other use cases, such as being able to listen for calls while streaming music, and games with complex, multi-track audio; the important part is that AudioFlinger acts as the interface between ALSA and the rest of Android.

ALSA, if you aren't aware, is the Advanced Linux Sound Architecture. ALSA provides the drivers and virtual devices necessary to handle quality sound on most Linux systems, and is part of the kernel. The user-space ALSA library is in external/alsa-lib.

Overlays

On the other side of the diagram is the Surface. Android uses Surface objects as a wrapper for graphics buffers, and has a Surface for every UI window you see on your Android device. You can directly manipulate a Surface from a topside application by using a SurfaceHolder, exposed to the UI via a SurfaceView object. More important to this discussion is that you pass a SurfaceHolder object to your Camera to tell it where to post preview frames within your app's UI.

Below the surface, Surfaces are typically managed by SurfaceFlinger. SurfaceFlinger's job is to take all the Surfaces thrown at it by Android and arrange them for display. So, it figures out what parts of which windows are hidden by the one on top. To SurfaceFlinger, our preview frame is just another UI element, which it happily sorts with all the other Surfaces and posts to the display hardware.

Actually, this isn't what happens on the i.MX51. Android also provides the pipework for surfaces to be posted as a video overlay. Overlaying allows the Surfaces to be posted directly to the hardware, without having to go through SurfaceFlinger. Since preview windows are constantly being updated with complex data, using overlays makes sense – less overhead, better performance, happier user. There are three layers here: an Overlay class which abstracts the hardware from the rest of the system, an interface header (hardware/libhardware/include/hardware/overlay.h) which connects to the hardware, and then Freescale's Overlay library in hardware/mx5x/liboverlay/. The final layer connects to the overlay device, /dev/video16, and posts YUV-formatted overlays to it directly.

Kernel components

That about does it for the Android userland. Whew.

Android runs on a Linux kernel, and follows most of the rules that apply to normal Linux systems. So to communicate with the hardware, Android processes talk to device drivers, exposed as usual in /dev/.

First, you should know about V4L2 (Video4Linux2). It's the official Linux API for developing video drivers and applications to work with Linux. If you have any recent experience developing video applications for Linux then you've probably worked with V4L2. It makes sense that Android would utilize it. At any rate, both the camera and overlay drivers are V4L2-compliant. The device drivers are exposed in the filesystem as /dev/video0 (for the camera) and /dev/video16 (for the overlay device). /dev/fb0 is the actual display device (the framebuffer).

The camera driver itself is in android/kernel_imx/drivers/media/video/boundary/. It's a V4L2 module that is designed to work with the OV5642 cameras that we use with our Nitrogen boards, and makes calls on the IPU driver to empty buffers that are filled by the camera. Eric has already said a lot about it in previous posts here and here.

We've already mentioned the binder driver and ALSA. You should also know that there is an mxc_vpu driver which corresponds to the VPU, and is responsible for filling and emptying buffers between the codec subsystem and the VPU itself. The source is in kernel_imx/drivers/mxc/vpu. There is also an IPU driver in kernel_imx/drivers/mxc/ipu, which does essentially the same thing for the IPU.

i.MX51 components

It also helps to know a few things about the relevant i.MX51 subsystems that are involved in this process.

The Image Processing Unit is the component that connects to both the camera and display. The IPU automates the process of managing the display. It actually provides a lot of cool features such as hardware image conversion, but they aren't really used by Freescale's Android build.

More important to understand is the Video Processing Unit, which implements complex encoding and decoding algorithms in hardware, greatly speeding up the transcoding process. So, we say that the codecs are hardware-accelerated. According to Freescale, all of the codecs offered by their Android build (RTP and HTTP streaming excepted) are VPU-accelerated, with H.263 and H.264 being available for recording and a number of others for playback. The VPU itself is capable of working with a lot more formats, but not all of them are supported under Freescale's Android release yet.

You should know that there is documentation included with the Freescale Android release (look in android_source_folder/doc/SDK_UserGuides; unfortunately, they are not available on the web) that describes an API for the i.MX51 VPU and IPU. So, support is there should you ever find yourself needing to write new kernel code (such as a hardware-accelerated codec) to work with the i.MX51 media system.

Camera

Finally, there's the camera itself. We use an OmniVision OV5642 camera with our Nitrogen boards. Again, most of what you need to know about this is covered in either the datasheet or Eric's previous posts.

Overview: Function call lifecycle

So to recap, let's follow a few of the important function calls through the flowchart above.

Camera.takePicture(...)

Callbacks are registered, and the takePicture() native method is called through the JNI interface. The native Camera object then starts a transaction with its corresponding CameraService::Client through the Binder interface, notifying it that a takePicture request has been made. CameraService::Client calls its takePicture() method, which in turn calls the takePicture() method of the CameraHardwareInterface.

At the hardware abstraction layer, the cameraTakePicture() method is called, which sets up a buffer and calls IOCTLs on the driver to fill it from memory held by the IPU's DMA controller. That data is converted to JPEG data and written to memory for later retrieval. A status message is returned to each of the upper layers, up to the native layer, at which point the topmost function returns and you have your picture.
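As a recap of that chain, here's a toy model in plain Java. Everything in it is hypothetical (PictureLayer, ForwardingLayer, HalLayer), the binder hop is collapsed into an ordinary method call, and the "JPEG" is just a two-byte placeholder holding the JPEG start-of-image marker. It only shows the order in which the layers are visited, the callback that carries the picture data, and the status code that travels back up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical common shape for every layer in the chain.
interface PictureLayer {
    int takePicture(Consumer<byte[]> jpegCallback);
}

// A layer that records its name and passes the request down.
class ForwardingLayer implements PictureLayer {
    private final String name;
    private final PictureLayer below;
    private final List<String> trace;

    ForwardingLayer(String name, PictureLayer below, List<String> trace) {
        this.name = name;
        this.below = below;
        this.trace = trace;
    }

    @Override
    public int takePicture(Consumer<byte[]> jpegCallback) {
        trace.add(name);
        return below.takePicture(jpegCallback); // status comes back up unchanged
    }
}

// Bottom of the chain: pretends to capture and encode a frame.
class HalLayer implements PictureLayer {
    private final List<String> trace;

    HalLayer(List<String> trace) {
        this.trace = trace;
    }

    @Override
    public int takePicture(Consumer<byte[]> jpegCallback) {
        trace.add("CameraHal");
        byte[] placeholder = {(byte) 0xFF, (byte) 0xD8}; // not a real picture
        jpegCallback.accept(placeholder);
        return 0; // 0 means success in this sketch
    }
}

class TakePictureSketch {
    static String demo() {
        List<String> trace = new ArrayList<>();
        List<byte[]> pictures = new ArrayList<>();
        PictureLayer chain =
                new ForwardingLayer("Java Camera",
                        new ForwardingLayer("native Camera",
                                new ForwardingLayer("CameraService::Client",
                                        new HalLayer(trace), trace), trace), trace);
        int status = chain.takePicture(pictures::add);
        return String.join(" > ", trace)
                + " | status=" + status + " | pictures=" + pictures.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```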

MediaRecorder.start()

Once again, the start() native method is called through the JNI interface, and the native MediaRecorder object starts a transaction with its corresponding MediaRecorderClient class, notifying it that a start request has been made. The MediaRecorderClient calls its start() method, which in turn calls the start() method of the wrapped StagefrightRecorder class. Supposing that we have chosen to record an MPEG4 video, the startMPEG4Recording() method is called, and a new MPEG4Writer object is created with the file descriptor we previously passed to the top-level MediaRecorder object.

The MPEG4Writer is set up with some initial parameters, including an encoder of some type, and then its own start() method is called. The parameters are copied, some header information is written, and then a writer thread is started, which calls the start() methods of a number of Tracks which wrap the AudioSource and CameraSource that were passed in previously. Each of those sources then has its own start() method called. In the case of the CameraSource, it calls on the startRecording() method of the Camera object which it wraps, and that call proceeds down the chain as described above. At the CameraHal layer, buffers are set aside and are filled with preview frames. The information is made available to the writer thread as a pointer to the memory where these frames can be found.
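The start-up sequence described above can be sketched like this. Again, this is plain Java and not the Stagefright source: SourceSketch, TrackSketch and Mpeg4WriterSketch are invented names, and the sources do nothing except log that they were started. The part worth looking at is the ordering: the header goes out first, then a separate thread walks the tracks, and each track starts the source it wraps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical data source, standing in for CameraSource or AudioSource.
class SourceSketch {
    private final String name;

    SourceSketch(String name) {
        this.name = name;
    }

    void start(List<String> log) {
        log.add(name + ".start");
    }
}

// A track wraps exactly one source and starts it when the track starts.
class TrackSketch {
    private final SourceSketch source;

    TrackSketch(SourceSketch source) {
        this.source = source;
    }

    void start(List<String> log) {
        log.add("Track.start");
        source.start(log);
    }
}

class Mpeg4WriterSketch {
    private final List<TrackSketch> tracks = new ArrayList<>();
    private final List<String> log =
            Collections.synchronizedList(new ArrayList<String>());
    private Thread writerThread;

    void addSource(SourceSketch source) {
        tracks.add(new TrackSketch(source));
    }

    // Writes the header, then hands the tracks to a separate writer thread.
    void start() {
        log.add("header written");
        writerThread = new Thread(() -> {
            for (TrackSketch track : tracks) {
                track.start(log);
            }
        });
        writerThread.start();
    }

    // Waits for the writer thread to finish and returns what happened.
    List<String> stop() {
        try {
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(log);
    }
}

class RecordingSketch {
    static List<String> demo() {
        Mpeg4WriterSketch writer = new Mpeg4WriterSketch();
        writer.addSource(new SourceSketch("CameraSource"));
        writer.addSource(new SourceSketch("AudioSource"));
        writer.start();
        return writer.stop();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the real system the sources keep producing buffers until recording stops; here they return immediately so that the example finishes.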

A discussion on how writing works, in detail, is probably beyond the scope of this article, since it is very complex, but the gist of it is that data comes in through the source pipes, gets directed to the VPU for hardware-accelerated encoding, and is written to file by the MPEG4Writer or whatever other Writer class is being employed.

That about covers it! In the next post we'll go over the AOSP Camera app in detail, to give you an idea of how to write your own.