[libcamera-devel] swisp performance notes was Re: [RFC] Add 8-bit bayer support.
diff mbox series

Message ID ZXt3I+osSMeuhXMr@duo.ucw.cz
State New
Headers show
Series
  • [libcamera-devel] swisp performance notes was Re: [RFC] Add 8-bit bayer support.
Related show

Commit Message

Pavel Machek Dec. 14, 2023, 9:44 p.m. UTC
Hi!

So... For some reason latest version uses full sensor resolution. That
gave me 0.1 fps. If I copy image to cached memory (like below), I get
~1 fps with -O0, and ~4 fps with optimalizations. (On pinephone).

[9:26:04.598819548] [22455]  INFO Camera camera.cpp:1183 configuring streams: (0) 2584x1940-RGB888
cam0: Capture until user interrupts by SIGINT
copy...5038848
process...
copy...5038848
33967.480079 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 15038880
process...
copy...5038848
33967.652354 (5.80 fps) cam0-stream0 seq: 000001 bytesused: 15038880
process...
copy...5038848
33968.685997 (0.97 fps) cam0-stream0 seq: 000007 bytesused: 15038880

You may want to check if caching works for you. (Adjust that 7000.. number.)

Best regards,
								Pavel

commit 190d351eb3158c7d5f1eddd81b0a2aeedffaef9c
Author: Pavel Machek <pavel@ucw.cz>
Date:   Thu Dec 14 21:58:26 2023 +0100

    Copy image to cached memory. This gives me more than 10x performance
    improvement.

Comments

Milan Zamazal Dec. 15, 2023, 9:47 a.m. UTC | #1
Pavel Machek via libcamera-devel <libcamera-devel@lists.libcamera.org> writes:

> For some reason latest version uses full sensor resolution.

The same here on Debix Model A (with SoftwareISP-v04-hans1):

  DEBUG SimplePipeline simple.cpp:1005 Picked 3280x2464-SRGGB10_1X10 -> 3280x2464-SRGGB10 for max stream size 3276x2460

Resulting in:

  # ./cam -c 1 --capture=1 -s role=still
  ...
  ERROR V4L2 v4l2_videodevice.cpp:1906 /dev/video1[15:cap]: Failed to start streaming: Cannot allocate memory

The failing line is:

  ret = ioctl(VIDIOC_STREAMON, &bufferType_);

This is with available memory ~1.5 GB.

And:

  # LIBCAMERA_LOG_LEVELS='*:DEBUG' ./cam -c 1 --capture=1 -s role=still,width=640,height=480
  DEBUG SimplePipeline simple.cpp:751 Link 'imx219 1-0010':0 -> 'csis-32e40000.csi':0 configured with format 640x480-SRGGB10_1X10
  ...
  DEBUG SimplePipeline simple.cpp:1005 Picked 640x480-SRGGB10_1X10 -> 640x480-SRGGB10 for max stream size 640x480
  DEBUG SimplePipeline simple.cpp:1045 Adjusting size from 640x480 to 640x480
  ERROR Camera camera.cpp:1171 Can't configure camera with invalid configuration

It's hard for me to distinguish without deeper inspection whether dealing with
features or non-optimal defaults or user errors.

Regards,
Milan
Kieran Bingham Dec. 15, 2023, 10:14 a.m. UTC | #2
Quoting Milan Zamazal via libcamera-devel (2023-12-15 09:47:39)
> Pavel Machek via libcamera-devel <libcamera-devel@lists.libcamera.org> writes:
> 
> > For some reason latest version uses full sensor resolution.
> 
> The same here on Debix Model A (with SoftwareISP-v04-hans1):
> 
>   DEBUG SimplePipeline simple.cpp:1005 Picked 3280x2464-SRGGB10_1X10 -> 3280x2464-SRGGB10 for max stream size 3276x2460
> 
> Resulting in:
> 
>   # ./cam -c 1 --capture=1 -s role=still
>   ...
>   ERROR V4L2 v4l2_videodevice.cpp:1906 /dev/video1[15:cap]: Failed to start streaming: Cannot allocate memory
> 
> The failing line is:
> 
>   ret = ioctl(VIDIOC_STREAMON, &bufferType_);
> 
> This is with available memory ~1.5 GB.

Have you configured any CMA? I think CMA defaults to 32MB which won't be
sufficient for processing large images.

> 
> And:
> 
>   # LIBCAMERA_LOG_LEVELS='*:DEBUG' ./cam -c 1 --capture=1 -s role=still,width=640,height=480
>   DEBUG SimplePipeline simple.cpp:751 Link 'imx219 1-0010':0 -> 'csis-32e40000.csi':0 configured with format 640x480-SRGGB10_1X10
>   ...
>   DEBUG SimplePipeline simple.cpp:1005 Picked 640x480-SRGGB10_1X10 -> 640x480-SRGGB10 for max stream size 640x480
>   DEBUG SimplePipeline simple.cpp:1045 Adjusting size from 640x480 to 640x480
>   ERROR Camera camera.cpp:1171 Can't configure camera with invalid configuration
> 
> It's hard for me to distinguish without deeper inspection whether dealing with
> features or non-optimal defaults or user errors.
> 
> Regards,
> Milan
>
Hans de Goede Dec. 15, 2023, 2:15 p.m. UTC | #3
Hi,

On 12/14/23 22:44, Pavel Machek wrote:
> Hi!
> 
> So... For some reason latest version uses full sensor resolution. That
> gave me 0.1 fps.

That is weird, in recent versions I changed the sizes()
method defined in:

include/libcamera/internal/software_isp/debayer.h

to return an actual SizeRange from for normal debayering
2x2 - (width-4)x(height-4).

Since we can fill a smaller output buffer using cropping,
so this replaces the old sizes function which returned
a SizeRange of:

(width-4)x(height-4) - (width-4)x(height-4)

Does your sensor driver support multiple resolutions /
cropping at the sensor level ?

I wonder if advertising cropping in the software-isp
is causing the simple-pipeline to pick a different
sensor resolution.

You could try undoing the sizes change:

diff --git a/include/libcamera/internal/software_isp/debayer.h b/include/libcamera/internal/software_isp/debayer.h
index 206bc2ac..e2a63f24 100644
--- a/include/libcamera/internal/software_isp/debayer.h
+++ b/include/libcamera/internal/software_isp/debayer.h
@@ -77,10 +77,8 @@ public:
 			return {};
 		}
 
-		return SizeRange(Size(pattern_size.width, pattern_size.height),
-				 Size((inputSize.width - 2 * pattern_size.width) & ~(pattern_size.width - 1),
-				      (inputSize.height - 2 * pattern_size.height) & ~(pattern_size.height - 1)),
-				 pattern_size.width, pattern_size.height);
+		return SizeRange(Size((inputSize.width - 2 * pattern_size.width) & ~(pattern_size.width - 1),
+				      (inputSize.height - 2 * pattern_size.height) & ~(pattern_size.height - 1)));
 	}
 
 	Signal<FrameBuffer *> inputBufferReady;

Regards,

Hans
Milan Zamazal Dec. 15, 2023, 7:38 p.m. UTC | #4
Kieran Bingham <kieran.bingham@ideasonboard.com> writes:

> Quoting Milan Zamazal via libcamera-devel (2023-12-15 09:47:39)
>> Pavel Machek via libcamera-devel <libcamera-devel@lists.libcamera.org> writes:
>> 
>> > For some reason latest version uses full sensor resolution.
>> 
>> The same here on Debix Model A (with SoftwareISP-v04-hans1):
>> 
>>   DEBUG SimplePipeline simple.cpp:1005 Picked 3280x2464-SRGGB10_1X10 -> 3280x2464-SRGGB10 for max
>> stream size 3276x2460
>> 
>> Resulting in:
>> 
>>   # ./cam -c 1 --capture=1 -s role=still
>>   ...
>>   ERROR V4L2 v4l2_videodevice.cpp:1906 /dev/video1[15:cap]: Failed to start streaming: Cannot allocate
>> memory
>> 
>> The failing line is:
>> 
>>   ret = ioctl(VIDIOC_STREAMON, &bufferType_);
>> 
>> This is with available memory ~1.5 GB.
>
> Have you configured any CMA? I think CMA defaults to 32MB which won't be
> sufficient for processing large images.

Right.  CMA size was 64 MB (and part of it was used), after enlarging it to
128 MB, the error is gone.  Thanks for pointing this out.

>> And:
>> 
>>   # LIBCAMERA_LOG_LEVELS='*:DEBUG' ./cam -c 1 --capture=1 -s role=still,width=640,height=480
>>   DEBUG SimplePipeline simple.cpp:751 Link 'imx219 1-0010':0 -> 'csis-32e40000.csi':0 configured with
>> format 640x480-SRGGB10_1X10
>>   ...
>>   DEBUG SimplePipeline simple.cpp:1005 Picked 640x480-SRGGB10_1X10 -> 640x480-SRGGB10 for max stream
>> size 640x480
>>   DEBUG SimplePipeline simple.cpp:1045 Adjusting size from 640x480 to 640x480
>>   ERROR Camera camera.cpp:1171 Can't configure camera with invalid configuration
>> 
>> It's hard for me to distinguish without deeper inspection whether dealing with
>> features or non-optimal defaults or user errors.
>> 
>> Regards,
>> Milan
>>

Patch
diff mbox series

diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
index 52910a03..0b4bf7ff 100644
--- a/src/libcamera/software_isp/debayer_cpu.cpp
+++ b/src/libcamera/software_isp/debayer_cpu.cpp
@@ -483,10 +483,27 @@  void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams
 		return;
 	}
 
+       const uint8_t *uncached_src = in.planes()[0].data();
+       static uint8_t src[7000000];
+       unsigned long size = in.planes()[0].size();
+ 	if (size%8)
+        	 printf("very bad size!\n");
+
+       printf("copy...%ld\n", size);
+       (void) uncached_src;
+	if (1) {      
+	       unsigned int i;
+	       for (i=0; i<size/8; i++) {
+	               ((uint64_t *) src)[i] = ((uint64_t *) uncached_src)[i];
+	       }
+	}
+       printf("process...\n");
+       // debayer is "free" compared to the memcpy.
+
 	if (inputConfig_.patternSize.height == 2)
-		process2(in.planes()[0].data(), out.planes()[0].data());
+		process2(src, out.planes()[0].data());
 	else
-		process4(in.planes()[0].data(), out.planes()[0].data());
+		process4(src, out.planes()[0].data());
 
 	metadata.planes()[0].bytesused = out.planes()[0].size();