[{"id":31775,"web_url":"https://patchwork.libcamera.org/comment/31775/","msgid":"<20241016220827.GG30496@pendragon.ideasonboard.com>","date":"2024-10-16T22:08:27","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"Hi Harvey, Han-Lin,\n\nThank you for the patch. This is taking shape.\n\nOn Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> From: Han-Lin Chen <hanlinchen@chromium.org>\n> \n> Add method to set thread affinity to Thread class.\n> \n> Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> ---\n>  include/libcamera/base/thread.h |  5 ++++\n>  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n>  2 files changed, 51 insertions(+)\n> \n> diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> index 4f33de63d..3209d4f7c 100644\n> --- a/include/libcamera/base/thread.h\n> +++ b/include/libcamera/base/thread.h\n> @@ -15,6 +15,7 @@\n>  \n>  #include <libcamera/base/message.h>\n>  #include <libcamera/base/signal.h>\n> +#include <libcamera/base/span.h>\n>  #include <libcamera/base/utils.h>\n>  \n>  namespace libcamera {\n> @@ -35,6 +36,8 @@ public:\n>  \tvoid exit(int code = 0);\n>  \tbool wait(utils::duration duration = utils::duration::max());\n>  \n> +\tint setThreadAffinity(const Span<const unsigned int> &cpus);\n> +\n>  \tbool isRunning();\n>  \n>  \tSignal<> finished;\n> @@ -54,6 +57,8 @@ private:\n>  \tvoid startThread();\n>  \tvoid finishThread();\n>  \n> +\tvoid setThreadAffinityInternal();\n> +\n>  \tvoid postMessage(std::unique_ptr<Message> msg, Object *receiver);\n>  \tvoid removeMessages(Object *receiver);\n>  \n> diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> index 8735670b8..5dc787dba 100644\n> --- a/src/libcamera/base/thread.cpp\n> +++ b/src/libcamera/base/thread.cpp\n> @@ -9,6 +9,7 @@\n>  \n>  #include <atomic>\n>  #include <list>\n> +#include <optional>\n>  #include <sys/syscall.h>\n>  #include <sys/types.h>\n>  #include <unistd.h>\n> @@ -128,6 +129,8 @@ private:\n>  \tint exitCode_;\n>  \n>  \tMessageQueue messages_;\n> +\n> +\tstd::optional<cpu_set_t> cpuset_ = std::nullopt;\n\nIs there a need to initialize this to std::nullopt, as that's what the\ndefault constructor does ?\n\n>  };\n>  \n>  /**\n> @@ -254,6 +257,8 @@ void Thread::start()\n>  \tdata_->exit_.store(false, std::memory_order_relaxed);\n>  \n>  \tthread_ = std::thread(&Thread::startThread, this);\n> +\n> +\tsetThreadAffinityInternal();\n>  }\n>  \n>  void Thread::startThread()\n> @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n>  \treturn hasFinished;\n>  }\n>  \n> +/**\n> + * \\brief Set the CPU affinity mask of the thread\n> + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> + *\n> + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> + * If any index is invalid, no cpu indices will be set.\n\n * If any index is invalid, this function won't modify the thread affinity and\n * will return an error.\n\n> + *\n> + * \\return 0 if all indices are valid, -EINVAL otherwise\n> + */\n> +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> +{\n> +\tconst unsigned int numCpus = std::thread::hardware_concurrency();\n\nhttps://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\nsays:\n\n  Returns the number of concurrent threads supported by the\n  implementation. The value should be considered only a hint.\n\n  [snip]\n\n  Return value\n\n  Number of concurrent threads supported. If the value is not well\n  defined or not computable, returns 0.\n\nlibcxx seems to have the following implementation:\n\nunsigned thread::hardware_concurrency() noexcept {\n#if defined(_SC_NPROCESSORS_ONLN)\n  long result = sysconf(_SC_NPROCESSORS_ONLN);\n  // sysconf returns -1 if the name is invalid, the option does not exist or\n  // does not have a definite limit.\n  // if sysconf returns some other negative number, we have no idea\n  // what is going on. Default to something safe.\n  if (result < 0)\n    return 0;\n  return static_cast<unsigned>(result);\n#elif defined(_LIBCPP_WIN32API)\n  SYSTEM_INFO info;\n  GetSystemInfo(&info);\n  return info.dwNumberOfProcessors;\n#else // defined(CTL_HW) && defined(HW_NCPU)\n  // TODO: grovel through /proc or check cpuid on x86 and similar\n  // instructions on other architectures.\n#  if defined(_LIBCPP_WARNING)\n  _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n#  else\n#    warning hardware_concurrency not yet implemented\n#  endif\n  return 0; // Means not computable [thread.thread.static]\n#endif // defined(CTL_HW) && defined(HW_NCPU)\n}\n\nwhile libstdc++ uses\n\n#if defined(_GLIBCXX_USE_GET_NPROCS)\n# include <sys/sysinfo.h>\n# define _GLIBCXX_NPROCS get_nprocs()\n#elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n# define _GLIBCXX_NPROCS pthread_num_processors_np()\n#elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n# include <stddef.h>\n# include <sys/sysctl.h>\nstatic inline int get_nprocs()\n{\n int count;\n size_t size = sizeof(count);\n int mib[] = { CTL_HW, HW_NCPU };\n if (!sysctl(mib, 2, &count, &size, NULL, 0))\n   return count;\n return 0;\n}\n# define _GLIBCXX_NPROCS get_nprocs()\n#elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n# include <unistd.h>\n# define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n#elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n# include <unistd.h>\n# define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n#else\n# define _GLIBCXX_NPROCS 0\n#endif\n\n  unsigned int\n  thread::hardware_concurrency() noexcept\n  {\n    int __n = _GLIBCXX_NPROCS;\n    if (__n < 0)\n      __n = 0;\n    return __n;\n  }\n\nget_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\nis a wrapper around the former). So it seems that, while the C++ API\ndoesn't guarantee that thread::hardware_concurrency() will provide an\naccurate value, the implementations we have will do the right thing.\nThey're however a bit expensive, as they open a file in /sys, but that's\nprobably the best we can do.\n\n> +\n> +\tMutexLocker locker(data_->mutex_);\n> +\tdata_->cpuset_ = cpu_set_t();\n> +\tCPU_ZERO(&data_->cpuset_.value());\n> +\n> +\tfor (const unsigned int &cpu : cpus) {\n> +\t\tif (cpu >= numCpus) {\n> +\t\t\tLOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n\nIt's not ignored anymore\n\n\t\t\tLOG(Thread, Error)\n\t\t\t\t<< \"Invalid CPU \" << cpu << \" for thread affinity\";\n\n> +\t\t\treturn -EINVAL;\n> +\t\t}\n> +\n> +\t\tCPU_SET(cpu, &data_->cpuset_.value());\n> +\t}\n> +\n> +\tif (data_->running_)\n> +\t\tsetThreadAffinityInternal();\n> +\n> +\treturn 0;\n> +}\n> +\n> +void Thread::setThreadAffinityInternal()\n> +{\n> +\tif (!data_->cpuset_)\n> +\t\treturn;\n> +\n> +\tconst cpu_set_t &cpuset = data_->cpuset_.value();\n> +\tpthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n\nWith this implementation, it's not possible to set the thread's CPU\naffinity before starting the thread. That's a limitation of the C++\nstd::thread API, it could be done if we used the native pthread API\ninstead. Is that a problem ? Could you actually point me to code that\nuses Thread::setThreadAffinity() ?\n\nEven without the ability to set the affinity before starting the thread,\nwe could set it from within the thread before calling the run()\nfunction. This would avoid the main workload being run on incorrect\nCPUs. Is that something that would be useful ?\n\nRegardless of which option we implement, Thread::setThreadAffinity()\nshould document what happens if it's called before calling\nThread::start().\n\n> +}\n> +\n>  /**\n>   * \\brief Check if the thread is running\n>   *","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 79258C32FC\n\tfor <parsemail@patchwork.libcamera.org>;\n\tWed, 16 Oct 2024 22:08:36 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 446F265383;\n\tThu, 17 Oct 2024 00:08:35 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id ADF5F60537\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 17 Oct 2024 00:08:32 +0200 (CEST)","from pendragon.ideasonboard.com (81-175-209-231.bb.dnainternet.fi\n\t[81.175.209.231])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id C7807581;\n\tThu, 17 Oct 2024 00:06:49 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"IKCBfjgw\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1729116410;\n\tbh=oIBdmn1X896226Oc07anWED6doeSno8rIS+uW1CcaE0=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=IKCBfjgwuMmy1Db0OKoIucJUkYjtyLCDlQ3Wx9xDOAAxct7yu43YoYJS3n1097fSH\n\tnwZmD/1m1XvnA1Bd2YiYBfKaNsNqSRUj7sDlkB/CK+zPYHWLrvWcDwjUtlhckjn4IJ\n\t3M+PkTOR2NUgZ2PaKXm3jFCewkyOFHIueYYKRfMo=","Date":"Thu, 17 Oct 2024 01:08:27 +0300","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Harvey Yang <chenghaoyang@chromium.org>","Cc":"libcamera-devel@lists.libcamera.org,\n\tHan-Lin Chen <hanlinchen@chromium.org>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","Message-ID":"<20241016220827.GG30496@pendragon.ideasonboard.com>","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20241016084449.3566589-2-chenghaoyang@chromium.org>","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":31781,"web_url":"https://patchwork.libcamera.org/comment/31781/","msgid":"<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>","date":"2024-10-17T05:59:58","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":117,"url":"https://patchwork.libcamera.org/api/people/117/","name":"Cheng-Hao Yang","email":"chenghaoyang@chromium.org"},"content":"Hi Laurent,\n\nI'll update a new version when all changes are applied.\n\nOn Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart\n<laurent.pinchart@ideasonboard.com> wrote:\n>\n> Hi Harvey, Han-Lin,\n>\n> Thank you for the patch. This is taking shape.\n>\n> On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> > From: Han-Lin Chen <hanlinchen@chromium.org>\n> >\n> > Add method to set thread affinity to Thread class.\n> >\n> > Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> > Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> > Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> > ---\n> >  include/libcamera/base/thread.h |  5 ++++\n> >  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n> >  2 files changed, 51 insertions(+)\n> >\n> > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> > index 4f33de63d..3209d4f7c 100644\n> > --- a/include/libcamera/base/thread.h\n> > +++ b/include/libcamera/base/thread.h\n> > @@ -15,6 +15,7 @@\n> >\n> >  #include <libcamera/base/message.h>\n> >  #include <libcamera/base/signal.h>\n> > +#include <libcamera/base/span.h>\n> >  #include <libcamera/base/utils.h>\n> >\n> >  namespace libcamera {\n> > @@ -35,6 +36,8 @@ public:\n> >       void exit(int code = 0);\n> >       bool wait(utils::duration duration = utils::duration::max());\n> >\n> > +     int setThreadAffinity(const Span<const unsigned int> &cpus);\n> > +\n> >       bool isRunning();\n> >\n> >       Signal<> finished;\n> > @@ -54,6 +57,8 @@ private:\n> >       void startThread();\n> >       void finishThread();\n> >\n> > +     void setThreadAffinityInternal();\n> > +\n> >       void postMessage(std::unique_ptr<Message> msg, Object *receiver);\n> >       void removeMessages(Object *receiver);\n> >\n> > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> > index 8735670b8..5dc787dba 100644\n> > --- a/src/libcamera/base/thread.cpp\n> > +++ b/src/libcamera/base/thread.cpp\n> > @@ -9,6 +9,7 @@\n> >\n> >  #include <atomic>\n> >  #include <list>\n> > +#include <optional>\n> >  #include <sys/syscall.h>\n> >  #include <sys/types.h>\n> >  #include <unistd.h>\n> > @@ -128,6 +129,8 @@ private:\n> >       int exitCode_;\n> >\n> >       MessageQueue messages_;\n> > +\n> > +     std::optional<cpu_set_t> cpuset_ = std::nullopt;\n>\n> Is there a need to initialize this to std::nullopt, as that's what the\n> default constructor does ?\n\nRemoved.\n\n>\n> >  };\n> >\n> >  /**\n> > @@ -254,6 +257,8 @@ void Thread::start()\n> >       data_->exit_.store(false, std::memory_order_relaxed);\n> >\n> >       thread_ = std::thread(&Thread::startThread, this);\n> > +\n> > +     setThreadAffinityInternal();\n> >  }\n> >\n> >  void Thread::startThread()\n> > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n> >       return hasFinished;\n> >  }\n> >\n> > +/**\n> > + * \\brief Set the CPU affinity mask of the thread\n> > + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> > + *\n> > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> > + * If any index is invalid, no cpu indices will be set.\n>\n>  * If any index is invalid, this function won't modify the thread affinity and\n>  * will return an error.\n\nDone.\n\n>\n> > + *\n> > + * \\return 0 if all indices are valid, -EINVAL otherwise\n> > + */\n> > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> > +{\n> > +     const unsigned int numCpus = std::thread::hardware_concurrency();\n>\n> https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\n> says:\n>\n>   Returns the number of concurrent threads supported by the\n>   implementation. The value should be considered only a hint.\n>\n>   [snip]\n>\n>   Return value\n>\n>   Number of concurrent threads supported. If the value is not well\n>   defined or not computable, returns 0.\n>\n> libcxx seems to have the following implementation:\n>\n> unsigned thread::hardware_concurrency() noexcept {\n> #if defined(_SC_NPROCESSORS_ONLN)\n>   long result = sysconf(_SC_NPROCESSORS_ONLN);\n>   // sysconf returns -1 if the name is invalid, the option does not exist or\n>   // does not have a definite limit.\n>   // if sysconf returns some other negative number, we have no idea\n>   // what is going on. Default to something safe.\n>   if (result < 0)\n>     return 0;\n>   return static_cast<unsigned>(result);\n> #elif defined(_LIBCPP_WIN32API)\n>   SYSTEM_INFO info;\n>   GetSystemInfo(&info);\n>   return info.dwNumberOfProcessors;\n> #else // defined(CTL_HW) && defined(HW_NCPU)\n>   // TODO: grovel through /proc or check cpuid on x86 and similar\n>   // instructions on other architectures.\n> #  if defined(_LIBCPP_WARNING)\n>   _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n> #  else\n> #    warning hardware_concurrency not yet implemented\n> #  endif\n>   return 0; // Means not computable [thread.thread.static]\n> #endif // defined(CTL_HW) && defined(HW_NCPU)\n> }\n>\n> while libstdc++ uses\n>\n> #if defined(_GLIBCXX_USE_GET_NPROCS)\n> # include <sys/sysinfo.h>\n> # define _GLIBCXX_NPROCS get_nprocs()\n> #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n> # define _GLIBCXX_NPROCS pthread_num_processors_np()\n> #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n> # include <stddef.h>\n> # include <sys/sysctl.h>\n> static inline int get_nprocs()\n> {\n>  int count;\n>  size_t size = sizeof(count);\n>  int mib[] = { CTL_HW, HW_NCPU };\n>  if (!sysctl(mib, 2, &count, &size, NULL, 0))\n>    return count;\n>  return 0;\n> }\n> # define _GLIBCXX_NPROCS get_nprocs()\n> #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n> # include <unistd.h>\n> # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n> #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n> # include <unistd.h>\n> # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n> #else\n> # define _GLIBCXX_NPROCS 0\n> #endif\n>\n>   unsigned int\n>   thread::hardware_concurrency() noexcept\n>   {\n>     int __n = _GLIBCXX_NPROCS;\n>     if (__n < 0)\n>       __n = 0;\n>     return __n;\n>   }\n>\n> get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\n> is a wrapper around the former). So it seems that, while the C++ API\n> doesn't guarantee that thread::hardware_concurrency() will provide an\n> accurate value, the implementations we have will do the right thing.\n> They're however a bit expensive, as they open a file in /sys, but that's\n> probably the best we can do.\n>\n> > +\n> > +     MutexLocker locker(data_->mutex_);\n> > +     data_->cpuset_ = cpu_set_t();\n> > +     CPU_ZERO(&data_->cpuset_.value());\n> > +\n> > +     for (const unsigned int &cpu : cpus) {\n> > +             if (cpu >= numCpus) {\n> > +                     LOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n>\n> It's not ignored anymore\n>\n>                         LOG(Thread, Error)\n>                                 << \"Invalid CPU \" << cpu << \" for thread affinity\";\n>\n\nRight, updated.\n\n> > +                     return -EINVAL;\n> > +             }\n> > +\n> > +             CPU_SET(cpu, &data_->cpuset_.value());\n> > +     }\n> > +\n> > +     if (data_->running_)\n> > +             setThreadAffinityInternal();\n> > +\n> > +     return 0;\n> > +}\n> > +\n> > +void Thread::setThreadAffinityInternal()\n> > +{\n> > +     if (!data_->cpuset_)\n> > +             return;\n> > +\n> > +     const cpu_set_t &cpuset = data_->cpuset_.value();\n> > +     pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n>\n> With this implementation, it's not possible to set the thread's CPU\n> affinity before starting the thread. That's a limitation of the C++\n> std::thread API, it could be done if we used the native pthread API\n> instead. Is that a problem ? Could you actually point me to code that\n> uses Thread::setThreadAffinity() ?\n\nThere are two main usages in mtkisp7:\n1. Setting pipeline handler's thread [1].\n2. Setting IPA's two new threads [2].\n\nThe first one sets thread affinity to a thread that has already been\nstarted, while the second one set new threads.\n\nBy \"the native pthread API\", you mean `pthread_attr_setaffinity_np`?\nThis means that we need to implement libcamera::Thread with pthread,\ninstead of std::thread?\n\n[1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511\n[2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119\n\n>\n> Even without the ability to set the affinity before starting the thread,\n> we could set it from within the thread before calling the run()\n> function. This would avoid the main workload being run on incorrect\n> CPUs. Is that something that would be useful ?\n\nYou mean calling `setThreadAffinityInterna()l` in `startThread()`\n(before `run()`), instead of at the end of `start()`?\n\n>\n> Regardless of which option we implement, Thread::setThreadAffinity()\n> should document what happens if it's called before calling\n> Thread::start().\n>\n> > +}\n> > +\n> >  /**\n> >   * \\brief Check if the thread is running\n> >   *\n>\n> --\n> Regards,\n>\n> Laurent Pinchart\n\nBR,\nHarvey","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 4E34DC32FC\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 17 Oct 2024 06:00:13 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id D536665384;\n\tThu, 17 Oct 2024 08:00:11 +0200 (CEST)","from mail-lj1-x22e.google.com (mail-lj1-x22e.google.com\n\t[IPv6:2a00:1450:4864:20::22e])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 3FF826537F\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 17 Oct 2024 08:00:10 +0200 (CEST)","by mail-lj1-x22e.google.com with SMTP id\n\t38308e7fff4ca-2fb49510250so4983341fa.0\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 16 Oct 2024 23:00:10 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"ZJ/SeIPr\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=chromium.org; s=google; t=1729144809; x=1729749609;\n\tdarn=lists.libcamera.org; \n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:from:to:cc:subject:date\n\t:message-id:reply-to;\n\tbh=6MbTttLmPTS8LFFTwixHQaCH2RC2bdSKpEr6JL6nKX4=;\n\tb=ZJ/SeIPr3z+Aef1lEIQef4AvbOjolhFf1fJpfQ+0jBfTLkDIGnzGqd3gp7ZvwJvTF5\n\tysgZN0tQvc6JwuCiCRnO8WJJUCpM1JmGBEcjKfWgyVKgcuk1jfcNV6gu0GWHnOpMCrB2\n\tD6Gw4Btmwf0PyCEaow+gAQwG3H1veqbnp3/r4=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20230601; t=1729144809; x=1729749609;\n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:x-gm-message-state:from:to:cc\n\t:subject:date:message-id:reply-to;\n\tbh=6MbTttLmPTS8LFFTwixHQaCH2RC2bdSKpEr6JL6nKX4=;\n\tb=I0GVJVn9aZl0Y45NnI8ssFK9Ekacnms3owuq2tdCxkplhYq1yYQrvx6gPVZQLVMCw4\n\tM7Q+8G0oSYDjHeOFXtbxzzz9FRz4D3Jq/goqIq0KaD7hqBeDGrESf9y+ZCSgBuRa62zO\n\t1/EjXzMrK0XTGInWfd5I+yp/8IGg2EEjUKKEneV/hDx+ChHMSXF0gv/4cZlNSQmwEy+k\n\tld4OSvavPphOB5R2F7TGcBMsYXDW0UOF+1/A5zoWF0bETvIAWrLy/7nhaXqJZwNHnk1N\n\twGOBWUjx7YUbrIoK8nn5M4+iQWD7Bb4dm5FlPwx6Wp9o8ludJKydrfynC621mK20m7hx\n\t6Yqg==","X-Gm-Message-State":"AOJu0YwtGjE6oL6C1Spd+QWeQV9Hbsl3Pgsf864dihJiliwb01yv3Mhm\n\tFi4UaZz5iJVOZum5ErvXsyefBv8H4qtw5Is3gNUhBBT63I+Vy9qZUpMMHLl36IU4T+ibDVEbrfg\n\tnTt95P7QSLGPonnO+jL4ZPi9EwAuILyTkmMrm","X-Google-Smtp-Source":"AGHT+IHsvBUEo8SkneOdEek06swgFeUQO12tRv23/JhBCnbHi9V46X74EoYcv156GwvVMGmkQNZq9q0S0+k7PbzvJ5w=","X-Received":"by 2002:a2e:be27:0:b0:2fb:4b0d:909b with SMTP id\n\t38308e7fff4ca-2fb4b0d9485mr76530791fa.26.1729144809103;\n\tWed, 16 Oct 2024 23:00:09 -0700 (PDT)","MIME-Version":"1.0","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>\n\t<20241016220827.GG30496@pendragon.ideasonboard.com>","In-Reply-To":"<20241016220827.GG30496@pendragon.ideasonboard.com>","From":"Cheng-Hao Yang <chenghaoyang@chromium.org>","Date":"Thu, 17 Oct 2024 13:59:58 +0800","Message-ID":"<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","To":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","Cc":"libcamera-devel@lists.libcamera.org, \n\tHan-Lin Chen <hanlinchen@chromium.org>","Content-Type":"text/plain; charset=\"UTF-8\"","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":31883,"web_url":"https://patchwork.libcamera.org/comment/31883/","msgid":"<20241022224901.GI9497@pendragon.ideasonboard.com>","date":"2024-10-22T22:49:01","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:\n> Hi Laurent,\n> \n> I'll update a new version when all changes are applied.\n\nYes, let's finalize the discussion first :-)\n\n> On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:\n> > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> > > From: Han-Lin Chen <hanlinchen@chromium.org>\n> > >\n> > > Add method to set thread affinity to Thread class.\n> > >\n> > > Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> > > Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > ---\n> > >  include/libcamera/base/thread.h |  5 ++++\n> > >  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n> > >  2 files changed, 51 insertions(+)\n> > >\n> > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> > > index 4f33de63d..3209d4f7c 100644\n> > > --- a/include/libcamera/base/thread.h\n> > > +++ b/include/libcamera/base/thread.h\n> > > @@ -15,6 +15,7 @@\n> > >\n> > >  #include <libcamera/base/message.h>\n> > >  #include <libcamera/base/signal.h>\n> > > +#include <libcamera/base/span.h>\n> > >  #include <libcamera/base/utils.h>\n> > >\n> > >  namespace libcamera {\n> > > @@ -35,6 +36,8 @@ public:\n> > >       void exit(int code = 0);\n> > >       bool wait(utils::duration duration = utils::duration::max());\n> > >\n> > > +     int setThreadAffinity(const Span<const unsigned int> &cpus);\n> > > +\n> > >       bool isRunning();\n> > >\n> > >       Signal<> finished;\n> > > @@ -54,6 +57,8 @@ private:\n> > >       void startThread();\n> > >       void finishThread();\n> > >\n> > > +     void setThreadAffinityInternal();\n> > > +\n> > >       void postMessage(std::unique_ptr<Message> msg, Object *receiver);\n> > >       void removeMessages(Object *receiver);\n> > >\n> > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> > > index 8735670b8..5dc787dba 100644\n> > > --- a/src/libcamera/base/thread.cpp\n> > > +++ b/src/libcamera/base/thread.cpp\n> > > @@ -9,6 +9,7 @@\n> > >\n> > >  #include <atomic>\n> > >  #include <list>\n> > > +#include <optional>\n> > >  #include <sys/syscall.h>\n> > >  #include <sys/types.h>\n> > >  #include <unistd.h>\n> > > @@ -128,6 +129,8 @@ private:\n> > >       int exitCode_;\n> > >\n> > >       MessageQueue messages_;\n> > > +\n> > > +     std::optional<cpu_set_t> cpuset_ = std::nullopt;\n> >\n> > Is there a need to initialize this to std::nullopt, as that's what the\n> > default constructor does ?\n> \n> Removed.\n> \n> > >  };\n> > >\n> > >  /**\n> > > @@ -254,6 +257,8 @@ void Thread::start()\n> > >       data_->exit_.store(false, std::memory_order_relaxed);\n> > >\n> > >       thread_ = std::thread(&Thread::startThread, this);\n> > > +\n> > > +     setThreadAffinityInternal();\n> > >  }\n> > >\n> > >  void Thread::startThread()\n> > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n> > >       return hasFinished;\n> > >  }\n> > >\n> > > +/**\n> > > + * \\brief Set the CPU affinity mask of the thread\n> > > + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> > > + *\n> > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> > > + * If any index is invalid, no cpu indices will be set.\n> >\n> >  * If any index is invalid, this function won't modify the thread affinity and\n> >  * will return an error.\n> \n> Done.\n> \n> > > + *\n> > > + * \\return 0 if all indices are valid, -EINVAL otherwise\n> > > + */\n> > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> > > +{\n> > > +     const unsigned int numCpus = std::thread::hardware_concurrency();\n> >\n> > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\n> > says:\n> >\n> >   Returns the number of concurrent threads supported by the\n> >   implementation. The value should be considered only a hint.\n> >\n> >   [snip]\n> >\n> >   Return value\n> >\n> >   Number of concurrent threads supported. If the value is not well\n> >   defined or not computable, returns 0.\n> >\n> > libcxx seems to have the following implementation:\n> >\n> > unsigned thread::hardware_concurrency() noexcept {\n> > #if defined(_SC_NPROCESSORS_ONLN)\n> >   long result = sysconf(_SC_NPROCESSORS_ONLN);\n> >   // sysconf returns -1 if the name is invalid, the option does not exist or\n> >   // does not have a definite limit.\n> >   // if sysconf returns some other negative number, we have no idea\n> >   // what is going on. Default to something safe.\n> >   if (result < 0)\n> >     return 0;\n> >   return static_cast<unsigned>(result);\n> > #elif defined(_LIBCPP_WIN32API)\n> >   SYSTEM_INFO info;\n> >   GetSystemInfo(&info);\n> >   return info.dwNumberOfProcessors;\n> > #else // defined(CTL_HW) && defined(HW_NCPU)\n> >   // TODO: grovel through /proc or check cpuid on x86 and similar\n> >   // instructions on other architectures.\n> > #  if defined(_LIBCPP_WARNING)\n> >   _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n> > #  else\n> > #    warning hardware_concurrency not yet implemented\n> > #  endif\n> >   return 0; // Means not computable [thread.thread.static]\n> > #endif // defined(CTL_HW) && defined(HW_NCPU)\n> > }\n> >\n> > while libstdc++ uses\n> >\n> > #if defined(_GLIBCXX_USE_GET_NPROCS)\n> > # include <sys/sysinfo.h>\n> > # define _GLIBCXX_NPROCS get_nprocs()\n> > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n> > # define _GLIBCXX_NPROCS pthread_num_processors_np()\n> > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n> > # include <stddef.h>\n> > # include <sys/sysctl.h>\n> > static inline int get_nprocs()\n> > {\n> >  int count;\n> >  size_t size = sizeof(count);\n> >  int mib[] = { CTL_HW, HW_NCPU };\n> >  if (!sysctl(mib, 2, &count, &size, NULL, 0))\n> >    return count;\n> >  return 0;\n> > }\n> > # define _GLIBCXX_NPROCS get_nprocs()\n> > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n> > # include <unistd.h>\n> > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n> > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n> > # include <unistd.h>\n> > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n> > #else\n> > # define _GLIBCXX_NPROCS 0\n> > #endif\n> >\n> >   unsigned int\n> >   thread::hardware_concurrency() noexcept\n> >   {\n> >     int __n = _GLIBCXX_NPROCS;\n> >     if (__n < 0)\n> >       __n = 0;\n> >     return __n;\n> >   }\n> >\n> > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\n> > is a wrapper around the former). So it seems that, while the C++ API\n> > doesn't guarantee that thread::hardware_concurrency() will provide an\n> > accurate value, the implementations we have will do the right thing.\n> > They're however a bit expensive, as they open a file in /sys, but that's\n> > probably the best we can do.\n> >\n> > > +\n> > > +     MutexLocker locker(data_->mutex_);\n> > > +     data_->cpuset_ = cpu_set_t();\n> > > +     CPU_ZERO(&data_->cpuset_.value());\n> > > +\n> > > +     for (const unsigned int &cpu : cpus) {\n> > > +             if (cpu >= numCpus) {\n> > > +                     LOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n> >\n> > It's not ignored anymore\n> >\n> >                         LOG(Thread, Error)\n> >                                 << \"Invalid CPU \" << cpu << \" for thread affinity\";\n> >\n> \n> Right, updated.\n> \n> > > +                     return -EINVAL;\n> > > +             }\n> > > +\n> > > +             CPU_SET(cpu, &data_->cpuset_.value());\n> > > +     }\n> > > +\n> > > +     if (data_->running_)\n> > > +             setThreadAffinityInternal();\n> > > +\n> > > +     return 0;\n> > > +}\n> > > +\n> > > +void Thread::setThreadAffinityInternal()\n> > > +{\n> > > +     if (!data_->cpuset_)\n> > > +             return;\n> > > +\n> > > +     const cpu_set_t &cpuset = data_->cpuset_.value();\n> > > +     pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n> >\n> > With this implementation, it's not possible to set the thread's CPU\n> > affinity before starting the thread. That's a limitation of the C++\n> > std::thread API, it could be done if we used the native pthread API\n> > instead. Is that a problem ? Could you actually point me to code that\n> > uses Thread::setThreadAffinity() ?\n> \n> There are two main usages in mtkisp7:\n> 1. Setting pipeline handler's thread [1].\n\nHeads up on that one, we probably want to handle this in the libcamera\ncore. Multiple pipeline handlers can run on the same platform, currently\nsharing a single camera manager thread. If they all tried to set thread\naffinity, that would be messy. I'd like to explore how we could\ncoordinate that in a cleaner way.\n\n> 2. Setting IPA's two new threads [2].\n> \n> The first one sets thread affinity to a thread that has already been\n> started, while the second one set new threads.\n\nOut of curiosity, have you measured the difference in performance ? Is\nit a matter of pinning the threads to big cores to ensure they have\nenough CPU time ? Or a matter of avoiding latency caused by migration ?\nOr something else ?\n\n> By \"the native pthread API\", you mean `pthread_attr_setaffinity_np`?\n> This means that we need to implement libcamera::Thread with pthread,\n> instead of std::thread?\n\nYes. It doesn't mean we have to do that, I'm only asking if you think\nthat would be a good idea, or overkill.\n\n> [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511\n> [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119\n> \n> > Even without the ability to set the affinity before starting the thread,\n> > we could set it from within the thread before calling the run()\n> > function. This would avoid the main workload being run on incorrect\n> > CPUs. Is that something that would be useful ?\n> \n> You mean calling `setThreadAffinityInterna()l` in `startThread()`\n> (before `run()`), instead of at the end of `start()`?\n\nYes. Again, I'm not saing it has to be done, just exploring options.\n\n> > Regardless of which option we implement, Thread::setThreadAffinity()\n> > should document what happens if it's called before calling\n> > Thread::start().\n> >\n> > > +}\n> > > +\n> > >  /**\n> > >   * \\brief Check if the thread is running\n> > >   *","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 649BFBD16B\n\tfor <parsemail@patchwork.libcamera.org>;\n\tTue, 22 Oct 2024 22:49:10 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 75EA465391;\n\tWed, 23 Oct 2024 00:49:09 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[213.167.242.64])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 9A4B1633C8\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 23 Oct 2024 00:49:08 +0200 (CEST)","from pendragon.ideasonboard.com (81-175-209-231.bb.dnainternet.fi\n\t[81.175.209.231])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 871843EA;\n\tWed, 23 Oct 2024 00:47:21 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"bUxhZoqK\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1729637241;\n\tbh=tuxQEqLhUmlMBjeJQsbQ+Y3EQsbQM2LYlS6jDoxWohM=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=bUxhZoqKEHpTQvQym43JUpM/3TONyRxJRAlZSI8sC9eImallVYxSsAw0YjynyXfT4\n\tWheOzV/kh2dxXMdPJlARx+qLDfIufstbnd/HpFImNRkBH+wzdAJHbb8gycFwsUOK+u\n\t2aSli0uywGTjZbz9Ezn19nNoerTAPbpdhCH+wsGA=","Date":"Wed, 23 Oct 2024 01:49:01 +0300","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Cheng-Hao Yang <chenghaoyang@chromium.org>","Cc":"libcamera-devel@lists.libcamera.org,\n\tHan-Lin Chen <hanlinchen@chromium.org>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","Message-ID":"<20241022224901.GI9497@pendragon.ideasonboard.com>","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>\n\t<20241016220827.GG30496@pendragon.ideasonboard.com>\n\t<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":31884,"web_url":"https://patchwork.libcamera.org/comment/31884/","msgid":"<CAEB1ahuXYEY=39WA30ZLjurF90QhWLxOfhp926_eTXs71zAYiA@mail.gmail.com>","date":"2024-10-23T06:46:38","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":117,"url":"https://patchwork.libcamera.org/api/people/117/","name":"Cheng-Hao Yang","email":"chenghaoyang@chromium.org"},"content":"Hi Laurent,\n\nOn Wed, Oct 23, 2024 at 6:49 AM Laurent Pinchart\n<laurent.pinchart@ideasonboard.com> wrote:\n>\n> On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:\n> > Hi Laurent,\n> >\n> > I'll update a new version when all changes are applied.\n>\n> Yes, let's finalize the discussion first :-)\n>\n> > On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:\n> > > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> > > > From: Han-Lin Chen <hanlinchen@chromium.org>\n> > > >\n> > > > Add method to set thread affinity to Thread class.\n> > > >\n> > > > Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > ---\n> > > >  include/libcamera/base/thread.h |  5 ++++\n> > > >  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n> > > >  2 files changed, 51 insertions(+)\n> > > >\n> > > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> > > > index 4f33de63d..3209d4f7c 100644\n> > > > --- a/include/libcamera/base/thread.h\n> > > > +++ b/include/libcamera/base/thread.h\n> > > > @@ -15,6 +15,7 @@\n> > > >\n> > > >  #include <libcamera/base/message.h>\n> > > >  #include <libcamera/base/signal.h>\n> > > > +#include <libcamera/base/span.h>\n> > > >  #include <libcamera/base/utils.h>\n> > > >\n> > > >  namespace libcamera {\n> > > > @@ -35,6 +36,8 @@ public:\n> > > >       void exit(int code = 0);\n> > > >       bool wait(utils::duration duration = utils::duration::max());\n> > > >\n> > > > +     int setThreadAffinity(const Span<const unsigned int> &cpus);\n> > > > +\n> > > >       bool isRunning();\n> > > >\n> > > >       Signal<> finished;\n> > > > @@ -54,6 +57,8 @@ private:\n> > > >       void startThread();\n> > > >       void finishThread();\n> > > >\n> > > > +     void setThreadAffinityInternal();\n> > > > +\n> > > >       void postMessage(std::unique_ptr<Message> msg, Object *receiver);\n> > > >       void removeMessages(Object *receiver);\n> > > >\n> > > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> > > > index 8735670b8..5dc787dba 100644\n> > > > --- a/src/libcamera/base/thread.cpp\n> > > > +++ b/src/libcamera/base/thread.cpp\n> > > > @@ -9,6 +9,7 @@\n> > > >\n> > > >  #include <atomic>\n> > > >  #include <list>\n> > > > +#include <optional>\n> > > >  #include <sys/syscall.h>\n> > > >  #include <sys/types.h>\n> > > >  #include <unistd.h>\n> > > > @@ -128,6 +129,8 @@ private:\n> > > >       int exitCode_;\n> > > >\n> > > >       MessageQueue messages_;\n> > > > +\n> > > > +     std::optional<cpu_set_t> cpuset_ = std::nullopt;\n> > >\n> > > Is there a need to initialize this to std::nullopt, as that's what the\n> > > default constructor does ?\n> >\n> > Removed.\n> >\n> > > >  };\n> > > >\n> > > >  /**\n> > > > @@ -254,6 +257,8 @@ void Thread::start()\n> > > >       data_->exit_.store(false, std::memory_order_relaxed);\n> > > >\n> > > >       thread_ = std::thread(&Thread::startThread, this);\n> > > > +\n> > > > +     setThreadAffinityInternal();\n> > > >  }\n> > > >\n> > > >  void Thread::startThread()\n> > > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n> > > >       return hasFinished;\n> > > >  }\n> > > >\n> > > > +/**\n> > > > + * \\brief Set the CPU affinity mask of the thread\n> > > > + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> > > > + *\n> > > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> > > > + * If any index is invalid, no cpu indices will be set.\n> > >\n> > >  * If any index is invalid, this function won't modify the thread affinity and\n> > >  * will return an error.\n> >\n> > Done.\n> >\n> > > > + *\n> > > > + * \\return 0 if all indices are valid, -EINVAL otherwise\n> > > > + */\n> > > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> > > > +{\n> > > > +     const unsigned int numCpus = std::thread::hardware_concurrency();\n> > >\n> > > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\n> > > says:\n> > >\n> > >   Returns the number of concurrent threads supported by the\n> > >   implementation. The value should be considered only a hint.\n> > >\n> > >   [snip]\n> > >\n> > >   Return value\n> > >\n> > >   Number of concurrent threads supported. If the value is not well\n> > >   defined or not computable, returns 0.\n> > >\n> > > libcxx seems to have the following implementation:\n> > >\n> > > unsigned thread::hardware_concurrency() noexcept {\n> > > #if defined(_SC_NPROCESSORS_ONLN)\n> > >   long result = sysconf(_SC_NPROCESSORS_ONLN);\n> > >   // sysconf returns -1 if the name is invalid, the option does not exist or\n> > >   // does not have a definite limit.\n> > >   // if sysconf returns some other negative number, we have no idea\n> > >   // what is going on. Default to something safe.\n> > >   if (result < 0)\n> > >     return 0;\n> > >   return static_cast<unsigned>(result);\n> > > #elif defined(_LIBCPP_WIN32API)\n> > >   SYSTEM_INFO info;\n> > >   GetSystemInfo(&info);\n> > >   return info.dwNumberOfProcessors;\n> > > #else // defined(CTL_HW) && defined(HW_NCPU)\n> > >   // TODO: grovel through /proc or check cpuid on x86 and similar\n> > >   // instructions on other architectures.\n> > > #  if defined(_LIBCPP_WARNING)\n> > >   _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n> > > #  else\n> > > #    warning hardware_concurrency not yet implemented\n> > > #  endif\n> > >   return 0; // Means not computable [thread.thread.static]\n> > > #endif // defined(CTL_HW) && defined(HW_NCPU)\n> > > }\n> > >\n> > > while libstdc++ uses\n> > >\n> > > #if defined(_GLIBCXX_USE_GET_NPROCS)\n> > > # include <sys/sysinfo.h>\n> > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n> > > # define _GLIBCXX_NPROCS pthread_num_processors_np()\n> > > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n> > > # include <stddef.h>\n> > > # include <sys/sysctl.h>\n> > > static inline int get_nprocs()\n> > > {\n> > >  int count;\n> > >  size_t size = sizeof(count);\n> > >  int mib[] = { CTL_HW, HW_NCPU };\n> > >  if (!sysctl(mib, 2, &count, &size, NULL, 0))\n> > >    return count;\n> > >  return 0;\n> > > }\n> > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n> > > # include <unistd.h>\n> > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n> > > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n> > > # include <unistd.h>\n> > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n> > > #else\n> > > # define _GLIBCXX_NPROCS 0\n> > > #endif\n> > >\n> > >   unsigned int\n> > >   thread::hardware_concurrency() noexcept\n> > >   {\n> > >     int __n = _GLIBCXX_NPROCS;\n> > >     if (__n < 0)\n> > >       __n = 0;\n> > >     return __n;\n> > >   }\n> > >\n> > > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\n> > > is a wrapper around the former). So it seems that, while the C++ API\n> > > doesn't guarantee that thread::hardware_concurrency() will provide an\n> > > accurate value, the implementations we have will do the right thing.\n> > > They're however a bit expensive, as they open a file in /sys, but that's\n> > > probably the best we can do.\n> > >\n> > > > +\n> > > > +     MutexLocker locker(data_->mutex_);\n> > > > +     data_->cpuset_ = cpu_set_t();\n> > > > +     CPU_ZERO(&data_->cpuset_.value());\n> > > > +\n> > > > +     for (const unsigned int &cpu : cpus) {\n> > > > +             if (cpu >= numCpus) {\n> > > > +                     LOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n> > >\n> > > It's not ignored anymore\n> > >\n> > >                         LOG(Thread, Error)\n> > >                                 << \"Invalid CPU \" << cpu << \" for thread affinity\";\n> > >\n> >\n> > Right, updated.\n> >\n> > > > +                     return -EINVAL;\n> > > > +             }\n> > > > +\n> > > > +             CPU_SET(cpu, &data_->cpuset_.value());\n> > > > +     }\n> > > > +\n> > > > +     if (data_->running_)\n> > > > +             setThreadAffinityInternal();\n> > > > +\n> > > > +     return 0;\n> > > > +}\n> > > > +\n> > > > +void Thread::setThreadAffinityInternal()\n> > > > +{\n> > > > +     if (!data_->cpuset_)\n> > > > +             return;\n> > > > +\n> > > > +     const cpu_set_t &cpuset = data_->cpuset_.value();\n> > > > +     pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n> > >\n> > > With this implementation, it's not possible to set the thread's CPU\n> > > affinity before starting the thread. That's a limitation of the C++\n> > > std::thread API, it could be done if we used the native pthread API\n> > > instead. Is that a problem ? Could you actually point me to code that\n> > > uses Thread::setThreadAffinity() ?\n> >\n> > There are two main usages in mtkisp7:\n> > 1. Setting pipeline handler's thread [1].\n>\n> Heads up on that one, we probably want to handle this in the libcamera\n> core. Multiple pipeline handlers can run on the same platform, currently\n> sharing a single camera manager thread. If they all tried to set thread\n> affinity, that would be messy. I'd like to explore how we could\n> coordinate that in a cleaner way.\n\nYes, your concern makes a lot of sense. If we want to process requests\non a different CPU core in pipeline handlers, would it be better to create\nanother thread in each pipeline handler? That involves an extra context\nswitch per frame though...\n\n>\n> > 2. Setting IPA's two new threads [2].\n> >\n> > The first one sets thread affinity to a thread that has already been\n> > started, while the second one set new threads.\n>\n> Out of curiosity, have you measured the difference in performance ? Is\n> it a matter of pinning the threads to big cores to ensure they have\n> enough CPU time ? Or a matter of avoiding latency caused by migration ?\n> Or something else ?\n\nWe run CTS, and the current config passes the performance tests with\nsufficient fps and low enough latencies.\n\nWithout this functionality though, I think we can still somehow pass CTS\nwith more retries. If it's controversial, we can skip this for now.\n\nWe can also allow setting thread affinity when the thread hasn't been\nstarted yet, which at least fits the second use case in mtkisp7.\n\n>\n> > By \"the native pthread API\", you mean `pthread_attr_setaffinity_np`?\n> > This means that we need to implement libcamera::Thread with pthread,\n> > instead of std::thread?\n>\n> Yes. It doesn't mean we have to do that, I'm only asking if you think\n> that would be a good idea, or overkill.\n\nI find it a bit an overkill :)\nThe approach below should be much easier.\n\n>\n> > [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511\n> > [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119\n> >\n> > > Even without the ability to set the affinity before starting the thread,\n> > > we could set it from within the thread before calling the run()\n> > > function. This would avoid the main workload being run on incorrect\n> > > CPUs. Is that something that would be useful ?\n> >\n> > You mean calling `setThreadAffinityInterna()l` in `startThread()`\n> > (before `run()`), instead of at the end of `start()`?\n>\n> Yes. Again, I'm not saing it has to be done, just exploring options.\n\nThat makes a lot of sense. It works on ciri as well in my trial.\n\nBR,\nHarvey\n\n>\n> > > Regardless of which option we implement, Thread::setThreadAffinity()\n> > > should document what happens if it's called before calling\n> > > Thread::start().\n> > >\n> > > > +}\n> > > > +\n> > > >  /**\n> > > >   * \\brief Check if the thread is running\n> > > >   *\n>\n> --\n> Regards,\n>\n> Laurent Pinchart","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 27D84C0DA4\n\tfor <parsemail@patchwork.libcamera.org>;\n\tWed, 23 Oct 2024 06:46:54 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 03C4065391;\n\tWed, 23 Oct 2024 08:46:52 +0200 (CEST)","from mail-lj1-x236.google.com (mail-lj1-x236.google.com\n\t[IPv6:2a00:1450:4864:20::236])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 34C35618C1\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 23 Oct 2024 08:46:51 +0200 (CEST)","by mail-lj1-x236.google.com with SMTP id\n\t38308e7fff4ca-2fb559b0b00so56698841fa.0\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tTue, 22 Oct 2024 23:46:51 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"HUrlf5fY\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=chromium.org; s=google; t=1729666010; x=1730270810;\n\tdarn=lists.libcamera.org; \n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:from:to:cc:subject:date\n\t:message-id:reply-to;\n\tbh=ZIye/rBsKrrBWZUvg9YjW1EUER/p+RspfUrbrIMTTSs=;\n\tb=HUrlf5fYmgV1/Maa7OjJ9XSbMi6jltI6bkKhHVGwi1yDo73/ub9sfE046B8iB/qc5y\n\tv5d8FzRCSrNgsZbl7UogTM4mnXzgpakGzavXFUuY3GRXeyGPI1BihF4PaNLoGVVWTlvR\n\tMxEI2VszZsWlMlCO/nsngJEOppFbjznouRit8=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20230601; t=1729666010; x=1730270810;\n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:x-gm-message-state:from:to:cc\n\t:subject:date:message-id:reply-to;\n\tbh=ZIye/rBsKrrBWZUvg9YjW1EUER/p+RspfUrbrIMTTSs=;\n\tb=JYX3q9/044YnLfz5+t62Hj4yDaAZXrIbhc9OXW9O0XDgcSdIsv9ZZE1aS6HOKZD6Ty\n\tkdlpwF7Lz3NZY/nx7iJKxEG9Z1rEnfP3CqdWNuqJaXOQQ97v07NchH0TTiGtyUIZchYR\n\tus3+CkSge56LClSwLKDMri25VvSW+A0OCfli/8wwwaLcK3wMqb/dJW/P8bJQOwxsDnEm\n\tCHLbkZkbx08Uzy5HWTOP9gWlTVk2KgKeynw50+Tiroat12hYw1RrMRs7zvaUFN4NQCEP\n\toJfvHZNfkz9ZuJWa7uc53kGPyQ3DuThEm3H9U1QEGTtG53ie20MxCDgF0wQtDta09oK2\n\tiqAQ==","X-Gm-Message-State":"AOJu0YxsE4mvXIcnMcQXsiDafUCqIoFRQd74MIy5+lSQCeIqt1IscFJn\n\tpdd3dfCQ0PzGWzfNKdlQoitXv94/PVHb/kaoyHpOOTyaIRsquIL5z4oJ8uqmXxT72v5/FfDmnGy\n\tYs/5yMTNLw+CMqORTfUkox7NK6cxeYxSm8L/5","X-Google-Smtp-Source":"AGHT+IFWw/TPrDJNdL2ViF7IYAe/4cjNtxRkmRr8Yf3tXzKyj10qh9z89Eun6vZQM6aDdsf6bw6qUw6hCsIzYZMkCUA=","X-Received":"by 2002:a2e:8242:0:b0:2fa:d7bf:6f43 with SMTP id\n\t38308e7fff4ca-2fc9d375eb5mr5648111fa.27.1729666010067;\n\tTue, 22 Oct 2024 23:46:50 -0700 (PDT)","MIME-Version":"1.0","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>\n\t<20241016220827.GG30496@pendragon.ideasonboard.com>\n\t<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>\n\t<20241022224901.GI9497@pendragon.ideasonboard.com>","In-Reply-To":"<20241022224901.GI9497@pendragon.ideasonboard.com>","From":"Cheng-Hao Yang <chenghaoyang@chromium.org>","Date":"Wed, 23 Oct 2024 14:46:38 +0800","Message-ID":"<CAEB1ahuXYEY=39WA30ZLjurF90QhWLxOfhp926_eTXs71zAYiA@mail.gmail.com>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","To":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","Cc":"libcamera-devel@lists.libcamera.org, \n\tHan-Lin Chen <hanlinchen@chromium.org>","Content-Type":"text/plain; charset=\"UTF-8\"","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":31894,"web_url":"https://patchwork.libcamera.org/comment/31894/","msgid":"<20241023234351.GD5873@pendragon.ideasonboard.com>","date":"2024-10-23T23:43:51","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"On Wed, Oct 23, 2024 at 02:46:38PM +0800, Cheng-Hao Yang wrote:\n> On Wed, Oct 23, 2024 at 6:49 AM Laurent Pinchart wrote:\n> > On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:\n> > > Hi Laurent,\n> > >\n> > > I'll update a new version when all changes are applied.\n> >\n> > Yes, let's finalize the discussion first :-)\n> >\n> > > On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:\n> > > > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> > > > > From: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > >\n> > > > > Add method to set thread affinity to Thread class.\n> > > > >\n> > > > > Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > > Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > ---\n> > > > >  include/libcamera/base/thread.h |  5 ++++\n> > > > >  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n> > > > >  2 files changed, 51 insertions(+)\n> > > > >\n> > > > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> > > > > index 4f33de63d..3209d4f7c 100644\n> > > > > --- a/include/libcamera/base/thread.h\n> > > > > +++ b/include/libcamera/base/thread.h\n> > > > > @@ -15,6 +15,7 @@\n> > > > >\n> > > > >  #include <libcamera/base/message.h>\n> > > > >  #include <libcamera/base/signal.h>\n> > > > > +#include <libcamera/base/span.h>\n> > > > >  #include <libcamera/base/utils.h>\n> > > > >\n> > > > >  namespace libcamera {\n> > > > > @@ -35,6 +36,8 @@ public:\n> > > > >       void exit(int code = 0);\n> > > > >       bool wait(utils::duration duration = utils::duration::max());\n> > > > >\n> > > > > +     int setThreadAffinity(const Span<const unsigned int> &cpus);\n> > > > > +\n> > > > >       bool isRunning();\n> > > > >\n> > > > >       Signal<> finished;\n> > > > > @@ -54,6 +57,8 @@ private:\n> > > > >       void startThread();\n> > > > >       void finishThread();\n> > > > >\n> > > > > +     void setThreadAffinityInternal();\n> > > > > +\n> > > > >       void postMessage(std::unique_ptr<Message> msg, Object *receiver);\n> > > > >       void removeMessages(Object *receiver);\n> > > > >\n> > > > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> > > > > index 8735670b8..5dc787dba 100644\n> > > > > --- a/src/libcamera/base/thread.cpp\n> > > > > +++ b/src/libcamera/base/thread.cpp\n> > > > > @@ -9,6 +9,7 @@\n> > > > >\n> > > > >  #include <atomic>\n> > > > >  #include <list>\n> > > > > +#include <optional>\n> > > > >  #include <sys/syscall.h>\n> > > > >  #include <sys/types.h>\n> > > > >  #include <unistd.h>\n> > > > > @@ -128,6 +129,8 @@ private:\n> > > > >       int exitCode_;\n> > > > >\n> > > > >       MessageQueue messages_;\n> > > > > +\n> > > > > +     std::optional<cpu_set_t> cpuset_ = std::nullopt;\n> > > >\n> > > > Is there a need to initialize this to std::nullopt, as that's what the\n> > > > default constructor does ?\n> > >\n> > > Removed.\n> > >\n> > > > >  };\n> > > > >\n> > > > >  /**\n> > > > > @@ -254,6 +257,8 @@ void Thread::start()\n> > > > >       data_->exit_.store(false, std::memory_order_relaxed);\n> > > > >\n> > > > >       thread_ = std::thread(&Thread::startThread, this);\n> > > > > +\n> > > > > +     setThreadAffinityInternal();\n> > > > >  }\n> > > > >\n> > > > >  void Thread::startThread()\n> > > > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n> > > > >       return hasFinished;\n> > > > >  }\n> > > > >\n> > > > > +/**\n> > > > > + * \\brief Set the CPU affinity mask of the thread\n> > > > > + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> > > > > + *\n> > > > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> > > > > + * If any index is invalid, no cpu indices will be set.\n> > > >\n> > > >  * If any index is invalid, this function won't modify the thread affinity and\n> > > >  * will return an error.\n> > >\n> > > Done.\n> > >\n> > > > > + *\n> > > > > + * \\return 0 if all indices are valid, -EINVAL otherwise\n> > > > > + */\n> > > > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> > > > > +{\n> > > > > +     const unsigned int numCpus = std::thread::hardware_concurrency();\n> > > >\n> > > > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\n> > > > says:\n> > > >\n> > > >   Returns the number of concurrent threads supported by the\n> > > >   implementation. The value should be considered only a hint.\n> > > >\n> > > >   [snip]\n> > > >\n> > > >   Return value\n> > > >\n> > > >   Number of concurrent threads supported. If the value is not well\n> > > >   defined or not computable, returns 0.\n> > > >\n> > > > libcxx seems to have the following implementation:\n> > > >\n> > > > unsigned thread::hardware_concurrency() noexcept {\n> > > > #if defined(_SC_NPROCESSORS_ONLN)\n> > > >   long result = sysconf(_SC_NPROCESSORS_ONLN);\n> > > >   // sysconf returns -1 if the name is invalid, the option does not exist or\n> > > >   // does not have a definite limit.\n> > > >   // if sysconf returns some other negative number, we have no idea\n> > > >   // what is going on. Default to something safe.\n> > > >   if (result < 0)\n> > > >     return 0;\n> > > >   return static_cast<unsigned>(result);\n> > > > #elif defined(_LIBCPP_WIN32API)\n> > > >   SYSTEM_INFO info;\n> > > >   GetSystemInfo(&info);\n> > > >   return info.dwNumberOfProcessors;\n> > > > #else // defined(CTL_HW) && defined(HW_NCPU)\n> > > >   // TODO: grovel through /proc or check cpuid on x86 and similar\n> > > >   // instructions on other architectures.\n> > > > #  if defined(_LIBCPP_WARNING)\n> > > >   _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n> > > > #  else\n> > > > #    warning hardware_concurrency not yet implemented\n> > > > #  endif\n> > > >   return 0; // Means not computable [thread.thread.static]\n> > > > #endif // defined(CTL_HW) && defined(HW_NCPU)\n> > > > }\n> > > >\n> > > > while libstdc++ uses\n> > > >\n> > > > #if defined(_GLIBCXX_USE_GET_NPROCS)\n> > > > # include <sys/sysinfo.h>\n> > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n> > > > # define _GLIBCXX_NPROCS pthread_num_processors_np()\n> > > > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n> > > > # include <stddef.h>\n> > > > # include <sys/sysctl.h>\n> > > > static inline int get_nprocs()\n> > > > {\n> > > >  int count;\n> > > >  size_t size = sizeof(count);\n> > > >  int mib[] = { CTL_HW, HW_NCPU };\n> > > >  if (!sysctl(mib, 2, &count, &size, NULL, 0))\n> > > >    return count;\n> > > >  return 0;\n> > > > }\n> > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n> > > > # include <unistd.h>\n> > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n> > > > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n> > > > # include <unistd.h>\n> > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n> > > > #else\n> > > > # define _GLIBCXX_NPROCS 0\n> > > > #endif\n> > > >\n> > > >   unsigned int\n> > > >   thread::hardware_concurrency() noexcept\n> > > >   {\n> > > >     int __n = _GLIBCXX_NPROCS;\n> > > >     if (__n < 0)\n> > > >       __n = 0;\n> > > >     return __n;\n> > > >   }\n> > > >\n> > > > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\n> > > > is a wrapper around the former). So it seems that, while the C++ API\n> > > > doesn't guarantee that thread::hardware_concurrency() will provide an\n> > > > accurate value, the implementations we have will do the right thing.\n> > > > They're however a bit expensive, as they open a file in /sys, but that's\n> > > > probably the best we can do.\n> > > >\n> > > > > +\n> > > > > +     MutexLocker locker(data_->mutex_);\n> > > > > +     data_->cpuset_ = cpu_set_t();\n> > > > > +     CPU_ZERO(&data_->cpuset_.value());\n> > > > > +\n> > > > > +     for (const unsigned int &cpu : cpus) {\n> > > > > +             if (cpu >= numCpus) {\n> > > > > +                     LOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n> > > >\n> > > > It's not ignored anymore\n> > > >\n> > > >                         LOG(Thread, Error)\n> > > >                                 << \"Invalid CPU \" << cpu << \" for thread affinity\";\n> > > >\n> > >\n> > > Right, updated.\n> > >\n> > > > > +                     return -EINVAL;\n> > > > > +             }\n> > > > > +\n> > > > > +             CPU_SET(cpu, &data_->cpuset_.value());\n> > > > > +     }\n> > > > > +\n> > > > > +     if (data_->running_)\n> > > > > +             setThreadAffinityInternal();\n> > > > > +\n> > > > > +     return 0;\n> > > > > +}\n> > > > > +\n> > > > > +void Thread::setThreadAffinityInternal()\n> > > > > +{\n> > > > > +     if (!data_->cpuset_)\n> > > > > +             return;\n> > > > > +\n> > > > > +     const cpu_set_t &cpuset = data_->cpuset_.value();\n> > > > > +     pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n> > > >\n> > > > With this implementation, it's not possible to set the thread's CPU\n> > > > affinity before starting the thread. That's a limitation of the C++\n> > > > std::thread API, it could be done if we used the native pthread API\n> > > > instead. Is that a problem ? Could you actually point me to code that\n> > > > uses Thread::setThreadAffinity() ?\n> > >\n> > > There are two main usages in mtkisp7:\n> > > 1. Setting pipeline handler's thread [1].\n> >\n> > Heads up on that one, we probably want to handle this in the libcamera\n> > core. Multiple pipeline handlers can run on the same platform, currently\n> > sharing a single camera manager thread. If they all tried to set thread\n> > affinity, that would be messy. I'd like to explore how we could\n> > coordinate that in a cleaner way.\n> \n> Yes, your concern makes a lot of sense. If we want to process requests\n> on a different CPU core in pipeline handlers, would it be better to create\n> another thread in each pipeline handler? That involves an extra context\n> switch per frame though...\n\nOne thread per pipeline handler is an option, yes. I don't think it\nrequires a context switch though, we don't have to go from the\napplication thread to the camera manager thread to the pipeline handler\nthread, we could bypass the camera manager thread.\n\nAnother option that was floated around was to use a thread pool in case\none thread per pipeline handler would create too many threads. It's very\ntheoretical at this point whether or not this would help improve\nperformance.\n\nI'm fine keeping a single internal thread in libcamera, but I'd like its\naffinity to be managed in a way that doesn't let different pipeline\nhandlers compete to set different affinities.\n\n> > > 2. Setting IPA's two new threads [2].\n> > >\n> > > The first one sets thread affinity to a thread that has already been\n> > > started, while the second one set new threads.\n> >\n> > Out of curiosity, have you measured the difference in performance ? Is\n> > it a matter of pinning the threads to big cores to ensure they have\n> > enough CPU time ? Or a matter of avoiding latency caused by migration ?\n> > Or something else ?\n> \n> We run CTS, and the current config passes the performance tests with\n> sufficient fps and low enough latencies.\n> \n> Without this functionality though, I think we can still somehow pass CTS\n> with more retries.\n\nIt would be interesting to see what kind of time targets are missed\nwithout this change.\n\n> If it's controversial, we can skip this for now.\n\nNo, it's not controversial, I'd just like to design it right :-)\n\n> We can also allow setting thread affinity when the thread hasn't been\n> started yet, which at least fits the second use case in mtkisp7.\n> \n> > > By \"the native pthread API\", you mean `pthread_attr_setaffinity_np`?\n> > > This means that we need to implement libcamera::Thread with pthread,\n> > > instead of std::thread?\n> >\n> > Yes. It doesn't mean we have to do that, I'm only asking if you think\n> > that would be a good idea, or overkill.\n> \n> I find it a bit an overkill :)\n> The approach below should be much easier.\n\nFine with me.\n\n> > > [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511\n> > > [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119\n> > >\n> > > > Even without the ability to set the affinity before starting the thread,\n> > > > we could set it from within the thread before calling the run()\n> > > > function. This would avoid the main workload being run on incorrect\n> > > > CPUs. Is that something that would be useful ?\n> > >\n> > > You mean calling `setThreadAffinityInterna()l` in `startThread()`\n> > > (before `run()`), instead of at the end of `start()`?\n> >\n> > Yes. Again, I'm not saing it has to be done, just exploring options.\n> \n> That makes a lot of sense. It works on ciri as well in my trial.\n\nWill you submit a new patch with this ?\n\n> > > > Regardless of which option we implement, Thread::setThreadAffinity()\n> > > > should document what happens if it's called before calling\n> > > > Thread::start().\n> > > >\n> > > > > +}\n> > > > > +\n> > > > >  /**\n> > > > >   * \\brief Check if the thread is running\n> > > > >   *","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 86680C3213\n\tfor <parsemail@patchwork.libcamera.org>;\n\tWed, 23 Oct 2024 23:44:00 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 59A6965393;\n\tThu, 24 Oct 2024 01:43:59 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[213.167.242.64])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 8A5516053E\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 24 Oct 2024 01:43:58 +0200 (CEST)","from pendragon.ideasonboard.com (81-175-209-231.bb.dnainternet.fi\n\t[81.175.209.231])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id BDCB26BE;\n\tThu, 24 Oct 2024 01:42:10 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"SMOS2+/u\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1729726930;\n\tbh=Hlr06pEomIACuBZ0cKlT3VmYihzBX/K4PbiPmSHYuVw=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=SMOS2+/u7YNmoDfMF59gB7tn5sL1XFWlFkMrYA/P3kH7H8Q3WQKzooJRlJEBWUFmM\n\tWno0OCCM4bjGF9JAsd1YFle47lgMqdZSR6157OWdKDWHsOywy71zfj8oaDouuhyg8I\n\tC6Tbs1PMxN02wh9a6yunQa1GJjm3GQ3LQ5w7gm1I=","Date":"Thu, 24 Oct 2024 02:43:51 +0300","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Cheng-Hao Yang <chenghaoyang@chromium.org>","Cc":"libcamera-devel@lists.libcamera.org,\n\tHan-Lin Chen <hanlinchen@chromium.org>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","Message-ID":"<20241023234351.GD5873@pendragon.ideasonboard.com>","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>\n\t<20241016220827.GG30496@pendragon.ideasonboard.com>\n\t<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>\n\t<20241022224901.GI9497@pendragon.ideasonboard.com>\n\t<CAEB1ahuXYEY=39WA30ZLjurF90QhWLxOfhp926_eTXs71zAYiA@mail.gmail.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<CAEB1ahuXYEY=39WA30ZLjurF90QhWLxOfhp926_eTXs71zAYiA@mail.gmail.com>","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":31898,"web_url":"https://patchwork.libcamera.org/comment/31898/","msgid":"<CAEB1ahsA+4CYFQW2k5AWSYRKhisLN6Kxnnx8=r54hUg4n_-yYw@mail.gmail.com>","date":"2024-10-24T08:16:03","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":117,"url":"https://patchwork.libcamera.org/api/people/117/","name":"Cheng-Hao Yang","email":"chenghaoyang@chromium.org"},"content":"Hi Laurent,\n\nOn Thu, Oct 24, 2024 at 7:43 AM Laurent Pinchart\n<laurent.pinchart@ideasonboard.com> wrote:\n>\n> On Wed, Oct 23, 2024 at 02:46:38PM +0800, Cheng-Hao Yang wrote:\n> > On Wed, Oct 23, 2024 at 6:49 AM Laurent Pinchart wrote:\n> > > On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:\n> > > > Hi Laurent,\n> > > >\n> > > > I'll update a new version when all changes are applied.\n> > >\n> > > Yes, let's finalize the discussion first :-)\n> > >\n> > > > On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:\n> > > > > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> > > > > > From: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > > >\n> > > > > > Add method to set thread affinity to Thread class.\n> > > > > >\n> > > > > > Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > > > Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > > Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > > ---\n> > > > > >  include/libcamera/base/thread.h |  5 ++++\n> > > > > >  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n> > > > > >  2 files changed, 51 insertions(+)\n> > > > > >\n> > > > > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> > > > > > index 4f33de63d..3209d4f7c 100644\n> > > > > > --- a/include/libcamera/base/thread.h\n> > > > > > +++ b/include/libcamera/base/thread.h\n> > > > > > @@ -15,6 +15,7 @@\n> > > > > >\n> > > > > >  #include <libcamera/base/message.h>\n> > > > > >  #include <libcamera/base/signal.h>\n> > > > > > +#include <libcamera/base/span.h>\n> > > > > >  #include <libcamera/base/utils.h>\n> > > > > >\n> > > > > >  namespace libcamera {\n> > > > > > @@ -35,6 +36,8 @@ public:\n> > > > > >       void exit(int code = 0);\n> > > > > >       bool wait(utils::duration duration = utils::duration::max());\n> > > > > >\n> > > > > > +     int setThreadAffinity(const Span<const unsigned int> &cpus);\n> > > > > > +\n> > > > > >       bool isRunning();\n> > > > > >\n> > > > > >       Signal<> finished;\n> > > > > > @@ -54,6 +57,8 @@ private:\n> > > > > >       void startThread();\n> > > > > >       void finishThread();\n> > > > > >\n> > > > > > +     void setThreadAffinityInternal();\n> > > > > > +\n> > > > > >       void postMessage(std::unique_ptr<Message> msg, Object *receiver);\n> > > > > >       void removeMessages(Object *receiver);\n> > > > > >\n> > > > > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> > > > > > index 8735670b8..5dc787dba 100644\n> > > > > > --- a/src/libcamera/base/thread.cpp\n> > > > > > +++ b/src/libcamera/base/thread.cpp\n> > > > > > @@ -9,6 +9,7 @@\n> > > > > >\n> > > > > >  #include <atomic>\n> > > > > >  #include <list>\n> > > > > > +#include <optional>\n> > > > > >  #include <sys/syscall.h>\n> > > > > >  #include <sys/types.h>\n> > > > > >  #include <unistd.h>\n> > > > > > @@ -128,6 +129,8 @@ private:\n> > > > > >       int exitCode_;\n> > > > > >\n> > > > > >       MessageQueue messages_;\n> > > > > > +\n> > > > > > +     std::optional<cpu_set_t> cpuset_ = std::nullopt;\n> > > > >\n> > > > > Is there a need to initialize this to std::nullopt, as that's what the\n> > > > > default constructor does ?\n> > > >\n> > > > Removed.\n> > > >\n> > > > > >  };\n> > > > > >\n> > > > > >  /**\n> > > > > > @@ -254,6 +257,8 @@ void Thread::start()\n> > > > > >       data_->exit_.store(false, std::memory_order_relaxed);\n> > > > > >\n> > > > > >       thread_ = std::thread(&Thread::startThread, this);\n> > > > > > +\n> > > > > > +     setThreadAffinityInternal();\n> > > > > >  }\n> > > > > >\n> > > > > >  void Thread::startThread()\n> > > > > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n> > > > > >       return hasFinished;\n> > > > > >  }\n> > > > > >\n> > > > > > +/**\n> > > > > > + * \\brief Set the CPU affinity mask of the thread\n> > > > > > + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> > > > > > + *\n> > > > > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> > > > > > + * If any index is invalid, no cpu indices will be set.\n> > > > >\n> > > > >  * If any index is invalid, this function won't modify the thread affinity and\n> > > > >  * will return an error.\n> > > >\n> > > > Done.\n> > > >\n> > > > > > + *\n> > > > > > + * \\return 0 if all indices are valid, -EINVAL otherwise\n> > > > > > + */\n> > > > > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> > > > > > +{\n> > > > > > +     const unsigned int numCpus = std::thread::hardware_concurrency();\n> > > > >\n> > > > > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\n> > > > > says:\n> > > > >\n> > > > >   Returns the number of concurrent threads supported by the\n> > > > >   implementation. The value should be considered only a hint.\n> > > > >\n> > > > >   [snip]\n> > > > >\n> > > > >   Return value\n> > > > >\n> > > > >   Number of concurrent threads supported. If the value is not well\n> > > > >   defined or not computable, returns 0.\n> > > > >\n> > > > > libcxx seems to have the following implementation:\n> > > > >\n> > > > > unsigned thread::hardware_concurrency() noexcept {\n> > > > > #if defined(_SC_NPROCESSORS_ONLN)\n> > > > >   long result = sysconf(_SC_NPROCESSORS_ONLN);\n> > > > >   // sysconf returns -1 if the name is invalid, the option does not exist or\n> > > > >   // does not have a definite limit.\n> > > > >   // if sysconf returns some other negative number, we have no idea\n> > > > >   // what is going on. Default to something safe.\n> > > > >   if (result < 0)\n> > > > >     return 0;\n> > > > >   return static_cast<unsigned>(result);\n> > > > > #elif defined(_LIBCPP_WIN32API)\n> > > > >   SYSTEM_INFO info;\n> > > > >   GetSystemInfo(&info);\n> > > > >   return info.dwNumberOfProcessors;\n> > > > > #else // defined(CTL_HW) && defined(HW_NCPU)\n> > > > >   // TODO: grovel through /proc or check cpuid on x86 and similar\n> > > > >   // instructions on other architectures.\n> > > > > #  if defined(_LIBCPP_WARNING)\n> > > > >   _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n> > > > > #  else\n> > > > > #    warning hardware_concurrency not yet implemented\n> > > > > #  endif\n> > > > >   return 0; // Means not computable [thread.thread.static]\n> > > > > #endif // defined(CTL_HW) && defined(HW_NCPU)\n> > > > > }\n> > > > >\n> > > > > while libstdc++ uses\n> > > > >\n> > > > > #if defined(_GLIBCXX_USE_GET_NPROCS)\n> > > > > # include <sys/sysinfo.h>\n> > > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n> > > > > # define _GLIBCXX_NPROCS pthread_num_processors_np()\n> > > > > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n> > > > > # include <stddef.h>\n> > > > > # include <sys/sysctl.h>\n> > > > > static inline int get_nprocs()\n> > > > > {\n> > > > >  int count;\n> > > > >  size_t size = sizeof(count);\n> > > > >  int mib[] = { CTL_HW, HW_NCPU };\n> > > > >  if (!sysctl(mib, 2, &count, &size, NULL, 0))\n> > > > >    return count;\n> > > > >  return 0;\n> > > > > }\n> > > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n> > > > > # include <unistd.h>\n> > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n> > > > > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n> > > > > # include <unistd.h>\n> > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n> > > > > #else\n> > > > > # define _GLIBCXX_NPROCS 0\n> > > > > #endif\n> > > > >\n> > > > >   unsigned int\n> > > > >   thread::hardware_concurrency() noexcept\n> > > > >   {\n> > > > >     int __n = _GLIBCXX_NPROCS;\n> > > > >     if (__n < 0)\n> > > > >       __n = 0;\n> > > > >     return __n;\n> > > > >   }\n> > > > >\n> > > > > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\n> > > > > is a wrapper around the former). So it seems that, while the C++ API\n> > > > > doesn't guarantee that thread::hardware_concurrency() will provide an\n> > > > > accurate value, the implementations we have will do the right thing.\n> > > > > They're however a bit expensive, as they open a file in /sys, but that's\n> > > > > probably the best we can do.\n> > > > >\n> > > > > > +\n> > > > > > +     MutexLocker locker(data_->mutex_);\n> > > > > > +     data_->cpuset_ = cpu_set_t();\n> > > > > > +     CPU_ZERO(&data_->cpuset_.value());\n> > > > > > +\n> > > > > > +     for (const unsigned int &cpu : cpus) {\n> > > > > > +             if (cpu >= numCpus) {\n> > > > > > +                     LOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n> > > > >\n> > > > > It's not ignored anymore\n> > > > >\n> > > > >                         LOG(Thread, Error)\n> > > > >                                 << \"Invalid CPU \" << cpu << \" for thread affinity\";\n> > > > >\n> > > >\n> > > > Right, updated.\n> > > >\n> > > > > > +                     return -EINVAL;\n> > > > > > +             }\n> > > > > > +\n> > > > > > +             CPU_SET(cpu, &data_->cpuset_.value());\n> > > > > > +     }\n> > > > > > +\n> > > > > > +     if (data_->running_)\n> > > > > > +             setThreadAffinityInternal();\n> > > > > > +\n> > > > > > +     return 0;\n> > > > > > +}\n> > > > > > +\n> > > > > > +void Thread::setThreadAffinityInternal()\n> > > > > > +{\n> > > > > > +     if (!data_->cpuset_)\n> > > > > > +             return;\n> > > > > > +\n> > > > > > +     const cpu_set_t &cpuset = data_->cpuset_.value();\n> > > > > > +     pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n> > > > >\n> > > > > With this implementation, it's not possible to set the thread's CPU\n> > > > > affinity before starting the thread. That's a limitation of the C++\n> > > > > std::thread API, it could be done if we used the native pthread API\n> > > > > instead. Is that a problem ? Could you actually point me to code that\n> > > > > uses Thread::setThreadAffinity() ?\n> > > >\n> > > > There are two main usages in mtkisp7:\n> > > > 1. Setting pipeline handler's thread [1].\n> > >\n> > > Heads up on that one, we probably want to handle this in the libcamera\n> > > core. Multiple pipeline handlers can run on the same platform, currently\n> > > sharing a single camera manager thread. If they all tried to set thread\n> > > affinity, that would be messy. I'd like to explore how we could\n> > > coordinate that in a cleaner way.\n> >\n> > Yes, your concern makes a lot of sense. If we want to process requests\n> > on a different CPU core in pipeline handlers, would it be better to create\n> > another thread in each pipeline handler? That involves an extra context\n> > switch per frame though...\n>\n> One thread per pipeline handler is an option, yes. I don't think it\n> requires a context switch though, we don't have to go from the\n> application thread to the camera manager thread to the pipeline handler\n> thread, we could bypass the camera manager thread.\n\nAh right, libcamera::Camera APIs are mostly thread-safe, I forgot.\n\n>\n> Another option that was floated around was to use a thread pool in case\n> one thread per pipeline handler would create too many threads. It's very\n> theoretical at this point whether or not this would help improve\n> performance.\n\nYeah true. We have much work to do in this approach though.\n\n>\n> I'm fine keeping a single internal thread in libcamera, but I'd like its\n> affinity to be managed in a way that doesn't let different pipeline\n> handlers compete to set different affinities.\n\nDefinitely :)\nLet me know if I should prevent setting thread affinity in a thread\nthat's already started.\n\n>\n> > > > 2. Setting IPA's two new threads [2].\n> > > >\n> > > > The first one sets thread affinity to a thread that has already been\n> > > > started, while the second one set new threads.\n> > >\n> > > Out of curiosity, have you measured the difference in performance ? Is\n> > > it a matter of pinning the threads to big cores to ensure they have\n> > > enough CPU time ? Or a matter of avoiding latency caused by migration ?\n> > > Or something else ?\n> >\n> > We run CTS, and the current config passes the performance tests with\n> > sufficient fps and low enough latencies.\n> >\n> > Without this functionality though, I think we can still somehow pass CTS\n> > with more retries.\n>\n> It would be interesting to see what kind of time targets are missed\n> without this change.\n\nFor example:\n```\nandroid.camera.cts\nandroid.hardware.camera2.cts.RecordingTest#testRecordingFromPersistentSurface[1]\nFAILURE: junit.framewo\nrk.AssertionFailedError: Camera 1: Video frame drop rate too high:\n6.976744%, tolerance 5.000000%. Video size: 1280x720, expectedDuration\n[2866.666504,2866.666504], expectedFrameDur\nation 33.333332, frameDropCnt 6, frameCount 86\n```\n\n>\n> > If it's controversial, we can skip this for now.\n>\n> No, it's not controversial, I'd just like to design it right :-)\n>\n> > We can also allow setting thread affinity when the thread hasn't been\n> > started yet, which at least fits the second use case in mtkisp7.\n> >\n> > > > By \"the native pthread API\", you mean `pthread_attr_setaffinity_np`?\n> > > > This means that we need to implement libcamera::Thread with pthread,\n> > > > instead of std::thread?\n> > >\n> > > Yes. It doesn't mean we have to do that, I'm only asking if you think\n> > > that would be a good idea, or overkill.\n> >\n> > I find it a bit an overkill :)\n> > The approach below should be much easier.\n>\n> Fine with me.\n>\n> > > > [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511\n> > > > [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119\n> > > >\n> > > > > Even without the ability to set the affinity before starting the thread,\n> > > > > we could set it from within the thread before calling the run()\n> > > > > function. This would avoid the main workload being run on incorrect\n> > > > > CPUs. Is that something that would be useful ?\n> > > >\n> > > > You mean calling `setThreadAffinityInterna()l` in `startThread()`\n> > > > (before `run()`), instead of at the end of `start()`?\n> > >\n> > > Yes. Again, I'm not saing it has to be done, just exploring options.\n> >\n> > That makes a lot of sense. It works on ciri as well in my trial.\n>\n> Will you submit a new patch with this ?\n\nSure. Uploaded.\n\nBR,\nHarvey\n\n>\n> > > > > Regardless of which option we implement, Thread::setThreadAffinity()\n> > > > > should document what happens if it's called before calling\n> > > > > Thread::start().\n> > > > >\n> > > > > > +}\n> > > > > > +\n> > > > > >  /**\n> > > > > >   * \\brief Check if the thread is running\n> > > > > >   *\n>\n> --\n> Regards,\n>\n> Laurent Pinchart","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 1A8C9C3213\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 24 Oct 2024 08:16:18 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id B7D7865394;\n\tThu, 24 Oct 2024 10:16:17 +0200 (CEST)","from mail-lj1-x230.google.com (mail-lj1-x230.google.com\n\t[IPv6:2a00:1450:4864:20::230])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 6716A6538A\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 24 Oct 2024 10:16:15 +0200 (CEST)","by mail-lj1-x230.google.com with SMTP id\n\t38308e7fff4ca-2fb443746b8so5761021fa.0\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 24 Oct 2024 01:16:15 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"ibJwoGCn\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=chromium.org; s=google; t=1729757775; x=1730362575;\n\tdarn=lists.libcamera.org; \n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:from:to:cc:subject:date\n\t:message-id:reply-to;\n\tbh=6OS+dck/5a3sx5Nqqm2cTiwNwzuQmKnhiQPzHMyfMK8=;\n\tb=ibJwoGCn+edAUvYOs+j91WOBUKdqLj5jnQddB2wFPnRSrQJY/tYxsKYeBG0NaAKrof\n\tqWlYpqbs7oeJvSDBUHVGBIWyH3TJ3oNSWBkub7+jgamO+M8LRKBlM9UZiwDokqe97Jcm\n\trRDrmYJM+caqZr1oq4BGvtZNwe8SNsJ7YOjo4=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20230601; t=1729757775; x=1730362575;\n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:x-gm-message-state:from:to:cc\n\t:subject:date:message-id:reply-to;\n\tbh=6OS+dck/5a3sx5Nqqm2cTiwNwzuQmKnhiQPzHMyfMK8=;\n\tb=bQHKNCyfpio4HkSjUcf7PZHHZ2qgRCs7Yq2w9V5NBtX6uTyoT6gTDdzWUD1cSkTd1L\n\tw/KAc4GbzDdhitp9e9KItKza3kPnURl4rXSbY2UIxLKCOn6e+VkiKACSeKQD8ru2e3Kc\n\tCwiFFWmBht/DLP07xgeHqtYFDOJ63lW5KHOltQYJm1NuLNLRU6zLc/1HMmbFWUZhwfS3\n\tkTa16yS94ZoNXueiTMmJ13mVgkOkqhYZA/3UfsZaow29x1HgRrTyqDN8qf9//ztSPpgR\n\taq9hppPkKeMBLNjfxJuevCMrx1QvHwUgP9vqKLaMdZk6+j/ev5znA9JdTLFtDDk/3vK0\n\t8MZg==","X-Gm-Message-State":"AOJu0Yzpk8pPv2PsL2zYoEUd728ApdCLWI9QBuOnmu+LOGRZiYhWcu6n\n\thOa0Kdm2bGKFNp2ZLgmbWT5DVpxf0uEfDRNzxzIcHonyDyKgAPlKNHr+9qdadSMTkKq98ibv9dB\n\t14mk9RXMd+5nCp3rhOVBLpF9oobQ7/TVMkVNJ35dFo3pJQV8ABA==","X-Google-Smtp-Source":"AGHT+IFDfxHkJEsQr5nUFQqG6U0P7qfxRyCpDXdW5sHA9ABvz7fWDQRMDeEA0O3/E1Xz0sJnSQtRBms3D3U96R7CnYw=","X-Received":"by 2002:a2e:be24:0:b0:2ef:17f7:6e1d with SMTP id\n\t38308e7fff4ca-2fca81c2448mr7832091fa.4.1729757774385; Thu, 24 Oct 2024\n\t01:16:14 -0700 (PDT)","MIME-Version":"1.0","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>\n\t<20241016220827.GG30496@pendragon.ideasonboard.com>\n\t<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>\n\t<20241022224901.GI9497@pendragon.ideasonboard.com>\n\t<CAEB1ahuXYEY=39WA30ZLjurF90QhWLxOfhp926_eTXs71zAYiA@mail.gmail.com>\n\t<20241023234351.GD5873@pendragon.ideasonboard.com>","In-Reply-To":"<20241023234351.GD5873@pendragon.ideasonboard.com>","From":"Cheng-Hao Yang <chenghaoyang@chromium.org>","Date":"Thu, 24 Oct 2024 16:16:03 +0800","Message-ID":"<CAEB1ahsA+4CYFQW2k5AWSYRKhisLN6Kxnnx8=r54hUg4n_-yYw@mail.gmail.com>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","To":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","Cc":"libcamera-devel@lists.libcamera.org, \n\tHan-Lin Chen <hanlinchen@chromium.org>","Content-Type":"text/plain; charset=\"UTF-8\"","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":31899,"web_url":"https://patchwork.libcamera.org/comment/31899/","msgid":"<20241024083148.GE5873@pendragon.ideasonboard.com>","date":"2024-10-24T08:31:48","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"On Thu, Oct 24, 2024 at 04:16:03PM +0800, Cheng-Hao Yang wrote:\n> On Thu, Oct 24, 2024 at 7:43 AM Laurent Pinchart wrote:\n> > On Wed, Oct 23, 2024 at 02:46:38PM +0800, Cheng-Hao Yang wrote:\n> > > On Wed, Oct 23, 2024 at 6:49 AM Laurent Pinchart wrote:\n> > > > On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:\n> > > > > Hi Laurent,\n> > > > >\n> > > > > I'll update a new version when all changes are applied.\n> > > >\n> > > > Yes, let's finalize the discussion first :-)\n> > > >\n> > > > > On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:\n> > > > > > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> > > > > > > From: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > > > >\n> > > > > > > Add method to set thread affinity to Thread class.\n> > > > > > >\n> > > > > > > Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > > > > Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > > > Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > > > ---\n> > > > > > >  include/libcamera/base/thread.h |  5 ++++\n> > > > > > >  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n> > > > > > >  2 files changed, 51 insertions(+)\n> > > > > > >\n> > > > > > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> > > > > > > index 4f33de63d..3209d4f7c 100644\n> > > > > > > --- a/include/libcamera/base/thread.h\n> > > > > > > +++ b/include/libcamera/base/thread.h\n> > > > > > > @@ -15,6 +15,7 @@\n> > > > > > >\n> > > > > > >  #include <libcamera/base/message.h>\n> > > > > > >  #include <libcamera/base/signal.h>\n> > > > > > > +#include <libcamera/base/span.h>\n> > > > > > >  #include <libcamera/base/utils.h>\n> > > > > > >\n> > > > > > >  namespace libcamera {\n> > > > > > > @@ -35,6 +36,8 @@ public:\n> > > > > > >       void exit(int code = 0);\n> > > > > > >       bool wait(utils::duration duration = utils::duration::max());\n> > > > > > >\n> > > > > > > +     int setThreadAffinity(const Span<const unsigned int> &cpus);\n> > > > > > > +\n> > > > > > >       bool isRunning();\n> > > > > > >\n> > > > > > >       Signal<> finished;\n> > > > > > > @@ -54,6 +57,8 @@ private:\n> > > > > > >       void startThread();\n> > > > > > >       void finishThread();\n> > > > > > >\n> > > > > > > +     void setThreadAffinityInternal();\n> > > > > > > +\n> > > > > > >       void postMessage(std::unique_ptr<Message> msg, Object *receiver);\n> > > > > > >       void removeMessages(Object *receiver);\n> > > > > > >\n> > > > > > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> > > > > > > index 8735670b8..5dc787dba 100644\n> > > > > > > --- a/src/libcamera/base/thread.cpp\n> > > > > > > +++ b/src/libcamera/base/thread.cpp\n> > > > > > > @@ -9,6 +9,7 @@\n> > > > > > >\n> > > > > > >  #include <atomic>\n> > > > > > >  #include <list>\n> > > > > > > +#include <optional>\n> > > > > > >  #include <sys/syscall.h>\n> > > > > > >  #include <sys/types.h>\n> > > > > > >  #include <unistd.h>\n> > > > > > > @@ -128,6 +129,8 @@ private:\n> > > > > > >       int exitCode_;\n> > > > > > >\n> > > > > > >       MessageQueue messages_;\n> > > > > > > +\n> > > > > > > +     std::optional<cpu_set_t> cpuset_ = std::nullopt;\n> > > > > >\n> > > > > > Is there a need to initialize this to std::nullopt, as that's what the\n> > > > > > default constructor does ?\n> > > > >\n> > > > > Removed.\n> > > > >\n> > > > > > >  };\n> > > > > > >\n> > > > > > >  /**\n> > > > > > > @@ -254,6 +257,8 @@ void Thread::start()\n> > > > > > >       data_->exit_.store(false, std::memory_order_relaxed);\n> > > > > > >\n> > > > > > >       thread_ = std::thread(&Thread::startThread, this);\n> > > > > > > +\n> > > > > > > +     setThreadAffinityInternal();\n> > > > > > >  }\n> > > > > > >\n> > > > > > >  void Thread::startThread()\n> > > > > > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n> > > > > > >       return hasFinished;\n> > > > > > >  }\n> > > > > > >\n> > > > > > > +/**\n> > > > > > > + * \\brief Set the CPU affinity mask of the thread\n> > > > > > > + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> > > > > > > + *\n> > > > > > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> > > > > > > + * If any index is invalid, no cpu indices will be set.\n> > > > > >\n> > > > > >  * If any index is invalid, this function won't modify the thread affinity and\n> > > > > >  * will return an error.\n> > > > >\n> > > > > Done.\n> > > > >\n> > > > > > > + *\n> > > > > > > + * \\return 0 if all indices are valid, -EINVAL otherwise\n> > > > > > > + */\n> > > > > > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> > > > > > > +{\n> > > > > > > +     const unsigned int numCpus = std::thread::hardware_concurrency();\n> > > > > >\n> > > > > > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\n> > > > > > says:\n> > > > > >\n> > > > > >   Returns the number of concurrent threads supported by the\n> > > > > >   implementation. The value should be considered only a hint.\n> > > > > >\n> > > > > >   [snip]\n> > > > > >\n> > > > > >   Return value\n> > > > > >\n> > > > > >   Number of concurrent threads supported. If the value is not well\n> > > > > >   defined or not computable, returns 0.\n> > > > > >\n> > > > > > libcxx seems to have the following implementation:\n> > > > > >\n> > > > > > unsigned thread::hardware_concurrency() noexcept {\n> > > > > > #if defined(_SC_NPROCESSORS_ONLN)\n> > > > > >   long result = sysconf(_SC_NPROCESSORS_ONLN);\n> > > > > >   // sysconf returns -1 if the name is invalid, the option does not exist or\n> > > > > >   // does not have a definite limit.\n> > > > > >   // if sysconf returns some other negative number, we have no idea\n> > > > > >   // what is going on. Default to something safe.\n> > > > > >   if (result < 0)\n> > > > > >     return 0;\n> > > > > >   return static_cast<unsigned>(result);\n> > > > > > #elif defined(_LIBCPP_WIN32API)\n> > > > > >   SYSTEM_INFO info;\n> > > > > >   GetSystemInfo(&info);\n> > > > > >   return info.dwNumberOfProcessors;\n> > > > > > #else // defined(CTL_HW) && defined(HW_NCPU)\n> > > > > >   // TODO: grovel through /proc or check cpuid on x86 and similar\n> > > > > >   // instructions on other architectures.\n> > > > > > #  if defined(_LIBCPP_WARNING)\n> > > > > >   _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n> > > > > > #  else\n> > > > > > #    warning hardware_concurrency not yet implemented\n> > > > > > #  endif\n> > > > > >   return 0; // Means not computable [thread.thread.static]\n> > > > > > #endif // defined(CTL_HW) && defined(HW_NCPU)\n> > > > > > }\n> > > > > >\n> > > > > > while libstdc++ uses\n> > > > > >\n> > > > > > #if defined(_GLIBCXX_USE_GET_NPROCS)\n> > > > > > # include <sys/sysinfo.h>\n> > > > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > > > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n> > > > > > # define _GLIBCXX_NPROCS pthread_num_processors_np()\n> > > > > > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n> > > > > > # include <stddef.h>\n> > > > > > # include <sys/sysctl.h>\n> > > > > > static inline int get_nprocs()\n> > > > > > {\n> > > > > >  int count;\n> > > > > >  size_t size = sizeof(count);\n> > > > > >  int mib[] = { CTL_HW, HW_NCPU };\n> > > > > >  if (!sysctl(mib, 2, &count, &size, NULL, 0))\n> > > > > >    return count;\n> > > > > >  return 0;\n> > > > > > }\n> > > > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > > > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n> > > > > > # include <unistd.h>\n> > > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n> > > > > > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n> > > > > > # include <unistd.h>\n> > > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n> > > > > > #else\n> > > > > > # define _GLIBCXX_NPROCS 0\n> > > > > > #endif\n> > > > > >\n> > > > > >   unsigned int\n> > > > > >   thread::hardware_concurrency() noexcept\n> > > > > >   {\n> > > > > >     int __n = _GLIBCXX_NPROCS;\n> > > > > >     if (__n < 0)\n> > > > > >       __n = 0;\n> > > > > >     return __n;\n> > > > > >   }\n> > > > > >\n> > > > > > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\n> > > > > > is a wrapper around the former). So it seems that, while the C++ API\n> > > > > > doesn't guarantee that thread::hardware_concurrency() will provide an\n> > > > > > accurate value, the implementations we have will do the right thing.\n> > > > > > They're however a bit expensive, as they open a file in /sys, but that's\n> > > > > > probably the best we can do.\n> > > > > >\n> > > > > > > +\n> > > > > > > +     MutexLocker locker(data_->mutex_);\n> > > > > > > +     data_->cpuset_ = cpu_set_t();\n> > > > > > > +     CPU_ZERO(&data_->cpuset_.value());\n> > > > > > > +\n> > > > > > > +     for (const unsigned int &cpu : cpus) {\n> > > > > > > +             if (cpu >= numCpus) {\n> > > > > > > +                     LOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n> > > > > >\n> > > > > > It's not ignored anymore\n> > > > > >\n> > > > > >                         LOG(Thread, Error)\n> > > > > >                                 << \"Invalid CPU \" << cpu << \" for thread affinity\";\n> > > > > >\n> > > > >\n> > > > > Right, updated.\n> > > > >\n> > > > > > > +                     return -EINVAL;\n> > > > > > > +             }\n> > > > > > > +\n> > > > > > > +             CPU_SET(cpu, &data_->cpuset_.value());\n> > > > > > > +     }\n> > > > > > > +\n> > > > > > > +     if (data_->running_)\n> > > > > > > +             setThreadAffinityInternal();\n> > > > > > > +\n> > > > > > > +     return 0;\n> > > > > > > +}\n> > > > > > > +\n> > > > > > > +void Thread::setThreadAffinityInternal()\n> > > > > > > +{\n> > > > > > > +     if (!data_->cpuset_)\n> > > > > > > +             return;\n> > > > > > > +\n> > > > > > > +     const cpu_set_t &cpuset = data_->cpuset_.value();\n> > > > > > > +     pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n> > > > > >\n> > > > > > With this implementation, it's not possible to set the thread's CPU\n> > > > > > affinity before starting the thread. That's a limitation of the C++\n> > > > > > std::thread API, it could be done if we used the native pthread API\n> > > > > > instead. Is that a problem ? Could you actually point me to code that\n> > > > > > uses Thread::setThreadAffinity() ?\n> > > > >\n> > > > > There are two main usages in mtkisp7:\n> > > > > 1. Setting pipeline handler's thread [1].\n> > > >\n> > > > Heads up on that one, we probably want to handle this in the libcamera\n> > > > core. Multiple pipeline handlers can run on the same platform, currently\n> > > > sharing a single camera manager thread. If they all tried to set thread\n> > > > affinity, that would be messy. I'd like to explore how we could\n> > > > coordinate that in a cleaner way.\n> > >\n> > > Yes, your concern makes a lot of sense. If we want to process requests\n> > > on a different CPU core in pipeline handlers, would it be better to create\n> > > another thread in each pipeline handler? That involves an extra context\n> > > switch per frame though...\n> >\n> > One thread per pipeline handler is an option, yes. I don't think it\n> > requires a context switch though, we don't have to go from the\n> > application thread to the camera manager thread to the pipeline handler\n> > thread, we could bypass the camera manager thread.\n> \n> Ah right, libcamera::Camera APIs are mostly thread-safe, I forgot.\n> \n> >\n> > Another option that was floated around was to use a thread pool in case\n> > one thread per pipeline handler would create too many threads. It's very\n> > theoretical at this point whether or not this would help improve\n> > performance.\n> \n> Yeah true. We have much work to do in this approach though.\n> \n> >\n> > I'm fine keeping a single internal thread in libcamera, but I'd like its\n> > affinity to be managed in a way that doesn't let different pipeline\n> > handlers compete to set different affinities.\n> \n> Definitely :)\n> Let me know if I should prevent setting thread affinity in a thread\n> that's already started.\n\nI was thinking about that. Can I ask you to weight the pros and cons and\nmake a proposal, with a rationale ? Generally speaking, this is what I\nusually expect from patches for libcamera. When multiple options have\nbeen considered, the reason why a particular one was picked is useful to\ndocument, either in the commit message, or the cover letter. This speeds\nup reviews as we then don't have to wonder why particular decisions were\nmade and investigate alternatives.\n\n> > > > > 2. Setting IPA's two new threads [2].\n> > > > >\n> > > > > The first one sets thread affinity to a thread that has already been\n> > > > > started, while the second one set new threads.\n> > > >\n> > > > Out of curiosity, have you measured the difference in performance ? Is\n> > > > it a matter of pinning the threads to big cores to ensure they have\n> > > > enough CPU time ? Or a matter of avoiding latency caused by migration ?\n> > > > Or something else ?\n> > >\n> > > We run CTS, and the current config passes the performance tests with\n> > > sufficient fps and low enough latencies.\n> > >\n> > > Without this functionality though, I think we can still somehow pass CTS\n> > > with more retries.\n> >\n> > It would be interesting to see what kind of time targets are missed\n> > without this change.\n> \n> For example:\n> ```\n> android.camera.cts\n> android.hardware.camera2.cts.RecordingTest#testRecordingFromPersistentSurface[1]\n> FAILURE: junit.framewo\n> rk.AssertionFailedError: Camera 1: Video frame drop rate too high:\n> 6.976744%, tolerance 5.000000%. Video size: 1280x720, expectedDuration\n> [2866.666504,2866.666504], expectedFrameDur\n> ation 33.333332, frameDropCnt 6, frameCount 86\n> ```\n\nWhat I meant is information about why these failures occurred and how\nthis has been debugged. If you don't have that information that's fine,\nsometimes one tries a semi-random change that fixes a problem and moves\non without further analysis of the issue.\n\n> > > If it's controversial, we can skip this for now.\n> >\n> > No, it's not controversial, I'd just like to design it right :-)\n> >\n> > > We can also allow setting thread affinity when the thread hasn't been\n> > > started yet, which at least fits the second use case in mtkisp7.\n> > >\n> > > > > By \"the native pthread API\", you mean `pthread_attr_setaffinity_np`?\n> > > > > This means that we need to implement libcamera::Thread with pthread,\n> > > > > instead of std::thread?\n> > > >\n> > > > Yes. It doesn't mean we have to do that, I'm only asking if you think\n> > > > that would be a good idea, or overkill.\n> > >\n> > > I find it a bit an overkill :)\n> > > The approach below should be much easier.\n> >\n> > Fine with me.\n> >\n> > > > > [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511\n> > > > > [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119\n> > > > >\n> > > > > > Even without the ability to set the affinity before starting the thread,\n> > > > > > we could set it from within the thread before calling the run()\n> > > > > > function. This would avoid the main workload being run on incorrect\n> > > > > > CPUs. Is that something that would be useful ?\n> > > > >\n> > > > > You mean calling `setThreadAffinityInterna()l` in `startThread()`\n> > > > > (before `run()`), instead of at the end of `start()`?\n> > > >\n> > > > Yes. Again, I'm not saing it has to be done, just exploring options.\n> > >\n> > > That makes a lot of sense. It works on ciri as well in my trial.\n> >\n> > Will you submit a new patch with this ?\n> \n> Sure. Uploaded.\n> \n> > > > > > Regardless of which option we implement, Thread::setThreadAffinity()\n> > > > > > should document what happens if it's called before calling\n> > > > > > Thread::start().\n> > > > > >\n> > > > > > > +}\n> > > > > > > +\n> > > > > > >  /**\n> > > > > > >   * \\brief Check if the thread is running\n> > > > > > >   *","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id EB9AABE08B\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 24 Oct 2024 08:31:56 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 0A1A365393;\n\tThu, 24 Oct 2024 10:31:56 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 7A6E7603ED\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 24 Oct 2024 10:31:54 +0200 (CEST)","from pendragon.ideasonboard.com (81-175-209-231.bb.dnainternet.fi\n\t[81.175.209.231])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 47598C8E;\n\tThu, 24 Oct 2024 10:30:06 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"rutAsrZA\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1729758606;\n\tbh=SQOFo8Uahw2Tapt91NEtGDhUnpDjn7DcWZJY/v7pos4=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=rutAsrZA/9pdtVvpDzUFmWGNer+hEVGTsFKJ7nyjtCqZvBwag6Z2gGwDZHdIpj9GN\n\t9ym1MU7FHYh7pNgxcwJSN9V1GkA85xuxivuKLzfcfLJ5pObPRpiQ8jJ8kNTeRZzOVF\n\tU0hg4vOCGda72UmpDOR25KmMb6kVGX3jmAbNrN5Y=","Date":"Thu, 24 Oct 2024 11:31:48 +0300","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Cheng-Hao Yang <chenghaoyang@chromium.org>","Cc":"libcamera-devel@lists.libcamera.org,\n\tHan-Lin Chen <hanlinchen@chromium.org>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","Message-ID":"<20241024083148.GE5873@pendragon.ideasonboard.com>","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>\n\t<20241016220827.GG30496@pendragon.ideasonboard.com>\n\t<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>\n\t<20241022224901.GI9497@pendragon.ideasonboard.com>\n\t<CAEB1ahuXYEY=39WA30ZLjurF90QhWLxOfhp926_eTXs71zAYiA@mail.gmail.com>\n\t<20241023234351.GD5873@pendragon.ideasonboard.com>\n\t<CAEB1ahsA+4CYFQW2k5AWSYRKhisLN6Kxnnx8=r54hUg4n_-yYw@mail.gmail.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<CAEB1ahsA+4CYFQW2k5AWSYRKhisLN6Kxnnx8=r54hUg4n_-yYw@mail.gmail.com>","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":31907,"web_url":"https://patchwork.libcamera.org/comment/31907/","msgid":"<CAEB1ahsayxd+f=bxO=6c4xwAEMqXyVQCfF3m-97091fDbJNFOg@mail.gmail.com>","date":"2024-10-24T13:25:30","subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","submitter":{"id":117,"url":"https://patchwork.libcamera.org/api/people/117/","name":"Cheng-Hao Yang","email":"chenghaoyang@chromium.org"},"content":"Hi Laurent,\n\nOn Thu, Oct 24, 2024 at 4:31 PM Laurent Pinchart\n<laurent.pinchart@ideasonboard.com> wrote:\n>\n> On Thu, Oct 24, 2024 at 04:16:03PM +0800, Cheng-Hao Yang wrote:\n> > On Thu, Oct 24, 2024 at 7:43 AM Laurent Pinchart wrote:\n> > > On Wed, Oct 23, 2024 at 02:46:38PM +0800, Cheng-Hao Yang wrote:\n> > > > On Wed, Oct 23, 2024 at 6:49 AM Laurent Pinchart wrote:\n> > > > > On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:\n> > > > > > Hi Laurent,\n> > > > > >\n> > > > > > I'll update a new version when all changes are applied.\n> > > > >\n> > > > > Yes, let's finalize the discussion first :-)\n> > > > >\n> > > > > > On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:\n> > > > > > > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:\n> > > > > > > > From: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > > > > >\n> > > > > > > > Add method to set thread affinity to Thread class.\n> > > > > > > >\n> > > > > > > > Signed-off-by: Han-Lin Chen <hanlinchen@chromium.org>\n> > > > > > > > Co-developed-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > > > > Signed-off-by: Harvey Yang <chenghaoyang@chromium.org>\n> > > > > > > > ---\n> > > > > > > >  include/libcamera/base/thread.h |  5 ++++\n> > > > > > > >  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++\n> > > > > > > >  2 files changed, 51 insertions(+)\n> > > > > > > >\n> > > > > > > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h\n> > > > > > > > index 4f33de63d..3209d4f7c 100644\n> > > > > > > > --- a/include/libcamera/base/thread.h\n> > > > > > > > +++ b/include/libcamera/base/thread.h\n> > > > > > > > @@ -15,6 +15,7 @@\n> > > > > > > >\n> > > > > > > >  #include <libcamera/base/message.h>\n> > > > > > > >  #include <libcamera/base/signal.h>\n> > > > > > > > +#include <libcamera/base/span.h>\n> > > > > > > >  #include <libcamera/base/utils.h>\n> > > > > > > >\n> > > > > > > >  namespace libcamera {\n> > > > > > > > @@ -35,6 +36,8 @@ public:\n> > > > > > > >       void exit(int code = 0);\n> > > > > > > >       bool wait(utils::duration duration = utils::duration::max());\n> > > > > > > >\n> > > > > > > > +     int setThreadAffinity(const Span<const unsigned int> &cpus);\n> > > > > > > > +\n> > > > > > > >       bool isRunning();\n> > > > > > > >\n> > > > > > > >       Signal<> finished;\n> > > > > > > > @@ -54,6 +57,8 @@ private:\n> > > > > > > >       void startThread();\n> > > > > > > >       void finishThread();\n> > > > > > > >\n> > > > > > > > +     void setThreadAffinityInternal();\n> > > > > > > > +\n> > > > > > > >       void postMessage(std::unique_ptr<Message> msg, Object *receiver);\n> > > > > > > >       void removeMessages(Object *receiver);\n> > > > > > > >\n> > > > > > > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp\n> > > > > > > > index 8735670b8..5dc787dba 100644\n> > > > > > > > --- a/src/libcamera/base/thread.cpp\n> > > > > > > > +++ b/src/libcamera/base/thread.cpp\n> > > > > > > > @@ -9,6 +9,7 @@\n> > > > > > > >\n> > > > > > > >  #include <atomic>\n> > > > > > > >  #include <list>\n> > > > > > > > +#include <optional>\n> > > > > > > >  #include <sys/syscall.h>\n> > > > > > > >  #include <sys/types.h>\n> > > > > > > >  #include <unistd.h>\n> > > > > > > > @@ -128,6 +129,8 @@ private:\n> > > > > > > >       int exitCode_;\n> > > > > > > >\n> > > > > > > >       MessageQueue messages_;\n> > > > > > > > +\n> > > > > > > > +     std::optional<cpu_set_t> cpuset_ = std::nullopt;\n> > > > > > >\n> > > > > > > Is there a need to initialize this to std::nullopt, as that's what the\n> > > > > > > default constructor does ?\n> > > > > >\n> > > > > > Removed.\n> > > > > >\n> > > > > > > >  };\n> > > > > > > >\n> > > > > > > >  /**\n> > > > > > > > @@ -254,6 +257,8 @@ void Thread::start()\n> > > > > > > >       data_->exit_.store(false, std::memory_order_relaxed);\n> > > > > > > >\n> > > > > > > >       thread_ = std::thread(&Thread::startThread, this);\n> > > > > > > > +\n> > > > > > > > +     setThreadAffinityInternal();\n> > > > > > > >  }\n> > > > > > > >\n> > > > > > > >  void Thread::startThread()\n> > > > > > > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)\n> > > > > > > >       return hasFinished;\n> > > > > > > >  }\n> > > > > > > >\n> > > > > > > > +/**\n> > > > > > > > + * \\brief Set the CPU affinity mask of the thread\n> > > > > > > > + * \\param[in] cpus The list of CPU indices that the thread is set affinity to\n> > > > > > > > + *\n> > > > > > > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).\n> > > > > > > > + * If any index is invalid, no cpu indices will be set.\n> > > > > > >\n> > > > > > >  * If any index is invalid, this function won't modify the thread affinity and\n> > > > > > >  * will return an error.\n> > > > > >\n> > > > > > Done.\n> > > > > >\n> > > > > > > > + *\n> > > > > > > > + * \\return 0 if all indices are valid, -EINVAL otherwise\n> > > > > > > > + */\n> > > > > > > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)\n> > > > > > > > +{\n> > > > > > > > +     const unsigned int numCpus = std::thread::hardware_concurrency();\n> > > > > > >\n> > > > > > > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency\n> > > > > > > says:\n> > > > > > >\n> > > > > > >   Returns the number of concurrent threads supported by the\n> > > > > > >   implementation. The value should be considered only a hint.\n> > > > > > >\n> > > > > > >   [snip]\n> > > > > > >\n> > > > > > >   Return value\n> > > > > > >\n> > > > > > >   Number of concurrent threads supported. If the value is not well\n> > > > > > >   defined or not computable, returns 0.\n> > > > > > >\n> > > > > > > libcxx seems to have the following implementation:\n> > > > > > >\n> > > > > > > unsigned thread::hardware_concurrency() noexcept {\n> > > > > > > #if defined(_SC_NPROCESSORS_ONLN)\n> > > > > > >   long result = sysconf(_SC_NPROCESSORS_ONLN);\n> > > > > > >   // sysconf returns -1 if the name is invalid, the option does not exist or\n> > > > > > >   // does not have a definite limit.\n> > > > > > >   // if sysconf returns some other negative number, we have no idea\n> > > > > > >   // what is going on. Default to something safe.\n> > > > > > >   if (result < 0)\n> > > > > > >     return 0;\n> > > > > > >   return static_cast<unsigned>(result);\n> > > > > > > #elif defined(_LIBCPP_WIN32API)\n> > > > > > >   SYSTEM_INFO info;\n> > > > > > >   GetSystemInfo(&info);\n> > > > > > >   return info.dwNumberOfProcessors;\n> > > > > > > #else // defined(CTL_HW) && defined(HW_NCPU)\n> > > > > > >   // TODO: grovel through /proc or check cpuid on x86 and similar\n> > > > > > >   // instructions on other architectures.\n> > > > > > > #  if defined(_LIBCPP_WARNING)\n> > > > > > >   _LIBCPP_WARNING(\"hardware_concurrency not yet implemented\")\n> > > > > > > #  else\n> > > > > > > #    warning hardware_concurrency not yet implemented\n> > > > > > > #  endif\n> > > > > > >   return 0; // Means not computable [thread.thread.static]\n> > > > > > > #endif // defined(CTL_HW) && defined(HW_NCPU)\n> > > > > > > }\n> > > > > > >\n> > > > > > > while libstdc++ uses\n> > > > > > >\n> > > > > > > #if defined(_GLIBCXX_USE_GET_NPROCS)\n> > > > > > > # include <sys/sysinfo.h>\n> > > > > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > > > > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)\n> > > > > > > # define _GLIBCXX_NPROCS pthread_num_processors_np()\n> > > > > > > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)\n> > > > > > > # include <stddef.h>\n> > > > > > > # include <sys/sysctl.h>\n> > > > > > > static inline int get_nprocs()\n> > > > > > > {\n> > > > > > >  int count;\n> > > > > > >  size_t size = sizeof(count);\n> > > > > > >  int mib[] = { CTL_HW, HW_NCPU };\n> > > > > > >  if (!sysctl(mib, 2, &count, &size, NULL, 0))\n> > > > > > >    return count;\n> > > > > > >  return 0;\n> > > > > > > }\n> > > > > > > # define _GLIBCXX_NPROCS get_nprocs()\n> > > > > > > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)\n> > > > > > > # include <unistd.h>\n> > > > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)\n> > > > > > > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)\n> > > > > > > # include <unistd.h>\n> > > > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)\n> > > > > > > #else\n> > > > > > > # define _GLIBCXX_NPROCS 0\n> > > > > > > #endif\n> > > > > > >\n> > > > > > >   unsigned int\n> > > > > > >   thread::hardware_concurrency() noexcept\n> > > > > > >   {\n> > > > > > >     int __n = _GLIBCXX_NPROCS;\n> > > > > > >     if (__n < 0)\n> > > > > > >       __n = 0;\n> > > > > > >     return __n;\n> > > > > > >   }\n> > > > > > >\n> > > > > > > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter\n> > > > > > > is a wrapper around the former). So it seems that, while the C++ API\n> > > > > > > doesn't guarantee that thread::hardware_concurrency() will provide an\n> > > > > > > accurate value, the implementations we have will do the right thing.\n> > > > > > > They're however a bit expensive, as they open a file in /sys, but that's\n> > > > > > > probably the best we can do.\n> > > > > > >\n> > > > > > > > +\n> > > > > > > > +     MutexLocker locker(data_->mutex_);\n> > > > > > > > +     data_->cpuset_ = cpu_set_t();\n> > > > > > > > +     CPU_ZERO(&data_->cpuset_.value());\n> > > > > > > > +\n> > > > > > > > +     for (const unsigned int &cpu : cpus) {\n> > > > > > > > +             if (cpu >= numCpus) {\n> > > > > > > > +                     LOG(Thread, Error) << \"Ignore an invalid cpu index: \" << cpu;\n> > > > > > >\n> > > > > > > It's not ignored anymore\n> > > > > > >\n> > > > > > >                         LOG(Thread, Error)\n> > > > > > >                                 << \"Invalid CPU \" << cpu << \" for thread affinity\";\n> > > > > > >\n> > > > > >\n> > > > > > Right, updated.\n> > > > > >\n> > > > > > > > +                     return -EINVAL;\n> > > > > > > > +             }\n> > > > > > > > +\n> > > > > > > > +             CPU_SET(cpu, &data_->cpuset_.value());\n> > > > > > > > +     }\n> > > > > > > > +\n> > > > > > > > +     if (data_->running_)\n> > > > > > > > +             setThreadAffinityInternal();\n> > > > > > > > +\n> > > > > > > > +     return 0;\n> > > > > > > > +}\n> > > > > > > > +\n> > > > > > > > +void Thread::setThreadAffinityInternal()\n> > > > > > > > +{\n> > > > > > > > +     if (!data_->cpuset_)\n> > > > > > > > +             return;\n> > > > > > > > +\n> > > > > > > > +     const cpu_set_t &cpuset = data_->cpuset_.value();\n> > > > > > > > +     pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);\n> > > > > > >\n> > > > > > > With this implementation, it's not possible to set the thread's CPU\n> > > > > > > affinity before starting the thread. That's a limitation of the C++\n> > > > > > > std::thread API, it could be done if we used the native pthread API\n> > > > > > > instead. Is that a problem ? Could you actually point me to code that\n> > > > > > > uses Thread::setThreadAffinity() ?\n> > > > > >\n> > > > > > There are two main usages in mtkisp7:\n> > > > > > 1. Setting pipeline handler's thread [1].\n> > > > >\n> > > > > Heads up on that one, we probably want to handle this in the libcamera\n> > > > > core. Multiple pipeline handlers can run on the same platform, currently\n> > > > > sharing a single camera manager thread. If they all tried to set thread\n> > > > > affinity, that would be messy. I'd like to explore how we could\n> > > > > coordinate that in a cleaner way.\n> > > >\n> > > > Yes, your concern makes a lot of sense. If we want to process requests\n> > > > on a different CPU core in pipeline handlers, would it be better to create\n> > > > another thread in each pipeline handler? That involves an extra context\n> > > > switch per frame though...\n> > >\n> > > One thread per pipeline handler is an option, yes. I don't think it\n> > > requires a context switch though, we don't have to go from the\n> > > application thread to the camera manager thread to the pipeline handler\n> > > thread, we could bypass the camera manager thread.\n> >\n> > Ah right, libcamera::Camera APIs are mostly thread-safe, I forgot.\n> >\n> > >\n> > > Another option that was floated around was to use a thread pool in case\n> > > one thread per pipeline handler would create too many threads. It's very\n> > > theoretical at this point whether or not this would help improve\n> > > performance.\n> >\n> > Yeah true. We have much work to do in this approach though.\n> >\n> > >\n> > > I'm fine keeping a single internal thread in libcamera, but I'd like its\n> > > affinity to be managed in a way that doesn't let different pipeline\n> > > handlers compete to set different affinities.\n> >\n> > Definitely :)\n> > Let me know if I should prevent setting thread affinity in a thread\n> > that's already started.\n>\n> I was thinking about that. Can I ask you to weight the pros and cons and\n> make a proposal, with a rationale ? Generally speaking, this is what I\n> usually expect from patches for libcamera. When multiple options have\n> been considered, the reason why a particular one was picked is useful to\n> document, either in the commit message, or the cover letter. This speeds\n> up reviews as we then don't have to wonder why particular decisions were\n> made and investigate alternatives.\n\nSure, while I still have some questions on the other approach:\nIf we prevent setting thread affinity in a thread that's already started,\nhow could we set a pipeline handler's owner thread's affinity?\n\nI assume we were discussing having a separate thread for each pipeline\nhandler instance. If this is the case, how could we maintain the lifetime\nof the thread? (Keeping the thread as a member variable of PipelineHandler\ndoesn't seem to work) I guess we need to change the lifetime of\nPipelineHandler as well, which is currently held by Camera::Private's\nshared_ptr.\nOtherwise, we need to have an internal thread in PipelineHandler that's not\nthe PipelineHandler instance (as an Object)'s owner thread.\n\nWDYT?\n\n\n>\n> > > > > > 2. Setting IPA's two new threads [2].\n> > > > > >\n> > > > > > The first one sets thread affinity to a thread that has already been\n> > > > > > started, while the second one set new threads.\n> > > > >\n> > > > > Out of curiosity, have you measured the difference in performance ? Is\n> > > > > it a matter of pinning the threads to big cores to ensure they have\n> > > > > enough CPU time ? Or a matter of avoiding latency caused by migration ?\n> > > > > Or something else ?\n> > > >\n> > > > We run CTS, and the current config passes the performance tests with\n> > > > sufficient fps and low enough latencies.\n> > > >\n> > > > Without this functionality though, I think we can still somehow pass CTS\n> > > > with more retries.\n> > >\n> > > It would be interesting to see what kind of time targets are missed\n> > > without this change.\n> >\n> > For example:\n> > ```\n> > android.camera.cts\n> > android.hardware.camera2.cts.RecordingTest#testRecordingFromPersistentSurface[1]\n> > FAILURE: junit.framewo\n> > rk.AssertionFailedError: Camera 1: Video frame drop rate too high:\n> > 6.976744%, tolerance 5.000000%. Video size: 1280x720, expectedDuration\n> > [2866.666504,2866.666504], expectedFrameDur\n> > ation 33.333332, frameDropCnt 6, frameCount 86\n> > ```\n>\n> What I meant is information about why these failures occurred and how\n> this has been debugged. If you don't have that information that's fine,\n> sometimes one tries a semi-random change that fixes a problem and moves\n> on without further analysis of the issue.\n\nHmm, I think the failures are quite straight-forward to me that the fps is too\nlow that there are more frame drops occurring. The fps is low because the tasks\ntake longer time to be processed.\n\nBut yeah, we didn't spend much time being very sure about that. We used big\ncores and the issues were solved :)\n\nBR,\nHarvey\n\n>\n> > > > If it's controversial, we can skip this for now.\n> > >\n> > > No, it's not controversial, I'd just like to design it right :-)\n> > >\n> > > > We can also allow setting thread affinity when the thread hasn't been\n> > > > started yet, which at least fits the second use case in mtkisp7.\n> > > >\n> > > > > > By \"the native pthread API\", you mean `pthread_attr_setaffinity_np`?\n> > > > > > This means that we need to implement libcamera::Thread with pthread,\n> > > > > > instead of std::thread?\n> > > > >\n> > > > > Yes. It doesn't mean we have to do that, I'm only asking if you think\n> > > > > that would be a good idea, or overkill.\n> > > >\n> > > > I find it a bit an overkill :)\n> > > > The approach below should be much easier.\n> > >\n> > > Fine with me.\n> > >\n> > > > > > [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511\n> > > > > > [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119\n> > > > > >\n> > > > > > > Even without the ability to set the affinity before starting the thread,\n> > > > > > > we could set it from within the thread before calling the run()\n> > > > > > > function. This would avoid the main workload being run on incorrect\n> > > > > > > CPUs. Is that something that would be useful ?\n> > > > > >\n> > > > > > You mean calling `setThreadAffinityInterna()l` in `startThread()`\n> > > > > > (before `run()`), instead of at the end of `start()`?\n> > > > >\n> > > > > Yes. Again, I'm not saing it has to be done, just exploring options.\n> > > >\n> > > > That makes a lot of sense. It works on ciri as well in my trial.\n> > >\n> > > Will you submit a new patch with this ?\n> >\n> > Sure. Uploaded.\n> >\n> > > > > > > Regardless of which option we implement, Thread::setThreadAffinity()\n> > > > > > > should document what happens if it's called before calling\n> > > > > > > Thread::start().\n> > > > > > >\n> > > > > > > > +}\n> > > > > > > > +\n> > > > > > > >  /**\n> > > > > > > >   * \\brief Check if the thread is running\n> > > > > > > >   *\n>\n> --\n> Regards,\n>\n> Laurent Pinchart","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 5D95AC3213\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 24 Oct 2024 13:25:46 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 8197D65393;\n\tThu, 24 Oct 2024 15:25:45 +0200 (CEST)","from mail-lj1-x233.google.com (mail-lj1-x233.google.com\n\t[IPv6:2a00:1450:4864:20::233])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id DF760618C1\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 24 Oct 2024 15:25:43 +0200 (CEST)","by mail-lj1-x233.google.com with SMTP id\n\t38308e7fff4ca-2f7657f9f62so8628651fa.3\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 24 Oct 2024 06:25:43 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"jveMICBZ\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=chromium.org; s=google; t=1729776343; x=1730381143;\n\tdarn=lists.libcamera.org; \n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:from:to:cc:subject:date\n\t:message-id:reply-to;\n\tbh=4J1RtDVi6ybT6C6+NDpDxrJcq6sVJpGNcvxhfwFICX0=;\n\tb=jveMICBZmFhFjr7jizZ8w/c4Wpummv/WoDZqUyUtZhH7xm86lqOn3IEk/WIggKgG8h\n\tlF6eLx69HnSqir9Ht8XQsFWuul/gpwkiMkn86fVryCy4qalJKEURahIB1VbiwXlJ5OFG\n\tYIrLJ+xPl2GhX+rFlFKRkJZrNBCfLUayDhjrg=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20230601; t=1729776343; x=1730381143;\n\th=content-transfer-encoding:cc:to:subject:message-id:date:from\n\t:in-reply-to:references:mime-version:x-gm-message-state:from:to:cc\n\t:subject:date:message-id:reply-to;\n\tbh=4J1RtDVi6ybT6C6+NDpDxrJcq6sVJpGNcvxhfwFICX0=;\n\tb=GH0HTdR52gxTF36SI2UJR0rUV3XUsn2yukIMK3AJqAwFSjwRkRfoW6629AVBtcPUGz\n\t4S42Wvn94keF2BbSM0zXptV3PDz5uyUKun3wBZoo4HmpOprQrw97Ye9naAUemMxJWTfu\n\tnrez1h0P1WWIybkJS4I2tm1Az4Kf58XBHRhCZbjbX9U57Z9ABWtgsVuConpNihSOOAVd\n\t8UT0+ymAG2eihP5KFGm0kTbKSf4MM07X/cnFZUAGh3L0iOUQh08rKwLALj5HM2z5hHQo\n\tU6EaAS2KRPPO+RuLBl9XZyDs7OQTdHEfL2UhbaYlJUGZIrKcpQlXa+LcBFcNrLGlNbXd\n\t8CRw==","X-Gm-Message-State":"AOJu0YwPlb4+36QsDWt6wHIF+3xGyI2KytgQOD8LSFGFF5V19ln1bVR7\n\te7K0idIt1jJTqiznrg1lq41PyG6LU2cB6/9nFr+e2hSxVchtn6j8KvJgsEPmV/hjUlIGJy78Foc\n\tPT4sqLeSWyinUkP78IutFCQ1k/YtqK7vbTAn5V4oqws4dHW4=","X-Google-Smtp-Source":"AGHT+IHKuSDg3vNzel96KvOBlvJKjyu5jwmyNS/4vA9Oj8esrxWAy6Kgl75EznaaZOMdy5Kidv8QrLBXOIXON3DXMlI=","X-Received":"by 2002:a2e:a541:0:b0:2fb:4b0d:9092 with SMTP id\n\t38308e7fff4ca-2fca81c243dmr13269251fa.1.1729776342683;\n\tThu, 24 Oct 2024 06:25:42 -0700 (PDT)","MIME-Version":"1.0","References":"<20241016084449.3566589-1-chenghaoyang@chromium.org>\n\t<20241016084449.3566589-2-chenghaoyang@chromium.org>\n\t<20241016220827.GG30496@pendragon.ideasonboard.com>\n\t<CAEB1ahv=j=5zmi+qTrFMfq1XFM1o2RhATpY8Qd-XueAJoo=f3A@mail.gmail.com>\n\t<20241022224901.GI9497@pendragon.ideasonboard.com>\n\t<CAEB1ahuXYEY=39WA30ZLjurF90QhWLxOfhp926_eTXs71zAYiA@mail.gmail.com>\n\t<20241023234351.GD5873@pendragon.ideasonboard.com>\n\t<CAEB1ahsA+4CYFQW2k5AWSYRKhisLN6Kxnnx8=r54hUg4n_-yYw@mail.gmail.com>\n\t<20241024083148.GE5873@pendragon.ideasonboard.com>","In-Reply-To":"<20241024083148.GE5873@pendragon.ideasonboard.com>","From":"Cheng-Hao Yang <chenghaoyang@chromium.org>","Date":"Thu, 24 Oct 2024 21:25:30 +0800","Message-ID":"<CAEB1ahsayxd+f=bxO=6c4xwAEMqXyVQCfF3m-97091fDbJNFOg@mail.gmail.com>","Subject":"Re: [PATCH v5 1/1] libcamera: add method to set thread affinity","To":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","Cc":"libcamera-devel@lists.libcamera.org, \n\tHan-Lin Chen <hanlinchen@chromium.org>","Content-Type":"text/plain; charset=\"UTF-8\"","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}}]