commit 3911af778f208e5f49d43ce739332b91e26bc48e Author: Peter Wang Date: Mon Jul 15 14:38:31 2024 +0800 scsi: ufs: core: Fix deadlock during RTC update There is a deadlock when runtime suspend waits for the flush of RTC work, and the RTC work calls ufshcd_rpm_get_sync() to wait for runtime resume. Here is deadlock backtrace: kworker/0:1 D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367 ptr f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff __switch_to+0x1a8/0x2d4 __schedule+0x684/0xa98 schedule+0x48/0xc8 schedule_timeout+0x48/0x170 do_wait_for_common+0x108/0x1b0 wait_for_completion+0x44/0x60 __flush_work+0x39c/0x424 __cancel_work_sync+0xd8/0x208 cancel_delayed_work_sync+0x14/0x28 __ufshcd_wl_suspend+0x19c/0x480 ufshcd_wl_runtime_suspend+0x3c/0x1d4 scsi_runtime_suspend+0x78/0xc8 __rpm_callback+0x94/0x3e0 rpm_suspend+0x2d4/0x65c __pm_runtime_suspend+0x80/0x114 scsi_runtime_idle+0x38/0x6c rpm_idle+0x264/0x338 __pm_runtime_idle+0x80/0x110 ufshcd_rtc_work+0x128/0x1e4 process_one_work+0x26c/0x650 worker_thread+0x260/0x3d8 kthread+0x110/0x134 ret_from_fork+0x10/0x20 Skip updating RTC if RPM state is not RPM_ACTIVE. Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: stable@vger.kernel.org # 6.9.x Signed-off-by: Peter Wang Link: https://lore.kernel.org/r/20240715063831.29792-1-peter.wang@mediatek.com Reviewed-by: Bean Huo Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 022587d8aec3da1d1698ddae9fb8cfe35f3ad49c Author: Peter Wang Date: Fri Jul 12 17:45:06 2024 +0800 scsi: ufs: core: Bypass quick recovery if force reset is needed If force_reset is true, bypass quick recovery. This will shorten error recovery time. Signed-off-by: Peter Wang Link: https://lore.kernel.org/r/20240712094506.11284-1-peter.wang@mediatek.com Reviewed-by: Bean Huo Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 0c60eb0cc320fffbb8b10329d276af14f6f5e6bf Author: Kyoungrul Kim Date: Wed Jul 10 08:25:20 2024 +0900 scsi: ufs: core: Check LSDBS cap when !mcq If the user sets use_mcq_mode to 0, the host will try to activate the LSDB mode unconditionally even when the LSDBS of device HCI cap is 1. This makes commands time out and causes device probing to fail. To prevent that problem, check the LSDBS cap when MCQ is not supported. Signed-off-by: Kyoungrul Kim Link: https://lore.kernel.org/r/20240709232520epcms2p8ebdb5c4fccc30a6221390566589bf122@epcms2p8 Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 23cef42d17413d099f44ea42b622fbf23b04646f Author: Zhongqiu Han Date: Fri Jul 5 18:36:14 2024 +0800 scsi: aha152x: Use DECLARE_COMPLETION_ONSTACK for non-constant completion The _ONSTACK variant should be used for on-stack completion, otherwise it will break lockdep. See also commit 6e9a4738c9fa ("[PATCH] completions: lockdep annotate on stack completions"). Signed-off-by: Zhongqiu Han Link: https://lore.kernel.org/r/20240705103614.3650637-1-quic_zhonhan@quicinc.com Signed-off-by: Martin K. Petersen commit 6ca9fede7c73b3c83f4cc8ecb3c29e1d501107c6 Author: Chen Ni Date: Thu Jul 11 08:57:24 2024 +0800 scsi: qla2xxx: Convert comma to semicolon Replace a comma between expression statements by a semicolon. Fixes: d4523bd6fd5d ("scsi: qla2xxx: Refactor asynchronous command initialization") Signed-off-by: Chen Ni Link: https://lore.kernel.org/r/20240711005724.2358446-1-nichen@iscas.ac.cn Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit a1392b19ca59256cda627b0ed04657e8e84702b4 Author: Nilesh Javali Date: Wed Jul 10 22:40:57 2024 +0530 scsi: qla2xxx: Update version to 10.02.09.300-k Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-12-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit c449b4198701d828e40d60a2abd30970b74a1d75 Author: Quinn Tran Date: Wed Jul 10 22:40:56 2024 +0530 scsi: qla2xxx: Use QP lock to search for bsg On bsg timeout, hardware_lock is used as part of search for the srb. Instead, qpair lock should be used to iterate through different qpair. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-11-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit beafd692461443e0fb1d61aa56886bf85ef6f5e4 Author: Quinn Tran Date: Wed Jul 10 22:40:55 2024 +0530 scsi: qla2xxx: Reduce fabric scan duplicate code For fabric scan, current code uses switch scan opcode and flags as the method to iterate through different commands to carry out the process. This makes it hard to read. This patch convert those opcode and flags into steps. In addition, this help reduce some duplicate code. Consolidate routines that handle GPNFT & GNNFT. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-10-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 348744f27a35e087acc9378bf53537fbfb072775 Author: Shreyas Deodhar Date: Wed Jul 10 22:40:54 2024 +0530 scsi: qla2xxx: Fix optrom version displayed in FDMI Bios version was popluated for FDMI response. Systems with EFI would show optrom version as 0. EFI version is populated here and BIOS version is already displayed under FDMI_HBA_BOOT_BIOS_NAME. Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-9-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 76f480d7c717368f29a3870f7d64471ce0ff8fb2 Author: Manish Rangankar Date: Wed Jul 10 22:40:53 2024 +0530 scsi: qla2xxx: During vport delete send async logout explicitly During vport delete, it is observed that during unload we hit a crash because of stale entries in outstanding command array. For all these stale I/O entries, eh_abort was issued and aborted (fast_fail_io = 2009h) but I/Os could not complete while vport delete is in process of deleting. BUG: kernel NULL pointer dereference, address: 000000000000001c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI Workqueue: qla2xxx_wq qla_do_work [qla2xxx] RIP: 0010:dma_direct_unmap_sg+0x51/0x1e0 RSP: 0018:ffffa1e1e150fc68 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000021 RCX: 0000000000000001 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff8ce208a7a0d0 RBP: ffff8ce208a7a0d0 R08: 0000000000000000 R09: ffff8ce378aac9c8 R10: ffff8ce378aac8a0 R11: ffffa1e1e150f9d8 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8ce378aac9c8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8d217f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001c CR3: 0000002089acc000 CR4: 0000000000350ee0 Call Trace: qla2xxx_qpair_sp_free_dma+0x417/0x4e0 ? qla2xxx_qpair_sp_compl+0x10d/0x1a0 ? qla2x00_status_entry+0x768/0x2830 ? newidle_balance+0x2f0/0x430 ? dequeue_entity+0x100/0x3c0 ? qla24xx_process_response_queue+0x6a1/0x19e0 ? __schedule+0x2d5/0x1140 ? qla_do_work+0x47/0x60 ? process_one_work+0x267/0x440 ? process_one_work+0x440/0x440 ? worker_thread+0x2d/0x3d0 ? process_one_work+0x440/0x440 ? kthread+0x156/0x180 ? set_kthread_struct+0x50/0x50 ? ret_from_fork+0x22/0x30 Send out async logout explicitly for all the ports during vport delete. Cc: stable@vger.kernel.org Signed-off-by: Manish Rangankar Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-8-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 4475afa2646d3fec176fc4d011d3879b26cb26e3 Author: Shreyas Deodhar Date: Wed Jul 10 22:40:52 2024 +0530 scsi: qla2xxx: Complete command early within lock A crash was observed while performing NPIV and FW reset, BUG: kernel NULL pointer dereference, address: 000000000000001c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 1 PREEMPT_RT SMP NOPTI RIP: 0010:dma_direct_unmap_sg+0x51/0x1e0 RSP: 0018:ffffc90026f47b88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000021 RCX: 0000000000000002 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff8881041130d0 RBP: ffff8881041130d0 R08: 0000000000000000 R09: 0000000000000034 R10: ffffc90026f47c48 R11: 0000000000000031 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8881565e4a20 R15: 0000000000000000 FS: 00007f4c69ed3d00(0000) GS:ffff889faac80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001c CR3: 0000000288a50002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? __die_body+0x1a/0x60 ? page_fault_oops+0x16f/0x4a0 ? do_user_addr_fault+0x174/0x7f0 ? exc_page_fault+0x69/0x1a0 ? asm_exc_page_fault+0x22/0x30 ? dma_direct_unmap_sg+0x51/0x1e0 ? preempt_count_sub+0x96/0xe0 qla2xxx_qpair_sp_free_dma+0x29f/0x3b0 [qla2xxx] qla2xxx_qpair_sp_compl+0x60/0x80 [qla2xxx] __qla2x00_abort_all_cmds+0xa2/0x450 [qla2xxx] The command completion was done early while aborting the commands in driver unload path but outside lock to avoid the WARN_ON condition of performing dma_free_attr within the lock. However this caused race condition while command completion via multiple paths causing system crash. Hence complete the command early in unload path but within the lock to avoid race condition. Fixes: 0367076b0817 ("scsi: qla2xxx: Perform lockless command completion in abort path") Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-7-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 29e222085d8907ccff18ecd931bdd4c6b1f11b92 Author: Quinn Tran Date: Wed Jul 10 22:40:51 2024 +0530 scsi: qla2xxx: Fix flash read failure Link up failure is observed as a result of flash read failure. Current code does not check flash read return code where it relies on FW checksum to detect the problem. Add check of flash read failure to detect the problem sooner. Reported-by: kernel test robot Reported-by: Dan Carpenter Closes: https://lore.kernel.org/all/202406210815.rPDRDMBi-lkp@intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-6-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit ce2065c4cc4f05635413f63f6dc038d7d4842e31 Author: Saurav Kashyap Date: Wed Jul 10 22:40:50 2024 +0530 scsi: qla2xxx: Return ENOBUFS if sg_cnt is more than one for ELS cmds Firmware only supports single DSDs in ELS Pass-through IOCB (0x53h), sg cnt is decided by the SCSI ML. User is not aware of the cause of an acutal error. Return the appropriate return code that will be decoded by API and application and proper error message will be displayed to user. Fixes: 6e98016ca077 ("[SCSI] qla2xxx: Re-organized BSG interface specific code.") Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-5-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit c03d740152f78e86945a75b2ad541bf972fab92a Author: Shreyas Deodhar Date: Wed Jul 10 22:40:49 2024 +0530 scsi: qla2xxx: Fix for possible memory corruption Init Control Block is dereferenced incorrectly. Correctly dereference ICB Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-4-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit eb1d4ce2609584eeb7694866f34d4b213caa3af9 Author: Nilesh Javali Date: Wed Jul 10 22:40:48 2024 +0530 scsi: qla2xxx: validate nvme_local_port correctly The driver load failed with error message, qla2xxx [0000:04:00.0]-ffff:0: register_localport failed: ret=ffffffef and with a kernel crash, BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 Workqueue: events_unbound qla_register_fcport_fn [qla2xxx] RIP: 0010:nvme_fc_register_remoteport+0x16/0x430 [nvme_fc] RSP: 0018:ffffaaa040eb3d98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9dfb46b78c00 RCX: 0000000000000000 RDX: ffff9dfb46b78da8 RSI: ffffaaa040eb3e08 RDI: 0000000000000000 RBP: ffff9dfb612a0a58 R08: ffffffffaf1d6270 R09: 3a34303a30303030 R10: 34303a303030305b R11: 2078787832616c71 R12: ffff9dfb46b78dd4 R13: ffff9dfb46b78c24 R14: ffff9dfb41525300 R15: ffff9dfb46b78da8 FS: 0000000000000000(0000) GS:ffff9dfc67c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000070 CR3: 000000018da10004 CR4: 00000000000206f0 Call Trace: qla_nvme_register_remote+0xeb/0x1f0 [qla2xxx] ? qla2x00_dfs_create_rport+0x231/0x270 [qla2xxx] qla2x00_update_fcport+0x2a1/0x3c0 [qla2xxx] qla_register_fcport_fn+0x54/0xc0 [qla2xxx] Exit the qla_nvme_register_remote() function when qla_nvme_register_hba() fails and correctly validate nvme_local_port. Cc: stable@vger.kernel.org Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-3-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit c3d98b12eef8db436e32f1a8c5478be57dc15621 Author: Quinn Tran Date: Wed Jul 10 22:40:47 2024 +0530 scsi: qla2xxx: Unable to act on RSCN for port online The device does not come online when the target port is online. There were multiple RSCNs indicating multiple devices were affected. Driver is in the process of finishing a fabric scan. A new RSCN (device up) arrived at the tail end of the last fabric scan. Driver mistakenly thinks the new RSCN is being taken care of by the previous fabric scan, where this notification is cleared and not acted on. The laser needs to be blinked again to get the device to show up. To prevent driver from accidentally clearing the RSCN notification, each RSCN is given a generation value. A fabric scan will scan for that generation(s). Any new RSCN arrive after the scan start will have a new generation value. This will trigger another scan to get latest data. The RSCN notification flag will be cleared when the scan is associate to that generation. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202406210538.w875N70K-lkp@intel.com/ Fixes: bb2ca6b3f09a ("scsi: qla2xxx: Relogin during fabric disturbance") Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240710171057.35066-2-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit c96499fcb403b18b0ae0bf2e19a39f24e62dd3ac Author: Eric Biggers Date: Mon Jul 8 16:53:30 2024 -0700 scsi: ufs: exynos: Add support for Flash Memory Protector (FMP) Add support for Flash Memory Protector (FMP), which is the inline encryption hardware on Exynos and Exynos-based SoCs. Specifically, add support for the "traditional FMP mode" that works on many Exynos-based SoCs including gs101. This is the mode that uses "software keys" and is compatible with the upstream kernel's existing inline encryption framework in the block and filesystem layers. I plan to add support for the wrapped key support on gs101 at a later time. Tested on gs101 (specifically Pixel 6) by running the 'encrypt' group of xfstests on a filesystem mounted with the 'inlinecrypt' mount option. Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20240708235330.103590-7-ebiggers@kernel.org Reviewed-by: Peter Griffin Tested-by: Peter Griffin Reviewed-by: Alim Akhtar Signed-off-by: Martin K. Petersen commit 4c45dba50a3750a0834353c4187e7896b158bc0c Author: Eric Biggers Date: Mon Jul 8 16:53:29 2024 -0700 scsi: ufs: core: Add UFSHCD_QUIRK_KEYS_IN_PRDT Since the nonstandard inline encryption support on Exynos SoCs requires that raw cryptographic keys be copied into the PRDT, it is desirable to zeroize those keys after each request to keep them from being left in memory. Therefore, add a quirk bit that enables the zeroization. We could instead do the zeroization unconditionally. However, using a quirk bit avoids adding the zeroization overhead to standard devices. Reviewed-by: Bart Van Assche Reviewed-by: Peter Griffin Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20240708235330.103590-6-ebiggers@kernel.org Reviewed-by: Alim Akhtar Signed-off-by: Martin K. Petersen commit 8ecea3da1567e0648b5d37a6faec73fc9c8571ba Author: Eric Biggers Date: Mon Jul 8 16:53:28 2024 -0700 scsi: ufs: core: Add fill_crypto_prdt variant op Add a variant op to allow host drivers to initialize nonstandard crypto-related fields in the PRDT. This is needed to support inline encryption on the "Exynos" UFS controller. Note that this will be used together with the support for overriding the PRDT entry size that was already added by commit ada1e653a5ea ("scsi: ufs: core: Allow UFS host drivers to override the sg entry size"). Reviewed-by: Bart Van Assche Reviewed-by: Peter Griffin Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20240708235330.103590-5-ebiggers@kernel.org Reviewed-by: Alim Akhtar Signed-off-by: Martin K. Petersen commit e95881e0081a30e132b5ca087f1e07fc08608a7e Author: Eric Biggers Date: Mon Jul 8 16:53:27 2024 -0700 scsi: ufs: core: Add UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE Add UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE which tells the UFS core to not use the crypto enable bit defined by the UFS specification. This is needed to support inline encryption on the "Exynos" UFS controller. Reviewed-by: Bart Van Assche Reviewed-by: Peter Griffin Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20240708235330.103590-4-ebiggers@kernel.org Reviewed-by: Alim Akhtar Signed-off-by: Martin K. Petersen commit ec99818afb03b1ebeb0b6ed0d5fd42143be79586 Author: Eric Biggers Date: Mon Jul 8 16:53:26 2024 -0700 scsi: ufs: core: fold ufshcd_clear_keyslot() into its caller Fold ufshcd_clear_keyslot() into its only remaining caller. Reviewed-by: Bart Van Assche Reviewed-by: Peter Griffin Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20240708235330.103590-3-ebiggers@kernel.org Reviewed-by: Alim Akhtar Signed-off-by: Martin K. Petersen commit c2a90eee29f41630225c9a64d26c425e1d50b401 Author: Eric Biggers Date: Mon Jul 8 16:53:25 2024 -0700 scsi: ufs: core: Add UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE Add UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE which lets UFS host drivers initialize the blk_crypto_profile themselves rather than have it be initialized by ufshcd-core according to the UFSHCI standard. This is needed to support inline encryption on the "Exynos" UFS controller which has a nonstandard interface. Reviewed-by: Bart Van Assche Reviewed-by: Peter Griffin Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20240708235330.103590-2-ebiggers@kernel.org Reviewed-by: Alim Akhtar Signed-off-by: Martin K. Petersen commit af568c7e8292bd96f5a4f3d81dcc099bcfe7cb73 Author: Bart Van Assche Date: Mon Jul 8 14:16:05 2024 -0700 scsi: ufs: mcq: Make .get_hba_mac() optional UFSHCI controllers that are compliant with the UFSHCI 4.0 standard report the maximum number of supported commands in the controller capabilities register. Use that value if .get_hba_mac == NULL. Reviewed-by: Peter Wang Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-11-bvanassche@acm.org Reviewed-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen commit 5e2053a41984f6086460a6b377a53af46ebe2b0f Author: Bart Van Assche Date: Mon Jul 8 14:16:04 2024 -0700 scsi: ufs: mcq: Inline ufshcd_mcq_vops_get_hba_mac() Make ufshcd_mcq_decide_queue_depth() easier to read by inlining ufshcd_mcq_vops_get_hba_mac(). Reviewed-by: Peter Wang Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-10-bvanassche@acm.org Reviewed-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen commit 7e2c268dc3069b3d51a6dcd035d9b2ed3e6b1676 Author: Bart Van Assche Date: Mon Jul 8 14:16:03 2024 -0700 scsi: ufs: mcq: Move the ufshcd_mcq_enable() call Move the ufshcd_mcq_enable() call from inside ufshcd_config_mcq() to the callers of this function. No functionality is changed by this patch. This patch makes a later patch easier to read ("scsi: ufs: Make .get_hba_mac() optional"). Cc: Peter Wang Cc: Manivannan Sadhasivam Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-9-bvanassche@acm.org Reviewed-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen commit 4a8c859b44da0040a02fb12aec3384f6279c17d7 Author: Bart Van Assche Date: Mon Jul 8 14:16:02 2024 -0700 scsi: ufs: mcq: Move the "hba->mcq_enabled = true" assignment Move the "hba->mcq_enabled = true" assignment to prevent that it gets duplicated by a later patch that will introduce more ufshcd_mcq_enable() calls. No functionality is changed by this patch. Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-8-bvanassche@acm.org Reviewed-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen commit 0fca3318e550d48e9a6c844763bf9eab91c30f07 Author: Bart Van Assche Date: Mon Jul 8 14:16:01 2024 -0700 scsi: ufs: core: Inline is_mcq_enabled() Improve code readability by inlining is_mcq_enabled(). Cc: Peter Wang Cc: Manivannan Sadhasivam Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-7-bvanassche@acm.org Reviewed-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen commit f4750af7081d0466c187f5b72272d07148d75e67 Author: Bart Van Assche Date: Mon Jul 8 14:16:00 2024 -0700 scsi: ufs: core: Initialize hba->reserved_slot earlier Move the hba->reserved_slot and the host->can_queue assignments from ufshcd_config_mcq() into ufshcd_alloc_mcq(). The advantages of this change are as follows: - It becomes easier to verify that these two parameters are updated if hba->nutrs is updated. - It prevents unnecessary assignments to these two parameters. While ufshcd_config_mcq() is called during host reset, ufshcd_alloc_mcq() is not. Cc: Can Guo Reviewed-by: Peter Wang Reviewed-by: Manivannan Sadhasivam Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-6-bvanassche@acm.org Signed-off-by: Martin K. Petersen commit b53eb9a050d76fc695258e04384e71b3fbb1e7d5 Author: Bart Van Assche Date: Mon Jul 8 14:15:59 2024 -0700 scsi: ufs: core: Rename the MASK_TRANSFER_REQUESTS_SLOTS constant Rename this constant to prepare for the introduction of the MASK_TRANSFER_REQUESTS_SLOTS_MCQ constant. The acronym "SDB" stands for "single doorbell" (mode). Reviewed-by: Peter Wang Reviewed-by: Manivannan Sadhasivam Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-5-bvanassche@acm.org Signed-off-by: Martin K. Petersen commit 92c0b10fefe2cdae4e57ad5ff023345ea4a7134d Author: Bart Van Assche Date: Mon Jul 8 14:15:58 2024 -0700 scsi: ufs: core: Remove two constants The SCSI host template members .cmd_per_lun and .can_queue are copied into the SCSI host data structure. Before these are used, these are overwritten by ufshcd_init(). Hence, this patch does not change any functionality. Reviewed-by: Manivannan Sadhasivam Reviewed-by: Keoseong Park Reviewed-by: Peter Wang Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-4-bvanassche@acm.org Signed-off-by: Martin K. Petersen commit 93ef12d92f65facf8e02ad6578a898a190aa6cf4 Author: Bart Van Assche Date: Mon Jul 8 14:15:57 2024 -0700 scsi: ufs: core: Initialize struct uic_command once Instead of first zero-initializing struct uic_command and next initializing it memberwise, initialize all members at once. Reviewed-by: Daejun Park Reviewed-by: Avri Altman Reviewed-by: Manivannan Sadhasivam Reviewed-by: Peter Wang Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-3-bvanassche@acm.org Signed-off-by: Martin K. Petersen commit d502dac69ac0a81231bb41f0e4e2845d7414191b Author: Bart Van Assche Date: Mon Jul 8 14:15:56 2024 -0700 scsi: ufs: core: Declare functions once Several functions are declared in include/ufs/ufshcd.h and also in drivers/ufs/core/ufshcd-priv.h. Remove the duplicate declarations. Reviewed-by: Peter Wang Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240708211716.2827751-2-bvanassche@acm.org Reviewed-by: Manivannan Sadhasivam Reviewed-by: Keoseong Park Signed-off-by: Martin K. Petersen commit cf82b9e866b644d5eca38b55403e1efa8f1cc8da Author: Sumit Saxena Date: Thu Jun 27 15:47:35 2024 +0530 scsi: mpi3mr: Driver version update Signed-off-by: Sumit Saxena Link: https://lore.kernel.org/r/20240627101735.18286-4-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen commit 1c342b0548e3f53a4478f8d7bcb5af34b361787f Author: Sumit Saxena Date: Thu Jun 27 15:47:34 2024 +0530 scsi: mpi3mr: Prevent PCI writes from driver during PCI error recovery Prevent interaction with the hardware while the error recovery in progress. Co-developed-by: Sathya Prakash Signed-off-by: Sathya Prakash Co-developed-by: Ranjan Kumar Signed-off-by: Ranjan Kumar Signed-off-by: Sumit Saxena Link: https://lore.kernel.org/r/20240627101735.18286-3-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen commit 30bafe1774f0aa17d3247466a2d51ce1351c3331 Author: Sumit Saxena Date: Thu Jun 27 15:47:33 2024 +0530 scsi: mpi3mr: Support PCI Error Recovery callback handlers PCI Error recovery support is required to recover the controller upon detection of PCI errors. Add support for the PCI error recovery callback handlers in mpi3mr driver. Co-developed-by: Sathya Prakash Signed-off-by: Sathya Prakash Co-developed-by: Ranjan Kumar Signed-off-by: Ranjan Kumar Signed-off-by: Sumit Saxena Link: https://lore.kernel.org/r/20240627101735.18286-2-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen commit 41972df1a56bd926a14d9670936f5278cf351968 Author: Justin Tee Date: Fri Jun 28 10:20:11 2024 -0700 scsi: lpfc: Update lpfc version to 14.4.0.3 Update lpfc version to 14.4.0.3. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 8bc7c617642db6d8d20ee671fb6c4513017e7a7e Author: Justin Tee Date: Fri Jun 28 10:20:10 2024 -0700 scsi: lpfc: Revise lpfc_prep_embed_io routine with proper endian macro usages On big endian architectures, it is possible to run into a memory out of bounds pointer dereference when FCP targets are zoned. In lpfc_prep_embed_io, the memcpy(ptr, fcp_cmnd, sgl->sge_len) is referencing a little endian formatted sgl->sge_len value. So, the memcpy can cause big endian systems to crash. Redefine the *sgl ptr as a struct sli4_sge_le to make it clear that we are referring to a little endian formatted data structure. And, update the routine with proper le32_to_cpu macro usages. Fixes: af20bb73ac25 ("scsi: lpfc: Add support for 32 byte CDBs") Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit f65f31ac120bd2c308bf16b18e5ee74445c06446 Author: Justin Tee Date: Fri Jun 28 10:20:09 2024 -0700 scsi: lpfc: Fix incorrect request len mbox field when setting trunking via sysfs When setting trunk modes through sysfs, the SLI_CONFIG mailbox command's command payload length is incorrectly hardcoded to 12 bytes. SLI_CONFIG's payload length field should be specified large enough to encompass both the submailbox command header and the submailbox request itself. Thus, replace the hardcoded 12 bytes with a clearer calculation by way of sizeof(struct lpfc_mbx_set_trunk_mode) - sizeof(struct lpfc_sli4_cfg_mhdr). Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit ede596b1434b57c0b3fd5c02b326efe5c54f6e48 Author: Justin Tee Date: Fri Jun 28 10:20:08 2024 -0700 scsi: lpfc: Handle mailbox timeouts in lpfc_get_sfp_info The MBX_TIMEOUT return code is not handled in lpfc_get_sfp_info and the routine unconditionally frees submitted mailbox commands regardless of return status. The issue is that for MBX_TIMEOUT cases, when firmware returns SFP information at a later time, that same mailbox memory region references previously freed memory in its cmpl routine. Fix by adding checks for the MBX_TIMEOUT return code. During mailbox resource cleanup, check the mbox flag to make sure that the wait did not timeout. If the MBOX_WAKE flag is not set, then do not free the resources because it will be freed when firmware completes the mailbox at a later time in its cmpl routine. Also, increase the timeout from 30 to 60 seconds to accommodate boot scripts requiring longer timeouts. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 15e21dc6d6b7d7feb6b0262422b45da5e3c4b34f Author: Justin Tee Date: Fri Jun 28 10:20:07 2024 -0700 scsi: lpfc: Fix handling of fully recovered fabric node in dev_loss callbk In rare cases when a fabric node is recovered after a link bounce and before dev_loss_tmo callbk is reached, the driver may leave the fabric node in an inconsistent state with the NLP_IN_DEV_LOSS flag perpetually set. In lpfc_dev_loss_tmo_callbk, a check is added for a recovered fabric node. If the node is recovered, then don't queue the lpfc_dev_loss_tmo_handler work. In lpfc_dev_loss_tmo_handler, the path taken for the recovered fabric nodes is updated to clear the NLP_IN_DEV_LOSS flag. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit aeaf117cc7d28b994652491c87b929d853519d20 Author: Justin Tee Date: Fri Jun 28 10:20:06 2024 -0700 scsi: lpfc: Relax PRLI issue conditions after GID_FT response If previously in REG_LOGIN_ISSUE state, then remove the requirement that PLOGI must have been received from the remote port before issuing a PRLI. After GID_FT completes, it does not matter whether the driver itself sent a PLOGI or received one. The fact that we're in REG_LOGIN_ISSUE state simply means that the next state should be issuing the PRLI to continue discovery of the remote port. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 9609385dd91b26751019b22ca9bfa4bec7602ae1 Author: Justin Tee Date: Fri Jun 28 10:20:05 2024 -0700 scsi: lpfc: Allow DEVICE_RECOVERY mode after RSCN receipt if in PRLI_ISSUE state Certain vendor specific targets initially register with the fabric as an initiator function first and then re-register as a target function afterwards. The timing of the target function re-registration can cause a race condition such that the driver is stuck assuming the remote port as an initiator function and never discovers the target's hosted LUNs. Expand the nlp_state qualifier to also include NLP_STE_PRLI_ISSUE because the state means that PRLI was issued but we have not quite reached MAPPED_NODE state yet. If we received an RSCN in the PRLI_ISSUE state, then we should restart discovery again by going into DEVICE_RECOVERY. Fixes: dded1dc31aa4 ("scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN") Cc: # v6.6+ Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit e999ef15423b35473c0d94ff3dd2011a0f7ddb04 Author: Justin Tee Date: Fri Jun 28 10:20:04 2024 -0700 scsi: lpfc: Cancel ELS WQE instead of issuing abort when SLI port is inactive During SLI port errata events, there should be no expectation that submitted outstanding WQEs will return back CQEs. In these situations, the driver should not rely on receiving CQEs from the SLI port to signal WQE resource clean up. Put an sli_flag LPFC_SLI_ACTIVE check in lpfc_els_flush_cmd() when walking the txcmplq. The sli_flag check helps determine whether to issue an abort or driver based cancel on outstanding WQEs. If !LPFC_SLI_ACTIVE, then there's no point to issue anything to the SLI port. Instead, let the driver based cancel logic clean up the submitted WQE resources. Also, enhance some abort log messages that help with future debugging. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240628172011.25921-2-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 7a6bbc2829d4ab592c7e440a6f6f5deb3cd95db4 Author: Damien Le Moal Date: Tue Jul 2 06:53:26 2024 +0900 scsi: sd: Do not repeat the starting disk message The SCSI disk message "Starting disk" to signal resuming of a suspended disk is printed in both sd_resume() and sd_resume_common() which results in this message being printed twice when resuming from e.g. autosuspend: $ echo 5000 > /sys/block/sda/device/power/autosuspend_delay_ms $ echo auto > /sys/block/sda/device/power/control [ 4962.438293] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 4962.501121] sd 0:0:0:0: [sda] Stopping disk $ echo on > /sys/block/sda/device/power/control [ 4972.805851] sd 0:0:0:0: [sda] Starting disk [ 4980.558806] sd 0:0:0:0: [sda] Starting disk Fix this double print by removing the call to sd_printk() from sd_resume() and moving the call to sd_printk() in sd_resume_common() earlier in the function, before the check using sd_do_start_stop(). Doing so, the message is printed once regardless if sd_resume_common() actually executes sd_start_stop_device() (i.e. SCSI device case) or not (libsas and libata managed ATA devices case). Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Link: https://lore.kernel.org/r/20240701215326.128067-1-dlemoal@kernel.org Reviewed-by: Bart Van Assche Reviewed-by: John Garry Signed-off-by: Martin K. Petersen commit 74736103fb4123c71bf11fb7a6abe7c884c5269e Author: Peter Wang Date: Fri Jun 28 15:00:30 2024 +0800 scsi: ufs: core: Fix ufshcd_abort_one racing issue When ufshcd_abort_one is racing with the completion ISR, the completed tag of the request's mq_hctx pointer will be set to NULL by ISR. Return success when request is completed by ISR because ufshcd_abort_one does not need to do anything. The racing flow is: Thread A ufshcd_err_handler step 1 ... ufshcd_abort_one ufshcd_try_to_abort_task ufshcd_cmd_inflight(true) step 3 ufshcd_mcq_req_to_hwq blk_mq_unique_tag rq->mq_hctx->queue_num step 5 Thread B ufs_mtk_mcq_intr(cq complete ISR) step 2 scsi_done ... __blk_mq_free_request rq->mq_hctx = NULL; step 4 Below is KE back trace. ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device. ufshcd_try_to_abort_task: cmd at tag=41 is cleared. Aborting tag 41 / CDB 0x28 succeeded Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194 pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14 lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise] do_mem_abort+0x58/0x118 el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c blk_mq_unique_tag+0x8/0x14 ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise] process_one_work+0x208/0x4fc worker_thread+0x228/0x438 kthread+0x104/0x1d4 ret_from_fork+0x10/0x20 Fixes: 93e6c0e19d5b ("scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode") Suggested-by: Bart Van Assche Signed-off-by: Peter Wang Link: https://lore.kernel.org/r/20240628070030.30929-3-peter.wang@mediatek.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 9307a998cb9846a2557fdca286997430bee36a2a Author: Peter Wang Date: Fri Jun 28 15:00:29 2024 +0800 scsi: ufs: core: Fix ufshcd_clear_cmd racing issue When ufshcd_clear_cmd is racing with the completion ISR, the completed tag of the request's mq_hctx pointer will be set to NULL by the ISR. And ufshcd_clear_cmd's call to ufshcd_mcq_req_to_hwq will get NULL pointer KE. Return success when the request is completed by ISR because sq does not need cleanup. The racing flow is: Thread A ufshcd_err_handler step 1 ufshcd_try_to_abort_task ufshcd_cmd_inflight(true) step 3 ufshcd_clear_cmd ... ufshcd_mcq_req_to_hwq blk_mq_unique_tag rq->mq_hctx->queue_num step 5 Thread B ufs_mtk_mcq_intr(cq complete ISR) step 2 scsi_done ... __blk_mq_free_request rq->mq_hctx = NULL; step 4 Below is KE back trace: ufshcd_try_to_abort_task: cmd pending in the device. tag = 6 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194 pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14 lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise] Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise] Call trace: dump_backtrace+0xf8/0x148 show_stack+0x18/0x24 dump_stack_lvl+0x60/0x7c dump_stack+0x18/0x3c mrdump_common_die+0x24c/0x398 [mrdump] ipanic_die+0x20/0x34 [mrdump] notify_die+0x80/0xd8 die+0x94/0x2b8 __do_kernel_fault+0x264/0x298 do_page_fault+0xa4/0x4b8 do_translation_fault+0x38/0x54 do_mem_abort+0x58/0x118 el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c blk_mq_unique_tag+0x8/0x14 ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise] ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise] ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise] process_one_work+0x208/0x4fc worker_thread+0x228/0x438 kthread+0x104/0x1d4 ret_from_fork+0x10/0x20 Fixes: 8d7290348992 ("scsi: ufs: mcq: Add supporting functions for MCQ abort") Suggested-by: Bart Van Assche Signed-off-by: Peter Wang Link: https://lore.kernel.org/r/20240628070030.30929-2-peter.wang@mediatek.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 76a20140ef768133697c04dc19bf7e81706905c7 Author: Terrence Adams Date: Thu Jun 27 15:59:24 2024 +0000 scsi: pm8001: Update log level when reading config table Reading the main config table occurs as a part of initialization in pm80xx_chip_init(). Because of this it makes more sense to have it be a part of the INIT logging. Signed-off-by: Terrence Adams Link: https://lore.kernel.org/r/20240627155924.2361370-3-tadamsjr@google.com Acked-by: Jack Wang Signed-off-by: Martin K. Petersen commit e4f949ef1516c0d74745ee54a0f4882c1f6c7aea Author: Igor Pylypiv Date: Thu Jun 27 15:59:23 2024 +0000 scsi: pm80xx: Set phy->enable_completion only when we wait for it pm8001_phy_control() populates the enable_completion pointer with a stack address, sends a PHY_LINK_RESET / PHY_HARD_RESET, waits 300 ms, and returns. The problem arises when a phy control response comes late. After 300 ms the pm8001_phy_control() function returns and the passed enable_completion stack address is no longer valid. Late phy control response invokes complete() on a dangling enable_completion pointer which leads to a kernel crash. Signed-off-by: Igor Pylypiv Signed-off-by: Terrence Adams Link: https://lore.kernel.org/r/20240627155924.2361370-2-tadamsjr@google.com Acked-by: Jack Wang Signed-off-by: Martin K. Petersen commit 7cbff570dbe8907e23bba06f6414899a0fbb2fcc Author: Kyoungrul Kim Date: Thu Jun 27 17:51:04 2024 +0900 scsi: ufs: core: Remove SCSI host only if added If host tries to remove ufshcd driver from a UFS device it would cause a kernel panic if ufshcd_async_scan fails during ufshcd_probe_hba before adding a SCSI host with scsi_add_host and MCQ is enabled since SCSI host has been defered after MCQ configuration introduced by commit 0cab4023ec7b ("scsi: ufs: core: Defer adding host to SCSI if MCQ is supported"). To guarantee that SCSI host is removed only if it has been added, set the scsi_host_added flag to true after adding a SCSI host and check whether it is set or not before removing it. Signed-off-by: Kyoungrul Kim Signed-off-by: Minwoo Im Link: https://lore.kernel.org/r/20240627085104epcms2p5897a3870ea5c6416aa44f94df6c543d7@epcms2p5 Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit ed7dac86f1406d73aed21d0cd1563922031a2fd8 Author: Ram Prakash Gupta Date: Thu Jun 27 14:07:56 2024 +0530 scsi: ufs: qcom: Enable suspending clk scaling on no request Enable suspending clk scaling on no request for Qualcomm SoC. Signed-off-by: Ram Prakash Gupta Link: https://lore.kernel.org/r/20240627083756.25340-3-quic_rampraka@quicinc.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 50183ac2cfb54e027dd36fb22ea1bd1e91e3a08b Author: Ram Prakash Gupta Date: Thu Jun 27 14:07:55 2024 +0530 scsi: ufs: core: Suspend clk scaling on no request Currently UFS clk scaling is getting suspended only when the clks are scaled down. When high load is generated, a huge amount of latency is added due to scaling up the clk and completing the request post that. Suspending the scaling in its existing state when high load is generated improves the random performance KPI by 28%. So suspending the scaling when there are no requests. And the clk would be put in low scaled state when the actual request load is low. Make this change optional by having the check enabled using vops since for some devices suspending without bringing the clk in low scaled state might have impact on power consumption of the SoC. Signed-off-by: Ram Prakash Gupta Link: https://lore.kernel.org/r/20240627083756.25340-2-quic_rampraka@quicinc.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit de24085328c09e5a6dd8a8d95ad65876f097e9c1 Author: Tomas Henzl Date: Thu Jun 27 09:48:27 2024 +0200 scsi: mpi3mr: Correct a test in mpi3mr_sas_port_add() The test for a possible shift overflow is not correct. Fix it by replacing the '>' with a '>='. Signed-off-by: Tomas Henzl Link: https://lore.kernel.org/r/20240627074827.13672-1-thenzl@redhat.com Suggested-by: Dan Carpenter Signed-off-by: Martin K. Petersen commit 3f7e469987f85a16d968ae056c886ecb6207305f Author: Ranjan Kumar Date: Wed Jun 26 15:56:46 2024 +0530 scsi: mpi3mr: Update driver version to 8.9.1.0.50 Update driver version to 8.9.1.0.50 Signed-off-by: Ranjan Kumar Link: https://lore.kernel.org/r/20240626102646.14298-5-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen commit 78b506984ebeaefaf63172059422fc606dc23e68 Author: Ranjan Kumar Date: Wed Jun 26 15:56:45 2024 +0530 scsi: mpi3mr: Add ioctl support for HDB Add interface for applications to manage the host diagnostic buffers and update the automatic diag buffer capture triggers. Co-developed-by: Sathya Prakash Signed-off-by: Sathya Prakash Signed-off-by: Ranjan Kumar Link: https://lore.kernel.org/r/20240626102646.14298-4-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen commit d8d08d1638ce1c237ce9c3f14649a3a4cf1b1749 Author: Ranjan Kumar Date: Wed Jun 26 15:56:44 2024 +0530 scsi: mpi3mr: Trigger support Add functions to process automatic diag triggers. If a condition defined in the triggers is met, the driver will call appropriate controller functions to save the diagnostic information. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202405151955.BiAWI1SY-lkp@intel.com/ Co-developed-by: Sathya Prakash Signed-off-by: Sathya Prakash Signed-off-by: Ranjan Kumar Link: https://lore.kernel.org/r/20240626102646.14298-3-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen commit fc4444941140c80088102b39eb130d2a7a048e58 Author: Ranjan Kumar Date: Wed Jun 26 15:56:43 2024 +0530 scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffers To be able to debug controller problems it is beneficial to allocate and configure system/host memory buffers which can be used to capture hardware and firmware diagnostic information. Add functions required to allocate and post firmware and hardware diagnostic buffers to the controller and to set up automatic diagnostic capture triggers. Captures will be triggered under the following circumstances: 1. Firmware is in FAULT state. 2. Admin commands time out. 3. Controller reset caused due to I/O timeout Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202405151758.7xrJz6rp-lkp@intel.com/ Co-developed-by: Sathya Prakash Signed-off-by: Sathya Prakash Signed-off-by: Ranjan Kumar Link: https://lore.kernel.org/r/20240626102646.14298-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen commit bdee2f1dcd84dcd76d9557cf55e8f3662c6506eb Author: Adrian Hunter Date: Tue Jun 18 10:31:58 2024 +0300 scsi: ufs: ufs-pci: Add support for Intel Panther Lake Add PCI ID to support Intel Panther Lake, same as MTL. Signed-off-by: Adrian Hunter Link: https://lore.kernel.org/r/20240618073158.38504-1-adrian.hunter@intel.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 4d66ecc6e5a5df965b43edace2d41d19ed15a2ea Author: Jeff Johnson Date: Tue Jun 25 09:53:45 2024 -0700 scsi: ufs: qcom: Add missing MODULE_DESCRIPTION() macro With ARCH=arm64, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/ufs/host/ufs-qcom.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson Link: https://lore.kernel.org/r/20240625-md-drivers-ufs-host-v2-1-59a56974b05a@quicinc.com Signed-off-by: Martin K. Petersen commit 5e0bf3e8aec2cbc51123f84b29aaacbd91fc56fa Author: Huai-Yuan Liu Date: Fri Jun 21 16:25:45 2024 +0800 scsi: lpfc: Fix a possible null pointer dereference In function lpfc_xcvr_data_show, the memory allocation with kmalloc might fail, thereby making rdp_context a null pointer. In the following context and functions that use this pointer, there are dereferencing operations, leading to null pointer dereference. To fix this issue, a null pointer check should be added. If it is null, use scnprintf to notify the user and return len. Fixes: 479b0917e447 ("scsi: lpfc: Create a sysfs entry called lpfc_xcvr_data for transceiver info") Signed-off-by: Huai-Yuan Liu Link: https://lore.kernel.org/r/20240621082545.449170-1-qq810974084@gmail.com Reviewed-by: Justin Tee Signed-off-by: Martin K. Petersen commit ab2068a6fb84751836a84c26ca72b3beb349619d Author: Xingui Yang Date: Wed Jun 19 09:17:42 2024 +0000 scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed The expander phy will be treated as broadcast flutter in the next revalidation after the exp-attached end device probe failed, as follows: [78779.654026] sas: broadcast received: 0 [78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10 [78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has changed [78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE) [78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached [78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 500e004aaaaaaa05 (stp) [78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found [78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0 [78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 ... [78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1 [78835.171344] sas: sas_probe_sata: for exp-attached device 500e004aaaaaaa05 returned -19 [78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone [78835.187487] sas: broadcast received: 0 [78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10 [78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has changed [78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE) [78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05 [78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 500e004aaaaaaa05 (stp) [78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter [78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0 The cause of the problem is that the related ex_phy's attached_sas_addr was not cleared after the end device probe failed, so reset it. Signed-off-by: Xingui Yang Link: https://lore.kernel.org/r/20240619091742.25465-1-yangxingui@huawei.com Reviewed-by: John Garry Signed-off-by: Martin K. Petersen commit b402a0dce64aa3e14a9bd15ab1dd87a93967f90c Author: Ming Lei Date: Wed Jun 19 09:38:03 2024 +0800 scsi: scsi_debug: Fix create target debugfs failure Target debugfs entry is removed via async_schedule() which isn't drained when adding same name target, so failure of "Directory 'target11:0:0' with parent 'scsi_debug' already present!" can be triggered easily. Fix it by switching to domain async schedule, and draining it before adding new target debugfs entry. Cc: Wenchao Hao Fixes: f084fe52c640 ("scsi: scsi_debug: Add debugfs interface to fail target reset") Signed-off-by: Ming Lei Acked-by: Wenchao Hao Link: https://lore.kernel.org/r/20240619013803.3008857-1-ming.lei@redhat.com Signed-off-by: Martin K. Petersen commit 57619f3cdeb5ae9f4252833b0ed600e9f81da722 Author: Bart Van Assche Date: Thu Jun 13 14:18:27 2024 -0700 scsi: usb: uas: Do not query the IO Advice Hints Grouping mode page for USB/UAS devices Recently it was reported that the following USB storage devices are unusable with Linux kernel 6.9: * Kingston DataTraveler G2 * Garmin FR35 This is because attempting to read the IO Advice Hints Grouping mode page causes these devices to reset. Hence do not read the IO Advice Hints Grouping mode page from USB/UAS storage devices. Acked-by: Alan Stern Cc: stable@vger.kernel.org Fixes: 4f53138fffc2 ("scsi: sd: Translate data lifetime information") Reported-by: Joao Machado Closes: https://lore.kernel.org/linux-scsi/20240130214911.1863909-1-bvanassche@acm.org/T/#mf4e3410d8f210454d7e4c3d1fb5c0f41e651b85f Tested-by: Andy Shevchenko Bisected-by: Christian Heusel Reported-by: Andy Shevchenko Closes: https://lore.kernel.org/linux-scsi/CACLx9VdpUanftfPo2jVAqXdcWe8Y43MsDeZmMPooTzVaVJAh2w@mail.gmail.com/ Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240613211828.2077477-3-bvanassche@acm.org Signed-off-by: Martin K. Petersen commit 633aeefafc9c2a07a76a62be6aac1d73c3e3defa Author: Bart Van Assche Date: Thu Jun 13 14:18:26 2024 -0700 scsi: core: Introduce the BLIST_SKIP_IO_HINTS flag Prepare for skipping the IO Advice Hints Grouping mode page for USB storage devices. Cc: Alan Stern Cc: Joao Machado Cc: Andy Shevchenko Cc: Christian Heusel Cc: stable@vger.kernel.org Fixes: 4f53138fffc2 ("scsi: sd: Translate data lifetime information") Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240613211828.2077477-2-bvanassche@acm.org Signed-off-by: Martin K. Petersen commit 135c6eb27a85c8b261a2cc1f5093abcda6ee9010 Author: Joel Slebodnick Date: Thu Jun 13 14:27:28 2024 -0400 scsi: ufs: core: Free memory allocated for model before reinit Under the conditions that a device is to be reinitialized within ufshcd_probe_hba(), the device must first be fully reset. Resetting the device should include freeing U8 model (member of dev_info) but does not, and this causes a memory leak. ufs_put_device_desc() is responsible for freeing model. unreferenced object 0xffff3f63008bee60 (size 32): comm "kworker/u33:1", pid 60, jiffies 4294892642 hex dump (first 32 bytes): 54 48 47 4a 46 47 54 30 54 32 35 42 41 5a 5a 41 THGJFGT0T25BAZZA 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc ed7ff1a9): [] kmemleak_alloc+0x34/0x40 [] __kmalloc_noprof+0x1e4/0x2fc [] ufshcd_read_string_desc+0x94/0x190 [] ufshcd_device_init+0x480/0xdf8 [] ufshcd_probe_hba+0x3c/0x404 [] ufshcd_async_scan+0x40/0x370 [] async_run_entry_fn+0x34/0xe0 [] process_one_work+0x154/0x298 [] worker_thread+0x2f8/0x408 [] kthread+0x114/0x118 [] ret_from_fork+0x10/0x20 Fixes: 96a7141da332 ("scsi: ufs: core: Add support for reinitializing the UFS device") Cc: Reviewed-by: Andrew Halaney Reviewed-by: Bart Van Assche Signed-off-by: Joel Slebodnick Link: https://lore.kernel.org/r/20240613200202.2524194-1-jslebodn@redhat.com Signed-off-by: Martin K. Petersen commit 14d38356ec335b97a04c00719d7b6e73b136d291 Author: Bart Van Assche Date: Wed Jun 12 10:15:21 2024 -0700 scsi: core: Fix an incorrect comment The comment that scsi_static_device_list would go away was added more than 18 years ago. Today, that list is still there and a large number of additional entries have been added. This shows that this comment is incorrect. Hence fix that comment. Cc: Christoph Hellwig Cc: Avri Altman Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240612171522.2677600-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen commit 90e6f08915ec6efe46570420412a65050ec826b2 Author: Damien Le Moal Date: Tue Jun 11 17:34:35 2024 +0900 scsi: mpi3mr: Fix ATA NCQ priority support The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to the HBA if a read or write command directed at an ATA device should be translated to an NCQ read/write command with the high prioiryt bit set when the request uses the RT priority class and the user has enabled NCQ priority through sysfs. However, unlike the mpt3sas driver, the mpi3mr driver does not define the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never actually set and NCQ Priority cannot ever be used. Fix this by defining these missing atributes to allow a user to check if an ATA device supports NCQ priority and to enable/disable the use of NCQ priority. To do this, lift the function scsih_ncq_prio_supp() out of the mpt3sas driver and make it the generic SCSI SAS transport function sas_ata_ncq_prio_supported(). Nothing in that function is hardware specific, so this function can be used in both the mpt3sas driver and the mpi3mr driver. Reported-by: Scott McCoy Fixes: 023ab2a9b4ed ("scsi: mpi3mr: Add support for queue command processing") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Link: https://lore.kernel.org/r/20240611083435.92961-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel Signed-off-by: Martin K. Petersen commit 95f8bf932b46cd5c17c681d67be9234551234eac Author: Jeff Johnson Date: Mon Jun 10 09:16:15 2024 -0700 scsi: Add missing MODULE_DESCRIPTION() macros On x86, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/scsi_common.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/advansys.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/BusLogic.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/aha1740.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/isci/isci.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/elx/efct.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/atp870u.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/ppa.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/imm.o Add all missing invocations of the MODULE_DESCRIPTION() macro. This updates all files which have a MODULE_LICENSE() but which do not have a MODULE_DESCRIPTION(), even ones which did not produce the x86 allmodconfig warnings. Acked-by: Finn Thain Signed-off-by: Jeff Johnson Link: https://lore.kernel.org/r/20240610-md-drivers-scsi-v3-1-055da78d66b2@quicinc.com Signed-off-by: Martin K. Petersen commit 77691af484e28af7a692e511b9ed5ca63012ec6e Author: Ziqi Chen Date: Fri Jun 7 18:06:23 2024 +0800 scsi: ufs: core: Quiesce request queues before checking pending cmds In ufshcd_clock_scaling_prepare(), after SCSI layer is blocked, ufshcd_pending_cmds() is called to check whether there are pending transactions or not. And only if there are no pending transactions can we proceed to kickstart the clock scaling sequence. ufshcd_pending_cmds() traverses over all SCSI devices and calls sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down to three steps: 1. Calculate the nr outstanding bits set in the 'word' bitmap. 2. Calculate the nr outstanding bits set in the 'cleared' bitmap. 3. Subtract the result from step 1 by the result from step 2. This can lead to a race condition as outlined below: Assume there is one pending transaction in the request queue of one SCSI device, say sda, and the budget token of this request is 0, the 'word' is 0x1 and the 'cleared' is 0x0. 1. When step 1 executes, it gets the result as 1. 2. Before step 2 executes, block layer tries to dispatch a new request to sda. Since the SCSI layer is blocked, the request cannot pass through SCSI but the block layer would do budget_get() and budget_put() to sda's budget map regardless, so the 'word' has become 0x3 and 'cleared' has become 0x2 (assume the new request got budget token 1). 3. When step 2 executes, it gets the result as 1. 4. When step 3 executes, it gets the result as 0, meaning there is no pending transactions, which is wrong. Thread A Thread B ufshcd_pending_cmds() __blk_mq_sched_dispatch_requests() | | sbitmap_weight(word) | | scsi_mq_get_budget() | | | scsi_mq_put_budget() | | sbitmap_weight(cleared) ... When this race condition happens, the clock scaling sequence is started with transactions still in flight, leading to subsequent hibernate enter failure, broken link, task abort and back to back error recovery. Fix this race condition by quiescing the request queues before calling ufshcd_pending_cmds() so that block layer won't touch the budget map when ufshcd_pending_cmds() is working on it. In addition, remove the SCSI layer blocking/unblocking to reduce redundancies and latencies. Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code") Co-developed-by: Can Guo Signed-off-by: Can Guo Signed-off-by: Ziqi Chen Link: https://lore.kernel.org/r/1717754818-39863-1-git-send-email-quic_ziqichen@quicinc.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 52912ca87e2b810e5acdcdc452593d30c9187d8f Author: Damien Le Moal Date: Fri Jun 7 10:25:07 2024 +0900 scsi: core: Disable CDL by default For SCSI devices supporting the Command Duration Limits feature set, the user can enable/disable this feature use through the sysfs device attribute "cdl_enable". This attribute modification triggers a call to scsi_cdl_enable() to enable and disable the feature for ATA devices and set the scsi device cdl_enable field to the user provided bool value. For SCSI devices supporting CDL, the feature set is always enabled and scsi_cdl_enable() is reduced to setting the cdl_enable field. However, for ATA devices, a drive may spin-up with the CDL feature enabled by default. But the SCSI device cdl_enable field is always initialized to false (CDL disabled), regardless of the actual device CDL feature state. For ATA devices managed by libata (or libsas), libata-core always disables the CDL feature set when the device is attached, thus syncing the state of the CDL feature on the device and of the SCSI device cdl_enable field. However, for ATA devices connected to a SAS HBA, the CDL feature is not disabled on scan for ATA devices that have this feature enabled by default, leading to an inconsistent state of the feature on the device with the SCSI device cdl_enable field. Avoid this inconsistency by adding a call to scsi_cdl_enable() in scsi_cdl_check() to make sure that the device-side state of the CDL feature set always matches the scsi device cdl_enable field state. This implies that CDL will always be disabled for ATA devices connected to SAS HBAs, which is consistent with libata/libsas initialization of the device. Reported-by: Scott McCoy Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Link: https://lore.kernel.org/r/20240607012507.111488-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel Reviewed-by: Igor Pylypiv Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen commit 4254dfeda82f20844299dca6c38cbffcfd499f41 Author: Breno Leitao Date: Wed Jun 5 01:55:29 2024 -0700 scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory There is a potential out-of-bounds access when using test_bit() on a single word. The test_bit() and set_bit() functions operate on long values, and when testing or setting a single word, they can exceed the word boundary. KASAN detects this issue and produces a dump: BUG: KASAN: slab-out-of-bounds in _scsih_add_device.constprop.0 (./arch/x86/include/asm/bitops.h:60 ./include/asm-generic/bitops/instrumented-atomic.h:29 drivers/scsi/mpt3sas/mpt3sas_scsih.c:7331) mpt3sas Write of size 8 at addr ffff8881d26e3c60 by task kworker/u1536:2/2965 For full log, please look at [1]. Make the allocation at least the size of sizeof(unsigned long) so that set_bit() and test_bit() have sufficient room for read/write operations without overwriting unallocated memory. [1] Link: https://lore.kernel.org/all/ZkNcALr3W3KGYYJG@gmail.com/ Fixes: c696f7b83ede ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path") Cc: stable@vger.kernel.org Suggested-by: Keith Busch Signed-off-by: Breno Leitao Link: https://lore.kernel.org/r/20240605085530.499432-1-leitao@debian.org Reviewed-by: Keith Busch Signed-off-by: Martin K. Petersen commit 7926d51f73e0434a6250c2fd1a0555f98d9a62da Author: Martin K. Petersen Date: Tue Jun 4 22:25:21 2024 -0400 scsi: sd: Use READ(16) when reading block zero on large capacity disks Commit 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") triggered a read to LBA 0 before attempting to inquire about device characteristics. This was done because some protocol bridge devices will return generic values until an attached storage device's media has been accessed. Pierre Tomon reported that this change caused problems on a large capacity external drive connected via a bridge device. The bridge in question does not appear to implement the READ(10) command. Issue a READ(16) instead of READ(10) when a device has been identified as preferring 16-byte commands (use_16_for_rw heuristic). Link: https://bugzilla.kernel.org/show_bug.cgi?id=218890 Link: https://lore.kernel.org/r/70dd7ae0-b6b1-48e1-bb59-53b7c7f18274@rowland.harvard.edu Link: https://lore.kernel.org/r/20240605022521.3960956-1-martin.petersen@oracle.com Fixes: 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") Cc: stable@vger.kernel.org Reported-by: Pierre Tomon Suggested-by: Alan Stern Tested-by: Pierre Tomon Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit daf613331c9388dec1b8c56565583afcdf87a053 Author: Bart Van Assche Date: Mon Jun 3 10:23:11 2024 -0700 scsi: powertec: Declare local function static Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240603172311.1587589-5-bvanassche@acm.org Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen commit 1dc98be418149feb79779af45abe0fe6243243e9 Author: Bart Van Assche Date: Mon Jun 3 10:23:10 2024 -0700 scsi: eesox: Declare local function static Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240603172311.1587589-4-bvanassche@acm.org Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen commit 1414045725a00f40d52ffb4c866d6efeab02c37a Author: Bart Van Assche Date: Mon Jun 3 10:23:09 2024 -0700 scsi: cumana: Declare local function static Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240603172311.1587589-3-bvanassche@acm.org Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen commit f5a954bbf2f4309a222f56162f2cd576b7b27f48 Author: Bart Van Assche Date: Mon Jun 3 10:23:08 2024 -0700 scsi: acornscsi: Declare local functions static Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20240603172311.1587589-2-bvanassche@acm.org Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen commit a420a8ed0a92488a04b34dfc262101c87940c800 Author: Minwoo Im Date: Sat Jun 1 06:22:44 2024 +0900 scsi: ufs: mcq: Prevent no I/O queue case for MCQ If hba_maxq equals poll_queues, which means there are no I/O queues (HCTX_TYPE_DEFAULT, HCTX_TYPE_READ), the very first hw queue will be allocated as HCTX_TYPE_POLL and it will be used as the dev_cmd_queue. In this case, device commands such as QUERY cannot be properly handled. This patch prevents the initialization of MCQ when the number of I/O queues is not set and only the number of POLL queues is set. Signed-off-by: Minwoo Im Link: https://lore.kernel.org/r/20240531212244.1593535-3-minwoo.im@samsung.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 175d1825ca4d2288fee734ada0955a1e36dd50e6 Author: Minwoo Im Date: Sat Jun 1 06:22:43 2024 +0900 scsi: ufs: pci: Add support MCQ for QEMU-based UFS Recently, ufs-mcq feature has been introduced to QEMU hw/ufs device [1]. This patch adds MCQ support for upstream QEMU UFS PCI controller. This patch provides mandatory vops callbacks to make UFS controller work properly on MCQ mode. Operation and Runtime Config register stride is fixed to 48bytes which is implemented by qemu. [1] https://lore.kernel.org/qemu-devel/cover.1716876237.git.jeuk20.kim@samsung.com/ Signed-off-by: Minwoo Im Link: https://lore.kernel.org/r/20240531212244.1593535-2-minwoo.im@samsung.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit e8a1d87b7983b461d1d625e2973cdaadc0bd8ff5 Author: Minwoo Im Date: Mon May 20 07:14:57 2024 +0900 scsi: ufs: mcq: Convert MCQ_CFG_n to an inline function Inline functions are preferred over macros. Convert the MCQ_CFG_n macro to an inline function. Signed-off-by: Minwoo Im Link: https://lore.kernel.org/r/20240519221457.772346-3-minwoo.im@samsung.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 2fc39848952dfb91a9233563cc1444669b8e79c3 Author: Minwoo Im Date: Mon May 20 07:14:56 2024 +0900 scsi: ufs: mcq: Fix missing argument 'hba' in MCQ_OPR_OFFSET_n The MCQ_OPR_OFFSET_n macro takes 'hba' in the caller context without receiving 'hba' instance as an argument. To prevent potential bugs in future use cases, add an argument 'hba'. Fixes: 2468da61ea09 ("scsi: ufs: core: mcq: Configure operation and runtime interface") Cc: Asutosh Das Signed-off-by: Minwoo Im Link: https://lore.kernel.org/r/20240519221457.772346-2-minwoo.im@samsung.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit d53b681ce9ca7db5ef4ecb8d2cf465ae4a031264 Author: Chanwoo Lee Date: Fri May 24 10:59:04 2024 +0900 scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort() An error unrelated to ufshcd_try_to_abort_task is being logged and can cause confusion. Modify ufshcd_mcq_abort() to print the result of the abort failure. For readability, return immediately instead of 'goto'. Fixes: f1304d442077 ("scsi: ufs: mcq: Added ufshcd_mcq_abort()") Signed-off-by: Chanwoo Lee Link: https://lore.kernel.org/r/20240524015904.1116005-1-cw9316.lee@samsung.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 600edc6620a4380b9f6027f293dac09eb0f22048 Author: Avri Altman Date: Thu May 30 17:25:09 2024 +0300 scsi: ufs: sysfs: Make max_number_of_rtt read-write Given the importance of the RTT parameter, we want to be able to configure it via sysfs. This is because UFS users should be discouraged from change UFS device parameters without the UFSHCI driver being aware of these changes. Signed-off-by: Avri Altman Link: https://lore.kernel.org/r/20240530142510.734-4-avri.altman@wdc.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit e75ff63300c5e8fac31649f438ebad6af88e0032 Author: Avri Altman Date: Thu May 30 17:25:08 2024 +0300 scsi: ufs: core: Maximum RTT supported by the host driver Allow platform vendors to take precedence having their own max rtt support. This makes sense because the host controller's nortt characteristic may vary among vendors. while at it, set this value for Mediatek, as requested by Peter - https://lore.kernel.org/all/0a57d6bab739d6a10584f2baba115d00dfc9c94c.camel@mediatek.com/ Signed-off-by: Avri Altman Link: https://lore.kernel.org/r/20240530142510.734-3-avri.altman@wdc.com Reviewed-by: Peter Wang Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 9ec54934ce857065e38523a2010e20182e76f515 Author: Avri Altman Date: Thu May 30 17:25:07 2024 +0300 scsi: ufs: core: Allow RTT negotiation The rtt-upiu packets precede any data-out upiu packets, thus synchronizing the data input to the device: this mostly applies to write operations, but there are other operations that requires rtt as well. There are several rules binding this rtt - data-out dialog, specifically There can be at most outstanding bMaxNumOfRTT such packets. This might have an effect on write performance (sequential write in particular), as each data-out upiu must wait for its rtt sibling. UFSHCI expects bMaxNumOfRTT to be min(bDeviceRTTCap, NORTT). However, as of today, there does not appears to be no-one who sets it: not the host controller nor the driver. It wasn't an issue up to now: bMaxNumOfRTT is set to 2 after manufacturing, and wasn't limiting the write performance. UFS4.0, and specifically gear 5 changes this, and requires the device to be more attentive. This doesn't come free - the device has to allocate more resources to that end, but the sequential write performance improvement is significant. Early measurements shows 25% gain when moving from rtt 2 to 9. Therefore, set bMaxNumOfRTT to be min(bDeviceRTTCap, NORTT) as UFSHCI expects. Signed-off-by: Avri Altman Link: https://lore.kernel.org/r/20240530142510.734-2-avri.altman@wdc.com Reviewed-by: Bean Huo Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 96281dfa266d333522c004205acc5ff1e9e3a337 Author: Dr. David Alan Gilbert Date: Tue May 28 22:56:40 2024 +0100 scsi: qla2xxx: Remove unused struct 'scsi_dif_tuple' 'scsi_dif_tuple' is unused since commit 8cb2049c7448 ("[SCSI] qla2xxx: T10 DIF - Handle uninitalized sectors."). Remove it. Signed-off-by: Dr. David Alan Gilbert Link: https://lore.kernel.org/r/20240528215640.91771-1-linux@treblig.org Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 41b757425203a73ba5aa401cf00feeccc1555f0c Author: John Garry Date: Fri May 24 08:48:29 2024 +0000 scsi: bsg: Pass dev to blk_mq_alloc_queue() When calling bsg_setup_queue() -> blk_mq_alloc_queue(), we don't pass the dev as the queuedata, but rather manually set it afterwards. Just pass dev to blk_mq_alloc_queue() to have automatically set. Signed-off-by: John Garry Link: https://lore.kernel.org/r/20240524084829.2132555-3-john.g.garry@oracle.com Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Tested-by: Himanshu Madhani Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit e7c09df178f740b74a077bbc16ed0bd872ad0581 Author: John Garry Date: Fri May 24 08:48:28 2024 +0000 scsi: core: Pass sdev to blk_mq_alloc_queue() When calling scsi_alloc_sdev() -> blk_mq_alloc_queue(), we don't pass the sdev as the queuedata, but rather manually set it afterwards. Just pass to blk_mq_alloc_queue() to have automatically set. Signed-off-by: John Garry Link: https://lore.kernel.org/r/20240524084829.2132555-2-john.g.garry@oracle.com Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Tested-by: Himanshu Madhani Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit d09c05aa35909adb7d29f92f0cd79fdcd1338ef0 Author: Martin K. Petersen Date: Mon May 20 22:30:40 2024 -0400 scsi: core: Handle devices which return an unusually large VPD page count Peter Schneider reported that a system would no longer boot after updating to 6.8.4. Peter bisected the issue and identified commit b5fc07a5fb56 ("scsi: core: Consult supported VPD page list prior to fetching page") as being the culprit. Turns out the enclosure device in Peter's system reports a byteswapped page length for VPD page 0. It reports "02 00" as page length instead of "00 02". This causes us to attempt to access 516 bytes (page length + header) of information despite only 2 pages being present. Limit the page search scope to the size of our VPD buffer to guard against devices returning a larger page count than requested. Link: https://lore.kernel.org/r/20240521023040.2703884-1-martin.petersen@oracle.com Fixes: b5fc07a5fb56 ("scsi: core: Consult supported VPD page list prior to fetching page") Cc: stable@vger.kernel.org Reported-by: Peter Schneider Closes: https://lore.kernel.org/all/eec6ebbf-061b-4a7b-96dc-ea748aa4d035@googlemail.com/ Tested-by: Peter Schneider Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit e4f5f8298cf6ddae43210d236ad65ac2c6379559 Author: Deming Wang Date: Mon May 13 07:59:56 2024 -0400 scsi: mpt3sas: Add missing kerneldoc parameter descriptions Add missing kerneldoc parameter descriptions to _scsih_set_debug_level(). Signed-off-by: Deming Wang Link: https://lore.kernel.org/r/20240513115956.1576-1-wangdeming@inspur.com Signed-off-by: Martin K. Petersen commit 6c3bb589debd763dc4b94803ddf3c13b4fcca776 Author: Saurav Kashyap Date: Wed May 15 14:41:01 2024 +0530 scsi: qedf: Set qed_slowpath_params to zero before use Zero qed_slowpath_params before use. Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240515091101.18754-4-skashyap@marvell.com Signed-off-by: Martin K. Petersen commit 78e88472b60936025b83eba57cffa59d3501dc07 Author: Saurav Kashyap Date: Wed May 15 14:41:00 2024 +0530 scsi: qedf: Wait for stag work during unload If stag work is already scheduled and unload is called, it can lead to issues as unload cleans up the work element. Wait for stag work to get completed before cleanup during unload. Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240515091101.18754-3-skashyap@marvell.com Signed-off-by: Martin K. Petersen commit 51071f0831ea975fc045526dd7e17efe669dc6e1 Author: Saurav Kashyap Date: Wed May 15 14:40:59 2024 +0530 scsi: qedf: Don't process stag work during unload and recovery Stag work can cause issues during unload and recovery, hence don't process it. Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240515091101.18754-2-skashyap@marvell.com Signed-off-by: Martin K. Petersen commit 9fad9d560af5c654bb38e0b07ee54a4e9acdc5cd Author: Justin Stitt Date: Wed May 8 17:22:51 2024 +0000 scsi: sr: Fix unintentional arithmetic wraparound Running syzkaller with the newly reintroduced signed integer overflow sanitizer produces this report: [ 65.194362] ------------[ cut here ]------------ [ 65.197752] UBSAN: signed-integer-overflow in ../drivers/scsi/sr_ioctl.c:436:9 [ 65.203607] -2147483648 * 177 cannot be represented in type 'int' [ 65.207911] CPU: 2 PID: 10416 Comm: syz-executor.1 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1 [ 65.213585] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 65.219923] Call Trace: [ 65.221556] [ 65.223029] dump_stack_lvl+0x93/0xd0 [ 65.225573] handle_overflow+0x171/0x1b0 [ 65.228219] sr_select_speed+0xeb/0xf0 [ 65.230786] ? __pm_runtime_resume+0xe6/0x130 [ 65.233606] sr_block_ioctl+0x15d/0x1d0 ... Historically, the signed integer overflow sanitizer did not work in the kernel due to its interaction with `-fwrapv` but this has since been changed [1] in the newest version of Clang. It was re-enabled in the kernel with Commit 557f8c582a9b ("ubsan: Reintroduce signed overflow sanitizer"). Firstly, let's change the type of "speed" to unsigned long as sr_select_speed()'s only caller passes in an unsigned long anyways. $ git grep '\.select_speed' | drivers/scsi/sr.c: .select_speed = sr_select_speed, ... | static int cdrom_ioctl_select_speed(struct cdrom_device_info *cdi, | unsigned long arg) | { | ... | return cdi->ops->select_speed(cdi, arg); | } Next, let's add an extra check to make sure we don't exceed 0xffff/177 (350) since 0xffff is the max speed. This has two benefits: 1) we deal with integer overflow before it happens and 2) we properly respect the max speed of 0xffff. There are some "magic" numbers here but I did not want to change more than what was necessary. Link: https://github.com/llvm/llvm-project/pull/82432 [1] Closes: https://github.com/KSPP/linux/issues/357 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt Link: https://lore.kernel.org/r/20240508-b4-b4-sio-sr_select_speed-v2-1-00b68f724290@google.com Reviewed-by: Kees Cook Signed-off-by: Martin K. Petersen commit 10157b1fc1a762293381e9145041253420dfc6ad Author: Martin Wilck Date: Tue May 14 16:03:44 2024 +0200 scsi: core: alua: I/O errors for ALUA state transitions When a host is configured with a few LUNs and I/O is running, injecting FC faults repeatedly leads to path recovery problems. The LUNs have 4 paths each and 3 of them come back active after say an FC fault which makes 2 of the paths go down, instead of all 4. This happens after several iterations of continuous FC faults. Reason here is that we're returning an I/O error whenever we're encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying. [mwilck: The original patch was developed by Rajashekhar M A and Hannes Reinecke. I moved the code to alua_check_sense() as suggested by Mike Christie [1]. Evan Milne had raised the question whether pg->state should be set to transitioning in the UA case [2]. I believe that doing this is correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O errors. Our handler schedules an RTPG, which will only result in an I/O error condition if the transitioning timeout expires.] [1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/ [2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/ Co-developed-by: Rajashekhar M A Co-developed-by: Hannes Reinecke Signed-off-by: Hannes Reinecke Signed-off-by: Martin Wilck Link: https://lore.kernel.org/r/20240514140344.19538-1-mwilck@suse.com Reviewed-by: Damien Le Moal Reviewed-by: Christoph Hellwig Reviewed-by: Mike Christie Signed-off-by: Martin K. Petersen commit 9f365cb8bbd0162963d6852651d7c9e30adcb7b5 Author: Nathan Chancellor Date: Tue May 14 13:47:23 2024 -0700 scsi: mpi3mr: Use proper format specifier in mpi3mr_sas_port_add() When building for a 32-bit platform such as ARM or i386, for which size_t is unsigned int, there is a warning due to using an unsigned long format specifier: drivers/scsi/mpi3mr/mpi3mr_transport.c:1370:11: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat] 1369 | ioc_warn(mrioc, "skipping port %u, max allowed value is %lu\n", | ~~~ | %u 1370 | i, sizeof(mr_sas_port->phy_mask) * 8); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use the proper format specifier for size_t, %zu, to resolve the warning for all platforms. Fixes: 3668651def2c ("scsi: mpi3mr: Sanitise num_phys") Signed-off-by: Nathan Chancellor Link: https://lore.kernel.org/r/20240514-mpi3mr-fix-wformat-v1-1-f1ad49217e5e@kernel.org Signed-off-by: Martin K. Petersen