About exact string matching and partial matching

働かない暇人 · #1

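I'm a beginner who just started learning C.
I want to do exact matching and partial matching of text in an if statement, but
the if statement doesn't react to the text "男" (male), and the program just ends.
I tried adding and removing [20] wherever sex appears, but it didn't help...
How can I fix this?
Also, please tell me how to do partial matching.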

The code where the match fails:

Code:

#include <stdio.h>

int main(void)
{
	char sex[20];
	
	printf("性別を入力してください。\n");
	scanf("%s",&sex);
	
	if(sex == "男" )
	{
		printf("了解\n");
	}
	
	return 0;
	
}

#2

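sex == "男" compares the addresses of two different arrays, not their contents, so it is never true. To compare strings, use strcmp().
For partial matching at the beginning of a string (a prefix), strncmp() should work.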
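A minimal sketch of both calls, assuming the input is already in a char array such as sex (the helper names same_text and starts_with are hypothetical, not from the thread). strcmp() returns 0 when the two strings are identical; strncmp() compares only the first n bytes, so passing strlen(prefix) as n turns it into a prefix check:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if a and b hold exactly the same text */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;   /* strcmp() returns 0 on equality */
}

/* Hypothetical helper: returns 1 if s begins with prefix */
int starts_with(const char *s, const char *prefix)
{
    /* compare only as many bytes as the prefix contains */
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

In the original program this would read if (same_text(sex, "男")). Note that the comparison is byte by byte, so it only matches when the literal in the source file and the console input use the same encoding.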

usao · #3

>scanf("%s",&sex);

This part is wrong too, I think.

#4

Code:

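scanf("%19s",sex); /* Correct. Giving the maximum number of characters to read (buffer size minus 1 for the terminator) reduces the risk of a buffer overrun */
scanf("%s",sex); /* Not wrong. There is a risk of buffer overrun, but it works as long as the input stays within the expected size */
scanf("%s",&sex); /* Wrong. %s expects a char *, but &sex has type char (*)[20], so the behavior is undefined and compilers may warn; it often happens to work because both point to the same address */
scanf("%s",sex[20]); /* Badly wrong. sex[20] is one past the end of the array, and a char value is passed where a pointer is expected, so it will most likely crash with an access violation */
scanf("%s",&sex[20]); /* Badly wrong. It writes outside the allocated array, and it is especially nasty because it may not cause an access violation at all */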
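To see what the field width in %19s does without typing input, here is a small sketch using sscanf(), which applies the same conversion rules as scanf() but reads from a string. The 8-byte buffer (WORD_BUF) and the name first_word are assumptions for illustration; the width is always the buffer size minus 1 for the terminating '\0':

```c
#include <stdio.h>
#include <string.h>

#define WORD_BUF 8   /* hypothetical buffer size: 7 chars + '\0' */

/* Hypothetical helper: stores the first whitespace-delimited word of src
   in dst and returns its length, or 0 if src contains no word.
   "%7s" writes at most 7 chars plus '\0', so dst cannot overflow;
   a longer word is simply cut off. */
size_t first_word(const char *src, char dst[WORD_BUF])
{
    if (sscanf(src, "%7s", dst) != 1)
        return 0;
    return strlen(dst);
}
```

The width has to be written into the format string by hand: unlike printf(), a * in a scanf() conversion means "read but don't store", not "take the width from an argument", so keep the 7 and WORD_BUF in sync yourself.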

#5

働かない暇人 wrote: Also, please tell me how to do partial matching.

What exactly do you mean by "partial matching"?
The strstr() function may be what you need.
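A minimal strstr() sketch (contains and find_offset are hypothetical helper names). strstr() returns a pointer to the first occurrence of the second string inside the first, or NULL if there is none, so "found or not" is a NULL check and the position is a pointer difference:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if needle appears anywhere in haystack */
int contains(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}

/* Hypothetical helper: byte offset of the first match, or -1 if none */
long find_offset(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);
    return p != NULL ? (long)(p - haystack) : -1;
}
```

Two things to keep in mind: a partial match also hits inside other words ("female" contains "male"), and the offset is in bytes, not characters. strstr() compares raw bytes; with UTF-8 a match can never start in the middle of a character, but with Shift_JIS the second byte of one character can look like the start of another, so false hits are possible.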

通しすがりの猫 · #6

I've never worked with Japanese text in C++, so please treat this only as a reference.

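First, C++ isn't well suited to casual string processing.
If I remember right, C++ has no built-in notion of text or encodings: a string is just a sequence of bytes.
So string literals in your code are converted to byte sequences based on things like the source file encoding and compiler options, and stored inside the program as hard-coded numbers.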

When input comes from the console on Japanese Windows, I believe it arrives in an encoding called CP932, but CP932 doesn't cover every Japanese character (and there are many other encodings as well).

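On the other hand, if you use UTF-8, the Windows command prompt complains. On top of that, UTF-8 represents each character with a variable-length sequence of 1 to 4 bytes (1 to 4 chars), so unless you convert UTF-8 to UTF-16 inside the program so that each character can be accessed like my_str[6], things get tedious (getting a quarter of a character is useless). UTF-16 itself still needs two units (a surrogate pair) for characters outside the Basic Multilingual Plane, but ordinary Japanese characters fit in one.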
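The variable-length point can be seen with a small sketch that counts characters in a UTF-8 string (utf8_length is a hypothetical name). Every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting only the bytes that are not continuation bytes gives the number of code points. It assumes the input is valid UTF-8, and the test strings use hex escapes so the result doesn't depend on how the source file is saved (\xE7\x94\xB7 is 男 in UTF-8):

```c
#include <stddef.h>

/* Counts code points in a valid UTF-8 string.
   Bytes of the form 10xxxxxx continue the previous character,
   so only the other bytes start a new one. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For "\xE7\x94\xB7" this returns 1, while strlen() returns 3, which is exactly the gap between characters and chars described above.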

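None of this is a big deal on its own, but it adds up. Linux is better, but it's still a hassle.
Outside of hobby projects, it's safer to avoid processing Japanese without an external library. If you're a beginner and want to do string processing, learn Python or Ruby. If you really want to do it in C++, look for a library.
While you're still learning the basics, I don't recommend using anything other than ASCII characters.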

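It's embarrassing and I'd rather not show it, but here is some encoding-conversion code I picked up or wrote when I played with this for study (it was for MeCab, so it's messy code with no smart pointers and no RAII):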

Code (EConv.h):

#ifndef ECONV
#define ECONV
#include <iostream>
#include <string>
#include <Windows.h> // declares MultiByteToWideChar / WideCharToMultiByte (implemented in kernel32.dll)

// based on code from:
//http://sayahamitt.net/utf8%E3%81%AAstring%E5%85%A5%E3%82%8C%E3%81%9F%E3%82%89shiftjis%E3%81%AAstring%E5%87%BA%E3%81%A6%E3%81%8F%E3%82%8B%E9%96%A2%E6%95%B0%E4%BD%9C%E3%81%A3%E3%81%9F/


std::string utf8ToShistjis(std::string &srcUTF8);
std::string shiftjisToUtf8(std::string& srcSjis);
std::string eucjpToShiftjis(std::string& s);

#endif

Code (implementation):

#include "stdafx.h"
#include "EConv.h"

// UTF-8 -> UTF-16 -> Shift_JIS (code page 932)
std::string utf8ToShistjis(std::string &srcUTF8){
	// Get the length after conversion to Unicode (UTF-16).
	// A source length of -1 means "null-terminated input" (c_str()
	// guarantees the terminator), so the result includes it.
	// When the output buffer size (last arg) is 0, the function only
	// returns the required buffer size in wchar_t units.
	int lengthUnicode = MultiByteToWideChar(
		CP_UTF8, 0, srcUTF8.c_str(), -1, NULL, 0);
	if (lengthUnicode == 0){
		return std::string(); // invalid input or other failure
	}

	// Allocate exactly the UTF-16 buffer we need
	wchar_t* bufUnicode = new wchar_t[lengthUnicode];

	// Convert UTF-8 to UTF-16, terminator included
	MultiByteToWideChar(
		CP_UTF8, 0, srcUTF8.c_str(), -1,
		bufUnicode, lengthUnicode);

	// Get the length after conversion to Shift_JIS (bufUnicode is null-terminated).
	// 932 is Windows' Shift_JIS (CP932); CP_THREAD_ACP would only mean
	// Shift_JIS when the program runs on Japanese Windows.
	int lengthSJis = WideCharToMultiByte(
		932, 0, bufUnicode, -1, NULL, 0, NULL, NULL);
	if (lengthSJis == 0){
		delete [] bufUnicode;
		return std::string();
	}

	// Allocate exactly the Shift_JIS buffer we need
	char* bufShiftJis = new char[lengthSJis];

	// Convert UTF-16 to Shift_JIS
	// 1: code page, 2: conversion flags
	// 3: string to convert
	// 4: its length, or -1 for a null-terminated string
	// 5: output buffer
	// 6: output buffer size; if 0, only the required size is returned
	WideCharToMultiByte(
		932, 0, bufUnicode, -1,
		bufShiftJis, lengthSJis, NULL, NULL);

	std::string strSJis(bufShiftJis);

	delete [] bufUnicode;
	delete [] bufShiftJis;

	return strSJis;
}


// Shift_JIS (code page 932) -> UTF-16 -> UTF-8
std::string shiftjisToUtf8(std::string &srcSjis){
	// Get the length after conversion to Unicode (UTF-16), terminator included.
	// With an output buffer size of 0 the function returns the
	// required size in wchar_t units.
	int lengthUnicode = MultiByteToWideChar(
		932, 0, srcSjis.c_str(), -1, NULL, 0);
	if (lengthUnicode == 0){
		return std::string(); // invalid input or other failure
	}

	// Allocate exactly the UTF-16 buffer we need
	wchar_t* bufUnicode = new wchar_t[lengthUnicode];

	// Convert Shift_JIS to UTF-16
	MultiByteToWideChar(
		932, 0, srcSjis.c_str(), -1,
		bufUnicode, lengthUnicode);

	// Get the length after conversion to UTF-8 (bytes, terminator included)
	// 1: code page, 2: conversion flags
	// 3: string to convert
	// 4: its length, or -1 for a null-terminated string
	// 5: output buffer
	// 6: output buffer size; if 0, only the required size is returned
	int lengthUTF8 = WideCharToMultiByte(
		CP_UTF8, 0, bufUnicode, -1, NULL,
		0, NULL, NULL);
	if (lengthUTF8 == 0){
		delete [] bufUnicode;
		return std::string();
	}

	// Allocate exactly the UTF-8 buffer we need
	char* bufUTF8 = new char[lengthUTF8];

	// Convert UTF-16 to UTF-8
	WideCharToMultiByte(
		CP_UTF8, 0, bufUnicode, -1, bufUTF8,
		lengthUTF8, NULL, NULL);

	std::string strUTF8(bufUTF8);

	delete [] bufUnicode;
	delete [] bufUTF8;

	return strUTF8;
}

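// EUC-JP -> Shift_JIS using plain arithmetic (no Windows API).
// Handles ASCII, 2-byte JIS X 0208 characters and half-width katakana.
// JIS X 0212 characters (0x8F prefix) have no Shift_JIS form and become '?'.
std::string eucjpToShiftjis(std::string &s){
	std::string out;
	out.reserve(s.size());
	for (std::string::size_type i = 0; i < s.size(); ++i){
		// use unsigned bytes: plain char may be signed
		unsigned char c1 = static_cast<unsigned char>(s[i]);
		if (c1 < 0x80){
			out += static_cast<char>(c1);   // ASCII: unchanged
		} else if (c1 == 0x8E && i + 1 < s.size()){
			out += s[++i];                  // half-width katakana: drop the 0x8E prefix
		} else if (c1 == 0x8F && i + 2 < s.size()){
			i += 2;                         // JIS X 0212: skip both following bytes
			out += '?';
		} else if (c1 >= 0xA1 && c1 <= 0xFE && i + 1 < s.size()){
			unsigned char c2 = static_cast<unsigned char>(s[i + 1]);
			if (c2 < 0xA1 || c2 > 0xFE){
				out += '?';                 // malformed pair: skip the first byte only
				continue;
			}
			++i;
			// EUC-JP byte minus 0x80 gives the JIS row (j1) and cell (j2), 0x21-0x7E
			unsigned int j1 = c1 - 0x80;
			unsigned int j2 = c2 - 0x80;
			// JIS -> Shift_JIS: one Shift_JIS first byte covers two JIS rows,
			// so the second byte depends on whether the row is odd or even
			unsigned int s1 = ((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0);
			unsigned int s2 = j2 + ((j1 & 1) ? (j2 < 0x60 ? 0x1F : 0x20) : 0x7E);
			out += static_cast<char>(s1);
			out += static_cast<char>(s2);
		} else {
			out += '?';                     // invalid or truncated byte
		}
	}
	return out;
}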

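Addendum:
I haven't actually tried it, so I'm not confident, but in C++ you can cast a char pointer and compare the data as int or long (though the sizes of int and long differ between platforms; memcmp() compares raw bytes without any cast). Come to think of it, Japanese characters don't fit in an 8-bit char, so it's worth checking how many bytes a kanji takes: typically 2 bytes in Shift_JIS/CP932 and 3 bytes in UTF-8, so each kanji is split across several chars.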
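To check how many bytes a kanji actually occupies in your environment, you can dump the bytes of the input as hex. A sketch (bytes_to_hex is a hypothetical name); the cast to unsigned char matters because plain char may be signed, and passing a negative value to %02X would typically print FFFFFFE7 instead of E7:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the bytes of s into out as
   space-separated two-digit hex, e.g. "E7 94 B7".
   out must hold at least 3 * strlen(s) + 1 bytes. */
void bytes_to_hex(const char *s, char *out)
{
    size_t n = strlen(s);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* each byte becomes "XX " (3 chars) */
        sprintf(out + 3 * i, "%02X ", (unsigned char)s[i]);
    }
    if (n > 0)
        out[3 * n - 1] = '\0';   /* drop the trailing space */
}
```

Printing the result for input 男 typically shows 92 6A (2 bytes) on a CP932 console and E7 94 B7 (3 bytes) with UTF-8.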

通しすがりの猫 · #7

I didn't notice this was about C, not C++. Sorry. I'm going to bed.

#8

You need either

Code:

scanf("%19s%*[^\n]%*c", sex);

or

Code:

fgets(sex, 20, stdin);

instead.
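One thing to watch with the fgets() version: fgets() stores the newline too, so after typing 男 and pressing Enter, sex holds "男\n" and strcmp(sex, "男") is not 0. A sketch of a line reader that strips the newline and discards the rest of an over-long line (read_line is a hypothetical name; strcspn() returns the length of the leading part that contains no '\n'):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (at most size-1
   chars) and removes the trailing newline so the result can be compared
   with strcmp(). If the line is longer than the buffer, the rest of it
   is discarded. Returns 1 on success, 0 on EOF or error. */
int read_line(char *buf, size_t size, FILE *fp)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';          /* whole line fit: drop the newline */
    } else {
        int c;                    /* line too long (or last line): skip the rest */
        while ((c = fgetc(fp)) != '\n' && c != EOF)
            ;
    }
    return 1;
}
```

In the original program this would be read_line(sex, sizeof sex, stdin) followed by the strcmp() or strstr() check.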

働かない暇人 · #9

I removed the & and used strstr, and it worked.
Thank you all for your answers...
